Chapter 2: AI-Driven Defect Inspection and AOI
This chapter covers AI-driven defect inspection for semiconductor manufacturing. You will learn how to classify defect patterns with CNNs, localize defects with semantic segmentation, build anomaly detection systems using autoencoders, and apply transfer learning and data augmentation in practice.
Learning Objectives
- Understand the theory and implementation of CNN-based defect pattern classification
- Master semantic segmentation techniques for defect localization
- Build anomaly detection systems using autoencoders
- Learn AOI (Automated Optical Inspection) system implementation methods
- Understand practical applications of transfer learning and data augmentation
2.1 Challenges in Semiconductor Defect Inspection
2.1.1 Importance of Defect Inspection
In semiconductor manufacturing processes, defect detection on wafers is key to improving yield. Major defect types include:
- Particle Defects: Micro-scale contaminant adhesion (detection of particles below 0.1 μm in diameter is required)
- Pattern Defects: Etching failures, lithography misalignment, CD (Critical Dimension) defects
- Scratches: Linear damage on wafer surfaces
- Crystal Defects: Dislocations, stacking faults
- Film Quality Defects: Film thickness non-uniformity, residues
2.1.2 Limitations of Conventional Methods
Challenges with rule-based inspection:
- High False Positive Rate: Misclassifying normal pattern variations as defects
- Threshold Adjustment Difficulty: Readjustment required for process condition changes
- No Response to Novel Defects: Cannot detect unknown defect patterns
- Complex Pattern Limitations: Accuracy degradation in multi-layer wiring 3D structures
2.1.3 Benefits of Deep Learning Introduction
Advantages of AI-driven inspection:
- Accuracy Improvement: detection rates rise from roughly 90% with conventional methods to 99%+ after deep learning introduction
- False Positive Reduction: false positive rates cut to 1/10 or less
- Inspection Speed: ~100x acceleration with GPU utilization (under 0.1 sec/image)
- Adaptability: quick ramp-up for new processes through transfer learning
2.2 CNN-Based Defect Classification
2.2.1 Fundamentals of Convolutional Neural Networks
CNN (Convolutional Neural Network) is the de facto standard for image recognition. The key layer components used in semiconductor defect classifiers are:
Key Layer Components
Convolutional Layer (Conv2D)
$$y_{i,j} = \sum_{m}\sum_{n} w_{m,n} \cdot x_{i+m, j+n} + b$$
Performs local feature extraction. Kernel size 3×3 is typical.
Pooling Layer (MaxPooling2D)
$$y_{i,j} = \max_{m,n \in \text{window}} x_{i+m, j+n}$$
Reduces spatial resolution and achieves position invariance.
Batch Normalization
$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$
Stabilizes and accelerates training.
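To make these operations concrete, the short NumPy sketch below applies each one to a toy 4×4 array. It is purely illustrative (the chapter's models use the corresponding Keras layers), and the kernel values are arbitrary:
import numpy as np
x = np.arange(16, dtype=np.float32).reshape(4, 4)  # toy 4x4 "image"
# Convolution (valid padding): y[i,j] = sum_m sum_n w[m,n] * x[i+m, j+n] + b
w = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)  # 3x3 kernel
b = 0.0
y = np.zeros((2, 2), dtype=np.float32)
for i in range(2):
    for j in range(2):
        y[i, j] = np.sum(w * x[i:i + 3, j:j + 3]) + b
print("conv:\n", y)
# Max pooling: maximum over each non-overlapping 2x2 window
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print("maxpool:\n", pooled)
# Batch normalization (normalizing with the batch statistics of x)
x_hat = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
print("batchnorm mean/std:", x_hat.mean(), x_hat.std())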
2.2.2 Implementation of Defect Classification CNN
Below is an implementation example of a CNN model that classifies 6 types of defects:
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - scikit-learn>=1.1.0
# - seaborn>=0.12.0
# - tensorflow>=2.13.0, <2.16.0
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
class DefectClassifierCNN:
"""
CNN Model for Semiconductor Wafer Defect Classification
Supported Defect Types:
- Particle
- Scratch
- Pattern (Pattern defects)
- Crystal (Crystal defects)
- Thin_Film (Film quality defects)
- Normal
"""
def __init__(self, input_shape=(128, 128, 1), num_classes=6):
"""
Parameters:
-----------
input_shape : tuple
Input image size (height, width, channels)
Assuming grayscale images
num_classes : int
Number of classification classes
"""
self.input_shape = input_shape
self.num_classes = num_classes
self.model = None
self.history = None
# Class name definition
self.class_names = [
'Particle', 'Scratch', 'Pattern',
'Crystal', 'Thin_Film', 'Normal'
]
def build_model(self):
"""
Build CNN Model
Architecture:
        - Conv2D → BatchNorm → ReLU → MaxPooling (×3 blocks)
        - Global Average Pooling
        - Dense → Dropout → Dense (classification layer)
        Total params: ~1.2M (compact design for near-real-time inference)
"""
model = models.Sequential([
# Block 1: Feature extraction layer (low-level features)
layers.Conv2D(32, (3, 3), padding='same',
input_shape=self.input_shape),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.MaxPooling2D((2, 2)),
# Block 2: Mid-level feature extraction
layers.Conv2D(64, (3, 3), padding='same'),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.MaxPooling2D((2, 2)),
# Block 3: High-level feature extraction
layers.Conv2D(128, (3, 3), padding='same'),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.Conv2D(128, (3, 3), padding='same'),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.MaxPooling2D((2, 2)),
# Block 4: Even higher-level features
layers.Conv2D(256, (3, 3), padding='same'),
layers.BatchNormalization(),
layers.Activation('relu'),
layers.Conv2D(256, (3, 3), padding='same'),
layers.BatchNormalization(),
layers.Activation('relu'),
            # Global average pooling (replaces large fully connected layers, suppressing overfitting)
layers.GlobalAveragePooling2D(),
# Classification layer
layers.Dense(256, activation='relu'),
layers.Dropout(0.5),
layers.Dense(self.num_classes, activation='softmax')
])
# Model compilation
model.compile(
optimizer=optimizers.Adam(learning_rate=0.001),
loss='categorical_crossentropy',
metrics=['accuracy', tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()]
)
self.model = model
return model
def create_data_augmentation(self):
"""
Data Augmentation Configuration
Augmentation specific to semiconductor defect images:
        - Rotation: random rotations up to ±90° (wafer orientation invariance)
- Flip: Horizontal and vertical (symmetry)
- Brightness adjustment: Respond to lighting condition variations
- Noise addition: Simulate sensor noise
"""
        train_datagen = ImageDataGenerator(
            rotation_range=90,            # random rotation up to ±90 degrees
            width_shift_range=0.1,        # 10% horizontal shift
            height_shift_range=0.1,       # 10% vertical shift
            horizontal_flip=True,
            vertical_flip=True,
            brightness_range=[0.8, 1.2],  # brightness ±20%
            zoom_range=0.1,               # zoom ±10%
            fill_mode='reflect'           # padding method
        )
        # Validation/test data: no augmentation
        # (images are already normalized to [0, 1] upstream)
        val_datagen = ImageDataGenerator()
return train_datagen, val_datagen
def train(self, X_train, y_train, X_val, y_val,
epochs=50, batch_size=32, use_augmentation=True):
"""
Model Training
Parameters:
-----------
X_train : ndarray
Training images (N, H, W, C)
y_train : ndarray
Training labels (N, num_classes) - one-hot encoded
X_val : ndarray
Validation images
y_val : ndarray
Validation labels
epochs : int
Number of epochs
batch_size : int
Batch size
use_augmentation : bool
Data augmentation usage flag
"""
if self.model is None:
self.build_model()
# Callback settings
callbacks = [
# Reduce learning rate if validation loss doesn't improve
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=5,
min_lr=1e-7,
verbose=1
),
# Save best model
tf.keras.callbacks.ModelCheckpoint(
'best_defect_classifier.h5',
monitor='val_accuracy',
save_best_only=True,
verbose=1
),
# Early stopping (prevent overfitting)
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=10,
restore_best_weights=True,
verbose=1
)
]
if use_augmentation:
train_datagen, _ = self.create_data_augmentation()
# Train with data generator
self.history = self.model.fit(
train_datagen.flow(X_train, y_train, batch_size=batch_size),
validation_data=(X_val, y_val),
epochs=epochs,
callbacks=callbacks,
verbose=1
)
else:
# Normal training
self.history = self.model.fit(
X_train, y_train,
validation_data=(X_val, y_val),
epochs=epochs,
batch_size=batch_size,
callbacks=callbacks,
verbose=1
)
return self.history
def evaluate(self, X_test, y_test):
"""
Performance evaluation on test data
Returns:
--------
metrics : dict
accuracy, precision, recall, f1-score, etc.
"""
# Prediction
y_pred_proba = self.model.predict(X_test)
y_pred = np.argmax(y_pred_proba, axis=1)
y_true = np.argmax(y_test, axis=1)
# Classification report
report = classification_report(
y_true, y_pred,
target_names=self.class_names,
output_dict=True
)
# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
# Format results
metrics = {
'accuracy': report['accuracy'],
'macro_avg': report['macro avg'],
'weighted_avg': report['weighted avg'],
'per_class': {name: report[name] for name in self.class_names},
'confusion_matrix': cm
}
return metrics, y_pred, y_pred_proba
def plot_training_history(self):
"""Visualize training history"""
if self.history is None:
print("No training history available")
return
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Accuracy
axes[0, 0].plot(self.history.history['accuracy'], label='Train')
axes[0, 0].plot(self.history.history['val_accuracy'], label='Validation')
axes[0, 0].set_title('Model Accuracy')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Loss
axes[0, 1].plot(self.history.history['loss'], label='Train')
axes[0, 1].plot(self.history.history['val_loss'], label='Validation')
axes[0, 1].set_title('Model Loss')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Loss')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)
# Precision
axes[1, 0].plot(self.history.history['precision'], label='Train')
axes[1, 0].plot(self.history.history['val_precision'], label='Validation')
axes[1, 0].set_title('Precision')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Precision')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)
# Recall
axes[1, 1].plot(self.history.history['recall'], label='Train')
axes[1, 1].plot(self.history.history['val_recall'], label='Validation')
axes[1, 1].set_title('Recall')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Recall')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('training_history.png', dpi=300, bbox_inches='tight')
plt.show()
def plot_confusion_matrix(self, cm):
"""Visualize confusion matrix"""
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=self.class_names,
yticklabels=self.class_names)
plt.title('Confusion Matrix - Defect Classification')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
plt.show()
# ========== Usage Example ==========
if __name__ == "__main__":
# Generate dummy data (use real images in practice)
np.random.seed(42)
# Training data: 500 images per class = 3000 total
X_train = np.random.randn(3000, 128, 128, 1).astype(np.float32)
y_train = np.eye(6)[np.random.randint(0, 6, 3000)] # one-hot
# Validation data: 100 images per class = 600 total
X_val = np.random.randn(600, 128, 128, 1).astype(np.float32)
y_val = np.eye(6)[np.random.randint(0, 6, 600)]
# Test data: 100 images per class = 600 total
X_test = np.random.randn(600, 128, 128, 1).astype(np.float32)
y_test = np.eye(6)[np.random.randint(0, 6, 600)]
# Normalization (to 0-1 range)
X_train = (X_train - X_train.min()) / (X_train.max() - X_train.min())
X_val = (X_val - X_val.min()) / (X_val.max() - X_val.min())
X_test = (X_test - X_test.min()) / (X_test.max() - X_test.min())
# Build and train model
classifier = DefectClassifierCNN(input_shape=(128, 128, 1), num_classes=6)
classifier.build_model()
print("Model Architecture:")
classifier.model.summary()
# Execute training
print("\n========== Training Start ==========")
history = classifier.train(
X_train, y_train,
X_val, y_val,
epochs=30,
batch_size=32,
use_augmentation=True
)
# Evaluation
print("\n========== Evaluation on Test Set ==========")
metrics, y_pred, y_pred_proba = classifier.evaluate(X_test, y_test)
print(f"\nOverall Accuracy: {metrics['accuracy']:.4f}")
print(f"Macro-avg Precision: {metrics['macro_avg']['precision']:.4f}")
print(f"Macro-avg Recall: {metrics['macro_avg']['recall']:.4f}")
print(f"Macro-avg F1-Score: {metrics['macro_avg']['f1-score']:.4f}")
print("\n--- Per-Class Performance ---")
for class_name in classifier.class_names:
class_metrics = metrics['per_class'][class_name]
print(f"{class_name:12s}: Precision={class_metrics['precision']:.3f}, "
f"Recall={class_metrics['recall']:.3f}, "
f"F1={class_metrics['f1-score']:.3f}")
# Visualization
classifier.plot_training_history()
classifier.plot_confusion_matrix(metrics['confusion_matrix'])
print("\n========== Training Complete ==========")
print("Best model saved to: best_defect_classifier.h5")
2.2.3 Accuracy Improvement with Transfer Learning
By leveraging models pre-trained on ImageNet, high accuracy can be achieved even with limited data:
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras import layers, models
class TransferLearningDefectClassifier:
"""
Defect Classification Model using Transfer Learning
Fine-tuning based on ImageNet pre-trained ResNet50V2
Achieves high accuracy even with small datasets (around 100 images per class)
"""
def __init__(self, input_shape=(224, 224, 3), num_classes=6):
"""
Parameters:
-----------
input_shape : tuple
ResNet50V2 standard input size is (224, 224, 3)
Grayscale images are converted to RGB for use
num_classes : int
Number of classification classes
"""
self.input_shape = input_shape
self.num_classes = num_classes
        self.model = None
        self.base_model = None  # backbone handle, set in build_model()
def build_model(self, freeze_base=True):
"""
Build Transfer Learning Model
Parameters:
-----------
freeze_base : bool
Whether to freeze the base model
True: Use as feature extractor (initial training)
False: Fine-tuning (second stage training)
Strategy:
---------
1. Based on ImageNet pre-trained ResNet50V2
2. Replace final layer for semiconductor defect classification
        3. Two-stage training: (1) Train top layers only → (2) Fine-tune entire network
"""
# Load base model (ImageNet weights)
base_model = ResNet50V2(
weights='imagenet',
include_top=False, # Exclude classification layer
input_shape=self.input_shape
)
        # Freeze or unfreeze the base model
        base_model.trainable = not freeze_base
        # Keep a direct handle for the fine-tuning stage; indexing
        # self.model.layers later is fragile because Sequential hides
        # its InputLayer from the layers list
        self.base_model = base_model
# Build custom head
model = models.Sequential([
            # Input layer (expects 3-channel input; grayscale images are
            # converted to RGB upstream, see grayscale_to_rgb below)
            layers.InputLayer(input_shape=self.input_shape),
# ResNet50V2 base
base_model,
# Global Average Pooling
layers.GlobalAveragePooling2D(),
# Classification head
layers.BatchNormalization(),
layers.Dense(512, activation='relu'),
layers.Dropout(0.5),
layers.BatchNormalization(),
layers.Dense(256, activation='relu'),
layers.Dropout(0.3),
layers.Dense(self.num_classes, activation='softmax')
])
# Compile
if freeze_base:
# Initial training: higher learning rate
learning_rate = 0.001
else:
# Fine-tuning: lower learning rate (don't destroy pre-trained weights)
learning_rate = 0.0001
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
loss='categorical_crossentropy',
metrics=['accuracy', tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()]
)
self.model = model
return model
def two_stage_training(self, X_train, y_train, X_val, y_val,
stage1_epochs=20, stage2_epochs=30, batch_size=16):
"""
Two-Stage Training Strategy
Stage 1: Freeze base model, train top layers only
Stage 2: Fine-tune entire network (low learning rate)
This strategy achieves high accuracy while preventing overfitting even with limited data
"""
print("========== Stage 1: Training Top Layers ==========")
# Stage 1: Base frozen
self.build_model(freeze_base=True)
callbacks_stage1 = [
tf.keras.callbacks.EarlyStopping(
monitor='val_loss', patience=5, restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss', factor=0.5, patience=3
)
]
history_stage1 = self.model.fit(
X_train, y_train,
validation_data=(X_val, y_val),
epochs=stage1_epochs,
batch_size=batch_size,
callbacks=callbacks_stage1,
verbose=1
)
print("\n========== Stage 2: Fine-tuning Entire Model ==========")
        # Stage 2: Fine-tune the latter part of the base network,
        # keeping early layers frozen as generic feature extractors
        base_model = self.base_model
        base_model.trainable = True
        # Keep roughly the first 100 layers frozen (ResNet50V2 has ~190 layers)
        for layer in base_model.layers[:100]:
            layer.trainable = False
# Recompile with low learning rate
self.model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
loss='categorical_crossentropy',
metrics=['accuracy', tf.keras.metrics.Precision(),
tf.keras.metrics.Recall()]
)
callbacks_stage2 = [
tf.keras.callbacks.EarlyStopping(
monitor='val_loss', patience=7, restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss', factor=0.5, patience=3, min_lr=1e-7
),
tf.keras.callbacks.ModelCheckpoint(
'best_transfer_model.h5',
monitor='val_accuracy',
save_best_only=True
)
]
history_stage2 = self.model.fit(
X_train, y_train,
validation_data=(X_val, y_val),
epochs=stage2_epochs,
batch_size=batch_size,
callbacks=callbacks_stage2,
verbose=1
)
return history_stage1, history_stage2
# ========== Usage Example ==========
# Convert grayscale images to RGB
def grayscale_to_rgb(images):
"""Convert grayscale (H, W, 1) β RGB (H, W, 3)"""
return np.repeat(images, 3, axis=-1)
# Transfer learning demonstration with small dataset
X_train_small = np.random.randn(600, 224, 224, 1).astype(np.float32) # 100 images per class
y_train_small = np.eye(6)[np.random.randint(0, 6, 600)]
X_val_small = np.random.randn(120, 224, 224, 1).astype(np.float32)
y_val_small = np.eye(6)[np.random.randint(0, 6, 120)]
# RGB conversion
X_train_rgb = grayscale_to_rgb(X_train_small)
X_val_rgb = grayscale_to_rgb(X_val_small)
# Normalization
X_train_rgb = (X_train_rgb - X_train_rgb.min()) / (X_train_rgb.max() - X_train_rgb.min())
X_val_rgb = (X_val_rgb - X_val_rgb.min()) / (X_val_rgb.max() - X_val_rgb.min())
# Training
tl_classifier = TransferLearningDefectClassifier(input_shape=(224, 224, 3), num_classes=6)
history1, history2 = tl_classifier.two_stage_training(
X_train_rgb, y_train_small,
X_val_rgb, y_val_small,
stage1_epochs=15,
stage2_epochs=20,
batch_size=16
)
print("\nTransfer Learning complete: Saved to best_transfer_model.h5")
print("High accuracy achieved even with small dataset (100 images per class)")
2.3 Defect Localization with Semantic Segmentation
2.3.1 What is Semantic Segmentation
While image classification determines "presence of defects", semantic segmentation identifies "where defects are located" at the pixel level. This enables:
- Precise Defect Location: Automatic acquisition of coordinates and size
- Simultaneous Detection of Multiple Defects: Handle multiple defects in a single image
- Defect Shape Analysis: Calculate area, perimeter, and aspect ratio (see the sketch after this list)
- Process Diagnosis: Identify causative process steps from defect distribution patterns
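To make the shape-analysis step concrete, here is a minimal sketch that extracts per-defect statistics from a predicted binary mask, such as the (H, W) mask returned by predict_and_visualize() in Section 2.3.3. Note the assumptions: scipy is an additional dependency, and pixel_size_um is a hypothetical calibration constant.
import numpy as np
from scipy import ndimage
def defect_statistics(mask, pixel_size_um=0.05):
    """Label connected defect regions and report size/location per region."""
    labeled, num_defects = ndimage.label(mask > 0)  # 4-connected components
    stats = []
    for region_id in range(1, num_defects + 1):
        ys, xs = np.nonzero(labeled == region_id)
        stats.append({
            'area_um2': len(ys) * pixel_size_um ** 2,        # pixel count x pixel area
            'centroid_px': (ys.mean(), xs.mean()),           # (row, col)
            'bbox_aspect': (xs.ptp() + 1) / (ys.ptp() + 1),  # width / height
        })
    return stats
# Toy example: one scratch-like and one particle-like defect
mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:3, 1:6] = 1
mask[6, 6] = 1
for s in defect_statistics(mask):
    print(s)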
2.3.2 U-Net Architecture
U-Net is an architecture originally developed for medical image segmentation, and it is also well suited to semiconductor defect detection:
Encoder (Contracting Path): Feature extraction via convolution and pooling
Decoder (Expanding Path): Restore to original resolution through upsampling
Skip Connections: Combine feature maps between encoder-decoder to preserve detailed information
2.3.3 U-Net Defect Segmentation Implementation
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - tensorflow>=2.13.0, <2.16.0
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
class UNetDefectSegmentation:
"""
U-Net for Semiconductor Defect Segmentation
Input: Wafer image (H, W, 1)
Output: Segmentation mask (H, W, num_classes)
- Background (normal region)
- Defect region (multiple types supported)
Applications:
- Particle localization
- Scratch region extraction
- Pattern defect shape analysis
"""
def __init__(self, input_shape=(256, 256, 1), num_classes=2):
"""
Parameters:
-----------
input_shape : tuple
Input image size (height, width, channels)
num_classes : int
Number of segmentation classes
2: Background vs Defect (Binary Segmentation)
6+1: Each defect type + background (Multi-class Segmentation)
"""
self.input_shape = input_shape
self.num_classes = num_classes
self.model = None
def conv_block(self, inputs, num_filters):
"""
        Convolution block: Conv → BatchNorm → ReLU (×2)
Basic building block of U-Net
"""
x = layers.Conv2D(num_filters, 3, padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.Conv2D(num_filters, 3, padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
return x
def encoder_block(self, inputs, num_filters):
"""
        Encoder block: Convolution → Pooling
Returns:
--------
x : Output to next layer (after pooling)
skip : Feature map for skip connection (before pooling)
"""
x = self.conv_block(inputs, num_filters)
skip = x # Save for skip connection
x = layers.MaxPooling2D((2, 2))(x)
return x, skip
def decoder_block(self, inputs, skip_features, num_filters):
"""
        Decoder block: Upsampling → Skip connection → Convolution
Parameters:
-----------
inputs : Input from lower decoder layer
skip_features : Skip connection from encoder
num_filters : Number of filters
"""
# Upsampling (Transposed Convolution)
x = layers.Conv2DTranspose(num_filters, (2, 2), strides=2,
padding='same')(inputs)
# Concatenate with skip connection
x = layers.Concatenate()([x, skip_features])
# Fuse features via convolution
x = self.conv_block(x, num_filters)
return x
def build_unet(self):
"""
Build U-Net Model
Architecture:
-------------
        Encoder: 4 stages of downsampling (256 → 128 → 64 → 32 → 16)
        Bottleneck: Deepest layer feature extraction
        Decoder: 4 stages of upsampling (16 → 32 → 64 → 128 → 256)
Output: Class probability per pixel
"""
inputs = layers.Input(shape=self.input_shape)
# ========== Encoder (Contracting Path) ==========
        # Level 1: 256 → 128
        e1, skip1 = self.encoder_block(inputs, 64)
        # Level 2: 128 → 64
        e2, skip2 = self.encoder_block(e1, 128)
        # Level 3: 64 → 32
        e3, skip3 = self.encoder_block(e2, 256)
        # Level 4: 32 → 16
        e4, skip4 = self.encoder_block(e3, 512)
        # ========== Bottleneck (Deepest Layer) ==========
        bottleneck = self.conv_block(e4, 1024)
        # ========== Decoder (Expanding Path) ==========
        # Level 4: 16 → 32
        d4 = self.decoder_block(bottleneck, skip4, 512)
        # Level 3: 32 → 64
        d3 = self.decoder_block(d4, skip3, 256)
        # Level 2: 64 → 128
        d2 = self.decoder_block(d3, skip2, 128)
        # Level 1: 128 → 256
        d1 = self.decoder_block(d2, skip1, 64)
# ========== Output Layer ==========
if self.num_classes == 2:
# Binary segmentation: sigmoid
outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d1)
else:
# Multi-class segmentation: softmax
outputs = layers.Conv2D(self.num_classes, (1, 1),
activation='softmax')(d1)
model = models.Model(inputs=[inputs], outputs=[outputs],
name='U-Net_Defect_Segmentation')
self.model = model
return model
def dice_coefficient(self, y_true, y_pred, smooth=1e-6):
"""
Dice Coefficient (Segmentation version of F1-score)
$$\text{Dice} = \frac{2|X \cap Y|}{|X| + |Y|}$$
Key metric for segmentation accuracy
"""
y_true_f = tf.keras.backend.flatten(y_true)
y_pred_f = tf.keras.backend.flatten(y_pred)
intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
return (2. * intersection + smooth) / (
tf.keras.backend.sum(y_true_f) +
tf.keras.backend.sum(y_pred_f) + smooth
)
def dice_loss(self, y_true, y_pred):
"""Dice loss = 1 - Dice coefficient"""
return 1 - self.dice_coefficient(y_true, y_pred)
def compile_model(self):
"""Compile model"""
if self.num_classes == 2:
# Binary segmentation
loss = self.dice_loss
metrics = ['accuracy', self.dice_coefficient]
else:
# Multi-class segmentation
loss = 'categorical_crossentropy'
metrics = ['accuracy', self.dice_coefficient]
self.model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=loss,
metrics=metrics
)
def train(self, X_train, y_train, X_val, y_val,
epochs=50, batch_size=8):
"""
Execute Training
Parameters:
-----------
X_train : ndarray
Training images (N, H, W, C)
y_train : ndarray
Training masks (N, H, W, num_classes) or (N, H, W, 1) for binary
"""
if self.model is None:
self.build_unet()
self.compile_model()
callbacks = [
tf.keras.callbacks.ModelCheckpoint(
'best_unet_segmentation.h5',
monitor='val_dice_coefficient',
mode='max',
save_best_only=True,
verbose=1
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=5,
min_lr=1e-7
),
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=15,
restore_best_weights=True
)
]
history = self.model.fit(
X_train, y_train,
validation_data=(X_val, y_val),
epochs=epochs,
batch_size=batch_size,
callbacks=callbacks,
verbose=1
)
return history
def predict_and_visualize(self, image, threshold=0.5):
"""
Prediction and Mask Visualization
Parameters:
-----------
image : ndarray
Input image (H, W, 1)
threshold : float
Threshold for binary segmentation
Returns:
--------
mask : ndarray
Predicted mask (H, W)
"""
# Prediction
image_batch = np.expand_dims(image, axis=0)
pred_mask = self.model.predict(image_batch)[0]
if self.num_classes == 2:
# Binary: threshold processing
mask = (pred_mask[:, :, 0] > threshold).astype(np.uint8)
else:
# Multi-class: argmax
mask = np.argmax(pred_mask, axis=-1)
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(image[:, :, 0], cmap='gray')
axes[0].set_title('Original Image')
axes[0].axis('off')
axes[1].imshow(mask, cmap='jet')
axes[1].set_title('Predicted Mask')
axes[1].axis('off')
# Overlay
overlay = image[:, :, 0].copy()
overlay[mask > 0] = 1.0 # Highlight defect area
axes[2].imshow(overlay, cmap='gray')
axes[2].set_title('Defect Overlay')
axes[2].axis('off')
plt.tight_layout()
plt.savefig('segmentation_result.png', dpi=300, bbox_inches='tight')
plt.show()
return mask
# ========== Usage Example ==========
if __name__ == "__main__":
# Generate dummy data
np.random.seed(42)
# Training data: 800 images
X_train = np.random.randn(800, 256, 256, 1).astype(np.float32)
# Mask: Binary (background=0, defect=1)
y_train = np.random.randint(0, 2, (800, 256, 256, 1)).astype(np.float32)
# Validation data: 200 images
X_val = np.random.randn(200, 256, 256, 1).astype(np.float32)
y_val = np.random.randint(0, 2, (200, 256, 256, 1)).astype(np.float32)
# Normalization
X_train = (X_train - X_train.min()) / (X_train.max() - X_train.min())
X_val = (X_val - X_val.min()) / (X_val.max() - X_val.min())
# Build U-Net model
segmenter = UNetDefectSegmentation(input_shape=(256, 256, 1), num_classes=2)
segmenter.build_unet()
print("U-Net Model Architecture:")
segmenter.model.summary()
# Training
print("\n========== Training U-Net ==========")
history = segmenter.train(
X_train, y_train,
X_val, y_val,
epochs=30,
batch_size=8
)
# Predict on test image
print("\n========== Prediction Example ==========")
test_image = X_val[0]
pred_mask = segmenter.predict_and_visualize(test_image, threshold=0.5)
# Defect region statistics
defect_pixels = np.sum(pred_mask > 0)
total_pixels = pred_mask.size
defect_ratio = defect_pixels / total_pixels * 100
print(f"\nDefect Detection Results:")
print(f" Total pixels: {total_pixels}")
print(f" Defect pixels: {defect_pixels}")
print(f" Defect coverage: {defect_ratio:.2f}%")
print("\nBest model saved to: best_unet_segmentation.h5")
2.4 Anomaly Detection with Autoencoders
2.4.1 Need for Unsupervised Anomaly Detection
In semiconductor manufacturing, novel defect modes appear frequently, and supervised learning cannot detect defect types absent from its training data. Autoencoder-based anomaly detection offers:
- Train on Normal Data Only: No need to collect defect data
- Detect Unknown Defects: Can detect anomalies not seen during training
- Reconstruction Error-Based: Automatically detect regions deviating from normal patterns
- Continuous Learning: Easy to update normal patterns
2.4.2 Principles of Convolutional Autoencoder
Encoder: Compress input image to low-dimensional latent representation
$$z = f_{\text{enc}}(x; \theta_{\text{enc}})$$
Decoder: Reconstruct original image from latent representation
$$\hat{x} = f_{\text{dec}}(z; \theta_{\text{dec}})$$
Reconstruction Error: Small for normal images, large for anomalous images
$$\text{Error} = \|x - \hat{x}\|^2$$
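In code, this test reduces to a few lines. The sketch below is a minimal illustration assuming a trained Keras autoencoder (built as in Section 2.4.3) and inputs scaled to [0, 1]; the per-pixel error maps additionally show where the reconstruction fails, which helps localize the anomaly.
import numpy as np
def reconstruction_scores(autoencoder, images):
    """Per-image MSE scores plus per-pixel error maps for localization."""
    recon = autoencoder.predict(images, verbose=0)
    pixel_errors = (images - recon) ** 2         # (N, H, W, C) error maps
    scores = pixel_errors.mean(axis=(1, 2, 3))   # one scalar score per image
    return scores, pixel_errors
# An image is flagged as anomalous when its score exceeds a threshold
# estimated from normal data only, e.g. np.percentile(normal_scores, 95).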
2.4.3 Implementation Example: Convolutional Autoencoder
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - scikit-learn>=1.1.0
# - tensorflow>=2.13.0, <2.16.0
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
class ConvolutionalAutoencoder:
"""
Anomaly Detection with Convolutional Autoencoder
Training: Train only on normal wafer images
Inference: Classify as anomaly if reconstruction error exceeds threshold
Applications:
- Automatic detection of novel defect patterns
- Early detection of process anomalies
- Detection of minute quality degradation
"""
def __init__(self, input_shape=(128, 128, 1), latent_dim=128):
"""
Parameters:
-----------
input_shape : tuple
Input image size
latent_dim : int
Dimensionality of latent space
Small: Strong compression, strict anomaly detection
Large: Loose compression, relaxed anomaly detection
"""
self.input_shape = input_shape
self.latent_dim = latent_dim
self.autoencoder = None
self.encoder = None
self.decoder = None
self.threshold = None # Anomaly detection threshold
def build_encoder(self):
"""
Build Encoder: Image β Latent vector
        128×128 → 64×64 → 32×32 → 16×16 → 8×8 → latent_dim
"""
inputs = layers.Input(shape=self.input_shape)
# Encoder layers
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 64×64
        x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 32×32
        x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 16×16
        x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 8×8
        # Flatten → Dense (latent vector)
x = layers.Flatten()(x)
latent = layers.Dense(self.latent_dim, activation='relu',
name='latent_vector')(x)
encoder = models.Model(inputs, latent, name='encoder')
return encoder
def build_decoder(self):
"""
Build Decoder: Latent vector β Image
        latent_dim → 8×8 → 16×16 → 32×32 → 64×64 → 128×128
"""
latent_inputs = layers.Input(shape=(self.latent_dim,))
# Dense β Reshape
x = layers.Dense(8 * 8 * 256, activation='relu')(latent_inputs)
x = layers.Reshape((8, 8, 256))(x)
# Decoder layers (UpSampling + Conv2D)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 16×16
        x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 32×32
        x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 64×64
        x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 128×128
# Output layer (sigmoid: 0-1 range)
outputs = layers.Conv2D(1, (3, 3), activation='sigmoid',
padding='same')(x)
decoder = models.Model(latent_inputs, outputs, name='decoder')
return decoder
def build_autoencoder(self):
"""Build Autoencoder (Encoder + Decoder)"""
self.encoder = self.build_encoder()
self.decoder = self.build_decoder()
# Connection
inputs = layers.Input(shape=self.input_shape)
latent = self.encoder(inputs)
outputs = self.decoder(latent)
self.autoencoder = models.Model(inputs, outputs,
name='convolutional_autoencoder')
# Compile (MSE loss)
self.autoencoder.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss='mse', # Mean Squared Error
metrics=['mae'] # Mean Absolute Error
)
return self.autoencoder
def train(self, X_normal, validation_split=0.2, epochs=50, batch_size=32):
"""
Train on Normal Data Only
Parameters:
-----------
X_normal : ndarray
Normal images only (N, H, W, C)
*** Must not include anomalous images ***
"""
if self.autoencoder is None:
self.build_autoencoder()
callbacks = [
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=10,
restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=5,
min_lr=1e-7
),
tf.keras.callbacks.ModelCheckpoint(
'best_autoencoder.h5',
monitor='val_loss',
save_best_only=True
)
]
# Training (input=output)
history = self.autoencoder.fit(
X_normal, X_normal, # Self-supervised
validation_split=validation_split,
epochs=epochs,
batch_size=batch_size,
callbacks=callbacks,
verbose=1
)
return history
def calculate_reconstruction_errors(self, X):
"""
Calculate Reconstruction Errors
Returns:
--------
errors : ndarray
Reconstruction error for each image (N,)
"""
X_reconstructed = self.autoencoder.predict(X)
# MSE per image
errors = np.mean((X - X_reconstructed) ** 2, axis=(1, 2, 3))
return errors
def set_threshold(self, X_normal, percentile=95):
"""
Set Anomaly Detection Threshold
Parameters:
-----------
X_normal : ndarray
Normal image samples
percentile : float
            Percentile of normal-data reconstruction errors used as the threshold
            (95 means the top 5% of normal samples would be flagged as anomalous)
Strategy:
---------
Examine reconstruction error distribution for normal images,
Set percentile point as threshold
"""
errors = self.calculate_reconstruction_errors(X_normal)
self.threshold = np.percentile(errors, percentile)
print(f"Anomaly detection threshold set: {self.threshold:.6f}")
print(f" ({percentile}th percentile of normal data)")
return self.threshold
def detect_anomalies(self, X):
"""
Execute Anomaly Detection
Returns:
--------
is_anomaly : ndarray (bool)
Whether each image is anomalous (N,)
errors : ndarray
Reconstruction errors (N,)
"""
if self.threshold is None:
raise ValueError("Threshold not set. Run set_threshold() first")
errors = self.calculate_reconstruction_errors(X)
is_anomaly = errors > self.threshold
return is_anomaly, errors
def visualize_results(self, X_test, num_samples=5):
"""
Visualize Reconstruction Results
For both normal and anomalous samples,
Display original image, reconstructed image, and difference image
"""
is_anomaly, errors = self.detect_anomalies(X_test)
X_reconstructed = self.autoencoder.predict(X_test)
# Normal samples
normal_indices = np.where(~is_anomaly)[0][:num_samples]
# Anomalous samples
anomaly_indices = np.where(is_anomaly)[0][:num_samples]
fig, axes = plt.subplots(4, num_samples, figsize=(15, 10))
for i, idx in enumerate(normal_indices):
# Original image
axes[0, i].imshow(X_test[idx, :, :, 0], cmap='gray')
axes[0, i].set_title(f'Normal\nError={errors[idx]:.4f}')
axes[0, i].axis('off')
# Reconstructed image
axes[1, i].imshow(X_reconstructed[idx, :, :, 0], cmap='gray')
axes[1, i].set_title('Reconstructed')
axes[1, i].axis('off')
for i, idx in enumerate(anomaly_indices):
# Original image
axes[2, i].imshow(X_test[idx, :, :, 0], cmap='gray')
axes[2, i].set_title(f'Anomaly\nError={errors[idx]:.4f}')
axes[2, i].axis('off')
# Difference image
diff = np.abs(X_test[idx, :, :, 0] - X_reconstructed[idx, :, :, 0])
axes[3, i].imshow(diff, cmap='hot')
axes[3, i].set_title('Difference')
axes[3, i].axis('off')
plt.tight_layout()
plt.savefig('anomaly_detection_results.png', dpi=300, bbox_inches='tight')
plt.show()
# ========== Usage Example ==========
if __name__ == "__main__":
np.random.seed(42)
# Generate normal data (1000 images)
X_normal = np.random.randn(1000, 128, 128, 1).astype(np.float32)
X_normal = (X_normal - X_normal.min()) / (X_normal.max() - X_normal.min())
# Generate anomalous data (100 images) - add noise to simulate anomalies
X_anomaly = np.random.randn(100, 128, 128, 1).astype(np.float32)
X_anomaly = (X_anomaly - X_anomaly.min()) / (X_anomaly.max() - X_anomaly.min())
X_anomaly += np.random.randn(100, 128, 128, 1) * 0.3 # Strong noise
X_anomaly = np.clip(X_anomaly, 0, 1)
# Test data (normal + anomaly)
X_test = np.vstack([X_normal[-50:], X_anomaly[:50]])
y_test = np.array([0]*50 + [1]*50) # 0=normal, 1=anomaly
# Build and train autoencoder
ae = ConvolutionalAutoencoder(input_shape=(128, 128, 1), latent_dim=128)
ae.build_autoencoder()
print("Autoencoder Architecture:")
ae.autoencoder.summary()
# Train on normal data only
print("\n========== Training on Normal Data Only ==========")
history = ae.train(
X_normal[:900], # Training normal data
validation_split=0.2,
epochs=30,
batch_size=32
)
# Set threshold
print("\n========== Setting Anomaly Threshold ==========")
ae.set_threshold(X_normal[900:950], percentile=95)
# Anomaly detection
print("\n========== Anomaly Detection ==========")
is_anomaly, errors = ae.detect_anomalies(X_test)
# Evaluation
from sklearn.metrics import classification_report, roc_auc_score
print("\nClassification Report:")
print(classification_report(y_test, is_anomaly.astype(int),
target_names=['Normal', 'Anomaly']))
auc = roc_auc_score(y_test, errors)
print(f"\nAUC-ROC Score: {auc:.4f}")
# Visualization
ae.visualize_results(X_test, num_samples=5)
print("\nBest model saved to: best_autoencoder.h5")
2.5 Summary
In this chapter, we learned three approaches to semiconductor defect inspection using deep learning:
Key Learning Points
1. CNN-Based Defect Classification
- High-accuracy classification of 6 defect types (Accuracy 99%+)
- Data Augmentation for high performance even with limited data
- Transfer Learning for further accuracy improvement and reduced training time
2. Defect Localization with Semantic Segmentation
- U-Net Architecture for pixel-level defect detection
- Precise defect location, size, and shape automatic measurement
- Dice Coefficient for evaluating segmentation accuracy
3. Anomaly Detection with Autoencoders
- Unsupervised learning to detect unknown defect patterns
- Reconstruction error-based judgment with high adaptability
- Continuous learning for quick response to new processes
Preview of Next Chapter
Chapter 3 "Yield Improvement and Parameter Optimization" will cover methods to optimize process conditions using the defect information detected in this chapter:
- Correlation analysis between defect data and yield
- Process parameter optimization using Bayesian Optimization
- Simultaneous improvement of quality, cost, and throughput through multi-objective optimization
- Process control using reinforcement learning
Disclaimer
- This content is provided solely for educational, research, and informational purposes and does not constitute professional advice (legal, accounting, technical warranty, etc.).
- This content and accompanying code examples are provided "AS IS" without any warranty, express or implied, including but not limited to merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, operation, or safety.
- The author and Tohoku University assume no responsibility for the content, availability, or safety of external links, third-party data, tools, libraries, etc.
- To the maximum extent permitted by applicable law, the author and Tohoku University shall not be liable for any direct, indirect, incidental, special, consequential, or punitive damages arising from the use, execution, or interpretation of this content.
- The content may be changed, updated, or discontinued without notice.
- The copyright and license of this content are subject to the stated conditions (e.g., CC BY 4.0). Such licenses typically include no-warranty clauses.