Chapter 2: AI-Driven Defect Inspection and AOI

This chapter covers AI-driven defect inspection and AOI for semiconductor manufacturing. You will learn to classify defect patterns with CNNs, localize defects with semantic segmentation, build anomaly detection systems using autoencoders, and apply transfer learning and data augmentation in practice.

Learning Objectives

  • Understand the theory and implementation of CNN-based defect pattern classification
  • Master semantic segmentation techniques for defect localization
  • Build anomaly detection systems using autoencoders
  • Learn AOI (Automated Optical Inspection) system implementation methods
  • Understand practical applications of transfer learning and data augmentation

2.1 Challenges in Semiconductor Defect Inspection

2.1.1 Importance of Defect Inspection

In semiconductor manufacturing processes, defect detection on wafers is key to improving yield. Major defect types include:

  • Particle Defects: Micro-scale contaminant adhesion (detection of particles below 0.1 μm in diameter required)
  • Pattern Defects: Etching failures, lithography misalignment, CD (Critical Dimension) defects
  • Scratches: Linear damage on wafer surfaces
  • Crystal Defects: Dislocations, stacking faults
  • Film Quality Defects: Film thickness non-uniformity, residues

2.1.2 Limitations of Conventional Methods

Challenges with rule-based inspection:

  • High False Positive Rate: Misclassifying normal pattern variations as defects
  • Threshold Adjustment Difficulty: Readjustment required for process condition changes
  • No Response to Novel Defects: Cannot detect unknown defect patterns
  • Complex Pattern Limitations: Accuracy degradation in multi-layer wiring 3D structures

2.1.3 Benefits of Deep Learning Introduction

Advantages of AI-driven inspection:

Accuracy Improvement: detection rates of around 90% with conventional methods rise to 99%+ after deep learning introduction

False Positive Reduction: false positive rates reduced to one-tenth or less

Inspection Speed: roughly 100× faster with GPU inference (under 0.1 sec/image)

Adaptability: quick ramp-up for new processes through transfer learning

2.2 CNN-Based Defect Classification

2.2.1 Fundamentals of Convolutional Neural Networks

CNNs (Convolutional Neural Networks) are the de facto standard for image recognition. The key building blocks used in architectures for semiconductor defect classification are summarized below.

Key Layer Components

Convolutional Layer (Conv2D)

$$y_{i,j} = \sum_{m}\sum_{n} w_{m,n} \cdot x_{i+m, j+n} + b$$

Performs local feature extraction. Kernel size 3×3 is typical.

Pooling Layer (MaxPooling2D)

$$y_{i,j} = \max_{m,n \in \text{window}} x_{i+m, j+n}$$

Reduces spatial resolution and achieves position invariance.

Batch Normalization

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

Realizes training stabilization and acceleration.
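
As a minimal sketch of how these three layers compose (assuming TensorFlow 2.x, as used throughout this chapter; shapes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 8, 8, 1))                 # one 8×8 grayscale image

y = layers.Conv2D(4, (3, 3), padding='same')(x)    # local feature extraction: (1, 8, 8, 4)
y = layers.BatchNormalization()(y, training=True)  # per-channel normalization
y = layers.MaxPooling2D((2, 2))(y)                 # downsampling: (1, 4, 4, 4)

print(y.shape)  # (1, 4, 4, 4)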

2.2.2 Implementation of Defect Classification CNN

Below is an implementation example of a CNN model that classifies 6 types of defects:

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - seaborn>=0.12.0
# - tensorflow>=2.13.0, <2.16.0

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

class DefectClassifierCNN:
    """
    CNN Model for Semiconductor Wafer Defect Classification

    Supported Defect Types:
    - Particle
    - Scratch
    - Pattern (Pattern defects)
    - Crystal (Crystal defects)
    - Thin_Film (Film quality defects)
    - Normal
    """

    def __init__(self, input_shape=(128, 128, 1), num_classes=6):
        """
        Parameters:
        -----------
        input_shape : tuple
            Input image size (height, width, channels)
            Assuming grayscale images
        num_classes : int
            Number of classification classes
        """
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None
        self.history = None

        # Class name definition
        self.class_names = [
            'Particle', 'Scratch', 'Pattern',
            'Crystal', 'Thin_Film', 'Normal'
        ]

    def build_model(self):
        """
        Build CNN Model

        Architecture:
        - Conv2D β†’ BatchNorm β†’ ReLU β†’ MaxPooling (Γ—3 blocks)
        - Global Average Pooling
        - Dense β†’ Dropout β†’ Dense (classification layer)

        Total params: ~500K (lightweight design for real-time inference)
        """
        model = models.Sequential([
            # Block 1: Feature extraction layer (low-level features)
            layers.Conv2D(32, (3, 3), padding='same',
                         input_shape=self.input_shape),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPooling2D((2, 2)),

            # Block 2: Mid-level feature extraction
            layers.Conv2D(64, (3, 3), padding='same'),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPooling2D((2, 2)),

            # Block 3: High-level feature extraction
            layers.Conv2D(128, (3, 3), padding='same'),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.Conv2D(128, (3, 3), padding='same'),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.MaxPooling2D((2, 2)),

            # Block 4: Even higher-level features
            layers.Conv2D(256, (3, 3), padding='same'),
            layers.BatchNormalization(),
            layers.Activation('relu'),
            layers.Conv2D(256, (3, 3), padding='same'),
            layers.BatchNormalization(),
            layers.Activation('relu'),

            # Global pooling (instead of Fully Connected, suppresses overfitting)
            layers.GlobalAveragePooling2D(),

            # Classification layer
            layers.Dense(256, activation='relu'),
            layers.Dropout(0.5),
            layers.Dense(self.num_classes, activation='softmax')
        ])

        # Model compilation
        model.compile(
            optimizer=optimizers.Adam(learning_rate=0.001),
            loss='categorical_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Precision(),
                    tf.keras.metrics.Recall()]
        )

        self.model = model
        return model

    def create_data_augmentation(self):
        """
        Data Augmentation Configuration

        Augmentation specific to semiconductor defect images:
        - Rotation: random rotations up to ±90° (wafer orientation invariance)
        - Flip: horizontal and vertical (symmetry)
        - Brightness adjustment: respond to lighting condition variations
        - Shift/zoom: simulate stage positioning and magnification variations
        """
        train_datagen = ImageDataGenerator(
            rotation_range=90,           # random rotation in [-90, +90] degrees
            width_shift_range=0.1,       # 10% horizontal shift
            height_shift_range=0.1,      # 10% vertical shift
            horizontal_flip=True,
            vertical_flip=True,
            brightness_range=[0.8, 1.2], # brightness ±20%
            zoom_range=0.1,              # zoom ±10%
            fill_mode='reflect'          # padding method
        )

        # Validation/test data: no augmentation (inputs are normalized beforehand)
        val_datagen = ImageDataGenerator()

        return train_datagen, val_datagen

    def train(self, X_train, y_train, X_val, y_val,
              epochs=50, batch_size=32, use_augmentation=True):
        """
        Model Training

        Parameters:
        -----------
        X_train : ndarray
            Training images (N, H, W, C)
        y_train : ndarray
            Training labels (N, num_classes) - one-hot encoded
        X_val : ndarray
            Validation images
        y_val : ndarray
            Validation labels
        epochs : int
            Number of epochs
        batch_size : int
            Batch size
        use_augmentation : bool
            Data augmentation usage flag
        """
        if self.model is None:
            self.build_model()

        # Callback settings
        callbacks = [
            # Reduce learning rate if validation loss doesn't improve
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7,
                verbose=1
            ),
            # Save best model
            tf.keras.callbacks.ModelCheckpoint(
                'best_defect_classifier.h5',
                monitor='val_accuracy',
                save_best_only=True,
                verbose=1
            ),
            # Early stopping (prevent overfitting)
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True,
                verbose=1
            )
        ]

        if use_augmentation:
            train_datagen, _ = self.create_data_augmentation()

            # Train with data generator
            self.history = self.model.fit(
                train_datagen.flow(X_train, y_train, batch_size=batch_size),
                validation_data=(X_val, y_val),
                epochs=epochs,
                callbacks=callbacks,
                verbose=1
            )
        else:
            # Normal training
            self.history = self.model.fit(
                X_train, y_train,
                validation_data=(X_val, y_val),
                epochs=epochs,
                batch_size=batch_size,
                callbacks=callbacks,
                verbose=1
            )

        return self.history

    def evaluate(self, X_test, y_test):
        """
        Performance evaluation on test data

        Returns:
        --------
        metrics : dict
            accuracy, precision, recall, f1-score, etc.
        """
        # Prediction
        y_pred_proba = self.model.predict(X_test)
        y_pred = np.argmax(y_pred_proba, axis=1)
        y_true = np.argmax(y_test, axis=1)

        # Classification report
        report = classification_report(
            y_true, y_pred,
            target_names=self.class_names,
            output_dict=True
        )

        # Confusion matrix
        cm = confusion_matrix(y_true, y_pred)

        # Format results
        metrics = {
            'accuracy': report['accuracy'],
            'macro_avg': report['macro avg'],
            'weighted_avg': report['weighted avg'],
            'per_class': {name: report[name] for name in self.class_names},
            'confusion_matrix': cm
        }

        return metrics, y_pred, y_pred_proba

    def plot_training_history(self):
        """Visualize training history"""
        if self.history is None:
            print("No training history available")
            return

        fig, axes = plt.subplots(2, 2, figsize=(14, 10))

        # Accuracy
        axes[0, 0].plot(self.history.history['accuracy'], label='Train')
        axes[0, 0].plot(self.history.history['val_accuracy'], label='Validation')
        axes[0, 0].set_title('Model Accuracy')
        axes[0, 0].set_xlabel('Epoch')
        axes[0, 0].set_ylabel('Accuracy')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)

        # Loss
        axes[0, 1].plot(self.history.history['loss'], label='Train')
        axes[0, 1].plot(self.history.history['val_loss'], label='Validation')
        axes[0, 1].set_title('Model Loss')
        axes[0, 1].set_xlabel('Epoch')
        axes[0, 1].set_ylabel('Loss')
        axes[0, 1].legend()
        axes[0, 1].grid(True, alpha=0.3)

        # Precision
        axes[1, 0].plot(self.history.history['precision'], label='Train')
        axes[1, 0].plot(self.history.history['val_precision'], label='Validation')
        axes[1, 0].set_title('Precision')
        axes[1, 0].set_xlabel('Epoch')
        axes[1, 0].set_ylabel('Precision')
        axes[1, 0].legend()
        axes[1, 0].grid(True, alpha=0.3)

        # Recall
        axes[1, 1].plot(self.history.history['recall'], label='Train')
        axes[1, 1].plot(self.history.history['val_recall'], label='Validation')
        axes[1, 1].set_title('Recall')
        axes[1, 1].set_xlabel('Epoch')
        axes[1, 1].set_ylabel('Recall')
        axes[1, 1].legend()
        axes[1, 1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.savefig('training_history.png', dpi=300, bbox_inches='tight')
        plt.show()

    def plot_confusion_matrix(self, cm):
        """Visualize confusion matrix"""
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                   xticklabels=self.class_names,
                   yticklabels=self.class_names)
        plt.title('Confusion Matrix - Defect Classification')
        plt.ylabel('True Label')
        plt.xlabel('Predicted Label')
        plt.tight_layout()
        plt.savefig('confusion_matrix.png', dpi=300, bbox_inches='tight')
        plt.show()


# ========== Usage Example ==========
if __name__ == "__main__":
    # Generate dummy data (use real images in practice)
    np.random.seed(42)

    # Training data: 500 images per class = 3000 total
    X_train = np.random.randn(3000, 128, 128, 1).astype(np.float32)
    y_train = np.eye(6)[np.random.randint(0, 6, 3000)]  # one-hot

    # Validation data: 100 images per class = 600 total
    X_val = np.random.randn(600, 128, 128, 1).astype(np.float32)
    y_val = np.eye(6)[np.random.randint(0, 6, 600)]

    # Test data: 100 images per class = 600 total
    X_test = np.random.randn(600, 128, 128, 1).astype(np.float32)
    y_test = np.eye(6)[np.random.randint(0, 6, 600)]

    # Normalization (to 0-1 range)
    X_train = (X_train - X_train.min()) / (X_train.max() - X_train.min())
    X_val = (X_val - X_val.min()) / (X_val.max() - X_val.min())
    X_test = (X_test - X_test.min()) / (X_test.max() - X_test.min())

    # Build and train model
    classifier = DefectClassifierCNN(input_shape=(128, 128, 1), num_classes=6)
    classifier.build_model()

    print("Model Architecture:")
    classifier.model.summary()

    # Execute training
    print("\n========== Training Start ==========")
    history = classifier.train(
        X_train, y_train,
        X_val, y_val,
        epochs=30,
        batch_size=32,
        use_augmentation=True
    )

    # Evaluation
    print("\n========== Evaluation on Test Set ==========")
    metrics, y_pred, y_pred_proba = classifier.evaluate(X_test, y_test)

    print(f"\nOverall Accuracy: {metrics['accuracy']:.4f}")
    print(f"Macro-avg Precision: {metrics['macro_avg']['precision']:.4f}")
    print(f"Macro-avg Recall: {metrics['macro_avg']['recall']:.4f}")
    print(f"Macro-avg F1-Score: {metrics['macro_avg']['f1-score']:.4f}")

    print("\n--- Per-Class Performance ---")
    for class_name in classifier.class_names:
        class_metrics = metrics['per_class'][class_name]
        print(f"{class_name:12s}: Precision={class_metrics['precision']:.3f}, "
              f"Recall={class_metrics['recall']:.3f}, "
              f"F1={class_metrics['f1-score']:.3f}")

    # Visualization
    classifier.plot_training_history()
    classifier.plot_confusion_matrix(metrics['confusion_matrix'])

    print("\n========== Training Complete ==========")
    print("Best model saved to: best_defect_classifier.h5")

2.2.3 Accuracy Improvement with Transfer Learning

By leveraging models pre-trained on ImageNet, high accuracy can be achieved even with limited data:

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras import layers, models

class TransferLearningDefectClassifier:
    """
    Defect Classification Model using Transfer Learning

    Fine-tuning based on ImageNet pre-trained ResNet50V2
    Achieves high accuracy even with small datasets (around 100 images per class)
    """

    def __init__(self, input_shape=(224, 224, 3), num_classes=6):
        """
        Parameters:
        -----------
        input_shape : tuple
            ResNet50V2 standard input size is (224, 224, 3)
            Grayscale images are converted to RGB for use
        num_classes : int
            Number of classification classes
        """
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None

    def build_model(self, freeze_base=True):
        """
        Build Transfer Learning Model

        Parameters:
        -----------
        freeze_base : bool
            Whether to freeze the base model
            True: Use as feature extractor (initial training)
            False: Fine-tuning (second stage training)

        Strategy:
        ---------
        1. Based on ImageNet pre-trained ResNet50V2
        2. Replace final layer for semiconductor defect classification
        3. Two-stage training: (1) train top layers only → (2) fine-tune the entire network
        """
        # Load base model (ImageNet weights)
        base_model = ResNet50V2(
            weights='imagenet',
            include_top=False,  # Exclude classification layer
            input_shape=self.input_shape
        )

        # Freeze/unfreeze the base model
        base_model.trainable = not freeze_base
        self.base_model = base_model  # keep a handle for stage-2 fine-tuning

        # Build custom head
        model = models.Sequential([
            # Input layer (inputs must already be 3-channel; see grayscale_to_rgb below)
            layers.InputLayer(input_shape=self.input_shape),

            # ResNet50V2 base
            base_model,

            # Global Average Pooling
            layers.GlobalAveragePooling2D(),

            # Classification head
            layers.BatchNormalization(),
            layers.Dense(512, activation='relu'),
            layers.Dropout(0.5),
            layers.BatchNormalization(),
            layers.Dense(256, activation='relu'),
            layers.Dropout(0.3),
            layers.Dense(self.num_classes, activation='softmax')
        ])

        # Compile
        if freeze_base:
            # Initial training: higher learning rate
            learning_rate = 0.001
        else:
            # Fine-tuning: lower learning rate (don't destroy pre-trained weights)
            learning_rate = 0.0001

        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
            loss='categorical_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Precision(),
                    tf.keras.metrics.Recall()]
        )

        self.model = model
        return model

    def two_stage_training(self, X_train, y_train, X_val, y_val,
                          stage1_epochs=20, stage2_epochs=30, batch_size=16):
        """
        Two-Stage Training Strategy

        Stage 1: Freeze base model, train top layers only
        Stage 2: Fine-tune entire network (low learning rate)

        This strategy achieves high accuracy while preventing overfitting even with limited data
        """
        print("========== Stage 1: Training Top Layers ==========")

        # Stage 1: Base frozen
        self.build_model(freeze_base=True)

        callbacks_stage1 = [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss', patience=5, restore_best_weights=True
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss', factor=0.5, patience=3
            )
        ]

        history_stage1 = self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=stage1_epochs,
            batch_size=batch_size,
            callbacks=callbacks_stage1,
            verbose=1
        )

        print("\n========== Stage 2: Fine-tuning Entire Model ==========")

        # Stage 2: Fine-tune entire network
        # Unfreeze only the latter half of the base model (keep early layers as generic features)
        # (Indexing self.model.layers is fragile because Sequential excludes the InputLayer)
        base_model = self.base_model
        base_model.trainable = True

        # Keep the first 100 layers frozen (check len(base_model.layers) for the exact total)
        for layer in base_model.layers[:100]:
            layer.trainable = False

        # Recompile with low learning rate
        self.model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
            loss='categorical_crossentropy',
            metrics=['accuracy', tf.keras.metrics.Precision(),
                    tf.keras.metrics.Recall()]
        )

        callbacks_stage2 = [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss', patience=7, restore_best_weights=True
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss', factor=0.5, patience=3, min_lr=1e-7
            ),
            tf.keras.callbacks.ModelCheckpoint(
                'best_transfer_model.h5',
                monitor='val_accuracy',
                save_best_only=True
            )
        ]

        history_stage2 = self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=stage2_epochs,
            batch_size=batch_size,
            callbacks=callbacks_stage2,
            verbose=1
        )

        return history_stage1, history_stage2


# ========== Usage Example ==========
# Convert grayscale images to RGB
def grayscale_to_rgb(images):
    """Convert grayscale (H, W, 1) β†’ RGB (H, W, 3)"""
    return np.repeat(images, 3, axis=-1)

# Transfer learning demonstration with small dataset
X_train_small = np.random.randn(600, 224, 224, 1).astype(np.float32)  # 100 images per class
y_train_small = np.eye(6)[np.random.randint(0, 6, 600)]
X_val_small = np.random.randn(120, 224, 224, 1).astype(np.float32)
y_val_small = np.eye(6)[np.random.randint(0, 6, 120)]

# RGB conversion
X_train_rgb = grayscale_to_rgb(X_train_small)
X_val_rgb = grayscale_to_rgb(X_val_small)

# Normalization
X_train_rgb = (X_train_rgb - X_train_rgb.min()) / (X_train_rgb.max() - X_train_rgb.min())
X_val_rgb = (X_val_rgb - X_val_rgb.min()) / (X_val_rgb.max() - X_val_rgb.min())
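
# Note (assumption): ResNet50V2 was trained on inputs scaled to [-1, 1];
# for real images, prefer the library's own preprocessing over min-max scaling:
#   from tensorflow.keras.applications.resnet_v2 import preprocess_input
#   X_train_rgb = preprocess_input(X_train_rgb * 255.0)  # expects 0-255 pixel values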

# Training
tl_classifier = TransferLearningDefectClassifier(input_shape=(224, 224, 3), num_classes=6)
history1, history2 = tl_classifier.two_stage_training(
    X_train_rgb, y_train_small,
    X_val_rgb, y_val_small,
    stage1_epochs=15,
    stage2_epochs=20,
    batch_size=16
)

print("\nTransfer Learning complete: Saved to best_transfer_model.h5")
print("High accuracy achieved even with small dataset (100 images per class)")

2.3 Defect Localization with Semantic Segmentation

2.3.1 What is Semantic Segmentation

While image classification determines "presence of defects", semantic segmentation identifies "where defects are located" at the pixel level. This enables:

  • Precise Defect Location: Automatic acquisition of coordinates and size
  • Simultaneous Detection of Multiple Defects: Handle multiple defects in a single image
  • Defect Shape Analysis: Calculate area, perimeter, and aspect ratio (a short sketch follows this list)
  • Process Diagnosis: Identify causative process steps from defect distribution patterns
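
As referenced in the list above, once a binary defect mask is available (e.g., from the U-Net in Section 2.3.3), basic shape statistics can be extracted with standard tools. A minimal sketch assuming scipy is installed; function and key names are illustrative:

import numpy as np
from scipy import ndimage

def analyze_defects(mask):
    """Per-defect area, bounding box, and aspect ratio from a binary mask."""
    labeled, num_defects = ndimage.label(mask)  # connected-component labeling
    stats = []
    for i, region in enumerate(ndimage.find_objects(labeled), start=1):
        h = region[0].stop - region[0].start    # bounding-box height
        w = region[1].stop - region[1].start    # bounding-box width
        area = int(np.sum(labeled[region] == i))
        stats.append({'area_px': area, 'bbox_hw': (h, w),
                      'aspect_ratio': max(h, w) / max(1, min(h, w))})
    return num_defects, stats

# Example: two synthetic defects (a particle-like blob and a scratch-like streak)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[5:15, 5:13] = 1      # compact blob
mask[40:43, 10:55] = 1    # elongated streak
n, stats = analyze_defects(mask)
print(n, stats)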

2.3.2 U-Net Architecture

U-Net is an architecture originally developed for medical image segmentation, and it is also well suited to semiconductor defect detection:

Encoder (Contracting Path): Feature extraction via convolution and pooling

Decoder (Expanding Path): Restore to original resolution through upsampling

Skip Connections: Combine feature maps between encoder-decoder to preserve detailed information

2.3.3 U-Net Defect Segmentation Implementation

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - tensorflow>=2.13.0, <2.16.0

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

class UNetDefectSegmentation:
    """
    U-Net for Semiconductor Defect Segmentation

    Input: Wafer image (H, W, 1)
    Output: Segmentation mask (H, W, num_classes)
           - Background (normal region)
           - Defect region (multiple types supported)

    Applications:
    - Particle localization
    - Scratch region extraction
    - Pattern defect shape analysis
    """

    def __init__(self, input_shape=(256, 256, 1), num_classes=2):
        """
        Parameters:
        -----------
        input_shape : tuple
            Input image size (height, width, channels)
        num_classes : int
            Number of segmentation classes
            2: Background vs Defect (Binary Segmentation)
            6+1: Each defect type + background (Multi-class Segmentation)
        """
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None

    def conv_block(self, inputs, num_filters):
        """
        Convolution block: Conv → BatchNorm → ReLU (×2)

        Basic building block of U-Net
        """
        x = layers.Conv2D(num_filters, 3, padding='same')(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.Conv2D(num_filters, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)

        return x

    def encoder_block(self, inputs, num_filters):
        """
        Encoder block: Convolution → Pooling

        Returns:
        --------
        x : Output to next layer (after pooling)
        skip : Feature map for skip connection (before pooling)
        """
        x = self.conv_block(inputs, num_filters)
        skip = x  # Save for skip connection
        x = layers.MaxPooling2D((2, 2))(x)
        return x, skip

    def decoder_block(self, inputs, skip_features, num_filters):
        """
        Decoder block: Upsampling → Skip connection → Convolution

        Parameters:
        -----------
        inputs : Input from lower decoder layer
        skip_features : Skip connection from encoder
        num_filters : Number of filters
        """
        # Upsampling (Transposed Convolution)
        x = layers.Conv2DTranspose(num_filters, (2, 2), strides=2,
                                   padding='same')(inputs)

        # Concatenate with skip connection
        x = layers.Concatenate()([x, skip_features])

        # Fuse features via convolution
        x = self.conv_block(x, num_filters)

        return x

    def build_unet(self):
        """
        Build U-Net Model

        Architecture:
        -------------
        Encoder: 4 stages of downsampling (256→128→64→32→16)
        Bottleneck: Deepest layer feature extraction
        Decoder: 4 stages of upsampling (16→32→64→128→256)
        Output: Class probability per pixel
        """
        inputs = layers.Input(shape=self.input_shape)

        # ========== Encoder (Contracting Path) ==========
        # Level 1: 256 → 128
        e1, skip1 = self.encoder_block(inputs, 64)

        # Level 2: 128 → 64
        e2, skip2 = self.encoder_block(e1, 128)

        # Level 3: 64 → 32
        e3, skip3 = self.encoder_block(e2, 256)

        # Level 4: 32 → 16
        e4, skip4 = self.encoder_block(e3, 512)

        # ========== Bottleneck (Deepest Layer) ==========
        bottleneck = self.conv_block(e4, 1024)

        # ========== Decoder (Expanding Path) ==========
        # Level 4: 16 → 32
        d4 = self.decoder_block(bottleneck, skip4, 512)

        # Level 3: 32 → 64
        d3 = self.decoder_block(d4, skip3, 256)

        # Level 2: 64 → 128
        d2 = self.decoder_block(d3, skip2, 128)

        # Level 1: 128 → 256
        d1 = self.decoder_block(d2, skip1, 64)

        # ========== Output Layer ==========
        if self.num_classes == 2:
            # Binary segmentation: sigmoid
            outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d1)
        else:
            # Multi-class segmentation: softmax
            outputs = layers.Conv2D(self.num_classes, (1, 1),
                                   activation='softmax')(d1)

        model = models.Model(inputs=[inputs], outputs=[outputs],
                           name='U-Net_Defect_Segmentation')

        self.model = model
        return model

    def dice_coefficient(self, y_true, y_pred, smooth=1e-6):
        """
        Dice Coefficient (Segmentation version of F1-score)

        $$\text{Dice} = \frac{2|X \cap Y|}{|X| + |Y|}$$

        Key metric for segmentation accuracy
        """
        y_true_f = tf.keras.backend.flatten(y_true)
        y_pred_f = tf.keras.backend.flatten(y_pred)
        intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
        return (2. * intersection + smooth) / (
            tf.keras.backend.sum(y_true_f) +
            tf.keras.backend.sum(y_pred_f) + smooth
        )

    def dice_loss(self, y_true, y_pred):
        """Dice loss = 1 - Dice coefficient"""
        return 1 - self.dice_coefficient(y_true, y_pred)

    def compile_model(self):
        """Compile model"""
        if self.num_classes == 2:
            # Binary segmentation
            loss = self.dice_loss
            metrics = ['accuracy', self.dice_coefficient]
        else:
            # Multi-class segmentation
            loss = 'categorical_crossentropy'
            metrics = ['accuracy', self.dice_coefficient]

        self.model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss=loss,
            metrics=metrics
        )

    def train(self, X_train, y_train, X_val, y_val,
              epochs=50, batch_size=8):
        """
        Execute Training

        Parameters:
        -----------
        X_train : ndarray
            Training images (N, H, W, C)
        y_train : ndarray
            Training masks (N, H, W, num_classes) or (N, H, W, 1) for binary
        """
        if self.model is None:
            self.build_unet()
            self.compile_model()

        callbacks = [
            tf.keras.callbacks.ModelCheckpoint(
                'best_unet_segmentation.h5',
                monitor='val_dice_coefficient',
                mode='max',
                save_best_only=True,
                verbose=1
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7
            ),
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=15,
                restore_best_weights=True
            )
        ]

        history = self.model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks,
            verbose=1
        )

        return history

    def predict_and_visualize(self, image, threshold=0.5):
        """
        Prediction and Mask Visualization

        Parameters:
        -----------
        image : ndarray
            Input image (H, W, 1)
        threshold : float
            Threshold for binary segmentation

        Returns:
        --------
        mask : ndarray
            Predicted mask (H, W)
        """
        # Prediction
        image_batch = np.expand_dims(image, axis=0)
        pred_mask = self.model.predict(image_batch)[0]

        if self.num_classes == 2:
            # Binary: threshold processing
            mask = (pred_mask[:, :, 0] > threshold).astype(np.uint8)
        else:
            # Multi-class: argmax
            mask = np.argmax(pred_mask, axis=-1)

        # Visualization
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))

        axes[0].imshow(image[:, :, 0], cmap='gray')
        axes[0].set_title('Original Image')
        axes[0].axis('off')

        axes[1].imshow(mask, cmap='jet')
        axes[1].set_title('Predicted Mask')
        axes[1].axis('off')

        # Overlay
        overlay = image[:, :, 0].copy()
        overlay[mask > 0] = 1.0  # Highlight defect area
        axes[2].imshow(overlay, cmap='gray')
        axes[2].set_title('Defect Overlay')
        axes[2].axis('off')

        plt.tight_layout()
        plt.savefig('segmentation_result.png', dpi=300, bbox_inches='tight')
        plt.show()

        return mask


# ========== Usage Example ==========
if __name__ == "__main__":
    # Generate dummy data
    np.random.seed(42)

    # Training data: 800 images
    X_train = np.random.randn(800, 256, 256, 1).astype(np.float32)
    # Mask: Binary (background=0, defect=1)
    y_train = np.random.randint(0, 2, (800, 256, 256, 1)).astype(np.float32)

    # Validation data: 200 images
    X_val = np.random.randn(200, 256, 256, 1).astype(np.float32)
    y_val = np.random.randint(0, 2, (200, 256, 256, 1)).astype(np.float32)

    # Normalization
    X_train = (X_train - X_train.min()) / (X_train.max() - X_train.min())
    X_val = (X_val - X_val.min()) / (X_val.max() - X_val.min())

    # Build U-Net model
    segmenter = UNetDefectSegmentation(input_shape=(256, 256, 1), num_classes=2)
    segmenter.build_unet()

    print("U-Net Model Architecture:")
    segmenter.model.summary()

    # Training
    print("\n========== Training U-Net ==========")
    history = segmenter.train(
        X_train, y_train,
        X_val, y_val,
        epochs=30,
        batch_size=8
    )

    # Predict on test image
    print("\n========== Prediction Example ==========")
    test_image = X_val[0]
    pred_mask = segmenter.predict_and_visualize(test_image, threshold=0.5)

    # Defect region statistics
    defect_pixels = np.sum(pred_mask > 0)
    total_pixels = pred_mask.size
    defect_ratio = defect_pixels / total_pixels * 100

    print(f"\nDefect Detection Results:")
    print(f"  Total pixels: {total_pixels}")
    print(f"  Defect pixels: {defect_pixels}")
    print(f"  Defect coverage: {defect_ratio:.2f}%")

    print("\nBest model saved to: best_unet_segmentation.h5")

2.4 Anomaly Detection with Autoencoders

2.4.1 Need for Unsupervised Anomaly Detection

In semiconductor manufacturing, novel defect types appear frequently, and supervised models cannot detect defect classes absent from their training data. Autoencoder-based anomaly detection addresses this:

  • Train on Normal Data Only: No need to collect defect data
  • Detect Unknown Defects: Can detect anomalies not seen during training
  • Reconstruction Error-Based: Automatically detect regions deviating from normal patterns
  • Continuous Learning: Easy to update normal patterns

2.4.2 Principles of Convolutional Autoencoder

Encoder: Compress input image to low-dimensional latent representation

$$z = f_{\text{enc}}(x; \theta_{\text{enc}})$$

Decoder: Reconstruct original image from latent representation

$$\hat{x} = f_{\text{dec}}(z; \theta_{\text{dec}})$$

Reconstruction Error: Small for normal images, large for anomalous images

$$\text{Error} = \|x - \hat{x}\|^2$$
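
As a minimal numeric illustration of this score (pure NumPy; the values are made up), the per-image error used in the implementation below reduces to a mean squared difference:

import numpy as np

x = np.array([[0.2, 0.8], [0.5, 0.1]])        # original patch
x_hat = np.array([[0.25, 0.75], [0.5, 0.4]])  # reconstruction
error = np.mean((x - x_hat) ** 2)             # averaged squared reconstruction error
print(f"{error:.4f}")  # larger values indicate a more anomalous input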

2.4.3 Implementation Example: Convolutional Autoencoder

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - tensorflow>=2.13.0, <2.16.0

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

class ConvolutionalAutoencoder:
    """
    Anomaly Detection with Convolutional Autoencoder

    Training: Train only on normal wafer images
    Inference: Classify as anomaly if reconstruction error exceeds threshold

    Applications:
    - Automatic detection of novel defect patterns
    - Early detection of process anomalies
    - Detection of minute quality degradation
    """

    def __init__(self, input_shape=(128, 128, 1), latent_dim=128):
        """
        Parameters:
        -----------
        input_shape : tuple
            Input image size
        latent_dim : int
            Dimensionality of latent space
            Small: Strong compression, strict anomaly detection
            Large: Loose compression, relaxed anomaly detection
        """
        self.input_shape = input_shape
        self.latent_dim = latent_dim
        self.autoencoder = None
        self.encoder = None
        self.decoder = None
        self.threshold = None  # Anomaly detection threshold

    def build_encoder(self):
        """
        Build Encoder: Image → Latent vector

        128×128 → 64×64 → 32×32 → 16×16 → 8×8 → latent_dim
        """
        inputs = layers.Input(shape=self.input_shape)

        # Encoder layers
        x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 64×64

        x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 32×32

        x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 16×16

        x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
        x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 8×8

        # Flatten → Dense (latent vector)
        x = layers.Flatten()(x)
        latent = layers.Dense(self.latent_dim, activation='relu',
                             name='latent_vector')(x)

        encoder = models.Model(inputs, latent, name='encoder')
        return encoder

    def build_decoder(self):
        """
        Build Decoder: Latent vector → Image

        latent_dim → 8×8 → 16×16 → 32×32 → 64×64 → 128×128
        """
        latent_inputs = layers.Input(shape=(self.latent_dim,))

        # Dense → Reshape
        x = layers.Dense(8 * 8 * 256, activation='relu')(latent_inputs)
        x = layers.Reshape((8, 8, 256))(x)

        # Decoder layers (UpSampling + Conv2D)
        x = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 16×16

        x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 32×32

        x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 64×64

        x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
        x = layers.UpSampling2D((2, 2))(x)  # 128×128

        # Output layer (sigmoid: 0-1 range)
        outputs = layers.Conv2D(1, (3, 3), activation='sigmoid',
                               padding='same')(x)

        decoder = models.Model(latent_inputs, outputs, name='decoder')
        return decoder

    def build_autoencoder(self):
        """Build Autoencoder (Encoder + Decoder)"""
        self.encoder = self.build_encoder()
        self.decoder = self.build_decoder()

        # Connection
        inputs = layers.Input(shape=self.input_shape)
        latent = self.encoder(inputs)
        outputs = self.decoder(latent)

        self.autoencoder = models.Model(inputs, outputs,
                                       name='convolutional_autoencoder')

        # Compile (MSE loss)
        self.autoencoder.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss='mse',  # Mean Squared Error
            metrics=['mae']  # Mean Absolute Error
        )

        return self.autoencoder

    def train(self, X_normal, validation_split=0.2, epochs=50, batch_size=32):
        """
        Train on Normal Data Only

        Parameters:
        -----------
        X_normal : ndarray
            Normal images only (N, H, W, C)
            *** Must not include anomalous images ***
        """
        if self.autoencoder is None:
            self.build_autoencoder()

        callbacks = [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7
            ),
            tf.keras.callbacks.ModelCheckpoint(
                'best_autoencoder.h5',
                monitor='val_loss',
                save_best_only=True
            )
        ]

        # Training (input=output)
        history = self.autoencoder.fit(
            X_normal, X_normal,  # Self-supervised
            validation_split=validation_split,
            epochs=epochs,
            batch_size=batch_size,
            callbacks=callbacks,
            verbose=1
        )

        return history

    def calculate_reconstruction_errors(self, X):
        """
        Calculate Reconstruction Errors

        Returns:
        --------
        errors : ndarray
            Reconstruction error for each image (N,)
        """
        X_reconstructed = self.autoencoder.predict(X)

        # MSE per image
        errors = np.mean((X - X_reconstructed) ** 2, axis=(1, 2, 3))

        return errors

    def set_threshold(self, X_normal, percentile=95):
        """
        Set Anomaly Detection Threshold

        Parameters:
        -----------
        X_normal : ndarray
            Normal image samples
        percentile : float
            Percentile (95% means top 5% classified as anomaly)

        Strategy:
        ---------
        Examine reconstruction error distribution for normal images,
        Set percentile point as threshold
        """
        errors = self.calculate_reconstruction_errors(X_normal)
        self.threshold = np.percentile(errors, percentile)

        print(f"Anomaly detection threshold set: {self.threshold:.6f}")
        print(f"  ({percentile}th percentile of normal data)")

        return self.threshold

    def detect_anomalies(self, X):
        """
        Execute Anomaly Detection

        Returns:
        --------
        is_anomaly : ndarray (bool)
            Whether each image is anomalous (N,)
        errors : ndarray
            Reconstruction errors (N,)
        """
        if self.threshold is None:
            raise ValueError("Threshold not set. Run set_threshold() first")

        errors = self.calculate_reconstruction_errors(X)
        is_anomaly = errors > self.threshold

        return is_anomaly, errors

    def visualize_results(self, X_test, num_samples=5):
        """
        Visualize Reconstruction Results

        For both normal and anomalous samples,
        Display original image, reconstructed image, and difference image
        """
        is_anomaly, errors = self.detect_anomalies(X_test)
        X_reconstructed = self.autoencoder.predict(X_test)

        # Normal samples
        normal_indices = np.where(~is_anomaly)[0][:num_samples]
        # Anomalous samples
        anomaly_indices = np.where(is_anomaly)[0][:num_samples]

        fig, axes = plt.subplots(4, num_samples, figsize=(15, 10))

        for i, idx in enumerate(normal_indices):
            # Original image
            axes[0, i].imshow(X_test[idx, :, :, 0], cmap='gray')
            axes[0, i].set_title(f'Normal\nError={errors[idx]:.4f}')
            axes[0, i].axis('off')

            # Reconstructed image
            axes[1, i].imshow(X_reconstructed[idx, :, :, 0], cmap='gray')
            axes[1, i].set_title('Reconstructed')
            axes[1, i].axis('off')

        for i, idx in enumerate(anomaly_indices):
            # Original image
            axes[2, i].imshow(X_test[idx, :, :, 0], cmap='gray')
            axes[2, i].set_title(f'Anomaly\nError={errors[idx]:.4f}')
            axes[2, i].axis('off')

            # Difference image
            diff = np.abs(X_test[idx, :, :, 0] - X_reconstructed[idx, :, :, 0])
            axes[3, i].imshow(diff, cmap='hot')
            axes[3, i].set_title('Difference')
            axes[3, i].axis('off')

        plt.tight_layout()
        plt.savefig('anomaly_detection_results.png', dpi=300, bbox_inches='tight')
        plt.show()


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)

    # Generate normal data (1000 images)
    X_normal = np.random.randn(1000, 128, 128, 1).astype(np.float32)
    X_normal = (X_normal - X_normal.min()) / (X_normal.max() - X_normal.min())

    # Generate anomalous data (100 images) - add noise to simulate anomalies
    X_anomaly = np.random.randn(100, 128, 128, 1).astype(np.float32)
    X_anomaly = (X_anomaly - X_anomaly.min()) / (X_anomaly.max() - X_anomaly.min())
    X_anomaly += np.random.randn(100, 128, 128, 1) * 0.3  # Strong noise
    X_anomaly = np.clip(X_anomaly, 0, 1)

    # Test data (normal + anomaly)
    X_test = np.vstack([X_normal[-50:], X_anomaly[:50]])
    y_test = np.array([0]*50 + [1]*50)  # 0=normal, 1=anomaly

    # Build and train autoencoder
    ae = ConvolutionalAutoencoder(input_shape=(128, 128, 1), latent_dim=128)
    ae.build_autoencoder()

    print("Autoencoder Architecture:")
    ae.autoencoder.summary()

    # Train on normal data only
    print("\n========== Training on Normal Data Only ==========")
    history = ae.train(
        X_normal[:900],  # Training normal data
        validation_split=0.2,
        epochs=30,
        batch_size=32
    )

    # Set threshold
    print("\n========== Setting Anomaly Threshold ==========")
    ae.set_threshold(X_normal[900:950], percentile=95)

    # Anomaly detection
    print("\n========== Anomaly Detection ==========")
    is_anomaly, errors = ae.detect_anomalies(X_test)

    # Evaluation
    from sklearn.metrics import classification_report, roc_auc_score

    print("\nClassification Report:")
    print(classification_report(y_test, is_anomaly.astype(int),
                               target_names=['Normal', 'Anomaly']))

    auc = roc_auc_score(y_test, errors)
    print(f"\nAUC-ROC Score: {auc:.4f}")

    # Visualization
    ae.visualize_results(X_test, num_samples=5)

    print("\nBest model saved to: best_autoencoder.h5")

2.5 Summary

In this chapter, we learned three approaches to semiconductor defect inspection using deep learning:

Key Learning Points

1. CNN-Based Defect Classification

  • High-accuracy classification of 6 defect types (Accuracy 99%+)
  • Data Augmentation for high performance even with limited data
  • Transfer Learning for further accuracy improvement and reduced training time

2. Defect Localization with Semantic Segmentation

  • U-Net Architecture for pixel-level defect detection
  • Precise defect location, size, and shape automatic measurement
  • Dice Coefficient for evaluating segmentation accuracy

3. Anomaly Detection with Autoencoders

  • Unsupervised learning to detect unknown defect patterns
  • Reconstruction error-based judgment with high adaptability
  • Continuous learning for quick response to new processes

Preview of Next Chapter

Chapter 3 "Yield Improvement and Parameter Optimization" will cover methods to optimize process conditions using the defect information detected in this chapter:

  • Correlation analysis between defect data and yield
  • Process parameter optimization using Bayesian Optimization
  • Simultaneous improvement of quality, cost, and throughput through multi-objective optimization
  • Process control using reinforcement learning
