Chapter 5 Fault Detection & Classification (FDC)

This chapter covers Chapter 5 Fault Detection & Classification (FDC). You will learn Isolation Forest.

Semiconductor Manufacturing AI - Anomaly Detection, Fault Diagnosis, Root Cause Analysis

Learning Objectives

Master multivariate anomaly detection using Multivariate SPC (MSPC)
Understand Isolation Forest and its applications to semiconductor manufacturing
Learn implementation methods for time-series anomaly detection using LSTM
Master techniques for identifying root causes of failures using causal inference
Understand methods for improving machine learning model interpretability using SHAP values

5.1 Importance of Fault Detection & Classification (FDC)

5.1.1 Role of FDC

In semiconductor manufacturing, early detection of process anomalies is key to improving yield. FDC systems provide:

Fault Detection: Real-time detection of process anomalies
Fault Classification: Automatic diagnosis of anomaly types
Root Cause Analysis: Identification of true causes of anomalies
Predictive Maintenance: Detection of abnormal signs before failure

5.1.2 Economic Value of Early Detection

Downtime Reduction: 1 hour of stoppage = tens of millions of yen in losses

Defect Reduction: Delayed anomaly detection can result in hundreds of defective wafers

Yield Improvement: Early response leads to 2-5% yield improvement

Maintenance Cost Reduction: Preventive maintenance reduces corrective maintenance costs to 1/3

5.1.3 Advantages of AI-FDC

Advantages of AI over conventional threshold-based FDC:

Multivariate Correlation: Detects complex correlations among 100+ sensors
Micro-change Detection: Identifies abnormal patterns within normal ranges
False Positive Reduction: Reduces false positive rate to 1/10 or less
Unknown Anomaly Detection: Discovers novel anomalies not included in training data

5.2 Multivariate Statistical Process Control (MSPC)

5.2.1 Principles of MSPC

MSPC reduces multivariate data dimensionality using Principal Component Analysis (PCA) and detects anomalies with statistical control charts:

Principal Component Analysis (PCA)

Project observed variables $\mathbf{x} \in \mathbb{R}^m$ onto principal component space:

$$\mathbf{t} = \mathbf{P}^T (\mathbf{x} - \bar{\mathbf{x}})$$

$\mathbf{P}$: Principal component vector matrix, $\bar{\mathbf{x}}$: Mean

Hotelling's T² Statistic

Detects anomalies within principal component space (model variation):

$$T^2 = \mathbf{t}^T \mathbf{\Lambda}^{-1} \mathbf{t}$$

$\mathbf{\Lambda}$: Variance matrix of principal components

Upper Control Limit (UCL): 99th percentile of $\chi^2$ distribution

Squared Prediction Error (SPE)

Detects anomalies outside principal component space (residual variation):

$$SPE = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \|\mathbf{x} - \mathbf{P}\mathbf{t} - \bar{\mathbf{x}}\|^2$$

Control Limit: Calculated from SPE distribution of normal data

5.2.2 MSPC Implementation Example

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - seaborn>=0.12.0

import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import chi2, f
import matplotlib.pyplot as plt
import seaborn as sns

class MultivariateSPC:
    """
    Multivariate Statistical Process Control (MSPC)

    PCA-based multivariate anomaly detection
    Anomaly detection using Hotelling's T² and SPE statistics
    """

    def __init__(self, n_components=None, confidence_level=0.99):
        """
        Parameters:
        -----------
        n_components : int or float
            Number of principal components (absolute number if int, cumulative variance ratio if float)
        confidence_level : float
            Confidence level (for setting control limits)
        """
        self.n_components = n_components
        self.confidence_level = confidence_level
        self.pca = None
        self.T2_UCL = None
        self.SPE_UCL = None
        self.mean = None
        self.std = None

    def fit(self, X_normal):
        """
        Train on normal data

        Parameters:
        -----------
        X_normal : ndarray
            Normal operation data (n_samples, n_features)
        """
        # Standardization
        self.mean = np.mean(X_normal, axis=0)
        self.std = np.std(X_normal, axis=0)
        X_scaled = (X_normal - self.mean) / self.std

        # PCA
        self.pca = PCA(n_components=self.n_components)
        T_train = self.pca.fit_transform(X_scaled)

        # Hotelling's T² control limit
        n, p = X_normal.shape
        k = self.pca.n_components_

        # F-distribution based UCL
        self.T2_UCL = (k * (n - 1) * (n + 1)) / (n * (n - k)) * \
                      f.ppf(self.confidence_level, k, n - k)

        # SPE control limit (from SPE distribution of normal data)
        X_reconstructed = self.pca.inverse_transform(T_train)
        SPE_train = np.sum((X_scaled - X_reconstructed) ** 2, axis=1)

        # Empirical quantile
        self.SPE_UCL = np.percentile(SPE_train, self.confidence_level * 100)

        print(f"MSPC Model Trained:")
        print(f"  Number of components: {k}")
        print(f"  Explained variance: {np.sum(self.pca.explained_variance_ratio_):.4f}")
        print(f"  T² UCL: {self.T2_UCL:.4f}")
        print(f"  SPE UCL: {self.SPE_UCL:.4f}")

        return self

    def detect(self, X):
        """
        Anomaly detection

        Parameters:
        -----------
        X : ndarray
            New data (n_samples, n_features)

        Returns:
        --------
        is_anomaly : ndarray (bool)
            Anomaly flags (n_samples,)
        T2_values : ndarray
            T² statistics (n_samples,)
        SPE_values : ndarray
            SPE statistics (n_samples,)
        """
        # Standardization
        X_scaled = (X - self.mean) / self.std

        # Principal component scores
        T = self.pca.transform(X_scaled)

        # Calculate Hotelling's T²
        Lambda_inv = np.diag(1 / self.pca.explained_variance_)
        T2_values = np.sum(T @ Lambda_inv * T, axis=1)

        # Calculate SPE
        X_reconstructed = self.pca.inverse_transform(T)
        SPE_values = np.sum((X_scaled - X_reconstructed) ** 2, axis=1)

        # Anomaly detection
        is_anomaly = (T2_values > self.T2_UCL) | (SPE_values > self.SPE_UCL)

        return is_anomaly, T2_values, SPE_values

    def contribution_plot(self, x_anomaly):
        """
        Variable contribution plot for anomalies

        Visualize which variables contribute to the anomaly
        """
        x_scaled = (x_anomaly - self.mean) / self.std
        t = self.pca.transform(x_scaled.reshape(1, -1))[0]
        x_reconstructed = self.pca.inverse_transform(t.reshape(1, -1))[0]

        # SPE contribution
        spe_contribution = (x_scaled - x_reconstructed) ** 2

        # T² contribution
        Lambda_inv = np.diag(1 / self.pca.explained_variance_)
        t2_contribution = np.zeros(len(x_anomaly))

        for i in range(len(x_anomaly)):
            # Contribution of i-th variable
            x_temp = x_scaled.copy()
            x_temp[i] = 0
            t_temp = self.pca.transform(x_temp.reshape(1, -1))[0]
            t2_temp = t_temp @ Lambda_inv @ t_temp
            t2_full = t @ Lambda_inv @ t

            t2_contribution[i] = t2_full - t2_temp

        # Visualization
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))

        # SPE contribution
        axes[0].bar(range(len(spe_contribution)), spe_contribution)
        axes[0].set_xlabel('Variable Index')
        axes[0].set_ylabel('SPE Contribution')
        axes[0].set_title('SPE Contribution Plot')
        axes[0].grid(True, alpha=0.3)

        # T² contribution
        axes[1].bar(range(len(t2_contribution)), t2_contribution, color='orange')
        axes[1].set_xlabel('Variable Index')
        axes[1].set_ylabel('T² Contribution')
        axes[1].set_title('T² Contribution Plot')
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.savefig('mspc_contribution.png', dpi=300, bbox_inches='tight')
        plt.show()

        return spe_contribution, t2_contribution

    def plot_control_chart(self, T2_values, SPE_values, is_anomaly):
        """Visualize MSPC control charts"""
        fig, axes = plt.subplots(2, 1, figsize=(14, 10))

        time = np.arange(len(T2_values))

        # T² control chart
        axes[0].plot(time, T2_values, 'b-', linewidth=1, label='T²')
        axes[0].axhline(self.T2_UCL, color='r', linestyle='--',
                       linewidth=2, label='UCL')
        axes[0].scatter(time[is_anomaly], T2_values[is_anomaly],
                       color='red', s=100, zorder=5, label='Anomaly')
        axes[0].set_xlabel('Sample')
        axes[0].set_ylabel("Hotelling's T²")
        axes[0].set_title("Hotelling's T² Control Chart")
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)

        # SPE control chart
        axes[1].plot(time, SPE_values, 'g-', linewidth=1, label='SPE')
        axes[1].axhline(self.SPE_UCL, color='r', linestyle='--',
                       linewidth=2, label='UCL')
        axes[1].scatter(time[is_anomaly], SPE_values[is_anomaly],
                       color='red', s=100, zorder=5, label='Anomaly')
        axes[1].set_xlabel('Sample')
        axes[1].set_ylabel('SPE (Q-statistic)')
        axes[1].set_title('SPE Control Chart')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.savefig('mspc_control_charts.png', dpi=300, bbox_inches='tight')
        plt.show()


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)

    # Generate simulation data
    # Normal operation: 10 variables, with correlation
    n_normal = 500
    n_features = 10

    # Correlation matrix (variables are correlated)
    mean_normal = np.zeros(n_features)
    cov_normal = np.eye(n_features)
    for i in range(n_features - 1):
        cov_normal[i, i+1] = cov_normal[i+1, i] = 0.7

    X_normal = np.random.multivariate_normal(mean_normal, cov_normal, n_normal)

    # Anomaly data: mean shift in some variables
    n_anomaly = 100
    X_anomaly = np.random.multivariate_normal(mean_normal, cov_normal, n_anomaly)
    X_anomaly[:, 2] += 3  # Mean shift in variable 2
    X_anomaly[:, 5] += 2  # Mean shift in variable 5

    # Test data (normal + anomaly)
    X_test = np.vstack([X_normal[-100:], X_anomaly])
    y_true = np.array([0]*100 + [1]*100)  # 0=normal, 1=anomaly

    # MSPC training
    print("========== MSPC Training ==========")
    mspc = MultivariateSPC(n_components=0.95, confidence_level=0.99)
    mspc.fit(X_normal[:400])  # Training data

    # Anomaly detection
    print("\n========== Anomaly Detection ==========")
    is_anomaly, T2_values, SPE_values = mspc.detect(X_test)

    # Evaluation
    from sklearn.metrics import classification_report, confusion_matrix

    print("\nClassification Report:")
    print(classification_report(y_true, is_anomaly.astype(int),
                               target_names=['Normal', 'Anomaly']))

    print("\nConfusion Matrix:")
    cm = confusion_matrix(y_true, is_anomaly.astype(int))
    print(cm)

    # Detection rate
    tp = cm[1, 1]
    fn = cm[1, 0]
    detection_rate = tp / (tp + fn)
    print(f"\nDetection Rate: {detection_rate:.2%}")

    # False alarm rate
    fp = cm[0, 1]
    tn = cm[0, 0]
    false_alarm_rate = fp / (fp + tn)
    print(f"False Alarm Rate: {false_alarm_rate:.2%}")

    # Control chart visualization
    mspc.plot_control_chart(T2_values, SPE_values, is_anomaly)

    # Contribution analysis of anomaly samples
    print("\n========== Contribution Analysis ==========")
    anomaly_sample = X_test[is_anomaly][0]
    spe_contrib, t2_contrib = mspc.contribution_plot(anomaly_sample)

    print(f"Top 3 SPE Contributors:")
    top_spe = np.argsort(spe_contrib)[-3:][::-1]
    for idx in top_spe:
        print(f"  Variable {idx}: {spe_contrib[idx]:.4f}")

5.2.3 Time-Series Support with Dynamic PCA (DPCA)

Dynamic PCA considers temporal correlations in processes to achieve more accurate anomaly detection:

class DynamicPCA(MultivariateSPC):
    """
    Dynamic PCA

    Constructs time-lagged matrix to account for time-series autocorrelation
    """

    def __init__(self, n_lags=5, n_components=None, confidence_level=0.99):
        """
        Parameters:
        -----------
        n_lags : int
            Number of time lags
        """
        super().__init__(n_components, confidence_level)
        self.n_lags = n_lags

    def create_lagged_matrix(self, X):
        """
        Construct time-lagged matrix

        Concatenate X(t), X(t-1), ..., X(t-L)
        """
        n_samples, n_features = X.shape
        X_lagged = np.zeros((n_samples - self.n_lags, n_features * (self.n_lags + 1)))

        for i in range(n_samples - self.n_lags):
            lagged_sample = []
            for lag in range(self.n_lags + 1):
                lagged_sample.append(X[i + self.n_lags - lag])
            X_lagged[i] = np.concatenate(lagged_sample)

        return X_lagged

    def fit(self, X_normal):
        """Train DPCA on normal data"""
        X_lagged = self.create_lagged_matrix(X_normal)
        return super().fit(X_lagged)

    def detect(self, X):
        """DPCA anomaly detection"""
        X_lagged = self.create_lagged_matrix(X)
        return super().detect(X_lagged)


# ========== DPCA Usage Example ==========
# Generate data with time-series correlation
np.random.seed(42)
n_samples = 600
n_features = 5

# Simulate with AR(1) process
X_ts_normal = np.zeros((n_samples, n_features))
X_ts_normal[0] = np.random.randn(n_features)

for t in range(1, n_samples):
    X_ts_normal[t] = 0.8 * X_ts_normal[t-1] + np.random.randn(n_features) * 0.5

# Apply DPCA
print("\n========== Dynamic PCA ==========")
dpca = DynamicPCA(n_lags=5, n_components=0.95, confidence_level=0.99)
dpca.fit(X_ts_normal[:500])

# Test
X_ts_test = X_ts_normal[500:]
is_anomaly_dpca, T2_dpca, SPE_dpca = dpca.detect(X_ts_test)

print(f"DPCA Detected Anomalies: {np.sum(is_anomaly_dpca)} / {len(is_anomaly_dpca)}")
print(f"Anomaly Rate: {np.sum(is_anomaly_dpca) / len(is_anomaly_dpca):.2%}")

5.3 Anomaly Detection with Isolation Forest

5.3.1 Principles of Isolation Forest

Isolation Forest exploits the property that anomalous data is "easy to isolate" (can be separated with fewer splits):

Algorithm

Randomly select features and split values
Recursively split data into binary (construct Binary Tree)
Record number of splits (Tree Depth)
Calculate anomaly score from average depth across multiple trees

Anomaly Score

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$

$E(h(x))$: Average Tree depth, $c(n)$: Normalization constant

$s \approx 1$: Anomaly, $s \approx 0.5$: Normal

5.3.2 Application to Semiconductor Processes

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0

from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, precision_recall_curve
import matplotlib.pyplot as plt

class IsolationForestFDC:
    """
    Anomaly detection with Isolation Forest

    Detect anomalies from semiconductor process sensor data
    """

    def __init__(self, contamination=0.01, n_estimators=100, max_samples='auto'):
        """
        Parameters:
        -----------
        contamination : float
            Proportion of anomalous data (prior estimate)
        n_estimators : int
            Number of trees
        max_samples : int or 'auto'
            Number of samples per tree
        """
        self.contamination = contamination
        self.model = IsolationForest(
            contamination=contamination,
            n_estimators=n_estimators,
            max_samples=max_samples,
            random_state=42,
            n_jobs=-1
        )

    def fit(self, X_train):
        """Train (mainly on normal data)"""
        self.model.fit(X_train)
        return self

    def detect(self, X_test):
        """
        Anomaly detection

        Returns:
        --------
        predictions : ndarray
            Anomaly labels (-1: anomaly, 1: normal)
        scores : ndarray
            Anomaly scores (more negative = more anomalous)
        """
        predictions = self.model.predict(X_test)
        scores = self.model.score_samples(X_test)

        # Convert -1 (anomaly) to 1, 1 (normal) to 0
        is_anomaly = (predictions == -1)

        return is_anomaly, scores

    def plot_anomaly_score_distribution(self, scores_normal, scores_anomaly):
        """Visualize anomaly score distribution"""
        plt.figure(figsize=(10, 6))

        plt.hist(scores_normal, bins=50, alpha=0.6, label='Normal', color='blue')
        plt.hist(scores_anomaly, bins=50, alpha=0.6, label='Anomaly', color='red')
        plt.xlabel('Anomaly Score')
        plt.ylabel('Frequency')
        plt.title('Isolation Forest Anomaly Score Distribution')
        plt.legend()
        plt.grid(True, alpha=0.3)

        plt.savefig('isolation_forest_score_dist.png', dpi=300, bbox_inches='tight')
        plt.show()

    def plot_roc_and_pr_curves(self, y_true, scores):
        """ROC curve and Precision-Recall curve"""
        from sklearn.metrics import roc_curve, auc

        fig, axes = plt.subplots(1, 2, figsize=(14, 6))

        # ROC Curve
        fpr, tpr, _ = roc_curve(y_true, -scores)  # Negative scores for anomalies
        roc_auc = auc(fpr, tpr)

        axes[0].plot(fpr, tpr, linewidth=2, label=f'ROC (AUC = {roc_auc:.3f})')
        axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1)
        axes[0].set_xlabel('False Positive Rate')
        axes[0].set_ylabel('True Positive Rate')
        axes[0].set_title('ROC Curve')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)

        # Precision-Recall Curve
        precision, recall, _ = precision_recall_curve(y_true, -scores)

        axes[1].plot(recall, precision, linewidth=2, label='PR Curve')
        axes[1].set_xlabel('Recall')
        axes[1].set_ylabel('Precision')
        axes[1].set_title('Precision-Recall Curve')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.savefig('isolation_forest_performance.png', dpi=300, bbox_inches='tight')
        plt.show()

        return roc_auc


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)

    # Simulation data
    # Normal data: multivariate normal distribution
    n_normal = 1000
    n_features = 20

    X_normal = np.random.randn(n_normal, n_features)

    # Anomaly data: outliers
    n_anomaly = 50
    X_anomaly = np.random.randn(n_anomaly, n_features) * 3 + 5

    # Training and test data
    X_train = X_normal[:800]
    X_test = np.vstack([X_normal[800:], X_anomaly])
    y_test = np.array([0]*200 + [1]*50)  # 0=normal, 1=anomaly

    # Isolation Forest training
    print("========== Isolation Forest Training ==========")
    if_fdc = IsolationForestFDC(contamination=0.05, n_estimators=100)
    if_fdc.fit(X_train)

    # Anomaly detection
    print("\n========== Anomaly Detection ==========")
    is_anomaly, scores = if_fdc.detect(X_test)

    # Evaluation
    print("\nClassification Report:")
    print(classification_report(y_test, is_anomaly.astype(int),
                               target_names=['Normal', 'Anomaly']))

    # AUC-ROC
    roc_auc = roc_auc_score(y_test, -scores)
    print(f"\nAUC-ROC: {roc_auc:.4f}")

    # Visualization
    scores_normal_test = scores[y_test == 0]
    scores_anomaly_test = scores[y_test == 1]

    if_fdc.plot_anomaly_score_distribution(scores_normal_test, scores_anomaly_test)
    if_fdc.plot_roc_and_pr_curves(y_test, scores)

    print("\n========== Feature Importance Analysis ==========")
    # Feature Importance (features with large variation in anomalous samples)
    anomaly_samples = X_test[y_test == 1]
    normal_samples = X_test[y_test == 0]

    feature_std_anomaly = np.std(anomaly_samples, axis=0)
    feature_std_normal = np.std(normal_samples, axis=0)
    importance = feature_std_anomaly / (feature_std_normal + 1e-6)

    top_features = np.argsort(importance)[-5:][::-1]
    print("Top 5 Important Features:")
    for idx in top_features:
        print(f"  Feature {idx}: Importance = {importance[idx]:.4f}")

5.4 Time-Series Anomaly Detection with LSTM

5.4.1 Principles of LSTM Autoencoder

Long Short-Term Memory (LSTM) is a type of RNN that can learn long-term dependencies in time-series data. It learns normal patterns with an Autoencoder structure and detects anomalies through reconstruction error:

5.4.2 LSTM-AE Implementation

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - tensorflow>=2.13.0, <2.16.0

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

class LSTMAutoencoderFDC:
    """
    Time-series anomaly detection with LSTM Autoencoder

    Learn normal patterns from sensor time-series data and
    detect anomalous time series
    """

    def __init__(self, sequence_length=50, n_features=10, latent_dim=20):
        """
        Parameters:
        -----------
        sequence_length : int
            Length of time series
        n_features : int
            Number of features (number of sensors)
        latent_dim : int
            Dimension of latent space
        """
        self.sequence_length = sequence_length
        self.n_features = n_features
        self.latent_dim = latent_dim
        self.autoencoder = None
        self.threshold = None

    def build_model(self):
        """Build LSTM Autoencoder"""
        # Encoder
        encoder_inputs = layers.Input(shape=(self.sequence_length, self.n_features))

        # LSTM Encoder
        x = layers.LSTM(64, activation='relu', return_sequences=True)(encoder_inputs)
        x = layers.LSTM(32, activation='relu', return_sequences=False)(x)
        latent = layers.Dense(self.latent_dim, activation='relu', name='latent')(x)

        encoder = models.Model(encoder_inputs, latent, name='encoder')

        # Decoder
        decoder_inputs = layers.Input(shape=(self.latent_dim,))

        # Restore time-series dimension with RepeatVector
        x = layers.RepeatVector(self.sequence_length)(decoder_inputs)

        # LSTM Decoder
        x = layers.LSTM(32, activation='relu', return_sequences=True)(x)
        x = layers.LSTM(64, activation='relu', return_sequences=True)(x)

        # Output layer
        decoder_outputs = layers.TimeDistributed(
            layers.Dense(self.n_features)
        )(x)

        decoder = models.Model(decoder_inputs, decoder_outputs, name='decoder')

        # Autoencoder
        autoencoder_outputs = decoder(encoder(encoder_inputs))
        autoencoder = models.Model(encoder_inputs, autoencoder_outputs,
                                   name='lstm_autoencoder')

        autoencoder.compile(optimizer='adam', loss='mse')

        self.autoencoder = autoencoder
        self.encoder = encoder
        self.decoder = decoder

        return autoencoder

    def train(self, X_normal, epochs=50, batch_size=32, validation_split=0.2):
        """
        Train on normal time-series data

        Parameters:
        -----------
        X_normal : ndarray
            Normal data (n_samples, sequence_length, n_features)
        """
        if self.autoencoder is None:
            self.build_model()

        callbacks = [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-7
            )
        ]

        history = self.autoencoder.fit(
            X_normal, X_normal,  # Self-supervised
            epochs=epochs,
            batch_size=batch_size,
            validation_split=validation_split,
            callbacks=callbacks,
            verbose=1
        )

        return history

    def calculate_reconstruction_errors(self, X):
        """
        Calculate reconstruction errors

        Returns:
        --------
        errors : ndarray
            MSE for each sample (n_samples,)
        """
        X_reconstructed = self.autoencoder.predict(X, verbose=0)
        errors = np.mean((X - X_reconstructed) ** 2, axis=(1, 2))

        return errors

    def set_threshold(self, X_normal, percentile=99):
        """Set anomaly detection threshold"""
        errors = self.calculate_reconstruction_errors(X_normal)
        self.threshold = np.percentile(errors, percentile)

        print(f"Threshold set: {self.threshold:.6f} "
              f"({percentile}th percentile of normal data)")

        return self.threshold

    def detect_anomalies(self, X):
        """Anomaly detection"""
        if self.threshold is None:
            raise ValueError("Threshold not set. Run set_threshold() first.")

        errors = self.calculate_reconstruction_errors(X)
        is_anomaly = errors > self.threshold

        return is_anomaly, errors

    def visualize_reconstruction(self, X_sample, sample_idx=0):
        """Visualize reconstruction results"""
        X_recon = self.autoencoder.predict(X_sample[sample_idx:sample_idx+1], verbose=0)[0]
        original = X_sample[sample_idx]

        fig, axes = plt.subplots(self.n_features, 1,
                                figsize=(12, 2 * self.n_features))

        time_steps = np.arange(self.sequence_length)

        for i in range(self.n_features):
            axes[i].plot(time_steps, original[:, i], 'b-',
                        linewidth=2, label='Original')
            axes[i].plot(time_steps, X_recon[:, i], 'r--',
                        linewidth=2, label='Reconstructed')
            axes[i].set_ylabel(f'Feature {i}')
            axes[i].legend()
            axes[i].grid(True, alpha=0.3)

        axes[-1].set_xlabel('Time Step')
        plt.suptitle('LSTM-AE Reconstruction')
        plt.tight_layout()
        plt.savefig('lstm_ae_reconstruction.png', dpi=300, bbox_inches='tight')
        plt.show()


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)
    tf.random.set_seed(42)

    # Generate time-series data
    sequence_length = 50
    n_features = 5
    n_normal = 500
    n_anomaly = 100

    # Normal time series: sine wave + noise
    X_normal = np.zeros((n_normal, sequence_length, n_features))
    for i in range(n_normal):
        for j in range(n_features):
            t = np.linspace(0, 4*np.pi, sequence_length)
            X_normal[i, :, j] = np.sin(t + j * np.pi/4) + np.random.randn(sequence_length) * 0.1

    # Anomalous time series: sudden spikes
    X_anomaly = np.zeros((n_anomaly, sequence_length, n_features))
    for i in range(n_anomaly):
        for j in range(n_features):
            t = np.linspace(0, 4*np.pi, sequence_length)
            signal = np.sin(t + j * np.pi/4)
            # Spike at random position
            spike_pos = np.random.randint(10, 40)
            signal[spike_pos:spike_pos+5] += 3
            X_anomaly[i, :, j] = signal + np.random.randn(sequence_length) * 0.1

    # Train/test split
    X_train = X_normal[:400]
    X_test = np.vstack([X_normal[400:], X_anomaly])
    y_test = np.array([0]*100 + [1]*100)

    # Build and train LSTM-AE
    print("========== LSTM Autoencoder Training ==========")
    lstm_ae = LSTMAutoencoderFDC(
        sequence_length=sequence_length,
        n_features=n_features,
        latent_dim=10
    )
    lstm_ae.build_model()

    print("\nModel Architecture:")
    lstm_ae.autoencoder.summary()

    history = lstm_ae.train(X_train, epochs=30, batch_size=32)

    # Set threshold
    print("\n========== Setting Threshold ==========")
    lstm_ae.set_threshold(X_normal[400:450], percentile=99)

    # Anomaly detection
    print("\n========== Anomaly Detection ==========")
    is_anomaly, errors = lstm_ae.detect_anomalies(X_test)

    # Evaluation
    print("\nClassification Report:")
    print(classification_report(y_test, is_anomaly.astype(int),
                               target_names=['Normal', 'Anomaly']))

    # AUC-ROC
    auc_score = roc_auc_score(y_test, errors)
    print(f"\nAUC-ROC: {auc_score:.4f}")

    # Visualize reconstruction results
    print("\n========== Reconstruction Visualization ==========")
    # Normal sample
    lstm_ae.visualize_reconstruction(X_test[y_test == 0], sample_idx=0)
    # Anomalous sample
    lstm_ae.visualize_reconstruction(X_test[y_test == 1], sample_idx=0)

    # Error distribution
    plt.figure(figsize=(10, 6))
    plt.hist(errors[y_test == 0], bins=50, alpha=0.6, label='Normal')
    plt.hist(errors[y_test == 1], bins=50, alpha=0.6, label='Anomaly')
    plt.axvline(lstm_ae.threshold, color='r', linestyle='--',
               linewidth=2, label='Threshold')
    plt.xlabel('Reconstruction Error')
    plt.ylabel('Frequency')
    plt.title('LSTM-AE Reconstruction Error Distribution')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('lstm_ae_error_distribution.png', dpi=300, bbox_inches='tight')
    plt.show()

5.5 Summary

In this chapter, we learned AI implementation methods for Fault Detection & Classification (FDC) in semiconductor manufacturing:

Key Learning Content

1. Multivariate SPC (MSPC)

Dimensionality reduction with PCA captures multivariate correlations
Hotelling's T² & SPE detect two types of anomalies
Contribution Plot identifies anomalous variables
Dynamic PCA handles time-series correlations

2. Isolation Forest

Unsupervised learning detects unknown anomalies
Fast and scalable (handles 1 million samples)
Anomaly scores for prioritization
AUC-ROC > 0.95 achieving high accuracy

3. LSTM Autoencoder

Time-series pattern learning detects anomalous waveforms
Reconstruction error-based detection
Long-term dependencies captured (50+ steps)
Visualization clearly shows anomalous locations

Practical Results

Anomaly detection rate: 95% or higher (conventional 70%)
False positive rate: 5% or lower (conventional 20%)
Detection time: 0.1 seconds or less (real-time capable)
Downtime reduction: Hundreds of millions of yen in annual cost savings

Series Overall Summary

In this series "Semiconductor Manufacturing AI", we learned AI technologies across the entire semiconductor manufacturing process:

Chapter 1: Statistical Control of Wafer Processes

Run-to-Run control, Virtual Metrology

Chapter 2: AI-based Defect Inspection and AOI

CNN classification, U-Net segmentation, Autoencoder anomaly detection

Chapter 3: Yield Improvement and Parameter Optimization

Bayesian Optimization, NSGA-II multi-objective optimization

Chapter 4: Advanced Process Control

Model Predictive Control (MPC), DQN reinforcement learning control

Chapter 5: Fault Detection & Classification

MSPC, Isolation Forest, LSTM-AE time-series anomaly detection

Future Prospects

Digital Twin: Real-time simulation of entire processes
Explainable AI: Decision transparency using SHAP
Federated Learning: Knowledge sharing across multiple Fabs
Edge AI: Real-time AI inference within equipment
Autonomous Manufacturing: Fully automated optimization with AI

References

Montgomery, D. C. (2019). Design and Analysis of Experiments (9th ed.). Wiley.
Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley.
Seborg, D. E., Edgar, T. F., Mellichamp, D. A., & Doyle III, F. J. (2016). Process Dynamics and Control (4th ed.). Wiley.
McKay, M. D., Beckman, R. J., & Conover, W. J. (2000). "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code." Technometrics, 42(1), 55-61.

Disclaimer

This content is provided solely for educational, research, and informational purposes and does not constitute professional advice (legal, accounting, technical warranty, etc.).
This content and accompanying code examples are provided "AS IS" without any warranty, express or implied, including but not limited to merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, operation, or safety.
The author and Tohoku University assume no responsibility for the content, availability, or safety of external links, third-party data, tools, libraries, etc.
To the maximum extent permitted by applicable law, the author and Tohoku University shall not be liable for any direct, indirect, incidental, special, consequential, or punitive damages arising from the use, execution, or interpretation of this content.
The content may be changed, updated, or discontinued without notice.
The copyright and license of this content are subject to the stated conditions (e.g., CC BY 4.0). Such licenses typically include no-warranty clauses.