Chapter 4 Advanced Process Control (APC)

This chapter covers advanced topics in Chapter 4 Advanced Process Control (APC). You will master Build process simulations using digital twins.

Semiconductor Manufacturing AI - Model Predictive Control, Adaptive Control, Digital Twin, Reinforcement Learning

Learning Objectives

Master the theory and implementation methods of Model Predictive Control (MPC)
Understand adaptive control techniques for automatic response to process variations
Build process simulations using digital twins
Learn controller training methods using reinforcement learning (DQN, PPO)
Acquire practical techniques for APC system implementation and real-time control

4.1 Overview of Advanced Process Control (APC)

4.1.1 Role and Importance of APC

In semiconductor manufacturing, various disturbances such as equipment aging, environmental fluctuations, and raw material lot variations affect the process. APC is an advanced control system that compensates for these disturbances and maintains stable target values:

Multivariable Control: Simultaneous control of multiple inputs and outputs
Predictive Control: Optimal control by predicting the future with process models
Adaptive Control: Automatic response to process characteristic changes
Constraint Handling: Strict adherence to safety and performance ranges

4.1.2 Limitations of Conventional PID Control

Single-variable Control: Cannot consider interactions between variables

Reactive Control: Correction after error occurs (reactive approach)

Difficulty in Constraint Handling: Difficult to explicitly handle physical and performance constraints

Lack of Optimality: Cannot incorporate optimization objectives such as energy minimization

4.1.3 Advantages of AI-based APC

Multi-objective Optimization: Simultaneous optimization of quality, cost, and throughput
Learning Capability: Automatic learning of control laws from historical data
Robustness: Strong resistance to model errors and disturbances
Real-time Performance: High-speed computation utilizing GPUs

4.2 Model Predictive Control (MPC)

4.2.1 Principles of MPC

MPC predicts future behavior with a process model and calculates control input sequences that minimize a performance index:

Prediction Model

$$x_{k+1} = f(x_k, u_k)$$

$x_k$: state, $u_k$: control input

Cost Function (Prediction Horizon N)

$$J = \sum_{i=0}^{N-1} \left[ \|y_{k+i} - r_{k+i}\|_Q^2 + \|u_{k+i}\|_R^2 \right]$$

$y$: output, $r$: target value, $Q, R$: weighting matrices

Constraints

$$u_{\min} \leq u_k \leq u_{\max}$$

$$y_{\min} \leq y_k \leq y_{\max}$$

Optimization Problem

At each time step, find the control input sequence $\{u_k, u_{k+1}, \ldots, u_{k+N-1}\}$ that minimizes the above cost function, and apply only the first $u_k$ (Receding Horizon)

4.2.2 CVD Process MPC Implementation

We implement film thickness control in Chemical Vapor Deposition (CVD) using MPC:

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

class ModelPredictiveController:
    """
    Model Predictive Control (MPC) for CVD process

    Control objective: Track film thickness to target value
    Control variables: Gas flow rate, RF power, pressure
    State variables: Film thickness, deposition rate
    """

    def __init__(self, prediction_horizon=10, control_horizon=5, dt=1.0):
        """
        Parameters:
        -----------
        prediction_horizon : int
            Prediction horizon N
        control_horizon : int
            Control horizon M (M ≤ N)
        dt : float
            Sampling time (seconds)
        """
        self.N = prediction_horizon
        self.M = control_horizon
        self.dt = dt

        # State-space model parameters
        # x = [film thickness (nm), deposition rate (nm/s)]
        # u = [gas flow (sccm), RF power (W), pressure (mTorr)]
        self.A = np.array([
            [1, self.dt],
            [0, 0.95]
        ])

        self.B = np.array([
            [0, 0, 0],
            [0.01, 0.02, -0.005]
        ])

        # Output matrix (only film thickness observed)
        self.C = np.array([[1, 0]])

        # Weighting matrices
        self.Q = np.diag([100, 1])  # State cost
        self.R = np.diag([0.1, 0.1, 0.1])  # Control input cost

        # Constraints
        self.u_min = np.array([50, 100, 10])
        self.u_max = np.array([200, 400, 100])
        self.y_min = 0
        self.y_max = 200  # Film thickness upper limit (nm)

    def predict(self, x0, u_sequence):
        """
        State prediction

        Parameters:
        -----------
        x0 : ndarray
            Initial state (2,)
        u_sequence : ndarray
            Control input sequence (M, 3)

        Returns:
        --------
        x_pred : ndarray
            Predicted state trajectory (N+1, 2)
        y_pred : ndarray
            Predicted output trajectory (N+1,)
        """
        x_pred = np.zeros((self.N + 1, 2))
        y_pred = np.zeros(self.N + 1)

        x_pred[0] = x0
        y_pred[0] = self.C @ x0

        for k in range(self.N):
            if k < self.M:
                u_k = u_sequence[k]
            else:
                # Hold last input after control horizon
                u_k = u_sequence[self.M - 1]

            # State transition
            x_pred[k + 1] = self.A @ x_pred[k] + self.B @ u_k
            y_pred[k + 1] = self.C @ x_pred[k + 1]

        return x_pred, y_pred

    def cost_function(self, u_flat, x0, r_sequence):
        """
        Cost function

        Parameters:
        -----------
        u_flat : ndarray
            Flattened control input sequence (M*3,)
        x0 : ndarray
            Current state
        r_sequence : ndarray
            Target value sequence (N+1,)
        """
        # Restore control input
        u_sequence = u_flat.reshape((self.M, 3))

        # Prediction
        x_pred, y_pred = self.predict(x0, u_sequence)

        # Cost calculation
        cost = 0.0

        # Tracking error
        for k in range(self.N + 1):
            error = y_pred[k] - r_sequence[k]
            cost += error ** 2 * self.Q[0, 0]

        # Control input cost
        for k in range(self.M):
            cost += u_sequence[k] @ self.R @ u_sequence[k]

        # Control input change cost (smooth control)
        for k in range(1, self.M):
            du = u_sequence[k] - u_sequence[k - 1]
            cost += 0.1 * (du @ du)

        return cost

    def solve_mpc(self, x0, r_sequence, u_prev):
        """
        Solve MPC optimization problem

        Parameters:
        -----------
        x0 : ndarray
            Current state
        r_sequence : ndarray
            Target value sequence (N+1,)
        u_prev : ndarray
            Previous control input (3,)

        Returns:
        --------
        u_opt : ndarray
            Optimal control input (3,)
        """
        # Initial guess (hold previous input)
        u0 = np.tile(u_prev, self.M)

        # Constraints
        bounds = []
        for _ in range(self.M):
            for i in range(3):
                bounds.append((self.u_min[i], self.u_max[i]))

        # Optimization
        result = minimize(
            fun=lambda u: self.cost_function(u, x0, r_sequence),
            x0=u0,
            method='SLSQP',
            bounds=bounds,
            options={'maxiter': 100, 'ftol': 1e-6}
        )

        # Optimal control input (use only first step)
        u_opt_sequence = result.x.reshape((self.M, 3))
        u_opt = u_opt_sequence[0]

        return u_opt

    def simulate_closed_loop(self, x0, r_trajectory, n_steps):
        """
        Closed-loop simulation

        Parameters:
        -----------
        x0 : ndarray
            Initial state
        r_trajectory : ndarray
            Target value trajectory (n_steps,)
        n_steps : int
            Number of simulation steps

        Returns:
        --------
        results : dict
            Simulation results
        """
        # History storage
        x_history = np.zeros((n_steps + 1, 2))
        y_history = np.zeros(n_steps + 1)
        u_history = np.zeros((n_steps, 3))
        r_history = np.zeros(n_steps + 1)

        x_history[0] = x0
        y_history[0] = self.C @ x0
        r_history[0] = r_trajectory[0]

        u_prev = np.array([125, 250, 55])  # Initial control input

        for k in range(n_steps):
            # Target value sequence (for prediction horizon)
            r_sequence = np.zeros(self.N + 1)
            for i in range(self.N + 1):
                if k + i < n_steps:
                    r_sequence[i] = r_trajectory[k + i]
                else:
                    r_sequence[i] = r_trajectory[-1]

            # MPC optimization
            u_opt = self.solve_mpc(x_history[k], r_sequence, u_prev)
            u_history[k] = u_opt

            # Apply to process (actual process includes disturbances)
            noise = np.random.normal(0, 0.1, 2)  # Process noise
            x_history[k + 1] = self.A @ x_history[k] + self.B @ u_opt + noise
            y_history[k + 1] = self.C @ x_history[k + 1]
            r_history[k + 1] = r_trajectory[k + 1] if k + 1 < n_steps else r_trajectory[-1]

            u_prev = u_opt

        results = {
            'x': x_history,
            'y': y_history,
            'u': u_history,
            'r': r_history,
            'time': np.arange(n_steps + 1) * self.dt
        }

        return results

    def plot_results(self, results):
        """Result visualization"""
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))

        time = results['time']
        y = results['y']
        r = results['r']
        u = results['u']

        # Film thickness tracking
        axes[0, 0].plot(time, y, 'b-', linewidth=2, label='Actual Thickness')
        axes[0, 0].plot(time, r, 'r--', linewidth=2, label='Target Thickness')
        axes[0, 0].set_xlabel('Time (s)')
        axes[0, 0].set_ylabel('Thickness (nm)')
        axes[0, 0].set_title('Film Thickness Tracking')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)

        # Tracking error
        error = y - r
        axes[0, 1].plot(time, error, 'g-', linewidth=2)
        axes[0, 1].axhline(0, color='k', linestyle='--', alpha=0.3)
        axes[0, 1].set_xlabel('Time (s)')
        axes[0, 1].set_ylabel('Error (nm)')
        axes[0, 1].set_title('Tracking Error')
        axes[0, 1].grid(True, alpha=0.3)

        # Control inputs
        axes[1, 0].plot(time[:-1], u[:, 0], label='Gas Flow (sccm)')
        axes[1, 0].plot(time[:-1], u[:, 1], label='RF Power (W)')
        axes[1, 0].set_xlabel('Time (s)')
        axes[1, 0].set_ylabel('Control Input')
        axes[1, 0].set_title('Control Inputs (Gas & RF)')
        axes[1, 0].legend()
        axes[1, 0].grid(True, alpha=0.3)

        axes[1, 1].plot(time[:-1], u[:, 2], 'purple', label='Pressure (mTorr)')
        axes[1, 1].set_xlabel('Time (s)')
        axes[1, 1].set_ylabel('Pressure (mTorr)')
        axes[1, 1].set_title('Control Input (Pressure)')
        axes[1, 1].legend()
        axes[1, 1].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.savefig('mpc_control_results.png', dpi=300, bbox_inches='tight')
        plt.show()


# ========== Usage Example ==========
if __name__ == "__main__":
    np.random.seed(42)

    # MPC configuration
    mpc = ModelPredictiveController(
        prediction_horizon=10,
        control_horizon=5,
        dt=1.0
    )

    # Initial state [film thickness, deposition rate]
    x0 = np.array([0.0, 0.0])

    # Target value trajectory (step response + ramp)
    n_steps = 100
    r_trajectory = np.zeros(n_steps)
    r_trajectory[:30] = 50  # 50nm
    r_trajectory[30:60] = 100  # 100nm
    r_trajectory[60:] = np.linspace(100, 150, 40)  # Ramp

    # Closed-loop simulation
    print("========== MPC Closed-Loop Simulation ==========")
    results = mpc.simulate_closed_loop(x0, r_trajectory, n_steps)

    # Performance evaluation
    tracking_error = results['y'] - results['r']
    mae = np.mean(np.abs(tracking_error))
    rmse = np.sqrt(np.mean(tracking_error ** 2))

    print(f"\nTracking Performance:")
    print(f"  MAE (Mean Absolute Error): {mae:.4f} nm")
    print(f"  RMSE (Root Mean Squared Error): {rmse:.4f} nm")

    # Control input statistics
    print(f"\nControl Input Statistics:")
    print(f"  Gas Flow: {np.mean(results['u'][:, 0]):.2f} ± {np.std(results['u'][:, 0]):.2f} sccm")
    print(f"  RF Power: {np.mean(results['u'][:, 1]):.2f} ± {np.std(results['u'][:, 1]):.2f} W")
    print(f"  Pressure: {np.mean(results['u'][:, 2]):.2f} ± {np.std(results['u'][:, 2]):.2f} mTorr")

    # Visualization
    mpc.plot_results(results)

4.2.3 Nonlinear MPC and Neural Network Model

For complex nonlinear processes, neural networks are used as process models:

# Requirements:
# - Python 3.9+
# - tensorflow>=2.13.0, <2.16.0

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class NeuralNetworkMPC:
    """
    Neural Network-based MPC

    Model complex nonlinear processes with NN,
    Execute MPC optimization using gradient methods
    """

    def __init__(self, state_dim=2, control_dim=3, prediction_horizon=10):
        """
        Parameters:
        -----------
        state_dim : int
            State dimension
        control_dim : int
            Control input dimension
        prediction_horizon : int
            Prediction horizon
        """
        self.state_dim = state_dim
        self.control_dim = control_dim
        self.N = prediction_horizon

        # Neural Network Process Model
        self.process_model = self._build_process_model()

    def _build_process_model(self):
        """
        Build process model NN

        Input: [x_k, u_k] (concat)
        Output: x_{k+1}
        """
        inputs = layers.Input(shape=(self.state_dim + self.control_dim,))

        x = layers.Dense(64, activation='relu')(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.Dense(64, activation='relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dense(32, activation='relu')(x)

        outputs = layers.Dense(self.state_dim)(x)

        model = keras.Model(inputs, outputs, name='process_model')
        model.compile(optimizer='adam', loss='mse')

        return model

    def train_process_model(self, X_train, y_train, epochs=50):
        """
        Train process model

        Parameters:
        -----------
        X_train : ndarray
            Training data [x_k, u_k] (N, state_dim + control_dim)
        y_train : ndarray
            Labels x_{k+1} (N, state_dim)
        """
        history = self.process_model.fit(
            X_train, y_train,
            validation_split=0.2,
            epochs=epochs,
            batch_size=32,
            verbose=0
        )

        return history

    def predict_trajectory(self, x0, u_sequence):
        """
        Trajectory prediction with NN process model

        Parameters:
        -----------
        x0 : ndarray
            Initial state (state_dim,)
        u_sequence : ndarray
            Control input sequence (N, control_dim)

        Returns:
        --------
        x_trajectory : ndarray
            Predicted state trajectory (N+1, state_dim)
        """
        x_trajectory = np.zeros((self.N + 1, self.state_dim))
        x_trajectory[0] = x0

        for k in range(self.N):
            xu_k = np.concatenate([x_trajectory[k], u_sequence[k]]).reshape(1, -1)
            x_trajectory[k + 1] = self.process_model.predict(xu_k, verbose=0)[0]

        return x_trajectory

    def mpc_optimization(self, x0, r_sequence):
        """
        MPC optimization using TensorFlow automatic differentiation

        Parameters:
        -----------
        x0 : ndarray
            Current state
        r_sequence : ndarray
            Target value sequence (N+1,)

        Returns:
        --------
        u_opt : ndarray
            Optimal control input sequence (N, control_dim)
        """
        # Initial control input
        u_var = tf.Variable(
            np.random.uniform(50, 200, (self.N, self.control_dim)),
            dtype=tf.float32
        )

        optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

        # Optimization loop
        for iteration in range(50):
            with tf.GradientTape() as tape:
                # Prediction
                x_pred = tf.constant(x0, dtype=tf.float32)
                cost = 0.0

                for k in range(self.N):
                    # State transition
                    xu_k = tf.concat([x_pred, u_var[k]], axis=0)
                    xu_k = tf.reshape(xu_k, (1, -1))
                    x_pred = self.process_model(xu_k, training=False)[0]

                    # Tracking error cost
                    error = x_pred[0] - r_sequence[k + 1]  # Film thickness error
                    cost += 100 * error ** 2

                    # Control input cost
                    cost += 0.01 * tf.reduce_sum(u_var[k] ** 2)

            # Gradient calculation and update
            gradients = tape.gradient(cost, [u_var])
            optimizer.apply_gradients(zip(gradients, [u_var]))

        u_opt = u_var.numpy()

        return u_opt


# ========== Usage Example ==========
# Generate dummy data for process model training
np.random.seed(42)
n_samples = 5000

X_train = np.random.randn(n_samples, 5)  # [x1, x2, u1, u2, u3]
# Dummy nonlinear process
y_train = np.zeros((n_samples, 2))
y_train[:, 0] = X_train[:, 0] + 0.1 * X_train[:, 2] + 0.02 * X_train[:, 3]
y_train[:, 1] = 0.95 * X_train[:, 1] + 0.01 * X_train[:, 2]

# Build and train NN-MPC
nn_mpc = NeuralNetworkMPC(state_dim=2, control_dim=3, prediction_horizon=10)
print("\n========== Training NN Process Model ==========")
history = nn_mpc.train_process_model(X_train, y_train, epochs=30)

print(f"Training Loss: {history.history['loss'][-1]:.6f}")
print(f"Validation Loss: {history.history['val_loss'][-1]:.6f}")

# MPC optimization
x0_nn = np.array([0.0, 0.0])
r_sequence_nn = np.full(11, 100.0)

print("\n========== NN-MPC Optimization ==========")
u_opt_nn = nn_mpc.mpc_optimization(x0_nn, r_sequence_nn)

print(f"Optimal Control Sequence (first 3 steps):")
for k in range(3):
    print(f"  Step {k}: u = {u_opt_nn[k]}")

4.3 Controller Learning with Reinforcement Learning

4.3.1 Concept of Reinforcement Learning APC

Reinforcement Learning (RL) learns optimal control laws through trial and error:

Model-free: No process model required (learn directly from actual data)
Adaptability: Automatic adaptation to process changes
Optimality: Maximize long-term rewards
Nonlinear Control: Handle complex nonlinear processes

4.3.2 Discrete Control with DQN (Deep Q-Network)

We learn discrete control action selection (e.g., power level Low/Medium/High) using DQN:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - tensorflow>=2.13.0, <2.16.0

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from collections import deque
import random

class DQNController:
    """
    DQN (Deep Q-Network) based controller

    Learn discrete control of CVD process
    Actions: Increase/maintain/decrease gas flow, RF power, and pressure
    """

    def __init__(self, state_dim=4, action_dim=27, learning_rate=0.001):
        """
        Parameters:
        -----------
        state_dim : int
            State dimension [thickness, rate, target_thickness, error]
        action_dim : int
            Number of actions (3 variables × 3 levels = 27 patterns)
        """
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.learning_rate = learning_rate

        # Hyperparameters
        self.gamma = 0.99  # Discount factor
        self.epsilon = 1.0  # ε-greedy initial value
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.batch_size = 64
        self.memory = deque(maxlen=10000)

        # Q-Network
        self.q_network = self._build_network()
        self.target_network = self._build_network()
        self.update_target_network()

    def _build_network(self):
        """Build Q-Network"""
        inputs = layers.Input(shape=(self.state_dim,))

        x = layers.Dense(128, activation='relu')(inputs)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dense(64, activation='relu')(x)

        # Q-value output (Q value for each action)
        q_values = layers.Dense(self.action_dim, activation='linear')(x)

        model = keras.Model(inputs, q_values)
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=self.learning_rate),
            loss='mse'
        )

        return model

    def update_target_network(self):
        """Update Target Network weights"""
        self.target_network.set_weights(self.q_network.get_weights())

    def select_action(self, state):
        """
        Select action with ε-greedy policy

        Parameters:
        -----------
        state : ndarray
            Current state (state_dim,)

        Returns:
        --------
        action : int
            Selected action (0 ~ action_dim-1)
        """
        if np.random.rand() < self.epsilon:
            # Random action (exploration)
            return np.random.randint(self.action_dim)
        else:
            # Action with maximum Q value (exploitation)
            q_values = self.q_network.predict(state.reshape(1, -1), verbose=0)[0]
            return np.argmax(q_values)

    def remember(self, state, action, reward, next_state, done):
        """Store experience in memory"""
        self.memory.append((state, action, reward, next_state, done))

    def replay(self):
        """
        Learn with Experience Replay

        Update Q-Network by random sampling from memory
        """
        if len(self.memory) < self.batch_size:
            return

        # Mini-batch sampling
        minibatch = random.sample(self.memory, self.batch_size)

        states = np.array([exp[0] for exp in minibatch])
        actions = np.array([exp[1] for exp in minibatch])
        rewards = np.array([exp[2] for exp in minibatch])
        next_states = np.array([exp[3] for exp in minibatch])
        dones = np.array([exp[4] for exp in minibatch])

        # Current Q values
        q_values = self.q_network.predict(states, verbose=0)

        # Target Q values (Double DQN)
        next_q_values = self.target_network.predict(next_states, verbose=0)

        # Bellman update
        for i in range(self.batch_size):
            if dones[i]:
                q_values[i, actions[i]] = rewards[i]
            else:
                q_values[i, actions[i]] = (
                    rewards[i] + self.gamma * np.max(next_q_values[i])
                )

        # Train Q-Network
        self.q_network.fit(states, q_values, epochs=1, verbose=0)

        # Epsilon decay
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def action_to_control(self, action):
        """
        Convert action number to control input

        Action: 0-26 (3^3 = 27 patterns)
        Each variable: 0=decrease, 1=maintain, 2=increase
        """
        # Ternary expansion
        gas_action = action // 9
        rf_action = (action % 9) // 3
        pressure_action = action % 3

        # Convert to control amount
        gas_delta = (gas_action - 1) * 10  # ±10 sccm
        rf_delta = (rf_action - 1) * 20  # ±20 W
        pressure_delta = (pressure_action - 1) * 5  # ±5 mTorr

        return np.array([gas_delta, rf_delta, pressure_delta])


class CVDEnvironment:
    """CVD Process Environment (for RL)"""

    def __init__(self, target_thickness=100):
        self.target_thickness = target_thickness
        self.reset()

    def reset(self):
        """Reset environment"""
        self.thickness = 0.0
        self.rate = 0.0
        self.gas_flow = 125
        self.rf_power = 250
        self.pressure = 55
        self.step_count = 0

        return self._get_state()

    def _get_state(self):
        """Get state"""
        error = self.target_thickness - self.thickness
        return np.array([self.thickness, self.rate, self.target_thickness, error])

    def step(self, action_delta):
        """
        Execute one step

        Parameters:
        -----------
        action_delta : ndarray
            Control amount change [Δgas, ΔRF, Δpressure]

        Returns:
        --------
        next_state : ndarray
        reward : float
        done : bool
        """
        # Update control input
        self.gas_flow = np.clip(self.gas_flow + action_delta[0], 50, 200)
        self.rf_power = np.clip(self.rf_power + action_delta[1], 100, 400)
        self.pressure = np.clip(self.pressure + action_delta[2], 10, 100)

        # Process simulation (simplified model)
        self.rate = (
            0.01 * self.gas_flow + 0.02 * self.rf_power - 0.005 * self.pressure
        ) / 10
        self.thickness += self.rate + np.random.normal(0, 0.1)

        # Reward design
        error = abs(self.target_thickness - self.thickness)

        if error < 1:
            reward = 10  # Target achieved
        elif error < 5:
            reward = 5 - error
        else:
            reward = -error / 10

        # Termination condition
        self.step_count += 1
        done = (self.step_count >= 50) or (error < 1)

        next_state = self._get_state()

        return next_state, reward, done


# ========== DQN Training ==========
if __name__ == "__main__":
    np.random.seed(42)
    random.seed(42)
    tf.random.set_seed(42)

    env = CVDEnvironment(target_thickness=100)
    agent = DQNController(state_dim=4, action_dim=27, learning_rate=0.001)

    print("========== DQN Training ==========")
    episodes = 200
    target_update_freq = 10

    episode_rewards = []

    for episode in range(episodes):
        state = env.reset()
        total_reward = 0

        for step in range(50):
            # Select action
            action = agent.select_action(state)
            action_delta = agent.action_to_control(action)

            # Environment step
            next_state, reward, done = env.step(action_delta)
            total_reward += reward

            # Store experience
            agent.remember(state, action, reward, next_state, done)

            # Learn
            agent.replay()

            state = next_state

            if done:
                break

        episode_rewards.append(total_reward)

        # Update Target Network
        if episode % target_update_freq == 0:
            agent.update_target_network()

        # Progress display
        if (episode + 1) % 20 == 0:
            avg_reward = np.mean(episode_rewards[-20:])
            print(f"Episode {episode+1}/{episodes}: "
                  f"Avg Reward (last 20) = {avg_reward:.2f}, "
                  f"ε = {agent.epsilon:.3f}")

    print("\n========== Training Complete ==========")

    # Learning curve
    plt.figure(figsize=(10, 6))
    plt.plot(episode_rewards, alpha=0.3)
    plt.plot(np.convolve(episode_rewards, np.ones(20)/20, mode='valid'),
             linewidth=2, label='Moving Average (20 episodes)')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    plt.title('DQN Learning Curve')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('dqn_learning_curve.png', dpi=300, bbox_inches='tight')
    plt.show()

    print(f"Final ε: {agent.epsilon:.4f}")
    print(f"Final Average Reward (last 20 episodes): "
          f"{np.mean(episode_rewards[-20:]):.2f}")

4.4 Summary

In this chapter, we learned AI implementation methods for Advanced Process Control (APC):

Key Learning Content

1. Model Predictive Control (MPC)

Prediction horizon optimization considering future constraints
Multivariable control for simultaneous optimization of multiple I/O
Linear MPC: State-space model based (CVD film thickness control)
Nonlinear MPC: Utilizing Neural Network process models

2. Reinforcement Learning Control (DQN)

Model-free learning acquiring control laws from actual data
Experience Replay for efficient learning
ε-greedy policy balancing exploration and exploitation
Discrete control: Optimal control in 27-action space

Practical Results

Film thickness control accuracy: Within ±0.5nm (conventional ±2nm)
Tracking error: RMSE < 1nm
Constraint violations: 0 cases (guaranteed operation within safe range)
Learning convergence: 200 episodes to reach practical level

Looking Ahead to the Next Chapter

In Chapter 5 "Fault Detection & Classification (FDC)", we will learn early detection and diagnosis techniques for process anomalies:

Multivariate Statistical Process Control (MSPC)
Anomaly detection using Isolation Forest
Fault diagnosis classification using Deep Learning
Root Cause Analysis (RCA) for cause identification

References

Montgomery, D. C. (2019). Design and Analysis of Experiments (9th ed.). Wiley.
Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley.
Seborg, D. E., Edgar, T. F., Mellichamp, D. A., & Doyle III, F. J. (2016). Process Dynamics and Control (4th ed.). Wiley.
McKay, M. D., Beckman, R. J., & Conover, W. J. (2000). "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code." Technometrics, 42(1), 55-61.

Disclaimer

This content is provided solely for educational, research, and informational purposes and does not constitute professional advice (legal, accounting, technical warranty, etc.).
This content and accompanying code examples are provided "AS IS" without any warranty, express or implied, including but not limited to merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, operation, or safety.
The author and Tohoku University assume no responsibility for the content, availability, or safety of external links, third-party data, tools, libraries, etc.
To the maximum extent permitted by applicable law, the author and Tohoku University shall not be liable for any direct, indirect, incidental, special, consequential, or punitive damages arising from the use, execution, or interpretation of this content.
The content may be changed, updated, or discontinued without notice.
The copyright and license of this content are subject to the stated conditions (e.g., CC BY 4.0). Such licenses typically include no-warranty clauses.