
Chapter 4: Multi-Objective and Constrained Optimization

Addressing Complex Real-World Problems

Bayesian Optimization Introduction Series | Reading Time: 35 minutes | Difficulty: Intermediate to Advanced | Code Examples: 7

This chapter covers multi-objective and constrained Bayesian optimization. You will learn how to explore Pareto fronts, handle safety and feasibility constraints, propose batches for parallel experiments, and scale to high-dimensional search spaces.

4.1 Real-World Complexity

In actual process optimization, simply maximizing a single objective function is insufficient. We need to simultaneously satisfy multiple requirements, including trade-offs between quality and efficiency, safety constraints, and physical constraints.

Typical Requirements in Real Processes

  • Multi-objective: Increase yield ↑ AND reduce energy ↓ AND reduce cost ↓
  • Constraints: Temperature ≤ 150°C, Pressure ≤ 5 bar, pH 6-8
  • Feasibility: Only physically possible conditions
  • Safety: Outside explosion limits, toxic substance concentration limits

In this chapter, we will learn extended Bayesian optimization methods that address these complex requirements.

4.2 Pareto Front Exploration

In multi-objective optimization, there is no single solution that is best in all objectives. Instead, we seek a set of Pareto optimal solutions (Pareto front).

Example 1: Multi-Objective Bayesian Optimization (Yield vs Energy)

# Multi-objective Bayesian Optimization: Pareto Front Exploration
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Two-objective process function (yield and energy consumption)
def process_objectives(X):
    """
    Args:
        X: [temperature, pressure] (N, 2)
    Returns:
        yield: Yield (to be maximized)
        energy: Energy consumption (to be minimized)
    """
    temp, pres = X[:, 0], X[:, 1]

    # Yield model (function of temperature and pressure)
    yield_rate = (
        50 + 20 * np.sin(temp/20) +
        15 * np.cos(pres/10) -
        0.1 * (temp - 100)**2 -
        0.05 * (pres - 50)**2 +
        np.random.normal(0, 1, len(temp))
    )

    # Energy consumption (proportional to temperature and pressure)
    energy = (
        0.5 * temp + 0.3 * pres +
        0.01 * temp * pres +
        np.random.normal(0, 2, len(temp))
    )

    return yield_rate, energy

# Pareto dominance check
def is_pareto_efficient(costs):
    """Pareto optimality check (converted to minimization problem)

    Args:
        costs: (N, M) objective function values (minimization direction)
    Returns:
        is_efficient: (N,) bool array
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            # Points worse than i in all objectives are inefficient
            is_efficient[is_efficient] = np.any(
                costs[is_efficient] < c, axis=1
            )
            is_efficient[i] = True
    return is_efficient

# Execute multi-objective Bayesian optimization
np.random.seed(42)
n_iterations = 30

# Initial sampling
X = np.random.uniform([50, 20], [150, 80], (10, 2))
Y1, Y2 = process_objectives(X)

# Recording arrays
all_X = X.copy()
all_Y1 = Y1.copy()
all_Y2 = Y2.copy()

for iteration in range(n_iterations):
    # Build GP model for each objective
    kernel = C(1.0) * RBF([10.0, 10.0])

    gp1 = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
    gp1.fit(all_X, all_Y1)

    gp2 = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
    gp2.fit(all_X, all_Y2)

    # Generate candidate points
    X_candidates = np.random.uniform([50, 20], [150, 80], (500, 2))

    # Prediction at each candidate
    mu1, sigma1 = gp1.predict(X_candidates, return_std=True)
    mu2, sigma2 = gp2.predict(X_candidates, return_std=True)

    # Scalarization: weighted sum with uncertainty
    # Maximize yield = minimize -yield
    # Change weight per iteration to explore diverse Pareto solutions
    weight = iteration / n_iterations  # sweeps from 0 to 1 across iterations
    scalarized = (
        -(1 - weight) * (mu1 + 2 * sigma1) +  # Yield (maximize)
        weight * (mu2 - 2 * sigma2)            # Energy (minimize)
    )

    # Next experiment point
    next_idx = np.argmin(scalarized)
    next_x = X_candidates[next_idx]

    # Execute experiment
    next_y1, next_y2 = process_objectives(next_x.reshape(1, -1))

    # Add data
    all_X = np.vstack([all_X, next_x])
    all_Y1 = np.hstack([all_Y1, next_y1])
    all_Y2 = np.hstack([all_Y2, next_y2])

# Extract Pareto front
costs = np.column_stack([-all_Y1, all_Y2])  # Convert yield to negative (minimization)
pareto_mask = is_pareto_efficient(costs)
pareto_X = all_X[pareto_mask]
pareto_Y1 = all_Y1[pareto_mask]
pareto_Y2 = all_Y2[pareto_mask]

# Sort (for visualization)
sorted_idx = np.argsort(pareto_Y1)
pareto_Y1_sorted = pareto_Y1[sorted_idx]
pareto_Y2_sorted = pareto_Y2[sorted_idx]

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Objective function space
ax1.scatter(all_Y1, all_Y2, c='lightgray', alpha=0.5, s=30, label='All evaluations')
ax1.scatter(pareto_Y1, pareto_Y2, c='red', s=100, marker='*',
            label=f'Pareto front ({len(pareto_Y1)} points)', zorder=10)
ax1.plot(pareto_Y1_sorted, pareto_Y2_sorted, 'r--', alpha=0.5, linewidth=2)
ax1.set_xlabel('Yield (%)', fontsize=12)
ax1.set_ylabel('Energy Consumption (kWh)', fontsize=12)
ax1.set_title('Pareto Front in Objective Space', fontsize=13, fontweight='bold')
ax1.legend()
ax1.grid(alpha=0.3)

# Decision variable space
scatter = ax2.scatter(pareto_X[:, 0], pareto_X[:, 1], c=pareto_Y1,
                      s=150, cmap='RdYlGn', marker='*', edgecolors='black', linewidth=1.5)
ax2.set_xlabel('Temperature (°C)', fontsize=12)
ax2.set_ylabel('Pressure (bar)', fontsize=12)
ax2.set_title('Pareto Solutions in Decision Space', fontsize=13, fontweight='bold')
cbar = plt.colorbar(scatter, ax=ax2)
cbar.set_label('Yield (%)', fontsize=10)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('pareto_front.png', dpi=150, bbox_inches='tight')

print(f"Discovered Pareto optimal solutions: {len(pareto_Y1)} points")
print(f"\nRepresentative trade-off solutions:")
print(f"  High yield priority: Yield={pareto_Y1.max():.1f}%, Energy={pareto_Y2[np.argmax(pareto_Y1)]:.1f}kWh")
print(f"  Low energy priority: Yield={pareto_Y1[np.argmin(pareto_Y2)]:.1f}%, Energy={pareto_Y2.min():.1f}kWh")

 Advantages of Pareto Optimization

  • Visualize trade-offs between multiple objectives
  • Decision makers can select solutions according to preferences
  • Can handle cases where objective importance is unknown in advance

4.3 Expected Hypervolume Improvement (EHVI)

EHVI is an acquisition function specialized for multi-objective optimization: it selects the candidate whose expected increase in the hypervolume dominated by the Pareto front is largest.
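
Formally, writing P for the current Pareto-front approximation and HV(·; r) for the hypervolume dominated with respect to a reference point r, EHVI scores a candidate x by the expected gain in hypervolume under the GP posterior:

$$\mathrm{EHVI}(x) = \mathbb{E}_{f(x)}\left[\max\bigl(0,\ \mathrm{HV}(P \cup \{f(x)\};\, r) - \mathrm{HV}(P;\, r)\bigr)\right]$$

Example 2 below approximates this expectation by Monte Carlo sampling from the two GP posteriors.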

Example 2: EHVI Acquisition Function Implementation

# Expected Hypervolume Improvement (EHVI)
from scipy.stats import norm

def compute_hypervolume_2d(pareto_front, ref_point):
    """Calculate 2D hypervolume (simplified version)

    Args:
        pareto_front: (N, 2) Pareto optimal solutions (minimization problem)
        ref_point: (2,) reference point
    """
    # Sort by the first objective in descending order so that each slab
    # width (prev_x - point[0]) is positive and paired with that point's height
    sorted_front = pareto_front[np.argsort(pareto_front[:, 0])[::-1]]

    hv = 0.0
    prev_x = ref_point[0]

    for point in sorted_front:
        width = prev_x - point[0]
        height = ref_point[1] - point[1]
        if width > 0 and height > 0:
            hv += width * height
        prev_x = point[0]

    return hv

def expected_hypervolume_improvement(X, gp1, gp2, pareto_front, ref_point, n_samples=100):
    """EHVI acquisition function (Monte Carlo approximation)

    Args:
        X: Evaluation points
        gp1, gp2: GP models for each objective
        pareto_front: Current Pareto front
        ref_point: Reference point
        n_samples: Number of samples
    """
    mu1, sigma1 = gp1.predict(X, return_std=True)
    mu2, sigma2 = gp2.predict(X, return_std=True)

    # Current hypervolume
    current_hv = compute_hypervolume_2d(pareto_front, ref_point)

    ehvi = np.zeros(len(X))

    for i in range(len(X)):
        hv_improvements = []

        for _ in range(n_samples):
            # Sample objective function values at new point
            y1_new = np.random.normal(mu1[i], sigma1[i])
            y2_new = np.random.normal(mu2[i], sigma2[i])

            # Pareto front with new point added
            new_front_candidates = np.vstack([
                pareto_front,
                [-y1_new, y2_new]  # Convert yield to negative
            ])

            # New Pareto front
            new_pareto_mask = is_pareto_efficient(new_front_candidates)
            new_pareto_front = new_front_candidates[new_pareto_mask]

            # Hypervolume improvement
            new_hv = compute_hypervolume_2d(new_pareto_front, ref_point)
            hv_improvements.append(max(0, new_hv - current_hv))

        ehvi[i] = np.mean(hv_improvements)

    return ehvi

# Multi-objective BO using EHVI
print("Executing EHVI optimization...")

# Initial setup
X_init = np.random.uniform([50, 20], [150, 80], (10, 2))
Y1_init, Y2_init = process_objectives(X_init)

X_all = X_init.copy()
Y1_all = Y1_init.copy()
Y2_all = Y2_init.copy()

n_iter = 20

for iteration in range(n_iter):
    # Build GP models
    gp1 = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp1.fit(X_all, Y1_all)

    gp2 = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp2.fit(X_all, Y2_all)

    # Current Pareto front
    costs = np.column_stack([-Y1_all, Y2_all])
    pareto_mask = is_pareto_efficient(costs)
    current_pareto = costs[pareto_mask]

    # Reference point (slightly worse than worst values)
    ref_point = np.array([
        costs[:, 0].max() + 10,
        costs[:, 1].max() + 10
    ])

    # Candidate points
    X_candidates = np.random.uniform([50, 20], [150, 80], (100, 2))

    # EHVI calculation
    if iteration % 5 == 0:
        print(f"  Iteration {iteration}/{n_iter}")

    ehvi = expected_hypervolume_improvement(
        X_candidates, gp1, gp2, current_pareto, ref_point, n_samples=50
    )

    # Next experiment point
    next_x = X_candidates[np.argmax(ehvi)]
    next_y1, next_y2 = process_objectives(next_x.reshape(1, -1))

    X_all = np.vstack([X_all, next_x])
    Y1_all = np.hstack([Y1_all, next_y1])
    Y2_all = np.hstack([Y2_all, next_y2])

# Final Pareto front
costs_final = np.column_stack([-Y1_all, Y2_all])
pareto_mask_final = is_pareto_efficient(costs_final)

print(f"\nEHVI optimization results:")
print(f"  Number of evaluations: {len(X_all)}")
print(f"  Pareto optimal solutions: {pareto_mask_final.sum()} points")
print(f"  Hypervolume: {compute_hypervolume_2d(costs_final[pareto_mask_final], ref_point):.2f}")

EHVI Characteristics

  • Considers improvement of entire Pareto front (not just a single point)
  • Theoretically extensible to 3 or more objectives
  • Computational cost is high (approximation methods recommended)

4.4 Constrained Optimization

In real processes, physical and safety constraints exist. We will learn methods to optimize while avoiding evaluation at constraint-violating points.

Example 3: Constraint Handling with Gaussian Processes

# Constrained Bayesian Optimization
def constrained_process(X):
    """Constrained process

    Objective: Maximize reaction rate
    Constraints: Temperature < 130°C and Pressure < 60 bar (safety)
    """
    temp, pres = X[:, 0], X[:, 1]

    # Objective function (reaction rate)
    rate = (
        10 + 0.5*temp + 0.3*pres -
        0.002*temp**2 - 0.001*pres**2 +
        0.01*temp*pres +
        np.random.normal(0, 0.5, len(temp))
    )

    # Constraint functions (negative means violation)
    # g1: 130 - temp >= 0
    # g2: 60 - pres >= 0
    constraint1 = 130 - temp
    constraint2 = 60 - pres

    return rate, constraint1, constraint2

def probability_of_feasibility(X, gp_constraints):
    """Calculate probability of feasibility

    Args:
        X: Evaluation points
        gp_constraints: [GP(c1), GP(c2), ...]
    Returns:
        prob: Probability of satisfying all constraints
    """
    prob = np.ones(len(X))

    for gp_c in gp_constraints:
        mu_c, sigma_c = gp_c.predict(X, return_std=True)
        # Probability of c >= 0
        prob *= norm.cdf(mu_c / (sigma_c + 1e-6))

    return prob

def constrained_ei(X, gp_obj, gp_constraints, y_best, xi=0.01):
    """Constrained Expected Improvement

    Args:
        X: Evaluation points
        gp_obj: GP for objective function
        gp_constraints: List of GPs for constraints
        y_best: Best value in feasible region
    """
    # Standard EI
    mu, sigma = gp_obj.predict(X, return_std=True)
    imp = mu - y_best - xi
    Z = imp / (sigma + 1e-6)
    ei = imp * norm.cdf(Z) + sigma * norm.pdf(Z)

    # Weight by feasibility probability
    pof = probability_of_feasibility(X, gp_constraints)

    return ei * pof

# Execute constrained BO
np.random.seed(42)

# Initial sampling (including constraint violations)
X = np.random.uniform([80, 30], [150, 80], (15, 2))
rate, c1, c2 = constrained_process(X)

X_all = X.copy()
rate_all = rate.copy()
c1_all = c1.copy()
c2_all = c2.copy()

n_iter = 25

for iteration in range(n_iter):
    # Build GPs (objective and constraints)
    gp_obj = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_obj.fit(X_all, rate_all)

    gp_c1 = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_c1.fit(X_all, c1_all)

    gp_c2 = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_c2.fit(X_all, c2_all)

    # Best value in feasible region
    feasible_mask = (c1_all >= 0) & (c2_all >= 0)
    if feasible_mask.any():
        y_best = rate_all[feasible_mask].max()
    else:
        y_best = rate_all.min()  # No feasible solution yet

    # Candidate points
    X_cand = np.random.uniform([80, 30], [150, 80], (500, 2))

    # Constrained EI
    cei = constrained_ei(X_cand, gp_obj, [gp_c1, gp_c2], y_best)

    # Next experiment point
    next_x = X_cand[np.argmax(cei)]
    next_rate, next_c1, next_c2 = constrained_process(next_x.reshape(1, -1))

    X_all = np.vstack([X_all, next_x])
    rate_all = np.hstack([rate_all, next_rate])
    c1_all = np.hstack([c1_all, next_c1])
    c2_all = np.hstack([c2_all, next_c2])

# Results analysis
final_feasible = (c1_all >= 0) & (c2_all >= 0)
feasible_X = X_all[final_feasible]
feasible_rate = rate_all[final_feasible]

best_idx = np.argmax(feasible_rate)
best_X = feasible_X[best_idx]
best_rate = feasible_rate[best_idx]

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Search history
colors = ['red' if not f else 'green' for f in final_feasible]
ax1.scatter(X_all[:, 0], X_all[:, 1], c=colors, alpha=0.6, s=50)
ax1.scatter(best_X[0], best_X[1], c='gold', s=300, marker='*',
            edgecolors='black', linewidth=2, label='Best feasible', zorder=10)
ax1.axvline(130, color='red', linestyle='--', alpha=0.7, label='Temp constraint')
ax1.axhline(60, color='blue', linestyle='--', alpha=0.7, label='Pressure constraint')
ax1.fill_between([80, 130], 30, 60, alpha=0.1, color='green', label='Feasible region')
ax1.set_xlabel('Temperature (°C)', fontsize=12)
ax1.set_ylabel('Pressure (bar)', fontsize=12)
ax1.set_title('Constrained Optimization History', fontsize=13, fontweight='bold')
ax1.legend()
ax1.grid(alpha=0.3)

# Convergence curve
feasible_best = []
for i in range(len(rate_all)):
    mask = final_feasible[:i+1]
    if mask.any():
        feasible_best.append(rate_all[:i+1][mask].max())
    else:
        feasible_best.append(np.nan)

ax2.plot(feasible_best, 'g-', linewidth=2, marker='o', markersize=4)
ax2.set_xlabel('Iteration', fontsize=12)
ax2.set_ylabel('Best Feasible Reaction Rate', fontsize=12)
ax2.set_title('Convergence in Feasible Region', fontsize=13, fontweight='bold')
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('constrained_optimization.png', dpi=150, bbox_inches='tight')

print(f"\nConstrained optimization results:")
print(f"  Total evaluations: {len(X_all)}")
print(f"  Feasible points: {final_feasible.sum()} ({final_feasible.sum()/len(X_all)*100:.1f}%)")
print(f"  Best feasible solution:")
print(f"    Temperature = {best_X[0]:.1f}°C, Pressure = {best_X[1]:.1f} bar")
print(f"    Reaction rate = {best_rate:.2f}")

Constraint Handling Strategies

  • Soft constraints: Weight the acquisition by the probability of feasibility (the approach in the example above)
  • Hard constraints: Exclude constraint-violating points from the candidate set (see the sketch after this list)
  • Penalty method: Add a penalty term to the objective function
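
A minimal sketch of the hard-constraint and penalty strategies, reusing gp_obj, gp_c1, and gp_c2 as they stand after the Example 3 loop; the UCB acquisition, the candidate count, and the penalty weight of 10.0 are illustrative choices, not part of the original example.

# Sketch: hard-constraint filtering and penalty method (assumes Example 3 models in scope)
X_cand = np.random.uniform([80, 30], [150, 80], (500, 2))
mu, sigma = gp_obj.predict(X_cand, return_std=True)
ucb = mu + 2.0 * sigma

# (a) Hard constraints: keep only candidates whose predicted constraint values are non-negative
mu_c1 = gp_c1.predict(X_cand)
mu_c2 = gp_c2.predict(X_cand)
hard_mask = (mu_c1 >= 0) & (mu_c2 >= 0)
if hard_mask.any():
    next_x_hard = X_cand[hard_mask][np.argmax(ucb[hard_mask])]
    print("Hard-constraint proposal:", next_x_hard)

# (b) Penalty method: subtract a penalty proportional to the predicted constraint violation
penalty = 10.0 * (np.maximum(0, -mu_c1) + np.maximum(0, -mu_c2))
next_x_penalty = X_cand[np.argmax(ucb - penalty)]
print("Penalty-method proposal:", next_x_penalty)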

4.5 Probability of Feasibility

A method that explicitly models the probability of satisfying constraints.
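
With a GP posterior mean and standard deviation for each constraint c_j(x) ≥ 0, and treating the constraint models as independent, the probability of feasibility is

$$\mathrm{PoF}(x) = \prod_{j} \Phi\!\left(\frac{\mu_{c_j}(x)}{\sigma_{c_j}(x)}\right)$$

where Φ is the standard normal CDF; this is what probability_of_feasibility computes in Example 3 via norm.cdf(mu_c / sigma_c).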

Example 4: Probability of Feasibility Implementation

# Detailed implementation of Probability of Feasibility (PoF)
def visualize_feasibility_probability(gp_constraints, X_grid):
    """Visualize feasibility probability"""
    pof = probability_of_feasibility(X_grid, gp_constraints)
    return pof

# Generate grid
temp_range = np.linspace(80, 150, 100)
pres_range = np.linspace(30, 80, 100)
Temp, Pres = np.meshgrid(temp_range, pres_range)
X_grid = np.column_stack([Temp.ravel(), Pres.ravel()])

# PoF with final model
gp_c1_final = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]))
gp_c1_final.fit(X_all, c1_all)

gp_c2_final = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]))
gp_c2_final.fit(X_all, c2_all)

pof_grid = visualize_feasibility_probability([gp_c1_final, gp_c2_final], X_grid)
PoF = pof_grid.reshape(Temp.shape)

# Objective function prediction
gp_obj_final = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]))
gp_obj_final.fit(X_all, rate_all)
rate_pred = gp_obj_final.predict(X_grid).reshape(Temp.shape)

# Visualization
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 5))

# PoF
im1 = ax1.contourf(Temp, Pres, PoF, levels=20, cmap='RdYlGn')
ax1.contour(Temp, Pres, PoF, levels=[0.9, 0.95, 0.99], colors='black',
            linewidths=1.5, linestyles='--')
ax1.scatter(X_all[:, 0], X_all[:, 1], c='blue', s=30, alpha=0.6, edgecolors='white')
ax1.set_xlabel('Temperature (°C)')
ax1.set_ylabel('Pressure (bar)')
ax1.set_title('Probability of Feasibility')
plt.colorbar(im1, ax=ax1, label='PoF')

# Predicted reaction rate
im2 = ax2.contourf(Temp, Pres, rate_pred, levels=20, cmap='viridis')
ax2.contour(Temp, Pres, PoF, levels=[0.5], colors='red', linewidths=2, linestyles='-')
ax2.scatter(best_X[0], best_X[1], c='gold', s=300, marker='*', edgecolors='black', linewidth=2)
ax2.set_xlabel('Temperature (°C)')
ax2.set_ylabel('Pressure (bar)')
ax2.set_title('Predicted Reaction Rate')
plt.colorbar(im2, ax=ax2, label='Rate')

# Constrained EI
cei_grid = constrained_ei(X_grid, gp_obj_final, [gp_c1_final, gp_c2_final],
                          rate_all[final_feasible].max())
CEI = cei_grid.reshape(Temp.shape)

im3 = ax3.contourf(Temp, Pres, CEI, levels=20, cmap='plasma')
ax3.scatter(X_all[:, 0], X_all[:, 1], c='white', s=20, alpha=0.8)
ax3.set_xlabel('Temperature (°C)')
ax3.set_ylabel('Pressure (bar)')
ax3.set_title('Constrained EI Acquisition')
plt.colorbar(im3, ax=ax3, label='CEI')

plt.tight_layout()
plt.savefig('pof_landscape.png', dpi=150, bbox_inches='tight')

print("\nFeasibility probability statistics:")
print(f"  PoF > 0.5: {(PoF > 0.5).sum() / PoF.size * 100:.1f}%")
print(f"  PoF > 0.9: {(PoF > 0.9).sum() / PoF.size * 100:.1f}%")
print(f"  PoF > 0.99: {(PoF > 0.99).sum() / PoF.size * 100:.1f}%")

  Interpretation of PoF

PoF = 0.9 means "90% probability of satisfying constraints". For safety-critical applications, requiring PoF > 0.95 or 0.99 is recommended.
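
A minimal sketch of using such a threshold as a screen, reusing probability_of_feasibility, constrained_ei, and the final models from Examples 3-4; the 0.95 threshold and candidate count are illustrative.

# Sketch: require PoF >= 0.95 before maximizing constrained EI (assumes Example 4 objects in scope)
X_cand = np.random.uniform([80, 30], [150, 80], (500, 2))
pof = probability_of_feasibility(X_cand, [gp_c1_final, gp_c2_final])
safe_cand = X_cand[pof >= 0.95]

if len(safe_cand) > 0:
    cei = constrained_ei(safe_cand, gp_obj_final, [gp_c1_final, gp_c2_final],
                         rate_all[final_feasible].max())
    next_x = safe_cand[np.argmax(cei)]
else:
    # No candidate is confidently feasible: fall back to the most feasible point
    next_x = X_cand[np.argmax(pof)]
print(f"Next safety-screened experiment: T = {next_x[0]:.1f} °C, P = {next_x[1]:.1f} bar")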

4.6 Safe Bayesian Optimization

Safe BO algorithm that avoids evaluation at unknown constraint-violating points.

Example 5: Safe Bayesian Optimization Implementation

# Safe Bayesian Optimization
def safe_ucb(X, gp_obj, gp_constraints, beta_obj=2.0, beta_safe=3.0):
    """Safe UCB acquisition function

    Args:
        X: Candidate points
        gp_obj: Objective function GP
        gp_constraints: Constraint GPs
        beta_obj: Exploration parameter for objective
        beta_safe: Confidence parameter for safety
    """
    # UCB for objective function
    mu_obj, sigma_obj = gp_obj.predict(X, return_std=True)
    ucb_obj = mu_obj + beta_obj * sigma_obj

    # Safety probability (conservative: mu - beta*sigma >= 0)
    safety_prob = np.ones(len(X))
    for gp_c in gp_constraints:
        mu_c, sigma_c = gp_c.predict(X, return_std=True)
        # Safe if lower confidence bound is positive
        lcb_c = mu_c - beta_safe * sigma_c
        safety_prob *= (lcb_c >= 0).astype(float)

    # Select only safe points
    ucb_obj[safety_prob == 0] = -np.inf

    return ucb_obj, safety_prob

# Execute Safe BO
print("\nExecuting Safe Bayesian Optimization...")

# Initial points (starting from known safe region)
X_safe = np.array([
    [100, 40],
    [105, 45],
    [110, 50],
    [95, 35],
    [90, 40]
])
rate_safe, c1_safe, c2_safe = constrained_process(X_safe)

X_all_safe = X_safe.copy()
rate_all_safe = rate_safe.copy()
c1_all_safe = c1_safe.copy()
c2_all_safe = c2_safe.copy()

safety_violations = 0

for iteration in range(20):
    # Build GPs
    gp_obj_s = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_obj_s.fit(X_all_safe, rate_all_safe)

    gp_c1_s = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_c1_s.fit(X_all_safe, c1_all_safe)

    gp_c2_s = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_c2_s.fit(X_all_safe, c2_all_safe)

    # Candidate points
    X_cand = np.random.uniform([80, 30], [150, 80], (500, 2))

    # Safe UCB
    safe_ucb_values, safety = safe_ucb(X_cand, gp_obj_s, [gp_c1_s, gp_c2_s],
                                        beta_obj=2.0, beta_safe=3.0)

    if np.all(np.isinf(safe_ucb_values)):
        print(f"  Warning: Iteration {iteration} - No safe candidate points")
        # Select safest point (emergency measure)
        pof = probability_of_feasibility(X_cand, [gp_c1_s, gp_c2_s])
        next_x = X_cand[np.argmax(pof)]
    else:
        next_x = X_cand[np.argmax(safe_ucb_values)]

    # Experiment
    next_rate, next_c1, next_c2 = constrained_process(next_x.reshape(1, -1))

    # Safety check
    if next_c1[0] < 0 or next_c2[0] < 0:
        safety_violations += 1
        print(f"    Constraint violation occurred (Iteration {iteration})")

    X_all_safe = np.vstack([X_all_safe, next_x])
    rate_all_safe = np.hstack([rate_all_safe, next_rate])
    c1_all_safe = np.hstack([c1_all_safe, next_c1])
    c2_all_safe = np.hstack([c2_all_safe, next_c2])

print(f"\nSafe BO results:")
print(f"  Total evaluations: {len(X_all_safe)}")
print(f"  Constraint violations: {safety_violations} times ({safety_violations/len(X_all_safe)*100:.1f}%)")

# Compare with the constrained EI run from Example 3
print(f"\nConstrained EI BO (Example 3, reference):")
print(f"  Constraint violations: {(~final_feasible).sum()} times ({(~final_feasible).sum()/len(X_all)*100:.1f}%)")

print(f"\nViolations avoided by Safe BO relative to Example 3: {(~final_feasible).sum() - safety_violations}")

Advantages of Safe BO

  • Avoids experiments in dangerous regions with high probability
  • Applicable to safety-critical processes
  • Conservative exploration (adjustable via the β values; see the sketch below)
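
The β_safe parameter directly controls how conservative the certified-safe region is. Below is a minimal sketch of this trade-off, reusing the constraint GPs gp_c1_s and gp_c2_s from Example 5; the probe size and β values are illustrative.

# Sketch: fraction of candidates certified safe for different beta_safe values
X_probe = np.random.uniform([80, 30], [150, 80], (2000, 2))
for beta_safe in [1.0, 2.0, 3.0, 4.0]:
    safe = np.ones(len(X_probe), dtype=bool)
    for gp_c in [gp_c1_s, gp_c2_s]:
        mu_c, sigma_c = gp_c.predict(X_probe, return_std=True)
        safe &= (mu_c - beta_safe * sigma_c) >= 0  # lower confidence bound must be non-negative
    print(f"  beta_safe={beta_safe:.1f}: {safe.mean()*100:.1f}% of candidates certified safe")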

4.7 Batch Bayesian Optimization

Batch optimization that proposes multiple points simultaneously to utilize parallel experimental equipment.

Example 6: Batch Acquisition Function

# Batch Bayesian Optimization
def batch_ucb_selection(X_candidates, gp, batch_size=5, kappa=2.0, diversity_weight=0.1):
    """Batch UCB selection (sequential greedy method)

    Args:
        X_candidates: Candidate points
        gp: GP model
        batch_size: Batch size
        kappa: UCB exploration parameter
        diversity_weight: Diversity weight
    """
    selected_batch = []
    remaining_candidates = X_candidates.copy()

    for i in range(batch_size):
        # UCB calculation
        mu, sigma = gp.predict(remaining_candidates, return_std=True)
        ucb = mu + kappa * sigma

        # Diversity bonus: favor candidates far from their nearest already-selected point
        if len(selected_batch) > 0:
            selected_array = np.array(selected_batch)
            dists = np.linalg.norm(
                remaining_candidates[:, None, :] - selected_array[None, :, :], axis=2
            )
            ucb += diversity_weight * dists.min(axis=1)

        # Select best point
        best_idx = np.argmax(ucb)
        selected_batch.append(remaining_candidates[best_idx])

        # Remove selected point
        remaining_candidates = np.delete(remaining_candidates, best_idx, axis=0)

    return np.array(selected_batch)

# Execute batch BO
print("\nExecuting Batch Bayesian Optimization...")

batch_size = 5
n_batches = 8

# Initial sampling
X_batch = np.random.uniform([50, 20], [150, 80], (10, 2))
Y1_batch, Y2_batch = process_objectives(X_batch)

for batch_idx in range(n_batches):
    # Build GP (simplified with only objective 1)
    gp_batch = GaussianProcessRegressor(kernel=C(1.0)*RBF([10, 10]), normalize_y=True)
    gp_batch.fit(X_batch, Y1_batch)

    # Batch selection
    X_cand = np.random.uniform([50, 20], [150, 80], (200, 2))
    next_batch = batch_ucb_selection(X_cand, gp_batch, batch_size=batch_size,
                                     kappa=2.0, diversity_weight=0.5)

    # Parallel experiments (simulation)
    next_y1, next_y2 = process_objectives(next_batch)

    # Add data
    X_batch = np.vstack([X_batch, next_batch])
    Y1_batch = np.hstack([Y1_batch, next_y1])
    Y2_batch = np.hstack([Y2_batch, next_y2])

    print(f"  Batch {batch_idx+1}/{n_batches}: Best yield = {Y1_batch.max():.2f}")

# Results visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Color by batch
colors = np.repeat(np.arange(n_batches + 1), [10] + [batch_size]*n_batches)
scatter = ax1.scatter(X_batch[:, 0], X_batch[:, 1], c=colors, cmap='tab10',
                      s=100, alpha=0.7, edgecolors='black', linewidth=0.5)
ax1.set_xlabel('Temperature (°C)')
ax1.set_ylabel('Pressure (bar)')
ax1.set_title('Batch Optimization Trajectory')
plt.colorbar(scatter, ax=ax1, label='Batch Number', ticks=range(n_batches+1))
ax1.grid(alpha=0.3)

# Diversity within batch (last batch)
last_batch = X_batch[-batch_size:]
distances = []
for i in range(len(last_batch)):
    for j in range(i+1, len(last_batch)):
        dist = np.linalg.norm(last_batch[i] - last_batch[j])
        distances.append(dist)

ax2.hist(distances, bins=10, color='skyblue', alpha=0.7, edgecolor='black')
ax2.set_xlabel('Euclidean Distance')
ax2.set_ylabel('Frequency')
ax2.set_title(f'Diversity in Last Batch ({batch_size} points)')
ax2.axvline(np.mean(distances), color='red', linestyle='--', linewidth=2,
            label=f'Mean: {np.mean(distances):.2f}')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('batch_optimization.png', dpi=150, bbox_inches='tight')

print(f"\nBatch optimization results:")
print(f"  Total evaluations: {len(X_batch)}")
print(f"  Number of batches: {n_batches}")
print(f"  Speedup by parallel experiments: {batch_size}x (theoretical)")
print(f"  Best yield: {Y1_batch.max():.2f}")

Batch Selection Strategies

  • Sequential greedy method: Select one point at a time, add diversity penalty
  • Parallel EI: e.g., the Kriging Believer approach (sketched after this list)
  • Thompson Sampling: Naturally ensures diversity
  • Diversity-focused: Combined with clustering
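
A minimal sketch of the Kriging Believer idea: after each selection the GP's own mean prediction at that point is treated as if it were an observation, the model is refit, and the next point is chosen. It reuses process_objectives and the EI formula from earlier examples; the kernel, candidate count, and batch size are illustrative.

# Sketch: Kriging Believer batch selection (assumes earlier imports and process_objectives)
from scipy.stats import norm

def kriging_believer_batch(X_obs, y_obs, batch_size=5, n_cand=200):
    """Select a batch by repeatedly 'believing' the GP mean at chosen points."""
    X_fant, y_fant = X_obs.copy(), y_obs.copy()
    batch = []
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=C(1.0) * RBF([10, 10]), normalize_y=True)
        gp.fit(X_fant, y_fant)
        X_cand = np.random.uniform([50, 20], [150, 80], (n_cand, 2))
        mu, sigma = gp.predict(X_cand, return_std=True)
        imp = mu - y_fant.max()
        Z = imp / (sigma + 1e-6)
        ei = imp * norm.cdf(Z) + sigma * norm.pdf(Z)
        x_next = X_cand[np.argmax(ei)]
        batch.append(x_next)
        # Believe the GP mean at the selected point as a pseudo-observation
        X_fant = np.vstack([X_fant, x_next])
        y_fant = np.hstack([y_fant, gp.predict(x_next.reshape(1, -1))])
    return np.array(batch)

# Example usage with the data collected in Example 6
kb_batch = kriging_believer_batch(X_batch, Y1_batch, batch_size=5)
print("Kriging Believer batch proposals:\n", kb_batch)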

4.8 High-Dimensional Optimization

Strategies for cases with many parameters (10 dimensions or more).

Example 7: Addressing High-Dimensional Problems

# High-dimensional Bayesian Optimization (dimensionality reduction approach)
from sklearn.decomposition import PCA

def high_dimensional_process(X):
    """10-dimensional process function

    Args:
        X: (N, 10) 10 parameters
    Returns:
        y: Objective function value
    """
    # In reality, optimal solution exists in low-dimensional subspace (common case)
    # Effective dimensions: x0, x1, x5 only
    effective_dims = X[:, [0, 1, 5]]

    y = (
        -np.sum((effective_dims - 0.5)**2, axis=1) +
        0.1 * np.sum(X, axis=1) +  # Small influence from other dimensions
        np.random.normal(0, 0.01, len(X))
    )
    return y

# High-dimensional BO strategy 1: Random embedding
print("\nExecuting high-dimensional BO...")

n_dim = 10
n_effective_dim = 3  # Estimated effective dimensions

# Initial sampling
n_init = 20  # More for high dimensions
X_high = np.random.uniform(0, 1, (n_init, n_dim))
y_high = high_dimensional_process(X_high)

X_all_high = X_high.copy()
y_all_high = y_high.copy()

# Dimensionality reduction with PCA
pca = PCA(n_components=n_effective_dim)
X_reduced = pca.fit_transform(X_all_high)

best_values_high = [y_all_high.max()]

for iteration in range(30):
    # Build GP in low-dimensional space
    gp_high = GaussianProcessRegressor(
        kernel=C(1.0) * RBF([1.0]*n_effective_dim),
        normalize_y=True,
        n_restarts_optimizer=5
    )
    gp_high.fit(X_reduced, y_all_high)

    # Generate candidates in low-dimensional space
    X_cand_reduced = np.random.uniform(
        X_reduced.min(axis=0),
        X_reduced.max(axis=0),
        (500, n_effective_dim)
    )

    # Select with UCB
    mu, sigma = gp_high.predict(X_cand_reduced, return_std=True)
    ucb = mu + 2.0 * sigma
    next_x_reduced = X_cand_reduced[np.argmax(ucb)]

    # Inverse transform to original space
    next_x_high = pca.inverse_transform(next_x_reduced.reshape(1, -1))

    # Range check and clip
    next_x_high = np.clip(next_x_high, 0, 1)

    # Experiment
    next_y_high = high_dimensional_process(next_x_high)

    # Add data
    X_all_high = np.vstack([X_all_high, next_x_high])
    y_all_high = np.hstack([y_all_high, next_y_high])

    # Update PCA
    X_reduced = pca.fit_transform(X_all_high)

    best_values_high.append(y_all_high.max())

# Comparison: Random search
X_random = np.random.uniform(0, 1, (len(y_all_high), n_dim))
y_random = high_dimensional_process(X_random)
best_random = [y_random[:i+1].max() for i in range(len(y_random))]

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Convergence comparison
ax1.plot(best_values_high, 'g-', linewidth=2, marker='o', markersize=4,
         label='BO with PCA (3D)')
ax1.plot(best_random, 'gray', linewidth=2, marker='s', markersize=4,
         alpha=0.6, label='Random Search')
ax1.set_xlabel('Iteration')
ax1.set_ylabel('Best Value Found')
ax1.set_title('High-Dimensional Optimization (10D)')
ax1.legend()
ax1.grid(alpha=0.3)

# PCA explained variance
pca_final = PCA(n_components=n_dim)
pca_final.fit(X_all_high)
explained_var = pca_final.explained_variance_ratio_

ax2.bar(range(1, n_dim+1), explained_var, color='steelblue', alpha=0.7, edgecolor='black')
ax2.plot(range(1, n_dim+1), np.cumsum(explained_var), 'ro-', linewidth=2, markersize=6,
         label='Cumulative')
ax2.axhline(0.95, color='green', linestyle='--', label='95% threshold')
ax2.set_xlabel('Principal Component')
ax2.set_ylabel('Explained Variance Ratio')
ax2.set_title('PCA Analysis of Search Space')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('high_dimensional_bo.png', dpi=150, bbox_inches='tight')

print(f"\nHigh-dimensional BO results:")
print(f"  Dimensions: {n_dim}")
print(f"  Effective dimensions (estimated): {n_effective_dim}")
print(f"  Best value (BO+PCA): {y_all_high.max():.4f}")
print(f"  Best value (Random): {y_random.max():.4f}")
print(f"  Improvement rate: {(y_all_high.max() - y_random.max())/abs(y_random.max())*100:.1f}%")
print(f"\nExplained variance of top 3 PCA components: {explained_var[:3].sum()*100:.1f}%")

  Challenges in High-Dimensional BO

As dimensions increase, GP computational cost increases rapidly (O(n³)), and sample efficiency decreases. Countermeasures: (1) Dimensionality reduction (PCA, UMAP), (2) Random Embedding, (3) Additive GP, (4) Sparse approximation

Dimensions | Recommended Method            | Initial Samples | Considerations
-----------|-------------------------------|-----------------|----------------------------------------------
1-3D       | Standard BO (EI, UCB)         | 5-10            | No issues
4-7D       | Standard BO + good kernel     | 10-20           | Increased computational cost
8-15D      | Dimensionality reduction + BO | 20-50           | Estimating effective dimensions is important
16D+       | Random Embedding / TuRBO      | 50-100          | Scalability required
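
For the 16D+ regime, here is a minimal sketch of the random-embedding idea: optimize in a random low-dimensional space z and map to the full space through a fixed random matrix A, clipping to the original bounds. It reuses high_dimensional_process from Example 7; the embedding dimension, z-bounds, kernel, and iteration count are illustrative.

# Sketch: random-embedding BO (REMBO-style) for the 10D toy problem
rng = np.random.default_rng(0)
d_high, d_low = 10, 3
A = rng.normal(size=(d_high, d_low))  # fixed random embedding matrix

def embed(Z):
    """Map low-dimensional points Z (N, d_low) into the original [0, 1]^d_high box."""
    return np.clip(0.5 + Z @ A.T, 0.0, 1.0)

# Short BO loop in the embedded space
Z_obs = rng.uniform(-1, 1, (10, d_low))
y_obs = high_dimensional_process(embed(Z_obs))
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=C(1.0) * RBF([1.0] * d_low), normalize_y=True)
    gp.fit(Z_obs, y_obs)
    Z_cand = rng.uniform(-1, 1, (300, d_low))
    mu, sigma = gp.predict(Z_cand, return_std=True)
    z_next = Z_cand[np.argmax(mu + 2.0 * sigma)]
    Z_obs = np.vstack([Z_obs, z_next])
    y_obs = np.hstack([y_obs, high_dimensional_process(embed(z_next.reshape(1, -1)))])
print(f"Best value via random-embedding BO: {y_obs.max():.4f}")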

Summary

In this chapter, we learned advanced methods for addressing complex real-world optimization problems.

Key Points

  • Multi-objective problems have no single best solution; Pareto-front exploration and EHVI reveal the trade-offs between objectives such as yield and energy
  • Constrained EI and the Probability of Feasibility steer exploration toward conditions that satisfy physical and safety constraints
  • Safe BO evaluates only points whose constraint lower confidence bounds are positive, keeping experiments out of dangerous regions
  • Batch BO proposes diverse points for parallel equipment, and dimensionality reduction or random embeddings keep BO tractable beyond roughly 10 dimensions

Preview of Next Chapter

In Chapter 5, we will integrate the techniques learned so far and provide detailed explanations of applications to actual industrial processes. We will develop practical skills through 7 case studies including reactor optimization, catalyst design, and quality control.

