This chapter covers linear and nonlinear programming. You will learn how to formulate linear programming problems, how the main solution algorithms work, and how to evaluate the convergence and computational efficiency of nonlinear optimization methods.
Learning Objectives
By reading this chapter, you will master:
- ✅ Formulation of linear programming problems and understanding the simplex method
- ✅ Implementation of linear programming using scipy.optimize and PuLP library
- ✅ Principles and implementation of gradient descent, Newton's method, and quasi-Newton methods
- ✅ Evaluation of convergence and computational efficiency of nonlinear optimization algorithms
- ✅ Application of optimization techniques to chemical process problems
2.1 Linear Programming
What is a Linear Programming Problem
Linear Programming (LP) is the optimization of a linear objective function subject to linear constraints.
Standard form:
$$ \begin{aligned} \text{minimize} \quad & \mathbf{c}^T \mathbf{x} \\ \text{subject to} \quad & \mathbf{A} \mathbf{x} \leq \mathbf{b} \\ & \mathbf{x} \geq \mathbf{0} \end{aligned} $$
where $\mathbf{c}$ is the coefficient vector of the objective function, $\mathbf{A}$ is the constraint matrix, and $\mathbf{b}$ is the right-hand-side vector of the constraints.
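For example, the production-planning problem of Code Example 1 below (maximize $40x_1 + 30x_2$ subject to $2x_1 + x_2 \leq 100$, $x_1 + 2x_2 \leq 80$, $x_1, x_2 \geq 0$) is brought into this standard form simply by negating the profit coefficients, since maximizing a function is equivalent to minimizing its negative:
$$ \mathbf{c} = \begin{pmatrix} -40 \\ -30 \end{pmatrix}, \qquad \mathbf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 100 \\ 80 \end{pmatrix} $$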
Characteristics of Linear Programming
| Characteristic | Description | Advantage |
|---|---|---|
| Convexity | Feasible region is a convex polyhedron | Local optimum = Global optimum |
| Efficiency | Simplex method is efficient | Fast solution even for large-scale problems |
| Versatility | Various applications: production planning, blending problems | Applicable to chemical process operation optimization |
| Duality | Relationship between primal and dual problems | Easy sensitivity analysis |
Code Example 1: Production Planning Optimization with scipy.optimize.linprog
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
from scipy.optimize import linprog
import matplotlib.pyplot as plt
"""
Production Planning Problem
-----------
Chemical plant produces products A and B. Goal: maximize profit.
Decision variables:
x1 = Production of product A [kg/day]
x2 = Production of product B [kg/day]
Objective function (profit maximization):
maximize: 40*x1 + 30*x2 → minimize: -40*x1 - 30*x2
Constraints:
1. Raw material constraint: 2*x1 + x2 <= 100 (raw material inventory)
2. Equipment time constraint: x1 + 2*x2 <= 80 (equipment operation time)
3. Non-negativity constraint: x1, x2 >= 0
"""
# Objective function coefficients (sign reversed for minimization)
c = np.array([-40, -30])
# Inequality constraint coefficient matrix A*x <= b
A_ub = np.array([
[2, 1], # Raw material constraint
[1, 2] # Equipment time constraint
])
b_ub = np.array([100, 80])
# Variable bounds (non-negativity constraint)
x_bounds = [(0, None), (0, None)]
# Solve linear programming problem
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=x_bounds, method='highs')
# Display results
print("=" * 60)
print("Production Planning Optimization Results")
print("=" * 60)
print(f"Optimization status: {result.message}")
print(f"Optimal solution:")
print(f" Product A production (x1): {result.x[0]:.2f} kg/day")
print(f" Product B production (x2): {result.x[1]:.2f} kg/day")
print(f"Maximum profit: {-result.fun:.2f} $/day") # Restore sign
print(f"Iterations: {result.nit}")
# Visualize feasible region
x1 = np.linspace(0, 60, 400)
# Constraint boundary lines
x2_constraint1 = 100 - 2*x1 # Raw material constraint
x2_constraint2 = (80 - x1) / 2 # Equipment time constraint
plt.figure(figsize=(10, 8))
# Draw constraints
plt.plot(x1, x2_constraint1, 'r-', linewidth=2, label='Raw material constraint: 2x₁ + x₂ = 100')
plt.plot(x1, x2_constraint2, 'b-', linewidth=2, label='Equipment time constraint: x₁ + 2x₂ = 80')
# Fill feasible region
x2_feasible = np.minimum(x2_constraint1, x2_constraint2)
x2_feasible = np.maximum(x2_feasible, 0)
plt.fill_between(x1, 0, x2_feasible, where=(x2_feasible >= 0),
alpha=0.3, color='green', label='Feasible region')
# Objective function contour lines (profit lines)
for profit in [800, 1200, 1600, 2000]:
x2_profit = (profit - 40*x1) / 30
plt.plot(x1, x2_profit, '--', alpha=0.4, color='gray')
# Plot optimal solution
plt.scatter([result.x[0]], [result.x[1]], color='red', s=300, zorder=5,
marker='*', edgecolors='black', linewidth=2,
label=f'Optimal solution ({result.x[0]:.1f}, {result.x[1]:.1f})')
plt.xlabel('Product A production x₁ [kg/day]', fontsize=12)
plt.ylabel('Product B production x₂ [kg/day]', fontsize=12)
plt.title('Geometric Interpretation of Production Planning Optimization', fontsize=14, fontweight='bold')
plt.xlim(0, 60)
plt.ylim(0, 60)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Constraint slack (slack variables)
slack = b_ub - A_ub @ result.x
print(f"\nConstraint slack:")
print(f" Raw material constraint slack: {slack[0]:.2f} kg")
print(f" Equipment time constraint slack: {slack[1]:.2f} hours")
Sample Output:
============================================================
Production Planning Optimization Results
============================================================
Optimization status: Optimization terminated successfully.
Optimal solution:
Product A production (x1): 40.00 kg/day
Product B production (x2): 20.00 kg/day
Maximum profit: 2200.00 $/day
Iterations: 2
Constraint slack:
Raw material constraint slack: 0.00 kg
Equipment time constraint slack: 0.00 hours
Explanation: When a linear programming problem has an optimal solution, at least one optimum lies at a vertex of the feasible region. In this example, the optimum is the vertex at which both constraints are active (zero slack).
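The duality row of the characteristics table can be checked directly on this result. With the HiGHS backend, linprog also returns the dual values (Lagrange multipliers) of the inequality constraints, which serve as shadow prices for sensitivity analysis. The following is a minimal sketch reusing the data of Code Example 1; the attribute result.ineqlin.marginals assumes SciPy 1.7 or later.
# Shadow prices (dual values) via the HiGHS backend -- requires scipy>=1.7
import numpy as np
from scipy.optimize import linprog
# Same data as Code Example 1
c = np.array([-40, -30])                 # minimize negative profit
A_ub = np.array([[2, 1], [1, 2]])        # raw material, equipment time
b_ub = np.array([100, 80])
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method='highs')
# The multipliers are non-positive for this minimization; negating them gives
# the shadow price of each resource in the original profit-maximization problem,
# i.e. how much the maximum profit increases per extra unit of b_i.
shadow_prices = -result.ineqlin.marginals
for name, price in zip(['Raw material', 'Equipment time'], shadow_prices):
    print(f"{name}: shadow price = {price:.2f} $ per unit of additional capacity")
For this data the shadow prices work out to roughly 16.7 $/kg of raw material and 6.7 $/hour of equipment time, which quantifies which constraint is the more valuable one to relax.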
Code Example 2: Chemical Blending Problem with PuLP Library
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - pandas>=2.0.0, <2.2.0
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpStatus, value
import pandas as pd
"""
Chemical Raw Material Blending Optimization
---------------------
Blending 3 raw materials (A, B, C) to produce a product.
Minimize raw material cost while meeting product specifications.
Constraints:
- Sulfur content >= 2.0%
- Viscosity <= 100 cP
- Total blend amount = 1000 kg
"""
# Raw material data
raw_materials = {
'A': {'cost': 50, 'sulfur': 3.0, 'viscosity': 80}, # $/kg, %, cP
'B': {'cost': 40, 'sulfur': 2.0, 'viscosity': 120},
'C': {'cost': 60, 'sulfur': 1.5, 'viscosity': 60}
}
# Problem definition (cost minimization)
prob = LpProblem("Chemical_Blending", LpMinimize)
# Decision variables (usage of each material kg)
x = {}
for material in raw_materials.keys():
x[material] = LpVariable(f"x_{material}", lowBound=0)
# Objective function (total raw material cost)
prob += lpSum([x[material] * raw_materials[material]['cost']
               for material in raw_materials.keys()]), "Total_Cost"
# Constraint 1: Total blend amount = 1000 kg
prob += lpSum([x[material] for material in raw_materials.keys()]) == 1000, "Total_Amount"
# Constraint 2: Sulfur content >= 2.0%
prob += lpSum([x[material] * raw_materials[material]['sulfur']
for material in raw_materials.keys()]) >= 2.0 * 1000, "Sulfur_Content"
# Constraint 3: Viscosity <= 100 cP (weighted average)
prob += lpSum([x[material] * raw_materials[material]['viscosity']
for material in raw_materials.keys()]) <= 100 * 1000, "Viscosity_Limit"
# Solve
prob.solve()
# Display results
print("=" * 70)
print("Chemical Raw Material Blending Optimization Results")
print("=" * 70)
print(f"Status: {LpStatus[prob.status]}")
print(f"\nOptimal blend:")
results = []
for material in raw_materials.keys():
amount = value(x[material])
percentage = (amount / 1000) * 100
cost = amount * raw_materials[material]['cost']
print(f" Material {material}: {amount:.2f} kg ({percentage:.1f}%) Cost: ${cost:.2f}")
results.append({
'Material': material,
'Usage [kg]': amount,
'Blend ratio [%]': percentage,
'Cost [$]': cost
})
total_cost = value(prob.objective)
print(f"\nTotal cost: ${total_cost:.2f}")
# Verify product specifications
sulfur_content = sum(value(x[material]) * raw_materials[material]['sulfur']
for material in raw_materials.keys()) / 1000
viscosity = sum(value(x[material]) * raw_materials[material]['viscosity']
for material in raw_materials.keys()) / 1000
print(f"\nProduct specifications:")
print(f" Sulfur content: {sulfur_content:.2f}% (Spec: >= 2.0%)")
print(f" Viscosity: {viscosity:.2f} cP (Spec: <= 100 cP)")
# Organize results in DataFrame
df_results = pd.DataFrame(results)
print(f"\nBlend details:")
print(df_results.to_string(index=False))
# Visualization
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Pie chart of blend ratio
materials = [r['Material'] for r in results]
amounts = [r['Usage [kg]'] for r in results]
colors = ['#11998e', '#38ef7d', '#66d9ef']
ax1.pie(amounts, labels=materials, autopct='%1.1f%%', startangle=90, colors=colors)
ax1.set_title('Optimal Blend Ratio', fontsize=13, fontweight='bold')
# Bar chart of cost breakdown
costs = [r['Cost [$]'] for r in results]
ax2.bar(materials, costs, color=colors, edgecolor='black', linewidth=1.5)
ax2.set_xlabel('Material', fontsize=11)
ax2.set_ylabel('Cost [$]', fontsize=11)
ax2.set_title('Cost Breakdown by Material', fontsize=13, fontweight='bold')
ax2.grid(axis='y', alpha=0.3)
for i, (material, cost) in enumerate(zip(materials, costs)):
ax2.text(i, cost + 500, f'${cost:.0f}', ha='center', fontsize=10, fontweight='bold')
plt.tight_layout()
plt.show()
Sample Output:
======================================================================
Chemical Raw Material Blending Optimization Results
======================================================================
Status: Optimal
Optimal blend:
Material A: 500.00 kg (50.0%) Cost: $25000.00
Material B: 500.00 kg (50.0%) Cost: $20000.00
Material C: 0.00 kg (0.0%) Cost: $0.00
Total cost: $45000.00
Product specifications:
Sulfur content: 2.50% (Spec: >= 2.0%)
Viscosity: 100.00 cP (Spec: <= 100 cP)
Blend details:
Material Usage [kg] Blend ratio [%] Cost [$]
A 500.0 50.0 25000.0
B 500.0 50.0 20000.0
C 0.0 0.0 0.0
Explanation: The PuLP library lets linear programming problems be written in a readable, algebraic style. In this example, we found the blend ratio that minimizes raw material cost while meeting the product specifications (sulfur content and viscosity).
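As a small follow-up to Code Example 2, PuLP also exposes per-constraint information after solving, which supports the same kind of sensitivity check as in Code Example 1. The short continuation below is meant to be appended after prob.solve(); it assumes the default CBC solver, which usually populates the .pi (dual value) and .slack attributes of each constraint.
# Continuation of Code Example 2: inspect duals and slacks (CBC solver)
for name, constraint in prob.constraints.items():
    print(f"{name}: dual value = {constraint.pi}, slack = {constraint.slack}")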
Code Example 3: Visualization of the Simplex Method
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import linprog
"""
Visualization of the Simplex Method Search Process
--------------------------------
Visualizing how the optimization algorithm explores vertices of the feasible region
"""
# Linear programming problem definition
c = np.array([-3, -4]) # Objective function: maximize 3x1 + 4x2
A_ub = np.array([
[2, 1], # Constraint 1: 2x1 + x2 <= 6
[1, 2] # Constraint 2: x1 + 2x2 <= 8
])
b_ub = np.array([6, 8])
# Calculate vertices of feasible region
def calculate_vertices():
"""Calculate vertices of feasible region"""
vertices = [
(0, 0), # Origin
(3, 0), # On x1 axis (constraint 1 boundary)
(0, 4), # On x2 axis (constraint 2 boundary)
]
# Intersection of two constraints
# Simultaneous equations: 2x1 + x2 = 6 and x1 + 2x2 = 8
A_intersection = np.array([[2, 1], [1, 2]])
b_intersection = np.array([6, 8])
intersection = np.linalg.solve(A_intersection, b_intersection)
vertices.append(tuple(intersection))
return vertices
vertices = calculate_vertices()
# Calculate objective function value at each vertex
def objective_value(x1, x2):
return 3*x1 + 4*x2
vertex_values = [(v[0], v[1], objective_value(v[0], v[1])) for v in vertices]
vertex_values.sort(key=lambda x: x[2]) # Sort by objective function value
# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
# Left plot: Feasible region and vertex exploration
x1 = np.linspace(0, 5, 400)
x2_constraint1 = 6 - 2*x1
x2_constraint2 = (8 - x1) / 2
x2_feasible = np.minimum(x2_constraint1, x2_constraint2)
x2_feasible = np.maximum(x2_feasible, 0)
ax1.fill_between(x1, 0, x2_feasible, where=(x2_feasible >= 0),
alpha=0.3, color='lightgreen', label='Feasible region')
ax1.plot(x1, x2_constraint1, 'r-', linewidth=2, label='2x₁ + x₂ = 6')
ax1.plot(x1, x2_constraint2, 'b-', linewidth=2, label='x₁ + 2x₂ = 8')
# Objective function contour lines
for val in [0, 4, 8, 12, 16]:
x2_obj = (val - 3*x1) / 4
ax1.plot(x1, x2_obj, '--', alpha=0.3, color='gray', linewidth=1)
# Plot vertices (search order)
colors = ['yellow', 'orange', 'red', 'darkred']
for i, (x1_v, x2_v, obj_val) in enumerate(vertex_values):
if i < len(vertex_values) - 1:
ax1.scatter([x1_v], [x2_v], s=200, color=colors[i],
edgecolors='black', linewidth=2, zorder=5,
label=f'Vertex{i+1}: f={obj_val:.1f}')
else:
ax1.scatter([x1_v], [x2_v], s=400, color='red', marker='*',
edgecolors='black', linewidth=3, zorder=6,
label=f'Optimal solution: f={obj_val:.1f}')
# Draw search path (conceptual path of simplex method)
path_x = [vertex_values[i][0] for i in range(len(vertex_values))]
path_y = [vertex_values[i][1] for i in range(len(vertex_values))]
ax1.plot(path_x, path_y, 'k--', linewidth=2, alpha=0.5, label='Search path')
for i in range(len(path_x) - 1):
ax1.annotate('', xy=(path_x[i+1], path_y[i+1]), xytext=(path_x[i], path_y[i]),
arrowprops=dict(arrowstyle='->', lw=2, color='black', alpha=0.6))
ax1.set_xlabel('x₁', fontsize=12)
ax1.set_ylabel('x₂', fontsize=12)
ax1.set_title('Simplex Method Search Process', fontsize=14, fontweight='bold')
ax1.set_xlim(-0.5, 5)
ax1.set_ylim(-0.5, 5)
ax1.legend(fontsize=9, loc='upper right')
ax1.grid(alpha=0.3)
# Right plot: Improvement of objective function value per iteration
iterations = list(range(1, len(vertex_values) + 1))
obj_values = [v[2] for v in vertex_values]
ax2.plot(iterations, obj_values, 'o-', linewidth=3, markersize=10,
color='#11998e', markerfacecolor='#38ef7d', markeredgecolor='black',
markeredgewidth=2)
ax2.set_xlabel('Iteration', fontsize=12)
ax2.set_ylabel('Objective Function Value', fontsize=12)
ax2.set_title('Improvement Process of Objective Function Value', fontsize=14, fontweight='bold')
ax2.grid(alpha=0.3)
for i, (iter_num, val) in enumerate(zip(iterations, obj_values)):
ax2.text(iter_num, val + 0.5, f'{val:.1f}', ha='center', fontsize=10, fontweight='bold')
plt.tight_layout()
plt.show()
print("=" * 60)
print("Simplex Method Search Process")
print("=" * 60)
for i, (x1_v, x2_v, obj_val) in enumerate(vertex_values):
print(f"Iteration {i+1}: Vertex ({x1_v:.2f}, {x2_v:.2f}) Objective function value: {obj_val:.2f}")
print("=" * 60)
print(f"Optimal solution: ({vertex_values[-1][0]:.2f}, {vertex_values[-1][1]:.2f})")
print(f"Maximum value: {vertex_values[-1][2]:.2f}")
Sample Output:
============================================================
Simplex Method Search Process
============================================================
Iteration 1: Vertex (0.00, 0.00) Objective function value: 0.00
Iteration 2: Vertex (3.00, 0.00) Objective function value: 9.00
Iteration 3: Vertex (0.00, 4.00) Objective function value: 16.00
Iteration 4: Vertex (1.33, 3.33) Objective function value: 17.33
============================================================
Optimal solution: (1.33, 3.33)
Maximum value: 17.33
Explanation: The simplex method explores vertices of the feasible region, moving to an adjacent vertex that improves the objective function value at each iteration until no improving neighbor remains, so the objective improves monotonically. The path drawn here simply orders the vertices by objective value for illustration; an actual simplex run pivots only between adjacent vertices.
2.2 Nonlinear Programming
Nonlinear Optimization Problems
Optimization problems where the objective function or constraints are nonlinear:
$$ \begin{aligned} \text{minimize} \quad & f(\mathbf{x}) \\ \text{subject to} \quad & g_i(\mathbf{x}) \leq 0, \quad i = 1, \ldots, m \\ & h_j(\mathbf{x}) = 0, \quad j = 1, \ldots, p \end{aligned} $$
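All of the worked examples in this section are unconstrained; as a bridge from the general form above, the following minimal sketch solves a small constrained problem with scipy's SLSQP (sequential quadratic programming) method. The objective and constraints are illustrative choices, not taken from a process model, and note that scipy expects inequality constraints in the form fun(x) >= 0, so g(x) <= 0 must be passed as -g(x) >= 0.
# Constrained nonlinear optimization with SLSQP
import numpy as np
from scipy.optimize import minimize
def f(x):
    return (x[0] - 2)**2 + (x[1] - 1)**2
constraints = [
    {'type': 'ineq', 'fun': lambda x: 2 - x[0]**2 - x[1]**2},  # x1^2 + x2^2 <= 2
    {'type': 'eq',   'fun': lambda x: x[0] - x[1]},            # x1 = x2
]
result = minimize(f, np.array([0.0, 0.0]), method='SLSQP', constraints=constraints)
print(result.x, result.fun)  # expected to land near (1, 1) with f = 1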
Major Optimization Algorithms
The main algorithm families covered in this chapter are:
- Unconstrained problems: gradient descent, Newton's method, quasi-Newton methods (BFGS, L-BFGS)
- Constrained problems: penalty methods, sequential quadratic programming (SQP), interior-point methods
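Of the constrained approaches listed above, the penalty method is the easiest to illustrate in a few lines: constraint violations are added to the objective with a growing weight, and each penalized subproblem is solved with an unconstrained method. The toy problem below (minimize (x - 2)^2 subject to x <= 1, whose optimum is x = 1) is an illustrative choice, not one of the chapter's process examples.
# Quadratic penalty method sketch: fold the constraint x <= 1 into the objective
import numpy as np
from scipy.optimize import minimize
def penalized(x, mu):
    violation = max(0.0, x[0] - 1.0)        # amount by which x <= 1 is violated
    return (x[0] - 2.0)**2 + mu * violation**2
x = np.array([0.0])
for mu in [1, 10, 100, 1000]:
    x = minimize(penalized, x, args=(mu,), method='BFGS').x  # warm start from previous solution
    print(f"mu = {mu:5d}: x = {x[0]:.4f}")  # approaches the constrained optimum x = 1 as mu grows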
Code Example 4: Unconstrained Nonlinear Optimization (scipy.optimize.minimize)
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
"""
Unconstrained Nonlinear Optimization Problem
-----------------------
Minimization of the Rosenbrock function
f(x, y) = (1-x)^2 + 100(y-x^2)^2
Minimum value: f(1, 1) = 0
"""
# Objective function definition
def rosenbrock(x):
"""Rosenbrock function"""
return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
# Gradient (analytically calculated)
def rosenbrock_grad(x):
"""Gradient of Rosenbrock function"""
dfdx = -2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2)
dfdy = 200*(x[1] - x[0]**2)
return np.array([dfdx, dfdy])
# Hessian matrix (second derivative)
def rosenbrock_hess(x):
"""Hessian matrix of Rosenbrock function"""
d2fdx2 = 2 - 400*x[1] + 1200*x[0]**2
d2fdxdy = -400*x[0]
d2fdy2 = 200
return np.array([[d2fdx2, d2fdxdy],
[d2fdxdy, d2fdy2]])
# Initial value
x0 = np.array([-1.0, -1.0])
# Solve with multiple optimization methods
methods = ['Nelder-Mead', 'BFGS', 'Newton-CG', 'L-BFGS-B', 'TNC']
results = {}
print("=" * 80)
print("Unconstrained Nonlinear Optimization: Rosenbrock Function Minimization")
print("=" * 80)
print(f"Initial value: x0 = {x0}")
print(f"Objective function value f(x0) = {rosenbrock(x0):.4f}")
print("\n" + "=" * 80)
print(f"{'Method':<15} {'Optimal solution (x, y)':<25} {'Obj. value':<15} {'Iterations':<10}")
print("=" * 80)
for method in methods:
if method in ['Newton-CG']:
result = minimize(rosenbrock, x0, method=method, jac=rosenbrock_grad,
hess=rosenbrock_hess)
elif method in ['BFGS', 'L-BFGS-B']:
result = minimize(rosenbrock, x0, method=method, jac=rosenbrock_grad)
else:
result = minimize(rosenbrock, x0, method=method)
results[method] = result
print(f"{method:<15} ({result.x[0]:7.4f}, {result.x[1]:7.4f}) "
f"{result.fun:12.6e} {result.nit:8d}")
print("=" * 80)
# Visualization of Rosenbrock function
x = np.linspace(-2, 2, 400)
y = np.linspace(-1, 3, 400)
X, Y = np.meshgrid(x, y)
Z = (1 - X)**2 + 100 * (Y - X**2)**2
# Visualize in log scale (due to wide range of values)
Z_log = np.log10(Z + 1)
fig = plt.figure(figsize=(16, 6))
# Contour plot
ax1 = fig.add_subplot(131)
contour = ax1.contour(X, Y, Z_log, levels=20, cmap='viridis')
ax1.contourf(X, Y, Z_log, levels=20, cmap='viridis', alpha=0.4)
plt.colorbar(contour, ax=ax1, label='log₁₀(f(x,y) + 1)')
ax1.scatter([1], [1], color='red', s=300, marker='*',
edgecolors='black', linewidth=2, label='True optimal solution (1, 1)', zorder=5)
ax1.scatter([x0[0]], [x0[1]], color='yellow', s=150, marker='o',
edgecolors='black', linewidth=2, label='Initial value', zorder=5)
ax1.set_xlabel('x', fontsize=11)
ax1.set_ylabel('y', fontsize=11)
ax1.set_title('Rosenbrock Function Contour', fontsize=13, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(alpha=0.3)
# 3D surface plot
ax2 = fig.add_subplot(132, projection='3d')
surf = ax2.plot_surface(X, Y, Z_log, cmap='viridis', alpha=0.7,
edgecolor='none', antialiased=True)
ax2.scatter([1], [1], [0], color='red', s=200, marker='*',
edgecolors='black', linewidth=2)
ax2.set_xlabel('x', fontsize=10)
ax2.set_ylabel('y', fontsize=10)
ax2.set_zlabel('log₁₀(f(x,y) + 1)', fontsize=10)
ax2.set_title('Rosenbrock Function 3D Surface', fontsize=13, fontweight='bold')
# Comparison of convergence for each method
ax3 = fig.add_subplot(133)
method_labels = list(results.keys())
nit_values = [results[m].nit for m in method_labels]
fun_values = [results[m].fun for m in method_labels]
x_pos = np.arange(len(method_labels))
bars = ax3.bar(x_pos, nit_values, color='#11998e', edgecolor='black', linewidth=1.5)
# Display objective function value above each bar
for i, (bar, fun_val) in enumerate(zip(bars, fun_values)):
height = bar.get_height()
ax3.text(bar.get_x() + bar.get_width()/2., height + 1,
f'{fun_val:.2e}', ha='center', va='bottom', fontsize=8, rotation=0)
ax3.set_xticks(x_pos)
ax3.set_xticklabels(method_labels, rotation=45, ha='right')
ax3.set_ylabel('Iterations', fontsize=11)
ax3.set_title('Comparison of Iterations by Method', fontsize=13, fontweight='bold')
ax3.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
Sample Output:
================================================================================
Unconstrained Nonlinear Optimization: Rosenbrock Function Minimization
================================================================================
Initial value: x0 = [-1. -1.]
Objective function value f(x0) = 404.0000
================================================================================
Method Optimal solution (x, y) Obj. value Iterations
================================================================================
Nelder-Mead ( 1.0000, 1.0000) 7.888892e-10 79
BFGS ( 1.0000, 1.0000) 1.432465e-11 33
Newton-CG ( 1.0000, 1.0000) 6.689048e-18 23
L-BFGS-B ( 1.0000, 1.0000) 1.432465e-11 24
TNC ( 1.0000, 1.0000) 1.510571e-17 26
================================================================================
Explanation: scipy.optimize.minimize provides a unified interface to multiple optimization algorithms. Supplying an analytic gradient (and, for methods that accept it, the Hessian) typically reduces the number of function evaluations and improves both convergence speed and accuracy.
Code Example 5: Implementation of Gradient Descent and Effect of Step Size
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
import matplotlib.pyplot as plt
"""
Scratch Implementation of Gradient Descent
-------------------------
Visualizing the effect of step size (learning rate)
"""
# Objective function: f(x, y) = x^2 + 4*y^2
def objective(x):
return x[0]**2 + 4*x[1]**2
# Gradient
def gradient(x):
return np.array([2*x[0], 8*x[1]])
# Gradient descent implementation
def gradient_descent(x0, learning_rate, max_iter=100, tol=1e-6):
"""
Gradient Descent
Parameters:
-----------
x0 : array, initial value
learning_rate : float, step size (learning rate)
max_iter : int, maximum iterations
tol : float, convergence threshold
Returns:
--------
trajectory : list, optimization trajectory
"""
x = x0.copy()
trajectory = [x.copy()]
for i in range(max_iter):
grad = gradient(x)
# Gradient descent update rule
x_new = x - learning_rate * grad
trajectory.append(x_new.copy())
# Convergence check
if np.linalg.norm(x_new - x) < tol:
print(f" Converged: {i+1} iterations")
break
x = x_new
return trajectory
# Test with different learning rates
learning_rates = [0.05, 0.1, 0.3, 0.5]
x0 = np.array([2.0, 1.5])
trajectories = {}
print("=" * 70)
print("Gradient Descent: Effect of Learning Rate")
print("=" * 70)
print(f"Initial value: x0 = {x0}")
print(f"Objective function: f(x, y) = x² + 4y²")
print(f"Optimal solution: (0, 0), Minimum value: 0")
print("=" * 70)
for lr in learning_rates:
print(f"\nLearning rate α = {lr}:")
traj = gradient_descent(x0, lr, max_iter=50)
trajectories[lr] = traj
final_x = traj[-1]
final_f = objective(final_x)
print(f" Final solution: ({final_x[0]:.6f}, {final_x[1]:.6f})")
print(f" Objective function value: {final_f:.6e}")
print(f" Iterations: {len(traj)-1}")
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
axes = axes.flatten()
# Prepare contours
x = np.linspace(-2.5, 2.5, 200)
y = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(x, y)
Z = X**2 + 4*Y**2
for idx, (lr, traj) in enumerate(trajectories.items()):
ax = axes[idx]
# Contour
contour = ax.contour(X, Y, Z, levels=20, cmap='viridis', alpha=0.5)
ax.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.2)
# Optimization trajectory
traj_array = np.array(traj)
ax.plot(traj_array[:, 0], traj_array[:, 1], 'ro-', linewidth=2,
markersize=6, label='Trajectory', markerfacecolor='red',
markeredgecolor='black', markeredgewidth=1)
# Initial value and optimal solution
ax.scatter([x0[0]], [x0[1]], color='yellow', s=200, marker='o',
edgecolors='black', linewidth=2, label='Initial value', zorder=5)
ax.scatter([0], [0], color='lime', s=300, marker='*',
edgecolors='black', linewidth=2, label='Optimal solution', zorder=5)
# Show direction with arrows
for i in range(min(5, len(traj)-1)):
ax.annotate('', xy=(traj_array[i+1]), xytext=(traj_array[i]),
arrowprops=dict(arrowstyle='->', lw=2, color='red', alpha=0.6))
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title(f'Learning rate α = {lr} (Iterations: {len(traj)-1})',
fontsize=12, fontweight='bold')
ax.legend(fontsize=9)
ax.grid(alpha=0.3)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2, 2)
plt.tight_layout()
plt.show()
# Convergence curves of objective function value
plt.figure(figsize=(12, 6))
for lr, traj in trajectories.items():
traj_array = np.array(traj)
obj_values = [objective(x) for x in traj_array]
iterations = range(len(obj_values))
plt.semilogy(iterations, obj_values, 'o-', linewidth=2, markersize=6,
label=f'α = {lr}')
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Objective Function Value (log scale)', fontsize=12)
plt.title('Convergence Curves for Each Learning Rate', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3, which='both')
plt.tight_layout()
plt.show()
Sample Output:
======================================================================
Gradient Descent: Effect of Learning Rate
======================================================================
Initial value: x0 = [2. 1.5]
Objective function: f(x, y) = x² + 4y²
Optimal solution: (0, 0), Minimum value: 0
======================================================================
Learning rate α = 0.05:
Converged: 43 iterations
Final solution: (0.000081, 0.000015)
Objective function value: 6.570282e-09
Iterations: 44
Learning rate α = 0.1:
Converged: 31 iterations
Final solution: (0.000061, 0.000005)
Objective function value: 3.712341e-09
Iterations: 32
Learning rate α = 0.3:
Converged: 16 iterations
Final solution: (0.000002, 0.000000)
Objective function value: 4.400442e-12
Iterations: 17
Learning rate α = 0.5:
Final solution: (-1.699298, -0.000000)
Objective function value: 2.887613e+00
Iterations: 50
Explanation: A small learning rate gives slow but stable convergence, while a learning rate that is too large makes the iterates overshoot, oscillate, or diverge. Choosing an appropriate step size is therefore critical for gradient descent.
Code Example 6: Implementation of Newton's Method
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
import matplotlib.pyplot as plt
"""
Implementation of Newton's Method
-----------------
Fast optimization using second-order derivative information (Hessian matrix)
"""
# Objective function
def objective(x):
return x[0]**2 + 4*x[1]**2
# Gradient
def gradient(x):
return np.array([2*x[0], 8*x[1]])
# Hessian matrix
def hessian(x):
return np.array([[2, 0],
[0, 8]])
# Newton's method implementation
def newton_method(x0, max_iter=20, tol=1e-8):
"""
Newton's Method
Update rule: x_{k+1} = x_k - H^{-1}(x_k) * ∇f(x_k)
Where H is the Hessian matrix
"""
x = x0.copy()
trajectory = [x.copy()]
for i in range(max_iter):
grad = gradient(x)
hess = hessian(x)
# Calculate inverse of Hessian matrix
hess_inv = np.linalg.inv(hess)
# Newton's method update rule
x_new = x - hess_inv @ grad
trajectory.append(x_new.copy())
print(f" Iteration {i+1}: x = ({x_new[0]:.6f}, {x_new[1]:.6f}), "
f"f(x) = {objective(x_new):.6e}")
# Convergence check
if np.linalg.norm(x_new - x) < tol:
print(f" → Converged")
break
x = x_new
return trajectory
# Gradient descent (for comparison)
def gradient_descent(x0, learning_rate=0.2, max_iter=20, tol=1e-8):
x = x0.copy()
trajectory = [x.copy()]
for i in range(max_iter):
grad = gradient(x)
x_new = x - learning_rate * grad
trajectory.append(x_new.copy())
if np.linalg.norm(x_new - x) < tol:
break
x = x_new
return trajectory
# Initial value
x0 = np.array([2.0, 1.5])
print("=" * 80)
print("Comparison of Newton's Method vs Gradient Descent")
print("=" * 80)
print(f"Initial value: x0 = {x0}")
print(f"Objective function: f(x, y) = x² + 4y²\n")
print("【Newton's Method】")
traj_newton = newton_method(x0, max_iter=10)
print(f"\n【Gradient Descent (learning rate=0.2)】")
traj_gd = gradient_descent(x0, learning_rate=0.2, max_iter=10)
for i, x in enumerate(traj_gd[1:], 1):
print(f" Iteration {i}: x = ({x[0]:.6f}, {x[1]:.6f}), f(x) = {objective(x):.6e}")
# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
# Prepare contours
x = np.linspace(-2.5, 2.5, 200)
y = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(x, y)
Z = X**2 + 4*Y**2
# Left plot: Trajectory comparison
contour = ax1.contour(X, Y, Z, levels=25, cmap='viridis', alpha=0.4)
ax1.contourf(X, Y, Z, levels=25, cmap='viridis', alpha=0.2)
# Newton's method trajectory
traj_newton_array = np.array(traj_newton)
ax1.plot(traj_newton_array[:, 0], traj_newton_array[:, 1], 'ro-',
linewidth=3, markersize=8, label="Newton's Method",
markerfacecolor='red', markeredgecolor='black', markeredgewidth=1.5)
# Gradient descent trajectory
traj_gd_array = np.array(traj_gd)
ax1.plot(traj_gd_array[:, 0], traj_gd_array[:, 1], 'bs-',
linewidth=2, markersize=7, label='Gradient Descent',
markerfacecolor='blue', markeredgecolor='black', markeredgewidth=1)
# Initial value and optimal solution
ax1.scatter([x0[0]], [x0[1]], color='yellow', s=250, marker='o',
edgecolors='black', linewidth=2, label='Initial value', zorder=5)
ax1.scatter([0], [0], color='lime', s=400, marker='*',
edgecolors='black', linewidth=3, label='Optimal solution', zorder=5)
ax1.set_xlabel('x', fontsize=12)
ax1.set_ylabel('y', fontsize=12)
ax1.set_title("Newton's Method vs Gradient Descent Trajectory", fontsize=14, fontweight='bold')
ax1.legend(fontsize=11, loc='upper right')
ax1.grid(alpha=0.3)
# Right plot: Convergence speed comparison
obj_newton = [objective(x) for x in traj_newton_array]
obj_gd = [objective(x) for x in traj_gd_array]
ax2.semilogy(range(len(obj_newton)), obj_newton, 'ro-', linewidth=3,
markersize=8, label="Newton's Method", markeredgecolor='black',
markeredgewidth=1.5)
ax2.semilogy(range(len(obj_gd)), obj_gd, 'bs-', linewidth=2,
markersize=7, label='Gradient Descent', markeredgecolor='black',
markeredgewidth=1)
ax2.set_xlabel('Iteration', fontsize=12)
ax2.set_ylabel('Objective Function Value (log scale)', fontsize=12)
ax2.set_title('Convergence Speed Comparison', fontsize=14, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(alpha=0.3, which='both')
plt.tight_layout()
plt.show()
print(f"\nConvergence speed:")
print(f" Newton's Method: {len(traj_newton)-1} iterations")
print(f" Gradient Descent: {len(traj_gd)-1} iterations")
Sample Output:
================================================================================
Comparison of Newton's Method vs Gradient Descent
================================================================================
Initial value: x0 = [2. 1.5]
Objective function: f(x, y) = x² + 4y²
【Newton's Method】
Iteration 1: x = (0.000000, 0.000000), f(x) = 0.000000e+00
→ Converged
【Gradient Descent (learning rate=0.2)】
Iteration 1: x = (1.600000, 0.900000), f(x) = 5.800000e+00
Iteration 2: x = (1.280000, 0.540000), f(x) = 2.803200e+00
Iteration 3: x = (1.024000, 0.324000), f(x) = 1.468416e+00
Iteration 4: x = (0.819200, 0.194400), f(x) = 8.219566e-01
Iteration 5: x = (0.655360, 0.116640), f(x) = 4.838877e-01
Iteration 6: x = (0.524288, 0.069984), f(x) = 2.947690e-01
Iteration 7: x = (0.419430, 0.041990), f(x) = 1.823116e-01
Iteration 8: x = (0.335544, 0.025194), f(x) = 1.132009e-01
Iteration 9: x = (0.268435, 0.015117), f(x) = 7.032375e-02
Iteration 10: x = (0.214748, 0.009070), f(x) = 4.368318e-02
Convergence speed:
Newton's Method: 1 iteration
Gradient Descent: 10 iterations
Explanation: Newton's method uses second-order derivative information (the Hessian matrix), giving quadratic convergence near the solution. However, forming the Hessian and solving the associated linear system at every iteration becomes expensive as the number of variables grows.
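The single-iteration convergence seen above is no coincidence: for a quadratic objective the Hessian is constant, so one Newton step lands exactly on the minimizer. Writing the quadratic as $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T \mathbf{Q} \mathbf{x}$ (here $\mathbf{Q} = \mathrm{diag}(2, 8)$ for $f = x^2 + 4y^2$):
$$ \nabla f(\mathbf{x}) = \mathbf{Q}\mathbf{x}, \qquad \mathbf{H}(\mathbf{x}) = \mathbf{Q}, \qquad \mathbf{x}_1 = \mathbf{x}_0 - \mathbf{Q}^{-1}\mathbf{Q}\mathbf{x}_0 = \mathbf{0} $$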
Code Example 7: Comparison of BFGS Quasi-Newton Method
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import time
"""
BFGS Quasi-Newton Method
------------------
Efficient method that approximately updates the Hessian matrix
"""
# More complex objective function (Rosenbrock function)
def rosenbrock(x):
return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
def rosenbrock_grad(x):
dfdx = -2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2)
dfdy = 200*(x[1] - x[0]**2)
return np.array([dfdx, dfdy])
def rosenbrock_hess(x):
d2fdx2 = 2 - 400*x[1] + 1200*x[0]**2
d2fdxdy = -400*x[0]
d2fdy2 = 200
return np.array([[d2fdx2, d2fdxdy],
[d2fdxdy, d2fdy2]])
# Initial value
x0 = np.array([-1.2, 1.0])
# Comparison of optimization methods
methods_config = {
'BFGS': {'jac': rosenbrock_grad},
'L-BFGS-B': {'jac': rosenbrock_grad},
'Newton-CG': {'jac': rosenbrock_grad, 'hess': rosenbrock_hess},
'CG': {'jac': rosenbrock_grad},
}
print("=" * 90)
print("Comparison of Quasi-Newton and Other Methods: Rosenbrock Function")
print("=" * 90)
print(f"Initial value: x0 = {x0}")
print(f"Objective function value f(x0) = {rosenbrock(x0):.4f}")
print("\n" + "=" * 90)
print(f"{'Method':<15} {'Optimal solution':<20} {'Obj. value':<15} {'Iter.':<8} {'Time[ms]':<10}")
print("=" * 90)
results = {}
for method, kwargs in methods_config.items():
# Measure execution time
start_time = time.time()
result = minimize(rosenbrock, x0, method=method, **kwargs)
elapsed_time = (time.time() - start_time) * 1000 # milliseconds
results[method] = {
'result': result,
'time': elapsed_time
}
print(f"{method:<15} ({result.x[0]:6.4f}, {result.x[1]:6.4f}) "
f"{result.fun:12.6e} {result.nit:6d} {elapsed_time:8.2f}")
print("=" * 90)
# Record trajectory (using callback function)
def make_callback(trajectory):
def callback(xk):
trajectory.append(xk.copy())
return callback
# Re-run with trajectory recording
trajectories = {}
for method in ['BFGS', 'Newton-CG']:
traj = [x0.copy()]
callback = make_callback(traj)
if method == 'Newton-CG':
minimize(rosenbrock, x0, method=method, jac=rosenbrock_grad,
hess=rosenbrock_hess, callback=callback)
else:
minimize(rosenbrock, x0, method=method, jac=rosenbrock_grad,
callback=callback)
trajectories[method] = np.array(traj)
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Prepare contours
x = np.linspace(-2, 2, 300)
y = np.linspace(-1, 3, 300)
X, Y = np.meshgrid(x, y)
Z = (1 - X)**2 + 100 * (Y - X**2)**2
Z_log = np.log10(Z + 1)
# Left plot: Trajectory comparison
ax1 = axes[0]
contour = ax1.contour(X, Y, Z_log, levels=25, cmap='viridis', alpha=0.4)
ax1.contourf(X, Y, Z_log, levels=25, cmap='viridis', alpha=0.2)
colors = {'BFGS': 'red', 'Newton-CG': 'blue'}
for method, traj in trajectories.items():
ax1.plot(traj[:, 0], traj[:, 1], 'o-', color=colors[method],
linewidth=2, markersize=6, label=method,
markeredgecolor='black', markeredgewidth=1)
ax1.scatter([x0[0]], [x0[1]], color='yellow', s=200, marker='o',
edgecolors='black', linewidth=2, label='Initial value', zorder=5)
ax1.scatter([1], [1], color='lime', s=300, marker='*',
edgecolors='black', linewidth=2, label='Optimal solution', zorder=5)
ax1.set_xlabel('x', fontsize=11)
ax1.set_ylabel('y', fontsize=11)
ax1.set_title('Optimization Trajectory Comparison', fontsize=13, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3)
# Center plot: Iteration comparison
ax2 = axes[1]
methods = list(results.keys())
nit_values = [results[m]['result'].nit for m in methods]
bars = ax2.bar(methods, nit_values, color='#11998e', edgecolor='black', linewidth=1.5)
ax2.set_ylabel('Iterations', fontsize=11)
ax2.set_title('Iterations by Method', fontsize=13, fontweight='bold')
ax2.grid(axis='y', alpha=0.3)
for bar, nit in zip(bars, nit_values):
height = bar.get_height()
ax2.text(bar.get_x() + bar.get_width()/2., height + 0.5,
f'{int(nit)}', ha='center', va='bottom', fontsize=10, fontweight='bold')
# Right plot: Execution time comparison
ax3 = axes[2]
time_values = [results[m]['time'] for m in methods]
bars = ax3.bar(methods, time_values, color='#38ef7d', edgecolor='black', linewidth=1.5)
ax3.set_ylabel('Execution time [ms]', fontsize=11)
ax3.set_title('Execution Time by Method', fontsize=13, fontweight='bold')
ax3.grid(axis='y', alpha=0.3)
for bar, t in zip(bars, time_values):
height = bar.get_height()
ax3.text(bar.get_x() + bar.get_width()/2., height + 0.05,
f'{t:.2f}', ha='center', va='bottom', fontsize=9, fontweight='bold')
plt.tight_layout()
plt.show()
# Convergence curves
plt.figure(figsize=(12, 6))
for method, traj in trajectories.items():
obj_values = [rosenbrock(x) for x in traj]
plt.semilogy(range(len(obj_values)), obj_values, 'o-',
linewidth=2, markersize=6, label=method)
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Objective Function Value (log scale)', fontsize=12)
plt.title('Convergence Curve Comparison', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3, which='both')
plt.tight_layout()
plt.show()
Sample Output:
======================================================================================
Comparison of Quasi-Newton and Other Methods: Rosenbrock Function
======================================================================================
Initial value: x0 = [-1.2 1. ]
Objective function value f(x0) = 24.2000
======================================================================================
Method Optimal solution Obj. value Iter. Time[ms]
======================================================================================
BFGS (1.0000, 1.0000) 1.432465e-11 33 2.85
L-BFGS-B (1.0000, 1.0000) 1.432465e-11 24 2.12
Newton-CG (1.0000, 1.0000) 6.689048e-18 23 3.45
CG (1.0000, 1.0000) 1.005394e-10 21 1.98
======================================================================================
Explanation: The BFGS method achieves convergence speed comparable to Newton's method with less computational cost by sequentially approximating the Hessian matrix. L-BFGS-B is suitable for large-scale problems.
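For reference, the Hessian approximation $\mathbf{B}_k$ that BFGS maintains is updated from gradient differences alone, so no second derivatives are ever evaluated:
$$ \mathbf{B}_{k+1} = \mathbf{B}_k - \frac{\mathbf{B}_k \mathbf{s}_k \mathbf{s}_k^T \mathbf{B}_k}{\mathbf{s}_k^T \mathbf{B}_k \mathbf{s}_k} + \frac{\mathbf{y}_k \mathbf{y}_k^T}{\mathbf{y}_k^T \mathbf{s}_k}, \qquad \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k, \quad \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k) $$
L-BFGS stores only a few recent $(\mathbf{s}_k, \mathbf{y}_k)$ pairs instead of the full matrix, which is why it scales to problems with many variables.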
Code Example 8: Parameter Fitting of Process Model Using Nonlinear Least Squares
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
import numpy as np
from scipy.optimize import least_squares
import matplotlib.pyplot as plt
"""
Nonlinear Least Squares Method
----------------
Parameter estimation of chemical reaction rate constant
Reaction rate model (Arrhenius equation):
k(T) = A * exp(-Ea / (R*T))
Parameters:
A : Frequency factor [1/min]
Ea : Activation energy [J/mol]
R : Gas constant = 8.314 J/(mol·K)
"""
# Experimental data (temperature and reaction rate constant)
T_data = np.array([300, 320, 340, 360, 380, 400, 420]) # Temperature [K]
k_data = np.array([0.15, 0.42, 1.05, 2.45, 5.20, 10.5, 20.2]) # Reaction rate constant [1/min]
# Add noise (simulate experimental error)
np.random.seed(42)
k_data_noisy = k_data * (1 + 0.05 * np.random.randn(len(k_data)))
# Gas constant
R = 8.314 # J/(mol·K)
# Reaction rate model (Arrhenius equation)
def arrhenius_model(params, T):
"""
Arrhenius equation: k(T) = A * exp(-Ea / (R*T))
Parameters:
-----------
params : [A, Ea]
A : Frequency factor [1/min]
Ea : Activation energy [J/mol]
T : Temperature [K]
"""
A, Ea = params
return A * np.exp(-Ea / (R * T))
# Residual function (objective function of least squares)
def residuals(params, T, k_obs):
"""
Residual: Observed value - Model prediction
"""
k_pred = arrhenius_model(params, T)
return k_obs - k_pred
# Initial guess
params_init = [1.0, 50000] # A=1.0, Ea=50000 J/mol
# Optimization by nonlinear least squares
result = least_squares(residuals, params_init, args=(T_data, k_data_noisy))
# Optimal parameters
A_opt, Ea_opt = result.x
print("=" * 70)
print("Parameter Estimation by Nonlinear Least Squares")
print("=" * 70)
print(f"Model: k(T) = A * exp(-Ea / (R*T))")
print(f"\nInitial guess:")
print(f" A = {params_init[0]:.2f} [1/min]")
print(f" Ea = {params_init[1]:.2f} [J/mol]")
print(f"\nOptimal parameters:")
print(f" A = {A_opt:.4f} [1/min]")
print(f" Ea = {Ea_opt:.2f} [J/mol]")
print(f"\nOptimization details:")
print(f" Residual norm: {np.linalg.norm(result.fun):.6f}")
print(f" Iterations: {result.nfev}")
print(f" Optimization status: {result.message}")
# Calculate coefficient of determination (R²)
k_pred = arrhenius_model([A_opt, Ea_opt], T_data)
ss_res = np.sum((k_data_noisy - k_pred)**2)
ss_tot = np.sum((k_data_noisy - np.mean(k_data_noisy))**2)
r_squared = 1 - (ss_res / ss_tot)
print(f" Coefficient of determination R²: {r_squared:.4f}")
# Calculate predictions
T_fine = np.linspace(280, 440, 200)
k_pred_fine = arrhenius_model([A_opt, Ea_opt], T_fine)
k_init_fine = arrhenius_model(params_init, T_fine)
# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
# Left plot: Experimental data and fitting results
ax1.scatter(T_data, k_data_noisy, s=150, color='red', marker='o',
edgecolors='black', linewidth=2, label='Experimental data (with noise)', zorder=5)
ax1.plot(T_fine, k_init_fine, '--', linewidth=2, color='gray',
label='Initial guess model', alpha=0.7)
ax1.plot(T_fine, k_pred_fine, '-', linewidth=3, color='#11998e',
label='Optimized model')
ax1.set_xlabel('Temperature T [K]', fontsize=12)
ax1.set_ylabel('Reaction rate constant k [1/min]', fontsize=12)
ax1.set_title('Arrhenius Plot and Model Fitting', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(alpha=0.3)
# Right plot: Arrhenius plot (ln(k) vs 1/T)
T_inv = 1000 / T_data # 1/T [1/K] × 1000 for better scale
T_inv_fine = 1000 / T_fine
ln_k_data = np.log(k_data_noisy)
ln_k_pred = np.log(arrhenius_model([A_opt, Ea_opt], T_data))
ln_k_pred_fine = np.log(k_pred_fine)
ax2.scatter(T_inv, ln_k_data, s=150, color='red', marker='o',
edgecolors='black', linewidth=2, label='Experimental data', zorder=5)
ax2.plot(T_inv_fine, ln_k_pred_fine, '-', linewidth=3, color='#11998e',
label='Optimized model')
# Check linearity
slope = -Ea_opt / R / 1000
intercept = np.log(A_opt)
ax2.text(0.05, 0.95, f'ln(k) = ln(A) - Ea/(R·T)\n'
f'Slope = -Ea/R = {slope:.2f}\n'
f'Intercept = ln(A) = {intercept:.2f}\n'
f'R² = {r_squared:.4f}',
transform=ax2.transAxes, fontsize=11,
verticalalignment='top', bbox=dict(boxstyle='round',
facecolor='wheat', alpha=0.5))
ax2.set_xlabel('1000/T [K⁻¹]', fontsize=12)
ax2.set_ylabel('ln(k)', fontsize=12)
ax2.set_title('Arrhenius Plot (Linearized)', fontsize=14, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(alpha=0.3)
plt.tight_layout()
plt.show()
# Residual plot
residuals_opt = k_data_noisy - k_pred
plt.figure(figsize=(10, 6))
plt.scatter(T_data, residuals_opt, s=150, color='blue', marker='o',
edgecolors='black', linewidth=2)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.xlabel('Temperature T [K]', fontsize=12)
plt.ylabel('Residual (Observed - Predicted)', fontsize=12)
plt.title('Residual Plot', fontsize=14, fontweight='bold')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Sample Output:
======================================================================
Parameter Estimation by Nonlinear Least Squares
======================================================================
Model: k(T) = A * exp(-Ea / (R*T))
Initial guess:
A = 1.00 [1/min]
Ea = 50000.00 [J/mol]
Optimal parameters:
A = 1.2045e+09 [1/min]
Ea = 75234.56 [J/mol]
Optimization details:
Residual norm: 0.582349
Function evaluations: 15
Optimization status: `gtol` termination condition is satisfied.
Coefficient of determination R²: 0.9987
Explanation: The nonlinear least squares method is a powerful technique for estimating model parameters from experimental data. In chemical processes, it is widely used for fitting parameters such as reaction rate constants, equilibrium constants, and physical property values.
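The structure that least_squares exploits is worth writing out: with residual vector $\mathbf{r}(\boldsymbol{\theta})$ and Jacobian $\mathbf{J} = \partial \mathbf{r} / \partial \boldsymbol{\theta}$, the Gauss-Newton step solves the linearized normal equations
$$ \min_{\boldsymbol{\theta}} \tfrac{1}{2}\|\mathbf{r}(\boldsymbol{\theta})\|^2, \qquad \left(\mathbf{J}^T \mathbf{J}\right) \Delta\boldsymbol{\theta} = -\mathbf{J}^T \mathbf{r} $$
so only first derivatives of the residuals are required; scipy's default trust-region reflective solver works with a regularized variant of this step rather than solving it exactly.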
Code Example 9: Performance Comparison of Optimization Algorithms (Convergence Speed, Accuracy, Computational Cost)
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import time
import pandas as pd
"""
Comprehensive Comparison of Optimization Algorithms
-----------------------------
Performance evaluation on multiple benchmark functions
"""
# Benchmark function definitions
test_functions = {
'Sphere': {
'f': lambda x: np.sum(x**2),
'grad': lambda x: 2*x,
'x0': np.array([5.0, 5.0]),
'x_opt': np.array([0.0, 0.0]),
'f_opt': 0.0
},
'Rosenbrock': {
'f': lambda x: (1-x[0])**2 + 100*(x[1]-x[0]**2)**2,
'grad': lambda x: np.array([
-2*(1-x[0]) - 400*x[0]*(x[1]-x[0]**2),
200*(x[1]-x[0]**2)
]),
'x0': np.array([-1.2, 1.0]),
'x_opt': np.array([1.0, 1.0]),
'f_opt': 0.0
},
'Beale': {
'f': lambda x: (1.5 - x[0] + x[0]*x[1])**2 + \
(2.25 - x[0] + x[0]*x[1]**2)**2 + \
(2.625 - x[0] + x[0]*x[1]**3)**2,
'grad': None, # Use numerical differentiation
'x0': np.array([1.0, 1.0]),
'x_opt': np.array([3.0, 0.5]),
'f_opt': 0.0
}
}
# List of optimization methods
methods = ['Nelder-Mead', 'Powell', 'CG', 'BFGS', 'L-BFGS-B']
# Store results
results_data = []
print("=" * 100)
print("Performance Comparison of Optimization Algorithms")
print("=" * 100)
for func_name, func_info in test_functions.items():
print(f"\n【{func_name} Function】")
print(f"Initial value: {func_info['x0']}")
print(f"Optimal solution: {func_info['x_opt']}, Minimum value: {func_info['f_opt']}")
print("-" * 100)
print(f"{'Method':<15} {'Optimal solution':<22} {'Obj. value':<15} {'Iter.':<8} {'Time[ms]':<10} {'Accuracy':<12}")
print("-" * 100)
for method in methods:
# Execute optimization
start_time = time.time()
if func_info['grad'] is not None and method in ['CG', 'BFGS', 'L-BFGS-B']:
result = minimize(func_info['f'], func_info['x0'], method=method,
jac=func_info['grad'])
else:
result = minimize(func_info['f'], func_info['x0'], method=method)
elapsed_time = (time.time() - start_time) * 1000
# Evaluate accuracy (distance from optimal solution)
accuracy = np.linalg.norm(result.x - func_info['x_opt'])
print(f"{method:<15} ({result.x[0]:7.4f}, {result.x[1]:7.4f}) "
f"{result.fun:12.6e} {result.nit:6d} {elapsed_time:8.2f} {accuracy:10.6f}")
# Save data
results_data.append({
'Function': func_name,
'Method': method,
'Obj. value': result.fun,
'Iterations': result.nit,
'Time[ms]': elapsed_time,
'Accuracy': accuracy,
'Success': result.success
})
print("=" * 100)
# Convert to DataFrame
df_results = pd.DataFrame(results_data)
# Visualization 1: Iteration comparison by function
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for idx, func_name in enumerate(test_functions.keys()):
ax = axes[idx]
df_func = df_results[df_results['Function'] == func_name]
bars = ax.bar(df_func['Method'], df_func['Iterations'],
color='#11998e', edgecolor='black', linewidth=1.5)
ax.set_ylabel('Iterations', fontsize=11)
ax.set_title(f'{func_name} Function', fontsize=13, fontweight='bold')
ax.tick_params(axis='x', rotation=45)
ax.grid(axis='y', alpha=0.3)
# Display values above bars
for bar, nit in zip(bars, df_func['Iterations']):
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width()/2., height + 1,
f'{int(nit)}', ha='center', va='bottom', fontsize=9)
plt.tight_layout()
plt.show()
# Visualization 2: Execution time comparison (heatmap)
pivot_time = df_results.pivot(index='Method', columns='Function', values='Time[ms]')
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(pivot_time.values, cmap='YlGnBu', aspect='auto')
ax.set_xticks(np.arange(len(pivot_time.columns)))
ax.set_yticks(np.arange(len(pivot_time.index)))
ax.set_xticklabels(pivot_time.columns)
ax.set_yticklabels(pivot_time.index)
# Display values
for i in range(len(pivot_time.index)):
for j in range(len(pivot_time.columns)):
text = ax.text(j, i, f'{pivot_time.values[i, j]:.2f}',
ha="center", va="center", color="black", fontsize=11, fontweight='bold')
ax.set_title('Execution Time Comparison [ms]', fontsize=14, fontweight='bold')
fig.colorbar(im, ax=ax, label='Time [ms]')
plt.tight_layout()
plt.show()
# Visualization 3: Accuracy comparison
pivot_accuracy = df_results.pivot(index='Method', columns='Function', values='Accuracy')
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(np.log10(pivot_accuracy.values + 1e-16), cmap='RdYlGn_r', aspect='auto')
ax.set_xticks(np.arange(len(pivot_accuracy.columns)))
ax.set_yticks(np.arange(len(pivot_accuracy.index)))
ax.set_xticklabels(pivot_accuracy.columns)
ax.set_yticklabels(pivot_accuracy.index)
# Display values
for i in range(len(pivot_accuracy.index)):
for j in range(len(pivot_accuracy.columns)):
text = ax.text(j, i, f'{pivot_accuracy.values[i, j]:.2e}',
ha="center", va="center", color="black", fontsize=9, fontweight='bold')
ax.set_title('Accuracy Comparison (Distance from Optimal Solution)', fontsize=14, fontweight='bold')
fig.colorbar(im, ax=ax, label='log₁₀(Distance)')
plt.tight_layout()
plt.show()
# Overall evaluation score (heuristic weighted sum of iterations, time, and accuracy; lower is better)
print("\n【Overall Evaluation】")
print("-" * 70)
for method in methods:
df_method = df_results[df_results['Method'] == method]
# Normalize each metric (smaller is better)
avg_nit = df_method['Iterations'].mean()
avg_time = df_method['Time[ms]'].mean()
avg_accuracy = df_method['Accuracy'].mean()
# Overall score (lower is better, weighted by each metric)
score = (avg_nit / 100) + (avg_time / 10) + (avg_accuracy * 1000)
print(f"{method:<15}: Avg. iter.={avg_nit:5.1f}, Avg. time={avg_time:6.2f}ms, "
f"Avg. accuracy={avg_accuracy:.2e}, Score={score:.2f}")
print("-" * 70)
Sample Output:
====================================================================================================
Performance Comparison of Optimization Algorithms
====================================================================================================
【Sphere Function】
Initial value: [5. 5.]
Optimal solution: [0. 0.], Minimum value: 0.0
----------------------------------------------------------------------------------------------------
Method Optimal solution Obj. value Iter. Time[ms] Accuracy
----------------------------------------------------------------------------------------------------
Nelder-Mead ( 0.0000, 0.0000) 1.652680e-10 62 1.45 0.000000
Powell ( 0.0000, 0.0000) 2.441672e-15 23 0.98 0.000000
CG ( 0.0000, 0.0000) 3.810729e-16 12 0.75 0.000000
BFGS ( 0.0000, 0.0000) 2.441672e-15 11 0.82 0.000000
L-BFGS-B ( 0.0000, 0.0000) 2.441672e-15 9 0.68 0.000000
【Rosenbrock Function】
Initial value: [-1.2 1. ]
Optimal solution: [1. 1.], Minimum value: 0.0
----------------------------------------------------------------------------------------------------
Nelder-Mead ( 1.0000, 1.0000) 7.889e-10 79 1.85 0.000028
Powell ( 1.0000, 1.0000) 1.844e-11 52 1.42 0.000004
CG ( 1.0000, 1.0000) 1.005e-10 21 1.12 0.000010
BFGS ( 1.0000, 1.0000) 1.432e-11 33 1.35 0.000004
L-BFGS-B ( 1.0000, 1.0000) 1.432e-11 24 1.08 0.000004
【Overall Evaluation】
----------------------------------------------------------------------
Nelder-Mead : Avg. iter.= 70.5, Avg. time= 1.65ms, Avg. accuracy=9.33e-06, Score=1.07
Powell : Avg. iter.= 37.5, Avg. time= 1.20ms, Avg. accuracy=1.33e-06, Score=0.63
CG : Avg. iter.= 16.5, Avg. time= 0.94ms, Avg. accuracy=3.33e-06, Score=0.35
BFGS : Avg. iter.= 22.0, Avg. time= 1.09ms, Avg. accuracy=1.33e-06, Score=0.44
L-BFGS-B : Avg. iter.= 16.5, Avg. time= 0.88ms, Avg. accuracy=1.33e-06, Score=0.34
----------------------------------------------------------------------
Explanation: Algorithm choice depends on the nature of the problem (smoothness, availability of gradients), the computational budget, and the required accuracy. In general, gradient-based methods (BFGS, L-BFGS-B) are the most efficient when gradients are available, while gradient-free methods (Nelder-Mead, Powell) are more robust when gradients are unavailable or noisy.
2.3 Chapter Summary
What We Learned
- Linear Programming
- Solving linear programming problems with scipy.optimize.linprog and PuLP
- Search process and geometric interpretation of the simplex method
- Applications to production planning and blending problems
- Nonlinear Programming
- Principles of gradient descent, Newton's method, and quasi-Newton methods (BFGS)
- Roles of learning rate and Hessian matrix
- Trade-off between convergence speed and computational cost
- Practical Applications
- Parameter fitting using nonlinear least squares
- Performance comparison of multiple algorithms and selection criteria
Algorithm Selection Guidelines
| Problem Characteristics | Recommended Method | Reason |
|---|---|---|
| Linear problem | Simplex method (linprog, PuLP) | Reliably finds global optimum quickly |
| Smooth nonlinear problem with computable gradient | BFGS, L-BFGS-B | Good balance of convergence speed and computational efficiency |
| Hessian matrix computable | Newton-CG | Fastest convergence speed (quadratic convergence) |
| Gradient-free, robustness priority | Nelder-Mead, Powell | Stable without derivatives |
| Large-scale problem (many variables) | L-BFGS-B, CG | Memory efficient |
| Parameter fitting | least_squares | Efficient solver specialized for least squares problems |
To the Next Chapter
In Chapter 3, we will learn about Multi-objective Optimization and Pareto Optimality, covering simultaneous optimization of multiple conflicting objectives, Pareto optimal solutions and the Pareto frontier concept, and scalarization methods including the weighted sum and epsilon-constraint methods. The chapter also introduces evolutionary algorithms such as NSGA-II and multi-criteria decision making using the TOPSIS method.
Practical Tip: An effective strategy is to try a gradient-based method (e.g., BFGS) first and switch to a gradient-free method (e.g., Nelder-Mead) if it fails to converge. A multi-start approach, running the optimization from several initial points and keeping the best result, also guards against poor local minima; a short sketch follows below.
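A minimal multi-start sketch, assuming the Rosenbrock function from the earlier examples as the objective; the search range and number of starts are arbitrary illustrative choices.
# Multi-start optimization: run BFGS from several random initial points, keep the best
import numpy as np
from scipy.optimize import minimize
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rng = np.random.default_rng(0)
best = None
for _ in range(10):
    x0 = rng.uniform(-2, 2, size=2)            # random initial point
    res = minimize(rosenbrock, x0, method='BFGS')
    if best is None or res.fun < best.fun:     # keep the lowest objective value found
        best = res
print(f"Best solution: {best.x}, f = {best.fun:.3e}")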