
Catalyst MI Implementation Hands-On

Practical Catalyst Design with ASE and Python

📖 Reading Time: 65-75 minutes 📊 Difficulty: Intermediate 💻 Code Examples: 30 📝 Exercises: 5

Chapter 3: Catalyst MI Implementation Hands-On

This chapter provides hands-on practice with catalyst materials informatics: building and manipulating surface models with ASE, training catalyst activity prediction models, exploring compositions with Bayesian optimization, and integrating these workflows with DFT calculations.

Learning Objectives:
  - Catalyst structure manipulation and calculations using ASE
  - Building and evaluating catalyst activity prediction models
  - Catalyst composition exploration using Bayesian optimization
  - Integrated workflows with DFT calculations

Prerequisites:
  - Python basics (NumPy, Pandas, Matplotlib)
  - Machine learning fundamentals (scikit-learn)
  - Understanding of Chapters 1 and 2

Execution Environment:

pip install ase numpy pandas scikit-learn scikit-optimize matplotlib seaborn

3.1 ASE (Atomic Simulation Environment) Basics

Example 1: ASE Installation and Basic Operations

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example: Example 1: ASE Installation and Basic Operations

Purpose: Build a Pt(111) slab with ASE and inspect its basic properties
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

from ase import Atoms
from ase.visualize import view
import numpy as np

# Create metal surface (Pt(111) surface)
from ase.build import fcc111
slab = fcc111('Pt', size=(4, 4, 3), vacuum=10.0)

print(f"Number of atoms: {len(slab)}")
print(f"Cell size: {slab.get_cell()}")
print(f"Chemical formula: {slab.get_chemical_formula()}")

# Check atomic positions
positions = slab.get_positions()
print(f"Top layer Z coordinate: {positions[:, 2].max():.3f} Å")

Output:

Number of atoms: 48
Cell size: Cell([[11.122, 0.0, 0.0], [-5.561, 9.632, 0.0], [0.0, 0.0, 27.713]])
Chemical formula: Pt48
Top layer Z coordinate: 7.848 Å

Example 2: Creating Adsorption Structures

from ase.build import fcc111, add_adsorbate
from ase import Atoms

# Adsorb CO on Pt(111) surface
slab = fcc111('Pt', size=(3, 3, 3), vacuum=10.0)
co = Atoms('CO', positions=[(0, 0, 0), (0, 0, 1.15)])

# Adsorb at top site
add_adsorbate(slab, co, height=2.0, position='ontop')

print(f"Chemical formula after adsorption: {slab.get_chemical_formula()}")
print(f"Total number of atoms: {len(slab)}")

# Check CO molecule position
co_indices = [i for i, sym in enumerate(slab.get_chemical_symbols())
              if sym in ['C', 'O']]
print(f"CO molecule indices: {co_indices}")

Example 3: Structure Optimization (Computational Chemistry)

from ase.build import fcc111, add_adsorbate
from ase.calculators.emt import EMT  # Empirical potential
from ase.optimize import BFGS

# Adsorb H atom on Pt surface
slab = fcc111('Pt', size=(3, 3, 3), vacuum=10.0)
from ase import Atoms
h_atom = Atoms('H')
add_adsorbate(slab, h_atom, height=1.5, position='fcc')

# Calculator setup (EMT: fast but low accuracy)
slab.calc = EMT()

# Structure optimization
opt = BFGS(slab, trajectory='opt.traj')
opt.run(fmax=0.05)  # Optimize until forces are below 0.05 eV/Å

# Results
final_energy = slab.get_potential_energy()
print(f"Optimized energy: {final_energy:.3f} eV")

Example 4: Adsorption Energy Calculation

from ase.build import fcc111, add_adsorbate
from ase.calculators.emt import EMT
from ase.optimize import BFGS
from ase import Atoms

def calculate_adsorption_energy(metal='Pt', adsorbate='H'):
    """Calculate adsorption energy"""
    # Clean surface
    slab_clean = fcc111(metal, size=(3, 3, 3), vacuum=10.0)
    slab_clean.calc = EMT()
    E_slab = slab_clean.get_potential_energy()

    # Adsorbed system
    slab_ads = fcc111(metal, size=(3, 3, 3), vacuum=10.0)
    atom = Atoms(adsorbate)
    add_adsorbate(slab_ads, atom, height=1.5, position='fcc')
    slab_ads.calc = EMT()
    opt = BFGS(slab_ads, logfile=None)
    opt.run(fmax=0.05)
    E_slab_ads = slab_ads.get_potential_energy()

    # Gas-phase reference species (a single atom here)
    molecule = Atoms(adsorbate)
    molecule.center(vacuum=10.0)
    molecule.calc = EMT()
    E_mol = molecule.get_potential_energy()

    # Adsorption energy
    E_ads = E_slab_ads - E_slab - E_mol
    return E_ads

# Comparison across multiple metals
metals = ['Pt', 'Pd', 'Ni', 'Cu']
for metal in metals:
    E_ads = calculate_adsorption_energy(metal, 'H')
    print(f"H adsorption energy on {metal} surface: {E_ads:.3f} eV")

Example Output:

H adsorption energy on Pt surface: -0.534 eV
H adsorption energy on Pd surface: -0.621 eV
H adsorption energy on Ni surface: -0.482 eV
H adsorption energy on Cu surface: -0.213 eV

Example 5: d-band Center Calculation

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

import numpy as np

def calculate_d_band_center(metal):
    """Simplified d-band center lookup (a real calculation requires DFT)"""
    # Tabulated reference values (see Hammer & Nørskov)
    d_band_centers = {
        'Pt': -2.25,  # eV (Fermi level reference)
        'Pd': -1.83,
        'Ni': -1.29,
        'Cu': -2.67,
        'Au': -3.56,
        'Ag': -4.31
    }
    return d_band_centers.get(metal, None)

# d-band centers for multiple metals
metals = ['Cu', 'Ni', 'Pd', 'Pt', 'Au']
for metal in metals:
    eps_d = calculate_d_band_center(metal)
    print(f"{metal}: εd = {eps_d:.2f} eV")

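For reference, when a d-projected density of states is available (for example from a DFT code), the d-band center is the first moment of that DOS relative to the Fermi level, εd = ∫E ρd(E) dE / ∫ρd(E) dE. Below is a minimal sketch of this formula applied to a synthetic Gaussian d-band; the Gaussian shape and its parameters are assumptions used only for illustration.

import numpy as np

def d_band_center_from_pdos(energies, pdos_d):
    """First moment of the d-projected DOS (energies relative to E_F = 0)."""
    return np.trapz(energies * pdos_d, energies) / np.trapz(pdos_d, energies)

# Synthetic Gaussian d-band centered near -2.25 eV (illustration only)
E = np.linspace(-10.0, 5.0, 1000)
pdos = np.exp(-((E + 2.25) ** 2) / 2)
print(f"d-band center: {d_band_center_from_pdos(E, pdos):.2f} eV")
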
Example 6: Creating Alloy Surfaces

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from ase.build import fcc111
import numpy as np

def create_alloy_surface(metal1='Pt', metal2='Ni', ratio=0.5, size=(4, 4, 3)):
    """Create alloy surface"""
    slab = fcc111(metal1, size=size, vacuum=10.0)

    # Randomly substitute metals
    n_atoms = len(slab)
    n_metal2 = int(n_atoms * ratio)
    indices = np.random.choice(n_atoms, n_metal2, replace=False)

    symbols = slab.get_chemical_symbols()
    for idx in indices:
        symbols[idx] = metal2
    slab.set_chemical_symbols(symbols)

    return slab

# Create PtNi alloy surface
alloy = create_alloy_surface('Pt', 'Ni', ratio=0.3, size=(5, 5, 3))
print(f"Composition: {alloy.get_chemical_formula()}")

# Check composition ratio
symbols = alloy.get_chemical_symbols()
pt_count = symbols.count('Pt')
ni_count = symbols.count('Ni')
print(f"Pt: {pt_count} atoms ({pt_count/(pt_count+ni_count)*100:.1f}%)")
print(f"Ni: {ni_count} atoms ({ni_count/(pt_count+ni_count)*100:.1f}%)")

Example 7: Coordination Number Calculation

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from ase.build import fcc111
from ase.neighborlist import NeighborList
import numpy as np

def calculate_coordination_numbers(atoms, cutoff=3.0):
    """Calculate coordination number for each atom"""
    nl = NeighborList([cutoff/2]*len(atoms), self_interaction=False, bothways=True)
    nl.update(atoms)

    coord_numbers = []
    for i in range(len(atoms)):
        indices, offsets = nl.get_neighbors(i)
        coord_numbers.append(len(indices))

    return np.array(coord_numbers)

# Coordination number distribution for Pt(111) surface
slab = fcc111('Pt', size=(4, 4, 3), vacuum=10.0)
coord_nums = calculate_coordination_numbers(slab, cutoff=3.0)

print(f"Coordination number distribution:")
unique, counts = np.unique(coord_nums, return_counts=True)
for cn, count in zip(unique, counts):
    print(f"  CN={cn}: {count} atoms")

# Surface atoms (atoms with low coordination numbers)
surface_indices = np.where(coord_nums < 9)[0]
print(f"Number of surface atoms: {len(surface_indices)}")

3.2 Building Catalyst Activity Prediction Models

Example 8: Creating Descriptor Dataset

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example: Example 8: Creating Descriptor Dataset

Purpose: Demonstrate data manipulation and preprocessing
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

import pandas as pd
import numpy as np

# Metal catalyst descriptor data
data = {
    'metal': ['Pt', 'Pd', 'Ni', 'Cu', 'Au', 'Ag', 'Rh', 'Ir', 'Fe', 'Co'],
    'd_band_center': [-2.25, -1.83, -1.29, -2.67, -3.56, -4.31, -1.73, -2.12, -1.34, -1.41],  # eV
    'work_function': [5.65, 5.12, 5.15, 4.65, 5.1, 4.26, 4.98, 5.27, 4.5, 5.0],  # eV
    'surface_energy': [2.48, 2.00, 2.38, 1.79, 1.50, 1.25, 2.66, 3.05, 2.90, 2.52],  # J/m²
    'lattice_constant': [3.92, 3.89, 3.52, 3.61, 4.08, 4.09, 3.80, 3.84, 2.87, 3.54],  # Å
    'H_ads_energy': [-0.53, -0.62, -0.48, -0.21, -0.15, 0.12, -0.68, -0.71, -0.87, -0.74],  # eV
    'HER_activity': [8.2, 7.5, 6.8, 4.2, 3.8, 2.1, 7.9, 8.5, 5.3, 6.1]  # log10(i0) (A/cm²)
}

df = pd.DataFrame(data)
print(df.head())
print(f"\nDataset shape: {df.shape}")
print(f"Descriptors: {df.columns.tolist()[1:-1]}")

Example 9: Data Preprocessing and Splitting

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Features and target
X = df[['d_band_center', 'work_function', 'surface_energy',
        'lattice_constant', 'H_ads_energy']].values
y = df['HER_activity'].values

# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training data: {X_train.shape}")
print(f"Test data: {X_test.shape}")
print(f"\nBefore scaling: mean={X_train.mean(axis=0)}, std={X_train.std(axis=0)}")
print(f"After scaling: mean={X_train_scaled.mean(axis=0)}, std={X_train_scaled.std(axis=0)}")

Example 10: Linear Regression Model

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0

"""
Example: Example 10: Linear Regression Model

Purpose: Train and evaluate a linear regression model for HER activity
Target: Beginner to Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
import matplotlib.pyplot as plt

# Model training
model_lr = LinearRegression()
model_lr.fit(X_train_scaled, y_train)

# Prediction
y_pred_train = model_lr.predict(X_train_scaled)
y_pred_test = model_lr.predict(X_test_scaled)

# Evaluation
mae_train = mean_absolute_error(y_train, y_pred_train)
mae_test = mean_absolute_error(y_test, y_pred_test)
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)

print(f"Training data: MAE={mae_train:.3f}, R²={r2_train:.3f}")
print(f"Test data: MAE={mae_test:.3f}, R²={r2_test:.3f}")

# Check coefficients
feature_names = ['d-band center', 'work function', 'surface energy',
                 'lattice constant', 'H_ads_energy']
for name, coef in zip(feature_names, model_lr.coef_):
    print(f"{name}: {coef:.3f}")

Example 11: Random Forest Regression

from sklearn.ensemble import RandomForestRegressor

# Model training
model_rf = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
model_rf.fit(X_train_scaled, y_train)

# Prediction and evaluation
y_pred_test_rf = model_rf.predict(X_test_scaled)
mae_rf = mean_absolute_error(y_test, y_pred_test_rf)
r2_rf = r2_score(y_test, y_pred_test_rf)

print(f"Random Forest: MAE={mae_rf:.3f}, R²={r2_rf:.3f}")

# Feature importance
importances = model_rf.feature_importances_
for name, imp in zip(feature_names, importances):
    print(f"{name}: {imp:.3f}")

Example 12: Cross-Validation

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# Ridge model with cross-validation
model_ridge = Ridge(alpha=1.0)
scores = cross_val_score(model_ridge, X_train_scaled, y_train,
                        cv=5, scoring='neg_mean_absolute_error')

print(f"5-fold CV MAE: {-scores.mean():.3f} ± {scores.std():.3f}")
print(f"Each fold: {-scores}")

# Hyperparameter tuning
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
for alpha in alphas:
    model = Ridge(alpha=alpha)
    scores = cross_val_score(model, X_train_scaled, y_train,
                            cv=5, scoring='neg_mean_absolute_error')
    print(f"alpha={alpha:6.2f}: MAE={-scores.mean():.3f}")

Example 13: Gaussian Process Regression

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Kernel definition
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)

# GPR model
model_gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10,
                                     random_state=42)
model_gpr.fit(X_train_scaled, y_train)

# Prediction (with uncertainty)
y_pred_gpr, y_std = model_gpr.predict(X_test_scaled, return_std=True)

mae_gpr = mean_absolute_error(y_test, y_pred_gpr)
r2_gpr = r2_score(y_test, y_pred_gpr)

print(f"GPR: MAE={mae_gpr:.3f}, R²={r2_gpr:.3f}")
print(f"\nPredictions with uncertainty:")
for true, pred, std in zip(y_test, y_pred_gpr, y_std):
    print(f"True: {true:.2f}, Pred: {pred:.2f} ± {std:.2f}")

Example 14: Creating Volcano Plot

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

"""
Example: Example 14: Creating Volcano Plot

Purpose: Demonstrate data visualization techniques
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

import matplotlib.pyplot as plt
import numpy as np

# Relationship between H adsorption energy and HER activity
fig, ax = plt.subplots(figsize=(8, 6))

ax.scatter(df['H_ads_energy'], df['HER_activity'], s=100, alpha=0.7)

# Display metal names
for i, txt in enumerate(df['metal']):
    ax.annotate(txt, (df['H_ads_energy'].iloc[i], df['HER_activity'].iloc[i]),
                xytext=(5, 5), textcoords='offset points')

# Volcano fit (2nd order polynomial)
x_fit = np.linspace(df['H_ads_energy'].min(), df['H_ads_energy'].max(), 100)
coeffs = np.polyfit(df['H_ads_energy'], df['HER_activity'], 2)
y_fit = np.polyval(coeffs, x_fit)
ax.plot(x_fit, y_fit, 'r--', label='Volcano fit')

ax.set_xlabel('H adsorption energy (eV)', fontsize=12)
ax.set_ylabel('log₁₀(HER activity)', fontsize=12)
ax.set_title('Volcano Plot for HER Activity', fontsize=14)
ax.legend()
ax.grid(alpha=0.3)

print(f"Optimal H adsorption energy: {-coeffs[1]/(2*coeffs[0]):.3f} eV")

Example 15: Saving and Loading Prediction Models

import pickle

# Save model
with open('catalyst_model.pkl', 'wb') as f:
    pickle.dump({'model': model_rf, 'scaler': scaler}, f)

print("Model saved: catalyst_model.pkl")

# Load model
with open('catalyst_model.pkl', 'rb') as f:
    loaded = pickle.load(f)
    loaded_model = loaded['model']
    loaded_scaler = loaded['scaler']

# Predict on new data
new_catalyst = np.array([[-2.0, 5.0, 2.0, 3.8, -0.4]])  # Virtual catalyst
new_catalyst_scaled = loaded_scaler.transform(new_catalyst)
prediction = loaded_model.predict(new_catalyst_scaled)

print(f"Predicted HER activity for new catalyst: {prediction[0]:.2f}")

3.3 Catalyst Composition Exploration Using Bayesian Optimization

Example 16: Bayesian Optimization Basics

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from skopt import gp_minimize
from skopt.space import Real
import numpy as np

# Objective function (catalyst activity simulation)
def catalyst_performance(x):
    """
    x[0]: Pt ratio (0-1)
    x[1]: Calcination temperature (300-800 K)
    """
    pt_ratio, temp = x
    # Virtual activity function (actual experiment or DFT calculation)
    activity = -((pt_ratio - 0.6)**2 * 10 + (temp - 600)**2 / 10000)
    noise = np.random.normal(0, 0.1)  # Experimental noise
    return -activity + noise  # Convert to minimization problem

# Search space
space = [
    Real(0.0, 1.0, name='pt_ratio'),
    Real(300, 800, name='temperature')
]

# Execute Bayesian optimization
result = gp_minimize(
    catalyst_performance,
    space,
    n_calls=20,  # Number of experiments
    random_state=42,
    verbose=True
)

print(f"\nOptimal parameters:")
print(f"  Pt ratio: {result.x[0]:.3f}")
print(f"  Calcination temperature: {result.x[1]:.1f} K")
print(f"  Maximum activity: {-result.fun:.3f}")

Example 17: Comparing Acquisition Functions

from skopt import gp_minimize
from skopt.space import Real

# Compare three acquisition functions
acq_funcs = ['EI', 'PI', 'LCB']
results = {}

for acq in acq_funcs:
    result = gp_minimize(
        catalyst_performance,
        space,
        n_calls=15,
        acq_func=acq,
        random_state=42,
        verbose=False
    )
    results[acq] = result
    print(f"{acq}: Optimal value = {-result.fun:.3f}, "
          f"Pt ratio = {result.x[0]:.3f}, Temp = {result.x[1]:.1f}K")

Example 18: Multi-Objective Optimization (Activity vs Cost)

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from skopt import gp_minimize
import numpy as np

def multi_objective(x):
    """Activity and cost trade-off"""
    pt_ratio = x[0]
    temp = x[1]

    # Activity (higher is better)
    activity = -((pt_ratio - 0.6)**2 * 10 + (temp - 600)**2 / 10000)

    # Cost (depends on Pt usage, lower is better)
    cost = pt_ratio * 100 + (temp - 300) / 10

    # Weighted sum (scalarization)
    weight_activity = 0.7
    weight_cost = 0.3
    return -(weight_activity * activity - weight_cost * cost)

result_mo = gp_minimize(multi_objective, space, n_calls=25, random_state=42)

print(f"Optimal parameters (multi-objective):")
print(f"  Pt ratio: {result_mo.x[0]:.3f}")
print(f"  Temperature: {result_mo.x[1]:.1f} K")

Example 19: Visualizing Experiment History

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0

"""
Example: Example 19: Visualizing Experiment History

Purpose: Demonstrate data visualization techniques
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""

from skopt.plots import plot_convergence, plot_evaluations
import matplotlib.pyplot as plt

# Convergence plot
fig, ax = plt.subplots(figsize=(7, 5))
plot_convergence(result, ax=ax)
ax.set_title('Convergence Plot')
ax.set_ylabel('Objective Value')

# Evaluation plot (plot_evaluations builds its own grid of axes,
# so no ax argument is passed)
plot_evaluations(result, dimensions=['pt_ratio', 'temperature'])

plt.tight_layout()

# Output experiment history
print("\nExperiment history:")
for i, (params, value) in enumerate(zip(result.x_iters, result.func_vals)):
    print(f"Exp {i+1}: Pt={params[0]:.3f}, T={params[1]:.1f}K, "
          f"Activity={-value:.3f}")

Example 20: Constrained Optimization

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from skopt import gp_minimize
import numpy as np

def constrained_objective(x):
    """Constrained catalyst optimization"""
    pt_ratio = x[0]
    ni_ratio = x[1]

    # Constraint: Pt + Ni ≤ 0.8 (remainder is inexpensive support)
    if pt_ratio + ni_ratio > 0.8:
        return 1e6  # Penalty

    # Activity prediction
    activity = -(pt_ratio * 8 + ni_ratio * 5 -
                (pt_ratio - 0.5)**2 * 10 - (ni_ratio - 0.2)**2 * 10)
    return -activity

space_alloy = [
    Real(0.0, 0.8, name='pt_ratio'),
    Real(0.0, 0.8, name='ni_ratio')
]

result_const = gp_minimize(constrained_objective, space_alloy,
                          n_calls=30, random_state=42)

print(f"Optimal composition:")
print(f"  Pt: {result_const.x[0]:.3f}")
print(f"  Ni: {result_const.x[1]:.3f}")
print(f"  Others: {1 - result_const.x[0] - result_const.x[1]:.3f}")
print(f"  Predicted activity: {-result_const.fun:.3f}")

Example 21: Batch Bayesian Optimization

from skopt import Optimizer
import numpy as np

# Simulate batch experiments
optimizer = Optimizer(space, acq_func='EI', random_state=42)

# Initial samples
n_initial = 5
X_init = [[np.random.uniform(0, 1), np.random.uniform(300, 800)]
          for _ in range(n_initial)]
y_init = [catalyst_performance(x) for x in X_init]

optimizer.tell(X_init, y_init)

# Batch experiments (3 experiments in parallel)
batch_size = 3
n_batches = 5

for batch in range(n_batches):
    # Generate the next batch of candidates in one call; calling ask()
    # repeatedly without an intervening tell() would return the same point
    X_next_batch = optimizer.ask(n_points=batch_size)

    # Execute experiments (in parallel)
    y_next_batch = [catalyst_performance(x) for x in X_next_batch]

    # Update results
    optimizer.tell(X_next_batch, y_next_batch)

    print(f"Batch {batch+1}: Best so far = {-min(optimizer.yi):.3f}")

print(f"\nFinal optimal value: {-min(optimizer.yi):.3f}")
print(f"Total experiments: {len(optimizer.Xi)}")

Example 22: Transfer Learning Bayesian Optimization

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from skopt import gp_minimize
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
import numpy as np

# Past data from similar catalyst system (transfer learning source)
X_source = np.array([[0.3, 400], [0.5, 500], [0.7, 600], [0.9, 700]])
y_source = np.array([-2.5, -4.0, -4.8, -3.5])  # Activity data

# New catalyst system optimization (target)
def target_catalyst(x):
    """New catalyst system (similar but different)"""
    pt_ratio, temp = x
    activity = -((pt_ratio - 0.55)**2 * 12 + (temp - 550)**2 / 8000)
    return -activity + np.random.normal(0, 0.1)

# Pre-train a GPR model on the past data (shown to illustrate the idea;
# note that this model is not passed to gp_minimize below)
from sklearn.gaussian_process import GaussianProcessRegressor

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr_prior = GaussianProcessRegressor(kernel=kernel)
gpr_prior.fit(X_source, y_source)

# Baseline Bayesian optimization of the target system (prior data not used)
result_baseline = gp_minimize(
    target_catalyst,
    space,
    n_calls=10,  # few experiments
    random_state=42
)

print(f"Baseline (no transfer): Optimal value = {-result_baseline.fun:.3f} (10 experiments)")

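To make the comparison meaningful, one simple way to actually reuse the source data is to pass it to gp_minimize as prior evaluations through its x0/y0 arguments. The sketch below assumes that y_source stores activities (so the sign is flipped to match the minimization objective returned by target_catalyst) and that the source and target systems share the same search space; it is one possible approach, not the only one. More elaborate schemes would reuse gpr_prior itself, for example as a prior mean or through multi-task kernels.

# Hedged sketch: seed gp_minimize with the source-system measurements via x0/y0
result_warm = gp_minimize(
    target_catalyst,
    space,
    x0=X_source.tolist(),
    y0=[-y for y in y_source],   # activities -> minimization objective
    n_initial_points=5,          # fewer random points, since prior data exists
    n_calls=10,
    random_state=42,
)

print(f"Warm-started (with source data): Optimal value = {-result_warm.fun:.3f} (10 experiments)")
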
3.4 Integration with DFT Calculations

Example 23: DFT Calculation Setup with ASE

from ase.build import fcc111, add_adsorbate
from ase import Atoms

# Actual DFT calculations require GPAW, VASP, Quantum ESPRESSO, etc.
# This shows setup example

# CO adsorption on Pt(111) surface
slab = fcc111('Pt', size=(3, 3, 4), vacuum=15.0)
co = Atoms('CO', positions=[(0, 0, 0), (0, 0, 1.15)])
add_adsorbate(slab, co, height=2.0, position='ontop')

# DFT calculator setup (GPAW case)
# from gpaw import GPAW, PW
# calc = GPAW(
#     mode=PW(500),  # Plane wave cutoff
#     xc='PBE',
#     kpts=(4, 4, 1),
#     txt='co_pt.txt'
# )
# slab.calc = calc

print("DFT calculation setup complete (requires GPAW, etc.)")
print(f"System size: {len(slab)} atoms")
print(f"Cell: {slab.get_cell()}")

Example 24: High-Accuracy Adsorption Energy Calculation

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from ase.build import fcc111, add_adsorbate
from ase.calculators.emt import EMT  # EMT instead of DFT
from ase.optimize import BFGS
import numpy as np

def dft_adsorption_energy(metal, adsorbate, site='fcc'):
    """Adsorption energy workflow (EMT used here as a stand-in for DFT)"""
    # Clean surface
    slab = fcc111(metal, size=(3, 3, 4), vacuum=10.0)
    slab.calc = EMT()
    E_slab = slab.get_potential_energy()

    # Adsorbed system
    slab_ads = slab.copy()
    if adsorbate == 'H':
        from ase import Atoms
        atom = Atoms('H')
        add_adsorbate(slab_ads, atom, height=1.5, position=site)

    slab_ads.calc = EMT()
    opt = BFGS(slab_ads, logfile=None)
    opt.run(fmax=0.01)  # High-accuracy optimization
    E_slab_ads = slab_ads.get_potential_energy()

    # Gas phase H2
    from ase import Atoms
    h2 = Atoms('H2', positions=[(0, 0, 0), (0, 0, 0.74)])
    h2.center(vacuum=10.0)
    h2.calc = EMT()
    E_h2 = h2.get_potential_energy()

    E_ads = E_slab_ads - E_slab - 0.5 * E_h2
    return E_ads

# Adsorption energy at multiple sites
sites = ['fcc', 'hcp', 'ontop']
for site in sites:
    E_ads = dft_adsorption_energy('Pt', 'H', site)
    print(f"Pt(111) {site} site H adsorption: {E_ads:.3f} eV")

Example 25: Reaction Pathway Analysis (NEB Method)

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

from ase.build import fcc111
from ase.calculators.emt import EMT
from ase.neb import NEB
from ase.optimize import BFGS
import numpy as np

# Initial and final states (simplified example)
def setup_reaction_path():
    """H atom surface diffusion (fcc → hcp)"""
    # Initial state: H at fcc site
    slab_initial = fcc111('Pt', size=(3, 3, 3), vacuum=10.0)
    from ase import Atoms
    from ase.build import add_adsorbate
    h_atom = Atoms('H')
    add_adsorbate(slab_initial, h_atom, height=1.5, position='fcc')
    slab_initial.calc = EMT()

    # Final state: H at hcp site
    slab_final = fcc111('Pt', size=(3, 3, 3), vacuum=10.0)
    h_atom = Atoms('H')
    add_adsorbate(slab_final, h_atom, height=1.5, position='hcp')
    slab_final.calc = EMT()

    return slab_initial, slab_final

initial, final = setup_reaction_path()

# NEB calculation (simplified version)
print("Transition state search using NEB method")
print(f"Initial state energy: {initial.get_potential_energy():.3f} eV")
print(f"Final state energy: {final.get_potential_energy():.3f} eV")
print("Actual NEB calculation requires multiple images and optimization")

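The pieces above can be assembled into an actual band calculation. Below is a minimal sketch of a full NEB run with EMT, continuing from the initial and final structures returned by setup_reaction_path; the number of images and the force threshold are illustrative choices, and a production calculation would use DFT (often with climbing-image NEB).

from ase.neb import NEB
from ase.optimize import BFGS
from ase.calculators.emt import EMT

# Relax the endpoints before building the band
BFGS(initial, logfile=None).run(fmax=0.05)
BFGS(final, logfile=None).run(fmax=0.05)

# Band of 5 images: endpoints plus 3 interpolated intermediates
n_images = 5
images = [initial] + [initial.copy() for _ in range(n_images - 2)] + [final]
for image in images[1:-1]:
    image.calc = EMT()  # copies do not inherit the calculator

neb = NEB(images)
neb.interpolate()  # linear interpolation between the endpoints

BFGS(neb, logfile=None).run(fmax=0.05)

energies = [image.get_potential_energy() for image in images]
barrier = max(energies) - energies[0]
print(f"Approximate fcc -> hcp diffusion barrier: {barrier:.3f} eV")
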
Example 26: Electronic Structure Analysis

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

import numpy as np
import matplotlib.pyplot as plt

# Simulate DOS (Density of States) from DFT calculations
def simulate_dos(metal):
    """Simulate metal DOS"""
    energies = np.linspace(-10, 5, 500)

    if metal == 'Pt':
        # d-band centered near -2.25 eV (broad, partially filled)
        dos = np.exp(-((energies + 2.25)**2) / 2) * 3
        # sp-band (broad, low intensity)
        dos += np.exp(-((energies - 0)**2) / 10) * 0.5
    elif metal == 'Cu':
        # d-band centered near -2.67 eV (deeper and completely filled)
        dos = np.exp(-((energies + 2.67)**2) / 2) * 3
        dos += np.exp(-((energies - 0)**2) / 10) * 0.5
    else:
        raise ValueError(f"No simulated DOS defined for {metal}")

    return energies, dos

# Plot
fig, ax = plt.subplots(figsize=(10, 6))

for metal in ['Pt', 'Cu']:
    energies, dos = simulate_dos(metal)
    ax.plot(energies, dos, label=metal, linewidth=2)

ax.axvline(0, color='k', linestyle='--', label='Fermi level')
ax.set_xlabel('Energy (eV)', fontsize=12)
ax.set_ylabel('Density of States', fontsize=12)
ax.set_title('Electronic DOS of Metal Catalysts', fontsize=14)
ax.legend()
ax.grid(alpha=0.3)

print("DOS plot created")

Example 27: DFT-ML Integration Workflow

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
import numpy as np

# Step 1: Small number of DFT calculations
def expensive_dft_calculation(composition):
    """High-cost DFT calculation (simulation)"""
    x = composition
    # Actual adsorption energy calculation
    energy = -2.0 + 3.0 * x - 2.0 * x**2 + np.random.normal(0, 0.05)
    return energy

# Initial DFT calculations (5 points)
X_dft = np.array([[0.2], [0.4], [0.6], [0.8], [1.0]])
y_dft = np.array([expensive_dft_calculation(x[0]) for x in X_dft])

# Step 2: GPR surrogate model
gpr = GaussianProcessRegressor(kernel=RBF(), n_restarts_optimizer=10)
gpr.fit(X_dft, y_dft)

# Step 3: Prediction at many points (low cost)
X_pred = np.linspace(0, 1, 100).reshape(-1, 1)
y_pred, y_std = gpr.predict(X_pred, return_std=True)

print("DFT-ML integration workflow:")
print(f"Number of DFT calculations: {len(X_dft)}")
print(f"Number of ML predictions: {len(X_pred)}")
print(f"Optimal composition (predicted): {X_pred[np.argmin(y_pred)][0]:.3f}")

# Plot
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(X_dft, y_dft, c='red', s=100, label='DFT calc', zorder=10)
ax.plot(X_pred, y_pred, 'b-', label='GPR mean')
ax.fill_between(X_pred.ravel(), y_pred - y_std, y_pred + y_std,
                alpha=0.3, label='±1 std')
ax.set_xlabel('Composition')
ax.set_ylabel('Adsorption Energy (eV)')
ax.legend()

3.5 Reaction Kinetics Analysis

Example 28: Arrhenius Analysis

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

"""
Example: Example 28: Arrhenius Analysis

Purpose: Fit the Arrhenius equation to rate data and extract the activation energy
Target: Intermediate
Execution time: 10-30 seconds
Dependencies: None
"""

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Experimental data (temperature vs reaction rate)
temperatures = np.array([300, 350, 400, 450, 500, 550])  # K
rate_constants = np.array([0.01, 0.05, 0.15, 0.35, 0.70, 1.20])  # s⁻¹

# Arrhenius equation: k = A * exp(-Ea / (R*T))
def arrhenius(T, A, Ea):
    R = 8.314e-3  # kJ/(mol·K)
    return A * np.exp(-Ea / (R * T))

# Fitting
popt, pcov = curve_fit(arrhenius, temperatures, rate_constants, p0=[1e10, 50])
A_fit, Ea_fit = popt

print(f"Frequency factor A: {A_fit:.2e} s⁻¹")
print(f"Activation energy Ea: {Ea_fit:.2f} kJ/mol")

# Arrhenius plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Normal plot
axes[0].scatter(temperatures, rate_constants, label='Experimental', s=100)
T_fit = np.linspace(280, 570, 100)
axes[0].plot(T_fit, arrhenius(T_fit, A_fit, Ea_fit), 'r--', label='Fit')
axes[0].set_xlabel('Temperature (K)')
axes[0].set_ylabel('Rate constant (s⁻¹)')
axes[0].legend()

# Arrhenius plot (linearized)
axes[1].scatter(1000/temperatures, np.log(rate_constants), s=100)
axes[1].plot(1000/T_fit, np.log(arrhenius(T_fit, A_fit, Ea_fit)), 'r--')
axes[1].set_xlabel('1000/T (K⁻¹)')
axes[1].set_ylabel('ln(k)')
axes[1].set_title(f'Ea = {Ea_fit:.1f} kJ/mol')

Example 29: Reaction Order Analysis

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# Reaction rate equation: -dC/dt = k * C^n
def reaction_rate(C, t, k, n):
    """n-th order reaction rate equation"""
    return -k * C**n

# Parameters
k = 0.1  # Rate constant
C0 = 1.0  # Initial concentration
t = np.linspace(0, 50, 100)

# Calculate for different reaction orders
fig, ax = plt.subplots(figsize=(10, 6))

for n in [0, 1, 2]:
    C = odeint(reaction_rate, C0, t, args=(k, n))
    ax.plot(t, C, label=f'n={n} order', linewidth=2)

ax.set_xlabel('Time', fontsize=12)
ax.set_ylabel('Concentration', fontsize=12)
ax.set_title('Reaction Order Effects', fontsize=14)
ax.legend()
ax.grid(alpha=0.3)

# Half-life calculation
def half_life(k, n, C0=1.0):
    """Half-life for n-th order reaction"""
    if n == 0:
        return C0 / (2 * k)
    elif n == 1:
        return np.log(2) / k
    elif n == 2:
        return 1 / (k * C0)

for n in [0, 1, 2]:
    t_half = half_life(k, n, C0)
    print(f"Half-life for {n}-th order reaction: {t_half:.2f}")

Example 30: Microkinetic Model

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

# Microkinetic model for the CO oxidation reaction
# CO(g) + 1/2 O2(g) -> CO2(g) on a Pt surface
def microkinetics(y, t, k1, k2, k3, k4):
    """
    y: [θ_CO, θ_O, θ_free]  # Surface coverages
    k1: CO adsorption rate constant
    k2: O2 dissociative adsorption rate constant
    k3: CO oxidation rate constant
    k4: CO2 desorption rate constant (desorption is treated as fast,
        so k4 is not used explicitly; CO2 leaves the surface as it forms)
    """
    theta_CO, theta_O, theta_free = y

    # Gas phase partial pressures (constant)
    P_CO = 0.1  # bar
    P_O2 = 0.2  # bar

    # Reaction rates
    r1 = k1 * P_CO * theta_free      # CO adsorption
    r2 = k2 * P_O2 * theta_free**2   # O2 dissociative adsorption
    r3 = k3 * theta_CO * theta_O     # CO* + O* -> CO2(g)

    # Coverage changes (CO2 formation frees both occupied sites,
    # so θ_CO + θ_O + θ_free stays constant)
    dtheta_CO_dt = r1 - r3
    dtheta_O_dt = 2 * r2 - r3
    dtheta_free_dt = -r1 - 2 * r2 + 2 * r3

    return [dtheta_CO_dt, dtheta_O_dt, dtheta_free_dt]

# Initial conditions
y0 = [0.0, 0.0, 1.0]  # Clean surface

# Rate constants (arbitrary units)
k1, k2, k3, k4 = 1.0, 0.5, 2.0, 2.0

# Time evolution
t = np.linspace(0, 10, 1000)
solution = odeint(microkinetics, y0, t, args=(k1, k2, k3, k4))

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(t, solution[:, 0], label='θ_CO', linewidth=2)
ax.plot(t, solution[:, 1], label='θ_O', linewidth=2)
ax.plot(t, solution[:, 2], label='θ_free', linewidth=2)
ax.set_xlabel('Time', fontsize=12)
ax.set_ylabel('Surface Coverage', fontsize=12)
ax.set_title('Microkinetic Model: CO Oxidation on Pt', fontsize=14)
ax.legend()
ax.grid(alpha=0.3)

# Steady state
theta_CO_ss = solution[-1, 0]
theta_O_ss = solution[-1, 1]
TOF = k3 * theta_CO_ss * theta_O_ss  # Turnover Frequency

print(f"Steady state:")
print(f"  θ_CO: {theta_CO_ss:.3f}")
print(f"  θ_O: {theta_O_ss:.3f}")
print(f"  TOF: {TOF:.3f} s⁻¹")

3.6 Project Challenge

Challenge: CO2 Reduction Catalyst Optimization

Optimize a CO2→CO conversion catalyst following these steps:

  1. Data Collection: Calculate CO2 adsorption energies for multiple metals (Cu, Ag, Au, Pd, Pt)
  2. Descriptor Calculation: Obtain d-band center, work function, etc.
  3. Build Prediction Model: GPR model to predict activity from descriptors
  4. Bayesian Optimization: Optimize alloy composition (Cu-Ag binary system)
  5. Validation: DFT calculation (EMT approximation) at optimal composition

Evaluation Criteria:
  - CO production selectivity > 80%
  - Overpotential < 0.5 V
  - Minimize cost (Ag usage)

Deliverables:
  - Optimal composition (Cu:Ag ratio)
  - Trade-off analysis of predicted activity and cost
  - Complete Python code

A hedged starter sketch for the optimization step is given below.

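Starter sketch: the composition-activity proxy, the function names, and the 0.7/0.3 weighting below are illustrative assumptions, to be replaced with adsorption energies from your own EMT/DFT calculations and a trained GPR activity model (Steps 1-3).

import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def co2_activity_proxy(ag_fraction):
    """Placeholder for Steps 1-3: replace with descriptors from EMT/DFT
    fed into a trained GPR activity model."""
    # Assumed volcano-like dependence on Ag fraction (illustration only)
    return -((ag_fraction - 0.3) ** 2) * 5.0

def objective(x):
    ag_fraction = x[0]
    activity = co2_activity_proxy(ag_fraction)
    cost = ag_fraction  # proxy: cost scales with Ag usage
    # Scalarized activity/cost trade-off (weights are an assumption)
    return -(0.7 * activity - 0.3 * cost)

space_cuag = [Real(0.0, 1.0, name='ag_fraction')]
result = gp_minimize(objective, space_cuag, n_calls=20, random_state=0)

print(f"Suggested Ag fraction: {result.x[0]:.2f}")
print(f"Suggested Cu fraction: {1 - result.x[0]:.2f}")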

Exercise Problems

Problem 1: Create a (111) surface of a Pt-Ni alloy catalyst (Pt:Ni = 3:1) and calculate its coordination number distribution.

Problem 2: Based on the Sabatier principle from Chapter 2, build a machine learning model to predict optimal H adsorption energy (refer to Examples 8-14).

Problem 3: Simultaneously optimize calcination temperature (400-900 K) and loading amount (0-50 wt%) using Bayesian optimization to achieve maximum activity.

Problem 4: Calculate the activation energy from your experimental data using an Arrhenius plot.

Problem 5: Simulate steady-state coverage and TOF when varying CO/O2 ratio in the microkinetic model.


References

  1. ASE Documentation: https://wiki.fysik.dtu.dk/ase/
  2. scikit-optimize: https://scikit-optimize.github.io/
  3. Nørskov, J. K. et al. "Origin of the Overpotential for Oxygen Reduction at a Fuel-Cell Cathode." J. Phys. Chem. B (2004).
  4. Hammer, B. & Nørskov, J. K. "Theoretical Surface Science and Catalysis." Advances in Catalysis (2000).

Next Chapter: Chapter 4: Catalyst MI Practical Case Studies

License: This content is provided under CC BY 4.0 license.

Disclaimer