Learning Objectives
- Understand the principles of Bayesian Optimization (PHYSBO)
- Learn when to use each of NIMO's 11 algorithms
- Configure algorithm-specific parameters
- Choose the right algorithm for your optimization problem
3.1 The Exploration-Exploitation Trade-off
Before diving into specific algorithms, let's understand the fundamental challenge in optimization:
Exploration vs Exploitation
Exploration: Testing regions where we have little information, potentially finding better solutions in unexplored areas.
Exploitation: Focusing on regions where we've already found good results, refining our best solutions.
Good optimization algorithms balance both strategies automatically.
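The trade-off can be made concrete with a small sketch. It assumes each candidate has a predicted value (`mean`) and an uncertainty (`std`); the candidate names, the numbers, and the weighting factor `kappa` are hypothetical and are not part of NIMO.

```python
# Hypothetical candidates: predicted value (mean) and model uncertainty (std).
candidates = {
    "A": {"mean": 0.90, "std": 0.02},  # well explored, good result
    "B": {"mean": 0.70, "std": 0.30},  # barely explored, uncertain
    "C": {"mean": 0.50, "std": 0.05},  # well explored, mediocre
}


def ucb_score(mean, std, kappa):
    """Upper-confidence-bound style score: exploitation term + kappa * exploration term."""
    return mean + kappa * std


def pick(candidates, kappa):
    """Return the name of the candidate with the highest score for a given kappa."""
    return max(candidates, key=lambda name: ucb_score(**candidates[name], kappa=kappa))


print(pick(candidates, kappa=0.0))  # kappa = 0: pure exploitation
print(pick(candidates, kappa=2.0))  # larger kappa: uncertainty is rewarded
```

With `kappa=0.0` only the predicted value counts, so the already-good candidate wins; raising `kappa` shifts the choice toward the uncertain one.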
3.2 Bayesian Optimization (PHYSBO)
PHYSBO is NIMO's flagship algorithm, based on Bayesian Optimization principles. It's the recommended choice for most materials optimization problems.
How Bayesian Optimization Works
The algorithm repeats a three-step cycle:
- Build a surrogate model: Fit a Gaussian Process (GP) to the existing data to approximate the objective function and quantify its uncertainty
- Compute the acquisition function: Score every untested candidate by how promising an experiment there would be, given the surrogate's prediction and uncertainty
- Select the next candidate: Choose the point that maximizes the acquisition function
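The cycle can be sketched in plain Python. This is not PHYSBO's implementation: to stay dependency-free, a simple kernel-weighted average stands in for the Gaussian Process, the acquisition is a basic "mean plus uncertainty" score, and `objective` is a hypothetical experiment with its peak at x = 0.7.

```python
import math


def objective(x):
    """Hypothetical experiment, unknown to the optimizer (peak at x = 0.7)."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def surrogate(x, observed):
    """Stand-in for a GP: kernel-weighted mean, uncertainty grows with distance to data."""
    weights = [math.exp(-((x - xi) ** 2) / 0.01) for xi, _ in observed]
    total = sum(weights)
    mean = sum(w * yi for w, (_, yi) in zip(weights, observed)) / (total + 1e-12)
    std = 1.0 - max(weights)  # near 0 close to a tested point, near 1 far away
    return mean, std


def acquisition(mean, std, kappa=1.0):
    """Simple score that rewards both high predictions and high uncertainty."""
    return mean + kappa * std


grid = [i / 20 for i in range(21)]                   # candidate list
observed = [(x, objective(x)) for x in (0.0, 1.0)]   # initial data

for cycle in range(8):
    tested = {x for x, _ in observed}
    untested = [x for x in grid if x not in tested]
    # Build surrogate + compute acquisition for every untested candidate,
    # then select the candidate that maximizes the acquisition.
    next_x = max(untested, key=lambda x: acquisition(*surrogate(x, observed)))
    observed.append((next_x, objective(next_x)))      # run the "experiment"

best_x, best_y = max(observed, key=lambda pair: pair[1])
print(f"best x = {best_x:.2f}, best y = {best_y:.3f}")
```

Each pass through the loop corresponds to one experimental cycle: the model is refreshed with all data so far before the next candidate is chosen.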
PHYSBO in NIMO
import nimo

# Basic Bayesian Optimization
nimo.selection(
    method="PHYSBO",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=1,
    num_proposals=1
)
import nimo

# Bayesian Optimization with configuration
nimo.selection(
    method="PHYSBO",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=1,
    num_proposals=3,
    physbo_score="EI",  # Acquisition function: EI, PI, or TS
    physbo_seed=42      # Random seed for reproducibility
)
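Both calls read their candidates from `candidates.csv`. The sketch below shows one plausible layout, under the assumption that descriptor columns come first, objective columns come last, and the objective is left empty for untested candidates. The column names and values are hypothetical; check the NIMO documentation for the exact format it expects.

```python
import csv
import io

# Assumed layout: descriptors first, objective last, empty objective = untested.
rows = [
    ["x1", "x2", "objective"],  # hypothetical column names
    [0.1, 0.9, 1.25],           # tested candidate with a measured value
    [0.5, 0.5, ""],             # untested candidate
    [0.9, 0.1, ""],             # untested candidate
]

# Write to an in-memory buffer here; a real run would write candidates.csv.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Read the CSV text back and count candidates that still lack a result.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
untested = [row for row in parsed[1:] if row[-1] == ""]
print(f"{len(untested)} of {len(parsed) - 1} candidates are untested")
```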
Acquisition Functions
PHYSBO supports three acquisition functions:
| Function | Full Name | Strategy | When to Use |
|---|---|---|---|
| EI | Expected Improvement | Balanced | Default choice, most situations |
| PI | Probability of Improvement | Exploitation-heavy | When close to optimal, need refinement |
| TS | Thompson Sampling | Exploration-heavy | Multi-modal functions, parallel experiments |
Recommended: Expected Improvement (EI)
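For most materials science problems, EI provides the best balance between exploration and exploitation. It weighs both how likely a candidate is to beat the current best result and by how much, so it shifts from exploration toward exploitation as the model becomes more certain.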
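To show how the three scores differ, here is a minimal sketch using the standard textbook formulas for a maximization problem. It assumes the surrogate provides a predicted `mean` and `std` for a candidate and that `best` is the best value observed so far. It is not PHYSBO's code, and the Thompson Sampling function is simplified to an independent draw per candidate.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, best):
    """PI: probability that the candidate beats the best value so far."""
    if std == 0.0:
        return 1.0 if mean > best else 0.0
    return normal_cdf((mean - best) / std)


def expected_improvement(mean, std, best):
    """EI: expected amount by which the candidate beats the best value so far."""
    if std == 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


def thompson_sample(mean, std, rng):
    """TS (simplified): one random draw from the candidate's predictive distribution."""
    return rng.gauss(mean, std)


rng = random.Random(42)
for mean, std in [(1.0, 0.1), (0.8, 0.5)]:
    print(
        f"mean={mean}, std={std}: "
        f"PI={probability_of_improvement(mean, std, best=0.9):.3f}, "
        f"EI={expected_improvement(mean, std, best=0.9):.3f}, "
        f"TS draw={thompson_sample(mean, std, rng):.3f}"
    )
```

PI only asks whether a candidate is likely to be better, while EI also rewards large potential gains, which is why an uncertain candidate can score well under EI even when its PI is modest.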
3.3 Random Exploration (RE)
RE is a simple but essential algorithm: it randomly selects from untested candidates.
import nimo

# Random selection for initial data collection
nimo.selection(
    method="RE",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=1,
    num_proposals=5,
    re_seed=42  # For reproducibility
)
When to Use RE
- First cycle: Collect initial data before other algorithms can work
- Baseline comparison: Compare against random selection to verify that model-based selection actually improves results
- High uncertainty: When you have no prior knowledge about the search space
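Conceptually, random exploration is little more than a seeded draw from the untested candidates. The sketch below illustrates the idea; the function `random_exploration` and the use of `None` to mark untested candidates are assumptions made for this example, not NIMO internals.

```python
import random


def random_exploration(objectives, num_proposals, seed=None):
    """Pick indices of untested candidates (objective is None) uniformly at random."""
    untested = [i for i, y in enumerate(objectives) if y is None]
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    count = min(num_proposals, len(untested))
    return sorted(rng.sample(untested, count))


# Hypothetical state: candidates 0 and 3 have results, the rest are untested.
objectives = [0.8, None, None, 0.3, None, None]
proposals = random_exploration(objectives, num_proposals=2, seed=42)
print(proposals)
```

Running it twice with the same seed returns the same proposals, which is the purpose of a parameter such as `re_seed`.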
Typical Workflow
Cycle 1: RE (collect 5-10 random samples)
Cycles 2+: PHYSBO (intelligent selection based on data)
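This workflow can be scripted by choosing the `nimo.selection` arguments per cycle. The helper `selection_kwargs` below is hypothetical; it only builds the argument dictionary from parameters shown earlier in this chapter, and it assumes the results of each cycle are recorded in `candidates.csv` before the next cycle starts.

```python
def selection_kwargs(cycle, num_initial=5, num_proposals=1):
    """Return nimo.selection arguments: RE in the first cycle, PHYSBO afterwards."""
    base = {
        "input_file": "candidates.csv",
        "output_file": "proposals.csv",
        "num_objectives": 1,
    }
    if cycle == 1:
        # No data yet, so collect an initial random batch.
        return {**base, "method": "RE", "num_proposals": num_initial}
    # Data is available, so switch to model-based selection.
    return {**base, "method": "PHYSBO", "num_proposals": num_proposals}


for cycle in range(1, 4):
    kwargs = selection_kwargs(cycle)
    print(cycle, kwargs["method"], kwargs["num_proposals"])
    # With NIMO installed, each cycle would then call:
    # nimo.selection(**kwargs)
```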
3.4 BLOX: Random Forest Approach
BLOX (BoundLess Objective-free eXploration) uses Random Forest models instead of Gaussian Processes, which makes it faster for high-dimensional problems.
import nimo

# Random Forest-based optimization
nimo.selection(
    method="BLOX",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=1,
    num_proposals=3
)
BLOX vs PHYSBO Comparison
| Aspect | PHYSBO | BLOX |
|---|---|---|
| Model | Gaussian Process | Random Forest |
| Uncertainty | Well-calibrated | Approximate |
| Speed (low-dim) | Fast | Fast |
| Speed (high-dim) | Slow | Fast |
| Best for | ≤20 descriptors | >20 descriptors |
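
The "Approximate" uncertainty entry reflects a general property of tree ensembles: they have no built-in predictive variance, so uncertainty is usually estimated from the disagreement between individual trees. The sketch below illustrates that general idea with hypothetical per-tree predictions; it is not a description of BLOX's internal scoring.

```python
import statistics


def ensemble_prediction(member_predictions):
    """Combine per-tree predictions into a mean and a spread-based uncertainty."""
    mean = statistics.fmean(member_predictions)
    std = statistics.pstdev(member_predictions)  # disagreement between trees
    return mean, std


# Hypothetical predictions from five trees for two candidates.
agreeing_trees = [1.02, 0.98, 1.00, 1.01, 0.99]
disagreeing_trees = [0.40, 1.60, 0.90, 1.30, 0.80]

print(ensemble_prediction(agreeing_trees))
print(ensemble_prediction(disagreeing_trees))
```

Both candidates have the same average prediction, but the second one would be treated as far more uncertain.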
3.5 Phase Diagram Construction (PDC)
PDC is designed specifically for materials science problems where you need to map out different phases or regions.
import nimo

# Phase diagram construction
nimo.selection(
    method="PDC",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=1,
    num_proposals=3
)
When to Use PDC
- Mapping phase boundaries in alloy systems
- Identifying regions of different material properties
- Exploring multi-component composition spaces
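Phase mapping is typically driven by uncertainty sampling: the next experiment goes where the predicted phase is least certain, which tends to be near a phase boundary. The sketch below assumes a model has already produced phase probabilities for each untested composition; the compositions and probabilities are hypothetical, and "least confidence" is only one of several possible uncertainty scores.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely phase."""
    return 1.0 - max(probabilities)


# Hypothetical phase probabilities (three phases) for untested compositions.
phase_probabilities = {
    (0.2, 0.8): [0.95, 0.03, 0.02],  # clearly inside the first phase
    (0.5, 0.5): [0.40, 0.35, 0.25],  # ambiguous, likely near a boundary
    (0.7, 0.3): [0.10, 0.85, 0.05],  # clearly inside the second phase
}

# Propose the composition whose phase label is least certain.
next_point = max(
    phase_probabilities,
    key=lambda point: least_confidence(phase_probabilities[point]),
)
print(next_point)
```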
3.6 Multi-Objective Algorithms
When optimizing multiple objectives simultaneously (e.g., maximize strength AND minimize cost), use these algorithms:
PTR: Pareto-based Thompson Ranking
import nimo

# Multi-objective optimization (2 objectives)
nimo.selection(
    method="PTR",
    input_file="candidates.csv",
    output_file="proposals.csv",
    num_objectives=2,  # Optimize 2 properties
    num_proposals=3
)
Pareto Optimality
A solution is Pareto optimal if no objective can be improved without worsening at least one other objective. Multi-objective algorithms aim to find the set of Pareto optimal solutions (the Pareto front).
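The definition translates directly into a dominance check. The sketch below assumes every objective is to be maximized, so cost is stored with a negative sign; the data points are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical results as (strength, -cost); negating cost turns "minimize" into "maximize".
points = [(10, -5), (8, -3), (7, -6), (10, -7), (6, -2)]
front = pareto_front(points)
print(front)
```

Points such as `(7, -6)` drop out because another point is both stronger and cheaper, while the remaining points represent different trade-offs between the two objectives.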
3.7 Algorithm Selection Guide
Use the quick reference table below to choose the right algorithm:
Quick Reference Table
| Scenario | Recommended Algorithm | Reason |
|---|---|---|
| First experiments | RE | Need initial random data |
| General optimization | PHYSBO | Best overall performance |
| High dimensions (>20) | BLOX | Faster with many features |
| Phase mapping | PDC | Designed for phase diagrams |
| Multi-objective | PTR | Pareto optimization |
| Fast exploration | SLESA | Quick candidate selection |
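
The table can also be written as a small helper function. The function `recommend_algorithm` is hypothetical and the order in which the conditions are checked is an assumption; SLESA is left out because the table gives no measurable criterion for "fast exploration".

```python
def recommend_algorithm(has_data, num_descriptors, num_objectives=1, phase_mapping=False):
    """Map a scenario from the quick reference table to an algorithm name."""
    if not has_data:
        return "RE"      # first experiments: need initial random data
    if phase_mapping:
        return "PDC"     # designed for phase diagrams
    if num_objectives > 1:
        return "PTR"     # multi-objective problems
    if num_descriptors > 20:
        return "BLOX"    # faster with many features
    return "PHYSBO"      # general optimization


print(recommend_algorithm(has_data=False, num_descriptors=10))
print(recommend_algorithm(has_data=True, num_descriptors=10))
print(recommend_algorithm(has_data=True, num_descriptors=50))
```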
3.8 Algorithm-Specific Parameters
Each algorithm has specific parameters you can tune:
PHYSBO Parameters
- physbo_score: "EI", "PI", or "TS"
- physbo_seed: Random seed for reproducibility
RE Parameters
- re_seed: Random seed for reproducibility
BLOX Parameters
- blox_num_rand_basis: Number of random basis functions
- blox_seed: Random seed for reproducibility
Exercises
Exercise 1: Algorithm Selection
For each scenario, choose the most appropriate algorithm:
- You're starting a new optimization with no prior data (10 descriptors)
- You want to optimize hardness and ductility simultaneously
- You're mapping the phase diagram of a ternary alloy system
- You have 50 descriptors and need fast optimization
Exercise 2: PHYSBO Configuration
Write NIMO code to:
- Use PHYSBO with Thompson Sampling acquisition
- Select 5 proposals per cycle
- Set a random seed of 123 for reproducibility
Summary
- Bayesian Optimization (PHYSBO) is the default choice for most problems
- Random Exploration (RE) is essential for collecting initial data
- BLOX scales better for high-dimensional problems
- PDC is specialized for phase diagram construction
- PTR handles multi-objective optimization
- Always start with RE for the first cycle, then switch to intelligent algorithms