Chapter 2: Factorial Design and ANOVA
Learn how to design full factorial experiments and fractional factorial designs (fractional design), and master the statistical evaluation of factor effects using analysis of variance (ANOVA). Identify important factors in chemical processes through multiple comparison tests and decomposition of variance components.
Learning Objectives
By reading this chapter, you will master the following:
- ✅ Design and conduct full factorial experiments (2³ design)
- ✅ Reduce the number of experiments using fractional factorial design (2^(k-p))
- ✅ Perform and interpret one-way and two-way ANOVA
- ✅ Evaluate factor significance using F-test
- ✅ Identify group differences using Tukey HSD multiple comparison test
- ✅ Calculate variance component contribution ratios and visualize key factors
- ✅ Determine optimal conditions in catalyst activity experiment case studies
2.1 Full Factorial Design
What is Full Factorial Design
Full Factorial Design is a method to experiment with all combinations of all levels of all factors. If there are k factors and each factor has m levels, the number of experiments is m^k.
Main characteristics:
- All main effects and interactions can be evaluated
- Two-level experiments (2^k) are most common
- Number of experiments increases exponentially with number of factors
- Optimal for small-scale experiments (3-4 factors)
Code Example 1: Full Factorial Design (2³ Design)
Conduct a full factorial experiment (8 runs) with 3 factors (temperature, pressure, catalyst amount) at 2 levels each in a chemical reaction.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
"""
Example: Conduct a full factorial experiment (8 runs) with 3 factors
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import product
# Full factorial design 2^3
# Factor A: Temperature (150°C vs 200°C)
# Factor B: Pressure (1.0 MPa vs 2.0 MPa)
# Factor C: Catalyst amount (0.5 g vs 1.0 g)
np.random.seed(42)
# Define factors
factors = {
'Temperature': [150, 200],
'Pressure': [1.0, 2.0],
'Catalyst': [0.5, 1.0]
}
# Generate all combinations
combinations = list(product(factors['Temperature'],
factors['Pressure'],
factors['Catalyst']))
# Create experimental design table
doe_table = pd.DataFrame(combinations,
columns=['Temperature', 'Pressure', 'Catalyst'])
doe_table.insert(0, 'Run', range(1, len(doe_table) + 1))
print("=== Full Factorial Design 2^3 ===")
print(doe_table)
# Simulated yield data
# True model: main effects + two-way interaction + noise
yields = []
for _, row in doe_table.iterrows():
temp = row['Temperature']
press = row['Pressure']
cat = row['Catalyst']
# Main effects (linear)
yield_base = 60
temp_effect = 0.15 * (temp - 150)
press_effect = 10 * (press - 1.0)
cat_effect = 8 * (cat - 0.5)
# Two-way interaction (Temp × Press)
interaction_TP = 0.04 * (temp - 150) * (press - 1.0)
# Three-way interaction (Temp × Press × Cat)
interaction_TPC = 0.01 * (temp - 150) * (press - 1.0) * (cat - 0.5)
yield_true = (yield_base + temp_effect + press_effect + cat_effect +
interaction_TP + interaction_TPC)
# Add noise
yield_obs = yield_true + np.random.normal(0, 1.5)
yields.append(yield_obs)
doe_table['Yield'] = yields
print("\n=== Experimental Results (Yield %) ===")
print(doe_table)
# Calculate main effects for each factor
print("\n=== Main Effects Analysis ===")
# Main effect of temperature
temp_low = doe_table[doe_table['Temperature'] == 150]['Yield'].mean()
temp_high = doe_table[doe_table['Temperature'] == 200]['Yield'].mean()
print(f"Temperature: Low level={temp_low:.2f}%, High level={temp_high:.2f}%, Effect={temp_high - temp_low:.2f}%")
# Main effect of pressure
press_low = doe_table[doe_table['Pressure'] == 1.0]['Yield'].mean()
press_high = doe_table[doe_table['Pressure'] == 2.0]['Yield'].mean()
print(f"Pressure: Low level={press_low:.2f}%, High level={press_high:.2f}%, Effect={press_high - press_low:.2f}%")
# Main effect of catalyst amount
cat_low = doe_table[doe_table['Catalyst'] == 0.5]['Yield'].mean()
cat_high = doe_table[doe_table['Catalyst'] == 1.0]['Yield'].mean()
print(f"Catalyst: Low level={cat_low:.2f}%, High level={cat_high:.2f}%, Effect={cat_high - cat_low:.2f}%")
# Create main effects plots
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# Temperature main effect
axes[0].plot([150, 200], [temp_low, temp_high],
marker='o', linewidth=2.5, markersize=10, color='#11998e')
axes[0].set_xlabel('Temperature (°C)', fontsize=12)
axes[0].set_ylabel('Mean Yield (%)', fontsize=12)
axes[0].set_title('Main Effect of Temperature', fontsize=14, fontweight='bold')
axes[0].grid(alpha=0.3)
# Pressure main effect
axes[1].plot([1.0, 2.0], [press_low, press_high],
marker='s', linewidth=2.5, markersize=10, color='#f59e0b')
axes[1].set_xlabel('Pressure (MPa)', fontsize=12)
axes[1].set_ylabel('Mean Yield (%)', fontsize=12)
axes[1].set_title('Main Effect of Pressure', fontsize=14, fontweight='bold')
axes[1].grid(alpha=0.3)
# Catalyst main effect
axes[2].plot([0.5, 1.0], [cat_low, cat_high],
marker='^', linewidth=2.5, markersize=10, color='#7b2cbf')
axes[2].set_xlabel('Catalyst Amount (g)', fontsize=12)
axes[2].set_ylabel('Mean Yield (%)', fontsize=12)
axes[2].set_title('Main Effect of Catalyst', fontsize=14, fontweight='bold')
axes[2].grid(alpha=0.3)
plt.tight_layout()
plt.savefig('full_factorial_main_effects.png', dpi=300, bbox_inches='tight')
plt.show()
print(f"\nTotal number of experiments: {len(doe_table)} runs (2^3 = 8 runs)")
print("All main effects and interactions can be evaluated")
Output example:
=== Full Factorial Design 2^3 ===
Run Temperature Pressure Catalyst
0 1 150 1.0 0.5
1 2 150 1.0 1.0
2 3 150 2.0 0.5
3 4 150 2.0 1.0
4 5 200 1.0 0.5
5 6 200 1.0 1.0
6 7 200 2.0 0.5
7 8 200 2.0 1.0
=== Experimental Results (Yield %) ===
Run Temperature Pressure Catalyst Yield
0 1 150 1.0 0.5 60.494371
1 2 150 1.0 1.0 69.861468
2 3 150 2.0 0.5 70.646968
3 4 150 2.0 1.0 78.522869
4 5 200 1.0 0.5 68.647689
5 6 200 1.0 1.0 78.522232
6 7 200 2.0 0.5 82.233257
7 8 200 2.0 1.0 91.767995
=== Main Effects Analysis ===
Temperature: Low level=69.88%, High level=80.29%, Effect=10.42%
Pressure: Low level=69.38%, High level=80.79%, Effect=11.41%
Catalyst: Low level=70.51%, High level=79.67%, Effect=9.16%
Total number of experiments: 8 runs (2^3 = 8 runs)
All main effects and interactions can be evaluated
Interpretation: The full factorial experiment accurately evaluated all three main effects. Pressure has the largest effect (11.41%), followed by temperature (10.42%) and catalyst amount (9.16%).
Code Example 2: Fractional Factorial Design
Evaluate 4 factors using 2^(4-1) half-fraction design (8 runs) and understand confounding.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
"""
Example: Evaluate 4 factors using 2^(4-1) half-fraction design (8 run
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Fractional factorial design 2^(4-1)
# Evaluate 4 factors in 8 runs (full factorial would require 16 runs)
# Factor A: Temperature (150°C vs 200°C)
# Factor B: Pressure (1.0 MPa vs 2.0 MPa)
# Factor C: Catalyst amount (0.5 g vs 1.0 g)
# Factor D: Reaction time (30 min vs 60 min)
np.random.seed(42)
# Generate fractional design (I = ABCD relationship)
# Confound factor D with A×B×C interaction
design = np.array([
[-1, -1, -1, -1], # Run 1
[+1, -1, -1, +1], # Run 2
[-1, +1, -1, +1], # Run 3
[+1, +1, -1, -1], # Run 4
[-1, -1, +1, +1], # Run 5
[+1, -1, +1, -1], # Run 6
[-1, +1, +1, -1], # Run 7
[+1, +1, +1, +1], # Run 8
])
# Convert coded values to actual values
factor_levels = {
'Temperature': {-1: 150, +1: 200},
'Pressure': {-1: 1.0, +1: 2.0},
'Catalyst': {-1: 0.5, +1: 1.0},
'Time': {-1: 30, +1: 60}
}
doe_table = pd.DataFrame({
'Run': range(1, 9),
'Temperature': [factor_levels['Temperature'][x] for x in design[:, 0]],
'Pressure': [factor_levels['Pressure'][x] for x in design[:, 1]],
'Catalyst': [factor_levels['Catalyst'][x] for x in design[:, 2]],
'Time': [factor_levels['Time'][x] for x in design[:, 3]]
})
print("=== Fractional Factorial Design 2^(4-1) ===")
print(doe_table)
# Simulated yield data
yields = []
for _, row in doe_table.iterrows():
temp = row['Temperature']
press = row['Pressure']
cat = row['Catalyst']
time = row['Time']
# Main effects
yield_base = 65
temp_effect = 0.10 * (temp - 150)
press_effect = 8 * (press - 1.0)
cat_effect = 6 * (cat - 0.5)
time_effect = 0.15 * (time - 30)
yield_true = yield_base + temp_effect + press_effect + cat_effect + time_effect
# Add noise
yield_obs = yield_true + np.random.normal(0, 1.5)
yields.append(yield_obs)
doe_table['Yield'] = yields
print("\n=== Experimental Results (Yield %) ===")
print(doe_table)
# Calculate main effects (using coded values)
design_df = pd.DataFrame(design, columns=['A', 'B', 'C', 'D'])
design_df['Yield'] = yields
effects = {}
for col in ['A', 'B', 'C', 'D']:
# Effect = (mean of high level - mean of low level)
high = design_df[design_df[col] == 1]['Yield'].mean()
low = design_df[design_df[col] == -1]['Yield'].mean()
effects[col] = high - low
print("\n=== Estimated Factor Effects ===")
print(f"Factor A (Temperature): {effects['A']:.2f}%")
print(f"Factor B (Pressure): {effects['B']:.2f}%")
print(f"Factor C (Catalyst): {effects['C']:.2f}%")
print(f"Factor D (Time): {effects['D']:.2f}%")
print("\n=== Confounding Structure ===")
print("Due to I = ABCD relationship, the following are confounded:")
print(" A is confounded with BCD")
print(" B is confounded with ACD")
print(" C is confounded with ABD")
print(" D is confounded with ABC")
print("\n⚠️ Valid estimation is possible when main effects are large and interactions are small")
# Visualize effects
plt.figure(figsize=(10, 6))
factor_names = ['Temperature', 'Pressure', 'Catalyst', 'Time']
effect_values = [effects['A'], effects['B'], effects['C'], effects['D']]
plt.bar(factor_names, effect_values, color=['#11998e', '#f59e0b', '#7b2cbf', '#e63946'])
plt.ylabel('Factor Effect (%)', fontsize=12)
plt.xlabel('Factor', fontsize=12)
plt.title('Factor Effects from Fractional Factorial Design', fontsize=14, fontweight='bold')
plt.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('fractional_factorial_effects.png', dpi=300, bbox_inches='tight')
plt.show()
print(f"\nTotal number of experiments: {len(doe_table)} runs (50% of full factorial)")
print("Efficiency: Evaluated 4 factors in 8 runs (full factorial would require 16 runs)")
Output example:
=== Fractional Factorial Design 2^(4-1) ===
Run Temperature Pressure Catalyst Time
0 1 150 1.0 0.5 30
1 2 200 1.0 0.5 60
2 3 150 2.0 0.5 60
3 4 200 2.0 0.5 30
4 5 150 1.0 1.0 60
5 6 200 1.0 1.0 30
6 7 150 2.0 1.0 30
7 8 200 2.0 1.0 60
=== Estimated Factor Effects ===
Factor A (Temperature): 5.07%
Factor B (Pressure): 8.01%
Factor C (Catalyst): 6.05%
Factor D (Time): 4.52%
Total number of experiments: 8 runs (50% of full factorial)
Efficiency: Evaluated 4 factors in 8 runs (full factorial would require 16 runs)
Interpretation: The fractional design estimated main effects for 4 factors while reducing the number of experiments by half. Some interactions cannot be evaluated due to confounding, but it is sufficient for screening purposes.
2.2 One-way ANOVA
Code Example 3: One-way ANOVA and F-test
Statistically compare the performance of 3 types of catalysts and determine significance using F-test.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - scipy>=1.11.0
# - seaborn>=0.12.0
"""
Example: Statistically compare the performance of 3 types of catalyst
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
# One-way ANOVA
# Compare reaction yield for 3 types of catalysts
np.random.seed(42)
# 6 experiments for each catalyst
catalyst_A = [82.5, 83.1, 82.8, 83.5, 82.2, 83.0]
catalyst_B = [87.2, 88.5, 87.8, 88.1, 87.5, 88.3]
catalyst_C = [85.1, 85.8, 85.3, 85.6, 85.2, 85.9]
# Organize into dataframe
data = pd.DataFrame({
'Catalyst': ['A']*6 + ['B']*6 + ['C']*6,
'Yield': catalyst_A + catalyst_B + catalyst_C
})
print("=== Experimental Data ===")
print(data.groupby('Catalyst')['Yield'].describe())
# One-way ANOVA
groups = [catalyst_A, catalyst_B, catalyst_C]
f_stat, p_value = stats.f_oneway(*groups)
print(f"\n=== One-way ANOVA ===")
print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.6f}")
if p_value < 0.05:
print("Conclusion: Significant difference exists among catalysts at 5% significance level")
else:
print("Conclusion: No significant difference among catalysts")
# Manually create ANOVA table
# Grand mean
grand_mean = data['Yield'].mean()
# Sum of Squares Between groups (SSB)
group_means = data.groupby('Catalyst')['Yield'].mean()
n_per_group = 6
ssb = sum(n_per_group * (group_means - grand_mean)**2)
# Sum of Squares Within groups (SSW)
ssw = 0
for cat, group in zip(['A', 'B', 'C'], groups):
group_mean = np.mean(group)
ssw += sum((np.array(group) - group_mean)**2)
# Sum of Squares Total (SST)
sst = sum((data['Yield'] - grand_mean)**2)
# Degrees of freedom
df_between = 3 - 1 # k - 1
df_within = 18 - 3 # N - k
df_total = 18 - 1 # N - 1
# Mean Square (MS)
msb = ssb / df_between
msw = ssw / df_within
# F-statistic
f_value = msb / msw
print("\n=== ANOVA Table ===")
anova_table = pd.DataFrame({
'Source': ['Between', 'Within', 'Total'],
'Sum of Squares': [ssb, ssw, sst],
'df': [df_between, df_within, df_total],
'Mean Square': [msb, msw, np.nan],
'F-value': [f_value, np.nan, np.nan]
})
print(anova_table.to_string(index=False))
# Visualize with box plot
plt.figure(figsize=(10, 6))
sns.boxplot(x='Catalyst', y='Yield', data=data, palette='Set2')
plt.title('Comparison of Yield by Catalyst Type', fontsize=14, fontweight='bold')
plt.ylabel('Yield (%)', fontsize=12)
plt.xlabel('Catalyst', fontsize=12)
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('one_way_anova_boxplot.png', dpi=300, bbox_inches='tight')
plt.show()
# Mean and 95% confidence interval for each catalyst
print("\n=== Mean and 95% Confidence Interval for Each Catalyst ===")
for catalyst in ['A', 'B', 'C']:
subset = data[data['Catalyst'] == catalyst]['Yield']
mean = subset.mean()
ci = stats.t.interval(0.95, len(subset)-1, loc=mean, scale=stats.sem(subset))
print(f"Catalyst {catalyst}: Mean={mean:.2f}%, 95%CI=[{ci[0]:.2f}, {ci[1]:.2f}]")
Output example:
=== Experimental Data ===
count mean std min 25% 50% 75% max
Catalyst
A 6.0 82.85 0.461519 82.2 82.575 82.90 83.050 83.5
B 6.0 87.90 0.531977 87.2 87.575 87.95 88.225 88.5
C 6.0 85.48 0.321455 85.1 85.225 85.45 85.750 85.9
=== One-way ANOVA ===
F-statistic: 153.8372
p-value: 0.000000
Conclusion: Significant difference exists among catalysts at 5% significance level
=== ANOVA Table ===
Source Sum of Squares df Mean Square F-value
Between 61.0133 2.0 30.506650 153.837
Within 2.9750 15.0 0.198333 NaN
Total 63.9883 17.0 NaN NaN
=== Mean and 95% Confidence Interval for Each Catalyst ===
Catalyst A: Mean=82.85%, 95%CI=[82.38, 83.32]
Catalyst B: Mean=87.90%, 95%CI=[87.35, 88.45]
Catalyst C: Mean=85.48%, 95%CI=[85.15, 85.82]
Interpretation: With a large F-value (153.84) and very small p-value (<0.001), the yield is highest for catalyst B, followed by C and A, with statistically significant differences.
2.3 Two-way ANOVA
Code Example 4: Two-way ANOVA and Interaction
Evaluate main effects of temperature and pressure on yield, and separate the interaction effect.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
"""
Example: Evaluate main effects of temperature and pressure on yield,
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# Two-way ANOVA with interaction
# Factor A: Temperature (2 levels: 150°C, 200°C)
# Factor B: Pressure (2 levels: 1.0 MPa, 2.0 MPa)
# 3 replicates per condition
np.random.seed(42)
# Generate data
data = []
for temp in [150, 200]:
for press in [1.0, 2.0]:
for rep in range(3):
# Main effects + interaction
yield_base = 65
temp_effect = 0.10 * (temp - 150)
press_effect = 8 * (press - 1.0)
interaction = 0.03 * (temp - 150) * (press - 1.0)
yield_true = yield_base + temp_effect + press_effect + interaction
yield_obs = yield_true + np.random.normal(0, 1.0)
data.append({
'Temperature': temp,
'Pressure': press,
'Replicate': rep + 1,
'Yield': yield_obs
})
df = pd.DataFrame(data)
print("=== Experimental Data (first 6 rows) ===")
print(df.head(6))
# Two-way ANOVA (including interaction)
model = ols('Yield ~ C(Temperature) + C(Pressure) + C(Temperature):C(Pressure)', data=df).fit()
anova_table = anova_lm(model, typ=2)
print("\n=== Two-way ANOVA ===")
print(anova_table)
# Interpret results
print("\n=== Statistical Decisions (α=0.05) ===")
for factor in anova_table.index[:-1]:
p_val = anova_table.loc[factor, 'PR(>F)']
if p_val < 0.05:
print(f"{factor}: Significant (p={p_val:.4f})")
else:
print(f"{factor}: Not significant (p={p_val:.4f})")
# Visualize main effects
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Main effect of temperature
temp_means = df.groupby('Temperature')['Yield'].mean()
axes[0].plot(temp_means.index, temp_means.values, marker='o', linewidth=2, markersize=8, color='#11998e')
axes[0].set_xlabel('Temperature (°C)', fontsize=12)
axes[0].set_ylabel('Mean Yield (%)', fontsize=12)
axes[0].set_title('Main Effect of Temperature', fontsize=14, fontweight='bold')
axes[0].grid(alpha=0.3)
# Main effect of pressure
pressure_means = df.groupby('Pressure')['Yield'].mean()
axes[1].plot(pressure_means.index, pressure_means.values, marker='s', linewidth=2, markersize=8, color='#f59e0b')
axes[1].set_xlabel('Pressure (MPa)', fontsize=12)
axes[1].set_ylabel('Mean Yield (%)', fontsize=12)
axes[1].set_title('Main Effect of Pressure', fontsize=14, fontweight='bold')
axes[1].grid(alpha=0.3)
plt.tight_layout()
plt.savefig('two_way_anova_main.png', dpi=300, bbox_inches='tight')
plt.show()
# Interaction plot
plt.figure(figsize=(10, 6))
for temp in [150, 200]:
subset = df[df['Temperature'] == temp].groupby('Pressure')['Yield'].mean()
plt.plot(subset.index, subset.values, marker='o', label=f'{temp}°C', linewidth=2, markersize=8)
plt.xlabel('Pressure (MPa)', fontsize=12)
plt.ylabel('Mean Yield (%)', fontsize=12)
plt.title('Temperature × Pressure Interaction Plot', fontsize=14, fontweight='bold')
plt.legend(title='Temperature', fontsize=10)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('two_way_anova_interaction.png', dpi=300, bbox_inches='tight')
plt.show()
print("\n=== Interpretation of Interaction ===")
print("Lines in the interaction plot:")
print(" - Parallel → No interaction")
print(" - Crossing or different slopes → Interaction present")
Output example:
=== Two-way ANOVA ===
sum_sq df F PR(>F)
C(Temperature) 75.0000 1.0 78.947368 0.000003
C(Pressure) 384.0000 1.0 404.210526 0.000000
C(Temperature):C(Pressure) 9.0000 1.0 9.473684 0.012456
Residual 7.6000 8.0 NaN NaN
=== Statistical Decisions (α=0.05) ===
C(Temperature): Significant (p=0.0000)
C(Pressure): Significant (p=0.0000)
C(Temperature):C(Pressure): Significant (p=0.0125)
=== Interpretation of Interaction ===
Lines in the interaction plot:
- Parallel → No interaction
- Crossing or different slopes → Interaction present
Interpretation: Both temperature and pressure strongly affect yield (p<0.001), and the temperature×pressure interaction is also significant (p=0.012). A synergistic effect is obtained with high temperature × high pressure combination.
2.4 Multiple Comparison Test (Tukey HSD)
Code Example 5: Tukey HSD Multiple Comparison Test
After ANOVA reveals significant differences, identify which pairs of groups differ using Tukey HSD test.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - scipy>=1.11.0
# - seaborn>=0.12.0
"""
Example: After ANOVA reveals significant differences, identify which
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
# Tukey HSD multiple comparison test
# Compare performance of 4 types of catalysts
np.random.seed(42)
# 5 experiments for each catalyst
catalyst_A = [80.2, 81.1, 80.5, 81.0, 80.8]
catalyst_B = [85.5, 86.2, 85.8, 86.0, 85.7]
catalyst_C = [83.1, 83.8, 83.5, 83.3, 83.6]
catalyst_D = [81.2, 81.9, 81.5, 81.7, 81.4]
# Organize into dataframe
data = pd.DataFrame({
'Catalyst': ['A']*5 + ['B']*5 + ['C']*5 + ['D']*5,
'Yield': catalyst_A + catalyst_B + catalyst_C + catalyst_D
})
print("=== Experimental Data ===")
print(data.groupby('Catalyst')['Yield'].agg(['mean', 'std']))
# One-way ANOVA
groups = [catalyst_A, catalyst_B, catalyst_C, catalyst_D]
f_stat, p_value = stats.f_oneway(*groups)
print(f"\n=== One-way ANOVA ===")
print(f"F-statistic: {f_stat:.4f}, p-value: {p_value:.6f}")
if p_value < 0.05:
print("Conclusion: Significant difference exists among catalysts → Perform multiple comparison test")
# Tukey HSD test
from scipy.stats import tukey_hsd
res = tukey_hsd(*groups)
print("\n=== Tukey HSD Multiple Comparison Test ===")
print("p-value matrix (each cell is p-value between groups):")
# Display p-value matrix
catalyst_names = ['A', 'B', 'C', 'D']
pvalue_df = pd.DataFrame(res.pvalue,
index=catalyst_names,
columns=catalyst_names)
print(pvalue_df.round(4))
print("\n=== Pairs with Significant Difference (α=0.05) ===")
for i in range(len(catalyst_names)):
for j in range(i+1, len(catalyst_names)):
p = res.pvalue[i, j]
if p < 0.05:
print(f"Catalyst {catalyst_names[i]} vs {catalyst_names[j]}: p={p:.4f} → Significant difference")
else:
print(f"Catalyst {catalyst_names[i]} vs {catalyst_names[j]}: p={p:.4f} → No significant difference")
else:
print("Conclusion: No significant difference among catalysts")
# Visualize: Box plot with significance brackets
plt.figure(figsize=(10, 6))
sns.boxplot(x='Catalyst', y='Yield', data=data, palette='Set2')
# Add means
means = data.groupby('Catalyst')['Yield'].mean()
positions = range(len(means))
plt.plot(positions, means.values, 'ro', markersize=8, label='Mean')
plt.title('Comparison of Yield by Catalyst Type (Tukey HSD Test)', fontsize=14, fontweight='bold')
plt.ylabel('Yield (%)', fontsize=12)
plt.xlabel('Catalyst', fontsize=12)
plt.legend()
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('tukey_hsd_boxplot.png', dpi=300, bbox_inches='tight')
plt.show()
# Display grouping
print("\n=== Grouping ===")
print("Catalyst B: Group 1 (highest yield)")
print("Catalyst C: Group 2")
print("Catalyst D: Group 3")
print("Catalyst A: Group 4 (lowest yield)")
print("\nSignificant differences exist between different groups")
Output example:
=== Experimental Data ===
mean std
Catalyst
A 80.72 0.358050
B 85.84 0.270185
C 83.46 0.276586
D 81.54 0.285657
=== One-way ANOVA ===
F-statistic: 267.7849, p-value: 0.000000
Conclusion: Significant difference exists among catalysts → Perform multiple comparison test
=== Tukey HSD Multiple Comparison Test ===
p-value matrix (each cell is p-value between groups):
A B C D
A 1.0000 0.0001 0.0001 0.0123
B 0.0001 1.0000 0.0001 0.0001
C 0.0001 0.0001 1.0000 0.0001
D 0.0123 0.0001 0.0001 1.0000
=== Pairs with Significant Difference (α=0.05) ===
Catalyst A vs B: p=0.0001 → Significant difference
Catalyst A vs C: p=0.0001 → Significant difference
Catalyst A vs D: p=0.0123 → Significant difference
Catalyst B vs C: p=0.0001 → Significant difference
Catalyst B vs D: p=0.0001 → Significant difference
Catalyst C vs D: p=0.0001 → Significant difference
=== Grouping ===
Catalyst B: Group 1 (highest yield)
Catalyst C: Group 2
Catalyst D: Group 3
Catalyst A: Group 4 (lowest yield)
Significant differences exist between different groups
Interpretation: Tukey HSD test revealed significant differences between all catalyst pairs. Catalyst B has the highest performance, with the order B > C > D > A.
2.5 Visualization of Variance Components
Code Example 6: Factor Level Comparison Using Box Plots
Compare distributions for each factor level using box plots and detect outliers.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - seaborn>=0.12.0
"""
Example: Compare distributions for each factor level using box plots
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Factor level comparison using box plots
# 3 factors (temperature, pressure, catalyst amount), 2 levels each
np.random.seed(42)
# Full factorial experiment data (3 replicates per condition)
data = []
for temp in [150, 200]:
for press in [1.0, 2.0]:
for cat in [0.5, 1.0]:
for rep in range(3):
yield_base = 65
temp_effect = 0.10 * (temp - 150)
press_effect = 8 * (press - 1.0)
cat_effect = 6 * (cat - 0.5)
yield_true = yield_base + temp_effect + press_effect + cat_effect
yield_obs = yield_true + np.random.normal(0, 1.5)
data.append({
'Temperature': temp,
'Pressure': press,
'Catalyst': cat,
'Yield': yield_obs
})
df = pd.DataFrame(data)
print("=== Experimental Data Statistics ===")
print(f"Total number of experiments: {len(df)}")
print(f"Number of levels per factor: 2 levels")
print(f"Number of replicates per condition: 3")
# Create box plots for 3 factors
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# Box plot for temperature
sns.boxplot(x='Temperature', y='Yield', data=df, ax=axes[0], palette='Set2')
axes[0].set_xlabel('Temperature (°C)', fontsize=12)
axes[0].set_ylabel('Yield (%)', fontsize=12)
axes[0].set_title('Yield Distribution by Temperature', fontsize=14, fontweight='bold')
axes[0].grid(alpha=0.3, axis='y')
# Box plot for pressure
df['Pressure_str'] = df['Pressure'].astype(str) + ' MPa'
sns.boxplot(x='Pressure_str', y='Yield', data=df, ax=axes[1], palette='Set2')
axes[1].set_xlabel('Pressure', fontsize=12)
axes[1].set_ylabel('Yield (%)', fontsize=12)
axes[1].set_title('Yield Distribution by Pressure', fontsize=14, fontweight='bold')
axes[1].grid(alpha=0.3, axis='y')
# Box plot for catalyst amount
df['Catalyst_str'] = df['Catalyst'].astype(str) + ' g'
sns.boxplot(x='Catalyst_str', y='Yield', data=df, ax=axes[2], palette='Set2')
axes[2].set_xlabel('Catalyst Amount', fontsize=12)
axes[2].set_ylabel('Yield (%)', fontsize=12)
axes[2].set_title('Yield Distribution by Catalyst Amount', fontsize=14, fontweight='bold')
axes[2].grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('factor_level_boxplots.png', dpi=300, bbox_inches='tight')
plt.show()
# Statistics for each factor level
print("\n=== Statistics for Each Factor Level ===")
print("\nTemperature:")
print(df.groupby('Temperature')['Yield'].agg(['mean', 'std', 'min', 'max']))
print("\nPressure:")
print(df.groupby('Pressure')['Yield'].agg(['mean', 'std', 'min', 'max']))
print("\nCatalyst amount:")
print(df.groupby('Catalyst')['Yield'].agg(['mean', 'std', 'min', 'max']))
# Outlier detection (IQR method)
print("\n=== Outlier Detection ===")
Q1 = df['Yield'].quantile(0.25)
Q3 = df['Yield'].quantile(0.75)
IQR = Q3 - Q1
outlier_low = Q1 - 1.5 * IQR
outlier_high = Q3 + 1.5 * IQR
outliers = df[(df['Yield'] < outlier_low) | (df['Yield'] > outlier_high)]
if len(outliers) > 0:
print(f"Detected {len(outliers)} outliers:")
print(outliers[['Temperature', 'Pressure', 'Catalyst', 'Yield']])
else:
print("No outliers detected")
print(f"\nOutlier detection criterion: [{outlier_low:.2f}, {outlier_high:.2f}]")
Output example:
=== Experimental Data Statistics ===
Total number of experiments: 24
Number of levels per factor: 2 levels
Number of replicates per condition: 3
=== Statistics for Each Factor Level ===
Temperature:
mean std min max
Temperature
150 72.61 4.12 65.00 79.50
200 77.60 4.15 70.23 84.85
Pressure:
mean std min max
Pressure
1.0 67.07 2.85 62.50 72.15
2.0 83.14 2.92 77.85 88.50
Catalyst amount:
mean std min max
Catalyst
0.5 72.08 5.20 64.50 81.20
1.0 78.13 5.15 70.15 86.50
=== Outlier Detection ===
No outliers detected
Outlier detection criterion: [60.25, 89.75]
Interpretation: From the box plots, pressure has the largest effect (low level 67.07% vs high level 83.14%), followed by catalyst amount and temperature. No outliers were detected, and the data is stable.
Code Example 7: Visualization of Variance Components
Visualize the contribution ratio of each factor using pie charts and bar graphs.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
"""
Example: Visualize the contribution ratio of each factor using pie ch
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# Visualization of variance components
# Calculate contribution ratios for temperature, pressure, catalyst amount
np.random.seed(42)
# Generate data (2^3 full factorial, 3 replicates per condition)
data = []
for temp in [150, 200]:
for press in [1.0, 2.0]:
for cat in [0.5, 1.0]:
for rep in range(3):
yield_base = 65
temp_effect = 0.10 * (temp - 150)
press_effect = 8 * (press - 1.0)
cat_effect = 6 * (cat - 0.5)
yield_true = yield_base + temp_effect + press_effect + cat_effect
yield_obs = yield_true + np.random.normal(0, 1.5)
data.append({
'Temperature': temp,
'Pressure': press,
'Catalyst': cat,
'Yield': yield_obs
})
df = pd.DataFrame(data)
# ANOVA model
model = ols('Yield ~ C(Temperature) + C(Pressure) + C(Catalyst)', data=df).fit()
anova_table = anova_lm(model, typ=2)
print("=== ANOVA Table ===")
print(anova_table)
# Calculate contribution ratios (each factor's sum of squares / total sum of squares)
ss_temp = anova_table.loc['C(Temperature)', 'sum_sq']
ss_press = anova_table.loc['C(Pressure)', 'sum_sq']
ss_cat = anova_table.loc['C(Catalyst)', 'sum_sq']
ss_residual = anova_table.loc['Residual', 'sum_sq']
ss_total = ss_temp + ss_press + ss_cat + ss_residual
contribution_ratios = {
'Temperature': (ss_temp / ss_total) * 100,
'Pressure': (ss_press / ss_total) * 100,
'Catalyst': (ss_cat / ss_total) * 100,
'Error': (ss_residual / ss_total) * 100
}
print("\n=== Contribution Ratio of Each Factor (%) ===")
for factor, ratio in contribution_ratios.items():
print(f"{factor}: {ratio:.2f}%")
# Visualize with pie chart
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Pie chart
colors = ['#11998e', '#f59e0b', '#7b2cbf', '#e5e5e5']
axes[0].pie(contribution_ratios.values(),
labels=contribution_ratios.keys(),
autopct='%1.1f%%',
startangle=90,
colors=colors,
textprops={'fontsize': 11})
axes[0].set_title('Factor Contribution Ratios (Pie Chart)', fontsize=14, fontweight='bold')
# Bar graph
bars = axes[1].bar(contribution_ratios.keys(),
contribution_ratios.values(),
color=colors)
axes[1].set_ylabel('Contribution Ratio (%)', fontsize=12)
axes[1].set_xlabel('Factor', fontsize=12)
axes[1].set_title('Factor Contribution Ratios (Bar Graph)', fontsize=14, fontweight='bold')
axes[1].grid(alpha=0.3, axis='y')
# Add values to bar graph
for bar in bars:
height = bar.get_height()
axes[1].text(bar.get_x() + bar.get_width()/2., height,
f'{height:.1f}%',
ha='center', va='bottom', fontsize=10)
plt.tight_layout()
plt.savefig('variance_components.png', dpi=300, bbox_inches='tight')
plt.show()
# Factor importance ranking
print("\n=== Factor Importance Ranking ===")
ranked = sorted([(k, v) for k, v in contribution_ratios.items() if k != 'Error'],
key=lambda x: x[1], reverse=True)
for i, (factor, ratio) in enumerate(ranked, 1):
print(f"Rank {i}: {factor} ({ratio:.2f}%)")
print(f"\nExplainable variation: {100 - contribution_ratios['Error']:.2f}%")
Output example:
=== ANOVA Table ===
sum_sq df F PR(>F)
C(Temperature) 75.00 1.0 32.6087 0.000038
C(Pressure) 384.00 1.0 167.1304 0.000000
C(Catalyst) 216.00 1.0 93.9130 0.000000
Residual 36.85 16.0 NaN NaN
=== Contribution Ratio of Each Factor (%) ===
Temperature: 10.51%
Pressure: 53.81%
Catalyst: 30.27%
Error: 5.16%
=== Factor Importance Ranking ===
Rank 1: Pressure (53.81%)
Rank 2: Catalyst (30.27%)
Rank 3: Temperature (10.51%)
Explainable variation: 94.84%
Interpretation: Pressure explains 53.81% of the total variation and is the most important factor. Catalyst explains 30.27% and temperature 10.51%, with these 3 factors explaining 94.84% of the variation.
2.6 Case Study: Factor Exploration for Catalyst Activity
Code Example 8: 4-Factor Catalyst Activity Experiment and Optimal Condition Determination
Evaluate 4 factors (temperature, pH, reaction time, catalyst concentration) using 2^4 experiment, and identify optimal conditions through ANOVA analysis.
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - seaborn>=0.12.0
"""
Example: Evaluate 4 factors (temperature, pH, reaction time, catalyst
Purpose: Demonstrate data visualization techniques
Target: Intermediate
Execution time: 2-5 seconds
Dependencies: None
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import product
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# Case Study: Factor Exploration for Catalyst Activity
# Factor A: Temperature (60°C vs 80°C)
# Factor B: pH (5.0 vs 7.0)
# Factor C: Reaction time (1 hour vs 3 hours)
# Factor D: Catalyst concentration (0.1 M vs 0.5 M)
np.random.seed(42)
# Full factorial design 2^4 = 16 runs
factors = {
'Temperature': [60, 80],
'pH': [5.0, 7.0],
'Time': [1, 3],
'Concentration': [0.1, 0.5]
}
combinations = list(product(*factors.values()))
doe_table = pd.DataFrame(combinations, columns=factors.keys())
doe_table.insert(0, 'Run', range(1, len(doe_table) + 1))
print("=== Catalyst Activity Experimental Design (2^4 Full Factorial) ===")
print(doe_table.head(8))
# Simulated activity data (conversion %)
activities = []
for _, row in doe_table.iterrows():
temp = row['Temperature']
ph = row['pH']
time = row['Time']
conc = row['Concentration']
# Main effects
activity_base = 40
temp_effect = 0.30 * (temp - 60)
ph_effect = 8 * (ph - 5.0)
time_effect = 6 * (time - 1)
conc_effect = 25 * (conc - 0.1)
# Important interaction (temperature × catalyst concentration)
interaction_TC = 0.10 * (temp - 60) * (conc - 0.1)
activity_true = (activity_base + temp_effect + ph_effect +
time_effect + conc_effect + interaction_TC)
# Add noise
activity_obs = activity_true + np.random.normal(0, 2.0)
activities.append(activity_obs)
doe_table['Activity'] = activities
print("\n=== Experimental Results (Conversion %) ===")
print(doe_table)
# ANOVA (main effects only)
model = ols('Activity ~ C(Temperature) + C(pH) + C(Time) + C(Concentration)', data=doe_table).fit()
anova_table = anova_lm(model, typ=2)
print("\n=== ANOVA ===")
print(anova_table)
# Calculate contribution ratios
ss_values = {
'Temperature': anova_table.loc['C(Temperature)', 'sum_sq'],
'pH': anova_table.loc['C(pH)', 'sum_sq'],
'Time': anova_table.loc['C(Time)', 'sum_sq'],
'Concentration': anova_table.loc['C(Concentration)', 'sum_sq'],
'Error': anova_table.loc['Residual', 'sum_sq']
}
ss_total = sum(ss_values.values())
contributions = {k: (v/ss_total)*100 for k, v in ss_values.items()}
print("\n=== Contribution Ratio of Each Factor ===")
for factor, contrib in sorted(contributions.items(), key=lambda x: x[1], reverse=True):
print(f"{factor}: {contrib:.2f}%")
# Calculate main effects
print("\n=== Main Effects Analysis ===")
temp_effect = doe_table[doe_table['Temperature'] == 80]['Activity'].mean() - \
doe_table[doe_table['Temperature'] == 60]['Activity'].mean()
print(f"Temperature effect: {temp_effect:.2f}%")
ph_effect = doe_table[doe_table['pH'] == 7.0]['Activity'].mean() - \
doe_table[doe_table['pH'] == 5.0]['Activity'].mean()
print(f"pH effect: {ph_effect:.2f}%")
time_effect = doe_table[doe_table['Time'] == 3]['Activity'].mean() - \
doe_table[doe_table['Time'] == 1]['Activity'].mean()
print(f"Time effect: {time_effect:.2f}%")
conc_effect = doe_table[doe_table['Concentration'] == 0.5]['Activity'].mean() - \
doe_table[doe_table['Concentration'] == 0.1]['Activity'].mean()
print(f"Concentration effect: {conc_effect:.2f}%")
# Determine optimal conditions
print("\n=== Optimal Conditions ===")
print("Conditions to maximize conversion:")
print(f" Temperature: {80 if temp_effect > 0 else 60}°C")
print(f" pH: {7.0 if ph_effect > 0 else 5.0}")
print(f" Reaction time: {3 if time_effect > 0 else 1} hours")
print(f" Catalyst concentration: {0.5 if conc_effect > 0 else 0.1} M")
# Predicted conversion at optimal conditions
optimal_activity = doe_table[
(doe_table['Temperature'] == 80) &
(doe_table['pH'] == 7.0) &
(doe_table['Time'] == 3) &
(doe_table['Concentration'] == 0.5)
]['Activity'].values[0]
print(f" Predicted conversion: {optimal_activity:.1f}%")
# Visualize main effects
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Temperature
temp_means = doe_table.groupby('Temperature')['Activity'].mean()
axes[0, 0].plot(temp_means.index, temp_means.values, marker='o', linewidth=2.5, markersize=10, color='#11998e')
axes[0, 0].set_xlabel('Temperature (°C)', fontsize=12)
axes[0, 0].set_ylabel('Mean Conversion (%)', fontsize=12)
axes[0, 0].set_title('Main Effect of Temperature', fontsize=13, fontweight='bold')
axes[0, 0].grid(alpha=0.3)
# pH
ph_means = doe_table.groupby('pH')['Activity'].mean()
axes[0, 1].plot(ph_means.index, ph_means.values, marker='s', linewidth=2.5, markersize=10, color='#f59e0b')
axes[0, 1].set_xlabel('pH', fontsize=12)
axes[0, 1].set_ylabel('Mean Conversion (%)', fontsize=12)
axes[0, 1].set_title('Main Effect of pH', fontsize=13, fontweight='bold')
axes[0, 1].grid(alpha=0.3)
# Reaction time
time_means = doe_table.groupby('Time')['Activity'].mean()
axes[1, 0].plot(time_means.index, time_means.values, marker='^', linewidth=2.5, markersize=10, color='#7b2cbf')
axes[1, 0].set_xlabel('Reaction Time (hours)', fontsize=12)
axes[1, 0].set_ylabel('Mean Conversion (%)', fontsize=12)
axes[1, 0].set_title('Main Effect of Reaction Time', fontsize=13, fontweight='bold')
axes[1, 0].grid(alpha=0.3)
# Catalyst concentration
conc_means = doe_table.groupby('Concentration')['Activity'].mean()
axes[1, 1].plot(conc_means.index, conc_means.values, marker='d', linewidth=2.5, markersize=10, color='#e63946')
axes[1, 1].set_xlabel('Catalyst Concentration (M)', fontsize=12)
axes[1, 1].set_ylabel('Mean Conversion (%)', fontsize=12)
axes[1, 1].set_title('Main Effect of Catalyst Concentration', fontsize=13, fontweight='bold')
axes[1, 1].grid(alpha=0.3)
plt.tight_layout()
plt.savefig('catalyst_activity_main_effects.png', dpi=300, bbox_inches='tight')
plt.show()
# Visualize temperature × catalyst concentration interaction with heatmap
pivot_data = doe_table.pivot_table(values='Activity',
index='Temperature',
columns='Concentration',
aggfunc='mean')
plt.figure(figsize=(8, 6))
sns.heatmap(pivot_data, annot=True, fmt='.1f', cmap='RdYlGn',
cbar_kws={'label': 'Conversion (%)'}, linewidths=2, linecolor='white')
plt.title('Temperature × Catalyst Concentration Conversion Map', fontsize=14, fontweight='bold')
plt.ylabel('Temperature (°C)', fontsize=12)
plt.xlabel('Catalyst Concentration (M)', fontsize=12)
plt.tight_layout()
plt.savefig('catalyst_activity_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()
print("\n=== Case Study Summary ===")
print("✅ 2^4 full factorial design evaluated 4 factors in 16 runs")
print("✅ Catalyst concentration is the most important factor (contribution ratio ~50%)")
print("✅ Optimal conditions: 80°C, pH 7.0, 3 hours, 0.5 M")
print(f"✅ Conversion at optimal conditions: {optimal_activity:.1f}%")
print("✅ Recommended to conduct confirmation experiment to verify prediction accuracy")
Output example:
=== Contribution Ratio of Each Factor ===
Concentration: 52.38%
Time: 18.67%
pH: 14.29%
Temperature: 9.52%
Error: 5.14%
=== Main Effects Analysis ===
Temperature effect: 6.05%
pH effect: 7.98%
Time effect: 12.01%
Concentration effect: 20.12%
=== Optimal Conditions ===
Conditions to maximize conversion:
Temperature: 80°C
pH: 7.0
Reaction time: 3 hours
Catalyst concentration: 0.5 M
Predicted conversion: 85.5%
=== Case Study Summary ===
✅ 2^4 full factorial design evaluated 4 factors in 16 runs
✅ Catalyst concentration is the most important factor (contribution ratio ~50%)
✅ Optimal conditions: 80°C, pH 7.0, 3 hours, 0.5 M
✅ Conversion at optimal conditions: 85.5%
✅ Recommended to conduct confirmation experiment to verify prediction accuracy
Interpretation: The 4-factor full factorial experiment revealed that catalyst concentration has the greatest impact on conversion rate. Approximately 85.5% conversion rate can be expected at the optimal conditions (80°C, pH 7.0, 3 hours, 0.5 M).
2.7 Chapter Summary
What We Learned
- Full Factorial Design
- Evaluate all combinations of factor levels (2^k experiments)
- Can completely evaluate main effects and interactions
- Optimal for small-scale experiments with 3-4 factors
- Fractional Factorial Design
- Reduce number of experiments with 2^(k-p) design (50-75% reduction)
- Some interactions cannot be evaluated due to confounding
- Effective for screening experiments
- Analysis of Variance (ANOVA)
- One-way ANOVA: Compare levels of 1 factor
- Two-way ANOVA: Main effects and interaction of 2 factors
- Evaluate factor significance using F-test
- Multiple Comparison Test
- Identify significant group differences with Tukey HSD test
- Conducted as post-hoc test after ANOVA
- Performance ranking through grouping
- Visualization of Variance Components
- Evaluate factor importance using contribution ratios
- Visual understanding through pie charts and bar graphs
- Calculate proportion of explainable variation
Key Points
Full factorial design evaluates all effects of k factors in 2^k runs, while fractional design reduces the number of experiments but introduces confounding. Factor significance is statistically determined using the F-test (significant when p<0.05), and the Tukey HSD test specifically identifies which pairs of groups differ. Contribution ratios quantitatively evaluate the relative importance of factors, and optimal conditions are the combination of levels that maximize main effects. Verifying prediction accuracy through confirmation experiments remains an important final step.
To the Next Chapter
In Chapter 3, we will learn Response Surface Methodology (RSM), covering Central Composite Design (CCD), Box-Behnken design and its applications, and fitting second-order polynomial models. You will master creating 3D response surface plots and contour diagrams, performing optimal condition searches using scipy.optimize, and validating models with R² and RMSE metrics. The chapter concludes with a case study on distillation column operating condition optimization.