Chapter 4: Practical AutoML Tools
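This chapter covers the major AutoML tools in practice: TPOT, Auto-sklearn, H2O AutoML, and several alternatives. You will see how each tool searches for models, how to run it, and how to choose between the tools and deploy the result.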
Learning Objectives:
- Understand TPOT's genetic programming approach
- Master Auto-sklearn's Bayesian optimization and meta-learning
- Build stacked ensembles with H2O AutoML
- Understand characteristics and use cases of each AutoML tool
- Learn deployment strategies for production environments
Reading Time: 40-45 minutes
4.1 TPOT (Tree-based Pipeline Optimization Tool)
4.1.1 Overview of TPOT
What is TPOT:
An AutoML tool that automatically optimizes entire scikit-learn pipelines using genetic programming.
Developer: University of Pennsylvania (Moore Lab)
Features:
- Exploration using genetic algorithms
- Full automation from preprocessing to model selection
- Fully compatible with scikit-learn
- Generated pipeline code can be exported as Python code
4.1.2 Genetic Programming Approach
Genetic Algorithm Flow:
1. Initial population generation (create random pipelines)
2. Evaluation (cross-validation score)
3. Selection (choose top individuals)
4. Crossover (combine pipelines)
5. Mutation (random changes)
6. Next generation
7. Repeat steps 2-6 for specified number of generations
Pipeline Representation:
# Genotype (tree structure)
Pipeline(
SelectKBest(k=10),
StandardScaler(),
RandomForestClassifier(n_estimators=100)
)
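The flow above can be sketched in plain Python. This is a toy illustration rather than TPOT's implementation: the pipeline is reduced to a three-gene dictionary, and the search space `SPACE` and the `fitness` function are made-up stand-ins for a real cross-validation score. TPOT itself evolves tree-structured scikit-learn pipelines.

```python
import random

# Hypothetical search space: one choice per pipeline stage.
SPACE = {
    "selector_k": [2, 5, 10, 20],
    "scaler": ["none", "standard", "minmax"],
    "n_estimators": [50, 100, 200, 400],
}
GENES = list(SPACE)


def fitness(ind):
    """Stand-in for a cross-validation score (higher is better)."""
    score = 0.70
    score += 0.10 if ind["selector_k"] == 10 else 0.0
    score += 0.05 if ind["scaler"] == "standard" else 0.0
    score += 0.10 if ind["n_estimators"] == 200 else 0.0
    return score


def random_individual(rng):
    return {g: rng.choice(SPACE[g]) for g in GENES}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {g: (a if rng.random() < 0.5 else b)[g] for g in GENES}


def mutate(ind, rng, rate=0.2):
    # Each gene is replaced by a random value with probability `rate`.
    return {g: (rng.choice(SPACE[g]) if rng.random() < rate else v)
            for g, v in ind.items()}


def evolve(generations=10, population_size=20, n_parents=5, seed=42):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    history = []  # best score per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)  # evaluation
        parents = ranked[:n_parents]                            # selection
        history.append(fitness(parents[0]))
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - n_parents)
        ]
        population = parents + children  # parents survive (elitism)
    return max(population, key=fitness), history


best, history = evolve()
print(best, round(fitness(best), 2))
```

Because the parents are carried over unchanged, the best score in `history` never decreases from one generation to the next, which mirrors the non-decreasing "best internal CV score" that TPOT reports during a run.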
4.1.3 Basic Usage of TPOT
Example 1: Basic Classification Example
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - tpot, scikit-learn
"""
Example 1: Basic Classification Example
Purpose: Run a small TPOT search on the Iris dataset and evaluate the best pipeline
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: tpot, scikit-learn
"""
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
# Prepare dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
# Create TPOTClassifier
tpot = TPOTClassifier(
generations=5, # Number of evolutionary generations
population_size=20, # Number of individuals per generation
cv=5, # Number of cross-validation folds
random_state=42,
verbosity=2, # Progress display level
n_jobs=-1 # Parallel processing
)
# Training (takes a few minutes)
tpot.fit(X_train, y_train)
# Evaluation
print(f'Test Accuracy: {tpot.score(X_test, y_test):.4f}')
# Save optimal pipeline as Python code
tpot.export('tpot_iris_pipeline.py')
Output Example:
Generation 1 - Current best internal CV score: 0.9666666666666667
Generation 2 - Current best internal CV score: 0.975
Generation 3 - Current best internal CV score: 0.975
Generation 4 - Current best internal CV score: 0.9833333333333333
Generation 5 - Current best internal CV score: 0.9833333333333333
Best pipeline: RandomForestClassifier(SelectKBest(input_matrix, k=2),
bootstrap=True, n_estimators=100)
Test Accuracy: 1.0000
4.1.4 Customizing TPOT Configuration
Example 2: Custom TPOT Configuration
from tpot import TPOTClassifier
# Create TPOT with custom configuration
tpot_config = {
'sklearn.ensemble.RandomForestClassifier': {
'n_estimators': [50, 100, 200],
'max_features': ['sqrt', 'log2', None],
'min_samples_split': [2, 5, 10]
},
'sklearn.svm.SVC': {
'C': [0.1, 1.0, 10.0],
'kernel': ['linear', 'rbf'],
'gamma': ['scale', 'auto']
},
'sklearn.preprocessing.StandardScaler': {},
'sklearn.feature_selection.SelectKBest': {
'k': range(1, 11)
}
}
tpot = TPOTClassifier(
config_dict=tpot_config,
generations=10,
population_size=50,
cv=5,
scoring='f1_weighted', # Change evaluation metric to F1 score
max_time_mins=30, # Maximum execution time 30 minutes
random_state=42,
verbosity=2
)
tpot.fit(X_train, y_train)
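When writing a custom configuration, it can help to check how many settings each operator contributes before choosing `generations` and `population_size`. The sketch below copies the dictionary from Example 2 so that it runs on its own; `count_settings` is a hypothetical helper, not part of TPOT.

```python
from math import prod

# Same structure as the tpot_config dictionary in Example 2.
tpot_config = {
    "sklearn.ensemble.RandomForestClassifier": {
        "n_estimators": [50, 100, 200],
        "max_features": ["sqrt", "log2", None],
        "min_samples_split": [2, 5, 10],
    },
    "sklearn.svm.SVC": {
        "C": [0.1, 1.0, 10.0],
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", "auto"],
    },
    "sklearn.preprocessing.StandardScaler": {},
    "sklearn.feature_selection.SelectKBest": {"k": range(1, 11)},
}


def count_settings(params):
    """Number of hyperparameter combinations for one operator."""
    # prod() of an empty sequence is 1: an operator without options has one setting.
    return prod(len(values) for values in params.values())


sizes = {name: count_settings(params) for name, params in tpot_config.items()}
for name, size in sizes.items():
    print(f"{name}: {size} settings")
```

These are per-operator counts only. The number of possible pipelines is larger, because TPOT also combines operators into pipelines of varying length.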
4.1.5 Regression Example
Example 3: Using TPOT for Regression
from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
# Generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20,
n_informative=15, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# TPOTRegressor
tpot_reg = TPOTRegressor(
generations=5,
population_size=20,
cv=5,
scoring='neg_mean_squared_error', # Minimize MSE
random_state=42,
verbosity=2
)
tpot_reg.fit(X_train, y_train)
# Evaluation
from sklearn.metrics import mean_squared_error, r2_score
y_pred = tpot_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Test MSE: {mse:.4f}')
print(f'Test R²: {r2:.4f}')
# Save pipeline
tpot_reg.export('tpot_regression_pipeline.py')
Example of Exported Code:
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - scikit-learn
"""
Example of Exported Code
Purpose: Show the kind of standalone scikit-learn script that TPOT exports
Target: Advanced
Execution time: 1-5 minutes
Dependencies: scikit-learn
"""
# tpot_regression_pipeline.py
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Generated pipeline
exported_pipeline = make_pipeline(
StandardScaler(),
GradientBoostingRegressor(
alpha=0.9, learning_rate=0.1, loss="squared_error",
max_depth=3, n_estimators=100
)
)
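# Usage example: training_features, training_target, and testing_features
# are placeholders for your own data, which must be loaded before this point
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)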
4.2 Auto-sklearn
4.2.1 Overview of Auto-sklearn
What is Auto-sklearn:
An automated machine learning tool that combines Bayesian optimization, meta-learning, and ensemble construction.
Developer: University of Freiburg (Germany)
Key Technologies:
- Bayesian Optimization: SMAC (Sequential Model-based Algorithm Configuration)
- Meta-learning: Learn initial configurations from past tasks
- Ensemble Construction: Automatically combine multiple models
4.2.2 Bayesian Optimization and Meta-learning
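Bayesian Optimization Flow:
1. Evaluate models with the initial configurations
2. Fit a surrogate model to the results so far and use it to predict performance (SMAC uses a random forest as the surrogate rather than a Gaussian process, which copes well with categorical and conditional hyperparameters)
3. Determine the next search point using an acquisition function
4. Evaluate that configuration and update the surrogate model
5. Repeat steps 2-4 until the time budget is exhausted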
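The loop above can be illustrated with a deliberately simple sketch. Everything here is an assumption made for illustration: the one-dimensional `objective` is a made-up validation score, and the surrogate is a crude distance-weighted average with distance to the nearest observation as its uncertainty. Auto-sklearn's optimizer, SMAC, uses a far more capable surrogate model; only the evaluate, predict, select, update cycle is the same.

```python
import math


def objective(x):
    """Hypothetical validation score as a function of one hyperparameter in [0, 1]."""
    return math.exp(-(x - 0.3) ** 2 / 0.02) + 0.5 * math.exp(-(x - 0.8) ** 2 / 0.01)


def surrogate(x, observed):
    """Crude stand-in surrogate: predicted mean and an uncertainty estimate."""
    weights = [1.0 / (abs(x - xo) + 1e-6) for xo, _ in observed]
    mean = sum(w * y for w, (_, y) in zip(weights, observed)) / sum(weights)
    uncertainty = min(abs(x - xo) for xo, _ in observed)
    return mean, uncertainty


def acquisition(x, observed, kappa=2.0):
    # Upper confidence bound: favor high predictions and unexplored regions.
    mean, uncertainty = surrogate(x, observed)
    return mean + kappa * uncertainty


def optimize(n_iterations=15, n_candidates=101):
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    # Step 1: evaluate the initial configurations.
    observed = [(0.0, objective(0.0)), (1.0, objective(1.0))]
    for _ in range(n_iterations):
        tried = {x for x, _ in observed}
        remaining = [c for c in candidates if c not in tried]
        # Steps 2-3: score candidates with the surrogate, pick the most promising.
        x_next = max(remaining, key=lambda c: acquisition(c, observed))
        # Step 4: run the real evaluation and add it to the observations.
        observed.append((x_next, objective(x_next)))
    return max(observed, key=lambda pair: pair[1]), observed


(best_x, best_y), observed = optimize()
print(f"best x = {best_x:.2f}, score = {best_y:.3f}, evaluations = {len(observed)}")
```

The `kappa` parameter controls the trade-off between exploiting configurations that are predicted to be good and exploring regions that have not been evaluated yet.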
Meta-learning:
Auto-sklearn infers good initial configurations for a new task from the best settings found on 140+ past datasets with similar characteristics:
Meta-knowledge base (140+ tasks)
↓
Similarity calculation (dataset features)
↓
Warm start with top 25 configurations
↓
Fine-tune with Bayesian optimization
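The warm-start step can be sketched as a nearest-neighbor lookup over dataset meta-features. The knowledge base, the three meta-features, and the stored configurations below are all hypothetical values chosen for illustration; a real system would use many more meta-features and normalize them before computing distances.

```python
import math

# Hypothetical meta-knowledge base: meta-features of past datasets and the
# configuration that performed best on each of them.
KNOWLEDGE_BASE = [
    {"name": "past_a",
     "meta": {"log_n_samples": 3.0, "log_n_features": 1.0, "class_entropy": 1.0},
     "best_config": {"classifier": "random_forest", "n_estimators": 200}},
    {"name": "past_b",
     "meta": {"log_n_samples": 5.0, "log_n_features": 2.0, "class_entropy": 0.5},
     "best_config": {"classifier": "gradient_boosting", "learning_rate": 0.1}},
    {"name": "past_c",
     "meta": {"log_n_samples": 2.0, "log_n_features": 3.0, "class_entropy": 1.5},
     "best_config": {"classifier": "svc", "C": 1.0}},
    {"name": "past_d",
     "meta": {"log_n_samples": 3.2, "log_n_features": 1.2, "class_entropy": 0.9},
     "best_config": {"classifier": "extra_trees", "n_estimators": 100}},
]


def distance(meta_a, meta_b):
    """Euclidean distance between two meta-feature dictionaries."""
    return math.sqrt(sum((meta_a[key] - meta_b[key]) ** 2 for key in meta_a))


def warm_start_configs(new_meta, knowledge_base, k=2):
    """Configurations of the k past datasets most similar to the new one."""
    ranked = sorted(knowledge_base, key=lambda entry: distance(new_meta, entry["meta"]))
    return [entry["best_config"] for entry in ranked[:k]]


new_dataset = {"log_n_samples": 3.1, "log_n_features": 1.1, "class_entropy": 1.0}
configs = warm_start_configs(new_dataset, KNOWLEDGE_BASE, k=2)
print(configs)
```

The returned configurations would be evaluated first, and the optimizer would then continue the search from those results instead of starting from scratch.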
4.2.3 Basic Usage of Auto-sklearn
Example 4: Auto-sklearn Classification
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - auto-sklearn, scikit-learn
"""
Example 4: Auto-sklearn Classification
Purpose: Train an Auto-sklearn classifier on the digits dataset and inspect the run statistics
Target: Beginner to Intermediate
Execution time: about 5 minutes (set by time_left_for_this_task)
Dependencies: auto-sklearn, scikit-learn
"""
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
# Prepare dataset
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
# Auto-sklearn classifier
automl = autosklearn.classification.AutoSklearnClassifier(
time_left_for_this_task=300, # Total execution time 5 minutes
per_run_time_limit=30, # 30 seconds per model
ensemble_size=50, # Ensemble size
ensemble_nbest=200, # Number of ensemble candidates
initial_configurations_via_metalearning=25, # Number of meta-learning initial configurations
seed=42
)
# Training
automl.fit(X_train, y_train)
# Prediction and evaluation
y_pred = automl.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test Accuracy: {accuracy:.4f}')
# Statistics of trained models
print(automl.sprint_statistics())
# Ensemble details
print(automl.show_models())
Output Example:
auto-sklearn results:
Dataset name: digits
Metric: accuracy
Best validation score: 0.9832
Number of target algorithm runs: 127
Number of successful target algorithm runs: 115
Number of crashed target algorithm runs: 8
Number of target algorithms that exceeded the time limit: 4
Number of target algorithms that exceeded the memory limit: 0
Test Accuracy: 0.9806
4.2.4 New Features in Auto-sklearn 2.0
Improvements in Auto-sklearn 2.0:
- Reduced execution time (50% reduction compared to previous version)
- Improved default settings
- Faster portfolio construction
- More efficient ensemble selection
Example 5: Using Auto-sklearn 2.0
from autosklearn.experimental.askl2 import AutoSklearn2Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Prepare data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
cancer.data, cancer.target, test_size=0.2, random_state=42
)
# Auto-sklearn 2.0 (faster)
automl2 = AutoSklearn2Classifier(
time_left_for_this_task=120, # 2 minutes
seed=42
)
automl2.fit(X_train, y_train)
# Evaluation
from sklearn.metrics import classification_report
y_pred = automl2.predict(X_test)
print(classification_report(y_test, y_pred))
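# Per-configuration search results (same layout as scikit-learn's cv_results_)
cv_results = automl2.cv_results_
# Members of the final ensemble as (weight, model) pairs
print(f"Ensemble members: {automl2.get_models_with_weights()}")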
4.2.5 Custom Settings and Constraints
Example 6: Restricting Model Candidates
import autosklearn.classification
# Restrict algorithms to use
automl_custom = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    include={
        'classifier': ['random_forest', 'gradient_boosting', 'extra_trees'],
        'feature_preprocessor': ['no_preprocessing', 'pca', 'select_percentile']
    },
    # No exclude entry is needed for classifiers: anything not listed in
    # include (such as k_nearest_neighbors) is already left out, and
    # include and exclude should not be given for the same step
    seed=42
)
automl_custom.fit(X_train, y_train)
4.3 H2O AutoML
4.3.1 Overview of H2O AutoML
What is H2O.ai:
An open-source distributed machine learning platform. Strong in large-scale data processing.
H2O AutoML Features:
- Automatic stacked ensemble construction
- Leaderboard-style result display
- Support for large-scale data (distributed processing)
- Model explainability features (SHAP, PDP)
4.3.2 Basic Usage of H2O AutoML
Example 7: H2O AutoML Classification
# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0
# - h2o (needs a Java runtime), scikit-learn
"""
Example 7: H2O AutoML Classification
Purpose: Run H2O AutoML on the wine dataset and inspect the leaderboard
Target: Beginner to Intermediate
Execution time: about 5 minutes (set by max_runtime_secs)
Dependencies: h2o, scikit-learn
"""
import h2o
from h2o.automl import H2OAutoML
import pandas as pd
# Initialize H2O
h2o.init()
# Prepare dataset (convert from Pandas)
from sklearn.datasets import load_wine
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target
# Convert to H2O DataFrame
hf = h2o.H2OFrame(df)
hf['target'] = hf['target'].asfactor() # For classification task
# Train/test split
train, test = hf.split_frame(ratios=[0.8], seed=42)
# Run AutoML
aml = H2OAutoML(
    max_runtime_secs=300,            # Maximum execution time 5 minutes
    max_models=20,                   # Maximum number of models
    seed=42,
    sort_metric='logloss',           # Wine has three classes; AUC ranking is meant for binary targets
    exclude_algos=['DeepLearning']   # Exclude deep learning
)
# Training (target is response variable, rest are predictor variables)
x = hf.columns
x.remove('target')
y = 'target'
aml.train(x=x, y=y, training_frame=train)  # H2OAutoML is trained with train(), not fit()
# Display leaderboard
lb = aml.leaderboard
print(lb.head(rows=10))
# Prediction with best model
best_model = aml.leader
preds = best_model.predict(test)
print(preds.head())
# Model performance
perf = best_model.model_performance(test)
print(perf)
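Leaderboard Output Example (illustrative; the auc column appears for binary targets, while a multiclass target such as wine is ranked by mean_per_class_error or logloss):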
model_id auc logloss
0 StackedEnsemble_AllModels_1_AutoML_1_20241021 0.998876 0.067234
1 StackedEnsemble_BestOfFamily_1_AutoML_1_20241021 0.997543 0.072156
2 GBM_1_AutoML_1_20241021_163045 0.996321 0.078432
3 XRT_1_AutoML_1_20241021_163012 0.995234 0.081245
4 DRF_1_AutoML_1_20241021_163001 0.993456 0.089321
4.3.3 Stacked Ensemble
H2O's Stacking Strategy:
Base Model Layer:
- GBM (multiple configurations)
- Random Forest
- XGBoost
- GLM
- DeepLearning
↓ Meta-features
Meta-model Layer:
- GLM (regularized)
- GBM
↓
Final Prediction
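The key idea in the diagram is that the meta-model is trained on predictions the base models made for rows they did not see during training. The following sketch shows that mechanism on a toy regression problem in plain Python. It is an illustration under simplifying assumptions, not H2O's implementation: the two base learners, the synthetic data, and the least-squares meta-learner are all stand-ins.

```python
import random


def fit_linear(xs, ys):
    """Base learner 1: simple least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(xs, ys, k=3):
    """Base learner 2: average of the k nearest training targets."""
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


BASE_LEARNERS = [fit_linear, fit_knn]


def out_of_fold_predictions(xs, ys, n_folds=5):
    """Meta-features: each row holds predictions made without seeing that row."""
    meta = [[0.0] * len(BASE_LEARNERS) for _ in xs]
    for fold in range(n_folds):
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        for j, learner in enumerate(BASE_LEARNERS):
            model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            for i in test_idx:
                meta[i][j] = model(xs[i])
    return meta


def fit_meta_weights(meta, ys):
    """Least-squares weights (no intercept) via the 2x2 normal equations."""
    a11 = sum(m[0] * m[0] for m in meta)
    a12 = sum(m[0] * m[1] for m in meta)
    a22 = sum(m[1] * m[1] for m in meta)
    b1 = sum(m[0] * y for m, y in zip(meta, ys))
    b2 = sum(m[1] * y for m, y in zip(meta, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


rng = random.Random(42)
xs = [i / 10 for i in range(60)]
# Synthetic target: linear trend plus a local sawtooth pattern plus noise.
ys = [2.0 * x + 3.0 * ((x % 2) - 1) + rng.gauss(0, 0.3) for x in xs]

meta = out_of_fold_predictions(xs, ys)
w_linear, w_knn = fit_meta_weights(meta, ys)
base_models = [learner(xs, ys) for learner in BASE_LEARNERS]  # refit on all rows


def stacked_predict(x):
    return w_linear * base_models[0](x) + w_knn * base_models[1](x)


print(f"weights: linear={w_linear:.2f}, knn={w_knn:.2f}")
print(f"prediction at x=2.5: {stacked_predict(2.5):.2f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from simply rewarding whichever base model overfits the training data most.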
Example 8: Custom Stacked Ensemble
from h2o.estimators import H2OGradientBoostingEstimator, H2ORandomForestEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
# Stacking needs cross-validated predictions from every base model, so all
# base models use the same folds and keep their holdout predictions
cv_args = dict(
    nfolds=5,
    fold_assignment='Modulo',
    keep_cross_validation_predictions=True,
    seed=42
)
# Base model 1: GBM
gbm = H2OGradientBoostingEstimator(
    ntrees=50,
    max_depth=5,
    learn_rate=0.1,
    model_id='gbm_base',
    **cv_args
)
gbm.train(x=x, y=y, training_frame=train)
# Base model 2: Random Forest
rf = H2ORandomForestEstimator(
    ntrees=50,
    max_depth=10,
    model_id='rf_base',
    **cv_args
)
rf.train(x=x, y=y, training_frame=train)
# Build stacked ensemble
ensemble = H2OStackedEnsembleEstimator(
    base_models=[gbm, rf],
    metalearner_algorithm='gbm',
    seed=42
)
ensemble.train(x=x, y=y, training_frame=train)
# Evaluation (logloss suits the three-class wine target; auc() is for binary targets)
ensemble_perf = ensemble.model_performance(test)
print(f"Ensemble logloss: {ensemble_perf.logloss()}")
4.3.4 Model Explainability
Example 9: SHAP Values and PDP Visualization
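# The AutoML leader is often a StackedEnsemble; tree-based models have the
# most complete support for these explanations, so use the best GBM instead
tree_model = aml.get_best_model(algorithm='gbm')
# SHAP summary plot (H2O's SHAP contributions target regression and binary
# classification, so this call may fail on the three-class wine model)
tree_model.shap_summary_plot(test)
# Partial Dependence Plot (for a multiclass model, also pass targets=[...]
# with the class labels to plot)
tree_model.partial_plot(
    data=test,
    cols=['alcohol', 'flavanoids'],  # Feature names
    plot=True
)
# Variable importance
varimp = tree_model.varimp(use_pandas=True)
print(varimp.head(10))
# Feature interaction (available for GBM and XGBoost models)
tree_model.feature_interaction(max_interaction_depth=2)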
4.4 Other AutoML Tools
4.4.1 Google AutoML
Features:
- Managed service on Google Cloud Platform
- Uses Neural Architecture Search (NAS)
- Supports image, text, and tabular data
- Enterprise-grade scalability
Main Products:
- AutoML Tables (tabular data)
- AutoML Vision (image classification)
- AutoML Natural Language (text classification)
- Vertex AI (unified platform; the AutoML products above are now offered through it)
4.4.2 Azure AutoML
Features:
- Integrated into Azure Machine Learning Studio
- No-code UI plus a Python library
- Rich model explainability features
- MLOps pipeline integration
4.4.3 PyCaret
What is PyCaret:
A low-code machine learning library in Python. It can run an AutoML-style workflow (model comparison, tuning, ensembling) in just a few lines of code.
Example 10: PyCaret Usage Example
# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0
# - pycaret, scikit-learn
"""
Example 10: PyCaret Usage Example
Purpose: Compare, tune, ensemble, and save classifiers on the Iris dataset
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: pycaret, scikit-learn
"""
from pycaret.classification import *
import pandas as pd
from sklearn.datasets import load_iris
# Prepare dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
# PyCaret environment setup
clf_setup = setup(
data=df,
target='target',
train_size=0.8,
session_id=42,
verbose=False
)
# Compare all models (automatic)
best_models = compare_models(n_select=3) # Top 3 models
# Detailed evaluation of best model
best = best_models[0]
evaluate_model(best)
# Hyperparameter tuning
tuned_best = tune_model(best, n_iter=50)
# Ensemble
bagged = ensemble_model(tuned_best, method='Bagging')
boosted = ensemble_model(tuned_best, method='Boosting')
# Stacking
stacked = stack_models(estimator_list=best_models[:3])
# Save model
save_model(stacked, 'pycaret_final_model')
# Predict (the training frame is reused here for illustration; pass unseen data in practice)
predictions = predict_model(stacked, data=df)
print(predictions.head())
4.4.4 Ludwig
What is Ludwig:
A declarative, low-code deep learning toolbox originally developed by Uber. Models are defined in YAML configuration files rather than in code.
Features:
- Declarative model definition (YAML-based)
- Support for diverse data types (mixed image, text, and tabular data)
- Built-in AutoML mode
- Transfer learning support
4.4.5 AutoML Tool Comparison Table
| Tool | Optimization Method | Execution Speed | Scalability | Explainability | Learning Curve | Best Use Case |
|---|---|---|---|---|---|---|
| TPOT | Genetic Programming | Medium | Medium | High (code export) | Low | Medium-scale data, pipeline automation |
| Auto-sklearn | Bayesian Optimization + Meta-learning | Medium-High | Medium | Medium | Low | Academic research, benchmarking |
| H2O AutoML | Random Grid Search + Stacked Ensembles | High | High | High (SHAP integration) | Medium | Large-scale data, production |
| PyCaret | Combination of multiple methods | High | Medium | High | Very Low | Rapid prototyping |
| Google AutoML | NAS (Neural Architecture Search) | High | Very High | Medium | Low | Cloud-based large-scale tasks |
| Azure AutoML | Hybrid of multiple methods | High | High | Very High | Low | Enterprise MLOps |
| Ludwig | Hyperparameter search | Medium | Medium | Medium | Medium | Multimodal deep learning |
4.5 AutoML Best Practices
4.5.1 Tool Selection Criteria
Selection by Data Size:
- Small scale (<10,000 samples): TPOT, Auto-sklearn
- Medium scale (10,000-1,000,000): H2O AutoML, PyCaret
- Large scale (>1,000,000): H2O AutoML (distributed mode), Google/Azure AutoML
Selection by Task Type:
- Tabular data: TPOT, Auto-sklearn, H2O, PyCaret
- Image/Text: Google AutoML, Ludwig
- Time series: PyCaret (dedicated time-series module); Auto-sklearn and H2O only after you engineer lag features yourself
- Multimodal: Ludwig
Execution Time Constraints:
- Short time (<10 minutes): PyCaret, Auto-sklearn 2.0
- Medium time (10 minutes to 1 hour): TPOT, H2O AutoML
- Long time OK (>1 hour): All possible (deeper exploration)
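The size and time guidelines above can be combined into a small lookup. This is a hypothetical helper written for this chapter, not part of any AutoML library; it simply counts how many of the two criteria recommend each tool, and the thresholds are the ones from the lists above.

```python
from collections import Counter

ALL_TOOLS = ["TPOT", "Auto-sklearn", "H2O AutoML", "PyCaret",
             "Google AutoML", "Azure AutoML", "Ludwig"]


def tools_by_size(n_samples):
    """Tools suggested by the data-size guideline."""
    if n_samples < 10_000:
        return ["TPOT", "Auto-sklearn"]
    if n_samples <= 1_000_000:
        return ["H2O AutoML", "PyCaret"]
    return ["H2O AutoML", "Google AutoML", "Azure AutoML"]


def tools_by_time(minutes):
    """Tools suggested by the execution-time guideline."""
    if minutes < 10:
        return ["PyCaret", "Auto-sklearn"]  # the guideline names Auto-sklearn 2.0
    if minutes <= 60:
        return ["TPOT", "H2O AutoML"]
    return list(ALL_TOOLS)  # with more than an hour, every tool is an option


def suggest_tools(n_samples, minutes):
    """Tools ordered by how many guidelines recommend them, then by name."""
    votes = Counter(tools_by_size(n_samples) + tools_by_time(minutes))
    return sorted(votes, key=lambda tool: (-votes[tool], tool))


print(suggest_tools(n_samples=5_000, minutes=5))
print(suggest_tools(n_samples=5_000_000, minutes=120))
```

Treat the result as a shortlist to benchmark rather than a final decision, since task type and deployment requirements also matter.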
4.5.2 Customization vs Full Automation
When Full Automation is Suitable:
- Creating initial baseline
- Limited domain knowledge
- Rapid prototyping
- Batch processing of multiple datasets
When Customization is Necessary:
- Domain-specific preprocessing needed
- Want to restrict to specific model families
- Using custom evaluation metrics
- Interpretability is top priority
Hybrid Approach:
# 1. Create baseline with AutoML
tpot.fit(X_train, y_train)
baseline_score = tpot.score(X_test, y_test)
# 2. Start from the best pipeline TPOT found (the same pipeline that
#    tpot.export() writes to a file); clone it to get an unfitted copy
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
best_pipeline = clone(tpot.fitted_pipeline_)
# 3. Add domain knowledge as a new first step
def domain_specific_transform(X):
    # Custom transformation (identity placeholder)
    return X
pipeline = Pipeline(
    [('domain_transform', FunctionTransformer(domain_specific_transform))]
    + best_pipeline.steps
)
# 4. Re-evaluate
pipeline.fit(X_train, y_train)
improved_score = pipeline.score(X_test, y_test)
print(f'Baseline: {baseline_score:.4f}, Improved: {improved_score:.4f}')
4.5.3 Deployment to Production Environment
Considerations for Deployment:
- Model Size and Inference Speed
  - Ensemble models are accurate but large and slow at inference
  - Select the model based on inference speed requirements
- Dependency Management
  - Include the AutoML tool's dependency libraries in the production environment
  - Docker containerization is recommended
- Version Control
  - Version both the model and the pipeline
  - Use MLOps tools such as MLflow or DVC
- Monitoring
  - Data drift detection
  - Model performance tracking
  - Retraining triggers
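As one possible way to implement the data drift check from the monitoring item, the sketch below computes the Population Stability Index (PSI) for a single numeric feature. PSI is a technique chosen here for illustration, not something the tools in this chapter prescribe. The synthetic samples are assumptions, and the thresholds of roughly 0.1 (watch) and 0.25 (investigate or retrain) are a common rule of thumb rather than a fixed standard.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference sample (training data) and a current sample."""
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins or 1.0  # guard against a constant feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for value in values:
            index = int((value - low) / width)
            counts[min(max(index, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


rng = random.Random(0)
training_feature = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_distribution = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]

print(f"PSI, no drift: {population_stability_index(training_feature, same_distribution):.3f}")
print(f"PSI, shifted:  {population_stability_index(training_feature, shifted):.3f}")
```

In a monitoring job, this value would be computed per feature on a schedule, and a PSI above the chosen threshold could act as one of the retraining triggers.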
Deployment Example (Flask API):
# Requirements:
# - Python 3.9+
# - flask>=2.3.0
# - joblib>=1.3.0
# - numpy>=1.24.0, <2.0.0
"""
Deployment Example (Flask API)
Purpose: Serve predictions from a saved pipeline over HTTP
Target: Intermediate
Execution time: ~5 seconds to start
Dependencies: flask, joblib, numpy, scikit-learn
"""
# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np
app = Flask(__name__)
# Load the fitted pipeline, saved beforehand with
# joblib.dump(tpot.fitted_pipeline_, 'tpot_model.pkl')
model = joblib.load('tpot_model.pkl')
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    response = {'prediction': int(prediction[0])}
    # Not every pipeline exposes class probabilities
    if hasattr(model, 'predict_proba'):
        response['probability'] = model.predict_proba(features)[0].tolist()
    return jsonify(response)
if __name__ == '__main__':
    # Development server only; run behind a production WSGI server when deployed
    app.run(host='0.0.0.0', port=5000)
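A prediction endpoint should reject malformed requests before they reach the model. The following is a minimal sketch of such a check; `parse_features` is a hypothetical helper, and the expected feature count of 4 matches the Iris examples in this chapter.

```python
import math


def parse_features(payload, n_features):
    """Return the features as a list of floats, or raise ValueError with a client-facing message."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError("JSON body must be an object with a 'features' key")
    features = payload["features"]
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    values = []
    for value in features:
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("'features' must contain only numbers")
        if not math.isfinite(value):
            raise ValueError("'features' must not contain NaN or infinity")
        values.append(float(value))
    return values


print(parse_features({"features": [5.1, 3.5, 1.4, 0.2]}, n_features=4))
try:
    parse_features({"features": [5.1, "abc", 1.4, 0.2]}, n_features=4)
except ValueError as error:
    print(f"rejected: {error}")
```

Inside the Flask handler, the `ValueError` could be caught and turned into a 400 response with the message in the JSON body, so that clients get a clear error instead of a 500.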
4.5.4 Cost and Time Management
Computational Cost Reduction Strategies:
- Early Stopping:
  - Terminate early when no improvement is seen
  - Set the max_time_mins and max_models parameters
- Parallel Processing:
  - Use all CPU cores with n_jobs=-1
  - For cloud, select appropriate instance type
- Data Sampling:
  - Initial exploration with small sample
  - Retrain with full data once promising configuration is found
- Staged Approach:
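from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
# Hold out a validation set, then take a 20% sample for fast exploration
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)
X_train_sample, _, y_train_sample, _ = train_test_split(
    X_fit, y_fit, train_size=0.2, random_state=42
)
# Stage 1: Fast exploration (10 minutes)
quick_automl = TPOTClassifier(
    generations=3,
    population_size=10,
    max_time_mins=10
)
quick_automl.fit(X_train_sample, y_train_sample)
# Stage 2: Detailed exploration (1 hour)
if quick_automl.score(X_val, y_val) > 0.85:  # Only if threshold exceeded
    deep_automl = TPOTClassifier(
        generations=20,
        population_size=50,
        max_time_mins=60
    )
    deep_automl.fit(X_train, y_train)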
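The early-stopping strategy from the list above comes down to a patience rule: stop once the best score has not improved for a set number of rounds. The sketch below is a generic illustration with a hypothetical `should_stop` helper. The first score history is taken from the TPOT output in Example 1, and the second is a made-up history that stalls.

```python
def should_stop(score_history, patience=3, min_delta=0.0):
    """True when the last `patience` rounds did not beat the earlier best by more than min_delta."""
    if len(score_history) <= patience:
        return False  # not enough rounds to judge yet
    best_before = max(score_history[:-patience])
    best_recent = max(score_history[-patience:])
    return best_recent <= best_before + min_delta


improving = [0.9667, 0.9750, 0.9750, 0.9833, 0.9833]  # scores from Example 1
stalled = [0.9667, 0.9750, 0.9750, 0.9750, 0.9750]    # hypothetical stalled run

print(should_stop(improving))  # still improving within the last 3 rounds
print(should_stop(stalled))    # no gain in the last 3 rounds
```

A larger `patience` tolerates longer plateaus at the cost of more compute, and `min_delta` lets you ignore improvements too small to matter.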
Cloud Cost Management:
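- Spot/Preemptible Instances: Often much cheaper than on-demand pricing (savings of around 70% are commonly quoted), but they can be reclaimed at short notice, so save intermediate results during long AutoML runs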
- Auto Scaling: Use resources only when needed
- Budget Alerts: Avoid unexpected costs by setting limits
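A quick estimate makes these trade-offs concrete. The prices, instance count, and budget below are made-up numbers for illustration only, and `automl_run_cost` is a hypothetical helper; real cloud pricing varies by provider, region, and instance type.

```python
def automl_run_cost(hours, n_instances, hourly_price, discount=0.0):
    """Cost of one run; discount is the fractional saving of spot or preemptible pricing."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hours * n_instances * hourly_price * (1.0 - discount)


BUDGET = 1.00  # hypothetical spending limit per experiment

on_demand = automl_run_cost(hours=1.0, n_instances=4, hourly_price=0.40)
spot = automl_run_cost(hours=1.0, n_instances=4, hourly_price=0.40, discount=0.70)

for label, cost in [("on-demand", on_demand), ("spot", spot)]:
    status = "within budget" if cost <= BUDGET else "over budget"
    print(f"{label}: {cost:.2f} ({status})")
```

Running the same check before launching a long search is a simple way to apply the budget-alert idea at the experiment level.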
4.6 Summary
What We Learned
- TPOT:
  - Optimizes entire pipeline with genetic programming
  - High transparency with Python code export
  - Suitable for exploration on medium-scale data
- Auto-sklearn:
  - Efficient exploration with Bayesian optimization and meta-learning
  - Automatic ensemble construction
  - Widely used academically
- H2O AutoML:
  - Strong with large-scale data
  - Easy result comparison with leaderboard
  - Rich model explainability features
- Tool Selection Criteria:
  - Consider data size, task type, time constraints
  - Balance full automation and customization
  - Production requirements (speed, size, dependencies)
- Best Practices:
  - Reduce costs with staged approach
  - Design monitoring for deployment
  - Integration with MLOps tools
Next Steps
In Chapter 5, we will learn automated feature engineering with Featuretools:
- Theory of automatic feature generation
- Deep feature synthesis with Featuretools
- Automatic feature extraction for time series data
- Automation of feature selection
Exercises
Question 1: Explain the roles of "crossover" and "mutation" in TPOT's genetic programming approach, and describe how each contributes to pipeline optimization.
Question 2: Explain how Auto-sklearn's meta-learning solves the cold start problem. Also discuss situations where meta-learning might not be effective.
Question 3: Design an experiment to compare the performance of H2O AutoML's stacked ensemble versus single models. Describe what types of datasets would maximize the effectiveness of stacking.
Question 4: Select the optimal AutoML tool for the following scenarios and explain your reasoning:
(a) 10,000 samples of medical diagnostic data, high interpretability required
(b) 1 billion samples of click log data, inference speed is important
(c) Mixed image and text data, rapid prototyping
Question 5: List five major considerations when deploying AutoML models to production environments, and describe specific countermeasures for each (within 600 characters).
References
- Olson, R. S. et al. "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning." AutoML Workshop at ICML (2016).
- Feurer, M. et al. "Efficient and Robust Automated Machine Learning." NIPS (2015).
- LeDell, E. & Poirier, S. "H2O AutoML: Scalable Automatic Machine Learning." AutoML Workshop at ICML (2020).
- Hutter, F. et al. "Sequential Model-Based Optimization for General Algorithm Configuration." LION (2011).
- Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. (2022).
- Lundberg, S. M. & Lee, S.-I. "A Unified Approach to Interpreting Model Predictions." NIPS (2017).
- He, X. et al. "AutoML: A Survey of the State-of-the-Art." Knowledge-Based Systems (2021).
License: This content is provided under CC BY 4.0 license.