
Chapter 4: Practical AutoML Tools

Automated Machine Learning with TPOT, Auto-sklearn, and H2O AutoML

📖 Reading Time: 40-45 minutes 📊 Difficulty: Intermediate 💻 Code Examples: 10 📝 Exercises: 5

This chapter covers the practical use of three major AutoML tools: TPOT, Auto-sklearn, and H2O AutoML. It also surveys PyCaret, Google AutoML, Azure AutoML, and Ludwig, compares the tools side by side, and closes with best practices for tool selection, cost management, and production deployment.

Learning Objectives:

- Understand how TPOT (genetic programming), Auto-sklearn (Bayesian optimization plus meta-learning), and H2O AutoML (stacked ensembles) search for good pipelines
- Run, customize, and constrain each tool for classification and regression tasks
- Choose an AutoML tool based on data size, task type, and time constraints
- Deploy AutoML models to production and manage computational cost


4.1 TPOT (Tree-based Pipeline Optimization Tool)

4.1.1 Overview of TPOT

What is TPOT:
An AutoML tool that automatically optimizes entire scikit-learn pipelines using genetic programming.

Developer: University of Pennsylvania (Moore Lab)

Features:

- Optimizes the entire scikit-learn pipeline (preprocessing, feature selection, model, and hyperparameters) with genetic programming
- Exports the best pipeline as plain Python code, giving high transparency
- Well suited to exploratory work on small to medium-scale datasets

4.1.2 Genetic Programming Approach

Genetic Algorithm Flow:

1. Initial population generation (create random pipelines)
2. Evaluation (cross-validation score)
3. Selection (choose top individuals)
4. Crossover (combine pipelines)
5. Mutation (random changes)
6. Next generation
7. Repeat steps 2-6 for specified number of generations

Pipeline Representation:

# Genotype (tree structure), expressed as an equivalent scikit-learn pipeline
make_pipeline(
    SelectKBest(k=10),
    StandardScaler(),
    RandomForestClassifier(n_estimators=100)
)
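
To make the evolutionary loop above concrete, here is a minimal, self-contained sketch of a single generation over randomly assembled pipelines. It is illustrative only and does not reflect TPOT's internals: the pipeline components, population size, and the simplified mutation step are arbitrary choices.

# Minimal sketch of one evolutionary generation over pipelines (not TPOT's code)
import random
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def random_pipeline():
    # Randomly assemble a small pipeline (the "genotype")
    k = random.choice([1, 2, 3, 4])
    n_estimators = random.choice([50, 100, 200])
    return make_pipeline(
        SelectKBest(k=k),
        StandardScaler(),
        RandomForestClassifier(n_estimators=n_estimators, random_state=0),
    )

# 1. Initial population
population = [random_pipeline() for _ in range(10)]

# 2. Evaluation via cross-validation
scored = [(cross_val_score(p, X, y, cv=3).mean(), p) for p in population]

# 3. Selection: keep the top half
scored.sort(key=lambda t: t[0], reverse=True)
survivors = [p for _, p in scored[:5]]

# 4./5. Crossover and mutation are simplified here:
#       fill the rest of the next generation with new random pipelines
next_generation = survivors + [random_pipeline() for _ in range(5)]
print(f"Best CV score this generation: {scored[0][0]:.4f}")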

4.1.3 Basic Usage of TPOT

Example 1: Basic Classification Example

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example 1: Basic Classification Example

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

# Prepare dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create TPOTClassifier
tpot = TPOTClassifier(
    generations=5,        # Number of evolutionary generations
    population_size=20,   # Number of individuals per generation
    cv=5,                 # Number of cross-validation folds
    random_state=42,
    verbosity=2,          # Progress display level
    n_jobs=-1             # Parallel processing
)

# Training (takes a few minutes)
tpot.fit(X_train, y_train)

# Evaluation
print(f'Test Accuracy: {tpot.score(X_test, y_test):.4f}')

# Save optimal pipeline as Python code
tpot.export('tpot_iris_pipeline.py')

Output Example:

Generation 1 - Current best internal CV score: 0.9666666666666667
Generation 2 - Current best internal CV score: 0.975
Generation 3 - Current best internal CV score: 0.975
Generation 4 - Current best internal CV score: 0.9833333333333333
Generation 5 - Current best internal CV score: 0.9833333333333333

Best pipeline: RandomForestClassifier(SelectKBest(input_matrix, k=2),
                                      bootstrap=True, n_estimators=100)
Test Accuracy: 1.0000

4.1.4 Customizing TPOT Configuration

Example 2: Custom TPOT Configuration

from tpot import TPOTClassifier

# Create TPOT with custom configuration
tpot_config = {
    'sklearn.ensemble.RandomForestClassifier': {
        'n_estimators': [50, 100, 200],
        'max_features': ['sqrt', 'log2', None],
        'min_samples_split': [2, 5, 10]
    },
    'sklearn.svm.SVC': {
        'C': [0.1, 1.0, 10.0],
        'kernel': ['linear', 'rbf'],
        'gamma': ['scale', 'auto']
    },
    'sklearn.preprocessing.StandardScaler': {},
    'sklearn.feature_selection.SelectKBest': {
        'k': range(1, 11)
    }
}

tpot = TPOTClassifier(
    config_dict=tpot_config,
    generations=10,
    population_size=50,
    cv=5,
    scoring='f1_weighted',  # Change evaluation metric to F1 score
    max_time_mins=30,       # Maximum execution time 30 minutes
    random_state=42,
    verbosity=2
)

tpot.fit(X_train, y_train)

4.1.5 Regression Example

Example 3: Using TPOT for Regression

from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20,
                       n_informative=15, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# TPOTRegressor
tpot_reg = TPOTRegressor(
    generations=5,
    population_size=20,
    cv=5,
    scoring='neg_mean_squared_error',  # Minimize MSE
    random_state=42,
    verbosity=2
)

tpot_reg.fit(X_train, y_train)

# Evaluation
from sklearn.metrics import mean_squared_error, r2_score
y_pred = tpot_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Test MSE: {mse:.4f}')
print(f'Test R²: {r2:.4f}')

# Save pipeline
tpot_reg.export('tpot_regression_pipeline.py')

Example of Exported Code:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example of Exported Code (tpot_regression_pipeline.py)

Purpose: Demonstrate machine learning model training and evaluation
Target: Advanced
Execution time: 1-5 minutes
Dependencies: None
"""

# tpot_regression_pipeline.py
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Generated pipeline
exported_pipeline = make_pipeline(
    StandardScaler(),
    GradientBoostingRegressor(
        alpha=0.9, learning_rate=0.1, loss="squared_error",
        max_depth=3, n_estimators=100
    )
)

# Usage example (replace training_features / training_target / testing_features
# with your own data split)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

4.2 Auto-sklearn

4.2.1 Overview of Auto-sklearn

What is Auto-sklearn:
An automated machine learning tool that combines Bayesian optimization, meta-learning, and ensemble construction.

Developer: University of Freiburg (Germany)

Key Technologies:

  1. Bayesian Optimization: SMAC (Sequential Model-based Algorithm Configuration)
  2. Meta-learning: Learn initial configurations from past tasks
  3. Ensemble Construction: Automatically combine multiple models
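
Of these, ensemble construction follows the greedy ensemble-selection idea of Caruana et al.: repeatedly add, with replacement, the model whose inclusion most improves the validation score of the averaged predictions. The sketch below is a toy version of that idea, not Auto-sklearn's implementation; the probability data at the end is fabricated purely for demonstration.

# Toy sketch of greedy ensemble selection (illustrative only)
import numpy as np

def greedy_ensemble(val_probas, y_val, ensemble_size=5):
    """val_probas: list of (n_samples, n_classes) probability arrays, one per model."""
    chosen = []
    for _ in range(ensemble_size):
        best_idx, best_acc = None, -1.0
        for i, proba in enumerate(val_probas):
            # Accuracy of the ensemble if model i were added (with replacement)
            avg = np.mean([val_probas[j] for j in chosen + [i]], axis=0)
            acc = (avg.argmax(axis=1) == y_val).mean()
            if acc > best_acc:
                best_idx, best_acc = i, acc
        chosen.append(best_idx)
    return chosen  # indices (with repetitions) define the ensemble weights

# Toy usage with fake probability predictions from three "models"
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=100)
val_probas = [np.column_stack([1 - p, p]) for p in
              (np.clip(y_val + rng.normal(0, s, 100), 0, 1) for s in (0.3, 0.4, 0.5))]
print(greedy_ensemble(val_probas, y_val))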

4.2.2 Bayesian Optimization and Meta-learning

Bayesian Optimization Flow:

1. Evaluate model with initial configuration
2. Predict performance using Gaussian process
3. Determine next search point using acquisition function
4. Evaluate and update Gaussian process
5. Repeat steps 2-4
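
The following is a toy, one-dimensional sketch of a single Bayesian-optimization step. It uses a Gaussian-process surrogate for illustration, whereas Auto-sklearn's SMAC uses a random-forest surrogate; the observed scores and the candidate grid are made up.

# Toy sketch of one Bayesian-optimization step for a single hyperparameter
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hyperparameter values already evaluated and their CV scores (assumed data)
X_obs = np.array([[-2.0], [0.0], [2.0]])   # e.g. log10(C) of an SVM
y_obs = np.array([0.82, 0.90, 0.87])       # CV accuracy

# 2. Fit a surrogate model that predicts performance
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# 3. Acquisition function (expected improvement) over a candidate grid
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
best = y_obs.max()
z = (mu - best) / np.maximum(sigma, 1e-9)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# 4. Next configuration to evaluate
next_config = candidates[np.argmax(ei)]
print(f"Next log10(C) to try: {next_config[0]:.3f}")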

Meta-learning:
Infer good initial configurations for similar tasks from optimal settings on 140+ past datasets

Meta-knowledge base (140+ tasks)
    ↓
Similarity calculation (dataset features)
    ↓
Warm start with top 25 configurations
    ↓
Fine-tune with Bayesian optimization
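
A minimal sketch of the warm-start idea: describe each dataset with a few simple meta-features and reuse the best-known configurations of the most similar past task. The meta-features, the contents of the meta-knowledge base, and the distance measure below are illustrative assumptions, not Auto-sklearn's actual meta-feature set.

# Minimal sketch of meta-learning warm starts (illustrative assumptions only)
import numpy as np
from sklearn.datasets import load_iris

# Hypothetical meta-knowledge base: meta-features -> best-known configurations
meta_base = {
    "taskA": {"metafeatures": np.array([1500, 20, 2]),   # n_samples, n_features, n_classes
              "best_configs": [{"model": "gradient_boosting", "lr": 0.1}]},
    "taskB": {"metafeatures": np.array([150, 4, 3]),
              "best_configs": [{"model": "random_forest", "n_estimators": 100}]},
}

def warm_start_configs(X, y):
    new_mf = np.array([X.shape[0], X.shape[1], len(np.unique(y))])
    # Similarity = Euclidean distance in meta-feature space (smaller is more similar)
    nearest = min(meta_base.values(),
                  key=lambda t: np.linalg.norm(t["metafeatures"] - new_mf))
    return nearest["best_configs"]

X, y = load_iris(return_X_y=True)
print(warm_start_configs(X, y))   # -> configurations from the most similar past task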

4.2.3 Basic Usage of Auto-sklearn

Example 4: Auto-sklearn Classification

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example 4: Auto-sklearn Classification

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Prepare dataset
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Auto-sklearn classifier
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # Total execution time 5 minutes
    per_run_time_limit=30,         # 30 seconds per model
    ensemble_size=50,              # Ensemble size
    ensemble_nbest=200,            # Number of ensemble candidates
    initial_configurations_via_metalearning=25,  # Number of meta-learning initial configurations
    seed=42
)

# Training
automl.fit(X_train, y_train)

# Prediction and evaluation
y_pred = automl.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test Accuracy: {accuracy:.4f}')

# Statistics of trained models
print(automl.sprint_statistics())

# Ensemble details
print(automl.show_models())

Output Example:

auto-sklearn results:
  Dataset name: digits
  Metric: accuracy
  Best validation score: 0.9832
  Number of target algorithm runs: 127
  Number of successful target algorithm runs: 115
  Number of crashed target algorithm runs: 8
  Number of target algorithms that exceeded the time limit: 4
  Number of target algorithms that exceeded the memory limit: 0

Test Accuracy: 0.9806

4.2.4 New Features in Auto-sklearn 2.0

Improvements in Auto-sklearn 2.0:

- Portfolio-based warm starting in place of meta-feature-based meta-learning
- Automatic selection of the model-selection strategy (holdout vs. cross-validation) and budget allocation
- Substantially faster search, especially under tight time budgets

Example 5: Using Auto-sklearn 2.0

from autosklearn.experimental.askl2 import AutoSklearn2Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Prepare data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Auto-sklearn 2.0 (faster)
automl2 = AutoSklearn2Classifier(
    time_left_for_this_task=120,  # 2 minutes
    seed=42
)

automl2.fit(X_train, y_train)

# Evaluation
from sklearn.metrics import classification_report
y_pred = automl2.predict(X_test)
print(classification_report(y_test, y_pred))

# Inspect search results and ensemble composition
cv_results = automl2.cv_results_
print(f"Configurations evaluated: {len(cv_results['params'])}")
print(f"Ensemble members and weights: {automl2.get_models_with_weights()}")

4.2.5 Custom Settings and Constraints

Example 6: Restricting Model Candidates

import autosklearn.classification

# Restrict algorithms to use
# Note: a given component type ('classifier', 'feature_preprocessor', ...) can be
# listed in include OR exclude, but not in both at the same time.
automl_custom = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    include={
        'classifier': ['random_forest', 'gradient_boosting', 'extra_trees'],
        'feature_preprocessor': ['no_preprocessing', 'pca', 'select_percentile']
    },
    seed=42
)

automl_custom.fit(X_train, y_train)

4.3 H2O AutoML

4.3.1 Overview of H2O AutoML

What is H2O.ai:
An open-source distributed machine learning platform. Strong in large-scale data processing.

H2O AutoML Features:

- Distributed training designed for large-scale data
- Leaderboard that ranks every trained model for easy comparison
- Automatic stacked ensembles (AllModels / BestOfFamily)
- Built-in explainability (variable importance, SHAP, partial dependence)

4.3.2 Basic Usage of H2O AutoML

Example 7: H2O AutoML Classification

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example 7: H2O AutoML Classification

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

import h2o
from h2o.automl import H2OAutoML
import pandas as pd

# Initialize H2O
h2o.init()

# Prepare dataset (convert from Pandas)
from sklearn.datasets import load_wine
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Convert to H2O DataFrame
hf = h2o.H2OFrame(df)
hf['target'] = hf['target'].asfactor()  # For classification task

# Train/test split
train, test = hf.split_frame(ratios=[0.8], seed=42)

# Run AutoML
aml = H2OAutoML(
    max_runtime_secs=300,      # Maximum execution time 5 minutes
    max_models=20,              # Maximum number of models
    seed=42,
    sort_metric='logloss',      # Sort metric (AUC applies to binary tasks; wine has 3 classes)
    exclude_algos=['DeepLearning']  # Exclude deep learning
)

# Training (target is response variable, rest are predictor variables)
x = hf.columns
x.remove('target')
y = 'target'

aml.train(x=x, y=y, training_frame=train)

# Display leaderboard
lb = aml.leaderboard
print(lb.head(rows=10))

# Prediction with best model
best_model = aml.leader
preds = best_model.predict(test)
print(preds.head())

# Model performance
perf = best_model.model_performance(test)
print(perf)

Leaderboard Output Example:

                                              model_id       auc   logloss
0  StackedEnsemble_AllModels_1_AutoML_1_20241021  0.998876  0.067234
1  StackedEnsemble_BestOfFamily_1_AutoML_1_20241021  0.997543  0.072156
2               GBM_1_AutoML_1_20241021_163045  0.996321  0.078432
3                XRT_1_AutoML_1_20241021_163012  0.995234  0.081245
4                DRF_1_AutoML_1_20241021_163001  0.993456  0.089321

4.3.3 Stacked Ensemble

H2O's Stacking Strategy:

Base Model Layer:
- GBM (multiple configurations)
- Random Forest
- XGBoost
- GLM
- DeepLearning

    ↓ Meta-features

Meta-model Layer:
- GLM (regularized)
- GBM

    ↓

Final Prediction

Example 8: Custom Stacked Ensemble

from h2o.estimators import H2OGradientBoostingEstimator, H2ORandomForestEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# Base model 1: GBM
gbm = H2OGradientBoostingEstimator(
    ntrees=50,
    max_depth=5,
    learn_rate=0.1,
    seed=42,
    model_id='gbm_base'
)
gbm.train(x=x, y=y, training_frame=train)

# Base model 2: Random Forest
rf = H2ORandomForestEstimator(
    ntrees=50,
    max_depth=10,
    seed=42,
    model_id='rf_base'
)
rf.train(x=x, y=y, training_frame=train)

# Build stacked ensemble
ensemble = H2OStackedEnsembleEstimator(
    base_models=[gbm, rf],
    metalearner_algorithm='gbm',
    seed=42
)
ensemble.train(x=x, y=y, training_frame=train)

# Evaluation
ensemble_perf = ensemble.model_performance(test)
print(f"Ensemble logloss: {ensemble_perf.logloss()}")  # logloss: wine is a 3-class task

4.3.4 Model Explainability

Example 9: SHAP Values and PDP Visualization

# SHAP values (assumes a tree-based model such as GBM/XGBoost; if aml.leader is a
# stacked ensemble, pick a GBM from the leaderboard instead)
shap_values = best_model.shap_summary_plot(test)

# Partial Dependence Plot
best_model.partial_plot(
    data=test,
    cols=['alcohol', 'flavanoids'],  # Feature names
    plot=True
)

# Variable importance
varimp = best_model.varimp(use_pandas=True)
print(varimp.head(10))

# Feature Interaction
best_model.feature_interaction(max_depth=2)

4.4 Other AutoML Tools

4.4.1 Google AutoML

Features:

- Fully managed, cloud-based AutoML centered on neural architecture search (NAS)
- Scales to very large datasets without managing infrastructure

Main Products:

- Vertex AI AutoML (tabular data, vision, natural language, translation, video)

4.4.2 Azure AutoML

Features:

- Hybrid search over multiple optimization methods, integrated with Azure Machine Learning
- Strong enterprise MLOps support (experiment tracking, deployment, monitoring)
- Extensive explainability and responsible-AI tooling

4.4.3 PyCaret

What is PyCaret:
A low-code machine learning library in Python. Can execute AutoML with just a few lines.

Example 10: PyCaret Usage Example

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example 10: PyCaret Usage Example

Purpose: Demonstrate automated model comparison, tuning, and ensembling with PyCaret
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

from pycaret.classification import *
import pandas as pd
from sklearn.datasets import load_iris

# Prepare dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# PyCaret environment setup
clf_setup = setup(
    data=df,
    target='target',
    train_size=0.8,
    session_id=42,
    verbose=False
)

# Compare all models (automatic)
best_models = compare_models(n_select=3)  # Top 3 models

# Detailed evaluation of best model
best = best_models[0]
evaluate_model(best)

# Hyperparameter tuning
tuned_best = tune_model(best, n_iter=50)

# Ensemble
bagged = ensemble_model(tuned_best, method='Bagging')
boosted = ensemble_model(tuned_best, method='Boosting')

# Stacking
stacked = stack_models(estimator_list=best_models[:3])

# Save model
save_model(stacked, 'pycaret_final_model')

# Predict on new data
predictions = predict_model(stacked, data=df)
print(predictions.head())

4.4.4 Ludwig

What is Ludwig:
A codeless deep learning toolbox developed by Uber. Build models with YAML configuration files.

Features:

- Declarative model definition via YAML configuration files (no code required)
- Supports multimodal inputs such as text, images, and tabular data in one model
- Built-in hyperparameter search for deep learning models

4.4.5 AutoML Tool Comparison Table

Tool          | Optimization Method                   | Execution Speed | Scalability | Explainability          | Learning Curve | Best Use Case
TPOT          | Genetic Programming                   | Medium          | Medium      | High (code export)      | Low            | Medium-scale data, pipeline automation
Auto-sklearn  | Bayesian Optimization + Meta-learning | Medium-High     | Medium      | Medium                  | Low            | Academic research, benchmarking
H2O AutoML    | Grid Search + Stacking                | High            | High        | High (SHAP integration) | Medium         | Large-scale data, production
PyCaret       | Combination of multiple methods       | High            | Medium      | High                    | Very Low       | Rapid prototyping
Google AutoML | NAS (Neural Architecture Search)      | High            | Very High   | Medium                  | Low            | Cloud-based large-scale tasks
Azure AutoML  | Hybrid of multiple methods            | High            | High        | Very High               | Low            | Enterprise MLOps
Ludwig        | Hyperparameter search                 | Medium          | Medium      | Medium                  | Medium         | Multimodal deep learning

4.5 AutoML Best Practices

4.5.1 Tool Selection Criteria

Selection by Data Size:

- Small to medium-scale data: TPOT, Auto-sklearn, or PyCaret
- Large-scale or distributed data: H2O AutoML, or cloud services such as Google AutoML and Azure AutoML

Selection by Task Type:

- Standard tabular classification or regression: any of the tools above
- Multimodal or deep-learning tasks (images, text): Ludwig or Google AutoML

Execution Time Constraints:

- Minutes (rapid prototyping): PyCaret or Auto-sklearn 2.0
- Hours (thorough search): TPOT, Auto-sklearn, or H2O AutoML with larger time budgets

4.5.2 Customization vs Full Automation

When Full Automation is Suitable:

- Building a quick baseline or prototype
- Standard tabular tasks where no domain-specific preprocessing is required

When Customization is Necessary:

- Domain knowledge must be injected into preprocessing or feature engineering
- Production constraints (inference latency, model size, dependencies) restrict the acceptable pipelines

Hybrid Approach:

# 1. Create baseline with AutoML
tpot.fit(X_train, y_train)
baseline_score = tpot.score(X_test, y_test)

# 2. Manually improve the exported pipeline
#    (the module name corresponds to the file written by tpot.export())
from tpot_exported_pipeline import exported_pipeline
pipeline = exported_pipeline

# 3. Add domain knowledge
from sklearn.preprocessing import FunctionTransformer

def domain_specific_transform(X):
    # Custom transformation
    return X

pipeline.steps.insert(
    0, ('domain_transform', FunctionTransformer(domain_specific_transform))
)

# 4. Re-evaluate
pipeline.fit(X_train, y_train)
improved_score = pipeline.score(X_test, y_test)
print(f'Baseline: {baseline_score:.4f}, Improved: {improved_score:.4f}')

4.5.3 Deployment to Production Environment

Considerations for Deployment:

  1. Model Size and Inference Speed

    • Ensemble models are high accuracy but heavy
    • Select model based on inference speed requirements
  2. Dependency Management

    • Include AutoML tool dependency libraries in production environment
    • Docker containerization recommended
  3. Version Control

    • Model and pipeline versioning
    • Use MLOps tools like MLflow, DVC
  4. Monitoring

    • Data drift detection
    • Model performance tracking
    • Set retraining triggers
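
Before the Flask example below can load 'tpot_model.pkl', the fitted pipeline has to be serialized once. A minimal sketch with joblib, assuming a fitted TPOT instance such as the one from Example 1 (the filename matches the API code below):

# Serialize the best pipeline found by TPOT so the API can load it later
import joblib

# tpot.fitted_pipeline_ holds the best scikit-learn pipeline found by tpot.fit(...)
joblib.dump(tpot.fitted_pipeline_, 'tpot_model.pkl')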

Deployment Example (Flask API):

# Requirements:
# - Python 3.9+
# - flask>=2.3.0
# - joblib>=1.3.0
# - numpy>=1.24.0, <2.0.0

"""
Deployment Example: Flask API

Purpose: Serve a saved AutoML pipeline as a REST prediction endpoint
Target: Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('tpot_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    probability = model.predict_proba(features)

    return jsonify({
        'prediction': int(prediction[0]),
        'probability': probability[0].tolist()
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
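
As a minimal illustration of the data-drift detection mentioned under Monitoring above, one simple approach is a per-feature two-sample Kolmogorov-Smirnov test between the training data and recent production data. The significance threshold and retraining rule below are assumptions, not fixed recommendations.

# Minimal data-drift check: compare each feature's production distribution
# against the training distribution with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(X_train, X_prod, alpha=0.05):
    # alpha=0.05 is an assumed threshold, tune it to your false-alarm tolerance
    drifted = []
    for j in range(X_train.shape[1]):
        stat, p_value = ks_2samp(X_train[:, j], X_prod[:, j])
        if p_value < alpha:
            drifted.append(j)
    return drifted  # indices of features whose distribution appears to have shifted

# Usage sketch: trigger retraining when a substantial share of features drift
# drifted = detect_drift(X_train, X_recent)
# if len(drifted) > X_train.shape[1] * 0.3:
#     trigger_retraining()   # hypothetical hook into your retraining pipeline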

4.5.4 Cost and Time Management

Computational Cost Reduction Strategies:

  1. Early Stopping:

    • Terminate early when no improvement is seen
    • Set max_time_mins, max_models parameters
  2. Parallel Processing:

    • Use all CPU cores with n_jobs=-1
    • For cloud, select appropriate instance type
  3. Data Sampling:

    • Initial exploration with small sample
    • Retrain with full data once promising configuration is found
  4. Staged Approach:

# Stage 1: Fast exploration (10 minutes) on a subsample
from sklearn.model_selection import train_test_split

# Carve out a validation set and a small training subsample (split sizes are illustrative)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
X_train_sample, _, y_train_sample, _ = train_test_split(X_tr, y_tr, train_size=0.3, random_state=42)

quick_automl = TPOTClassifier(
    generations=3,
    population_size=10,
    max_time_mins=10
)
quick_automl.fit(X_train_sample, y_train_sample)

# Stage 2: Detailed exploration (1 hour)
if quick_automl.score(X_val, y_val) > 0.85:  # Only if threshold exceeded
    deep_automl = TPOTClassifier(
        generations=20,
        population_size=50,
        max_time_mins=60
    )
    deep_automl.fit(X_train, y_train)

Cloud Cost Management:

- Always set hard caps (max_runtime_secs, max_models, max_time_mins) so a run cannot exceed its budget
- Run exploratory searches on spot/preemptible instances where possible
- Shut down clusters and instances (e.g., the H2O cluster) as soon as experiments finish


4.6 Summary

What We Learned

  1. TPOT:

    • Optimizes entire pipeline with genetic programming
    • High transparency with Python code export
    • Suitable for exploration on medium-scale data
  2. Auto-sklearn:

    • Efficient exploration with Bayesian optimization and meta-learning
    • Automatic ensemble construction
    • Widely used academically
  3. H2O AutoML:

    • Strong with large-scale data
    • Easy result comparison with leaderboard
    • Rich model explainability features
  4. Tool Selection Criteria:

    • Consider data size, task type, time constraints
    • Balance full automation and customization
    • Production requirements (speed, size, dependencies)
  5. Best Practices:

    • Reduce costs with staged approach
    • Design monitoring for deployment
    • Integration with MLOps tools

Next Steps

In Chapter 5, we will learn automated feature engineering with Featuretools.


Exercises

Question 1: Explain the roles of "crossover" and "mutation" in TPOT's genetic programming approach, and describe how each contributes to pipeline optimization.

Question 2: Explain how Auto-sklearn's meta-learning solves the cold start problem. Also discuss situations where meta-learning might not be effective.

Question 3: Design an experiment to compare the performance of H2O AutoML's stacked ensemble versus single models. Describe what types of datasets would maximize the effectiveness of stacking.

Question 4: Select the optimal AutoML tool for the following scenarios and explain your reasoning:
(a) 10,000 samples of medical diagnostic data, high interpretability required
(b) 1 billion samples of click log data, inference speed is important
(c) Mixed image and text data, rapid prototyping

Question 5: List five major considerations when deploying AutoML models to production environments, and describe specific countermeasures for each (within 600 characters).


References

  1. Olson, R. S. et al. "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning." AutoML Workshop at ICML (2016).
  2. Feurer, M. et al. "Efficient and Robust Automated Machine Learning." NIPS (2015).
  3. LeDell, E. & Poirier, S. "H2O AutoML: Scalable Automatic Machine Learning." AutoML Workshop at ICML (2020).
  4. Hutter, F. et al. "Sequential Model-Based Optimization for General Algorithm Configuration." LION (2011).
  5. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. (2022).
  6. Lundberg, S. M. & Lee, S.-I. "A Unified Approach to Interpreting Model Predictions." NIPS (2017).
  7. He, X. et al. "AutoML: A Survey of the State-of-the-Art." Knowledge-Based Systems (2021).

Next Chapter: Chapter 5: Automated Feature Engineering

License: This content is provided under CC BY 4.0 license.
