
Chapter 4: Practical AutoML Tools

Automated Machine Learning with TPOT, Auto-sklearn, and H2O AutoML

📖 Reading Time: 40-45 minutes 📊 Difficulty: Intermediate 💻 Code Examples: 10 📝 Exercises: 5

This chapter covers the practical use of three major AutoML tools: TPOT, Auto-sklearn, and H2O AutoML. It also surveys PyCaret, Google AutoML, Azure AutoML, and Ludwig, compares the tools side by side, and closes with best practices for tool selection, cost management, and production deployment.

Learning Objectives:

- Understand how TPOT (genetic programming), Auto-sklearn (Bayesian optimization plus meta-learning), and H2O AutoML (stacked ensembles) search for good pipelines
- Run, customize, and constrain each tool for classification and regression tasks
- Choose an AutoML tool based on data size, task type, and time constraints
- Deploy AutoML models to production and manage computational cost


4.1 TPOT (Tree-based Pipeline Optimization Tool)

4.1.1 Overview of TPOT

What is TPOT:
An AutoML tool that automatically optimizes entire scikit-learn pipelines using genetic programming.

Developer: University of Pennsylvania (Moore Lab)

Features:

- Optimizes the entire scikit-learn pipeline (preprocessing, feature selection, model, and hyperparameters) with genetic programming
- Exports the best pipeline as plain Python code, giving high transparency
- Well suited to exploratory work on small to medium-scale datasets

4.1.2 Genetic Programming Approach

Genetic Algorithm Flow:

1. Initial population generation (create random pipelines)
2. Evaluation (cross-validation score)
3. Selection (choose top individuals)
4. Crossover (combine pipelines)
5. Mutation (random changes)
6. Next generation
7. Repeat steps 2-6 for specified number of generations

Pipeline Representation:

# Genotype (tree structure), expressed as an equivalent scikit-learn pipeline
make_pipeline(
    SelectKBest(k=10),
    StandardScaler(),
    RandomForestClassifier(n_estimators=100)
)
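
To make the evolutionary loop above concrete, here is a minimal, self-contained sketch of a single generation over randomly assembled pipelines. It is illustrative only and does not reflect TPOT's internals: the pipeline components, population size, and the simplified mutation step are arbitrary choices.

# Minimal sketch of one evolutionary generation over pipelines (not TPOT's code)
import random
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def random_pipeline():
    # Randomly assemble a small pipeline (the "genotype")
    k = random.choice([1, 2, 3, 4])
    n_estimators = random.choice([50, 100, 200])
    return make_pipeline(
        SelectKBest(k=k),
        StandardScaler(),
        RandomForestClassifier(n_estimators=n_estimators, random_state=0),
    )

# 1. Initial population
population = [random_pipeline() for _ in range(10)]

# 2. Evaluation via cross-validation
scored = [(cross_val_score(p, X, y, cv=3).mean(), p) for p in population]

# 3. Selection: keep the top half
scored.sort(key=lambda t: t[0], reverse=True)
survivors = [p for _, p in scored[:5]]

# 4./5. Crossover and mutation are simplified here:
#       fill the rest of the next generation with new random pipelines
next_generation = survivors + [random_pipeline() for _ in range(5)]
print(f"Best CV score this generation: {scored[0][0]:.4f}")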

4.1.3 Basic Usage of TPOT

Example 1: Basic Classification Example

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example 1: Basic Classification Example

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

# Prepare dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create TPOTClassifier
tpot = TPOTClassifier(
    generations=5,        # Number of evolutionary generations
    population_size=20,   # Number of individuals per generation
    cv=5,                 # Number of cross-validation folds
    random_state=42,
    verbosity=2,          # Progress display level
    n_jobs=-1             # Parallel processing
)

# Training (takes a few minutes)
tpot.fit(X_train, y_train)

# Evaluation
print(f'Test Accuracy: {tpot.score(X_test, y_test):.4f}')

# Save optimal pipeline as Python code
tpot.export('tpot_iris_pipeline.py')

Output Example:

Generation 1 - Current best internal CV score: 0.9666666666666667
Generation 2 - Current best internal CV score: 0.975
Generation 3 - Current best internal CV score: 0.975
Generation 4 - Current best internal CV score: 0.9833333333333333
Generation 5 - Current best internal CV score: 0.9833333333333333

Best pipeline: RandomForestClassifier(SelectKBest(input_matrix, k=2),
                                      bootstrap=True, n_estimators=100)
Test Accuracy: 1.0000

4.1.4 Customizing TPOT Configuration

Example 2: Custom TPOT Configuration

from tpot import TPOTClassifier

# Create TPOT with custom configuration
tpot_config = {
    'sklearn.ensemble.RandomForestClassifier': {
        'n_estimators': [50, 100, 200],
        'max_features': ['sqrt', 'log2', None],
        'min_samples_split': [2, 5, 10]
    },
    'sklearn.svm.SVC': {
        'C': [0.1, 1.0, 10.0],
        'kernel': ['linear', 'rbf'],
        'gamma': ['scale', 'auto']
    },
    'sklearn.preprocessing.StandardScaler': {},
    'sklearn.feature_selection.SelectKBest': {
        'k': range(1, 11)
    }
}

tpot = TPOTClassifier(
    config_dict=tpot_config,
    generations=10,
    population_size=50,
    cv=5,
    scoring='f1_weighted',  # Change evaluation metric to F1 score
    max_time_mins=30,       # Maximum execution time 30 minutes
    random_state=42,
    verbosity=2
)

tpot.fit(X_train, y_train)

4.1.5 Regression Example

Example 3: Using TPOT for Regression

from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20,
                       n_informative=15, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# TPOTRegressor
tpot_reg = TPOTRegressor(
    generations=5,
    population_size=20,
    cv=5,
    scoring='neg_mean_squared_error',  # Minimize MSE
    random_state=42,
    verbosity=2
)

tpot_reg.fit(X_train, y_train)

# Evaluation
from sklearn.metrics import mean_squared_error, r2_score
y_pred = tpot_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Test MSE: {mse:.4f}')
print(f'Test R²: {r2:.4f}')

# Save pipeline
tpot_reg.export('tpot_regression_pipeline.py')

Example of Exported Code:

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0

"""
Example of Exported Code (tpot_regression_pipeline.py)

Purpose: Demonstrate machine learning model training and evaluation
Target: Advanced
Execution time: 1-5 minutes
Dependencies: None
"""

# tpot_regression_pipeline.py
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Generated pipeline
exported_pipeline = make_pipeline(
    StandardScaler(),
    GradientBoostingRegressor(
        alpha=0.9, learning_rate=0.1, loss="squared_error",
        max_depth=3, n_estimators=100
    )
)

# Usage example (replace training_features / training_target / testing_features
# with your own data split)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

4.2 Auto-sklearn

4.2.1 Overview of Auto-sklearn

What is Auto-sklearn:
An automated machine learning tool that combines Bayesian optimization, meta-learning, and ensemble construction.

Developer: University of Freiburg (Germany)

Key Technologies:

  1. Bayesian Optimization: SMAC (Sequential Model-based Algorithm Configuration)
  2. Meta-learning: Learn initial configurations from past tasks
  3. Ensemble Construction: Automatically combine multiple models
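
Of these, ensemble construction follows the greedy ensemble-selection idea of Caruana et al.: repeatedly add, with replacement, the model whose inclusion most improves the validation score of the averaged predictions. The sketch below is a toy version of that idea, not Auto-sklearn's implementation; the probability data at the end is fabricated purely for demonstration.

# Toy sketch of greedy ensemble selection (illustrative only)
import numpy as np

def greedy_ensemble(val_probas, y_val, ensemble_size=5):
    """val_probas: list of (n_samples, n_classes) probability arrays, one per model."""
    chosen = []
    for _ in range(ensemble_size):
        best_idx, best_acc = None, -1.0
        for i, proba in enumerate(val_probas):
            # Accuracy of the ensemble if model i were added (with replacement)
            avg = np.mean([val_probas[j] for j in chosen + [i]], axis=0)
            acc = (avg.argmax(axis=1) == y_val).mean()
            if acc > best_acc:
                best_idx, best_acc = i, acc
        chosen.append(best_idx)
    return chosen  # indices (with repetitions) define the ensemble weights

# Toy usage with fake probability predictions from three "models"
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=100)
val_probas = [np.column_stack([1 - p, p]) for p in
              (np.clip(y_val + rng.normal(0, s, 100), 0, 1) for s in (0.3, 0.4, 0.5))]
print(greedy_ensemble(val_probas, y_val))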

4.2.2 Bayesian Optimization and Meta-learning

Bayesian Optimization Flow:

1. Evaluate model with initial configuration
2. Predict performance using Gaussian process
3. Determine next search point using acquisition function
4. Evaluate and update Gaussian process
5. Repeat steps 2-4
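
The following is a toy, one-dimensional sketch of a single Bayesian-optimization step. It uses a Gaussian-process surrogate for illustration, whereas Auto-sklearn's SMAC uses a random-forest surrogate; the observed scores and the candidate grid are made up.

# Toy sketch of one Bayesian-optimization step for a single hyperparameter
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hyperparameter values already evaluated and their CV scores (assumed data)
X_obs = np.array([[-2.0], [0.0], [2.0]])   # e.g. log10(C) of an SVM
y_obs = np.array([0.82, 0.90, 0.87])       # CV accuracy

# 2. Fit a surrogate model that predicts performance
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# 3. Acquisition function (expected improvement) over a candidate grid
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)
best = y_obs.max()
z = (mu - best) / np.maximum(sigma, 1e-9)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# 4. Next configuration to evaluate
next_config = candidates[np.argmax(ei)]
print(f"Next log10(C) to try: {next_config[0]:.3f}")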

Meta-learning:
Infer good initial configurations for similar tasks from optimal settings on 140+ past datasets

Meta-knowledge base (140+ tasks)
    ↓
Similarity calculation (dataset features)
    ↓
Warm start with top 25 configurations
    ↓
Fine-tune with Bayesian optimization
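
A minimal sketch of the warm-start idea: describe each dataset with a few simple meta-features and reuse the best-known configurations of the most similar past task. The meta-features, the contents of the meta-knowledge base, and the distance measure below are illustrative assumptions, not Auto-sklearn's actual meta-feature set.

# Minimal sketch of meta-learning warm starts (illustrative assumptions only)
import numpy as np
from sklearn.datasets import load_iris

# Hypothetical meta-knowledge base: meta-features -> best-known configurations
meta_base = {
    "taskA": {"metafeatures": np.array([1500, 20, 2]),   # n_samples, n_features, n_classes
              "best_configs": [{"model": "gradient_boosting", "lr": 0.1}]},
    "taskB": {"metafeatures": np.array([150, 4, 3]),
              "best_configs": [{"model": "random_forest", "n_estimators": 100}]},
}

def warm_start_configs(X, y):
    new_mf = np.array([X.shape[0], X.shape[1], len(np.unique(y))])
    # Similarity = Euclidean distance in meta-feature space (smaller is more similar)
    nearest = min(meta_base.values(),
                  key=lambda t: np.linalg.norm(t["metafeatures"] - new_mf))
    return nearest["best_configs"]

X, y = load_iris(return_X_y=True)
print(warm_start_configs(X, y))   # -> configurations from the most similar past task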

4.2.3 Basic Usage of Auto-sklearn

Example 4: Auto-sklearn Classification

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0

"""
Example 4: Auto-sklearn Classification

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Prepare dataset
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Auto-sklearn classifier
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # Total execution time 5 minutes
    per_run_time_limit=30,         # 30 seconds per model
    ensemble_size=50,              # Ensemble size
    ensemble_nbest=200,            # Number of ensemble candidates
    initial_configurations_via_metalearning=25,  # Number of meta-learning initial configurations
    seed=42
)

# Training
automl.fit(X_train, y_train)

# Prediction and evaluation
y_pred = automl.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test Accuracy: {accuracy:.4f}')

# Statistics of trained models
print(automl.sprint_statistics())

# Ensemble details
print(automl.show_models())

Output Example:

auto-sklearn results:
  Dataset name: digits
  Metric: accuracy
  Best validation score: 0.9832
  Number of target algorithm runs: 127
  Number of successful target algorithm runs: 115
  Number of crashed target algorithm runs: 8
  Number of target algorithms that exceeded the time limit: 4
  Number of target algorithms that exceeded the memory limit: 0

Test Accuracy: 0.9806

4.2.4 New Features in Auto-sklearn 2.0

Improvements in Auto-sklearn 2.0:

- Portfolio-based warm starting in place of meta-feature-based meta-learning
- Automatic selection of the model-selection strategy (holdout vs. cross-validation) and budget allocation
- Substantially faster search, especially under tight time budgets

Example 5: Using Auto-sklearn 2.0

from autosklearn.experimental.askl2 import AutoSklearn2Classifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Prepare data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42
)

# Auto-sklearn 2.0 (faster)
automl2 = AutoSklearn2Classifier(
    time_left_for_this_task=120,  # 2 minutes
    seed=42
)

automl2.fit(X_train, y_train)

# Evaluation
from sklearn.metrics import classification_report
y_pred = automl2.predict(X_test)
print(classification_report(y_test, y_pred))

# Inspect search results and ensemble composition
cv_results = automl2.cv_results_
print(f"Configurations evaluated: {len(cv_results['params'])}")
print(f"Ensemble members and weights: {automl2.get_models_with_weights()}")

4.2.5 Custom Settings and Constraints

Example 6: Restricting Model Candidates

import autosklearn.classification

# Restrict algorithms to use
# Note: a given component type ('classifier', 'feature_preprocessor', ...) can be
# listed in include OR exclude, but not in both at the same time.
automl_custom = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    include={
        'classifier': ['random_forest', 'gradient_boosting', 'extra_trees'],
        'feature_preprocessor': ['no_preprocessing', 'pca', 'select_percentile']
    },
    seed=42
)

automl_custom.fit(X_train, y_train)

4.3 H2O AutoML

4.3.1 Overview of H2O AutoML

What is H2O.ai:
An open-source distributed machine learning platform. Strong in large-scale data processing.

H2O AutoML Features:

- Distributed training designed for large-scale data
- Leaderboard that ranks every trained model for easy comparison
- Automatic stacked ensembles (AllModels / BestOfFamily)
- Built-in explainability (variable importance, SHAP, partial dependence)

4.3.2 Basic Usage of H2O AutoML

Example 7: H2O AutoML Classification

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example 7: H2O AutoML Classification

Purpose: Demonstrate machine learning model training and evaluation
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

import h2o
from h2o.automl import H2OAutoML
import pandas as pd

# Initialize H2O
h2o.init()

# Prepare dataset (convert from Pandas)
from sklearn.datasets import load_wine
wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Convert to H2O DataFrame
hf = h2o.H2OFrame(df)
hf['target'] = hf['target'].asfactor()  # For classification task

# Train/test split
train, test = hf.split_frame(ratios=[0.8], seed=42)

# Run AutoML
aml = H2OAutoML(
    max_runtime_secs=300,      # Maximum execution time 5 minutes
    max_models=20,              # Maximum number of models
    seed=42,
    sort_metric='logloss',      # Sort metric (AUC applies to binary tasks; wine has 3 classes)
    exclude_algos=['DeepLearning']  # Exclude deep learning
)

# Training (target is response variable, rest are predictor variables)
x = hf.columns
x.remove('target')
y = 'target'

aml.train(x=x, y=y, training_frame=train)

# Display leaderboard
lb = aml.leaderboard
print(lb.head(rows=10))

# Prediction with best model
best_model = aml.leader
preds = best_model.predict(test)
print(preds.head())

# Model performance
perf = best_model.model_performance(test)
print(perf)

Leaderboard Output Example:

                                              model_id       auc   logloss
0  StackedEnsemble_AllModels_1_AutoML_1_20241021  0.998876  0.067234
1  StackedEnsemble_BestOfFamily_1_AutoML_1_20241021  0.997543  0.072156
2               GBM_1_AutoML_1_20241021_163045  0.996321  0.078432
3                XRT_1_AutoML_1_20241021_163012  0.995234  0.081245
4                DRF_1_AutoML_1_20241021_163001  0.993456  0.089321

4.3.3 Stacked Ensemble

H2O's Stacking Strategy:

Base Model Layer:
- GBM (multiple configurations)
- Random Forest
- XGBoost
- GLM
- DeepLearning

    ↓ Meta-features

Meta-model Layer:
- GLM (regularized)
- GBM

    ↓

Final Prediction

Example 8: Custom Stacked Ensemble

from h2o.estimators import H2OGradientBoostingEstimator, H2ORandomForestEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# Base model 1: GBM
gbm = H2OGradientBoostingEstimator(
    ntrees=50,
    max_depth=5,
    learn_rate=0.1,
    seed=42,
    model_id='gbm_base'
)
gbm.train(x=x, y=y, training_frame=train)

# Base model 2: Random Forest
rf = H2ORandomForestEstimator(
    ntrees=50,
    max_depth=10,
    seed=42,
    model_id='rf_base'
)
rf.train(x=x, y=y, training_frame=train)

# Build stacked ensemble
ensemble = H2OStackedEnsembleEstimator(
    base_models=[gbm, rf],
    metalearner_algorithm='gbm',
    seed=42
)
ensemble.train(x=x, y=y, training_frame=train)

# Evaluation
ensemble_perf = ensemble.model_performance(test)
print(f"Ensemble logloss: {ensemble_perf.logloss()}")  # logloss: wine is a 3-class task

4.3.4 Model Explainability

Example 9: SHAP Values and PDP Visualization

# SHAP values (assumes a tree-based model such as GBM/XGBoost; if aml.leader is a
# stacked ensemble, pick a GBM from the leaderboard instead)
shap_values = best_model.shap_summary_plot(test)

# Partial Dependence Plot
best_model.partial_plot(
    data=test,
    cols=['alcohol', 'flavanoids'],  # Feature names
    plot=True
)

# Variable importance
varimp = best_model.varimp(use_pandas=True)
print(varimp.head(10))

# Feature Interaction
best_model.feature_interaction(max_depth=2)

4.4 Other AutoML Tools

4.4.1 Google AutoML

Features:

- Fully managed, cloud-based AutoML centered on neural architecture search (NAS)
- Scales to very large datasets without managing infrastructure

Main Products:

- Vertex AI AutoML (tabular data, vision, natural language, translation, video)

4.4.2 Azure AutoML

Features:

- Hybrid search over multiple optimization methods, integrated with Azure Machine Learning
- Strong enterprise MLOps support (experiment tracking, deployment, monitoring)
- Extensive explainability and responsible-AI tooling

4.4.3 PyCaret

What is PyCaret:
A low-code machine learning library in Python. Can execute AutoML with just a few lines.

Example 10: PyCaret Usage Example

# Requirements:
# - Python 3.9+
# - pandas>=2.0.0, <2.2.0

"""
Example 10: PyCaret Usage Example

Purpose: Demonstrate automated model comparison, tuning, and ensembling with PyCaret
Target: Beginner to Intermediate
Execution time: 1-5 minutes
Dependencies: None
"""

from pycaret.classification import *
import pandas as pd
from sklearn.datasets import load_iris

# Prepare dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# PyCaret environment setup
clf_setup = setup(
    data=df,
    target='target',
    train_size=0.8,
    session_id=42,
    verbose=False
)

# Compare all models (automatic)
best_models = compare_models(n_select=3)  # Top 3 models

# Detailed evaluation of best model
best = best_models[0]
evaluate_model(best)

# Hyperparameter tuning
tuned_best = tune_model(best, n_iter=50)

# Ensemble
bagged = ensemble_model(tuned_best, method='Bagging')
boosted = ensemble_model(tuned_best, method='Boosting')

# Stacking
stacked = stack_models(estimator_list=best_models[:3])

# Save model
save_model(stacked, 'pycaret_final_model')

# Predict on new data
predictions = predict_model(stacked, data=df)
print(predictions.head())

4.4.4 Ludwig

What is Ludwig:
A codeless deep learning toolbox developed by Uber. Build models with YAML configuration files.

Features:

- Declarative model definition via YAML configuration files (no code required)
- Supports multimodal inputs such as text, images, and tabular data in one model
- Built-in hyperparameter search for deep learning models

4.4.5 AutoML Tool Comparison Table

Tool          | Optimization Method                   | Execution Speed | Scalability | Explainability          | Learning Curve | Best Use Case
TPOT          | Genetic Programming                   | Medium          | Medium      | High (code export)      | Low            | Medium-scale data, pipeline automation
Auto-sklearn  | Bayesian Optimization + Meta-learning | Medium-High     | Medium      | Medium                  | Low            | Academic research, benchmarking
H2O AutoML    | Grid Search + Stacking                | High            | High        | High (SHAP integration) | Medium         | Large-scale data, production
PyCaret       | Combination of multiple methods       | High            | Medium      | High                    | Very Low       | Rapid prototyping
Google AutoML | NAS (Neural Architecture Search)      | High            | Very High   | Medium                  | Low            | Cloud-based large-scale tasks
Azure AutoML  | Hybrid of multiple methods            | High            | High        | Very High               | Low            | Enterprise MLOps
Ludwig        | Hyperparameter search                 | Medium          | Medium      | Medium                  | Medium         | Multimodal deep learning

4.5 AutoML Best Practices

4.5.1 Tool Selection Criteria

Selection by Data Size:

- Small to medium-scale data: TPOT, Auto-sklearn, or PyCaret
- Large-scale or distributed data: H2O AutoML, or cloud services such as Google AutoML and Azure AutoML

Selection by Task Type:

- Standard tabular classification or regression: any of the tools above
- Multimodal or deep-learning tasks (images, text): Ludwig or Google AutoML

Execution Time Constraints:

- Minutes (rapid prototyping): PyCaret or Auto-sklearn 2.0
- Hours (thorough search): TPOT, Auto-sklearn, or H2O AutoML with larger time budgets

4.5.2 Customization vs Full Automation

When Full Automation is Suitable:

- Building a quick baseline or prototype
- Standard tabular tasks where no domain-specific preprocessing is required

When Customization is Necessary:

- Domain knowledge must be injected into preprocessing or feature engineering
- Production constraints (inference latency, model size, dependencies) restrict the acceptable pipelines

Hybrid Approach:

# 1. Create baseline with AutoML
tpot.fit(X_train, y_train)
baseline_score = tpot.score(X_test, y_test)

# 2. Manually improve the exported pipeline
#    (the module name corresponds to the file written by tpot.export())
from tpot_exported_pipeline import exported_pipeline
pipeline = exported_pipeline

# 3. Add domain knowledge
from sklearn.preprocessing import FunctionTransformer

def domain_specific_transform(X):
    # Custom transformation
    return X

pipeline.steps.insert(
    0, ('domain_transform', FunctionTransformer(domain_specific_transform))
)

# 4. Re-evaluate
pipeline.fit(X_train, y_train)
improved_score = pipeline.score(X_test, y_test)
print(f'Baseline: {baseline_score:.4f}, Improved: {improved_score:.4f}')

4.5.3 Deployment to Production Environment

Considerations for Deployment:

  1. Model Size and Inference Speed

    • Ensemble models are high accuracy but heavy
    • Select model based on inference speed requirements
  2. Dependency Management

    • Include AutoML tool dependency libraries in production environment
    • Docker containerization recommended
  3. Version Control

    • Model and pipeline versioning
    • Use MLOps tools like MLflow, DVC
  4. Monitoring

    • Data drift detection
    • Model performance tracking
    • Set retraining triggers
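
Before the Flask example below can load 'tpot_model.pkl', the fitted pipeline has to be serialized once. A minimal sketch with joblib, assuming a fitted TPOT instance such as the one from Example 1 (the filename matches the API code below):

# Serialize the best pipeline found by TPOT so the API can load it later
import joblib

# tpot.fitted_pipeline_ holds the best scikit-learn pipeline found by tpot.fit(...)
joblib.dump(tpot.fitted_pipeline_, 'tpot_model.pkl')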

Deployment Example (Flask API):

# Requirements:
# - Python 3.9+
# - flask>=2.3.0
# - joblib>=1.3.0
# - numpy>=1.24.0, <2.0.0

"""
Deployment Example: Flask API

Purpose: Serve a saved AutoML pipeline as a REST prediction endpoint
Target: Intermediate
Execution time: ~5 seconds
Dependencies: None
"""

# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('tpot_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    probability = model.predict_proba(features)

    return jsonify({
        'prediction': int(prediction[0]),
        'probability': probability[0].tolist()
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
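
As a minimal illustration of the data-drift detection mentioned under Monitoring above, one simple approach is a per-feature two-sample Kolmogorov-Smirnov test between the training data and recent production data. The significance threshold and retraining rule below are assumptions, not fixed recommendations.

# Minimal data-drift check: compare each feature's production distribution
# against the training distribution with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(X_train, X_prod, alpha=0.05):
    # alpha=0.05 is an assumed threshold, tune it to your false-alarm tolerance
    drifted = []
    for j in range(X_train.shape[1]):
        stat, p_value = ks_2samp(X_train[:, j], X_prod[:, j])
        if p_value < alpha:
            drifted.append(j)
    return drifted  # indices of features whose distribution appears to have shifted

# Usage sketch: trigger retraining when a substantial share of features drift
# drifted = detect_drift(X_train, X_recent)
# if len(drifted) > X_train.shape[1] * 0.3:
#     trigger_retraining()   # hypothetical hook into your retraining pipeline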

4.5.4 Cost and Time Management

Computational Cost Reduction Strategies:

  1. Early Stopping:

    • Terminate early when no improvement is seen
    • Set max_time_mins, max_models parameters
  2. Parallel Processing:

    • Use all CPU cores with n_jobs=-1
    • For cloud, select appropriate instance type
  3. Data Sampling:

    • Initial exploration with small sample
    • Retrain with full data once promising configuration is found
  4. Staged Approach:

# Stage 1: Fast exploration (10 minutes) on a subsample
from sklearn.model_selection import train_test_split

# Carve out a validation set and a small training subsample (split sizes are illustrative)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
X_train_sample, _, y_train_sample, _ = train_test_split(X_tr, y_tr, train_size=0.3, random_state=42)

quick_automl = TPOTClassifier(
    generations=3,
    population_size=10,
    max_time_mins=10
)
quick_automl.fit(X_train_sample, y_train_sample)

# Stage 2: Detailed exploration (1 hour)
if quick_automl.score(X_val, y_val) > 0.85:  # Only if threshold exceeded
    deep_automl = TPOTClassifier(
        generations=20,
        population_size=50,
        max_time_mins=60
    )
    deep_automl.fit(X_train, y_train)

Cloud Cost Management:

- Always set hard caps (max_runtime_secs, max_models, max_time_mins) so a run cannot exceed its budget
- Run exploratory searches on spot/preemptible instances where possible
- Shut down clusters and instances (e.g., the H2O cluster) as soon as experiments finish


4.6 Summary

What We Learned

  1. TPOT:

    • Optimizes entire pipeline with genetic programming
    • High transparency with Python code export
    • Suitable for exploration on medium-scale data
  2. Auto-sklearn:

    • Efficient exploration with Bayesian optimization and meta-learning
    • Automatic ensemble construction
    • Widely used academically
  3. H2O AutoML:

    • Strong with large-scale data
    • Easy result comparison with leaderboard
    • Rich model explainability features
  4. Tool Selection Criteria:

    • Consider data size, task type, time constraints
    • Balance full automation and customization
    • Production requirements (speed, size, dependencies)
  5. Best Practices:

    • Reduce costs with staged approach
    • Design monitoring for deployment
    • Integration with MLOps tools

Next Steps

In Chapter 5, we will learn automated feature engineering with Featuretools.


Exercises

Question 1: Explain the roles of "crossover" and "mutation" in TPOT's genetic programming approach, and describe how each contributes to pipeline optimization.

Question 2: Explain how Auto-sklearn's meta-learning solves the cold start problem. Also discuss situations where meta-learning might not be effective.

Question 3: Design an experiment to compare the performance of H2O AutoML's stacked ensemble versus single models. Describe what types of datasets would maximize the effectiveness of stacking.

Question 4: Select the optimal AutoML tool for the following scenarios and explain your reasoning:
(a) 10,000 samples of medical diagnostic data, high interpretability required
(b) 1 billion samples of click log data, inference speed is important
(c) Mixed image and text data, rapid prototyping

Question 5: List five major considerations when deploying AutoML models to production environments, and describe specific countermeasures for each (within 600 characters).


References

  1. Olson, R. S. et al. "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning." AutoML Workshop at ICML (2016).
  2. Feurer, M. et al. "Efficient and Robust Automated Machine Learning." NIPS (2015).
  3. LeDell, E. & Poirier, S. "H2O AutoML: Scalable Automatic Machine Learning." AutoML Workshop at ICML (2020).
  4. Hutter, F. et al. "Sequential Model-Based Optimization for General Algorithm Configuration." LION (2011).
  5. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. (2022).
  6. Lundberg, S. M. & Lee, S.-I. "A Unified Approach to Interpreting Model Predictions." NIPS (2017).
  7. He, X. et al. "AutoML: A Survey of the State-of-the-Art." Knowledge-Based Systems (2021).

Next Chapter: Chapter 5: Automated Feature Engineering

License: This content is provided under CC BY 4.0 license.
