This chapter covers the fundamentals of AutoML: what AutoML is, how it differs from traditional ML workflows, the components of an AutoML system, and the basics of Neural Architecture Search (NAS), meta-learning, and AutoML evaluation.
Learning Objectives
By reading this chapter, you will master the following:
- ✅ Understand the concept and purpose of AutoML
- ✅ Explain the differences from traditional ML workflows
- ✅ Comprehend the components of AutoML and their roles
- ✅ Understand the fundamentals of Neural Architecture Search (NAS)
- ✅ Learn the concepts and applications of Meta-Learning
- ✅ Master AutoML evaluation methods
1.1 What is AutoML
Democratization of Machine Learning
AutoML (Automated Machine Learning) is a technology that automates the machine learning model development process. It aims to enable even non-data scientists to build high-quality machine learning models.
"AutoML realizes the democratization of machine learning, allowing more people to utilize AI technology"
Purpose of AutoML
| Purpose | Description | Effect |
|---|---|---|
| Efficiency | Automate manual processes | Reduce development time |
| Reduce Expertise Requirements | Deep machine learning knowledge not required | Lower barriers to entry |
| Performance Improvement | Discover optimal solutions through systematic search | Eliminate human bias |
| Reproducibility | Standardized processes | Improve result reliability |
Comparison with Traditional ML Workflow
In a traditional workflow, a practitioner iterates manually through data preprocessing, feature engineering, model selection, and hyperparameter tuning, each round guided by expertise and trial and error. AutoML automates these stages end to end, turning the manual loop into a systematic search.
Advantages and Disadvantages of AutoML
Advantages
- Time Savings: Reduce work from weeks to hours
- Accessibility: Usable by people with less expertise
- Optimization: Discover combinations humans might overlook
- Best Practices: Automatically applied
Disadvantages
- Computational Cost: Large-scale searches require many resources
- Black Box Nature: Reduced process transparency
- Flexibility Constraints: Customization can be difficult
- Neglect of Domain Knowledge: Background knowledge about the data may go unused
Example: Effects of AutoML
# Requirements:
# - Python 3.9+
# - scikit-learn>=1.0.0
"""
Example: Effects of AutoML
Purpose: Compare a fixed-parameter model with a simple AutoML-style grid search
Target: Advanced
Execution time: 1-5 minutes
Dependencies: scikit-learn
"""
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import time
# Data preparation
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Traditional approach (fixed parameters)
start_time = time.time()
model_manual = RandomForestClassifier(n_estimators=100, random_state=42)
model_manual.fit(X_train, y_train)
y_pred_manual = model_manual.predict(X_test)
acc_manual = accuracy_score(y_test, y_pred_manual)
time_manual = time.time() - start_time
# AutoML-style simple implementation (grid search)
from sklearn.model_selection import GridSearchCV
start_time = time.time()
param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [None, 10, 20, 30],
'min_samples_split': [2, 5, 10]
}
model_auto = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=3,
n_jobs=-1
)
model_auto.fit(X_train, y_train)
y_pred_auto = model_auto.predict(X_test)
acc_auto = accuracy_score(y_test, y_pred_auto)
time_auto = time.time() - start_time
print("=== Traditional vs AutoML-style Approach ===")
print(f"\nTraditional Approach:")
print(f" Accuracy: {acc_manual:.4f}")
print(f" Time: {time_manual:.2f}s")
print(f"\nAutoML-style Approach:")
print(f" Accuracy: {acc_auto:.4f}")
print(f" Time: {time_auto:.2f}s")
print(f" Best Parameters: {model_auto.best_params_}")
print(f"\nImprovement:")
print(f" Accuracy Gain: {(acc_auto - acc_manual) * 100:.2f}%")
Example Output:
=== Traditional vs AutoML-style Approach ===
Traditional Approach:
Accuracy: 0.9649
Time: 0.15s
AutoML-style Approach:
Accuracy: 0.9737
Time: 12.34s
Best Parameters: {'max_depth': 20, 'min_samples_split': 2, 'n_estimators': 100}
Improvement:
Accuracy Gain: 0.88%
1.2 Components of AutoML
AutoML systems consist of multiple components to automate the entire machine learning pipeline.
Data Preprocessing Automation
Automates the transformation of raw data into a learnable format:
- Missing Value Handling: Automatic detection and imputation strategy selection
- Outlier Detection: Detection using statistical methods or Isolation Forest
- Scaling: Automatic selection of StandardScaler, MinMaxScaler
- Encoding: Automatic conversion of categorical variables
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - pandas>=2.0.0, <2.2.0
# - scikit-learn>=1.0.0
"""
Example: Automated data preprocessing pipeline
Purpose: Impute missing values, scale numeric columns, and encode categoricals
Target: Intermediate
Execution time: ~5 seconds
Dependencies: numpy, pandas, scikit-learn
"""
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
# Sample data (with missing values)
np.random.seed(42)
data = pd.DataFrame({
'age': [25, 30, np.nan, 45, 50, 35],
'salary': [50000, 60000, 55000, np.nan, 80000, 65000],
'department': ['Sales', 'IT', 'HR', 'IT', 'Sales', np.nan]
})
print("=== Original Data ===")
print(data)
# Automatic preprocessing pipeline
numeric_features = ['age', 'salary']
categorical_features = ['department']
numeric_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='most_frequent')),
('encoder', OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)
])
# Execute preprocessing
data_transformed = preprocessor.fit_transform(data)
print("\n=== Data Shape After Preprocessing ===")
print(f"Shape: {data_transformed.shape}")
print(f"Missing Values: 0 (all handled)")
Feature Engineering
Automatically generates new features:
- Polynomial Features: Combinations of existing features
- Aggregate Features: Statistics by group
- Time Series Features: Lags, moving averages, seasonality
- Text Features: TF-IDF, embedding representations
# Requirements:
# - Python 3.9+
# - scikit-learn>=1.0.0
"""
Example: Polynomial feature generation
Purpose: Generate new features automatically and measure the performance gain
Target: Beginner to Intermediate
Execution time: 2-5 seconds
Dependencies: scikit-learn
"""
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import make_regression
# Sample data
X, y = make_regression(n_samples=100, n_features=2, noise=10, random_state=42)
# Polynomial feature generation
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print("=== Feature Engineering ===")
print(f"Original feature count: {X.shape[1]}")
print(f"Feature count after generation: {X_poly.shape[1]}")
print(f"\nGenerated features:")
print(poly.get_feature_names_out(['x1', 'x2']))
# Performance comparison
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Original features
model_original = LinearRegression()
model_original.fit(X, y)
y_pred_original = model_original.predict(X)
r2_original = r2_score(y, y_pred_original)
# Polynomial features
model_poly = LinearRegression()
model_poly.fit(X_poly, y)
y_pred_poly = model_poly.predict(X_poly)
r2_poly = r2_score(y, y_pred_poly)
print(f"\n=== Performance Comparison ===")
print(f"Original Features R²: {r2_original:.4f}")
print(f"Polynomial Features R²: {r2_poly:.4f}")
print(f"Improvement: {(r2_poly - r2_original) * 100:.2f}%")
Model Selection
Automatically selects the optimal algorithm for the task and data:
- Linear Models: Logistic Regression, Ridge, Lasso
- Tree-based: Decision Tree, Random Forest, XGBoost
- Support Vector Machines: SVC, SVR
- Neural Networks: MLP, CNN, RNN
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
# Data preparation
iris = load_iris()
X, y = iris.data, iris.target
# Evaluate multiple models
models = {
'Logistic Regression': LogisticRegression(max_iter=1000),
'Decision Tree': DecisionTreeClassifier(),
'Random Forest': RandomForestClassifier(),
'SVM': SVC(),
'KNN': KNeighborsClassifier()
}
print("=== Automatic Model Selection ===")
results = {}
for name, model in models.items():
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
results[name] = scores.mean()
print(f"{name:20s}: {scores.mean():.4f} (+/- {scores.std():.4f})")
# Select best model
best_model = max(results, key=results.get)
print(f"\nBest Model: {best_model} (Accuracy: {results[best_model]:.4f})")
Hyperparameter Optimization
Automatically tunes model parameters:
- Grid Search: Explore all combinations
- Random Search: Random sampling
- Bayesian Optimization: Efficient search that models past trial results (a sketch follows the random-search example below)
- Evolutionary Algorithms: Genetic algorithms
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from scipy.stats import randint, uniform
# Data preparation
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Hyperparameter search space
param_distributions = {
'n_estimators': randint(50, 500),
'max_depth': [None] + list(range(5, 50, 5)),
'min_samples_split': randint(2, 20),
'min_samples_leaf': randint(1, 10),
'max_features': uniform(0.1, 0.9)
}
# Random search
random_search = RandomizedSearchCV(
RandomForestClassifier(random_state=42),
param_distributions=param_distributions,
n_iter=50,
cv=3,
random_state=42,
n_jobs=-1,
verbose=0
)
print("=== Hyperparameter Optimization ===")
random_search.fit(X_train, y_train)
print(f"Best Score (CV): {random_search.best_score_:.4f}")
print(f"Best Parameters:")
for param, value in random_search.best_params_.items():
print(f" {param}: {value}")
# Evaluate on test set
test_score = random_search.score(X_test, y_test)
print(f"\nTest Set Accuracy: {test_score:.4f}")
AutoML Workflow Diagram
The full workflow chains the components above: data preprocessing → feature engineering → model selection → hyperparameter optimization → final evaluation, with the search loop iterating over the later stages until a time or resource budget is exhausted.
1.3 Neural Architecture Search (NAS)
Concept of NAS
Neural Architecture Search (NAS) is a technique that automatically designs neural network architectures: network structures that once had to be hand-crafted by human experts are instead discovered by a search algorithm.
NAS can be described as "a neural network that designs neural networks"
Search Space
Design elements explored by NAS:
- Layer Types: Convolutional layers, fully connected layers, pooling layers, etc.
- Number of Layers: Network depth
- Layer Parameters: Number of filters, kernel size, stride, etc.
- Connection Patterns: Skip connections, residual connections, etc.
- Activation Functions: ReLU, Sigmoid, Tanh, etc.
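To make this concrete, the sketch below encodes the design elements above as a plain dictionary and draws one random candidate architecture from it (the encoding and names are illustrative, not a standard API):
import random

# Illustrative NAS search space covering the design elements listed above
search_space = {
    'n_layers': [2, 3, 4, 5],                         # network depth
    'layer_type': ['conv3x3', 'conv5x5', 'maxpool'],  # layer types
    'n_filters': [16, 32, 64, 128],                   # layer parameters
    'skip_connection': [True, False],                 # connection pattern
    'activation': ['relu', 'sigmoid', 'tanh'],        # activation function
}

def sample_architecture(space):
    """Draw one candidate architecture uniformly at random."""
    depth = random.choice(space['n_layers'])
    return {
        'layers': [{'type': random.choice(space['layer_type']),
                    'filters': random.choice(space['n_filters'])}
                   for _ in range(depth)],
        'skip_connection': random.choice(space['skip_connection']),
        'activation': random.choice(space['activation']),
    }

random.seed(0)
print(sample_architecture(search_space))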
Search Strategies
1. Random Search
Randomly samples and evaluates architectures. Simple but inefficient.
2. Reinforcement Learning-based
A controller (RNN) generates architectures and learns using their performance as rewards.
Reward function:
$$ R = \text{Accuracy} - \lambda \cdot \text{Complexity} $$
- $\text{Accuracy}$: Validation accuracy
- $\text{Complexity}$: Model complexity (number of parameters, etc.)
- $\lambda$: Complexity penalty coefficient
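As a worked example of this reward with assumed values (validation accuracy 0.92, complexity 3.5 million parameters, λ = 0.01 per million parameters):
# Illustrative reward calculation: R = Accuracy - lambda * Complexity
accuracy = 0.92    # validation accuracy (assumed)
complexity = 3.5   # model complexity in millions of parameters (assumed)
lam = 0.01         # complexity penalty coefficient (assumed)
reward = accuracy - lam * complexity
print(f"R = {accuracy} - {lam} * {complexity} = {reward:.3f}")  # R = 0.885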
3. Evolutionary Algorithms
Uses genetic algorithms to evolve superior architectures.
- Mutation: Add/remove layers, change parameters
- Crossover: Combine two architectures
- Selection: Retain high-performing architectures
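A toy sketch of these three operations, with architectures encoded simply as tuples of hidden-layer widths (real NAS encodings are much richer):
import random

def mutate(arch):
    """Mutation: add, remove, or resize a layer."""
    arch = list(arch)
    op = random.choice(['add', 'remove', 'resize'])
    if op == 'add':
        arch.insert(random.randrange(len(arch) + 1), random.choice([32, 64, 128]))
    elif op == 'remove' and len(arch) > 1:
        arch.pop(random.randrange(len(arch)))
    else:
        arch[random.randrange(len(arch))] = random.choice([32, 64, 128, 256])
    return tuple(arch)

def crossover(a, b):
    """Crossover: splice the front of one parent onto the back of the other."""
    return a[:len(a) // 2] + b[len(b) // 2:]

def select(population, fitness, k):
    """Selection: retain the k highest-scoring architectures."""
    return sorted(population, key=fitness, reverse=True)[:k]

random.seed(42)
population = [(64,), (128, 64), (256, 128, 64)]
child = mutate(crossover(population[0], population[2]))
# Placeholder fitness preferring smaller networks; a real NAS would train
# and evaluate each architecture to obtain its fitness
survivors = select(population + [child], fitness=lambda a: -sum(a), k=3)
print(f"child={child}, survivors={survivors}")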
4. Gradient-based Methods (DARTS)
Relaxes the search space to be continuous and optimizes using gradient descent. Computationally efficient.
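The core trick can be sketched in a few lines: rather than choosing one operation per edge, DARTS computes a softmax-weighted mixture of all candidate operations, making the architecture parameters α differentiable (a conceptual NumPy sketch, not the actual DARTS implementation):
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Candidate operations on one edge of the network
ops = [lambda x: np.maximum(x, 0),  # ReLU-like operation
       np.tanh,                     # tanh operation
       lambda x: x]                 # identity (skip connection)

alpha = np.array([0.2, -0.1, 0.5])  # architecture parameters, learned by gradient descent
weights = softmax(alpha)

x = np.array([-1.0, 0.5, 2.0])
# Mixed output: weighted sum over all ops, differentiable with respect to alpha
mixed = sum(w * op(x) for w, op in zip(weights, ops))
print(f"Operation weights: {weights.round(3)}")
print(f"Mixed output: {mixed.round(3)}")
# After search, each edge is discretized to its highest-weighted operation
print(f"Selected operation index: {int(np.argmax(weights))}")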
NAS Implementation Example (Simplified Version)
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - scikit-learn>=1.0.0
"""
Example: NAS Implementation Example (Simplified Version)
Purpose: Search MLP architectures with random search
Target: Advanced
Execution time: 30-60 seconds
Dependencies: numpy, scikit-learn
"""
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
# Data preparation
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
# Simple NAS: Explore architectures with random search
def random_architecture_search(n_trials=10):
best_score = 0
best_architecture = None
print("=== Neural Architecture Search ===")
for i in range(n_trials):
# Randomly generate architecture
n_layers = np.random.randint(1, 4) # 1-3 layers
hidden_layer_sizes = tuple(
np.random.choice([32, 64, 128, 256]) for _ in range(n_layers)
)
activation = np.random.choice(['relu', 'tanh', 'logistic'])
# Train and evaluate model
model = MLPClassifier(
hidden_layer_sizes=hidden_layer_sizes,
activation=activation,
max_iter=100,
random_state=42
)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(f"Trial {i+1}: layers={hidden_layer_sizes}, "
f"activation={activation}, score={score:.4f}")
if score > best_score:
best_score = score
best_architecture = {
'hidden_layer_sizes': hidden_layer_sizes,
'activation': activation,
'score': score
}
return best_architecture
# Run NAS
best_arch = random_architecture_search(n_trials=10)
print(f"\n=== Best Architecture ===")
print(f"Layer Configuration: {best_arch['hidden_layer_sizes']}")
print(f"Activation Function: {best_arch['activation']}")
print(f"Accuracy: {best_arch['score']:.4f}")
NAS Challenges
| Challenge | Description | Countermeasure |
|---|---|---|
| Computational Cost | Evaluating thousands of architectures | Early stopping, proxy task usage |
| Search Space Size | Combinatorial explosion | Search space constraints, hierarchical search |
| Lack of Transferability | Search needed for each task | Transfer learning, meta-learning utilization |
| Overfitting | Overfitting to validation data | Regularization, use multiple datasets |
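One concrete form of the early-stopping countermeasure listed above is successive halving, which gives many candidates a small budget and repeatedly discards the worst performers. scikit-learn ships this search (still behind an experimental import as of recent versions); a sketch applying the same budget-allocation idea to hyperparameter search:
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from scipy.stats import randint

X, y = load_breast_cancer(return_X_y=True)

search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={'n_estimators': randint(50, 500),
                         'max_depth': randint(5, 50)},
    factor=3,  # each round keeps roughly the top 1/3 of candidates
    random_state=42,
    n_jobs=-1
)
search.fit(X, y)
print(f"Best CV Score: {search.best_score_:.4f}")
print(f"Best Parameters: {search.best_params_}")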
1.4 Meta-Learning
Learning to Learn
Meta-Learning is a method that "learns how to learn." It leverages experience from past tasks to efficiently learn new tasks.
"Learning the learning algorithm itself" - The essence of meta-learning
Few-shot Learning
A method for efficiently learning from a small number of samples.
N-way K-shot learning:
- N: Number of classes
- K: Number of samples per class
- Example: 5-way 1-shot = 5 classes, 1 sample each
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - scikit-learn>=1.0.0
"""
Example: N-way K-shot learning
Purpose: Simulate few-shot classification with a nearest-neighbor learner
Target: Intermediate
Execution time: 30-60 seconds
Dependencies: numpy, scikit-learn
"""
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_digits
# Few-shot learning simulation
def few_shot_learning_demo(n_way=5, k_shot=3):
# Data preparation
digits = load_digits()
X, y = digits.data, digits.target
# Select task (n_way classes)
selected_classes = np.random.choice(10, n_way, replace=False)
# Support set (training: k_shot × n_way samples)
support_X, support_y = [], []
# Query set (testing)
query_X, query_y = [], []
for cls in selected_classes:
cls_indices = np.where(y == cls)[0]
selected = np.random.choice(cls_indices, k_shot + 10, replace=False)
# k_shot samples to support set
support_X.extend(X[selected[:k_shot]])
support_y.extend([cls] * k_shot)
# Remaining to query set
query_X.extend(X[selected[k_shot:]])
query_y.extend([cls] * 10)
support_X = np.array(support_X)
support_y = np.array(support_y)
query_X = np.array(query_X)
query_y = np.array(query_y)
# Few-shot learning (using KNN)
model = KNeighborsClassifier(n_neighbors=min(3, k_shot))
model.fit(support_X, support_y)
# Evaluate
accuracy = model.score(query_X, query_y)
print(f"=== {n_way}-way {k_shot}-shot Learning ===")
print(f"Support Set: {len(support_X)} samples")
print(f"Query Set: {len(query_X)} samples")
print(f"Accuracy: {accuracy:.4f}")
return accuracy
# Experiment with different settings
for k in [1, 3, 5]:
few_shot_learning_demo(n_way=5, k_shot=k)
print()
Transfer Learning
Transfer knowledge learned on one task to another task.
- Pre-trained Models: Use models trained on ImageNet, etc.
- Fine-tuning: Adjust for new tasks
- Domain Adaptation: Reduce inter-domain differences
Warm-starting
Use optimal parameters from past tasks as initial values to accelerate learning on new tasks.
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification
# Task 1 and Task 2 (similar tasks)
X1, y1 = make_classification(n_samples=1000, n_features=20,
n_informative=15, random_state=42)
X2, y2 = make_classification(n_samples=1000, n_features=20,
n_informative=15, random_state=43)
print("=== Warm-starting Effect Verification ===")
# Cold start (learn Task 2 from scratch)
model_cold = SGDClassifier(max_iter=100, random_state=42)
model_cold.fit(X2[:100], y2[:100]) # Learn with small data
score_cold = model_cold.score(X2[100:], y2[100:])
# Warm start (pre-train on Task 1)
model_warm = SGDClassifier(max_iter=100, random_state=42)
model_warm.fit(X1, y1) # Learn on Task 1
model_warm.partial_fit(X2[:100], y2[:100]) # Additional learning on Task 2
score_warm = model_warm.score(X2[100:], y2[100:])
print(f"Cold Start Accuracy: {score_cold:.4f}")
print(f"Warm Start Accuracy: {score_warm:.4f}")
print(f"Improvement: {(score_warm - score_cold) * 100:.2f}%")
1.5 AutoML Evaluation
Performance Metrics
Metrics for evaluating AutoML system performance:
| Metric | Description | Importance |
|---|---|---|
| Prediction Accuracy | Model prediction performance | Most important |
| Search Time | Time to find optimal model | Practically important |
| Computational Cost | Required resources (CPU, GPU, memory) | Scalability |
| Robustness | Stability across different datasets | Generality |
Computational Cost
Quantifying AutoML computational cost:
$$ \text{Total Cost} = \sum_{i=1}^{n} C_i \times T_i $$
- $C_i$: Computational cost of i-th model (FLOPS, etc.)
- $T_i$: Training time of i-th model
- $n$: Total number of models evaluated
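A worked example with assumed values: three evaluated models with costs C = (2, 5, 3) GFLOPs and training times T = (10, 4, 6) seconds give a total cost of 2×10 + 5×4 + 3×6 = 58.
# Illustrative total-cost calculation: Total Cost = sum_i C_i * T_i
C = [2.0, 5.0, 3.0]   # per-model computational cost in GFLOPs (assumed)
T = [10.0, 4.0, 6.0]  # per-model training time in seconds (assumed)
total_cost = sum(c * t for c, t in zip(C, T))
print(f"Total Cost: {total_cost}")  # 58.0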
Reproducibility
Whether the same input produces the same results:
- Fixed Random Seeds: Reproducible experiments
- Pipeline Saving: Save trained models and preprocessing
- Version Control: Record library versions
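A minimal sketch of these three practices with scikit-learn and joblib (the file name is illustrative):
import sklearn
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Fixed random seed: the same input reproduces the same trained model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42))
])
pipeline.fit(X, y)

# Pipeline saving: persist preprocessing and model together
joblib.dump(pipeline, 'automl_pipeline.joblib')  # illustrative file name
restored = joblib.load('automl_pipeline.joblib')

# Version control: record library versions alongside the artifact
print(f"scikit-learn version: {sklearn.__version__}")
print(f"Restored pipeline accuracy: {restored.score(X, y):.4f}")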
Interpretability
Understanding AutoML's decision process:
- Feature Importance: Which features are important
- Model Selection Reasoning: Why that model was chosen
- Hyperparameter Impact: Contribution of each parameter
# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0
# - scikit-learn>=1.0.0
"""
Example: Interpreting an AutoML-selected model
Purpose: Compare Gini and permutation feature importances
Target: Advanced
Execution time: 30-60 seconds
Dependencies: matplotlib, scikit-learn
"""
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# Data preparation
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Model training
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Feature importance
feature_importance = model.feature_importances_
feature_names = iris.feature_names
# Permutation Importance
perm_importance = permutation_importance(
model, X_test, y_test, n_repeats=10, random_state=42
)
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Feature importance
axes[0].barh(feature_names, feature_importance)
axes[0].set_xlabel('Importance')
axes[0].set_title('Feature Importance (Gini)')
axes[0].grid(True, alpha=0.3)
# Permutation Importance
axes[1].barh(feature_names, perm_importance.importances_mean)
axes[1].set_xlabel('Importance')
axes[1].set_title('Permutation Importance')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("=== Interpretability Analysis ===")
for name, importance in zip(feature_names, feature_importance):
print(f"{name:20s}: {importance:.4f}")
1.6 Chapter Summary
What We Learned
AutoML Concepts
- Realizing democratization of machine learning
- Efficiency and reduced expertise requirements
- Differences and advantages over traditional methods
AutoML Components
- Data preprocessing automation
- Feature engineering
- Model selection and hyperparameter optimization
Neural Architecture Search
- Automatic network structure design
- Search strategies (RL, evolutionary, gradient-based)
- Strategies for managing computational cost
Meta-Learning
- Learning how to learn
- Few-shot learning, Transfer learning
- Acceleration through warm-starting
AutoML Evaluation
- Performance metrics (accuracy, time, cost)
- Importance of reproducibility and interpretability
AutoML Principles
| Principle | Description |
|---|---|
| Balance Automation and Transparency | Avoid black boxes, maintain interpretability |
| Efficiency | Search strategies considering computational resources |
| Generality | Applicable to various tasks and data |
| Leverage Domain Knowledge | Combination of automation and expertise |
| Continuous Improvement | Improve learning efficiency through meta-learning |
Next Chapter
In Chapter 2, we will learn about AutoML Tools and Frameworks:
- Auto-sklearn
- TPOT
- H2O AutoML
- Google Cloud AutoML
- AutoKeras
Exercises
Question 1 (Difficulty: easy)
List three main purposes of AutoML and explain each.
Answer Example
Answer:
Efficiency
- Description: Automate manual model development processes and significantly reduce development time
- Effect: Can reduce work from weeks to hours
Reduce Expertise Requirements
- Description: Enable building high-quality models without deep machine learning expertise
- Effect: More people can utilize AI technology (democratization)
Performance Improvement
- Description: Discover optimal combinations that humans might overlook through systematic search
- Effect: Eliminate human bias and objectively find the best model
Question 2 (Difficulty: medium)
Explain four search strategies for Neural Architecture Search (NAS) and describe the advantages and disadvantages of each.
Answer Example
Answer:
| Search Strategy | Description | Advantages | Disadvantages |
|---|---|---|---|
| Random Search | Randomly sample architectures | Simple implementation, easy parallelization | Inefficient, unsuitable for large-scale search |
| Reinforcement Learning-based | RNN controller generates architectures | Efficiently explores promising regions | High computational cost, stability issues |
| Evolutionary Algorithms | Evolve superior architectures through genetic operations | Maintains diversity, avoids local optima | Slow convergence, requires large populations |
| Gradient-based (DARTS) | Relax search space and optimize with gradient descent | Computationally efficient, fast | Discretization errors, search space constraints |
Question 3 (Difficulty: medium)
Explain what "5-way 3-shot learning" means in few-shot learning and calculate the number of training samples in this setting.
Answer Example
Answer:
Meaning of "5-way 3-shot learning":
- 5-way: A task to classify 5 classes
- 3-shot: Use only 3 samples per class for training
Number of training samples:
$$ \text{Number of samples} = \text{Number of classes} \times \text{Samples per class} = 5 \times 3 = 15 $$
That is, learning 5-class classification with only 15 samples.
Concrete example:
- Classify 5 types of animals (dog, cat, bird, fish, horse)
- Use only 3 images of each animal (total 15 images) for training
- Become able to correctly classify new animal images
Question 4 (Difficulty: hard)
Complete the following code to implement a simple AutoML system. Include data preprocessing, model selection, and hyperparameter optimization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Data preparation
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Implement AutoML system here
Answer Example
# Requirements:
# - Python 3.9+
# - scikit-learn>=1.0.0
"""
Example: A simple AutoML system
Purpose: Combine preprocessing, model selection, and hyperparameter optimization
Target: Advanced
Execution time: 1-5 minutes
Dependencies: scikit-learn
"""
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
# Data preparation
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
print("=== Simple AutoML System ===\n")
# Step 1: Define model candidates and hyperparameter space
models = {
'Logistic Regression': {
'model': LogisticRegression(max_iter=1000),
'params': {
'classifier__C': [0.1, 1.0, 10.0],
'classifier__penalty': ['l2']
}
},
'Random Forest': {
'model': RandomForestClassifier(random_state=42),
'params': {
'classifier__n_estimators': [50, 100, 200],
'classifier__max_depth': [None, 10, 20],
'classifier__min_samples_split': [2, 5]
}
},
'SVM': {
'model': SVC(),
'params': {
'classifier__C': [0.1, 1.0, 10.0],
'classifier__kernel': ['rbf', 'linear']
}
}
}
# Step 2: Preprocessing pipeline + hyperparameter optimization for each model
best_overall_score = 0
best_overall_model = None
best_overall_name = None
for name, config in models.items():
print(f"--- {name} ---")
# Build pipeline (preprocessing + model)
pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', config['model'])
])
# Hyperparameter optimization with grid search
grid_search = GridSearchCV(
pipeline,
param_grid=config['params'],
cv=5,
scoring='accuracy',
n_jobs=-1
)
grid_search.fit(X_train, y_train)
# Results
cv_score = grid_search.best_score_
test_score = grid_search.score(X_test, y_test)
print(f" Best CV Score: {cv_score:.4f}")
print(f" Test Score: {test_score:.4f}")
print(f" Best Parameters: {grid_search.best_params_}")
print()
# Update best model
if cv_score > best_overall_score:
best_overall_score = cv_score
best_overall_model = grid_search.best_estimator_
best_overall_name = name
# Step 3: Final results
print("=" * 50)
print(f"Best Model: {best_overall_name}")
print(f"CV Score: {best_overall_score:.4f}")
print(f"Test Score: {best_overall_model.score(X_test, y_test):.4f}")
print("=" * 50)
Example Output:
=== Simple AutoML System ===
--- Logistic Regression ---
Best CV Score: 0.9780
Test Score: 0.9825
Best Parameters: {'classifier__C': 1.0, 'classifier__penalty': 'l2'}
--- Random Forest ---
Best CV Score: 0.9648
Test Score: 0.9649
Best Parameters: {'classifier__max_depth': None, ...}
--- SVM ---
Best CV Score: 0.9758
Test Score: 0.9737
Best Parameters: {'classifier__C': 1.0, 'classifier__kernel': 'linear'}
==================================================
Best Model: Logistic Regression
CV Score: 0.9780
Test Score: 0.9825
==================================================
Question 5 (Difficulty: hard)
Explain the tradeoff between "computational cost" and "prediction accuracy" in AutoML, and describe how to balance them practically.
Answer Example
Answer:
Essence of Tradeoff:
| Aspect | High Accuracy Pursuit | Low Cost Pursuit |
|---|---|---|
| Search Range | Extensive search (thousands of models) | Limited search (dozens of models) |
| Time | Days to weeks | Hours to days |
| Resources | Large-scale GPU/cluster | Single machine |
| Accuracy Improvement | +1-2% improvement | Baseline achievement |
Strategies for Balance:
Staged Approach
- Phase 1: Fast search to narrow down promising model candidates (hours)
- Phase 2: Detailed optimization on candidates (days)
Early Stopping
- Terminate search if validation accuracy doesn't improve
- Set computational budget limits (time/cost)
Efficient Search Methods
- Use Bayesian optimization instead of random search
- Improve initial state with transfer learning or meta-learning
Task-dependent Priorities
- Production systems: Accuracy priority (allow high cost)
- Prototypes: Speed priority (emphasize low cost)
- Research: Balance both
Multi-objective Optimization
- Include computational cost in objective function
$$ \text{Objective} = \alpha \cdot \text{Accuracy} - (1-\alpha) \cdot \log(\text{Cost}) $$
- $\alpha$: Weight for accuracy and cost (0 to 1)
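A quick illustration with assumed numbers (α = 0.8; an expensive model with accuracy 0.97 and cost 100 versus a cheap one with accuracy 0.95 and cost 10):
# Illustrative multi-objective scores: alpha * Accuracy - (1 - alpha) * log(Cost)
import math
alpha = 0.8  # weight on accuracy (assumed)
for name, acc, cost in [('expensive', 0.97, 100), ('cheap', 0.95, 10)]:
    objective = alpha * acc - (1 - alpha) * math.log(cost)
    print(f"{name}: {objective:.4f}")
# expensive: 0.776 - 0.921 ~ -0.145; cheap: 0.760 - 0.461 ~ 0.299
# With this alpha, the cheaper model wins despite its lower accuracy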
Practical Recommendations:
- First explore with low cost to understand baseline performance
- Only perform high-cost search when business value is high
- Quantitatively evaluate cost and effect of 1% accuracy improvement
References
- Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.). (2019). Automated Machine Learning: Methods, Systems, Challenges. Springer.
- Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural Architecture Search: A Survey. Journal of Machine Learning Research, 20(55), 1-21.
- Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Feurer, M., & Hutter, F. (2019). Hyperparameter Optimization. In Automated Machine Learning (pp. 3-33). Springer.
- He, X., Zhao, K., & Chu, X. (2021). AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622.