
Chapter 4: Practical Tuning Strategies

From Multi-Objective Optimization to Production Operations - Advanced Techniques for Real-World Use

📖 Reading Time: 25 minutes 📊 Difficulty: Advanced 💻 Code Examples: 6 🎯 Practice-Oriented

This chapter focuses on applying tuning strategies in practice. You will learn how to balance competing objectives through multi-objective optimization, cut computation with early stopping and distributed execution, reuse past results via warm starts, and operate tuned models in production.

Learning Objectives

By reading this chapter, you will master:

  1. Balancing accuracy, latency, and cost with multi-objective optimization and Pareto-frontier analysis
  2. Cutting wasted computation by pruning unpromising trials early
  3. Scaling tuning across multiple workers with shared storage and Ray Tune
  4. Reusing past results through warm starts and meta-learning
  5. Operating tuned models in production: staged rollout, monitoring, and re-tuning

1. Multi-Objective Optimization

In real ML systems, you need to balance multiple metrics beyond just accuracy, including latency, model size, and inference cost.

1.1 Accuracy vs Latency Tradeoff

Single-objective optimization may select models unsuitable for real-world operations. For example:

graph LR
    A[Accuracy-focused] -->|Tradeoff| B[Pareto Frontier]
    C[Speed-focused] -->|Tradeoff| B
    B --> D[Set of optimal solutions]
    D --> E[Select based on business requirements]

1.2 Understanding the Pareto Frontier

The Pareto frontier is the set of solutions where no metric can be improved without degrading another. This allows selecting the optimal model from multiple candidates.

Pareto Optimality: A solution is Pareto optimal when no other solution is at least as good in every metric and strictly better in at least one; in other words, no other candidate dominates it.
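
To make the dominance relation concrete, the following small sketch filters candidate (accuracy, latency) pairs down to the Pareto frontier. The pareto_frontier helper is written for this chapter rather than taken from any library:

def pareto_frontier(points):
    """Return the Pareto-optimal subset of (accuracy, latency) pairs.

    A point is dominated if some other point has accuracy >= and latency <=,
    with at least one strict inequality; the frontier keeps the rest.
    """
    frontier = []
    for i, (acc_i, lat_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and lat_j <= lat_i and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(points) if j != i
        )
        if not dominated:
            frontier.append((acc_i, lat_i))
    return frontier

# Toy candidates: (accuracy, latency in ms)
candidates = [(0.95, 40), (0.93, 15), (0.95, 60), (0.90, 10), (0.89, 12)]
print(pareto_frontier(candidates))
# -> [(0.95, 40), (0.93, 15), (0.90, 10)]; the other two are dominated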

1.3 Multi-Objective Optimization in Optuna

Optuna supports multi-objective optimization that simultaneously optimizes multiple objective functions.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - optuna>=3.2.0
# - scikit-learn>=1.2.0

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer
import time
import numpy as np

# Dataset preparation
X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    """Optimize accuracy and latency simultaneously"""
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 10, 200),
        'max_depth': trial.suggest_int('max_depth', 2, 32),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }

    # Model training and accuracy evaluation
    clf = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
    accuracy = cross_val_score(clf, X, y, cv=3, n_jobs=-1).mean()

    # Latency measurement (inference time)
    clf.fit(X, y)
    start_time = time.time()
    _ = clf.predict(X[:100])  # Measure inference time for 100 samples
    latency = (time.time() - start_time) * 1000  # milliseconds

    # Return two objectives: maximize accuracy, minimize latency
    return accuracy, latency

# Multi-objective optimization study
study = optuna.create_study(
    directions=['maximize', 'minimize'],  # Maximize accuracy, minimize latency
    study_name='multi_objective_optimization'
)

study.optimize(objective, n_trials=50)

# Get Pareto frontier solutions
print("=== Pareto Optimal Solutions ===")
for trial in study.best_trials:
    print(f"Trial {trial.number}:")
    print(f"  Accuracy: {trial.values[0]:.4f}")
    print(f"  Latency: {trial.values[1]:.2f} ms")
    print(f"  Params: {trial.params}\n")

1.4 Visualizing the Pareto Frontier

# Requirements:
# - Python 3.9+
# - matplotlib>=3.7.0

import matplotlib.pyplot as plt

def visualize_pareto_frontier(study):
    """Visualize the Pareto frontier"""
    # Get all trial results
    trials = study.trials
    accuracies = [t.values[0] for t in trials]
    latencies = [t.values[1] for t in trials]

    # Get Pareto optimal solutions
    pareto_trials = study.best_trials
    pareto_accuracies = [t.values[0] for t in pareto_trials]
    pareto_latencies = [t.values[1] for t in pareto_trials]

    # Plot
    plt.figure(figsize=(10, 6))
    plt.scatter(latencies, accuracies, alpha=0.5, label='All trials')
    plt.scatter(pareto_latencies, pareto_accuracies,
                color='red', s=100, marker='*', label='Pareto frontier')

    # Connect Pareto frontier with lines
    sorted_pareto = sorted(zip(pareto_latencies, pareto_accuracies))
    plt.plot([p[0] for p in sorted_pareto], [p[1] for p in sorted_pareto],
             'r--', alpha=0.5)

    plt.xlabel('Latency (ms)')
    plt.ylabel('Accuracy')
    plt.title('Multi-Objective Optimization: Accuracy vs Latency')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

visualize_pareto_frontier(study)

2. Early Stopping Strategy

By terminating unpromising trials early, you use computational resources efficiently and can explore more of the search space within the same budget.

2.1 Importance of Pruning

In hyperparameter tuning, many trials do not ultimately yield good results, so early termination is the key to efficiency.

| Strategy | Characteristics | Application Scenarios |
|----------|-----------------|------------------------|
| MedianPruner | Terminates trials performing worse than the median of completed trials | General purpose, balanced |
| PercentilePruner | Terminates trials not in the top X% | When aggressive pruning is needed |
| SuccessiveHalvingPruner | Allocates resources to trials progressively | When learning curves are available |
| HyperbandPruner | Runs multiple SuccessiveHalving brackets in parallel | Large-scale search, state-of-the-art method |
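
The last two rows are SuccessiveHalving-family pruners, which the examples below do not cover. As a point of comparison, they can be instantiated in a few lines; the budget values here are illustrative, not tuned:

import optuna
from optuna.pruners import HyperbandPruner, SuccessiveHalvingPruner

# min_resource/max_resource are the smallest and largest budget (e.g., epochs
# or reported steps) a trial may receive before pruning decisions apply
hyperband_pruner = HyperbandPruner(min_resource=1, max_resource=5,
                                   reduction_factor=3)

# SuccessiveHalving keeps roughly the top 1/reduction_factor of trials at
# each rung and grants the survivors a larger budget
sha_pruner = SuccessiveHalvingPruner(min_resource=1, reduction_factor=2)

study = optuna.create_study(direction='maximize', pruner=hyperband_pruner)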

2.2 Implementing MedianPruner

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - optuna>=3.2.0
# - scikit-learn>=1.2.0

import optuna
from optuna.pruners import MedianPruner
from sklearn.model_selection import cross_validate
from sklearn.ensemble import GradientBoostingClassifier
import numpy as np

def objective_with_pruning(trial):
    """Objective function with pruning"""
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
    }

    clf = GradientBoostingClassifier(**params, random_state=42)

    # Progressive evaluation: report intermediate scores so the pruner can act.
    # The proxy models below grow n_estimators step by step, independently of
    # the value suggested for the final model.
    for step in range(5):
        intermediate_clf = GradientBoostingClassifier(
            n_estimators=(step + 1) * 20,
            learning_rate=params['learning_rate'],
            max_depth=params['max_depth'],
            subsample=params['subsample'],
            random_state=42
        )

        # Intermediate evaluation
        scores = cross_validate(intermediate_clf, X, y, cv=3,
                                scoring='accuracy', n_jobs=-1)
        intermediate_score = scores['test_score'].mean()

        # Report intermediate result
        trial.report(intermediate_score, step)

        # Pruning decision
        if trial.should_prune():
            raise optuna.TrialPruned()

    # Final evaluation
    final_scores = cross_validate(clf, X, y, cv=5, scoring='accuracy', n_jobs=-1)
    return final_scores['test_score'].mean()

# Study with MedianPruner
pruner = MedianPruner(
    n_startup_trials=5,  # Do not prune first 5 trials
    n_warmup_steps=2,    # Do not prune first 2 steps
    interval_steps=1     # Make pruning decision at each step
)

study = optuna.create_study(
    direction='maximize',
    pruner=pruner,
    study_name='pruning_optimization'
)

study.optimize(objective_with_pruning, n_trials=50)

print(f"Best accuracy: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
print(f"Pruned trials: {len([t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED])}")
print(f"Completed trials: {len([t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE])}")

2.3 Implementing PercentilePruner

from optuna.pruners import PercentilePruner

# PercentilePruner: Terminate trials not in top 25%
pruner = PercentilePruner(
    percentile=25.0,      # Only top 25% continue
    n_startup_trials=5,   # First 5 trials always complete
    n_warmup_steps=2      # Do not prune first 2 steps
)

study = optuna.create_study(
    direction='maximize',
    pruner=pruner,
    study_name='percentile_pruning'
)

study.optimize(objective_with_pruning, n_trials=50)

# Analyze pruning effectiveness
pruned_count = len([t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED])
completed_count = len([t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE])
pruning_rate = pruned_count / len(study.trials) * 100

print(f"Pruning rate: {pruning_rate:.1f}%")
print(f"Time saved: ~{pruning_rate * 0.8:.1f}% (estimated)")

3. Distributed Hyperparameter Tuning

For large search spaces or computationally expensive models, executing tuning in a distributed environment can significantly reduce time.

3.1 How Optuna Distributed Optimization Works

Optuna allows multiple workers to collaboratively execute optimization through shared storage (RDB, Redis, etc.).

graph TD
    A[Shared Storage<br/>RDB/Redis] --> B[Worker 1]
    A --> C[Worker 2]
    A --> D[Worker 3]
    A --> E[Worker N]
    B --> F[Save trial results]
    C --> F
    D --> F
    E --> F
    F --> A

3.2 Distributed Optimization with RDB

# Requirements:
# - Python 3.9+
# - optuna>=3.2.0
# - psycopg2-binary (driver for the PostgreSQL storage URL below)

"""
Example: 3.2 Distributed Optimization with RDB

Purpose: Demonstrate optimization techniques
Target: Advanced
Execution time: 10-30 seconds
Dependencies: None
"""

import optuna
from optuna.storages import RDBStorage

# Shared storage configuration (PostgreSQL example)
storage = RDBStorage(
    url='postgresql://user:password@localhost/optuna_db',
    engine_kwargs={
        'pool_size': 20,
        'max_overflow': 0,
    }
)

# Create distributed study (shared among multiple workers)
study = optuna.create_study(
    study_name='distributed_optimization',
    storage=storage,
    direction='maximize',
    load_if_exists=True  # Reuse existing study if present
)

# Execute this code on each worker
def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    return -(x**2 + y**2)

# Each worker optimizes in parallel
study.optimize(objective, n_trials=100)

print(f"Best value: {study.best_value}")
print(f"Best params: {study.best_params}")
print(f"Total trials: {len(study.trials)}")

3.3 Distributed Tuning with Ray Tune

Ray Tune is a framework specialized for distributed execution and scheduling, suitable for parallel tuning on large clusters.

# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - ray>=2.5.0

from ray import tune
from ray.air import session
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch
import numpy as np

def train_model(config):
    """Training function (Ray Tune executes copies of this in parallel)"""
    # Simulation: replace with actual model training
    for epoch in range(10):
        # Toy score peaking near lr=0.01 and batch_size=32
        accuracy = 1 - (config['lr'] - 0.01)**2 - (config['batch_size'] - 32)**2 / 1000
        accuracy += np.random.normal(0, 0.01)  # Noise

        # Report intermediate result (Ray AIR session API; newer Ray versions
        # use ray.train.report instead)
        session.report({'accuracy': accuracy})

# Define search space
search_space = {
    'lr': tune.loguniform(1e-4, 1e-1),
    'batch_size': tune.choice([16, 32, 64, 128]),
    'hidden_size': tune.choice([64, 128, 256, 512]),
}

# ASHA Scheduler (efficient early stopping)
scheduler = ASHAScheduler(
    max_t=10,           # Maximum epochs
    grace_period=1,     # Minimum execution epochs
    reduction_factor=2  # Terminate half at each stage
)

# Use Optuna search algorithm
search_alg = OptunaSearch()

# Execute distributed tuning
analysis = tune.run(
    train_model,
    config=search_space,
    metric='accuracy',        # Must match the key reported from train_model
    mode='max',
    num_samples=50,           # Number of trials
    scheduler=scheduler,
    search_alg=search_alg,
    resources_per_trial={     # Resources per trial
        'cpu': 2,
        'gpu': 0.5            # Set to 0 on machines without a GPU
    },
    verbose=1
)

# Get optimal parameters
best_config = analysis.best_config
print(f"Best config: {best_config}")
print(f"Best accuracy: {analysis.best_result['accuracy']:.4f}")

3.4 Scalability Considerations

| Scale | Recommended Framework | Storage | Number of Workers |
|-------|-----------------------|---------|-------------------|
| Small (~100 trials) | Optuna | SQLite | 1-4 |
| Medium (100-1,000 trials) | Optuna | PostgreSQL | 4-16 |
| Large (1,000-10,000 trials) | Ray Tune | PostgreSQL/Redis | 16-64 |
| Ultra-large (10,000+ trials) | Ray Tune | Distributed Redis | 64+ |
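
For the small-scale row, no database server is needed: a file-backed SQLite storage is enough for a few local workers. A minimal sketch (the file name is illustrative):

import optuna

# SQLite storage: simple and serverless, but file locking limits it to a
# handful of local workers; move to PostgreSQL/Redis beyond that
study = optuna.create_study(
    study_name='small_scale_search',
    storage='sqlite:///optuna_small.db',
    direction='maximize',
    load_if_exists=True
)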

4. Transfer Learning and Warm Start

By leveraging past tuning results and knowledge from similar tasks, exploration can be greatly streamlined.

4.1 Leveraging Prior Knowledge

Rather than starting tuning from scratch, this approach begins exploration from known good hyperparameters.

Benefits of Warm Start

  1. Fewer wasted trials: exploration starts from regions already known to perform well
  2. Faster convergence: Section 4.4 cites a 20-40% reduction in exploration time
  3. Lower risk: known-good configurations act as a safety net if further exploration underperforms

4.2 Implementing Warm Start in Optuna

# Requirements:
# - Python 3.9+
# - optuna>=3.2.0
# - scikit-learn>=1.2.0

"""
Example: 4.2 Implementing Warm Start in Optuna

Purpose: Demonstrate optimization techniques
Target: Advanced
Execution time: 10-30 seconds
Dependencies: None
"""

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Known good hyperparameters (from past experience or domain knowledge)
known_good_params = [
    {'n_estimators': 100, 'max_depth': 10, 'min_samples_split': 5},
    {'n_estimators': 200, 'max_depth': 15, 'min_samples_split': 2},
    {'n_estimators': 150, 'max_depth': 12, 'min_samples_split': 4},
]

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 5, 20),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 10),
    }

    clf = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
    score = cross_val_score(clf, X, y, cv=3, n_jobs=-1).mean()
    return score

# Create study
study = optuna.create_study(direction='maximize')

# Warm start: Add known good parameters in advance
for params in known_good_params:
    study.enqueue_trial(params)

# Execute optimization (enqueued trials are executed preferentially)
study.optimize(objective, n_trials=50)

print(f"Best score: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

# Check effectiveness of pre-registered parameters
warmstart_trials = study.trials[:len(known_good_params)]
print("\n=== Warmstart trials performance ===")
for i, trial in enumerate(warmstart_trials):
    print(f"Params {i+1}: {trial.params} -> Score: {trial.value:.4f}")

4.3 Applications of Meta-Learning

A technique that learns from multiple similar tasks and predicts good initial parameters for new tasks.

Meta-Learning: Called "learning to learn," a technology that accelerates adaptation to new tasks from experience on multiple tasks.

Practical Meta-Learning Approaches

  1. Save past tuning history: Record pairs of dataset characteristics and best parameters
  2. Search similar tasks: Identify past tasks similar to new tasks
  3. Parameter recommendation: Use best parameters from similar tasks as initial values
# Requirements:
# - Python 3.9+
# - numpy>=1.24.0, <2.0.0
# - optuna>=3.2.0
# - scikit-learn>=1.2.0

import json
import numpy as np
import optuna
from sklearn.metrics.pairwise import cosine_similarity

class MetaLearningOptimizer:
    """Hyperparameter optimization using meta-learning"""

    def __init__(self, history_file='tuning_history.json'):
        self.history_file = history_file
        self.history = self.load_history()

    def load_history(self):
        """Load past tuning history"""
        try:
            with open(self.history_file, 'r') as f:
                return json.load(f)
        except FileNotFoundError:
            return []

    def save_history(self, dataset_features, best_params, best_score):
        """Save new tuning results"""
        self.history.append({
            'dataset_features': dataset_features,
            'best_params': best_params,
            'best_score': best_score
        })
        with open(self.history_file, 'w') as f:
            json.dump(self.history, f, indent=2)

    def get_dataset_features(self, X, y):
        """Extract dataset features"""
        return {
            'n_samples': X.shape[0],
            'n_features': X.shape[1],
            'n_classes': len(np.unique(y)),
            'class_imbalance': np.std(np.bincount(y)) / np.mean(np.bincount(y)),
            'feature_correlation': np.mean(np.abs(np.corrcoef(X.T))),
        }

    def find_similar_tasks(self, current_features, top_k=3):
        """Search for similar tasks"""
        if not self.history:
            return []

        # Vectorize features (in practice, normalize them first so that
        # large-magnitude features such as n_samples do not dominate the
        # cosine similarity)
        current_vec = np.array(list(current_features.values())).reshape(1, -1)

        similarities = []
        for record in self.history:
            hist_vec = np.array(list(record['dataset_features'].values())).reshape(1, -1)
            sim = cosine_similarity(current_vec, hist_vec)[0][0]
            similarities.append((sim, record))

        # Sort by similarity
        similarities.sort(reverse=True, key=lambda x: x[0])
        return [record for _, record in similarities[:top_k]]

    def get_warmstart_params(self, X, y):
        """Recommend parameters for warm start"""
        current_features = self.get_dataset_features(X, y)
        similar_tasks = self.find_similar_tasks(current_features)

        if not similar_tasks:
            return []

        # Return best parameters from similar tasks
        return [task['best_params'] for task in similar_tasks]

# Usage example
meta_optimizer = MetaLearningOptimizer()

# Warm start for new task
warmstart_params = meta_optimizer.get_warmstart_params(X, y)

if warmstart_params:
    print("=== Recommended warmstart parameters ===")
    for i, params in enumerate(warmstart_params):
        print(f"Recommendation {i+1}: {params}")

    # Warm start with Optuna
    study = optuna.create_study(direction='maximize')
    for params in warmstart_params:
        study.enqueue_trial(params)
    study.optimize(objective, n_trials=50)
else:
    # If no history, perform normal optimization
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=50)

# Save results to history
dataset_features = meta_optimizer.get_dataset_features(X, y)
meta_optimizer.save_history(dataset_features, study.best_params, study.best_value)

4.4 Practical Approaches to Transfer Learning

| Approach | Application Scenarios | Effect |
|----------|-----------------------|--------|
| Warm start | Experience with similar tasks | 20-40% reduction in exploration time |
| Meta-learning | Many past tasks available | Better initial performance and search efficiency |
| Domain knowledge injection | Expert insights available | Risk reduction, fast convergence |
| Ensemble utilization | Multiple candidate parameters | Improved robustness |

5. Practical Tuning Guide

This section explains best practices for applying theory to practical work, debugging techniques, and production operation know-how.

5.1 Best Practices for Search Space Design

Principles for Effective Search Space Design

  1. Prioritize by impact
    • Learning rate, regularization parameters → High priority
    • Batch size, number of epochs → Medium priority
    • Fine-tuning parameters → Low priority
  2. Select appropriate scale
    • Learning rate: Logarithmic scale (loguniform)
    • Regularization strength: Logarithmic scale
    • Number of layers, units: Integer, linear scale
  3. Utilize conditional parameters
    • Suggest parameters that only exist under specific choices inside conditional branches (see the sketch below)
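
The following sketch illustrates points 2 and 3 together: a log-scale suggestion for regularization strength, and parameters that exist only under a specific model choice. The model choices and ranges are illustrative:

import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # The model choice determines which parameters are suggested next
    model_name = trial.suggest_categorical('model', ['svm', 'random_forest'])
    if model_name == 'svm':
        # Regularization strength on a logarithmic scale
        c = trial.suggest_float('svm_C', 1e-3, 1e3, log=True)
        clf = SVC(C=c)
    else:
        # These parameters only exist on the random forest branch
        n_estimators = trial.suggest_int('rf_n_estimators', 50, 300)
        max_depth = trial.suggest_int('rf_max_depth', 3, 20)
        clf = RandomForestClassifier(n_estimators=n_estimators,
                                     max_depth=max_depth,
                                     random_state=42, n_jobs=-1)
    return cross_val_score(clf, X, y, cv=3, n_jobs=-1).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=30)
print(study.best_params)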

5.2 Debugging and Troubleshooting

Common Problems and Solutions

| Problem | Cause | Solution |
|---------|-------|----------|
| No convergence | Search space too wide | Narrow the range with preliminary experiments |
| Same parameters suggested repeatedly | Sampler bias | Compare against RandomSampler |
| Too many trials pruned | Pruner settings too strict | Increase n_warmup_steps |
| Unstable results | Evaluation randomness | Increase CV folds, fix random seeds |
| Out of memory | Model too large | Reduce batch size, use gradient accumulation |

Visualization for Debugging

# Requirements:
# - Python 3.9+
# - optuna>=3.2.0
# - plotly>=5.0.0 (rendering backend for optuna.visualization)

"""
Example: Visualization for Debugging

Purpose: Demonstrate data visualization techniques
Target: Advanced
Execution time: 10-30 seconds
Dependencies: None
"""

import optuna
from optuna.visualization import (
    plot_optimization_history,
    plot_param_importances,
    plot_parallel_coordinate,
    plot_slice
)

# Visualization after optimization execution
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# 1. Optimization history: Check if improving over time
fig1 = plot_optimization_history(study)
fig1.show()

# 2. Parameter importance: Check which parameters are important
fig2 = plot_param_importances(study)
fig2.show()

# 3. Parallel coordinate plot: Check relationships between parameters
fig3 = plot_parallel_coordinate(study)
fig3.show()

# 4. Slice plot: Check individual parameter effects
fig4 = plot_slice(study)
fig4.show()

5.3 Production Environment Operations

Production Operations Workflow

graph TD
    A[Development environment exploration] --> B[Select candidate parameters]
    B --> C[Verify in staging environment]
    C --> D{Performance & stability OK?}
    D -->|No| A
    D -->|Yes| E[Production deployment]
    E --> F[Monitoring]
    F --> G{Performance degradation detected?}
    G -->|Yes| H[Re-tuning]
    G -->|No| F
    H --> A

Best Practices for Production Operations

  1. Gradual Rollout
    • Canary deployment: Verify with a portion of traffic
    • A/B testing: Parallel operation with existing model
    • Gradual switching: Immediate rollback if problems occur
  2. Continuous Monitoring
    • Track prediction performance (accuracy, AUC, etc.)
    • Monitor latency and throughput
    • Data drift detection (a minimal sketch follows this list)
  3. Regular Re-tuning
    • Perform quarterly or as data distribution changes
    • Evaluate new algorithms and techniques
    • Meta-learning leveraging past results
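
As one concrete way to implement the drift-detection bullet above, a per-feature two-sample Kolmogorov-Smirnov test is a lightweight check. This is a minimal sketch (requires scipy) assuming you retain a reference sample from training time; the 20% threshold and the trigger_retuning hook are hypothetical:

import numpy as np
from scipy.stats import ks_2samp

def detect_drift(X_reference, X_production, p_threshold=0.01):
    """Return the features whose production distribution differs from training.

    Runs a two-sample KS test per feature; small p-values suggest drift.
    """
    drifted = []
    for i in range(X_reference.shape[1]):
        _, p_value = ks_2samp(X_reference[:, i], X_production[:, i])
        if p_value < p_threshold:
            drifted.append((i, p_value))
    return drifted

# Usage sketch: re-tune when a substantial share of features has drifted
# drifted = detect_drift(X_train_sample, X_recent)
# if len(drifted) > 0.2 * X_train_sample.shape[1]:
#     trigger_retuning()  # hypothetical hook into your tuning pipeline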

5.4 Summary of Practical Tuning Strategies

| Phase | Purpose | Recommended Method | Number of Trials |
|-------|---------|--------------------|------------------|
| Initial exploration | Understand the overall landscape | Random search | 20-50 |
| Narrowing | Identify promising regions | TPE + pruning | 50-100 |
| Precision exploration | Find the optimal solution | CMA-ES / GP | 100-200 |
| Multi-objective | Adjust tradeoffs | Multi-objective TPE | 100-300 |
| Production verification | Final confirmation | More cross-validation folds | 5-10 |
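
The first three rows map directly onto Optuna samplers. A minimal sketch of the phased approach, reusing an objective function defined earlier in the chapter (CmaEsSampler additionally requires the cmaes package and handles only numerical parameters, falling back to random sampling for categorical ones):

import optuna
from optuna.samplers import CmaEsSampler, RandomSampler, TPESampler

# Phase 1: broad random exploration of the whole space
study = optuna.create_study(direction='maximize',
                            sampler=RandomSampler(seed=42))
study.optimize(objective, n_trials=30)

# Phases 2-3: narrow down with TPE, then refine with CMA-ES; the sampler can
# be swapped on an existing study, so earlier trials inform later phases
for sampler in (TPESampler(seed=42), CmaEsSampler(seed=42)):
    study.sampler = sampler
    study.optimize(objective, n_trials=50)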

5.5 Tuning Efficiency Cheat Sheet

Quick tuning procedure when time is limited (consolidated in the sketch after this list):
  1. Set initial values from domain knowledge/past experience (warm start)
  2. Narrow down to 2-3 important parameters for exploration (learning rate, regularization)
  3. Enable MedianPruner to reduce wasteful trials
  4. Parallel execution (4-8 workers) to reduce time
  5. Ensure practical performance with 50-100 trials
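
Consolidating the five steps, a compact sketch reusing the objective from Section 4.2 (the enqueued values and trial counts are illustrative; n_jobs parallelizes trials with threads inside one process, while the Section 3 setup distributes across processes):

import optuna
from optuna.pruners import MedianPruner

study = optuna.create_study(
    direction='maximize',
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=2),  # step 3
)

# Step 1: warm start from known-good settings
study.enqueue_trial({'n_estimators': 100, 'max_depth': 10,
                     'min_samples_split': 5})

# Steps 2, 4, 5: focused search space, 4 parallel workers, ~100 trials.
# Note: pruning only takes effect if the objective reports intermediate
# values via trial.report (see Section 2.2).
study.optimize(objective, n_trials=100, n_jobs=4)
print(study.best_params)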

End-of-Chapter Exercises

Exercise 1: Implementing Multi-Objective Optimization (Difficulty: Medium)

Implement an Optuna study that simultaneously optimizes accuracy, inference time, and model size. Visualize the Pareto frontier and select a solution that meets business requirements (accuracy ≥0.90, inference time ≤50ms).

Exercise 2: Comparing Pruning Strategies (Difficulty: Medium)

Compare MedianPruner, PercentilePruner, and HyperbandPruner through experiments. For the same objective function, compare the pruning rate, final performance, and computation time of each pruner, and evaluate which is most efficient.

Exercise 3: Implementing Distributed Tuning (Difficulty: Advanced)

Implement Optuna distributed optimization using PostgreSQL. Access the study simultaneously from 3 different workers and efficiently execute a total of 150 trials. Analyze each worker's contribution.

Exercise 4: Building a Meta-Learning System (Difficulty: Advanced)

Build a meta-learning system that performs tuning on multiple datasets (UCI ML Repository, etc.) and recommends optimal hyperparameters for new datasets based on that history.

Exercise 5: Production Operations Simulation (Difficulty: Advanced)

Using time series data, implement a production operations simulation. Create a mechanism to detect performance degradation when data drift occurs and automatically trigger re-tuning.

Summary

In this chapter, we learned practical hyperparameter tuning strategies:

  1. Multi-objective optimization and selecting models from the Pareto frontier
  2. Early stopping with pruners to cut wasted computation
  3. Distributed tuning with shared storage (Optuna) and cluster schedulers (Ray Tune)
  4. Warm starts and meta-learning to reuse past tuning knowledge
  5. Production operations: staged rollout, monitoring, and regular re-tuning

By combining these techniques, you can achieve the high-quality, efficient hyperparameter tuning that practical work demands.

📊 Practical Project: End-to-End Tuning Pipeline

Build an end-to-end tuning pipeline that integrates all the techniques learned in this chapter to solve real business challenges:

  1. Select a Kaggle competition or real dataset
  2. Optimize accuracy and inference time with multi-objective optimization
  3. Streamline exploration with pruning, reduce time with distributed execution
  4. Leverage past knowledge with meta-learning
  5. Implement monitoring and re-tuning mechanisms for production operations

Through this project, establish production-level hyperparameter tuning skills.
