Chapter 5: Hands-On Project - Introduction to Deep Learning Fundamentals

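This chapter applies everything from the previous chapters to build a complete image classification system. Using the MNIST dataset, you will implement a full PyTorch training pipeline, evaluate model performance, tune hyperparameters, and save and load the model for inference.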

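Learning Objectives

Load and explore the MNIST dataset
Implement appropriate data preprocessing and normalization
Build a complete training loop in PyTorch
Evaluate the model with accuracy, precision, recall, and a confusion matrix
Tune hyperparameters with grid search and random search
Save a trained model and load it for inference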

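1. Image Classification on the MNIST Dataset

1.1 Loading and Exploring the Dataset

The MNIST dataset contains 70,000 grayscale images of handwritten digits (0-9), split into 60,000 training images and 10,000 test images, each 28x28 pixels. It is often called the "Hello World" of deep learning.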

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the data transforms
transform = transforms.Compose([
    transforms.ToTensor(),  # convert to a tensor with values in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
])

# Download and load the MNIST dataset
train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

print("MNIST dataset statistics")
print("=" * 40)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Image shape: {train_dataset[0][0].shape}")
print(f"Number of classes: {len(train_dataset.classes)}")

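1.2 Preprocessing and Normalization

Proper preprocessing is essential for good model performance:

ToTensor(): converts a PIL image to a PyTorch tensor and scales pixel values to [0, 1]
Normalize(): standardizes each pixel by subtracting the dataset mean (0.1307) and dividing by the dataset standard deviation (0.3081)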
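
As a quick check of what these two transforms do numerically, the following sketch reproduces their arithmetic for a single pixel in plain Python, without torchvision. The helper name `normalize_pixel` is hypothetical and exists only for illustration; the mean and standard deviation are the values used in the transform above.

```python
# Illustrative only: reproduces the ToTensor() + Normalize() arithmetic for one pixel.
MNIST_MEAN = 0.1307  # mean used in the transform above
MNIST_STD = 0.3081   # standard deviation used in the transform above

def normalize_pixel(raw_value):
    """Map a raw 8-bit pixel (0-255) to its normalized value."""
    scaled = raw_value / 255.0                 # ToTensor(): scale to [0, 1]
    return (scaled - MNIST_MEAN) / MNIST_STD   # Normalize(): subtract mean, divide by std

black = normalize_pixel(0)     # background pixel
white = normalize_pixel(255)   # full-intensity stroke pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Background pixels end up at roughly -0.42 and full-intensity pixels at roughly 2.82, so the inputs are centered near zero with a spread close to one.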

2. Data Preprocessing and Batching

2.1 Using DataLoader

DataLoader handles batching, shuffling, and parallel data loading:

BATCH_SIZE = 64

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,     # shuffle the training data every epoch
    num_workers=2,    # load data in parallel worker processes
    pin_memory=True   # page-locked memory for faster host-to-GPU transfer
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,    # do not shuffle the test data
    num_workers=2,
    pin_memory=True
)

print(f"Batch size: {BATCH_SIZE}")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")
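
The batch counts printed above follow directly from the dataset sizes. The sketch below recomputes them in plain Python, assuming the default `drop_last=False`, under which the final incomplete batch is kept. The helper `batches_per_epoch` is a hypothetical name used for illustration.

```python
import math

def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields for one pass over the data."""
    if drop_last:
        return num_samples // batch_size        # incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)  # incomplete final batch is kept

BATCH_SIZE = 64
train_batches = batches_per_epoch(60_000, BATCH_SIZE)  # MNIST training split
test_batches = batches_per_epoch(10_000, BATCH_SIZE)   # MNIST test split
last_train_batch = 60_000 - (train_batches - 1) * BATCH_SIZE  # size of the final batch
print(train_batches, test_batches, last_train_batch)
```

With a batch size of 64 this gives 938 training batches and 157 test batches, and the last training batch holds only 32 images. That is why the loops later in this chapter weight each batch loss by the batch size before averaging.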

2.2 Data Augmentation

Data augmentation artificially increases the size and diversity of the training data:

from torchvision import transforms

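# Training transform with augmentation (keep the plain transform for the test set)
train_transform = transforms.Compose([
    transforms.RandomRotation(10),           # rotate by up to ±10 degrees
    transforms.RandomAffine(
        degrees=0,
        translate=(0.1, 0.1),               # shift by up to 10% of the image size
        scale=(0.9, 1.1)                    # scale to 90-110%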
    ),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

3. Building the Model and the Training Loop

3.1 Defining the Model in PyTorch

import torch.nn as nn
import torch.nn.functional as F

class MNISTClassifier(nn.Module):
    """
    Neural network for MNIST classification.

    Architecture:
    - Input: 784 (28x28 flattened)
    - Hidden layer 1: 256 units + BatchNorm + ReLU + Dropout
    - Hidden layer 2: 128 units + BatchNorm + ReLU + Dropout
    - Output: 10 classes (raw logits)
    """

    def __init__(self, dropout_rate=0.3):
        super().__init__()

        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.dropout1 = nn.Dropout(dropout_rate)

        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(dropout_rate)

        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.bn1(self.fc1(x)))
        x = self.dropout1(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

# Create the model
model = MNISTClassifier(dropout_rate=0.3)
print("Model architecture:")
print(model)

# Count the learnable parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,}")
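
The total printed above can be verified by hand. The sketch below recomputes it layer by layer in plain Python from the layer sizes in `MNISTClassifier`; the helper names `linear_params` and `batchnorm_params` are hypothetical. It relies on the fact that the running mean and variance of a BatchNorm layer are buffers, so `model.parameters()` does not count them.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def batchnorm_params(num_features):
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * num_features

layer_counts = {
    "fc1": linear_params(28 * 28, 256),
    "bn1": batchnorm_params(256),
    "fc2": linear_params(256, 128),
    "bn2": batchnorm_params(128),
    "fc3": linear_params(128, 10),
}
expected_total = sum(layer_counts.values())
for name, count in layer_counts.items():
    print(f"{name}: {count:,}")
print(f"total: {expected_total:,}")
```

The first fully connected layer accounts for 200,960 of the 235,914 parameters, which shows where most of the model's capacity sits.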

3.2 Implementing the Training Loop

def train_epoch(model, train_loader, criterion, optimizer, device):
    """Train for one epoch and return (mean loss, accuracy)."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        predicted = outputs.argmax(dim=1)  # predicted class = index of the largest logit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    return running_loss / total, correct / total

def validate(model, val_loader, criterion, device):
    """Evaluate the model and return (mean loss, accuracy)."""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * images.size(0)
            predicted = outputs.argmax(dim=1)  # predicted class = index of the largest logit
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return running_loss / total, correct / total
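
To show how a per-epoch training step and a validation step fit into a full run, here is a minimal, self-contained sketch. It makes several assumptions that are not part of the chapter's code: random tensors stand in for MNIST so that it runs in seconds, the model is a small two-layer network rather than `MNISTClassifier`, the optimizer is Adam with a learning rate of 1e-3, and a single hypothetical helper `run_epoch` plays the role of both `train_epoch` and `validate`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST (assumption): random "images" and random labels
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images[:192], labels[:192]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(images[192:], labels[192:]), batch_size=32)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def run_epoch(loader, train):
    """One pass over `loader`; weights are updated only when `train` is True."""
    model.train(train)  # train(False) is equivalent to eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.set_grad_enabled(train):  # no gradient tracking during validation
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            outputs = model(x)
            loss = criterion(outputs, y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * x.size(0)  # weight by batch size
            correct += (outputs.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return total_loss / total, correct / total

history = []
for epoch in range(3):
    train_loss, train_acc = run_epoch(train_loader, train=True)
    val_loss, val_acc = run_epoch(val_loader, train=False)
    history.append((train_loss, train_acc, val_loss, val_acc))
    print(f"epoch {epoch + 1}: train_loss={train_loss:.4f} train_acc={train_acc:.3f} "
          f"val_loss={val_loss:.4f} val_acc={val_acc:.3f}")
```

Because the labels here are random, validation accuracy stays near chance. The point of the sketch is the control flow: alternate a training pass and a validation pass each epoch and record both sets of metrics so that overfitting becomes visible.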

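4. Performance Evaluation and the Confusion Matrix

4.1 Accuracy, Precision, Recall, and F1 Score

For a multi-class problem such as MNIST, TP, FP, FN, and TN are counted separately for each class, treating that class as positive and all other classes as negative.

Metric	Formula	Interpretation
Accuracy	$\frac{TP + TN}{TP + TN + FP + FN}$	Overall fraction of correct predictions
Precision	$\frac{TP}{TP + FP}$	Fraction of predicted positives that are correct
Recall	$\frac{TP}{TP + FN}$	Fraction of actual positives that are detected
F1 score	$2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$	Harmonic mean of precision and recall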
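
The formulas above can be applied per class directly to a confusion matrix. The following sketch does this with NumPy on a small hypothetical 3-class matrix (the counts are made up for illustration), where rows are true labels and columns are predicted labels.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true label, columns = predicted label
cm = np.array([
    [5, 1, 0],
    [2, 6, 0],
    [0, 1, 5],
])

tp = np.diag(cm).astype(float)   # correctly classified samples per class
fp = cm.sum(axis=0) - tp         # predicted as the class but actually another class
fn = cm.sum(axis=1) - tp         # belong to the class but predicted as another class

accuracy = tp.sum() / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy: {accuracy:.3f}")
for k in range(len(cm)):
    print(f"class {k}: precision={precision[k]:.3f} recall={recall[k]:.3f} f1={f1[k]:.3f}")
```

This sketch does not guard against a class that is never predicted or never present, which would cause a division by zero; a real evaluation script would need to handle that case.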

4.2 Visualizing the Confusion Matrix

import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Compute the confusion matrix (rows: true label, columns: predicted label)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(y_true, y_pred):
        cm[true, pred] += 1
    return cm

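def print_confusion_matrix(cm, class_names=None):
    """Print the confusion matrix as a text table."""
    num_classes = cm.shape[0]
    if class_names is None:
        class_names = [str(i) for i in range(num_classes)]

    print("Confusion matrix")
    print("=" * 60)
    print("Rows: true label, columns: predicted label")
    # Header row with the predicted-class names, then one row per true class
    print(" " * 6 + "".join(f"{name:>6}" for name in class_names))
    for name, row in zip(class_names, cm):
        print(f"{name:>6}" + "".join(f"{int(count):>6}" for count in row))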
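
A confusion matrix is most useful when you look at its largest off-diagonal cells, since those are the class pairs the model mixes up most often. The sketch below extracts them with NumPy; the function name `most_confused_pairs` and the example counts are hypothetical.

```python
import numpy as np

def most_confused_pairs(cm, top_k=3):
    """Return the top_k (true, predicted, count) triples among off-diagonal cells."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:top_k]  # largest counts first
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]

# Hypothetical 3-class matrix: rows = true label, columns = predicted label
cm = np.array([
    [50, 2, 1],
    [7, 40, 3],
    [0, 4, 45],
])
pairs = most_confused_pairs(cm, top_k=2)
for true_label, pred_label, count in pairs:
    print(f"true {true_label} predicted as {pred_label}: {count} times")
```

The pairs are directional: class 1 being predicted as class 0 is counted separately from class 0 being predicted as class 1.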

5. Hyperparameter Tuning

5.1 Grid Search

import itertools

def grid_search(param_grid, train_fn, eval_fn):
    """Hyperparameter tuning by exhaustive grid search."""
    keys = list(param_grid.keys())
    values = list(param_grid.values())
    combinations = list(itertools.product(*values))

    best_score = float('-inf')
    best_params = None

    for combo in combinations:
        params = dict(zip(keys, combo))
        model = train_fn(params)
        score = eval_fn(model)

        if score > best_score:
            best_score = score
            best_params = params

    return best_params

# Example parameter grid (train_fn decides how each key is used)
param_grid = {
    'learning_rate': [0.001, 0.01],
    'dropout_rate': [0.3, 0.5],
    'hidden_size': [128, 256]
}
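
To make the cost and behavior of grid search concrete, the sketch below enumerates every combination in the example grid and ranks them. Since training a real model is slow, it uses a hypothetical scoring function `toy_score` that stands in for "train, then return validation accuracy"; its peak is placed at a known setting by construction.

```python
import itertools

param_grid = {
    'learning_rate': [0.001, 0.01],
    'dropout_rate': [0.3, 0.5],
    'hidden_size': [128, 256],
}

def toy_score(params):
    """Hypothetical stand-in for 'train a model, then return validation accuracy'."""
    # Peaks at learning_rate=0.001, dropout_rate=0.3, hidden_size=256 by construction
    return (1.0
            - 10 * abs(params['learning_rate'] - 0.001)
            - abs(params['dropout_rate'] - 0.3)
            - abs(params['hidden_size'] - 256) / 1000)

keys = list(param_grid)
combinations = [dict(zip(keys, combo)) for combo in itertools.product(*param_grid.values())]
results = sorted(((toy_score(p), p) for p in combinations),
                 key=lambda item: item[0], reverse=True)

print(f"{len(combinations)} combinations evaluated")
best_score, best_params = results[0]
print(f"best: {best_params} (score={best_score:.3f})")
```

The number of training runs is the product of the list lengths (here 2 x 2 x 2 = 8), so each additional hyperparameter or candidate value multiplies the total cost.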

5.2 Random Search

import numpy as np

def random_search(param_distributions, train_fn, eval_fn, n_iter=10):
    """Hyperparameter tuning by random search."""
    best_score = float('-inf')
    best_params = None

    for i in range(n_iter):
        params = {k: v() for k, v in param_distributions.items()}
        model = train_fn(params)
        score = eval_fn(model)

        if score > best_score:
            best_score = score
            best_params = params

    return best_params

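# Example distributions: each value is a zero-argument sampler
param_distributions = {
    'learning_rate': lambda: 10 ** np.random.uniform(-4, -2),  # log-uniform over [1e-4, 1e-2]
    'dropout_rate': lambda: np.random.uniform(0.1, 0.5),
    'hidden_size': lambda: int(np.random.choice([64, 128, 256, 512]))  # plain int for nn.Linear
}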
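
The learning rate above is sampled as `10 ** uniform(-4, -2)` rather than uniformly between 1e-4 and 1e-2. The sketch below illustrates why, using only the standard library: it compares how many samples land in the lower decade under each scheme. The helper `sample_log_uniform` is a hypothetical name and the seed is arbitrary.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the draws are repeatable

def sample_log_uniform(low, high):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

samples = [sample_log_uniform(1e-4, 1e-2) for _ in range(1000)]
below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)  # share in the lower decade

linear = [rng.uniform(1e-4, 1e-2) for _ in range(1000)]
linear_below_1e3 = sum(s < 1e-3 for s in linear) / len(linear)

print(f"log-uniform: {below_1e3:.0%} of samples fall in [1e-4, 1e-3)")
print(f"uniform:     {linear_below_1e3:.0%} of samples fall in [1e-4, 1e-3)")
```

Log-uniform sampling spends about half of the trials in each decade, while plain uniform sampling places only about a tenth of them below 1e-3 and so rarely tests small learning rates.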

6. Saving the Model and Running Inference

6.1 Saving and Loading the state_dict

import torch

def save_model(model, path, optimizer=None, epoch=None, metrics=None):
    """Save a model checkpoint."""
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'model_architecture': str(model)
    }

    if optimizer is not None:
        checkpoint['optimizer_state_dict'] = optimizer.state_dict()
    if epoch is not None:
        checkpoint['epoch'] = epoch
    if metrics is not None:
        checkpoint['metrics'] = metrics

    torch.save(checkpoint, path)
    print(f"Saved model to {path}")

def load_model(model, path, optimizer=None):
    """Load a model checkpoint into an already constructed model."""
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['model_state_dict'])

    if optimizer is not None and 'optimizer_state_dict' in checkpoint:
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

    print(f"Loaded model from {path}")
    return checkpoint
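
The following sketch shows a complete save-and-restore round trip using the same checkpoint keys as above. It is illustrative only: a tiny `nn.Linear` stands in for the real model, the optimizer is SGD, and the file name `checkpoint.pt` inside a temporary directory is a hypothetical choice.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # small stand-in model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pt")  # hypothetical file name
    torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': 3,
    }, path)

    # Restore into freshly constructed objects with the same architecture
    restored = nn.Linear(4, 2)
    restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.1)
    checkpoint = torch.load(path, map_location='cpu')
    restored.load_state_dict(checkpoint['model_state_dict'])
    restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

# Every tensor in the restored model should equal the original
weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(f"resumed from epoch {checkpoint['epoch']}, weights match: {weights_match}")
```

A state_dict stores only tensors, not the class definition, so the model must be constructed with the same architecture before `load_state_dict` is called.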

6.2 Prediction in Inference Mode

import torch
import torch.nn.functional as F

def predict(model, images, device='cpu'):
    """Predict with a trained model."""
    model.eval()
    model = model.to(device)
    images = images.to(device)

    with torch.no_grad():
        logits = model(images)
        probabilities = F.softmax(logits, dim=1)
        predictions = torch.argmax(probabilities, dim=1)

    return predictions, probabilities

def predict_single(model, image, device='cpu'):
    """Predict the class of a single image."""
    if image.dim() == 3:
        image = image.unsqueeze(0)  # add a batch dimension: (C, H, W) -> (1, C, H, W)

    predictions, probabilities = predict(model, image, device)

    prediction = predictions[0].item()
    confidence = probabilities[0, prediction].item()

    return prediction, confidence
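
The "confidence" returned above is the softmax probability of the predicted class. As a worked example of that step, the sketch below computes softmax and argmax in plain Python for one set of hypothetical logits; it mirrors what `F.softmax(logits, dim=1)` followed by `torch.argmax` does for a single row.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class model
probabilities = softmax(logits)
prediction = max(range(len(logits)), key=lambda i: probabilities[i])  # argmax
confidence = probabilities[prediction]
print(f"prediction={prediction}, confidence={confidence:.3f}")
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether argmax is taken before or after it. Softmax is needed only when the probability itself is reported.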

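Exercises

Exercise 1: Fashion-MNIST Classification

Problem: Apply everything you have learned to Fashion-MNIST (clothing items). Reach at least 95% accuracy, build a confusion matrix, and identify the classes that are most often confused with each other.

Exercise 2: Data Augmentation Ablation

Problem: Train the model with and without data augmentation, and compare validation accuracy and overfitting behavior.

Exercise 3: Comparing Learning Rate Schedules

Problem: Compare a fixed learning rate, StepLR, ReduceLROnPlateau, and CosineAnnealingLR, and plot the learning rate and validation accuracy for each schedule.

Exercise 4: Model Complexity Analysis

Problem: Create models with [64, 128, 256, 512] hidden units, plot training and validation accuracy against model size, and find the best level of complexity.

Exercise 5: End-to-End Project

Problem: Build a complete digit recognition system: train the best model, implement proper validation and early stopping, save the best model checkpoint, write a prediction function that accepts a 28x28 image array, and report the final test accuracy, the confusion matrix, and the per-class F1 scores.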

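Summary

In this chapter, we applied deep learning concepts to build a complete image classification system:

MNIST dataset: a standard benchmark of 70,000 handwritten digit images
Data pipeline: DataLoader for batching, transforms for preprocessing and augmentation
Training loop: forward pass, loss computation, backpropagation, optimizer update
Evaluation: accuracy, precision, recall, F1 score, confusion matrix
Hyperparameter tuning: grid search and random search strategies
Model management: saving and loading checkpoints, prediction in inference mode

Congratulations! You have completed the Deep Learning Fundamentals course. You now have the foundation to:

Build and train neural networks from scratch
Understand optimization and regularization techniques
Evaluate model performance systematically
Apply deep learning to real-world problems

Next steps: explore CNNs for computer vision, RNNs for sequences, or Transformers for NLP!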