Chapter 5: Practical Projects

Building End-to-End Deep Learning Applications

Reading Time: 40-45 minutes | Code Examples: 6 | Exercises: 5 | Difficulty: Intermediate-Advanced
In this chapter, we will apply everything learned in previous chapters to build a complete image classification system. We will work with the MNIST dataset, implement a full PyTorch training pipeline, evaluate model performance, tune hyperparameters, and save/load models for inference.

Learning Objectives

By the end of this chapter, you will be able to:

  1. Load and preprocess the MNIST dataset with torchvision
  2. Build a PyTorch classifier and implement a full training loop
  3. Evaluate model performance with a confusion matrix and per-class metrics
  4. Tune hyperparameters with grid search and random search
  5. Save and load model checkpoints for inference

1. MNIST Image Classification

1.1 Loading and Exploring the Dataset

The MNIST dataset contains 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels. It's the "Hello World" of deep learning.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import numpy as np

# Download and load MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

print("MNIST Dataset Statistics")
print("=" * 40)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Image shape: {train_dataset[0][0].shape}")
print(f"Number of classes: {len(train_dataset.classes)}")
print(f"Classes: {train_dataset.classes}")

# Explore a sample
sample_image, sample_label = train_dataset[0]
print(f"\nSample image:")
print(f"  Shape: {sample_image.shape}")
print(f"  Min value: {sample_image.min():.4f}")
print(f"  Max value: {sample_image.max():.4f}")
print(f"  Label: {sample_label}")

1.2 Data Preprocessing and Normalization

Proper preprocessing is essential for good model performance.

Why Normalize?

Normalizing inputs to zero mean and unit variance keeps activations in a well-behaved range, which stabilizes gradients and speeds convergence. The constants 0.1307 and 0.3081 are the mean and standard deviation of the MNIST training pixels, so transforms.Normalize((0.1307,), (0.3081,)) maps each pixel x to (x - 0.1307) / 0.3081.
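As a quick sketch of the arithmetic transforms.Normalize applies (using the MNIST statistics from this chapter; the pixel values here are made up):

```python
import torch

# MNIST channel statistics used throughout this chapter
mean, std = 0.1307, 0.3081

# A few hypothetical pixel values already scaled to [0, 1] by ToTensor()
x = torch.tensor([0.0, 0.1307, 0.5, 1.0])

# transforms.Normalize((mean,), (std,)) applies exactly this per channel
z = (x - mean) / std
print(z)  # the value equal to the dataset mean maps to 0
```

Note that a pixel exactly at the dataset mean normalizes to 0, and pixels below the mean become negative.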

2. Data Preprocessing and Batch Processing

2.1 Using DataLoader

The DataLoader handles batching, shuffling, and parallel data loading:

import torch
from torch.utils.data import DataLoader

# Create data loaders
BATCH_SIZE = 64

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,     # Shuffle training data
    num_workers=2,    # Parallel data loading
    pin_memory=True   # Faster GPU transfer
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,    # Don't shuffle test data
    num_workers=2,
    pin_memory=True
)

print("DataLoader Configuration")
print("=" * 40)
print(f"Batch size: {BATCH_SIZE}")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

# Iterate through one batch
for images, labels in train_loader:
    print(f"\nBatch shapes:")
    print(f"  Images: {images.shape}")
    print(f"  Labels: {labels.shape}")
    break

2.2 Data Augmentation

Data augmentation artificially increases the size and diversity of training data:

from torchvision import transforms

# Training transforms with augmentation
train_transform = transforms.Compose([
    transforms.RandomRotation(10),           # Rotate +/- 10 degrees
    transforms.RandomAffine(
        degrees=0,
        translate=(0.1, 0.1),               # Shift up to 10%
        scale=(0.9, 1.1)                    # Scale 90-110%
    ),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Test transforms (no augmentation)
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Create augmented dataset
train_dataset_augmented = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)

print("Data Augmentation Example")
print("=" * 40)

# Compare original and augmented
original_sample = train_dataset[0][0]
augmented_sample = train_dataset_augmented[0][0]

print(f"Original shape: {original_sample.shape}")
print(f"Augmented shape: {augmented_sample.shape}")
print(f"Values changed: {not torch.equal(original_sample, augmented_sample)}")

3. Model Building and Training Loop

3.1 Model Definition in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTClassifier(nn.Module):
    """
    Neural network for MNIST classification

    Architecture:
    - Input: 784 (28x28 flattened)
    - Hidden 1: 256 units + ReLU + Dropout
    - Hidden 2: 128 units + ReLU + Dropout
    - Output: 10 classes
    """

    def __init__(self, dropout_rate=0.3):
        super(MNISTClassifier, self).__init__()

        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.dropout1 = nn.Dropout(dropout_rate)

        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(dropout_rate)

        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        # Flatten: (batch, 1, 28, 28) -> (batch, 784)
        x = self.flatten(x)

        # Layer 1
        x = self.fc1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)

        # Layer 2
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.dropout2(x)

        # Output layer (no softmax - handled by CrossEntropyLoss)
        x = self.fc3(x)

        return x

# Create model
model = MNISTClassifier(dropout_rate=0.3)

# Check model architecture
print("Model Architecture")
print("=" * 40)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
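The "no softmax" comment in forward() deserves a quick check: nn.CrossEntropyLoss expects raw logits because it fuses log-softmax and negative log-likelihood internally. A minimal sketch verifying the equivalence:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # raw scores for a batch of 4
targets = torch.randint(0, 10, (4,))

# What CrossEntropyLoss computes from raw logits
ce = F.cross_entropy(logits, targets)

# The same thing assembled by hand: log-softmax followed by NLL
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.isclose(ce, manual).item())  # True: the two paths agree
```

Applying a softmax in forward() and then CrossEntropyLoss would silently apply softmax twice, which hurts training.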

3.2 Training Loop Implementation

import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

def train_epoch(model, train_loader, criterion, optimizer, device):
    """
    Train for one epoch

    Returns:
    --------
    avg_loss : float
        Average training loss
    accuracy : float
        Training accuracy
    """
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        # Zero gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()

        # Statistics
        running_loss += loss.item() * images.size(0)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    avg_loss = running_loss / total
    accuracy = correct / total

    return avg_loss, accuracy

def validate(model, val_loader, criterion, device):
    """
    Validate the model

    Returns:
    --------
    avg_loss : float
        Average validation loss
    accuracy : float
        Validation accuracy
    """
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * images.size(0)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    avg_loss = running_loss / total
    accuracy = correct / total

    return avg_loss, accuracy

3.3 Complete Training Script

import torch
import torch.nn as nn
import torch.optim as optim

def train_model(model, train_loader, test_loader, epochs=10, lr=0.001, device='cpu'):
    """
    Complete training function

    Parameters:
    -----------
    model : nn.Module
        Model to train
    train_loader : DataLoader
        Training data
    test_loader : DataLoader
        Test/validation data
    epochs : int
        Number of training epochs
    lr : float
        Learning rate
    device : str
        Device to use ('cpu' or 'cuda')

    Returns:
    --------
    history : dict
        Training history
    """
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=2
    )

    history = {
        'train_loss': [], 'train_acc': [],
        'val_loss': [], 'val_acc': []
    }

    best_val_acc = 0.0

    print("Training Started")
    print("=" * 60)

    for epoch in range(epochs):
        # Train
        train_loss, train_acc = train_epoch(
            model, train_loader, criterion, optimizer, device
        )

        # Validate
        val_loss, val_acc = validate(model, test_loader, criterion, device)

        # Update scheduler
        scheduler.step(val_loss)

        # Record history
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        # Save best model
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            # state_dict() returns references to the live tensors; clone them
            # so later training steps don't overwrite this snapshot
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

        # Print progress
        current_lr = optimizer.param_groups[0]['lr']
        print(f"Epoch {epoch+1:3d}/{epochs} | "
              f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
              f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f} | "
              f"LR: {current_lr:.6f}")

    print("=" * 60)
    print(f"Best Validation Accuracy: {best_val_acc:.4f}")

    # Load best model
    model.load_state_dict(best_state)

    return history

# Example usage (run on available device)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Create fresh model
model = MNISTClassifier(dropout_rate=0.3)

# Train (reduced epochs for demonstration)
# history = train_model(model, train_loader, test_loader, epochs=5, device=device)
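One pitfall worth knowing when checkpointing the best model in memory: state_dict() returns references to the live parameter tensors, and a shallow dict copy still shares those tensors, so a snapshot taken without cloning is silently overwritten by later training. A small demonstration with a dummy layer:

```python
import torch

m = torch.nn.Linear(2, 2)
sd = m.state_dict()                    # references, not copies

with torch.no_grad():
    m.weight.zero_()                   # simulate further training updates

print(torch.all(sd['weight'] == 0).item())  # True: the "snapshot" changed too

# Safe snapshot: clone every tensor
snapshot = {k: v.detach().clone() for k, v in m.state_dict().items()}
```

Saving to disk with torch.save avoids the issue, since serialization copies the data.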

4. Performance Evaluation and Confusion Matrix

4.1 Precision, Recall, and F1 Score

| Metric    | Formula | Interpretation |
|-----------|---------|----------------|
| Accuracy  | $\frac{TP + TN}{TP + TN + FP + FN}$ | Overall correctness |
| Precision | $\frac{TP}{TP + FP}$ | Of predicted positives, how many are correct? |
| Recall    | $\frac{TP}{TP + FN}$ | Of actual positives, how many did we find? |
| F1 Score  | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ | Harmonic mean of precision and recall |

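Plugging hypothetical counts into these formulas (say TP=8, FP=2, FN=1 for one class) gives a quick sanity check:

```python
# Hypothetical counts for a single class
tp, fp, fn = 8, 2, 1

precision = tp / (tp + fp)                           # 8/10 = 0.8
recall = tp / (tp + fn)                              # 8/9 ≈ 0.889
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean ≈ 0.842

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Note the F1 score sits between precision and recall, pulled toward the smaller of the two.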
import numpy as np

def compute_metrics(y_true, y_pred, num_classes=10):
    """
    Compute classification metrics

    Parameters:
    -----------
    y_true : array-like
        True labels
    y_pred : array-like
        Predicted labels
    num_classes : int
        Number of classes

    Returns:
    --------
    metrics : dict
        Dictionary of metrics
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # Overall accuracy
    accuracy = np.mean(y_true == y_pred)

    # Per-class metrics
    precision = np.zeros(num_classes)
    recall = np.zeros(num_classes)
    f1 = np.zeros(num_classes)

    for c in range(num_classes):
        # True positives
        tp = np.sum((y_true == c) & (y_pred == c))
        # False positives
        fp = np.sum((y_true != c) & (y_pred == c))
        # False negatives
        fn = np.sum((y_true == c) & (y_pred != c))

        precision[c] = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall[c] = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1[c] = 2 * precision[c] * recall[c] / (precision[c] + recall[c]) \
                if (precision[c] + recall[c]) > 0 else 0

    metrics = {
        'accuracy': accuracy,
        'precision_per_class': precision,
        'recall_per_class': recall,
        'f1_per_class': f1,
        'macro_precision': np.mean(precision),
        'macro_recall': np.mean(recall),
        'macro_f1': np.mean(f1)
    }

    return metrics

# Example with synthetic data
np.random.seed(42)
y_true = np.random.randint(0, 10, 1000)
y_pred = y_true.copy()
# Add some noise (~10% of labels re-drawn at random; a few may match the true label)
noise_idx = np.random.choice(1000, 100, replace=False)
y_pred[noise_idx] = np.random.randint(0, 10, 100)

metrics = compute_metrics(y_true, y_pred)

print("Classification Metrics")
print("=" * 40)
print(f"Accuracy: {metrics['accuracy']:.4f}")
print(f"Macro Precision: {metrics['macro_precision']:.4f}")
print(f"Macro Recall: {metrics['macro_recall']:.4f}")
print(f"Macro F1: {metrics['macro_f1']:.4f}")

4.2 Confusion Matrix Visualization

import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """
    Compute confusion matrix

    Parameters:
    -----------
    y_true : array-like
        True labels
    y_pred : array-like
        Predicted labels
    num_classes : int
        Number of classes

    Returns:
    --------
    cm : ndarray, shape (num_classes, num_classes)
        Confusion matrix where cm[i, j] is the number of samples
        with true label i that were predicted as j
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)

    for true, pred in zip(y_true, y_pred):
        cm[true, pred] += 1

    return cm

def print_confusion_matrix(cm, class_names=None):
    """
    Print confusion matrix in a readable format
    """
    num_classes = cm.shape[0]

    if class_names is None:
        class_names = [str(i) for i in range(num_classes)]

    # Header
    print("Confusion Matrix")
    print("=" * 60)
    print("Rows: True labels, Columns: Predicted labels")
    print()

    # Column headers
    print("     ", end="")
    for name in class_names:
        print(f"{name:>5}", end="")
    print()

    # Matrix rows
    for i, name in enumerate(class_names):
        print(f"{name:>4} ", end="")
        for j in range(num_classes):
            print(f"{cm[i, j]:>5}", end="")
        print()

# Compute and display confusion matrix
cm = confusion_matrix(y_true, y_pred)
print_confusion_matrix(cm)

# Calculate per-class accuracy from confusion matrix
print("\nPer-class accuracy:")
for i in range(10):
    class_acc = cm[i, i] / cm[i, :].sum() if cm[i, :].sum() > 0 else 0
    print(f"  Class {i}: {class_acc:.2%}")

5. Hyperparameter Tuning

5.1 Grid Search

import itertools

def grid_search(param_grid, train_fn, eval_fn):
    """
    Grid search for hyperparameter tuning

    Parameters:
    -----------
    param_grid : dict
        Dictionary of parameter names to lists of values
    train_fn : callable
        Function that trains a model given parameters
    eval_fn : callable
        Function that evaluates a model and returns a score

    Returns:
    --------
    best_params : dict
        Best parameters found
    results : list
        All results
    """
    # Generate all combinations
    keys = list(param_grid.keys())
    values = list(param_grid.values())
    combinations = list(itertools.product(*values))

    results = []
    best_score = float('-inf')
    best_params = None

    print(f"Grid Search: {len(combinations)} combinations")
    print("=" * 50)

    for combo in combinations:
        params = dict(zip(keys, combo))

        # Train model
        model = train_fn(params)

        # Evaluate
        score = eval_fn(model)

        results.append({'params': params, 'score': score})

        print(f"Params: {params} -> Score: {score:.4f}")

        if score > best_score:
            best_score = score
            best_params = params

    print("=" * 50)
    print(f"Best params: {best_params}")
    print(f"Best score: {best_score:.4f}")

    return best_params, results

# Example parameter grid
param_grid = {
    'learning_rate': [0.001, 0.01],
    'dropout_rate': [0.3, 0.5],
    'hidden_size': [128, 256]
}

# Simulated training and evaluation (for demonstration)
def mock_train(params):
    return params  # Return params as mock model

def mock_eval(params):
    # Simulate: lower lr and moderate dropout tend to be better
    score = 0.9
    if params['learning_rate'] == 0.001:
        score += 0.02
    if params['dropout_rate'] == 0.3:
        score += 0.01
    if params['hidden_size'] == 256:
        score += 0.01
    score += np.random.normal(0, 0.005)
    return score

# Run grid search
np.random.seed(42)
best_params, results = grid_search(param_grid, mock_train, mock_eval)

5.2 Random Search

import numpy as np

def random_search(param_distributions, train_fn, eval_fn, n_iter=10):
    """
    Random search for hyperparameter tuning

    Parameters:
    -----------
    param_distributions : dict
        Dictionary of parameter names to sampling functions
    train_fn : callable
        Function that trains a model
    eval_fn : callable
        Function that evaluates a model
    n_iter : int
        Number of iterations

    Returns:
    --------
    best_params : dict
        Best parameters found
    results : list
        All results
    """
    results = []
    best_score = float('-inf')
    best_params = None

    print(f"Random Search: {n_iter} iterations")
    print("=" * 50)

    for i in range(n_iter):
        # Sample parameters
        params = {k: v() for k, v in param_distributions.items()}

        # Train and evaluate
        model = train_fn(params)
        score = eval_fn(model)

        results.append({'params': params, 'score': score})

        print(f"Iter {i+1}: {params} -> Score: {score:.4f}")

        if score > best_score:
            best_score = score
            best_params = params

    print("=" * 50)
    print(f"Best params: {best_params}")
    print(f"Best score: {best_score:.4f}")

    return best_params, results

# Example with continuous distributions
param_distributions = {
    'learning_rate': lambda: 10 ** np.random.uniform(-4, -2),  # 0.0001 to 0.01
    'dropout_rate': lambda: np.random.uniform(0.1, 0.5),
    'hidden_size': lambda: np.random.choice([64, 128, 256, 512])
}

# Run random search
np.random.seed(42)
best_params, results = random_search(
    param_distributions, mock_train, mock_eval, n_iter=10
)

6. Model Saving and Inference

6.1 Saving and Loading state_dict

import torch

# Save model
def save_model(model, path, optimizer=None, epoch=None, metrics=None):
    """
    Save model checkpoint

    Parameters:
    -----------
    model : nn.Module
        Model to save
    path : str
        File path
    optimizer : Optimizer, optional
        Optimizer state to save
    epoch : int, optional
        Current epoch
    metrics : dict, optional
        Training metrics
    """
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'model_architecture': str(model)
    }

    if optimizer is not None:
        checkpoint['optimizer_state_dict'] = optimizer.state_dict()

    if epoch is not None:
        checkpoint['epoch'] = epoch

    if metrics is not None:
        checkpoint['metrics'] = metrics

    torch.save(checkpoint, path)
    print(f"Model saved to {path}")

# Load model
def load_model(model, path, optimizer=None):
    """
    Load model checkpoint

    Parameters:
    -----------
    model : nn.Module
        Model to load weights into
    path : str
        File path
    optimizer : Optimizer, optional
        Optimizer to load state into

    Returns:
    --------
    checkpoint : dict
        Full checkpoint dictionary
    """
    checkpoint = torch.load(path, map_location='cpu')

    model.load_state_dict(checkpoint['model_state_dict'])

    if optimizer is not None and 'optimizer_state_dict' in checkpoint:
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

    print(f"Model loaded from {path}")

    if 'epoch' in checkpoint:
        print(f"Epoch: {checkpoint['epoch']}")
    if 'metrics' in checkpoint:
        print(f"Metrics: {checkpoint['metrics']}")

    return checkpoint

# Example usage
model = MNISTClassifier()
optimizer = torch.optim.Adam(model.parameters())

# Save
# save_model(model, 'mnist_model.pth', optimizer, epoch=10, metrics={'accuracy': 0.98})

# Load
# checkpoint = load_model(model, 'mnist_model.pth', optimizer)

6.2 Inference Mode Prediction

import torch
import torch.nn.functional as F

def predict(model, images, device='cpu'):
    """
    Make predictions with a trained model

    Parameters:
    -----------
    model : nn.Module
        Trained model
    images : Tensor
        Input images
    device : str
        Device to use

    Returns:
    --------
    predictions : Tensor
        Predicted class indices
    probabilities : Tensor
        Class probabilities
    """
    model.eval()
    model = model.to(device)
    images = images.to(device)

    with torch.no_grad():
        logits = model(images)
        probabilities = F.softmax(logits, dim=1)
        predictions = torch.argmax(probabilities, dim=1)

    return predictions, probabilities

def predict_single(model, image, device='cpu'):
    """
    Predict a single image

    Parameters:
    -----------
    model : nn.Module
        Trained model
    image : Tensor
        Single image tensor (1, 28, 28)

    Returns:
    --------
    prediction : int
        Predicted class
    confidence : float
        Prediction confidence
    all_probs : Tensor
        All class probabilities
    """
    if image.dim() == 3:
        image = image.unsqueeze(0)  # Add batch dimension

    predictions, probabilities = predict(model, image, device)

    prediction = predictions[0].item()
    confidence = probabilities[0, prediction].item()

    return prediction, confidence, probabilities[0]

# Example usage
model = MNISTClassifier()
model.eval()

# Create a sample image (random for demonstration)
sample_image = torch.randn(1, 28, 28)

prediction, confidence, probs = predict_single(model, sample_image)
print(f"Predicted digit: {prediction}")
print(f"Confidence: {confidence:.2%}")
print(f"All probabilities: {probs.numpy().round(3)}")

Exercises

Exercise 1: Fashion-MNIST Classification

Problem: Apply everything learned to Fashion-MNIST (clothing items):

  1. Load Fashion-MNIST using torchvision.datasets.FashionMNIST
  2. Build a classifier with at least 95% accuracy
  3. Create a confusion matrix and identify which classes are most often confused

Exercise 2: Data Augmentation Ablation

Problem: Study the effect of data augmentation:

  1. Train a model WITHOUT augmentation
  2. Train the same model WITH augmentation (rotation, translation, scaling)
  3. Compare validation accuracy and overfitting behavior

Exercise 3: Learning Rate Scheduling Comparison

Problem: Compare different learning rate schedules:

  1. Constant learning rate
  2. StepLR (decay every N epochs)
  3. ReduceLROnPlateau
  4. CosineAnnealingLR

Plot learning rate and validation accuracy for each.
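As a starting point, you can step each scheduler against a dummy optimizer and record the learning rate it would use each epoch, without training anything (the model, step_size, and gamma below are placeholders to adjust):

```python
import torch

model = torch.nn.Linear(1, 1)                       # dummy model, never trained
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=3, gamma=0.5)

lrs = []
for epoch in range(9):
    lrs.append(opt.param_groups[0]['lr'])           # lr in effect this epoch
    opt.step()                                      # schedulers expect optimizer.step() first
    sched.step()

print(lrs)  # 0.1 for 3 epochs, then 0.05 for 3, then 0.025
```

Swap in CosineAnnealingLR or ReduceLROnPlateau (which takes a metric in step()) and plot the recorded lrs to compare the schedules.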

Exercise 4: Model Complexity Analysis

Problem: Analyze the bias-variance tradeoff:

  1. Create models with increasing complexity (64->128->256->512 hidden units)
  2. Plot train/val accuracy vs model size
  3. Find the optimal model complexity

Exercise 5: End-to-End Project

Problem: Build a complete digit recognition system:

  1. Train the best model you can on MNIST
  2. Implement proper validation and early stopping
  3. Save the best model checkpoint
  4. Create a prediction function that takes a 28x28 image array
  5. Report final test accuracy, confusion matrix, and per-class F1 scores
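For step 2, one common early-stopping pattern is a patience counter on the validation loss (the loss values below are made up for illustration):

```python
# Hypothetical validation losses from successive epochs
val_losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44]

best_loss = float('inf')
patience, wait = 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:
        best_loss = val_loss
        wait = 0                     # improvement: reset the counter
    else:
        wait += 1                    # no improvement this epoch
        if wait >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
```

Combine this with the best-checkpoint saving from Section 3.3 so the model you keep is the one from the best epoch, not the last one.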

Summary

In this chapter, we applied deep learning concepts to build a complete image classification system: loading and normalizing MNIST, batching with DataLoader, defining and training a classifier, evaluating with precision, recall, F1, and a confusion matrix, tuning hyperparameters with grid and random search, and saving models for inference.

Congratulations! You have completed the Deep Learning Fundamentals course. You now have the foundation to build, train, evaluate, and deploy your own neural networks.

Next Steps: Explore CNNs for computer vision, RNNs for sequences, or Transformers for NLP!
