Learning Objectives
- Load and explore the MNIST dataset
- Implement proper data preprocessing and normalization
- Build a complete training loop in PyTorch
- Evaluate models using accuracy, precision, recall, and confusion matrices
- Perform hyperparameter tuning using grid and random search
- Save and load trained models for inference
1. MNIST Image Classification
1.1 Loading and Exploring the Dataset
The MNIST dataset contains 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels. It's the "Hello World" of deep learning.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import numpy as np

# Download and load MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),                      # Convert to tensor in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)

print("MNIST Dataset Statistics")
print("=" * 40)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Image shape: {train_dataset[0][0].shape}")
print(f"Number of classes: {len(train_dataset.classes)}")
print(f"Classes: {train_dataset.classes}")

# Explore a sample
sample_image, sample_label = train_dataset[0]
print("\nSample image:")
print(f"  Shape: {sample_image.shape}")
print(f"  Min value: {sample_image.min():.4f}")
print(f"  Max value: {sample_image.max():.4f}")
print(f"  Label: {sample_label}")
```
1.2 Data Preprocessing and Normalization
Proper preprocessing is essential for good model performance:
- `ToTensor()`: converts a PIL Image to a PyTorch tensor and scales pixel values to [0, 1]
- `Normalize()`: standardizes using the dataset mean and standard deviation
Why Normalize?
- Centers data around zero (faster convergence)
- Ensures all features have similar scales
- Helps with gradient flow in deep networks
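The values 0.1307 and 0.3081 are the commonly cited mean and standard deviation of the MNIST training pixels (after scaling to [0, 1]). A minimal numpy sketch of what `Normalize()` does to individual pixel values:

```python
import numpy as np

# After ToTensor(), every pixel lies in [0, 1].
# Normalize() then applies (x - mean) / std per channel.
mean, std = 0.1307, 0.3081

pixels = np.array([0.0, 0.1307, 1.0])  # black, dataset mean, white
normalized = (pixels - mean) / std

print(normalized)
# A pixel equal to the dataset mean maps exactly to 0,
# so the transformed data is centered around zero.
```

A black pixel becomes slightly negative and a white pixel roughly 2.82, so the inputs straddle zero rather than sitting entirely in [0, 1].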
2. Data Preprocessing and Batch Processing
2.1 Using DataLoader
The DataLoader handles batching, shuffling, and parallel data loading:
```python
import torch
from torch.utils.data import DataLoader

# Create data loaders
BATCH_SIZE = 64

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,     # Shuffle training data
    num_workers=2,    # Parallel data loading
    pin_memory=True   # Faster GPU transfer
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,    # Don't shuffle test data
    num_workers=2,
    pin_memory=True
)

print("DataLoader Configuration")
print("=" * 40)
print(f"Batch size: {BATCH_SIZE}")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

# Iterate through one batch
for images, labels in train_loader:
    print("\nBatch shapes:")
    print(f"  Images: {images.shape}")
    print(f"  Labels: {labels.shape}")
    break
```
2.2 Data Augmentation
Data augmentation artificially increases the size and diversity of training data:
```python
import torch
from torchvision import datasets, transforms

# Training transforms with augmentation
train_transform = transforms.Compose([
    transforms.RandomRotation(10),   # Rotate +/- 10 degrees
    transforms.RandomAffine(
        degrees=0,
        translate=(0.1, 0.1),        # Shift up to 10%
        scale=(0.9, 1.1)             # Scale 90-110%
    ),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Test transforms (no augmentation)
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Create augmented dataset
train_dataset_augmented = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)

print("Data Augmentation Example")
print("=" * 40)

# Compare original and augmented versions of the same sample
original_sample = train_dataset[0][0]
augmented_sample = train_dataset_augmented[0][0]
print(f"Original shape: {original_sample.shape}")
print(f"Augmented shape: {augmented_sample.shape}")
print(f"Values changed: {not torch.equal(original_sample, augmented_sample)}")
```
3. Model Building and Training Loop
3.1 Model Definition in PyTorch
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTClassifier(nn.Module):
    """
    Neural network for MNIST classification.

    Architecture:
    - Input: 784 (28x28 flattened)
    - Hidden 1: 256 units + BatchNorm + ReLU + Dropout
    - Hidden 2: 128 units + BatchNorm + ReLU + Dropout
    - Output: 10 classes
    """
    def __init__(self, dropout_rate=0.3):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.dropout1 = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(dropout_rate)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        # Flatten: (batch, 1, 28, 28) -> (batch, 784)
        x = self.flatten(x)
        # Layer 1
        x = self.fc1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.dropout1(x)
        # Layer 2
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.dropout2(x)
        # Output layer (no softmax - handled by CrossEntropyLoss)
        x = self.fc3(x)
        return x

# Create model
model = MNISTClassifier(dropout_rate=0.3)

# Check model architecture
print("Model Architecture")
print("=" * 40)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
```
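The model returns raw logits because `nn.CrossEntropyLoss` applies log-softmax internally. A small numpy sketch, with hypothetical logits for three classes, of the computation the loss performs for one sample:

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw scores for 3 classes
target = 0                           # true class index

# Numerically stable softmax: shift by the max before exponentiating
shifted = logits - logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()

# Cross-entropy = negative log-probability of the true class
loss = -np.log(probs[target])
print(f"probs: {probs.round(3)}, loss: {loss:.4f}")
```

Applying a second softmax inside the model would silently squash the logits and hurt training, which is why the output layer stays linear.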
3.2 Training Loop Implementation
```python
import torch
import torch.nn as nn

def train_epoch(model, train_loader, criterion, optimizer, device):
    """
    Train for one epoch.

    Returns:
    --------
    avg_loss : float
        Average training loss
    accuracy : float
        Training accuracy
    """
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        # Zero gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        optimizer.step()
        # Statistics
        running_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    avg_loss = running_loss / total
    accuracy = correct / total
    return avg_loss, accuracy

def validate(model, val_loader, criterion, device):
    """
    Validate the model.

    Returns:
    --------
    avg_loss : float
        Average validation loss
    accuracy : float
        Validation accuracy
    """
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    avg_loss = running_loss / total
    accuracy = correct / total
    return avg_loss, accuracy
```
3.3 Complete Training Script
```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

def train_model(model, train_loader, test_loader, epochs=10, lr=0.001, device='cpu'):
    """
    Complete training function.

    Parameters:
    -----------
    model : nn.Module
        Model to train
    train_loader : DataLoader
        Training data
    test_loader : DataLoader
        Test/validation data
    epochs : int
        Number of training epochs
    lr : float
        Learning rate
    device : str
        Device to use ('cpu' or 'cuda')

    Returns:
    --------
    history : dict
        Training history
    """
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=2
    )
    history = {
        'train_loss': [], 'train_acc': [],
        'val_loss': [], 'val_acc': []
    }
    best_val_acc = 0.0
    best_state = copy.deepcopy(model.state_dict())

    print("Training Started")
    print("=" * 60)
    for epoch in range(epochs):
        # Train
        train_loss, train_acc = train_epoch(
            model, train_loader, criterion, optimizer, device
        )
        # Validate
        val_loss, val_acc = validate(model, test_loader, criterion, device)
        # Update scheduler
        scheduler.step(val_loss)
        # Record history
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)
        # Save best model (deep copy, so later updates don't overwrite it)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_state = copy.deepcopy(model.state_dict())
        # Print progress
        current_lr = optimizer.param_groups[0]['lr']
        print(f"Epoch {epoch+1:3d}/{epochs} | "
              f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
              f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f} | "
              f"LR: {current_lr:.6f}")
    print("=" * 60)
    print(f"Best Validation Accuracy: {best_val_acc:.4f}")
    # Load best model
    model.load_state_dict(best_state)
    return history

# Example usage (run on available device)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Create fresh model
model = MNISTClassifier(dropout_rate=0.3)

# Train (reduced epochs for demonstration)
# history = train_model(model, train_loader, test_loader, epochs=5, device=device)
```
4. Performance Evaluation and Confusion Matrix
4.1 Precision, Recall, and F1 Score
| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | Overall correctness |
| Precision | $\frac{TP}{TP + FP}$ | Of predicted positives, how many are correct? |
| Recall | $\frac{TP}{TP + FN}$ | Of actual positives, how many did we find? |
| F1 Score | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ | Harmonic mean of precision and recall |
```python
import numpy as np

def compute_metrics(y_true, y_pred, num_classes=10):
    """
    Compute classification metrics.

    Parameters:
    -----------
    y_true : array-like
        True labels
    y_pred : array-like
        Predicted labels
    num_classes : int
        Number of classes

    Returns:
    --------
    metrics : dict
        Dictionary of metrics
    """
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    # Overall accuracy
    accuracy = np.mean(y_true == y_pred)
    # Per-class metrics
    precision = np.zeros(num_classes)
    recall = np.zeros(num_classes)
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        # True positives
        tp = np.sum((y_true == c) & (y_pred == c))
        # False positives
        fp = np.sum((y_true != c) & (y_pred == c))
        # False negatives
        fn = np.sum((y_true == c) & (y_pred != c))
        precision[c] = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall[c] = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1[c] = (2 * precision[c] * recall[c] / (precision[c] + recall[c])
                 if (precision[c] + recall[c]) > 0 else 0)
    metrics = {
        'accuracy': accuracy,
        'precision_per_class': precision,
        'recall_per_class': recall,
        'f1_per_class': f1,
        'macro_precision': np.mean(precision),
        'macro_recall': np.mean(recall),
        'macro_f1': np.mean(f1)
    }
    return metrics

# Example with synthetic data
np.random.seed(42)
y_true = np.random.randint(0, 10, 1000)
y_pred = y_true.copy()

# Add some noise (~10% of labels re-drawn at random)
noise_idx = np.random.choice(1000, 100, replace=False)
y_pred[noise_idx] = np.random.randint(0, 10, 100)

metrics = compute_metrics(y_true, y_pred)
print("Classification Metrics")
print("=" * 40)
print(f"Accuracy: {metrics['accuracy']:.4f}")
print(f"Macro Precision: {metrics['macro_precision']:.4f}")
print(f"Macro Recall: {metrics['macro_recall']:.4f}")
print(f"Macro F1: {metrics['macro_f1']:.4f}")
```
4.2 Confusion Matrix Visualization
```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """
    Compute confusion matrix.

    Parameters:
    -----------
    y_true : array-like
        True labels
    y_pred : array-like
        Predicted labels
    num_classes : int
        Number of classes

    Returns:
    --------
    cm : ndarray, shape (num_classes, num_classes)
        Confusion matrix where cm[i, j] is the number of samples
        with true label i that were predicted as j
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for true, pred in zip(y_true, y_pred):
        cm[true, pred] += 1
    return cm

def print_confusion_matrix(cm, class_names=None):
    """
    Print confusion matrix in a readable format.
    """
    num_classes = cm.shape[0]
    if class_names is None:
        class_names = [str(i) for i in range(num_classes)]
    # Header
    print("Confusion Matrix")
    print("=" * 60)
    print("Rows: True labels, Columns: Predicted labels")
    print()
    # Column headers
    print("     ", end="")
    for name in class_names:
        print(f"{name:>5}", end="")
    print()
    # Matrix rows
    for i, name in enumerate(class_names):
        print(f"{name:>4} ", end="")
        for j in range(num_classes):
            print(f"{cm[i, j]:>5}", end="")
        print()

# Compute and display confusion matrix
cm = confusion_matrix(y_true, y_pred)
print_confusion_matrix(cm)

# Calculate per-class accuracy from confusion matrix
print("\nPer-class accuracy:")
for i in range(10):
    class_acc = cm[i, i] / cm[i, :].sum() if cm[i, :].sum() > 0 else 0
    print(f"  Class {i}: {class_acc:.2%}")
```
5. Hyperparameter Tuning
5.1 Grid Search
```python
import itertools
import numpy as np

def grid_search(param_grid, train_fn, eval_fn):
    """
    Grid search for hyperparameter tuning.

    Parameters:
    -----------
    param_grid : dict
        Dictionary of parameter names to lists of values
    train_fn : callable
        Function that trains a model given parameters
    eval_fn : callable
        Function that evaluates a model and returns a score

    Returns:
    --------
    best_params : dict
        Best parameters found
    results : list
        All results
    """
    # Generate all combinations
    keys = list(param_grid.keys())
    values = list(param_grid.values())
    combinations = list(itertools.product(*values))

    results = []
    best_score = float('-inf')
    best_params = None

    print(f"Grid Search: {len(combinations)} combinations")
    print("=" * 50)
    for combo in combinations:
        params = dict(zip(keys, combo))
        # Train model
        model = train_fn(params)
        # Evaluate
        score = eval_fn(model)
        results.append({'params': params, 'score': score})
        print(f"Params: {params} -> Score: {score:.4f}")
        if score > best_score:
            best_score = score
            best_params = params
    print("=" * 50)
    print(f"Best params: {best_params}")
    print(f"Best score: {best_score:.4f}")
    return best_params, results

# Example parameter grid
param_grid = {
    'learning_rate': [0.001, 0.01],
    'dropout_rate': [0.3, 0.5],
    'hidden_size': [128, 256]
}

# Simulated training and evaluation (for demonstration)
def mock_train(params):
    return params  # Return params as a stand-in for a trained model

def mock_eval(params):
    # Simulate: lower lr and moderate dropout tend to be better
    score = 0.9
    if params['learning_rate'] == 0.001:
        score += 0.02
    if params['dropout_rate'] == 0.3:
        score += 0.01
    if params['hidden_size'] == 256:
        score += 0.01
    score += np.random.normal(0, 0.005)
    return score

# Run grid search
np.random.seed(42)
best_params, results = grid_search(param_grid, mock_train, mock_eval)
```
5.2 Random Search
```python
import numpy as np

def random_search(param_distributions, train_fn, eval_fn, n_iter=10):
    """
    Random search for hyperparameter tuning.

    Parameters:
    -----------
    param_distributions : dict
        Dictionary of parameter names to sampling functions
    train_fn : callable
        Function that trains a model
    eval_fn : callable
        Function that evaluates a model
    n_iter : int
        Number of iterations

    Returns:
    --------
    best_params : dict
        Best parameters found
    results : list
        All results
    """
    results = []
    best_score = float('-inf')
    best_params = None

    print(f"Random Search: {n_iter} iterations")
    print("=" * 50)
    for i in range(n_iter):
        # Sample parameters
        params = {k: v() for k, v in param_distributions.items()}
        # Train and evaluate
        model = train_fn(params)
        score = eval_fn(model)
        results.append({'params': params, 'score': score})
        print(f"Iter {i+1}: {params} -> Score: {score:.4f}")
        if score > best_score:
            best_score = score
            best_params = params
    print("=" * 50)
    print(f"Best params: {best_params}")
    print(f"Best score: {best_score:.4f}")
    return best_params, results

# Example with continuous distributions
param_distributions = {
    'learning_rate': lambda: 10 ** np.random.uniform(-4, -2),  # 0.0001 to 0.01
    'dropout_rate': lambda: np.random.uniform(0.1, 0.5),
    'hidden_size': lambda: np.random.choice([64, 128, 256, 512])
}

# Run random search
np.random.seed(42)
best_params, results = random_search(
    param_distributions, mock_train, mock_eval, n_iter=10
)
```
6. Model Saving and Inference
6.1 Saving and Loading state_dict
```python
import torch

def save_model(model, path, optimizer=None, epoch=None, metrics=None):
    """
    Save model checkpoint.

    Parameters:
    -----------
    model : nn.Module
        Model to save
    path : str
        File path
    optimizer : Optimizer, optional
        Optimizer state to save
    epoch : int, optional
        Current epoch
    metrics : dict, optional
        Training metrics
    """
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'model_architecture': str(model)
    }
    if optimizer is not None:
        checkpoint['optimizer_state_dict'] = optimizer.state_dict()
    if epoch is not None:
        checkpoint['epoch'] = epoch
    if metrics is not None:
        checkpoint['metrics'] = metrics
    torch.save(checkpoint, path)
    print(f"Model saved to {path}")

def load_model(model, path, optimizer=None):
    """
    Load model checkpoint.

    Parameters:
    -----------
    model : nn.Module
        Model to load weights into
    path : str
        File path
    optimizer : Optimizer, optional
        Optimizer to load state into

    Returns:
    --------
    checkpoint : dict
        Full checkpoint dictionary
    """
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['model_state_dict'])
    if optimizer is not None and 'optimizer_state_dict' in checkpoint:
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    print(f"Model loaded from {path}")
    if 'epoch' in checkpoint:
        print(f"Epoch: {checkpoint['epoch']}")
    if 'metrics' in checkpoint:
        print(f"Metrics: {checkpoint['metrics']}")
    return checkpoint

# Example usage
model = MNISTClassifier()
optimizer = torch.optim.Adam(model.parameters())

# Save
# save_model(model, 'mnist_model.pth', optimizer, epoch=10, metrics={'accuracy': 0.98})

# Load
# checkpoint = load_model(model, 'mnist_model.pth', optimizer)
```
6.2 Inference Mode Prediction
```python
import torch
import torch.nn.functional as F

def predict(model, images, device='cpu'):
    """
    Make predictions with a trained model.

    Parameters:
    -----------
    model : nn.Module
        Trained model
    images : Tensor
        Input images
    device : str
        Device to use

    Returns:
    --------
    predictions : Tensor
        Predicted class indices
    probabilities : Tensor
        Class probabilities
    """
    model.eval()
    model = model.to(device)
    images = images.to(device)
    with torch.no_grad():
        logits = model(images)
        probabilities = F.softmax(logits, dim=1)
        predictions = torch.argmax(probabilities, dim=1)
    return predictions, probabilities

def predict_single(model, image, device='cpu'):
    """
    Predict a single image.

    Parameters:
    -----------
    model : nn.Module
        Trained model
    image : Tensor
        Single image tensor (1, 28, 28)
    device : str
        Device to use

    Returns:
    --------
    prediction : int
        Predicted class
    confidence : float
        Prediction confidence
    all_probs : Tensor
        All class probabilities
    """
    if image.dim() == 3:
        image = image.unsqueeze(0)  # Add batch dimension
    predictions, probabilities = predict(model, image, device)
    prediction = predictions[0].item()
    confidence = probabilities[0, prediction].item()
    return prediction, confidence, probabilities[0]

# Example usage with an untrained model and a random image (for demonstration)
model = MNISTClassifier()
model.eval()

sample_image = torch.randn(1, 28, 28)
prediction, confidence, probs = predict_single(model, sample_image)
print(f"Predicted digit: {prediction}")
print(f"Confidence: {confidence:.2%}")
print(f"All probabilities: {probs.numpy().round(3)}")
```
Exercises
Exercise 1: Fashion-MNIST Classification
Problem: Apply everything learned to Fashion-MNIST (clothing items):
- Load Fashion-MNIST using `torchvision.datasets.FashionMNIST`
- Build a classifier with at least 95% accuracy
- Create a confusion matrix and identify which classes are most often confused
Exercise 2: Data Augmentation Ablation
Problem: Study the effect of data augmentation:
- Train a model WITHOUT augmentation
- Train the same model WITH augmentation (rotation, translation, scaling)
- Compare validation accuracy and overfitting behavior
Exercise 3: Learning Rate Scheduling Comparison
Problem: Compare different learning rate schedules:
- Constant learning rate
- StepLR (decay every N epochs)
- ReduceLROnPlateau
- CosineAnnealingLR
Plot learning rate and validation accuracy for each.
Exercise 4: Model Complexity Analysis
Problem: Analyze the bias-variance tradeoff:
- Create models with increasing complexity (64->128->256->512 hidden units)
- Plot train/val accuracy vs model size
- Find the optimal model complexity
Exercise 5: End-to-End Project
Problem: Build a complete digit recognition system:
- Train the best model you can on MNIST
- Implement proper validation and early stopping
- Save the best model checkpoint
- Create a prediction function that takes a 28x28 image array
- Report final test accuracy, confusion matrix, and per-class F1 scores
Summary
In this chapter, we applied deep learning concepts to build a complete image classification system:
- MNIST Dataset: Standard benchmark with 70,000 handwritten digit images
- Data Pipeline: DataLoader for batching, transforms for preprocessing and augmentation
- Training Loop: Forward pass, loss computation, backward pass, optimizer step
- Evaluation: Accuracy, precision, recall, F1 score, confusion matrix
- Hyperparameter Tuning: Grid search and random search strategies
- Model Management: Saving/loading checkpoints, inference mode prediction
Congratulations! You have completed the Deep Learning Fundamentals course. You now have the foundation to:
- Build and train neural networks from scratch
- Understand optimization and regularization techniques
- Evaluate model performance systematically
- Apply deep learning to real-world problems
Next Steps: Explore CNNs for computer vision, RNNs for sequences, or Transformers for NLP!