
Chapter 2: Image Classification and Deep Learning

Building High-Accuracy Image Classification Systems with CNN Architectures and Transfer Learning

📖 Reading Time: 30-35 minutes 📊 Difficulty: Intermediate 💻 Code Examples: 10 📝 Exercises: 5

This chapter covers image classification with deep learning. You will learn how CNN architectures evolved and study the efficient design principles of Inception and EfficientNet's compound scaling.

Learning Objectives

By reading this chapter, you will master the following:

- The historical evolution of CNN architectures, from LeNet-5 to the Vision Transformer
- The key innovations introduced by AlexNet, VGG, GoogLeNet, and ResNet
- The efficient design principles of Inception and EfficientNet's compound scaling
- How to build high-accuracy image classification systems with transfer learning

2.1 Evolution of CNN Architectures

Historical Development of Image Classification

Image classification is one of the most fundamental tasks in computer vision, and with the advent of deep learning its accuracy has improved dramatically.

graph LR
    A[LeNet-5<br/>1998<br/>MNIST] --> B[AlexNet<br/>2012<br/>ImageNet]
    B --> C[VGG<br/>2014<br/>19 layers deep]
    C --> D[GoogLeNet<br/>2014<br/>Inception]
    D --> E[ResNet<br/>2015<br/>Residual connections]
    E --> F[Inception-v4<br/>2016<br/>Hybrid]
    F --> G[MobileNet<br/>2017<br/>Lightweight]
    G --> H[EfficientNet<br/>2019<br/>Optimization]
    H --> I[Vision Transformer<br/>2020+<br/>Attention]

    style A fill:#e1f5ff
    style B fill:#b3e5fc
    style C fill:#81d4fa
    style D fill:#4fc3f7
    style E fill:#29b6f6
    style F fill:#03a9f4
    style G fill:#039be5
    style H fill:#0288d1
    style I fill:#0277bd

LeNet-5 (1998): The Origin of CNNs

LeNet-5 was developed by Yann LeCun for handwritten digit recognition and became the foundation of modern CNNs.

# Requirements:
# - Python 3.9+
# - torch>=2.0.0, <2.3.0

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    """LeNet-5: Classical CNN for handwritten digit recognition"""
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()

        # Feature extraction layers
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # 28×28 → 24×24
        self.pool1 = nn.AvgPool2d(kernel_size=2)       # 24×24 → 12×12
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # 12×12 → 8×8
        self.pool2 = nn.AvgPool2d(kernel_size=2)       # 8×8 → 4×4

        # Classification layers
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)

        x = x.view(x.size(0), -1)  # Flatten

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

# Model instantiation and testing
model = LeNet5(num_classes=10)
x = torch.randn(1, 1, 28, 28)
output = model(x)

print(f"LeNet-5")
print(f"Input: {x.shape} → Output: {output.shape}")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

AlexNet (2012): The Deep Learning Revolution

AlexNet won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a top-5 error rate of 15.3%, more than ten percentage points ahead of the runner-up, and sparked the deep learning boom.

Main innovations, most of them visible in the implementation below:

- ReLU activations instead of saturating tanh/sigmoid units, which sped up training considerably
- Dropout (p=0.5) in the fully connected layers to curb overfitting
- Overlapping max pooling (3×3 windows with stride 2)
- Aggressive data augmentation and training split across two GPUs
- Local response normalization (omitted from the simplified implementation below)

# Requirements:
# - Python 3.9+
# - torch>=2.0.0, <2.3.0

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet: ImageNet 2012 winning model"""
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()

        # Feature extraction layers
        self.features = nn.Sequential(
            # Conv1: 96 filters, 11×11, stride=4
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Conv2: 256 filters, 5×5
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # Conv3: 384 filters, 3×3
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Conv4: 384 filters, 3×3
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),

            # Conv5: 256 filters, 3×3
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        # Classification layers
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

# Check model size
model = AlexNet(num_classes=1000)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nAlexNet Total Parameters: {total_params:,}")
print(f"Memory Usage: approx {total_params * 4 / (1024**2):.1f} MB")
