This chapter covers containerization technology. You will learn the fundamental concepts of Docker containers, how to create Dockerfiles for machine learning models, and how to containerize ML models efficiently.
Learning Objectives
By reading this chapter, you will be able to:
- ✅ Understand the fundamental concepts of Docker containers and their differences from virtual machines
- ✅ Create Dockerfiles for machine learning models
- ✅ Efficiently containerize ML models
- ✅ Manage multiple services with Docker Compose
- ✅ Build GPU-enabled ML containers
2.1 Docker Fundamentals
What is a Container?
A container is a technology that packages an application together with its dependencies into an isolated environment.
"Build once, run anywhere": build it once, and it runs the same way everywhere.
Docker vs Virtual Machines
| Feature | Docker Container | Virtual Machine (VM) |
|---|---|---|
| Startup Time | Seconds | Minutes |
| Resources | Lightweight (MBs) | Heavy (GBs) |
| Isolation Level | Process-level | Complete OS isolation |
| Performance | Near-native | Has overhead |
| Portability | High | Moderate |
Basic Docker Commands
# Check Docker version
docker --version
# List images
docker images
# List running containers
docker ps
# List all containers
docker ps -a
# Download image
docker pull python:3.9-slim
# Run container
docker run -it python:3.9-slim bash
# Stop container
docker stop <container_id>
# Remove container
docker rm <container_id>
# Remove image
docker rmi <image_id>
# Remove all unused containers, networks, and images
docker system prune -a
Relationship Between Images and Containers
Image: Blueprint of the application (read-only)
Container: Executable instance created from an image
Important: Multiple containers can be started from a single image. Each container is an independent environment.
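As a minimal illustration, assuming an image named `ml-api:v1.0` (built later in this chapter), two independent containers can run side by side from the same image:
# Start two containers from the same image, mapped to different host ports
docker run -d --name ml-api-1 -p 8001:8000 ml-api:v1.0
docker run -d --name ml-api-2 -p 8002:8000 ml-api:v1.0
# Both containers appear as separate instances
docker ps --filter "ancestor=ml-api:v1.0"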
2.2 Creating a Dockerfile
Selecting a Base Image
Representative base images for machine learning models:
| Image | Size | Use Case |
|---|---|---|
| `python:3.9-slim` | ~120MB | Lightweight Python environment |
| `python:3.9` | ~900MB | Full-featured Python environment |
| `nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04` | ~2GB | GPU inference |
| `nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04` | ~4GB | GPU development and training |
Basic Dockerfile Structure
# Specify base image
FROM python:3.9-slim
# Set working directory
WORKDIR /app
# Update and install system packages
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install Python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 8000
# Startup command
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Multi-stage Build
A technique to reduce image size and improve security:
# Stage 1: Build environment
FROM python:3.9 as builder
WORKDIR /build
# Install dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Runtime environment (lightweight)
FROM python:3.9-slim
WORKDIR /app
# Copy only necessary files from build stage
COPY --from=builder /root/.local /root/.local
COPY . .
# Set PATH
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Effect: Multi-stage builds can reduce image size by 50-70%.
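To see the effect on your own build, `docker images` reports the final size and `docker history` shows the size contributed by each layer (the tag below is illustrative):
# Compare the final image size before and after switching to a multi-stage build
docker images ml-api
# Inspect per-layer sizes to see where the space goes
docker history ml-api:v1.0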
Optimization Techniques
Leveraging Layer Cache
# ❌ Inefficient: Dependencies are reinstalled every time code changes
FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
# ✅ Efficient: Cache is used unless dependencies change
FROM python:3.9-slim
WORKDIR /app
# Install dependencies first (changes infrequently)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy code later (changes frequently)
COPY . .
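With BuildKit enabled, reused layers are marked as CACHED in the build output; one way to confirm the cache is working after a code-only change (the exact output format varies by Docker version):
# Rebuild after editing application code only; the pip install step should be cached
docker build -t ml-api:v1.0 --progress=plain . 2>&1 | grep -i cached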
Excluding Unnecessary Files
Example .dockerignore file:
# .dockerignore
__pycache__
*.pyc
*.pyo
*.pyd
.Python
*.so
*.egg
*.egg-info
dist
build
.git
.gitignore
.env
.venv
venv/
data/
notebooks/
tests/
*.md
Dockerfile
docker-compose.yml
2.3 Containerizing ML Models
Dockerfile for FastAPI + PyTorch
# Multi-stage build
FROM python:3.9 as builder
WORKDIR /build
# Copy dependency file
COPY requirements.txt .
# Install dependencies (see the example requirements.txt below)
RUN pip install --user --no-cache-dir -r requirements.txt
# Runtime environment
FROM python:3.9-slim
WORKDIR /app
# Create non-root user (improved security)
RUN useradd -m -u 1000 appuser
# Copy dependencies from the build stage into the non-root user's home
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy application code and model
COPY --chown=appuser:appuser app/ ./app/
COPY --chown=appuser:appuser models/ ./models/
# Set environment variables
ENV PATH=/home/appuser/.local/bin:$PATH \
    PYTHONUNBUFFERED=1 \
    MODEL_PATH=/app/models/model.pth
USER appuser
# Health check (uses the standard library, so no extra dependency such as requests is required)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
Example requirements.txt
# requirements.txt
torch==2.0.0
torchvision==0.15.0
fastapi==0.104.0
uvicorn[standard]==0.24.0
pydantic==2.5.0
pillow==10.1.0
numpy==1.24.3
python-multipart==0.0.6
Building and Running Images
# Build image
docker build -t ml-api:v1.0 .
# Display detailed build log
docker build -t ml-api:v1.0 --progress=plain .
# Build without cache
docker build -t ml-api:v1.0 --no-cache .
# Run container
docker run -d \
--name ml-api \
-p 8000:8000 \
-v $(pwd)/models:/app/models \
ml-api:v1.0
# Check logs
docker logs ml-api
# Display real-time logs
docker logs -f ml-api
# Execute command inside container
docker exec -it ml-api bash
# Stop and remove container
docker stop ml-api
docker rm ml-api
Port Mapping
| Option | Description | Example |
|---|---|---|
| `-p 8000:8000` | Host:Container | Host port 8000 maps to container port 8000 |
| `-p 8080:8000` | Different ports | Host port 8080 maps to container port 8000 |
| `-p 127.0.0.1:8000:8000` | Localhost only | Accessible only from localhost |
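A quick way to verify the mapping, assuming the API exposes the `/health` endpoint used elsewhere in this chapter:
# With -p 8000:8000
curl http://localhost:8000/health
# With -p 8080:8000, the same container endpoint is reached on host port 8080
curl http://localhost:8080/health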
2.4 Orchestration with Docker Compose
docker-compose.yml Configuration
Configuration file for integrated management of multiple services:
# docker-compose.yml
version: '3.8'
services:
# FastAPI application
api:
build:
context: .
dockerfile: Dockerfile
container_name: ml-api
ports:
- "8000:8000"
environment:
- MODEL_PATH=/app/models/model.pth
- REDIS_HOST=redis
- REDIS_PORT=6379
volumes:
- ./models:/app/models:ro
- ./logs:/app/logs
depends_on:
- redis
restart: unless-stopped
networks:
- ml-network
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
# Redis cache
redis:
image: redis:7-alpine
container_name: ml-redis
ports:
- "6379:6379"
volumes:
- redis-data:/data
restart: unless-stopped
networks:
- ml-network
command: redis-server --appendonly yes
networks:
ml-network:
driver: bridge
volumes:
redis-data:
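Inside the `api` service, the Redis connection settings arrive as environment variables. A minimal sketch of how the application might read them (the `redis` Python package is an assumption here and is not listed in the requirements above):
# Hypothetical snippet inside the API service
import os
import redis  # assumed extra dependency: redis
redis_client = redis.Redis(
    host=os.getenv("REDIS_HOST", "localhost"),
    port=int(os.getenv("REDIS_PORT", "6379")),
)
redis_client.ping()  # raises if the redis service is unreachable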
Example of Multiple Service Integration
# docker-compose.yml (extended version)
version: '3.8'
services:
# ML model inference API
ml-api:
build: ./api
ports:
- "8000:8000"
environment:
- REDIS_HOST=redis
- DB_HOST=postgres
volumes:
- ./models:/app/models:ro
depends_on:
- redis
- postgres
networks:
- ml-network
# Cache layer
redis:
image: redis:7-alpine
volumes:
- redis-data:/data
networks:
- ml-network
# Database
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_USER=mluser
- POSTGRES_PASSWORD=mlpass
- POSTGRES_DB=mldb
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- ml-network
# Monitoring
prometheus:
image: prom/prometheus:latest
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
networks:
- ml-network
networks:
ml-network:
driver: bridge
volumes:
redis-data:
postgres-data:
prometheus-data:
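The Prometheus service mounts a `./prometheus.yml` that is not shown above; a minimal sketch of what it might contain (the scrape target assumes the API exposes a `/metrics` endpoint, which is not covered in this chapter):
# prometheus.yml (illustrative)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "ml-api"
    metrics_path: /metrics
    static_configs:
      - targets: ["ml-api:8000"]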
Volume Mounts
| Type | Syntax | Use Case |
|---|---|---|
| Bind Mount | `./host/path:/container/path` | Code synchronization during development |
| Named Volume | `volume-name:/container/path` | Persistent data storage |
| Read-only | `./path:/path:ro` | Model files, etc. |
Environment Variable Management
Example .env file:
# .env
MODEL_PATH=/app/models/resnet50.pth
REDIS_HOST=redis
REDIS_PORT=6379
LOG_LEVEL=INFO
MAX_WORKERS=4
Usage in docker-compose.yml:
services:
api:
env_file:
- .env
# Or specify individually
environment:
- MODEL_PATH=${MODEL_PATH}
- REDIS_HOST=${REDIS_HOST}
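On the application side these values are read as ordinary environment variables; a minimal sketch using the names from the `.env` example above:
# Hypothetical configuration module
import os
MODEL_PATH = os.getenv("MODEL_PATH", "/app/models/model.pth")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))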
Docker Compose Commands
# Start services (background)
docker-compose up -d
# Start services (with logs)
docker-compose up
# Build and start services
docker-compose up -d --build
# Start only specific services
docker-compose up -d api redis
# Stop services
docker-compose stop
# Stop and remove services
docker-compose down
# Remove including volumes
docker-compose down -v
# Check logs
docker-compose logs -f
# Logs for specific service
docker-compose logs -f api
# Check service status
docker-compose ps
# Restart service
docker-compose restart api
2.5 Hands-on: GPU-enabled ML Containers
NVIDIA Docker Setup
Prerequisites:
- Machine with NVIDIA GPU
- NVIDIA driver installed
- NVIDIA Container Toolkit installed
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart Docker
sudo systemctl restart docker
# Verify GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Dockerfile Using CUDA Image
# Dockerfile for GPU inference
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Install Python
RUN apt-get update && apt-get install -y \
python3.10 \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install PyTorch GPU version
COPY requirements-gpu.txt .
RUN pip3 install --no-cache-dir -r requirements-gpu.txt
# Application code and model
COPY app/ ./app/
COPY models/ ./models/
ENV PYTHONUNBUFFERED=1 \
CUDA_VISIBLE_DEVICES=0
EXPOSE 8000
CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
requirements-gpu.txt
# requirements-gpu.txt
torch==2.0.0+cu118
torchvision==0.15.0+cu118
--extra-index-url https://download.pytorch.org/whl/cu118
fastapi==0.104.0
uvicorn[standard]==0.24.0
pydantic==2.5.0
pillow==10.1.0
numpy==1.24.3
GPU Inference Implementation
Example app/main.py:
# Requirements:
# - Python 3.9+
# - fastapi>=0.100.0
# - pillow>=10.0.0
# - torch>=2.0.0, <2.3.0
"""
Example: Example app/main.py:
Purpose: Demonstrate core concepts and implementation patterns
Target: Advanced
Execution time: ~5 seconds
Dependencies: None
"""
# app/main.py
import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
import io
app = FastAPI()
# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Load model
model = torch.load("/app/models/model.pth", map_location=device)
model.eval()
@app.get("/health")
async def health_check():
return {
"status": "healthy",
"device": str(device),
"cuda_available": torch.cuda.is_available(),
"gpu_name": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
}
@app.post("/predict")
async def predict(file: UploadFile = File(...)):
# Load image
image_bytes = await file.read()
image = Image.open(io.BytesIO(image_bytes))
# Preprocessing (omitted)
# tensor = preprocess(image)
# GPU inference
with torch.no_grad():
# tensor = tensor.to(device)
# output = model(tensor)
pass
return {"prediction": "result"}
Using GPU with Docker Compose
# docker-compose-gpu.yml
version: '3.8'
services:
ml-api-gpu:
build:
context: .
dockerfile: Dockerfile.gpu
container_name: ml-api-gpu
ports:
- "8000:8000"
volumes:
- ./models:/app/models:ro
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
environment:
- CUDA_VISIBLE_DEVICES=0
restart: unless-stopped
Startup commands:
# Start GPU-enabled container
docker-compose -f docker-compose-gpu.yml up -d
# Check GPU usage
docker exec ml-api-gpu nvidia-smi
# Check logs
docker-compose -f docker-compose-gpu.yml logs -f
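A quick sanity check that PyTorch inside the container actually sees the GPU (output depends on your hardware):
# Confirm CUDA is visible to PyTorch from inside the running container
docker exec ml-api-gpu python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"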
Performance Comparison
| Environment | Inference Time (1 image) | Throughput (images/sec) | Notes |
|---|---|---|---|
| CPU (8 cores) | 150ms | 6.7 | python:3.9-slim |
| GPU (RTX 3090) | 15ms | 66.7 | nvidia/cuda:11.8.0 |
| Speedup | 10x | 10x | Batch size 1 |
Note: GPU throughput can be further improved by increasing batch size.
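A minimal sketch of batched inference, assuming the `preprocess` transform from the earlier example; actual gains depend on GPU memory and model size:
# Hypothetical batched inference (e.g. 32 PIL images collected from requests)
batch = torch.stack([preprocess(img.convert("RGB")) for img in images]).to(device)
with torch.no_grad():
    outputs = model(batch)  # one forward pass for the whole batch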
2.6 Chapter Summary
What We Learned
Docker Fundamentals
- Differences between containers and virtual machines
- Basic Docker commands
- Relationship between images and containers
Creating Dockerfiles
- Selecting appropriate base images
- Optimization with multi-stage builds
- Leveraging layer cache
Containerizing ML Models
- Dockerizing FastAPI + PyTorch
- Efficiency with .dockerignore
- Security and health checks
Docker Compose Orchestration
- Integrated management of multiple services
- Managing volumes and environment variables
- Service dependencies
GPU-enabled ML Containers
- Setting up NVIDIA Docker
- Using CUDA images
- 10x performance compared to CPU
Best Practices
| Principle | Description |
|---|---|
| Lightweight Images | Prioritize slim or alpine-based images |
| Layer Optimization | Place less frequently changed items first |
| Multi-stage Builds | Separate build and runtime environments |
| Non-root User | For improved security |
| .dockerignore | Exclude unnecessary files |
| Health Checks | Monitor service health |
| Environment Variables | Externalize configuration |
Next Chapter
In Chapter 3, we will learn about Orchestration with Kubernetes:
- Basic Kubernetes concepts
- Creating Pods, Services, and Deployments
- Scaling and load balancing
- Managing ConfigMaps and Secrets
- Deployment to production environments