Introduction
This final chapter brings together everything you've learned to build real-world applications. We'll cover four practical projects that demonstrate modern prompt engineering techniques, including vision prompting, prompt caching, and production deployment strategies.
By the end of this chapter, you will have:
- Built a document analysis system with structured extraction
- Created a vision-based application using multimodal prompts
- Implemented prompt caching for cost optimization
- Learned production deployment best practices
Project 1: Document Analysis System
Project Overview
Goal: Build a system that analyzes business documents (invoices, contracts, reports) and extracts structured information.
Techniques Used: Structured Output, Prompt Chains, Error Handling
Difficulty: Intermediate
System Architecture
Implementation
Document Classification
from openai import OpenAI
from pydantic import BaseModel
from typing import Literal
import json

client = OpenAI()

class DocumentClassification(BaseModel):
    document_type: Literal["invoice", "contract", "report", "other"]
    confidence: float
    language: str
    page_count_estimate: int

def classify_document(text: str) -> DocumentClassification:
    """Classify a document and return structured metadata."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """Classify the document type and extract metadata.

Document types:
- invoice: Bills, receipts, payment requests
- contract: Agreements, terms of service, NDAs
- report: Analysis, summaries, research documents
- other: Anything else"""
            },
            {"role": "user", "content": text[:4000]}  # First 4K chars
        ],
        response_format=DocumentClassification
    )
    return response.choices[0].message.parsed
Invoice Extraction
from pydantic import BaseModel, Field
from typing import Optional
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    total: float

class InvoiceData(BaseModel):
    invoice_number: str
    vendor_name: str
    vendor_address: Optional[str] = None
    customer_name: str
    invoice_date: date
    due_date: Optional[date] = None
    line_items: list[LineItem]
    subtotal: float
    tax_amount: float = 0
    total_amount: float
    currency: str = "USD"
    payment_terms: Optional[str] = None

def extract_invoice(text: str) -> InvoiceData:
    """Extract structured data from invoice text."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """Extract all invoice information into the structured format.

- Parse dates in ISO format (YYYY-MM-DD)
- Calculate totals if not explicitly stated
- Use the document's currency
- Extract all line items with quantities and prices"""
            },
            {"role": "user", "content": text}
        ],
        response_format=InvoiceData
    )
    return response.choices[0].message.parsed
Complete Pipeline
class DocumentProcessor:
    """Complete document processing pipeline."""

    def __init__(self):
        self.client = OpenAI()

    def process(self, document_text: str) -> dict:
        """Process any document type."""
        # Step 1: Classify
        classification = classify_document(document_text)

        # Step 2: Route to the appropriate extractor.
        # extract_contract and summarize_report follow the same pattern as
        # extract_invoice, each with its own Pydantic schema (not shown here).
        if classification.document_type == "invoice":
            data = extract_invoice(document_text)
        elif classification.document_type == "contract":
            data = extract_contract(document_text)
        elif classification.document_type == "report":
            data = summarize_report(document_text)
        else:
            data = {"raw_text": document_text[:1000]}

        # Step 3: Validate and return
        return {
            "classification": classification.model_dump(),
            "extracted_data": data.model_dump() if hasattr(data, 'model_dump') else data,
            "processing_status": "success"
        }

# Usage
processor = DocumentProcessor()
result = processor.process(invoice_text)
print(json.dumps(result, indent=2, default=str))
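Extracted numbers are only as reliable as the model, and a cheap arithmetic cross-check catches many hallucinated values before they reach downstream systems. A minimal sketch: the field names mirror the InvoiceData schema above, but the standalone `Item` dataclass and the 0.01 rounding tolerance are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Item:
    quantity: float
    unit_price: float
    total: float

def validate_invoice_totals(items: list[Item], subtotal: float,
                            tax_amount: float, total_amount: float,
                            tol: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies found in extracted data."""
    issues = []
    for i, item in enumerate(items):
        # Each line item should satisfy quantity * unit_price == total
        if abs(item.quantity * item.unit_price - item.total) > tol:
            issues.append(f"line {i}: quantity * unit_price != total")
    # Line items should sum to the subtotal
    if abs(sum(item.total for item in items) - subtotal) > tol:
        issues.append("line item totals do not sum to subtotal")
    # Subtotal plus tax should equal the grand total
    if abs(subtotal + tax_amount - total_amount) > tol:
        issues.append("subtotal + tax != total")
    return issues

items = [Item(2, 5.0, 10.0), Item(1, 3.5, 3.5)]
print(validate_invoice_totals(items, 13.5, 1.0, 14.5))  # []
print(validate_invoice_totals(items, 13.5, 1.0, 20.0))  # ['subtotal + tax != total']
```

A non-empty result can route the document into the "review_needed" path rather than failing silently.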
Project 2: Vision Prompting Application
Project Overview
Goal: Build a multimodal application that analyzes images for product quality inspection.
Techniques Used: Vision Prompting, Structured Output, Few-shot with Images
Difficulty: Intermediate
Vision Prompting Basics
Modern LLMs (GPT-4V, Claude 3.5, Gemini) support image inputs alongside text. This enables powerful visual understanding applications.
Basic Image Analysis
import base64
from openai import OpenAI

client = OpenAI()

def encode_image(image_path: str) -> str:
    """Encode image to base64."""
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

def analyze_image(image_path: str, prompt: str) -> str:
    """Analyze an image with a custom prompt."""
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high"  # or "low" for faster/cheaper
                        }
                    }
                ]
            }
        ],
        max_tokens=1000
    )
    return response.choices[0].message.content

# Usage
result = analyze_image(
    "product.jpg",
    "Describe any visible defects or quality issues in this product image."
)
Quality Inspection System
Product Quality Inspector
from pydantic import BaseModel
from typing import Literal
from enum import Enum

class DefectType(str, Enum):
    SCRATCH = "scratch"
    DENT = "dent"
    DISCOLORATION = "discoloration"
    CRACK = "crack"
    MISSING_PART = "missing_part"
    CONTAMINATION = "contamination"
    OTHER = "other"

class Defect(BaseModel):
    type: DefectType
    location: str  # e.g., "top-left corner", "center"
    severity: Literal["minor", "moderate", "severe"]
    description: str

class QualityInspection(BaseModel):
    product_identified: str
    overall_quality: Literal["pass", "fail", "review_needed"]
    confidence: float
    defects_found: list[Defect]
    recommendations: list[str]

def inspect_product(image_path: str, product_type: str) -> QualityInspection:
    """Perform quality inspection on a product image."""
    base64_image = encode_image(image_path)
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a quality control inspector for {product_type} products.
Analyze the image for any defects or quality issues.

Quality Standards:
- Minor defects: Cosmetic issues that don't affect function
- Moderate defects: Visible issues that may affect customer satisfaction
- Severe defects: Functional issues or safety concerns

Pass criteria: No severe defects, max 2 minor defects
Fail criteria: Any severe defect or more than 3 moderate defects
Review needed: Edge cases requiring human judgment"""
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Inspect this product image and provide a detailed quality assessment."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                    }
                ]
            }
        ],
        response_format=QualityInspection
    )
    return response.choices[0].message.parsed

# Usage
inspection = inspect_product("widget.jpg", "electronic component")
print(f"Quality: {inspection.overall_quality}")
print(f"Defects found: {len(inspection.defects_found)}")
for defect in inspection.defects_found:
    print(f"  - {defect.type.value}: {defect.description} ({defect.severity})")
Vision Prompting Best Practices
| Practice | Description |
|---|---|
| Use high detail for fine analysis | Set detail: "high" for detailed inspection tasks |
| Provide reference context | Tell the model what product it's looking at |
| Use structured output | Define schemas for consistent extraction |
| Multiple images for comparison | Send reference images alongside test images |
| Specify location format | Define how to report positions (grid, coordinates, descriptions) |
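The "multiple images for comparison" practice reduces to building a content list with one text part and several image parts. A small helper sketch, reusing the data-URL format from the analyze_image example above (the function name and placeholder base64 strings are illustrative):

```python
def build_vision_content(prompt: str, images_b64: list[str],
                         detail: str = "high") -> list[dict]:
    """Build a mixed text+image content list for one chat message.

    Image order matters: put the reference image(s) before the test image
    and say so in the prompt.
    """
    content = [{"type": "text", "text": prompt}]
    for b64 in images_b64:
        content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{b64}",
                "detail": detail,
            },
        })
    return content

# One text part plus two image parts
content = build_vision_content(
    "The first image is an approved reference unit. Compare the second image against it.",
    ["REF_B64_PLACEHOLDER", "TEST_B64_PLACEHOLDER"],
)
print(len(content))  # 3
```

The resulting list drops straight into `{"role": "user", "content": content}` in any of the vision calls above.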
Project 3: Prompt Caching for Cost Optimization
Project Overview
Goal: Implement prompt caching strategies that can reduce API costs by 50-90% on repeated context.
Techniques Used: Prompt Caching, Token Optimization, Batch Processing
Difficulty: Intermediate
Understanding Prompt Caching
Prompt Caching (available in Anthropic Claude and OpenAI) allows you to cache the prefix of prompts that are reused frequently. Cached tokens cost significantly less on subsequent requests.
Anthropic Prompt Caching
Claude Prompt Caching
import anthropic

client = anthropic.Anthropic()

# Large system prompt that will be cached
SYSTEM_PROMPT = """You are an expert legal assistant with deep knowledge of:

[Include 5000+ tokens of legal context, case law references,
jurisdiction-specific rules, document templates, etc.]

Your role is to help lawyers draft documents, review contracts,
and provide legal research assistance.
"""  # This could be 10K+ tokens

def query_legal_assistant(user_query: str):
    """Query with prompt caching enabled."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"}  # Enable caching
            }
        ],
        messages=[
            {"role": "user", "content": user_query}
        ]
    )

    # Check cache usage in the response
    print(f"Input tokens: {response.usage.input_tokens}")
    print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
    print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

    return response.content[0].text

# First call: Creates cache (higher cost)
result1 = query_legal_assistant("Review this NDA clause...")

# Subsequent calls: Use cache (90% cost reduction on cached tokens)
result2 = query_legal_assistant("Draft a confidentiality agreement...")
result3 = query_legal_assistant("What are the requirements for...")
OpenAI Automatic Caching
OpenAI Prompt Caching (Automatic)
from openai import OpenAI

client = OpenAI()

# OpenAI automatically caches prompts >= 1024 tokens
# that share the same prefix
LONG_CONTEXT = """
[Large document or context - 5000+ tokens]

This could be:
- A complete codebase for code review
- A lengthy document for analysis
- Extensive product documentation
- Historical conversation context
"""

def analyze_with_context(query: str):
    """Automatically benefits from caching with long prompts."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": LONG_CONTEXT  # Automatically cached if >= 1024 tokens
            },
            {"role": "user", "content": query}
        ]
    )

    # Check cache usage
    if hasattr(response.usage, 'prompt_tokens_details'):
        details = response.usage.prompt_tokens_details
        print(f"Cached tokens: {details.cached_tokens}")

    return response.choices[0].message.content
Cost Optimization Strategies
Caching Best Practices
- Front-load static content: Put cacheable content at the beginning of prompts
- Batch similar requests: Group requests with the same context
- Use appropriate cache TTL: Anthropic's ephemeral cache lives for 5 minutes by default, refreshed on each cache hit
- Monitor cache hit rates: Track to optimize prompt structure
- Minimize variable content: Keep dynamic parts at the end
Prompt Structure for Optimal Caching
# Optimal structure for caching
messages = [
    # CACHED SECTION (put first, don't change)
    {
        "role": "system",
        "content": LARGE_STATIC_CONTEXT  # 5000+ tokens, rarely changes
    },
    # SEMI-CACHED (changes occasionally)
    {
        "role": "user",
        "content": session_context  # Session-specific but stable
    },
    {
        "role": "assistant",
        "content": previous_response
    },
    # NOT CACHED (changes every request)
    {
        "role": "user",
        "content": current_query  # Different each time
    }
]
Cost Comparison
| Scenario | Without Caching | With Caching | Savings |
|---|---|---|---|
| 10K token system prompt, 100 requests | $3.00 | $0.33 | 89% |
| RAG with 8K context, 50 requests | $1.20 | $0.15 | 87% |
| Code review (full codebase), 20 requests | $2.00 | $0.25 | 87% |
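The savings figures above follow from simple arithmetic over the cached prefix. This sketch computes them from assumed multipliers: the 1.25x cache-write premium and 0.10x cache-read discount mirror Anthropic's published pricing at the time of writing, but verify against the current pricing page before relying on the numbers.

```python
def caching_savings(prefix_tokens: int, n_requests: int,
                    price_per_mtok: float,
                    write_multiplier: float = 1.25,
                    read_multiplier: float = 0.10) -> float:
    """Return fractional savings on the cached prefix across n_requests."""
    # Without caching: every request pays full price for the prefix
    base = prefix_tokens * n_requests * price_per_mtok / 1e6
    # With caching: the first request writes the cache at a premium,
    # the remaining n-1 requests read it at a discount
    cached = (prefix_tokens * write_multiplier
              + prefix_tokens * (n_requests - 1) * read_multiplier
              ) * price_per_mtok / 1e6
    return 1 - cached / base

# First table row: 10K-token system prompt reused across 100 requests
print(f"{caching_savings(10_000, 100, 3.00):.0%}")  # 89%
```

The result is independent of the absolute price: savings depend only on the multipliers and the request count, which is why the table rows cluster near 87-89%.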
Project 4: Production Deployment
Project Overview
Goal: Deploy a prompt-based application with proper monitoring, error handling, and scaling.
Techniques Used: Observability, Rate Limiting, Fallback Strategies
Difficulty: Advanced
Production Architecture
Robust LLM Client
Production-Ready Client
import time
from openai import OpenAI
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential
import logging

logger = logging.getLogger(__name__)

class ProductionLLMClient:
    """Production-ready LLM client with fallback and monitoring."""

    def __init__(self):
        self.openai = OpenAI()
        self.anthropic = Anthropic()
        # MetricsCollector: a simple event/timing recorder; an LLMMetrics-style
        # class like the one defined below fills this role
        self.metrics = MetricsCollector()

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    def _call_openai(self, messages: list, **kwargs) -> str:
        """Call OpenAI with retry logic."""
        start = time.time()
        try:
            # pop() so "model" isn't passed twice via **kwargs
            model = kwargs.pop("model", "gpt-4o")
            response = self.openai.chat.completions.create(
                model=model,
                messages=messages,
                **kwargs
            )
            self.metrics.record("openai_success", time.time() - start)
            return response.choices[0].message.content
        except Exception as e:
            self.metrics.record("openai_error", time.time() - start)
            logger.error(f"OpenAI error: {e}")
            raise

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    def _call_anthropic(self, messages: list, **kwargs) -> str:
        """Call Anthropic with retry logic."""
        start = time.time()
        try:
            response = self.anthropic.messages.create(
                model=kwargs.get("model", "claude-sonnet-4-20250514"),
                max_tokens=kwargs.get("max_tokens", 1024),
                messages=messages
            )
            self.metrics.record("anthropic_success", time.time() - start)
            return response.content[0].text
        except Exception as e:
            self.metrics.record("anthropic_error", time.time() - start)
            logger.error(f"Anthropic error: {e}")
            raise

    def complete(self, messages: list, primary: str = "openai", **kwargs) -> str:
        """Complete with automatic fallback."""
        try:
            if primary == "openai":
                return self._call_openai(messages, **kwargs)
            else:
                return self._call_anthropic(messages, **kwargs)
        except Exception as e:
            logger.warning(f"Primary provider failed, trying fallback: {e}")
            # Fall back to the other provider
            try:
                if primary == "openai":
                    return self._call_anthropic(messages, **kwargs)
                else:
                    return self._call_openai(messages, **kwargs)
            except Exception as e2:
                logger.error(f"All providers failed: {e2}")
                raise RuntimeError("All LLM providers unavailable")
Monitoring and Observability
Metrics Collection
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LLMMetrics:
    """Track LLM usage metrics for monitoring."""
    requests_total: int = 0
    requests_success: int = 0
    requests_failed: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0
    latencies: List[float] = field(default_factory=list)
    errors_by_type: Dict[str, int] = field(default_factory=dict)

    def record_request(self, success: bool, latency: float,
                       tokens: int, cost: float, error_type: str = None):
        self.requests_total += 1
        if success:
            self.requests_success += 1
        else:
            self.requests_failed += 1
            key = error_type or "unknown"
            self.errors_by_type[key] = self.errors_by_type.get(key, 0) + 1
        self.latencies.append(latency)
        self.total_tokens += tokens
        self.total_cost += cost

    def get_summary(self) -> dict:
        return {
            "success_rate": self.requests_success / max(self.requests_total, 1),
            "avg_latency": sum(self.latencies) / max(len(self.latencies), 1),
            "p95_latency": sorted(self.latencies)[int(len(self.latencies) * 0.95)]
                           if self.latencies else 0,
            "total_cost": self.total_cost,
            "error_breakdown": self.errors_by_type
        }

    def export_prometheus(self) -> str:
        """Export metrics in Prometheus exposition format."""
        return f"""
# HELP llm_requests_total Total LLM requests
# TYPE llm_requests_total counter
llm_requests_total {self.requests_total}

# HELP llm_requests_success_total Successful LLM requests
# TYPE llm_requests_success_total counter
llm_requests_success_total {self.requests_success}

# HELP llm_latency_seconds LLM request latency
# TYPE llm_latency_seconds histogram
llm_latency_seconds_sum {sum(self.latencies)}
llm_latency_seconds_count {len(self.latencies)}

# HELP llm_cost_dollars Total LLM cost
# TYPE llm_cost_dollars counter
llm_cost_dollars {self.total_cost}
"""
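Cost tracking also needs per-model prices to turn token counts into dollars. A small lookup sketch; the figures below are illustrative placeholders in USD per million tokens, not a substitute for the providers' current pricing pages.

```python
# Illustrative per-million-token prices (check the providers' pricing pages)
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the dollar cost of one request from its token usage."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# 10K input tokens + 1K output tokens on gpt-4o at the placeholder prices
print(round(request_cost("gpt-4o", 10_000, 1_000), 4))  # 0.035
```

The returned value plugs directly into LLMMetrics.record_request's `cost` argument, giving cost attribution per model for free.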
Production Checklist
Deployment Checklist
- Error Handling: Retry logic, fallback providers, graceful degradation
- Rate Limiting: Per-user limits, global throttling, queue management
- Caching: Response caching, prompt caching, embedding caching
- Monitoring: Latency, success rate, token usage, cost tracking
- Logging: Request/response logging (sanitized), error tracking
- Security: Input validation, output filtering, PII handling
- Cost Controls: Budget alerts, usage caps, cost attribution
- Testing: Prompt regression tests, load testing, chaos testing
Final Exercises
Exercise 1: Document Processor Extension (Difficulty: Medium)
Task: Extend the document analysis system to handle:
- PDF documents (extract text first)
- Multi-language support (detect and translate)
- Confidence scoring for extracted fields
Exercise 2: Vision Application (Difficulty: Medium)
Task: Build a receipt scanner that:
- Accepts photos of receipts
- Extracts merchant, date, items, total
- Categorizes expenses automatically
- Handles poor image quality gracefully
Exercise 3: Caching Implementation (Difficulty: Medium)
Task: Implement a caching layer for a chatbot that:
- Caches the system prompt and knowledge base
- Tracks cache hit rate and cost savings
- Automatically invalidates cache when knowledge updates
Exercise 4: Production API (Difficulty: Advanced)
Task: Build a production-ready API endpoint that:
- Implements the ProductionLLMClient with fallback
- Includes rate limiting (10 requests/minute/user)
- Exports Prometheus metrics
- Logs all requests with correlation IDs
Exercise 5: Complete Project (Difficulty: Advanced)
Task: Combine all techniques to build a "Smart Meeting Notes" application:
- Input: Meeting transcript (text or audio)
- Processing: Extract action items, decisions, participants
- Output: Structured summary with follow-up tasks
- Features: Prompt caching, error handling, monitoring
Series Summary
Key Points from This Series
- Chapter 1: Prompt fundamentals - clarity, specificity, context, constraints
- Chapter 2: Basic techniques - Role Prompting, Structured Output, JSON Mode
- Chapter 3: Advanced techniques - Tree of Thought, ReAct, reasoning model prompts
- Chapter 4: Function Calling - tool definition, MCP, error handling, orchestration
- Chapter 5: Production - vision prompting, caching, deployment best practices
Continuing Your Journey
Congratulations on completing the Introduction to Prompt Engineering series! Here are recommended next steps:
- Practice: Apply these techniques to your own projects
- Explore: Try the AI Agents Introduction series
- Deepen: Study the LLM Basics Introduction for theoretical foundations
- Build: Create your own prompt library and templates
- Share: Contribute to the prompt engineering community
References
- OpenAI. (2024). Vision Guide
- Anthropic. (2024). Prompt Caching Documentation
- Google. (2024). Gemini Vision Capabilities
- LangChain. (2024). Production Deployment Guide
Update History
- 2026-01-12: v2.0 Initial release with Vision and Caching projects