Chapter 5: Practical Projects

Real-World Applications: Vision, Caching, and Production Deployment

Reading Time: 25-30 minutes | Projects: 4 | Code Examples: 12 | Difficulty: Intermediate-Advanced

Introduction

This final chapter brings together everything you've learned to build real-world applications. We'll cover four practical projects that demonstrate modern prompt engineering techniques including vision prompting, prompt caching, and production deployment strategies.

By the end of this chapter, you will have built four working projects: a document analysis pipeline, a vision-based quality inspection tool, a prompt caching layer for cost optimization, and a production deployment with monitoring and fallbacks.

Project 1: Document Analysis System

Project Overview

Goal: Build a system that analyzes business documents (invoices, contracts, reports) and extracts structured information.

Techniques Used: Structured Output, Prompt Chains, Error Handling

Difficulty: Intermediate

System Architecture

graph LR
    A[Document Input] --> B[Classification]
    B --> C{Document Type}
    C -->|Invoice| D[Invoice Extractor]
    C -->|Contract| E[Contract Analyzer]
    C -->|Report| F[Report Summarizer]
    D --> G[Structured Output]
    E --> G
    F --> G
    G --> H[Validation]
    H --> I[Database/API]

Implementation

Document Classification

from openai import OpenAI
from pydantic import BaseModel
from typing import Literal
import json

client = OpenAI()

class DocumentClassification(BaseModel):
    document_type: Literal["invoice", "contract", "report", "other"]
    confidence: float
    language: str
    page_count_estimate: int

def classify_document(text: str) -> DocumentClassification:
    """Classify a document and return structured metadata."""

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """Classify the document type and extract metadata.
                Document types:
                - invoice: Bills, receipts, payment requests
                - contract: Agreements, terms of service, NDAs
                - report: Analysis, summaries, research documents
                - other: Anything else"""
            },
            {"role": "user", "content": text[:4000]}  # First 4K chars
        ],
        response_format=DocumentClassification
    )

    return response.choices[0].message.parsed

Invoice Extraction

from pydantic import BaseModel, Field
from typing import Optional
from datetime import date

class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    total: float

class InvoiceData(BaseModel):
    invoice_number: str
    vendor_name: str
    vendor_address: Optional[str] = None
    customer_name: str
    invoice_date: date
    due_date: Optional[date] = None
    line_items: list[LineItem]
    subtotal: float
    tax_amount: float = 0
    total_amount: float
    currency: str = "USD"
    payment_terms: Optional[str] = None

def extract_invoice(text: str) -> InvoiceData:
    """Extract structured data from invoice text."""

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": """Extract all invoice information into the structured format.
                - Parse dates in ISO format (YYYY-MM-DD)
                - Calculate totals if not explicitly stated
                - Use the document's currency
                - Extract all line items with quantities and prices"""
            },
            {"role": "user", "content": text}
        ],
        response_format=InvoiceData
    )

    return response.choices[0].message.parsed

Complete Pipeline

class DocumentProcessor:
    """Complete document processing pipeline."""

    def __init__(self):
        self.client = OpenAI()

    def process(self, document_text: str) -> dict:
        """Process any document type."""

        # Step 1: Classify
        classification = classify_document(document_text)

        # Step 2: Route to appropriate extractor
        if classification.document_type == "invoice":
            data = extract_invoice(document_text)
        elif classification.document_type == "contract":
            data = extract_contract(document_text)    # analogous extractor (not shown)
        elif classification.document_type == "report":
            data = summarize_report(document_text)    # analogous summarizer (not shown)
        else:
            data = {"raw_text": document_text[:1000]}

        # Step 3: Validate and return
        return {
            "classification": classification.model_dump(),
            "extracted_data": data.model_dump() if hasattr(data, 'model_dump') else data,
            "processing_status": "success"
        }

# Usage
processor = DocumentProcessor()
result = processor.process(invoice_text)
print(json.dumps(result, indent=2, default=str))
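The pipeline above lists error handling as one of its techniques, but `process()` will raise on any classification or extraction failure. A minimal wrapper is sketched below; `process_fn` stands in for `DocumentProcessor.process`, and the retry count and error-record shape are illustrative choices, not part of the original pipeline.

```python
import logging

logger = logging.getLogger(__name__)

def process_safely(process_fn, document_text: str, max_attempts: int = 2) -> dict:
    """Run a document processor, retrying on failure and returning an
    error record instead of raising when all attempts are exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return process_fn(document_text)
        except Exception as e:  # in production, catch specific API/validation errors
            last_error = e
            logger.warning("Processing attempt %d failed: %s", attempt, e)
    return {"processing_status": "error", "error": str(last_error)}
```

Usage: `result = process_safely(processor.process, invoice_text)`, then branch on `result["processing_status"]` before writing to the database.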

Project 2: Vision Prompting Application

Project Overview

Goal: Build a multimodal application that analyzes images for product quality inspection.

Techniques Used: Vision Prompting, Structured Output, Few-shot with Images

Difficulty: Intermediate

Vision Prompting Basics

Modern LLMs (GPT-4V, Claude 3.5, Gemini) support image inputs alongside text. This enables powerful visual understanding applications.

Basic Image Analysis

import base64
from openai import OpenAI

client = OpenAI()

def encode_image(image_path: str) -> str:
    """Encode image to base64."""
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

def analyze_image(image_path: str, prompt: str) -> str:
    """Analyze an image with a custom prompt."""

    base64_image = encode_image(image_path)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high"  # or "low" for faster/cheaper
                        }
                    }
                ]
            }
        ],
        max_tokens=1000
    )

    return response.choices[0].message.content

# Usage
result = analyze_image(
    "product.jpg",
    "Describe any visible defects or quality issues in this product image."
)

Quality Inspection System

Product Quality Inspector

from pydantic import BaseModel
from typing import Literal
from enum import Enum

class DefectType(str, Enum):
    SCRATCH = "scratch"
    DENT = "dent"
    DISCOLORATION = "discoloration"
    CRACK = "crack"
    MISSING_PART = "missing_part"
    CONTAMINATION = "contamination"
    OTHER = "other"

class Defect(BaseModel):
    type: DefectType
    location: str  # e.g., "top-left corner", "center"
    severity: Literal["minor", "moderate", "severe"]
    description: str

class QualityInspection(BaseModel):
    product_identified: str
    overall_quality: Literal["pass", "fail", "review_needed"]
    confidence: float
    defects_found: list[Defect]
    recommendations: list[str]

def inspect_product(image_path: str, product_type: str) -> QualityInspection:
    """Perform quality inspection on a product image."""

    base64_image = encode_image(image_path)

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a quality control inspector for {product_type} products.
                Analyze the image for any defects or quality issues.

                Quality Standards:
                - Minor defects: Cosmetic issues that don't affect function
                - Moderate defects: Visible issues that may affect customer satisfaction
                - Severe defects: Functional issues or safety concerns

                Pass criteria: No severe defects, max 2 minor defects
                Fail criteria: Any severe defect or more than 3 moderate defects
                Review needed: Edge cases requiring human judgment"""
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Inspect this product image and provide detailed quality assessment."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                    }
                ]
            }
        ],
        response_format=QualityInspection
    )

    return response.choices[0].message.parsed

# Usage
inspection = inspect_product("widget.jpg", "electronic component")
print(f"Quality: {inspection.overall_quality}")
print(f"Defects found: {len(inspection.defects_found)}")
for defect in inspection.defects_found:
    print(f"  - {defect.type}: {defect.description} ({defect.severity})")

Vision Prompting Best Practices

| Practice | Description |
|----------|-------------|
| Use high detail for fine analysis | Set detail: "high" for detailed inspection tasks |
| Provide reference context | Tell the model what product it's looking at |
| Use structured output | Define schemas for consistent extraction |
| Multiple images for comparison | Send reference images alongside test images |
| Specify location format | Define how to report positions (grid, coordinates, descriptions) |
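One of the practices above, sending a reference image alongside the test image, amounts to putting two image parts in a single user message. The builder below is a sketch: the payload shape follows the Chat Completions image format used earlier in this project, while the function name and prompt wording are illustrative.

```python
def build_comparison_messages(reference_b64: str, test_b64: str,
                              product_type: str) -> list[dict]:
    """Build a chat message that compares a test image against a
    known-good reference image of the same product."""
    def image_part(b64: str) -> dict:
        return {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}", "detail": "high"},
        }

    return [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": f"The first image is a defect-free reference {product_type}. "
                     "Compare the second image against it and describe any differences."},
            image_part(reference_b64),   # known-good reference
            image_part(test_b64),        # unit under inspection
        ],
    }]
```

The resulting list can be passed directly as `messages` to `client.chat.completions.create(model="gpt-4o", ...)`.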

Project 3: Prompt Caching for Cost Optimization

Project Overview

Goal: Implement prompt caching strategies to reduce API costs by 50-90%.

Techniques Used: Prompt Caching, Token Optimization, Batch Processing

Difficulty: Intermediate

Understanding Prompt Caching

Prompt Caching (available in Anthropic Claude and OpenAI) lets you cache a prompt prefix that is reused across requests. Cached tokens cost significantly less on subsequent requests: roughly a 90% discount on cache reads with Anthropic, and 50% with OpenAI.

graph TD
    A[Long System Prompt] -->|First Request| B[Full Processing]
    A -->|Cached| C[Subsequent Requests]
    B -->|Cost: $X| D[Response 1]
    C -->|Cost: $0.1X| E[Responses 2-N]
    style C fill:#e8f5e9
    style E fill:#e8f5e9

Anthropic Prompt Caching

Claude Prompt Caching

import anthropic

client = anthropic.Anthropic()

# Large system prompt that will be cached
SYSTEM_PROMPT = """You are an expert legal assistant with deep knowledge of:

[Include 5000+ tokens of legal context, case law references,
jurisdiction-specific rules, document templates, etc.]

Your role is to help lawyers draft documents, review contracts,
and provide legal research assistance.
"""  # This could be 10K+ tokens

def query_legal_assistant(user_query: str):
    """Query with prompt caching enabled."""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"}  # Enable caching
            }
        ],
        messages=[
            {"role": "user", "content": user_query}
        ]
    )

    # Check cache usage in response
    print(f"Input tokens: {response.usage.input_tokens}")
    print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
    print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

    return response.content[0].text

# First call: Creates cache (higher cost)
result1 = query_legal_assistant("Review this NDA clause...")

# Subsequent calls: Uses cache (90% cost reduction on cached tokens)
result2 = query_legal_assistant("Draft a confidentiality agreement...")
result3 = query_legal_assistant("What are the requirements for...")

OpenAI Automatic Caching

OpenAI Prompt Caching (Automatic)

from openai import OpenAI

client = OpenAI()

# OpenAI automatically caches prompts >= 1024 tokens
# that share the same prefix

LONG_CONTEXT = """
[Large document or context - 5000+ tokens]
This could be:
- A complete codebase for code review
- A lengthy document for analysis
- Extensive product documentation
- Historical conversation context
"""

def analyze_with_context(query: str):
    """Automatically benefits from caching with long prompts."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": LONG_CONTEXT  # Automatically cached if >= 1024 tokens
            },
            {"role": "user", "content": query}
        ]
    )

    # Check cache usage
    if hasattr(response.usage, 'prompt_tokens_details'):
        details = response.usage.prompt_tokens_details
        print(f"Cached tokens: {details.cached_tokens}")

    return response.choices[0].message.content

Cost Optimization Strategies

Caching Best Practices

  1. Front-load static content: Put cacheable content at the beginning of prompts
  2. Batch similar requests: Group requests with the same context
  3. Use appropriate cache TTL: Anthropic's ephemeral cache lives about 5 minutes, refreshed on each hit
  4. Monitor cache hit rates: Track to optimize prompt structure
  5. Minimize variable content: Keep dynamic parts at the end
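Practice 4 above, monitoring cache hit rates, can be done with a small aggregator fed from each response's usage block. The field names mirror the Anthropic usage fields printed earlier in this project; the class itself is a sketch, not part of either SDK.

```python
class CacheStats:
    """Aggregate cached vs. uncached input tokens across requests
    to compute an overall cache hit rate."""

    def __init__(self):
        self.uncached_tokens = 0
        self.cache_read_tokens = 0
        self.cache_creation_tokens = 0

    def record(self, input_tokens: int, cache_read: int = 0, cache_creation: int = 0):
        # Pass usage.input_tokens, usage.cache_read_input_tokens,
        # and usage.cache_creation_input_tokens from each response.
        self.uncached_tokens += input_tokens
        self.cache_read_tokens += cache_read
        self.cache_creation_tokens += cache_creation

    @property
    def hit_rate(self) -> float:
        total = (self.uncached_tokens + self.cache_read_tokens
                 + self.cache_creation_tokens)
        return self.cache_read_tokens / total if total else 0.0
```

A low hit rate usually means the "static" prefix is changing between requests, or requests arrive more than the cache TTL apart.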

Prompt Structure for Optimal Caching

# Optimal structure for caching

messages = [
    # CACHED SECTION (put first, don't change)
    {
        "role": "system",
        "content": LARGE_STATIC_CONTEXT  # 5000+ tokens, rarely changes
    },

    # SEMI-CACHED (changes occasionally)
    {
        "role": "user",
        "content": session_context  # Session-specific but stable
    },
    {
        "role": "assistant",
        "content": previous_response
    },

    # NOT CACHED (changes every request)
    {
        "role": "user",
        "content": current_query  # Different each time
    }
]

Cost Comparison

| Scenario | Without Caching | With Caching | Savings |
|----------|-----------------|--------------|---------|
| 10K token system prompt, 100 requests | $3.00 | $0.33 | 89% |
| RAG with 8K context, 50 requests | $1.20 | $0.15 | 87% |
| Code review (full codebase), 20 requests | $2.00 | $0.25 | 87% |
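The first row of the table can be reproduced with a back-of-the-envelope model. The $3/MTok price and 90% cache-read discount are illustrative assumptions, and the sketch ignores the roughly 25% premium Anthropic charges to write the cache in the first place.

```python
def caching_savings(prompt_tokens: int, requests: int,
                    price_per_mtok: float = 3.0,
                    read_discount: float = 0.9) -> dict:
    """Estimate cost with and without prompt caching for a prompt
    prefix repeated across many requests."""
    per_request = prompt_tokens / 1_000_000 * price_per_mtok
    without = requests * per_request
    # First request pays full price (and creates the cache);
    # the rest pay only the discounted cache-read rate.
    with_cache = per_request + (requests - 1) * per_request * (1 - read_discount)
    return {
        "without": round(without, 2),
        "with": round(with_cache, 2),
        "savings_pct": round(100 * (1 - with_cache / without)),
    }
```

For a 10K-token prompt over 100 requests this yields $3.00 uncached versus about $0.33 cached, an 89% saving, matching the first table row.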

Project 4: Production Deployment

Project Overview

Goal: Deploy a prompt-based application with proper monitoring, error handling, and scaling.

Techniques Used: Observability, Rate Limiting, Fallback Strategies

Difficulty: Advanced

Production Architecture

graph TD
    A[Client] --> B[API Gateway]
    B --> C[Rate Limiter]
    C --> D[Request Router]
    D --> E[Primary LLM]
    D --> F[Fallback LLM]
    E --> G[Response Cache]
    F --> G
    G --> H[Monitoring]
    H --> I[Logging]
    H --> J[Metrics]
    H --> K[Alerting]

Robust LLM Client

Production-Ready Client

import time
from openai import OpenAI
from anthropic import Anthropic
from tenacity import retry, stop_after_attempt, wait_exponential
import logging

logger = logging.getLogger(__name__)

class ProductionLLMClient:
    """Production-ready LLM client with fallback and monitoring."""

    def __init__(self):
        self.openai = OpenAI()
        self.anthropic = Anthropic()
        self.metrics = MetricsCollector()  # assumed helper: records (event_name, latency); see LLMMetrics below for a fuller version

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    def _call_openai(self, messages: list, **kwargs) -> str:
        """Call OpenAI with retry logic."""
        start = time.time()
        try:
            response = self.openai.chat.completions.create(
                model=kwargs.pop("model", "gpt-4o"),  # pop so model isn't passed twice via **kwargs
                messages=messages,
                **kwargs
            )
            self.metrics.record("openai_success", time.time() - start)
            return response.choices[0].message.content
        except Exception as e:
            self.metrics.record("openai_error", time.time() - start)
            logger.error(f"OpenAI error: {e}")
            raise

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    def _call_anthropic(self, messages: list, **kwargs) -> str:
        """Call Anthropic with retry logic."""
        start = time.time()
        try:
            response = self.anthropic.messages.create(
                model=kwargs.get("model", "claude-sonnet-4-20250514"),
                max_tokens=kwargs.get("max_tokens", 1024),
                messages=messages
            )
            self.metrics.record("anthropic_success", time.time() - start)
            return response.content[0].text
        except Exception as e:
            self.metrics.record("anthropic_error", time.time() - start)
            logger.error(f"Anthropic error: {e}")
            raise

    def complete(self, messages: list, primary: str = "openai", **kwargs) -> str:
        """Complete with automatic fallback."""
        try:
            if primary == "openai":
                return self._call_openai(messages, **kwargs)
            else:
                return self._call_anthropic(messages, **kwargs)
        except Exception as e:
            logger.warning(f"Primary provider failed, trying fallback: {e}")
            # Fallback to other provider
            try:
                if primary == "openai":
                    return self._call_anthropic(messages, **kwargs)
                else:
                    return self._call_openai(messages, **kwargs)
            except Exception as e2:
                logger.error(f"All providers failed: {e2}")
                raise RuntimeError("All LLM providers unavailable")
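The architecture diagram places a rate limiter in front of the request router, and Rate Limiting is listed among this project's techniques, but the client above does not implement one. A minimal token-bucket sketch follows; the clock is injected so the logic is deterministic and testable, and the rate and capacity values are illustrative.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at
    `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In front of `ProductionLLMClient.complete`, each request would first check `bucket.allow()` and be queued or rejected with a 429-style error when it returns False.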

Monitoring and Observability

Metrics Collection

from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List
import json

@dataclass
class LLMMetrics:
    """Track LLM usage metrics for monitoring."""

    requests_total: int = 0
    requests_success: int = 0
    requests_failed: int = 0
    total_tokens: int = 0
    total_cost: float = 0.0
    latencies: List[float] = field(default_factory=list)
    errors_by_type: Dict[str, int] = field(default_factory=dict)

    def record_request(self, success: bool, latency: float,
                       tokens: int, cost: float, error_type: str = None):
        self.requests_total += 1
        if success:
            self.requests_success += 1
        else:
            self.requests_failed += 1
            self.errors_by_type[error_type] = self.errors_by_type.get(error_type, 0) + 1

        self.latencies.append(latency)
        self.total_tokens += tokens
        self.total_cost += cost

    def get_summary(self) -> dict:
        return {
            "success_rate": self.requests_success / max(self.requests_total, 1),
            "avg_latency": sum(self.latencies) / max(len(self.latencies), 1),
            "p95_latency": sorted(self.latencies)[int(len(self.latencies) * 0.95)]
                          if self.latencies else 0,
            "total_cost": self.total_cost,
            "error_breakdown": self.errors_by_type
        }

    def export_prometheus(self) -> str:
        """Export metrics in Prometheus format."""
        return f"""
# HELP llm_requests_total Total LLM requests
# TYPE llm_requests_total counter
llm_requests_total {self.requests_total}

# HELP llm_requests_success_total Successful LLM requests
# TYPE llm_requests_success_total counter
llm_requests_success_total {self.requests_success}

# HELP llm_latency_seconds LLM request latency
# TYPE llm_latency_seconds histogram
llm_latency_seconds_sum {sum(self.latencies)}
llm_latency_seconds_count {len(self.latencies)}

# HELP llm_cost_dollars Total LLM cost
# TYPE llm_cost_dollars counter
llm_cost_dollars {self.total_cost}
"""

Production Checklist

Deployment Checklist

Final Exercises

Exercise 1: Document Processor Extension (Difficulty: Medium)

Task: Extend the document analysis system to handle:

Exercise 2: Vision Application (Difficulty: Medium)

Task: Build a receipt scanner that:

Exercise 3: Caching Implementation (Difficulty: Medium)

Task: Implement a caching layer for a chatbot that:

Exercise 4: Production API (Difficulty: Advanced)

Task: Build a production-ready API endpoint that:

Exercise 5: Complete Project (Difficulty: Advanced)

Task: Combine all techniques to build a "Smart Meeting Notes" application:

Series Summary

Key Points from This Series

Continuing Your Journey

Congratulations on completing the Introduction to Prompt Engineering series! Here are recommended next steps:

