Introduction
Building on the basics from Chapters 1 and 2, this chapter covers advanced prompting techniques that significantly improve LLM performance on complex reasoning tasks. These techniques represent the cutting edge of prompt engineering as of 2025-2026.
By the end of this chapter, you will be able to:
- Implement Tree of Thought (ToT) for complex problem-solving
- Use Self-Consistency to improve answer reliability
- Apply the ReAct pattern for reasoning with actions
- Craft effective prompts for reasoning models (o1, o3, DeepSeek-R1)
- Design multi-turn conversation flows
- Build prompt chains for complex workflows
Tree of Thought (ToT)
Tree of Thought prompting encourages the LLM to explore multiple reasoning paths simultaneously, evaluate each path, and select the most promising direction. This is particularly effective for problems requiring planning, search, or creative exploration.
How Tree of Thought Works
At each step, the model proposes several candidate "thoughts" (partial solutions or next moves), evaluates each one, and expands only the most promising candidates. Conceptually, this turns generation into a tree search with branching and pruning, rather than a single linear chain of thought.
Basic ToT Prompt
Tree of Thought Example
Solve this problem using Tree of Thought reasoning.
**Problem**: A farmer needs to cross a river with a wolf, a goat, and a cabbage.
The boat can only carry the farmer and one item. If left alone:
- The wolf will eat the goat
- The goat will eat the cabbage
**Instructions**:
1. Generate 3 different initial moves
2. Evaluate each move (safe/unsafe and why)
3. Select the best move and continue
4. Repeat until solved
**Format for each step**:
```
Move Option [N]:
- Action: [what to take across]
- State after move: [who/what is where]
- Evaluation: [Safe/Unsafe] - [reason]
- Score: [1-10]
```
Begin your tree exploration:
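The same generate–evaluate–select loop can also be driven programmatically as a beam search. A minimal sketch, where the `expand` and `score` functions are toy stand-ins (in practice, both would be LLM calls: one to propose next thoughts, one to rate them):

```python
# Minimal Tree of Thought skeleton: at each depth, expand every state in the
# beam, score the candidates, and keep only the best few. The toy task here
# is reaching a target number from a start value via +1 or *2 moves.

def expand(state):
    """Propose candidate next states (stand-in for an LLM 'propose' call)."""
    return [state + 1, state * 2]

def score(state, target):
    """Higher is better (stand-in for an LLM 'evaluate' call)."""
    return -abs(target - state)

def tree_of_thought(start, target, beam_width=2, max_depth=6):
    beam = [start]
    for _ in range(max_depth):
        if target in beam:
            return target
        # Generate all candidates, then prune down to the top `beam_width`
        candidates = [s for state in beam for s in expand(state)]
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=lambda s: score(s, target))

print(tree_of_thought(1, 10))  # → 10
```

The beam width controls the tradeoff between exploration cost and solution quality; a width of 1 degenerates back into ordinary chain-of-thought.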
ToT for Creative Tasks
Creative Tree of Thought
Use Tree of Thought to generate a company name for an AI-powered legal assistant startup.
**Step 1: Generate 3 thematic directions**
- Direction A: [Professional/Traditional]
- Direction B: [Tech-forward/Modern]
- Direction C: [Approachable/Friendly]
**Step 2: For each direction, generate 3 name candidates**
Evaluate each on:
- Memorability (1-10)
- Domain availability likelihood (1-10)
- Legal industry fit (1-10)
- Global appeal (1-10)
**Step 3: Select top 3 names across all directions**
Provide final ranking with justification.
**Step 4: For the top choice, generate:**
- Tagline options (3)
- Visual/logo concepts (brief descriptions)
- Potential concerns or trademark issues
When to Use Tree of Thought
| Good for ToT | Not ideal for ToT |
|---|---|
| Strategic planning problems | Simple factual questions |
| Creative brainstorming | Straightforward calculations |
| Multi-step puzzles | Yes/no decisions |
| Decision-making with tradeoffs | Time-sensitive responses |
Self-Consistency
Self-Consistency improves reliability by generating multiple independent reasoning paths and selecting the most common answer. This reduces the impact of individual reasoning errors.
Basic Self-Consistency
Self-Consistency Prompt
Solve this problem 5 times using different reasoning approaches.
Then report the most consistent answer.
**Problem**: A store offers a 20% discount on all items. An additional 10%
is taken off at checkout. What is the total percentage saved on a $100 item?
**Approach 1**: Sequential percentage calculation
[Your reasoning...]
Answer 1: ___
**Approach 2**: Work backwards from final price
[Your reasoning...]
Answer 2: ___
**Approach 3**: Use algebraic formula
[Your reasoning...]
Answer 3: ___
**Approach 4**: Concrete example with different starting price
[Your reasoning...]
Answer 4: ___
**Approach 5**: Verify using a different method
[Your reasoning...]
Answer 5: ___
**Final Answer**: [Most common answer]
**Confidence**: [Based on consistency]
**Note any discrepancies**: [If answers differ, explain why]
Self-Consistency via API
For programmatic self-consistency, sample multiple responses with temperature > 0:
Python Implementation
```python
from openai import OpenAI
from collections import Counter

client = OpenAI()

def self_consistent_answer(prompt, n_samples=5, temperature=0.7):
    """Generate multiple answers and return the most common one."""
    responses = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=500,
        )
        # Extract the final answer (assume it's on the last line)
        answer = response.choices[0].message.content.strip().split('\n')[-1]
        responses.append(answer)

    # Count occurrences and return the most common
    answer_counts = Counter(responses)
    most_common = answer_counts.most_common(1)[0]
    return {
        "answer": most_common[0],
        "confidence": most_common[1] / n_samples,
        "all_responses": responses,
    }

# Usage
result = self_consistent_answer(
    "What is 15% of 80? Show your work, then state just the number on the last line."
)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}")
```
ReAct Pattern
ReAct (Reasoning + Acting) is a prompting pattern where the LLM alternates between reasoning about a problem and taking actions (like searching for information or using tools). This is foundational for AI agents.
ReAct Structure
Each cycle has three parts: a Thought (reasoning about what to do next), an Action (a tool call such as a search or a calculation), and an Observation (the result of that action, fed back into the context). The loop repeats until the model has enough information to produce a final answer.
ReAct Prompt Template
ReAct Pattern
Answer the following question using the ReAct format.
You have access to these actions:
- Search[query]: Search Wikipedia for information
- Calculate[expression]: Perform mathematical calculations
- Lookup[term]: Look up a specific term in the current context
**Question**: What is the population density of the country that hosted the 2024 Olympics?
**Format**:
Thought 1: [Your reasoning about what to do first]
Action 1: [Action name][input]
Observation 1: [What you learned - I'll provide this]
Thought 2: [Your next reasoning step]
Action 2: [Next action]
Observation 2: [Result]
... continue until you have the answer ...
Final Answer: [Your answer based on the observations]
Begin:
ReAct Example Execution
ReAct in Practice
Question: Who won more Grand Slam titles - Roger Federer or Rafael Nadal?
And by how many?
Thought 1: I need to find out how many Grand Slam titles each player has won.
Let me start with Roger Federer.
Action 1: Search[Roger Federer Grand Slam titles]
Observation 1: Roger Federer won 20 Grand Slam singles titles.
Thought 2: Now I need to find Nadal's Grand Slam count.
Action 2: Search[Rafael Nadal Grand Slam titles]
Observation 2: Rafael Nadal won 22 Grand Slam singles titles.
Thought 3: Now I can compare and calculate the difference.
Action 3: Calculate[22 - 20]
Observation 3: 2
Thought 4: I have all the information needed to answer the question.
Final Answer: Rafael Nadal won more Grand Slam titles than Roger Federer.
Nadal won 22 titles compared to Federer's 20, a difference of 2 titles.
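In an application, a harness drives this loop: it parses the model's `Action` line, executes the tool, and appends the result as an `Observation`. A minimal sketch, with `fake_llm` and a single `Calculate` tool as stand-ins for a real model and toolset:

```python
import re

# Minimal ReAct driver loop: the model emits "Action N: Name[input]" lines,
# the harness executes the named tool and feeds the result back as an
# observation, until the model emits a "Final Answer:" line.

def react_loop(llm, tools, question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # model produces Thought + Action
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.+)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action \d*:?\s*(\w+)\[(.*?)\]", step)
        if action:
            name, arg = action.groups()
            transcript += f"Observation: {tools[name](arg)}\n"
    return None

# Toy stand-ins to show the control flow
def fake_llm(transcript):
    if "Observation" not in transcript:
        return "Thought 1: Need the value.\nAction 1: Calculate[22 - 20]"
    return "Thought 2: Done.\nFinal Answer: 2"

tools = {"Calculate": lambda expr: str(eval(expr, {"__builtins__": {}}))}
print(react_loop(fake_llm, tools, "22 - 20?"))  # → 2
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop (and bill) indefinitely.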
ReAct for Code Debugging
ReAct Debugging Pattern
Debug this code using the ReAct pattern.
**Available actions**:
- ReadCode[section]: Examine a specific part of the code
- Trace[input]: Trace execution with a specific input
- Hypothesis[bug]: State a hypothesis about the bug
- Fix[change]: Propose a specific fix
**Code**:
```python
def find_duplicates(lst):
    seen = set()
    duplicates = []
    for item in lst:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return duplicates
```
**Bug report**: Function crashes with "AttributeError: 'list' object has no attribute 'add'"
Begin ReAct debugging:
Prompting for Reasoning Models (o1/o3)
OpenAI's o1 and o3 models (and similar reasoning models like DeepSeek-R1) use internal "thinking" tokens before responding. These models require different prompting strategies than standard LLMs.
Key Differences from Standard LLMs
| Standard LLMs (GPT-4, Claude) | Reasoning Models (o1, o3, DeepSeek-R1) |
|---|---|
| Benefit from detailed instructions | Prefer concise, clear objectives |
| Need explicit CoT prompting | Perform CoT internally (automatic) |
| Few-shot examples improve output | Few-shot can actually hurt performance |
| Temperature affects creativity | Temperature is typically fixed at 1 |
| System prompts guide behavior | System prompts have limited effect |
Prompting Best Practices for o1/o3
Guidelines for Reasoning Models
- Be concise: State the problem clearly without over-explaining
- Don't say "think step by step": They do this automatically
- Avoid few-shot examples: These can constrain their reasoning
- Use delimiters: Clearly separate different parts of the input
- Specify output format at the end: "Provide your final answer as..."
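These guidelines translate directly into how the request is built. A sketch of constructing such a request, with the caveats that the model name "o1" and its parameter support are assumptions to check against your provider's current docs (reasoning models at launch rejected `temperature` and used `max_completion_tokens` rather than `max_tokens`):

```python
# Build a request that follows the reasoning-model guidelines above:
# concise problem, a delimiter, the output format last, no system prompt,
# no "think step by step", no few-shot examples, no temperature.

def build_reasoning_request(problem, output_format):
    prompt = (
        f"{problem}\n\n"
        "---\n"  # delimiter separating the problem from the format spec
        f"Provide your final answer as {output_format}."
    )
    return {
        "model": "o1",
        "messages": [{"role": "user", "content": prompt}],  # user message only
        "max_completion_tokens": 4000,
    }

request = build_reasoning_request(
    "Probability that a random 5-card hand contains exactly one pair?",
    "a simplified fraction",
)
print(request["messages"][0]["content"])
```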
Bad Prompt for o1/o3
You are an expert mathematician. Please think step by step and show all your work.
Here's an example of how to solve a similar problem:
[Long example...]
Now solve this problem carefully, making sure to consider all cases:
What is the probability that a random 5-card poker hand contains exactly one pair?
Think about this systematically:
1. First calculate total combinations
2. Then calculate favorable outcomes
3. Finally divide to get probability
Problems: System prompt, CoT instruction, few-shot example, over-specification
Good Prompt for o1/o3
Calculate the probability that a random 5-card poker hand contains exactly one pair
(two cards of one rank, three cards of different ranks).
Provide your final answer as a simplified fraction and decimal (4 decimal places).
Why it works: Concise problem statement, clear output format, lets model reason freely
When to Use Reasoning Models
Ideal Use Cases for o1/o3
- Complex Math: Multi-step proofs, olympiad problems
- Code Generation: Complex algorithms, system design
- Scientific Reasoning: Research analysis, hypothesis evaluation
- Strategic Planning: Multi-constraint optimization
- Debugging: Root cause analysis in complex systems
Suboptimal Use Cases
- Simple Q&A (overkill, slow, expensive)
- Creative writing (not their strength)
- Tasks requiring specific formatting/style
- Real-time applications (high latency)
Effective o1/o3 Prompt Examples
Coding Problem
Implement a function that finds the longest increasing subsequence in an array.
Requirements:
- Time complexity: O(n log n)
- Return both the length and the actual subsequence
- Handle edge cases: empty array, single element, all same values
Language: Python
Output: Complete function with docstring
Analysis Problem
A company's revenue data:
Q1: $1.2M, Q2: $1.5M, Q3: $1.1M, Q4: $1.8M
Marketing spend: Q1: $200K, Q2: $300K, Q3: $150K, Q4: $400K
Customer acquisition: Q1: 500, Q2: 700, Q3: 400, Q4: 900
Analyze the relationship between marketing spend and revenue.
Identify any anomalies and suggest optimal marketing allocation.
Output: Key findings (3-5 bullet points) and recommendation.
Multi-Turn Conversation Design
Designing effective multi-turn conversations involves planning how context builds across messages and managing conversation state.
Conversation Flow Patterns
1. Funnel Pattern
Start broad, progressively narrow focus:
Funnel Conversation
**Turn 1 (User)**: I need help with my Python project
**Turn 1 (AI)**: I'd be happy to help! Could you tell me:
- What type of project (web app, data analysis, automation)?
- What specific challenge are you facing?
**Turn 2 (User)**: It's a web scraping project. I'm having trouble with dynamic content.
**Turn 2 (AI)**: Dynamic content usually requires JavaScript rendering.
- What website are you scraping?
- Have you tried Selenium or Playwright?
- What's your current approach?
**Turn 3 (User)**: Using BeautifulSoup. The content loads via AJAX.
**Turn 3 (AI)**: BeautifulSoup can't handle JavaScript. Here are your options:
[Specific solutions for AJAX content...]
2. Iterative Refinement Pattern
Build and improve across turns:
Iterative Refinement
**Turn 1**: Write a function to validate email addresses
**Turn 1 Response**: [Basic regex validation]
**Turn 2**: Add support for plus addressing (user+tag@domain.com)
**Turn 2 Response**: [Updated function]
**Turn 3**: Also check if the domain has valid MX records
**Turn 3 Response**: [Added DNS lookup]
**Turn 4**: Make it async and add rate limiting
**Turn 4 Response**: [Final production-ready version]
Context Management
Explicit Context Summarization
**Before a complex multi-turn task**:
I'll be helping you build a REST API. Let's establish our context:
**Project Summary** (I'll update this as we go):
- Framework: [To be determined]
- Database: [To be determined]
- Auth method: [To be determined]
- Endpoints: [To be listed]
After each major decision, I'll update this summary.
Please confirm or correct if I misunderstand anything.
What framework would you like to use?
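Programmatically, explicit context management usually means carrying a message list and compacting older turns into a summary once the history grows. A minimal sketch, where `summarize` is a stand-in for an LLM call ("Summarize the conversation so far..."):

```python
# Rolling-summary context manager: once the history exceeds `max_turns`
# messages, older messages collapse into a single summary message that is
# kept at the front of the context.

def summarize(messages):
    """Stand-in for an LLM summarization call."""
    return "Summary of earlier turns: " + "; ".join(m["content"] for m in messages)

def compact_history(messages, max_turns=4):
    if len(messages) <= max_turns:
        return messages
    old, recent = messages[:-max_turns], messages[-max_turns:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
compacted = compact_history(history)
print(len(compacted))  # → 5 (one summary + four recent turns)
```

Summarizing loses detail, so keep decisions the user has confirmed (framework, database, auth method) in the summary verbatim rather than paraphrased.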
Prompt Chains
Prompt chains connect multiple prompts where the output of one becomes the input to another. This enables complex workflows that exceed single-prompt capabilities.
Sequential Chain
Document Analysis Chain
```python
def analyze_document(document):
    # `call_llm` is assumed to be your wrapper around an LLM API call
    # that takes a prompt string and returns the response text.

    # Step 1: Extract key information
    extraction_prompt = f"""
Extract from this document:
- Main topics (list)
- Key entities (people, organizations, places)
- Dates and numbers mentioned
- Action items if any

Document:
{document}

Return as JSON.
"""
    extracted = call_llm(extraction_prompt)

    # Step 2: Analyze sentiment and tone
    analysis_prompt = f"""
Based on this extracted information:
{extracted}

Analyze:
- Overall sentiment (positive/negative/neutral)
- Tone (formal/informal/urgent/routine)
- Key concerns or priorities
- Implicit assumptions

Return as JSON.
"""
    analysis = call_llm(analysis_prompt)

    # Step 3: Generate executive summary
    summary_prompt = f"""
Create an executive summary combining:
Extracted Info: {extracted}
Analysis: {analysis}

Format:
## Executive Summary
[2-3 sentence overview]

## Key Points
- [Bullet points]

## Recommended Actions
1. [Numbered list]
"""
    return call_llm(summary_prompt)
```
Branching Chain
Conditional Branching
```python
def handle_customer_inquiry(inquiry):
    # Step 1: Classify the inquiry
    classification_prompt = f"""
Classify this customer inquiry into ONE category:
- BILLING: Payment, invoices, refunds
- TECHNICAL: Product issues, bugs, how-to
- SALES: Pricing, features, upgrades
- COMPLAINT: Negative feedback, escalations

Inquiry: {inquiry}

Return only the category name.
"""
    category = call_llm(classification_prompt).strip()

    # Step 2: Branch based on classification
    if category == "BILLING":
        response = handle_billing(inquiry)
    elif category == "TECHNICAL":
        response = handle_technical(inquiry)
    elif category == "SALES":
        response = handle_sales(inquiry)
    else:  # COMPLAINT
        response = handle_complaint(inquiry)

    # Step 3: Format final response
    formatting_prompt = f"""
Polish this customer service response:
{response}

Ensure:
- Professional but warm tone
- Clear next steps
- Proper greeting and closing
"""
    return call_llm(formatting_prompt)
```
Exercises
Exercise 1: Tree of Thought (Difficulty: Medium)
Task: Use Tree of Thought to plan a 3-day trip to Tokyo with these constraints:
- Budget: $500 total (excluding flights)
- Must visit: Shibuya, Asakusa, and one museum
- Dietary restriction: Vegetarian
Generate 3 different itinerary approaches, evaluate each, and select the best.
Exercise 2: Self-Consistency (Difficulty: Medium)
Task: Design a self-consistency prompt to determine:
"Is it ethical for companies to use AI to screen job applicants?"
Include 5 different ethical framework perspectives (utilitarian, deontological, etc.)
Exercise 3: ReAct Pattern (Difficulty: Medium)
Task: Create a ReAct prompt for a "Research Assistant" that can:
- Search for academic papers
- Summarize findings
- Compare methodologies
- Identify research gaps
Test it with the topic: "Latest advances in protein folding prediction"
Exercise 4: Reasoning Model Prompt (Difficulty: Advanced)
Task: Rewrite this prompt for o1/o3:
You are an expert software architect. Please think carefully
step by step about this problem. Here's an example of good
architecture... [long example]
Design a scalable microservices architecture for an e-commerce
platform handling 10M daily users.
Exercise 5: Prompt Chain (Difficulty: Advanced)
Task: Design a 4-step prompt chain for "Code Review Automation":
- Parse and understand code structure
- Identify potential issues (bugs, security, performance)
- Prioritize issues by severity
- Generate actionable review comments
Chapter Summary
Key Points
- Tree of Thought: Explore multiple reasoning paths, evaluate, and select the best for complex problems
- Self-Consistency: Generate multiple answers and select the most common for improved reliability
- ReAct: Alternate between reasoning and actions; foundational for AI agents
- Reasoning Models (o1/o3): Use concise prompts, avoid CoT instructions and few-shot examples
- Multi-turn Design: Plan conversation flows, manage context explicitly
- Prompt Chains: Connect multiple prompts for complex workflows that exceed single-prompt capabilities
Next Steps
In Chapter 4, we'll explore Function Calling and Tool Use:
- Function Calling fundamentals
- Tool definition and schema design
- MCP (Model Context Protocol) integration
- Error handling for tool use
- Multi-tool orchestration
References
- Yao, S., et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Wang, X., et al. (2023). Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models
- OpenAI. (2024). o1 System Card and Prompting Guide
- DeepSeek. (2025). DeepSeek-R1 Technical Report
Update History
- 2026-01-12: v2.0 Initial release with o1/o3 prompting strategies