Claude Cookbooks Study, Part 7 of 8

Claude Cookbooks (7): Extended Thinking

Introduction

"Extended Thinking" is one of those features I was skeptical about until I started using it on problems that actually deserve multi-step reasoning: tricky debugging, tradeoff-heavy decisions, multi-stage calculations, and anything where I want the model to slow down.

That said, I don't think it's a "turn it on everywhere" setting. It costs tokens, it can add latency, and (depending on your product) you may not want to show internal reasoning to end users. I treat it like a tool: useful when the task demands it, noisy when it doesn't.

Location: extended_thinking/


1. Basic Extended Thinking

Location: extended_thinking/extended_thinking.ipynb

Enabling Thinking

This is the minimal shape: set thinking with a token budget and then parse the mixed response blocks. The structure matters because you'll often want to log the thinking internally even if you only show the final answer to a user.

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Max tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this step by step: If a train travels at 60 mph for 2.5 hours, then at 80 mph for 1.5 hours, what is the average speed for the entire journey?"
    }]
)

# Response contains both thinking and text blocks
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
        print()
    elif block.type == "text":
        print("=== ANSWER ===")
        print(block.text)
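
As a quick sanity check on what the thinking should arrive at, the train problem can be verified locally. This `average_speed` helper is my own sketch, not part of the cookbook:

```python
def average_speed(segments):
    """segments: list of (speed_mph, hours) tuples."""
    total_distance = sum(speed * hours for speed, hours in segments)  # miles
    total_time = sum(hours for _, hours in segments)                  # hours
    return total_distance / total_time

# 60 mph for 2.5 h plus 80 mph for 1.5 h -> 270 miles over 4 hours
print(average_speed([(60, 2.5), (80, 1.5)]))  # 67.5
```

Note the answer is a time-weighted average (67.5 mph), not the naive mean of 60 and 80; this is exactly the kind of trap where visible step-by-step reasoning is reassuring.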

When to Use Extended Thinking

I've found this table to be basically right, with one extra note: if your prompt already has a tight structure and the task is routine, thinking often doesn't help much.

| Use Case | Benefit |
|---|---|
| Math problems | Step-by-step calculation |
| Logic puzzles | Systematic reasoning |
| Code analysis | Thorough examination |
| Complex decisions | Weighing multiple factors |
| Debugging | Methodical investigation |

2. Thinking with Tool Use

Location: extended_thinking/extended_thinking_with_tool_use.ipynb

This is where Extended Thinking gets more interesting for me: it improves tool sequencing, especially on "I need to compute X, then convert units, then compute Y" style tasks.

One thing I'm not fully sure about (and it probably depends on your setup) is how much of the improvement comes from thinking vs. simply having tools available. But in practice, enabling thinking tends to reduce the weird "tool thrash" where the model calls tools in an unhelpful order.

tools = [
    {
        "name": "calculator",
        "description": "Perform mathematical calculations",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression to evaluate"}
            },
            "required": ["expression"]
        }
    },
    {
        "name": "unit_converter",
        "description": "Convert between units",
        "input_schema": {
            "type": "object",
            "properties": {
                "value": {"type": "number"},
                "from_unit": {"type": "string"},
                "to_unit": {"type": "string"}
            },
            "required": ["value", "from_unit", "to_unit"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    tools=tools,
    messages=[{
        "role": "user",
        "content": "A car travels 150 kilometers in 2 hours. What is its speed in miles per hour?"
    }]
)

# Claude will think about:
# 1. What information is given
# 2. What needs to be calculated
# 3. Which tools to use and in what order
# Then execute the tools appropriately
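
The notebook handles the tool-execution loop; for completeness, a minimal local dispatcher matching the two schemas above might look like this (`run_tool` and the conversion table are my own sketch, not the cookbook's code):

```python
def run_tool(name, tool_input):
    """Execute a tool_use request against local implementations."""
    if name == "calculator":
        # eval is fine for a demo; use a real expression parser in production
        return str(eval(tool_input["expression"], {"__builtins__": {}}))
    if name == "unit_converter":
        factors = {("km", "miles"): 0.621371, ("miles", "km"): 1.609344}
        key = (tool_input["from_unit"], tool_input["to_unit"])
        return str(tool_input["value"] * factors[key])
    raise ValueError(f"unknown tool: {name}")
```

In a real loop you'd feed each result back as a `tool_result` content block and call the API again until the model stops requesting tools.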

Complex Multi-step Problems

When I'm deciding whether to spend a bigger thinking budget, I look for "nested" tasks like this—multiple steps, unit conversions, and money math at the end.

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    tools=tools,
    messages=[{
        "role": "user",
        "content": """A cylindrical water tank has a radius of 3 meters and height of 5 meters.
        
1. Calculate the volume in cubic meters
2. Convert to gallons
3. If water costs $0.003 per gallon, how much to fill the tank?"""
    }]
)
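
It helps to know the expected numbers before reading the model's output. Computing them directly (using the standard factor of 264.172 US gallons per cubic meter):

```python
import math

# Expected answers for the tank problem, computed locally
volume_m3 = math.pi * 3**2 * 5       # pi * r^2 * h, ~141.37 m^3
gallons = volume_m3 * 264.172        # ~37,346 gallons
cost = gallons * 0.003               # ~$112.04
print(round(volume_m3, 2), round(gallons), round(cost, 2))
```

If the model's tool calls land far from these values, something went wrong in the sequencing, which is precisely what the thinking trace lets you inspect.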

3. Structured Problem Solving

Analysis Framework

This helper function is a pattern I keep reusing: give the model a framework and force it to walk through it. Even without Extended Thinking, structured prompts help; with thinking enabled, the model tends to adhere to the structure more consistently.

def analyze_with_thinking(problem: str, framework: str = "general") -> dict:
    frameworks = {
        "general": """Analyze this problem using:
1. Understanding: What is being asked?
2. Given: What information do we have?
3. Approach: What method should we use?
4. Solution: Step-by-step work
5. Verification: Check the answer""",
        
        "debugging": """Debug this issue using:
1. Symptoms: What is the observed behavior?
2. Expected: What should happen?
3. Hypotheses: Possible causes
4. Investigation: Check each hypothesis
5. Root Cause: Identify the actual issue
6. Fix: Proposed solution""",
        
        "decision": """Analyze this decision using:
1. Options: What are the choices?
2. Criteria: What factors matter?
3. Analysis: Evaluate each option
4. Tradeoffs: Pros and cons
5. Recommendation: Best choice and why"""
    }
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        messages=[{
            "role": "user",
            "content": f"{frameworks[framework]}\n\nProblem:\n{problem}"
        }]
    )
    
    thinking = ""
    answer = ""
    for block in response.content:
        if block.type == "thinking":
            thinking = block.thinking
        elif block.type == "text":
            answer = block.text
    
    return {"thinking": thinking, "answer": answer}

Code Review with Thinking

I like this pattern, but I'm cautious about it: it's easy to get false confidence from a "thorough-sounding" review. I treat it as a starting point, not a substitute for tests or careful reading.

def review_code(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{
            "role": "user",
            "content": f"""Review this code for:
1. Correctness - Does it work as intended?
2. Bugs - Any potential issues?
3. Performance - Can it be optimized?
4. Security - Any vulnerabilities?
5. Style - Does it follow best practices?

Code:
```
{code}
```

Provide specific line references and suggestions."""
        }]
    )
    
    result = {"thinking": "", "review": ""}
    for block in response.content:
        if block.type == "thinking":
            result["thinking"] = block.thinking
        elif block.type == "text":
            result["review"] = block.text
    
    return result

4. Thinking Budget Strategies

Adaptive Budgets

I've experimented with "auto budgets" like this and I still don't know if it's always worth the extra classifier call. But when costs matter, it can be a reasonable compromise: most prompts don't need 10k thinking tokens, and a small classifier pass can prevent waste.

def query_with_adaptive_thinking(question: str, complexity: str = "auto") -> str:
    budgets = {
        "simple": 2000,
        "medium": 5000,
        "complex": 10000,
        "very_complex": 15000
    }
    
    if complexity == "auto":
        # Estimate complexity based on question
        classifier = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=50,
            messages=[{
                "role": "user",
                "content": f"Rate the reasoning complexity (simple/medium/complex/very_complex): {question}"
            }]
        )
        complexity = classifier.content[0].text.strip().lower()
        if complexity not in budgets:
            complexity = "medium"
    
    budget = budgets.get(complexity, 5000)
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": question}]
    )
    
    return next((b.text for b in response.content if b.type == "text"), "")
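
If the extra classifier call feels too expensive, a zero-cost heuristic can do the routing instead. This is my own sketch: the signal words and thresholds are guesses you'd want to tune against your traffic:

```python
def estimate_budget(question: str) -> int:
    """Pick a thinking budget from rough signals in the question itself."""
    signals = ("prove", "debug", "optimize", "trade-off", "multi-step")
    score = len(question) // 200 + sum(kw in question.lower() for kw in signals)
    if score == 0:
        return 2000
    if score == 1:
        return 5000
    if score == 2:
        return 10000
    return 15000
```

It will misjudge some prompts, but it never adds latency or cost, which is often the right trade for high-volume endpoints.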

5. Thinking Visibility Control

When to Show Thinking

This is a subtle product decision. For internal tooling, I often want thinking for debugging. For end-user apps, I usually default to hiding it unless there's a clear value and a clear policy.

def query_with_thinking_control(question: str, show_thinking: bool = False) -> str | dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 5000},
        messages=[{"role": "user", "content": question}]
    )
    
    thinking = ""
    answer = ""
    for block in response.content:
        if block.type == "thinking":
            thinking = block.thinking
        elif block.type == "text":
            answer = block.text
    
    if show_thinking:
        return {"thinking": thinking, "answer": answer}
    return answer
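
The same block-parsing loop appears in every example above, so in practice I factor it into a helper. It only assumes what the examples already rely on: content blocks with a `type` plus a `thinking` or `text` attribute:

```python
def split_blocks(content):
    """Separate a response's content into (thinking, answer) strings."""
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
    return "\n".join(thinking), "\n".join(answer)
```

With this in place, internal logging of thinking and user-facing display of the answer become two call sites on the same parse.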

Summary

| Concept | Description |
|---|---|
| Basic Thinking | Enable reasoning transparency with the thinking parameter |
| Budget Control | Allocate tokens for the thinking process |
| Tool Integration | Improved tool selection with reasoning |
| Structured Analysis | Framework-based problem solving |
| Adaptive Budgets | Match complexity to thinking allocation |

In the final post, I'll pull together a handful of "advanced" topics that feel like the real work of shipping: tuning, document generation, observability, tool evaluation, and even prompt patterns for UI output.