NJI

Introduction

Tool use is the point where Claude stops feeling like a demo and starts feeling like a component you can ship. It's also the point where little implementation details matter: input schemas, retries, timeouts, and how you prevent the model from getting stuck in a tool loop.

The cookbooks cover the mechanics well. What I'm adding here is the stuff I've tripped over: keep tool surfaces small, treat tool results as untrusted, and always have an escape hatch.

1. Basic Tool Use

Tool Use Flow

💬

User Request

"What's the weather?"

🤔

Claude Reasons

Need to call weather API

🛠️

Tool Call

get_weather(location="SF")

⚡

Execute

API returns data

📊

Tool Result

"22°C, Sunny"

💭

Claude Responds

"It's sunny and 22°C"

Location: tool_use/

Defining Tools

from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g., 'San Francisco, CA'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

I've had the best luck when schemas are strict and boring. If you allow "any string," you'll eventually get tool inputs that are half-instructions, half-data.

Processing Tool Calls

def process_tool_call(tool_name: str, tool_input: dict) -> str:
    if tool_name == "get_weather":
        # Actual implementation would call a weather API
        return f"Weather in {tool_input['location']}: 22°C, Sunny"
    return "Unknown tool"

def chat_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        tools=tools,
        messages=messages
    )
    
    while response.stop_reason == "tool_use":
        tool_use_block = next(
            block for block in response.content 
            if block.type == "tool_use"
        )
        
        tool_result = process_tool_call(
            tool_use_block.name,
            tool_use_block.input
        )
        
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use_block.id,
                "content": tool_result
            }]
        })
        
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            tools=tools,
            messages=messages
        )
    
    return response.content[0].text

In production, I usually add:

a max tool-call count (to avoid infinite loops),
structured tool errors (so the model can recover),
timeouts and circuit breakers (because external systems fail).

2. Parallel Tool Calls

Location: tool_use/parallel_tools.ipynb

Parallel tool calls are great when the model needs multiple independent facts (weather + calendar + database query). The key detail is: you have to return results in a way that preserves which result maps to which tool call.

def process_parallel_tools(response) -> list[dict]:
    tool_results = []
    
    for block in response.content:
        if block.type == "tool_use":
            result = process_tool_call(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })
    
    return tool_results

def chat_with_parallel_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        tools=tools,
        messages=messages
    )
    
    while response.stop_reason == "tool_use":
        tool_results = process_parallel_tools(response)
        
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            tools=tools,
            messages=messages
        )
    
    return response.content[0].text

I've found this works best when the tools are genuinely independent; otherwise you get "parallel" calls that really wanted sequencing (and the model doesn't know your dependencies unless you teach it).

3. Pydantic Integration

Location: tool_use/tool_use_with_pydantic.ipynb

I like Pydantic here because it keeps tool schemas honest and makes it harder to accidentally drift between "what the model sends" and "what your code expects."

from pydantic import BaseModel, Field
from typing import Literal

class WeatherQuery(BaseModel):
    location: str = Field(description="City name, e.g., 'San Francisco, CA'")
    unit: Literal["celsius", "fahrenheit"] = Field(default="celsius")

class SearchQuery(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5, ge=1, le=20)

def pydantic_to_tool(model: type[BaseModel], name: str, description: str) -> dict:
    return {
        "name": name,
        "description": description,
        "input_schema": model.model_json_schema()
    }

tools = [
    pydantic_to_tool(WeatherQuery, "get_weather", "Get weather for a location"),
    pydantic_to_tool(SearchQuery, "search", "Search the web"),
]

4. Structured JSON Extraction

Location: tool_use/extracting_structured_json.ipynb

If you only take one technique from tool use, make it this: define a schema and force the model to fill it. I've found this is more reliable than "please output valid JSON" prompts, especially when the content is messy.

class ExtractedEntity(BaseModel):
    name: str
    entity_type: Literal["person", "organization", "location"]
    confidence: float = Field(ge=0, le=1)

class ExtractionResult(BaseModel):
    entities: list[ExtractedEntity]
    summary: str

extraction_tool = {
    "name": "record_extraction",
    "description": "Record the extracted entities from the text",
    "input_schema": ExtractionResult.model_json_schema()
}

def extract_entities(text: str) -> ExtractionResult:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        tools=[extraction_tool],
        tool_choice={"type": "tool", "name": "record_extraction"},
        messages=[{
            "role": "user",
            "content": f"Extract all named entities from this text:\n\n{text}"
        }]
    )
    
    tool_input = next(
        block.input for block in response.content
        if block.type == "tool_use"
    )
    
    return ExtractionResult.model_validate(tool_input)

5. Memory Management

Location: tool_use/memory_cookbook.ipynb

I'm cautious about "memory" because it's as much a product/policy decision as it is a technical one. But as a pattern, giving the model a way to store and recall facts (with user-visible controls) can help a lot.

class MemoryStore:
    def __init__(self):
        self.memories = {}
    
    def save(self, key: str, value: str) -> str:
        self.memories[key] = value
        return f"Saved: {key}"
    
    def recall(self, key: str) -> str:
        return self.memories.get(key, "Not found")
    
    def list_keys(self) -> list[str]:
        return list(self.memories.keys())

memory = MemoryStore()

memory_tools = [
    {
        "name": "save_memory",
        "description": "Save information for later recall",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "description": "Memory key"},
                "value": {"type": "string", "description": "Information to remember"}
            },
            "required": ["key", "value"]
        }
    },
    {
        "name": "recall_memory",
        "description": "Recall previously saved information",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "description": "Memory key to recall"}
            },
            "required": ["key"]
        }
    }
]

In a real system, I don't store raw user text indefinitely. I usually store short, explicit "facts" with metadata (source, timestamp, user consent), and I make deletion easy.

6. Context Compaction

Context Compaction

Before: Long History

~48,000 tokens

Can you help me debug this function?

Of course! Please share the code and des...

Here's the function: def calculate_avera...

I see several issues with your function....

Thanks! I fixed the empty list issue. No...

That error suggests you might have mixed...

You're right! I had strings mixed in. Le...

Perfect! Here's a helper function to cle...

... 15 more exchanges ...

• Full conversation history

• All context preserved

• High token usage

COMPACT

After: Compacted

~12,000 tokens

📝 Previous Discussion Summary

User debugged a function with empty list handling and data type validation issues. Assistant provided solutions for error handling and data cleaning.

This is working much better now. Can you...

Absolutely! For performance optimization...

• Key points summarized

• Recent context preserved

• 75% token reduction

Location: tool_use/automatic-context-compaction.ipynb

Long-running chats hit token limits. The cookbook pattern here—summarize older turns, keep the recent window—is the most practical solution I've seen if you don't want to build a full memory/retrieval layer.

def count_tokens(messages: list[dict]) -> int:
    # Simplified estimation
    return sum(len(str(m)) // 4 for m in messages)

def compact_context(messages: list[dict], max_tokens: int = 50000) -> list[dict]:
    if count_tokens(messages) <= max_tokens:
        return messages
    
    # Keep system message and recent messages
    system_msg = messages[0] if messages[0].get("role") == "system" else None
    recent_count = 10
    recent = messages[-recent_count:]
    
    # Summarize older messages
    older = messages[1:-recent_count] if system_msg else messages[:-recent_count]
    
    if not older:
        return messages
    
    older_text = "\n".join([f"{m['role']}: {m['content']}" for m in older])
    
    summary = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation history concisely:\n\n{older_text}"
        }]
    ).content[0].text
    
    compacted = []
    if system_msg:
        compacted.append(system_msg)
    compacted.append({
        "role": "user",
        "content": f"[Previous conversation summary: {summary}]"
    })
    compacted.append({
        "role": "assistant",
        "content": "I understand. I have context from our previous discussion."
    })
    compacted.extend(recent)
    
    return compacted

Two practical notes from my side:

token estimation is usually worth doing with a real tokenizer (this is a rough heuristic),
summaries can quietly drop constraints ("don't do X"), so I often preserve critical rules as "pinned" system text.

Summary

Pattern	Use Case
Basic Tool Use	Extend Claude with external APIs
Parallel Tools	Execute multiple tools simultaneously
Pydantic Integration	Type-safe tool definitions
Structured Extraction	Force specific output formats
Memory Management	Persist information across turns
Context Compaction	Handle long conversations

Next I'll move into multimodal and vision. It's a different flavor of "tool use"—you're not calling an API, but you're still giving the model structured inputs and expecting structured outputs.