Introduction
Tool use is the point where Claude stops feeling like a demo and starts feeling like a component you can ship. It's also the point where little implementation details matter: input schemas, retries, timeouts, and how you prevent the model from getting stuck in a tool loop.
The cookbooks cover the mechanics well. What I'm adding here is the stuff I've tripped over: keep tool surfaces small, treat tool results as untrusted, and always have an escape hatch.
1. Basic Tool Use
Location: tool_use/
Defining Tools
from anthropic import Anthropic
client = Anthropic()
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g., 'San Francisco, CA'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
]
I've had the best luck when schemas are strict and boring. If you allow "any string," you'll eventually get tool inputs that are half-instructions, half-data.
Processing Tool Calls
def process_tool_call(tool_name: str, tool_input: dict) -> str:
if tool_name == "get_weather":
# Actual implementation would call a weather API
return f"Weather in {tool_input['location']}: 22°C, Sunny"
return "Unknown tool"
def chat_with_tools(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=tools,
messages=messages
)
while response.stop_reason == "tool_use":
tool_use_block = next(
block for block in response.content
if block.type == "tool_use"
)
tool_result = process_tool_call(
tool_use_block.name,
tool_use_block.input
)
messages.append({"role": "assistant", "content": response.content})
messages.append({
"role": "user",
"content": [{
"type": "tool_result",
"tool_use_id": tool_use_block.id,
"content": tool_result
}]
})
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=tools,
messages=messages
)
return response.content[0].text
In production, I usually add:
- a max tool-call count (to avoid infinite loops),
- structured tool errors (so the model can recover),
- timeouts and circuit breakers (because external systems fail).
2. Parallel Tool Calls
Location: tool_use/parallel_tools.ipynb
Parallel tool calls are great when the model needs multiple independent facts (weather + calendar + database query). The key detail is: you have to return results in a way that preserves which result maps to which tool call.
def process_parallel_tools(response) -> list[dict]:
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = process_tool_call(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result
})
return tool_results
def chat_with_parallel_tools(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=tools,
messages=messages
)
while response.stop_reason == "tool_use":
tool_results = process_parallel_tools(response)
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=tools,
messages=messages
)
return response.content[0].text
I've found this works best when the tools are genuinely independent; otherwise you get "parallel" calls that really wanted sequencing (and the model doesn't know your dependencies unless you teach it).
3. Pydantic Integration
Location: tool_use/tool_use_with_pydantic.ipynb
I like Pydantic here because it keeps tool schemas honest and makes it harder to accidentally drift between "what the model sends" and "what your code expects."
from pydantic import BaseModel, Field
from typing import Literal
class WeatherQuery(BaseModel):
location: str = Field(description="City name, e.g., 'San Francisco, CA'")
unit: Literal["celsius", "fahrenheit"] = Field(default="celsius")
class SearchQuery(BaseModel):
query: str = Field(description="Search query")
max_results: int = Field(default=5, ge=1, le=20)
def pydantic_to_tool(model: type[BaseModel], name: str, description: str) -> dict:
return {
"name": name,
"description": description,
"input_schema": model.model_json_schema()
}
tools = [
pydantic_to_tool(WeatherQuery, "get_weather", "Get weather for a location"),
pydantic_to_tool(SearchQuery, "search", "Search the web"),
]
4. Structured JSON Extraction
Location: tool_use/extracting_structured_json.ipynb
If you only take one technique from tool use, make it this: define a schema and force the model to fill it. I've found this is more reliable than "please output valid JSON" prompts, especially when the content is messy.
class ExtractedEntity(BaseModel):
name: str
entity_type: Literal["person", "organization", "location"]
confidence: float = Field(ge=0, le=1)
class ExtractionResult(BaseModel):
entities: list[ExtractedEntity]
summary: str
extraction_tool = {
"name": "record_extraction",
"description": "Record the extracted entities from the text",
"input_schema": ExtractionResult.model_json_schema()
}
def extract_entities(text: str) -> ExtractionResult:
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1000,
tools=[extraction_tool],
tool_choice={"type": "tool", "name": "record_extraction"},
messages=[{
"role": "user",
"content": f"Extract all named entities from this text:\n\n{text}"
}]
)
tool_input = next(
block.input for block in response.content
if block.type == "tool_use"
)
return ExtractionResult.model_validate(tool_input)
5. Memory Management
Location: tool_use/memory_cookbook.ipynb
I'm cautious about "memory" because it's as much a product/policy decision as it is a technical one. But as a pattern, giving the model a way to store and recall facts (with user-visible controls) can help a lot.
class MemoryStore:
def __init__(self):
self.memories = {}
def save(self, key: str, value: str) -> str:
self.memories[key] = value
return f"Saved: {key}"
def recall(self, key: str) -> str:
return self.memories.get(key, "Not found")
def list_keys(self) -> list[str]:
return list(self.memories.keys())
memory = MemoryStore()
memory_tools = [
{
"name": "save_memory",
"description": "Save information for later recall",
"input_schema": {
"type": "object",
"properties": {
"key": {"type": "string", "description": "Memory key"},
"value": {"type": "string", "description": "Information to remember"}
},
"required": ["key", "value"]
}
},
{
"name": "recall_memory",
"description": "Recall previously saved information",
"input_schema": {
"type": "object",
"properties": {
"key": {"type": "string", "description": "Memory key to recall"}
},
"required": ["key"]
}
}
]
In a real system, I don't store raw user text indefinitely. I usually store short, explicit "facts" with metadata (source, timestamp, user consent), and I make deletion easy.
6. Context Compaction
Before: Long History
After: Compacted
Location: tool_use/automatic-context-compaction.ipynb
Long-running chats hit token limits. The cookbook pattern here—summarize older turns, keep the recent window—is the most practical solution I've seen if you don't want to build a full memory/retrieval layer.
def count_tokens(messages: list[dict]) -> int:
# Simplified estimation
return sum(len(str(m)) // 4 for m in messages)
def compact_context(messages: list[dict], max_tokens: int = 50000) -> list[dict]:
if count_tokens(messages) <= max_tokens:
return messages
# Keep system message and recent messages
system_msg = messages[0] if messages[0].get("role") == "system" else None
recent_count = 10
recent = messages[-recent_count:]
# Summarize older messages
older = messages[1:-recent_count] if system_msg else messages[:-recent_count]
if not older:
return messages
older_text = "\n".join([f"{m['role']}: {m['content']}" for m in older])
summary = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=500,
messages=[{
"role": "user",
"content": f"Summarize this conversation history concisely:\n\n{older_text}"
}]
).content[0].text
compacted = []
if system_msg:
compacted.append(system_msg)
compacted.append({
"role": "user",
"content": f"[Previous conversation summary: {summary}]"
})
compacted.append({
"role": "assistant",
"content": "I understand. I have context from our previous discussion."
})
compacted.extend(recent)
return compacted
Two practical notes from my side:
- token estimation is usually worth doing with a real tokenizer (this is a rough heuristic),
- summaries can quietly drop constraints ("don't do X"), so I often preserve critical rules as "pinned" system text.
Summary
| Pattern | Use Case |
|---|---|
| Basic Tool Use | Extend Claude with external APIs |
| Parallel Tools | Execute multiple tools simultaneously |
| Pydantic Integration | Type-safe tool definitions |
| Structured Extraction | Force specific output formats |
| Memory Management | Persist information across turns |
| Context Compaction | Handle long conversations |
Next I'll move into multimodal and vision. It's a different flavor of "tool use"—you're not calling an API, but you're still giving the model structured inputs and expecting structured outputs.