Introduction
After spending time with the notebook patterns, I wanted to see what Anthropic considers "production-shaped" agent code. That's what claude_agent_sdk/ feels like: less theory, more scaffolding—tools, loops, memory, streaming, error paths.
What surprised me is how far you can get with the "boring" bits done well: a clean tool loop and a few guardrails beat a fancy prompt nine times out of ten.
1. The One-Liner Research Agent
Location: claude_agent_sdk/00_The_one_liner_research_agent.ipynb
I like this example because it's honest: an "agent" can be as simple as a model + a tool + a loop. If you're prototyping, this is often the right starting point.
```python
from claude_agent_sdk import Agent, WebSearchTool

# Minimal agent with web search
agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[WebSearchTool()],
)

# One-liner usage
response = agent.query("What are the latest developments in fusion energy?")
print(response)
```
Understanding the Agent Loop
One thing the docs don't emphasize enough (in my opinion) is that most agent behavior is just message bookkeeping. Once you understand the loop—append assistant tool_use blocks, run tools, append tool_result blocks—you can reason about failures much more calmly.
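```python
from anthropic import Anthropic


class SimpleAgent:
    def __init__(self, tools: list):
        self.client = Anthropic()
        self.tools = tools
        # Each tool object is expected to expose .name, .schema and .execute(input)
        self.tool_schemas = [t.schema for t in tools]
        self.tool_map = {t.name: t for t in tools}

    def query(self, question: str) -> str:
        messages = [{"role": "user", "content": question}]
        while True:
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=self.tool_schemas,
                messages=messages,
            )
            # Any stop reason other than tool_use (end_turn, max_tokens, ...) ends the loop
            if response.stop_reason != "tool_use":
                return "".join(b.text for b in response.content if b.type == "text")

            # Echo the assistant turn, tool_use blocks included, back into the history
            messages.append({"role": "assistant", "content": response.content})

            # Run each requested tool and answer every tool_use id with a tool_result
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        tool = self.tool_map[block.name]
                        result, is_error = str(tool.execute(block.input)), False
                    except Exception as e:
                        # Report failures to the model instead of crashing the loop
                        result, is_error = f"Tool error: {e}", True
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                        "is_error": is_error,
                    })
            messages.append({"role": "user", "content": tool_results})
```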
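To watch the bookkeeping without an API key, here is a minimal offline sketch. It swaps the real client for a hypothetical `ScriptedModel` that replays canned responses, uses plain dicts in place of SDK content-block objects, and calls a made-up `get_weather` tool. What it shows is the transcript shape: each `tool_use` id is answered by a `tool_result` carrying the same id in the next user turn.

```python
from dataclasses import dataclass


@dataclass
class ScriptedModel:
    """Hypothetical stand-in for an LLM client: replays canned responses in order."""
    script: list
    calls: int = 0

    def create(self, messages: list) -> dict:
        response = self.script[self.calls]
        self.calls += 1
        return response


def get_weather(city: str) -> str:
    # Hypothetical tool used only for this illustration
    return f"Sunny in {city}"


TOOLS = {"get_weather": lambda args: get_weather(**args)}


def run_loop(model: ScriptedModel, question: str) -> tuple[str, list]:
    messages = [{"role": "user", "content": question}]
    while True:
        response = model.create(messages)
        if response["stop_reason"] != "tool_use":
            text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
            return text, messages
        # 1) Append the assistant turn verbatim, tool_use blocks included
        messages.append({"role": "assistant", "content": response["content"]})
        # 2) Run each tool, 3) answer with tool_result blocks keyed by tool_use id
        results = [
            {"type": "tool_result", "tool_use_id": b["id"], "content": TOOLS[b["name"]](b["input"])}
            for b in response["content"]
            if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})


model = ScriptedModel(script=[
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {"city": "Oslo"}},
    ]},
    {"stop_reason": "end_turn", "content": [{"type": "text", "text": "It is sunny in Oslo."}]},
])
answer, transcript = run_loop(model, "Weather in Oslo?")
print(answer)
for m in transcript:
    content = m["content"]
    print(m["role"], content if isinstance(content, str) else [b["type"] for b in content])
```

Running it prints the answer, then the three turns of history: `user`, `assistant ['tool_use']`, `user ['tool_result']`.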
2. The Chief of Staff Agent
Location: claude_agent_sdk/01_The_chief_of_staff_agent.ipynb
This is where the SDK starts to feel like a "real product surface": memory, commands, and a way to coordinate subagents. I think it's useful to read this less as "build a magical assistant" and more as "how do I keep context and delegate without chaos?"
Memory Management
In practice, memory is a double-edged sword. It's powerful, but it also raises the stakes: you need to be clear about what should persist, what shouldn't, and how you'll audit it. I'd rather start with minimal memory and add it deliberately than "store everything" by default.
```python
from claude_agent_sdk import Agent, MemoryStore

memory = MemoryStore()

agent = Agent(
    model="claude-sonnet-4-20250514",
    memory=memory,
    system_prompt="""You are a Chief of Staff AI assistant.
You manage the user's priorities, schedule, and delegate work to specialized subagents.
Use your memory to track ongoing projects and commitments.""",
)

# Memory persists across conversations
agent.query("Remember that Project Alpha deadline is January 15th")

# Later...
agent.query("What deadlines do I have coming up?")
```
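If you do want memory, one way to keep it deliberate is to allowlist what may persist and log every write, including rejected ones. The sketch below is a hypothetical `AuditedMemory` class, not part of `claude_agent_sdk`; the category names and the in-memory audit log are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditedMemory:
    """Hypothetical memory store: only allowlisted categories persist, every change is logged."""
    allowed_categories: frozenset
    entries: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def remember(self, category: str, key: str, value: str) -> bool:
        if category not in self.allowed_categories:
            # Rejected writes are logged too, so you can audit what the agent tried to keep
            self.audit_log.append((_now(), "rejected", category, key))
            return False
        self.entries[(category, key)] = value
        self.audit_log.append((_now(), "stored", category, key))
        return True

    def recall(self, category: str) -> dict:
        return {k: v for (c, k), v in self.entries.items() if c == category}

    def forget(self, category: str, key: str) -> None:
        if self.entries.pop((category, key), None) is not None:
            self.audit_log.append((_now(), "deleted", category, key))


memory = AuditedMemory(allowed_categories=frozenset({"deadlines", "projects"}))
memory.remember("deadlines", "Project Alpha", "January 15th")
memory.remember("personal", "home_address", "redacted")  # not allowlisted, so not stored
print(memory.recall("deadlines"))
print([(action, cat, key) for _, action, cat, key in memory.audit_log])
```

Starting from an empty allowlist and adding categories one at a time matches the "minimal memory first" approach: anything new the agent tries to store shows up as a `rejected` entry you can review.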
Custom Slash Commands
I like slash commands because they make the agent feel less "mystical." You're giving users a deterministic lever: when they type /schedule, you run code. That's often the right UX.
```python
from claude_agent_sdk import Agent, Command

# fetch_unread_emails, summarize, parse_meeting_request and create_calendar_event
# are placeholders for your own email and calendar integrations.


class SummarizeEmailsCommand(Command):
    name = "/emails"
    description = "Summarize unread emails"

    def execute(self, args: str) -> str:
        # Pull unread mail from the email API and summarize the first 10 messages
        emails = fetch_unread_emails()
        summaries = [summarize(e) for e in emails[:10]]
        return "\n".join(summaries)


class ScheduleMeetingCommand(Command):
    name = "/schedule"
    description = "Schedule a meeting. Usage: /schedule <title> with <attendees> at <time>"

    def execute(self, args: str) -> str:
        # Parse the free-text arguments, then create the calendar event
        parsed = parse_meeting_request(args)
        event_id = create_calendar_event(parsed)
        return f"Meeting scheduled: {event_id}"


agent = Agent(
    model="claude-sonnet-4-20250514",
    commands=[SummarizeEmailsCommand(), ScheduleMeetingCommand()],
)
```
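The determinism comes from routing the input before the model ever sees it. Here is a minimal sketch of that dispatch step, assuming a hypothetical `dispatch` function and a plain-dict command registry rather than whatever the SDK does internally: input that starts with a registered `/name` runs code directly, unknown commands fail loudly, and everything else falls through to the model (stubbed here as `fake_model`).

```python
from typing import Callable

CommandHandler = Callable[[str], str]


def dispatch(user_input: str, commands: dict[str, CommandHandler],
             ask_model: Callable[[str], str]) -> str:
    text = user_input.strip()
    if text.startswith("/"):
        name, _, args = text.partition(" ")
        handler = commands.get(name)
        if handler is None:
            # Unknown commands get an explicit error instead of a model guess
            return f"Unknown command {name}. Available: {', '.join(sorted(commands))}"
        return handler(args.strip())
    # Not a command: hand the text to the model as a normal query
    return ask_model(text)


# Hypothetical handlers standing in for the email and calendar integrations
commands = {
    "/emails": lambda args: "3 unread emails",
    "/schedule": lambda args: f"Meeting scheduled: {args}",
}
fake_model = lambda prompt: f"(model answer to: {prompt})"

print(dispatch("/schedule Sync with Dana at 3pm", commands, fake_model))
print(dispatch("/todo buy milk", commands, fake_model))
print(dispatch("What's on my plate?", commands, fake_model))
```

The three calls print `Meeting scheduled: Sync with Dana at 3pm`, `Unknown command /todo. Available: /emails, /schedule`, and the stubbed model answer.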
Subagent Orchestration
When I've tried "multi-agent" setups, the failure mode is usually not intelligence—it's coordination. The best fix I know is to keep roles narrow and outputs structured, and to force the coordinator to synthesize explicitly.
```python
from claude_agent_sdk import Agent, SubAgent, WebSearchTool

research_agent = SubAgent(
    name="researcher",
    model="claude-sonnet-4-20250514",
    system_prompt="You are a research specialist. Provide thorough, cited research.",
    tools=[WebSearchTool()],
)

writing_agent = SubAgent(
    name="writer",
    model="claude-sonnet-4-20250514",
    system_prompt="You are a professional writer. Create polished, engaging content.",
)

chief_of_staff = Agent(
    model="claude-sonnet-4-20250514",
    subagents=[research_agent, writing_agent],
    system_prompt="""You are a Chief of Staff.
Delegate tasks to your subagents:
- @researcher: For information gathering and fact-checking
- @writer: For content creation and editing
Coordinate their work to complete complex tasks.""",
)

# The chief of staff delegates automatically
response = chief_of_staff.query(
    "Research the impact of AI on healthcare and write a 500-word executive summary"
)
```
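Here is one way to make "narrow roles, structured outputs, explicit synthesis" concrete. The subagents are stubbed as plain functions returning a hypothetical `SubagentReport` dataclass, and `synthesize` is an illustrative coordinator step, not the SDK's delegation mechanism. The point is that the coordinator checks each report's fields and merges them deliberately instead of concatenating free text.

```python
from dataclasses import dataclass


@dataclass
class SubagentReport:
    agent: str
    summary: str
    sources: list[str]
    confidence: float  # 0.0 to 1.0, self-reported by the subagent


def researcher(task: str) -> SubagentReport:
    # Stub: a real researcher would call a model with web search enabled
    return SubagentReport("researcher", f"Findings on {task}", ["source-a", "source-b"], 0.8)


def writer(task: str, research: SubagentReport) -> SubagentReport:
    # Stub: the writer receives the researcher's structured output, not a raw transcript
    return SubagentReport("writer", f"Draft based on: {research.summary}", research.sources, 0.7)


def synthesize(reports: list[SubagentReport], min_confidence: float = 0.5) -> str:
    # Explicit gates: weak or uncited reports are escalated rather than merged
    weak = [r.agent for r in reports if r.confidence < min_confidence]
    if weak:
        return f"Needs review: low confidence from {', '.join(weak)}"
    uncited = [r.agent for r in reports if not r.sources]
    if uncited:
        return f"Needs review: no sources from {', '.join(uncited)}"
    all_sources = sorted({s for r in reports for s in r.sources})
    return f"{reports[-1].summary} [sources: {', '.join(all_sources)}]"


task = "AI in healthcare"
research = researcher(task)
draft = writer(task, research)
print(synthesize([research, draft]))
```

The confidence threshold and the "must cite something" rule are arbitrary choices; the useful part is that they are written down in code rather than left to the coordinator's prompt.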
3. The Observability Agent
Location: claude_agent_sdk/02_The_observability_agent.ipynb
I'm grouping this under "MCP integration" in my head: it's the moment the agent stops being a chat wrapper and starts being a workflow tool that can inspect real systems.
The common mistake here is accidentally giving the agent too much power. When you connect it to GitHub or Git, you probably want to constrain what it can do (read-only vs. write access, which repos, which branches, and so on), even though the SDK makes it easy to hand over full access.
MCP Integration
```python
import os

from claude_agent_sdk import Agent
from claude_agent_sdk.mcp import MCPServer, GitHubMCP, GitMCP

# Connect to MCP servers
github_server = MCPServer(GitHubMCP(token=os.environ["GITHUB_TOKEN"]))
git_server = MCPServer(GitMCP())

agent = Agent(
    model="claude-sonnet-4-20250514",
    mcp_servers=[github_server, git_server],
    system_prompt="""You are a DevOps assistant with access to GitHub and Git.
You can:
- View and manage GitHub issues and PRs
- Check repository status and history
- Analyze code changes
Help developers with their workflow.""",
)

# Agent can now interact with GitHub
agent.query("What are the open PRs in our main repo?")
agent.query("Summarize the changes in the last 5 commits")
agent.query("Create an issue for the bug we discussed")
```
MCP Tool Access
```python
# MCP exposes tools automatically
# GitHub MCP provides:
#   - list_repos, get_repo, list_issues, create_issue, list_prs, etc.
# Git MCP provides:
#   - git_status, git_log, git_diff, git_branch, etc.

# The agent uses these seamlessly
response = agent.query("""
Check if there are any uncommitted changes in the current repo.
If so, describe what files were modified.
""")
```
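One way to enforce the read-only constraint from earlier is to check every tool call in your own code before it executes. This sketch assumes the tool names listed above (verify what your MCP server actually exposes); `ToolPolicy`, its repo allowlist, and the `acme/main` repo name are hypothetical, not part of the SDK. Unknown tools are denied by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical guard applied before any MCP tool call runs."""
    read_only_tools: frozenset
    write_tools: frozenset
    allowed_repos: frozenset
    allow_writes: bool = False

    def check(self, tool_name: str, args: dict) -> tuple[bool, str]:
        repo = args.get("repo")
        if repo is not None and repo not in self.allowed_repos:
            return False, f"repo {repo!r} is not allowlisted"
        if tool_name in self.read_only_tools:
            return True, "read-only tool"
        if tool_name in self.write_tools:
            if self.allow_writes:
                return True, "write allowed by policy"
            return False, f"{tool_name} is a write tool and writes are disabled"
        # Default-deny: tools the policy has never heard of are blocked
        return False, f"{tool_name} is not in the policy"


policy = ToolPolicy(
    read_only_tools=frozenset({"list_repos", "get_repo", "list_issues", "list_prs",
                               "git_status", "git_log", "git_diff"}),
    write_tools=frozenset({"create_issue"}),
    allowed_repos=frozenset({"acme/main"}),  # hypothetical repo name
)
print(policy.check("list_prs", {"repo": "acme/main"}))
print(policy.check("create_issue", {"repo": "acme/main", "title": "Bug"}))
print(policy.check("list_prs", {"repo": "acme/secret"}))
print(policy.check("delete_repo", {}))
```

`git_branch` is left out on purpose: depending on its arguments it may create or delete branches, so it deserves an explicit decision rather than a default.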
4. Building Custom Agents
The SDK examples are useful, but I mostly read them as a checklist: configuration, streaming, and error handling are the "adult" parts of agent systems. If you skip them, the agent might still work—until the day it doesn't.
Agent Configuration
```python
from claude_agent_sdk import Agent, AgentConfig

config = AgentConfig(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    temperature=0.7,
    max_tool_calls=20,  # Limit tool call iterations
    timeout=300,  # 5 minute timeout
    retry_on_error=True,
    error_retry_count=3,
)

agent = Agent(config=config, tools=[...])
```
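Limits like `max_tool_calls` and `timeout` only help if something enforces them inside the loop. Here is a minimal sketch of that enforcement, using a hypothetical `LoopBudget` class and a stubbed `step` callable that stands in for one model turn; it is not how `AgentConfig` is implemented, just what such limits amount to in practice.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class LoopBudget:
    max_tool_calls: int = 20
    timeout_s: float = 300.0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, n_calls: int) -> None:
        # Called once per turn with the number of tool calls that turn made
        self.tool_calls += n_calls
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"{self.tool_calls} tool calls > limit {self.max_tool_calls}")
        if time.monotonic() - self.started > self.timeout_s:
            raise BudgetExceeded(f"exceeded {self.timeout_s}s timeout")


def run_with_budget(step: Callable[[int], int], budget: LoopBudget) -> int:
    """`step(turn)` returns how many tool calls that turn made; 0 means the model finished."""
    turn = 0
    while True:
        n = step(turn)
        if n == 0:
            return turn
        budget.charge(n)
        turn += 1


# A stub that finishes after three turns of two tool calls each
print(run_with_budget(lambda t: 2 if t < 3 else 0, LoopBudget(max_tool_calls=20)))

# A stub that never stops calling tools trips the limit instead of looping forever
try:
    run_with_budget(lambda t: 1, LoopBudget(max_tool_calls=5))
except BudgetExceeded as e:
    print("stopped:", e)
```

The first call prints `3`; the second prints `stopped: 6 tool calls > limit 5`.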
Streaming Responses
Streaming isn't just a UX improvement; it's a debugging tool. You can see where the agent is spending time (and whether it's stuck in a tool loop).
```python
from claude_agent_sdk import Agent

agent = Agent(model="claude-sonnet-4-20250514", tools=[...])

# Stream the response
for chunk in agent.stream("Explain quantum computing"):
    if chunk.type == "text":
        print(chunk.text, end="", flush=True)
    elif chunk.type == "tool_use":
        print(f"\n[Using tool: {chunk.tool_name}]")
    elif chunk.type == "tool_result":
        print("[Tool result received]\n")
```
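To make "is it stuck?" answerable from the stream, you can timestamp each event and flag repeated identical tool calls. This sketch assumes a hypothetical `Event` shape (`type`, `text`, `tool_name`, `tool_input`) that loosely mirrors the chunks above; the real SDK's streamed objects may differ, and the repeat threshold of 3 is an arbitrary choice.

```python
import json
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Event:
    type: str  # "text", "tool_use", or "tool_result"
    text: str = ""
    tool_name: str = ""
    tool_input: dict = field(default_factory=dict)


def watch(events: Iterable[Event], repeat_threshold: int = 3) -> Iterator[tuple[float, Event, bool]]:
    """Yield (seconds since start, event, looks_stuck) for each streamed event."""
    start = time.monotonic()
    seen = Counter()
    for ev in events:
        stuck = False
        if ev.type == "tool_use":
            # The same tool called with the same input several times usually means a loop
            key = (ev.tool_name, json.dumps(ev.tool_input, sort_keys=True))
            seen[key] += 1
            stuck = seen[key] >= repeat_threshold
        yield time.monotonic() - start, ev, stuck


stream = [Event("tool_use", tool_name="search", tool_input={"q": "qubits"})] * 3 + [Event("text", text="Done")]
for elapsed, ev, stuck in watch(stream):
    label = ev.text if ev.type == "text" else f"[{ev.tool_name}]"
    print(f"{elapsed:6.3f}s {label}{'  <-- possible tool loop' if stuck else ''}")
```

Keying on the serialized input, not just the tool name, avoids false alarms when the agent legitimately runs the same tool with different arguments.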
Error Handling
If I had to pick one "must have," it's robust error handling. Tool calls fail. Networks flake. Rate limits happen. If you don't treat those as normal, your agent will feel unreliable even when the model is fine.
```python
from claude_agent_sdk import Agent, AgentError, ToolError, RateLimitError

agent = Agent(model="claude-sonnet-4-20250514", tools=[...])

try:
    response = agent.query("Complex task...")
except ToolError as e:
    print(f"Tool failed: {e.tool_name} - {e.message}")
    # Retry without the failing tool
    response = agent.query("Complex task...", exclude_tools=[e.tool_name])
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except AgentError as e:
    print(f"Agent error: {e}")
```
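The `RateLimitError` branch above only prints. A common next step is to retry with exponential backoff, honoring the server's `retry_after` hint when one is given. To stay independent of the SDK's real exception classes, this sketch defines its own hypothetical `RateLimitError` and wraps any zero-argument callable; the attempt count, delay cap, and full-jitter schedule are assumptions, not SDK defaults.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in; `retry_after` is the server-suggested wait in seconds, if any."""
    def __init__(self, retry_after: Optional[float] = None):
        super().__init__(f"rate limited (retry_after={retry_after})")
        self.retry_after = retry_after


def with_backoff(fn: Callable[[], T], max_attempts: int = 4, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error instead of hiding it
            if e.retry_after is not None:
                delay = e.retry_after
            else:
                # Full jitter: a random wait in [0, min(max_delay, base * 2^attempt)]
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Simulated call that is rate limited twice, then succeeds
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"


print(with_backoff(flaky))
```

Passing `sleep` as a parameter keeps the function testable: you can inject a recorder instead of actually waiting.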
Summary
| Agent Type | Complexity | Features |
|---|---|---|
| One-Liner | Simple | Basic query loop with tools |
| Chief of Staff | Intermediate | Memory, commands, subagents |
| Observability | Advanced | MCP integration, external systems |
Next, I'll zoom in on Extended Thinking—where I've found it genuinely helps (hard reasoning, debugging, tool sequencing) and where it's mostly unnecessary overhead.