Introduction

Agent SDK Examples

🔍

Research Agent

Web search + knowledge synthesis

Automatically searches, analyzes, and summarizes information from multiple sources

Web search capability

Source citation

Multi-step research

Summary generation

📋

Chief of Staff

Memory + delegation + coordination

Manages priorities, schedules, and delegates work to specialized subagents

Persistent memory

Task delegation

Subagent coordination

Priority management

👁️

Observability Agent

System monitoring + MCP integration

Monitors systems, analyzes logs, and provides insights through MCP connections

MCP server integration

Log analysis

System monitoring

Alert generation

All examples include streaming, error handling, and production-ready patterns

After spending time with the notebook patterns, I wanted to see what Anthropic considers "production-shaped" agent code. That's what claude_agent_sdk/ feels like: less theory, more scaffolding—tools, loops, memory, streaming, error paths.

What surprised me is how far you can get with the "boring" bits done well: a clean tool loop and a few guardrails beat a fancy prompt nine times out of ten.

1. The One-Liner Research Agent

Location: claude_agent_sdk/00_The_one_liner_research_agent.ipynb

I like this example because it's honest: an "agent" can be as simple as a model + a tool + a loop. If you're prototyping, this is often the right starting point.

from claude_agent_sdk import Agent, WebSearchTool

# Minimal agent with web search
agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[WebSearchTool()]
)

# One-liner usage
response = agent.query("What are the latest developments in fusion energy?")
print(response)

Understanding the Agent Loop

One thing the docs don't emphasize enough (in my opinion) is that most agent behavior is just message bookkeeping. Once you understand the loop—append assistant tool_use blocks, run tools, append tool_result blocks—you can reason about failures much more calmly.

from anthropic import Anthropic

class SimpleAgent:
    def __init__(self, tools: list):
        self.client = Anthropic()
        self.tools = tools
        self.tool_schemas = [t.schema for t in tools]
        self.tool_map = {t.name: t for t in tools}
    
    def query(self, question: str) -> str:
        messages = [{"role": "user", "content": question}]
        
        while True:
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=self.tool_schemas,
                messages=messages
            )
            
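            # Stop on anything other than a tool request (end_turn, max_tokens, ...)
            # so we never send an empty tool_result message
            if response.stop_reason != "tool_use":
                return "".join(b.text for b in response.content if b.type == "text")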
            
            # Process tool calls
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            
            for block in response.content:
                if block.type == "tool_use":
                    tool = self.tool_map[block.name]
                    result = tool.execute(block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            
            messages.append({"role": "user", "content": tool_results})
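
The SimpleAgent class above assumes each tool exposes name, schema, and execute, but never shows one. Here is a minimal sketch of such a tool, with the tool_result bookkeeping pulled out so it can be run without an API key. AddTool and run_tool_calls are hypothetical names for illustration, and the SimpleNamespace objects only stand in for real response blocks. The schema shape (name, description, input_schema) and the is_error flag follow the Messages API tool format.

```python
from types import SimpleNamespace


class AddTool:
    """Hypothetical tool with the three attributes SimpleAgent relies on."""

    name = "add"
    schema = {
        "name": "add",
        "description": "Add two integers.",
        "input_schema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
    }

    def execute(self, tool_input: dict) -> str:
        return str(tool_input["a"] + tool_input["b"])


def run_tool_calls(content_blocks, tool_map) -> list:
    """Turn every tool_use block into a matching tool_result block."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        try:
            output = tool_map[block.name].execute(block.input)
            results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": output}
            )
        except Exception as exc:
            # Report the failure back to the model instead of crashing the loop
            results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Tool error: {exc}",
                    "is_error": True,
                }
            )
    return results


# Stand-ins for the content blocks a response would carry
blocks = [
    SimpleNamespace(type="text", text="Let me add those."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="add", input={"a": 2, "b": 3}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="missing", input={}),
]
results = run_tool_calls(blocks, {"add": AddTool()})
```

Every tool_use block gets exactly one tool_result with the same id, even when the tool is unknown or raises. That pairing is the part of the bookkeeping that tends to break first.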

2. The Chief of Staff Agent

Location: claude_agent_sdk/01_The_chief_of_staff_agent.ipynb

This is where the SDK starts to feel like a "real product surface": memory, commands, and a way to coordinate subagents. I think it's useful to read this less as "build a magical assistant" and more as "how do I keep context and delegate without chaos?"

Memory Management

In practice, memory is a double-edged sword. It's powerful, but it also raises the stakes: you need to be clear about what should persist, what shouldn't, and how you'll audit it. I'd rather start with minimal memory and add it deliberately than "store everything" by default.

from claude_agent_sdk import Agent, MemoryStore

memory = MemoryStore()

agent = Agent(
    model="claude-sonnet-4-20250514",
    memory=memory,
    system_prompt="""You are a Chief of Staff AI assistant.
    
You manage the user's priorities, schedule, and delegate work to specialized subagents.
Use your memory to track ongoing projects and commitments."""
)

# Memory persists across conversations
agent.query("Remember that Project Alpha deadline is January 15th")
# Later...
agent.query("What deadlines do I have coming up?")
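
To make "minimal memory, added deliberately" concrete, here is a sketch of a store that only persists allow-listed categories and records every write attempt. AuditedMemory is a hypothetical class written for this post, not the SDK's MemoryStore, and the category names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedMemory:
    """Hypothetical store: only allow-listed categories persist, every write is logged."""

    allowed_categories: frozenset
    entries: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def remember(self, category: str, key: str, value: str) -> bool:
        accepted = category in self.allowed_categories
        # The log records that a write was attempted, never the value itself
        self.audit_log.append(
            {
                "at": datetime.now(timezone.utc).isoformat(),
                "category": category,
                "key": key,
                "accepted": accepted,
            }
        )
        if accepted:
            self.entries[(category, key)] = value
        return accepted

    def recall(self, category: str) -> dict:
        return {k: v for (c, k), v in self.entries.items() if c == category}


memory = AuditedMemory(allowed_categories=frozenset({"deadline", "project"}))
memory.remember("deadline", "Project Alpha", "January 15th")
memory.remember("credential", "email password", "hunter2")  # rejected, still audited
```

The design choice is that rejection is the default: anything outside the allow-list is dropped, and the audit log lets you see what the agent tried to store.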

Custom Slash Commands

I like slash commands because they make the agent feel less "mystical." You're giving users a deterministic lever: when they type /schedule, you run code. That's often the right UX.

from claude_agent_sdk import Command

class SummarizeEmailsCommand(Command):
    name = "/emails"
    description = "Summarize unread emails"
    
    def execute(self, args: str) -> str:
        # Integration with email API
        emails = fetch_unread_emails()
        summaries = [summarize(e) for e in emails[:10]]
        return "\n".join(summaries)

class ScheduleMeetingCommand(Command):
    name = "/schedule"
    description = "Schedule a meeting. Usage: /schedule <title> with <attendees> at <time>"
    
    def execute(self, args: str) -> str:
        # Parse and create calendar event
        parsed = parse_meeting_request(args)
        event_id = create_calendar_event(parsed)
        return f"Meeting scheduled: {event_id}"

agent = Agent(
    model="claude-sonnet-4-20250514",
    commands=[SummarizeEmailsCommand(), ScheduleMeetingCommand()]
)
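
The commands above lean on helpers that are not shown. As a sketch of the deterministic path, here is one way parse_meeting_request could work, along with a small dispatcher that routes slash commands before any model call. The regex, the returned dict shape, and the dispatch function are all my assumptions, not the notebook's code.

```python
import re

# Matches the usage string: <title> with <attendees> at <time>
MEETING_PATTERN = re.compile(
    r"^(?P<title>.+?)\s+with\s+(?P<attendees>.+?)\s+at\s+(?P<time>.+)$"
)


def parse_meeting_request(args: str) -> dict:
    """Hypothetical parser for '<title> with <attendees> at <time>'."""
    match = MEETING_PATTERN.match(args.strip())
    if match is None:
        raise ValueError("Usage: /schedule <title> with <attendees> at <time>")
    attendees = [a.strip() for a in match["attendees"].split(",") if a.strip()]
    return {"title": match["title"], "attendees": attendees, "time": match["time"]}


def dispatch(user_input: str, handlers: dict):
    """Return a handler's result for slash commands, or None for ordinary input."""
    if not user_input.startswith("/"):
        return None  # not a command, so the caller hands it to the model
    name, _, args = user_input.partition(" ")
    handler = handlers.get(name)
    if handler is None:
        return f"Unknown command: {name}"
    return handler(args)


handlers = {"/schedule": parse_meeting_request}
parsed = dispatch("/schedule Roadmap sync with alice, bob at 3pm Friday", handlers)
```

A title that itself contains " with " would be split too early by this pattern, so a real parser would need a stricter grammar or quoted fields.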

Subagent Orchestration

When I've tried "multi-agent" setups, the failure mode is usually not intelligence—it's coordination. The best fix I know is to keep roles narrow and outputs structured, and to force the coordinator to synthesize explicitly.

from claude_agent_sdk import Agent, SubAgent, WebSearchTool

research_agent = SubAgent(
    name="researcher",
    model="claude-sonnet-4-20250514",
    system_prompt="You are a research specialist. Provide thorough, cited research.",
    tools=[WebSearchTool()]
)

writing_agent = SubAgent(
    name="writer",
    model="claude-sonnet-4-20250514",
    system_prompt="You are a professional writer. Create polished, engaging content."
)

chief_of_staff = Agent(
    model="claude-sonnet-4-20250514",
    subagents=[research_agent, writing_agent],
    system_prompt="""You are a Chief of Staff.

Delegate tasks to your subagents:
- @researcher: For information gathering and fact-checking
- @writer: For content creation and editing

Coordinate their work to complete complex tasks."""
)

# The chief of staff delegates automatically
response = chief_of_staff.query(
    "Research the impact of AI on healthcare and write a 500-word executive summary"
)
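
Here is a sketch of what "structured outputs" and "synthesize explicitly" can mean in practice: subagents return typed reports, and the coordinator refuses to merge anything uncited. Finding, SubagentReport, and synthesize are hypothetical names, and the sample claim is a placeholder rather than real research output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str
    source: str


@dataclass(frozen=True)
class SubagentReport:
    agent: str
    findings: tuple


def synthesize(reports: list) -> str:
    """Merge reports into one brief, rejecting any finding that lacks a source."""
    lines = []
    for report in reports:
        for finding in report.findings:
            if not finding.source:
                raise ValueError(
                    f"{report.agent} returned an uncited claim: {finding.claim!r}"
                )
            lines.append(f"- {finding.claim} [{report.agent}: {finding.source}]")
    return "\n".join(lines)


reports = [
    SubagentReport(
        agent="researcher",
        findings=(Finding(claim="Placeholder claim A", source="https://example.com/a"),),
    ),
]
brief = synthesize(reports)
```

Failing loudly on a missing source moves the coordination problem from "the summary looks off" to a specific subagent and a specific claim.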

3. The Observability Agent

Location: claude_agent_sdk/02_The_observability_agent.ipynb

I'm grouping this under "MCP integration" in my head: it's the moment the agent stops being a chat wrapper and starts being a workflow tool that can inspect real systems.

The common mistake here is accidentally giving the agent too much power. When you connect it to GitHub or Git, you probably want to constrain what it can do (read-only vs write, which repos, which branches, etc.), precisely because the SDK makes broad access so easy to grant.

MCP Integration

import os

from claude_agent_sdk import Agent
from claude_agent_sdk.mcp import MCPServer, GitHubMCP, GitMCP

# Connect to MCP servers
github_server = MCPServer(GitHubMCP(token=os.environ["GITHUB_TOKEN"]))
git_server = MCPServer(GitMCP())

agent = Agent(
    model="claude-sonnet-4-20250514",
    mcp_servers=[github_server, git_server],
    system_prompt="""You are a DevOps assistant with access to GitHub and Git.

You can:
- View and manage GitHub issues and PRs
- Check repository status and history
- Analyze code changes

Help developers with their workflow."""
)

# Agent can now interact with GitHub
agent.query("What are the open PRs in our main repo?")
agent.query("Summarize the changes in the last 5 commits")
agent.query("Create an issue for the bug we discussed")
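
One way to act on the "constrain what it can do" advice is an allow-list applied before tools are exposed, plus a second check at call time. This sketch is independent of any SDK: filter_tools and guard_call are hypothetical helpers, and the tool names are illustrative stand-ins for whatever the connected servers actually expose.

```python
# Hypothetical allow-list: only tools that read state, never ones that change it
READ_ONLY_TOOLS = frozenset(
    {"list_repos", "get_repo", "list_issues", "list_prs", "git_status", "git_log", "git_diff"}
)


def filter_tools(available: list, allowed: frozenset) -> list:
    """Keep only the tool definitions whose name is on the allow-list."""
    return [tool for tool in available if tool["name"] in allowed]


def guard_call(tool_name: str, allowed: frozenset) -> None:
    """Second line of defence: refuse a call even if the tool slipped through."""
    if tool_name not in allowed:
        raise PermissionError(f"Tool {tool_name!r} is not permitted in read-only mode")


available = [{"name": "list_issues"}, {"name": "create_issue"}, {"name": "git_diff"}]
exposed = filter_tools(available, READ_ONLY_TOOLS)
```

With a filter like this in place, a request such as "Create an issue for the bug we discussed" fails with a clear permission error instead of quietly writing to the repo.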

MCP Tool Access

# MCP exposes tools automatically
# GitHub MCP provides:
# - list_repos, get_repo, list_issues, create_issue, list_prs, etc.

# Git MCP provides:
# - git_status, git_log, git_diff, git_branch, etc.

# The agent uses these seamlessly
response = agent.query("""
Check if there are any uncommitted changes in the current repo.
If so, describe what files were modified.
""")

4. Building Custom Agents

The SDK examples are useful, but I mostly read them as a checklist: configuration, streaming, and error handling are the "adult" parts of agent systems. If you skip them, the agent might still work—until the day it doesn't.

Agent Configuration

from claude_agent_sdk import Agent, AgentConfig

config = AgentConfig(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    temperature=0.7,
    max_tool_calls=20,      # Limit tool call iterations
    timeout=300,            # 5 minute timeout
    retry_on_error=True,
    error_retry_count=3
)

agent = Agent(config=config, tools=[...])
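
It helps to picture what limits like max_tool_calls and timeout have to do inside the loop. This is a sketch of one possible enforcement mechanism, not how the SDK implements it. RunBudget and BudgetExceeded are hypothetical names, and the clock is injectable so the timeout path can be tested without waiting.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a run uses up its tool-call or time budget."""


class RunBudget:
    def __init__(self, max_tool_calls: int, timeout: float, clock=time.monotonic):
        self.max_tool_calls = max_tool_calls
        self.timeout = timeout
        self._clock = clock
        self._started = clock()
        self.tool_calls = 0

    def charge_tool_call(self) -> None:
        """Call once before each tool execution; raises when a limit is hit."""
        if self._clock() - self._started > self.timeout:
            raise BudgetExceeded(f"timeout of {self.timeout}s exceeded")
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded(f"tool call limit of {self.max_tool_calls} reached")
        self.tool_calls += 1


budget = RunBudget(max_tool_calls=2, timeout=300)
budget.charge_tool_call()
budget.charge_tool_call()
try:
    budget.charge_tool_call()  # third call exceeds the limit
    stop_reason = None
except BudgetExceeded as exc:
    stop_reason = str(exc)
```

Charging the budget before each tool execution, rather than after, means a runaway loop stops before it pays for another call.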

Streaming Responses

Streaming isn't just a UX improvement; it's a debugging tool. You can see where the agent is spending time (and whether it's stuck in a tool loop).

from claude_agent_sdk import Agent

agent = Agent(model="claude-sonnet-4-20250514", tools=[...])

# Stream the response
for chunk in agent.stream("Explain quantum computing"):
    if chunk.type == "text":
        print(chunk.text, end="", flush=True)
    elif chunk.type == "tool_use":
        print(f"\n[Using tool: {chunk.tool_name}]")
    elif chunk.type == "tool_result":
        print("[Tool result received]\n")
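
To use the stream as a debugging tool rather than just a display, you can record the events and look for the same tool call repeating. The sketch below assumes events have been captured as plain dicts with type, tool_name, and input keys. That shape is my assumption, not the SDK's chunk type.

```python
import json
from collections import Counter


def find_repeated_tool_calls(events: list, threshold: int = 3) -> list:
    """Return (tool_name, serialized_input, count) for calls seen at least `threshold` times."""
    counts = Counter(
        (event["tool_name"], json.dumps(event.get("input", {}), sort_keys=True))
        for event in events
        if event["type"] == "tool_use"
    )
    return [(name, args, n) for (name, args), n in counts.items() if n >= threshold]


# Recorded events from a hypothetical run that keeps issuing the same search
events = [
    {"type": "tool_use", "tool_name": "search", "input": {"q": "fusion"}},
    {"type": "tool_result"},
    {"type": "tool_use", "tool_name": "search", "input": {"q": "fusion"}},
    {"type": "tool_result"},
    {"type": "tool_use", "tool_name": "search", "input": {"q": "fusion"}},
    {"type": "text", "text": "Still looking..."},
    {"type": "tool_use", "tool_name": "fetch", "input": {"url": "https://example.com"}},
]
suspects = find_repeated_tool_calls(events)
```

Identical name and identical arguments, three times over, is usually a sign that the tool result is not giving the model what it needs to move on.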

Error Handling

If I had to pick one "must have," it's robust error handling. Tool calls fail. Networks flake. Rate limits happen. If you don't treat those as normal, your agent will feel unreliable even when the model is fine.

from claude_agent_sdk import Agent, AgentError, ToolError, RateLimitError

agent = Agent(model="claude-sonnet-4-20250514", tools=[...])

try:
    response = agent.query("Complex task...")
except ToolError as e:
    print(f"Tool failed: {e.tool_name} - {e.message}")
    # Retry without the failing tool
    response = agent.query("Complex task...", exclude_tools=[e.tool_name])
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except AgentError as e:
    print(f"Agent error: {e}")
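
Treating failures as normal usually means retrying the transient ones with backoff. Here is a small sketch, independent of the SDK: TransientError and call_with_retries are hypothetical names, and the sleep function is injectable so the example runs instantly. A production version would likely add jitter to the delays.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a rate limit."""

    def __init__(self, message: str, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError as exc:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            # Prefer the server's hint when present, else back off 1x, 2x, 4x, ...
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)


attempts = {"count": 0}
delays = []


def flaky():
    """Fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporarily unavailable")
    return "ok"


result = call_with_retries(flaky, sleep=delays.append)
```

Only errors known to be transient are retried here. Anything else propagates immediately, which keeps real bugs from hiding behind a retry loop.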

Summary

Agent Type	Complexity	Features
One-Liner	Simple	Basic query loop with tools
Chief of Staff	Intermediate	Memory, commands, subagents
Observability	Advanced	MCP integration, external systems

Next, I'll zoom in on Extended Thinking—where I've found it genuinely helps (hard reasoning, debugging, tool sequencing) and where it's mostly unnecessary overhead.