Most "enterprise AI" products start with a chat box and bolt on retrieval. AIP is interesting for a different reason: Palantir is productizing the control plane around tool use—permissions, approvals, audit trails, and (crucially) a constrained set of actions that map to real operations. Their docs for AIP Agent Studio are explicit that agents can use application commands as tools, and that command execution is approval-gated by default.
That puts AIP closer to "function calling with guardrails" than "RAG with vibes."
## LLM as orchestrator: the only architecture that survives contact with production
If you want an AI system that changes state, you need to separate "model intent" from "system execution." OpenAI's function calling docs describe the canonical multi-step flow: model emits a tool call → your system executes it → you feed the result back → model responds. OpenAI's safety guidance is blunt about requiring human review in high-stakes domains.
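That loop can be sketched in a few lines. Everything below is illustrative: `call_model` is a stub standing in for a real chat-completions call with tool definitions, and the tool name and shipment ID are invented, not any vendor's API.

```python
import json

# Declared tools: the model may only call what's in this registry.
TOOLS = {
    "get_shipment_status": lambda args: {"shipment_id": args["shipment_id"],
                                         "status": "delayed"},
}

def call_model(messages):
    """Stub for a model call. A real model returns either a tool call
    (when it needs data or wants to act) or a final text answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "get_shipment_status",
                              "arguments": {"shipment_id": "SH-1042"}}}
    status = json.loads(last["content"])["status"]
    return {"content": f"Shipment SH-1042 is {status}."}

def run_turn(user_text):
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = call_model(messages)
        if "tool_call" not in reply:      # model produced a final answer
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](call["arguments"])  # system executes
        messages.append({"role": "tool",                 # feed result back
                         "content": json.dumps(result)})
```

The key property is that the model never touches state directly: it only emits structured intent, and your system decides whether and how to execute it.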
Palantir's product choices line up with that model:
- Agents can use commands as tools (tool access is configured).
- Commands run in the user's application context (they can access application state/screen).
- The default UX asks the user to Approve/Reject before executing a command.
- There is an explicit toggle to let an agent auto-run commands (disabled by default).
- Agents using commands as tools have a retention window that expires after 24 hours of inactivity.
None of that is accidental. It's the scaffolding you need when "the AI" is doing more than summarizing text.
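The shape of that scaffolding reduces to a small gate. This is a sketch under loose assumptions, not AIP's implementation: the class and the approver callback are invented; AIP's actual mechanism is an Approve/Reject step in the UI.

```python
from dataclasses import dataclass, field

@dataclass
class CommandGate:
    auto_run: bool = False            # mirrors the auto-run toggle: off by default
    audit_log: list = field(default_factory=list)

    def execute(self, command, args, approver):
        """Run a state-changing command only if auto-run is enabled
        or a human approver says yes. Every attempt is logged."""
        approved = self.auto_run or approver(command.__name__, args)
        self.audit_log.append({"command": command.__name__,
                               "args": args, "approved": approved})
        if not approved:
            return {"status": "rejected"}
        return {"status": "ok", "result": command(**args)}

def reschedule(shipment_id, new_date):
    return f"{shipment_id} moved to {new_date}"

gate = CommandGate()  # approval required by default
out = gate.execute(reschedule,
                   {"shipment_id": "SH-1042", "new_date": "next-slot"},
                   approver=lambda name, args: True)  # user clicked Approve
```

Note that rejected attempts still land in the audit log; "the user said no" is itself a signal worth keeping.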
## Tool calling beats RAG when the output is a mutation
Where RAG is sufficient:

- ✓ Read-only information retrieval
- ✓ Summarizing large documents
- ✓ Q&A over knowledge bases

Where tool calling is required:

- → State-changing operations
- → Structured data operations
- → Multi-step workflows
| Problem | RAG-first approach | Tool/function calling approach |
|---|---|---|
| Hallucinated actions | model can describe actions that don't exist | model can only call declared tools, with schema constraints |
| Permission enforcement | often implemented ad hoc in app code | centralized: tool availability + user permissions + approvals |
| Auditability | "prompt logs" are not a compliance artifact | tool calls are discrete events you can log + review |
| Context window pressure | stuffing data chunks into prompts | fetch structured state via tools, not raw text |
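The "hallucinated actions" row is concrete: with declared tools, an out-of-schema call fails closed instead of executing something invented. A minimal sketch with hand-rolled validation (a real system would use JSON Schema; the tool name and fields here are made up):

```python
# Declared tool surface: name -> expected argument names and types.
TOOL_SCHEMAS = {
    "reschedule_shipment": {"shipment_id": str, "new_date": str},
}

def validate_call(name, arguments):
    """Reject calls to undeclared tools, or calls with missing,
    extra, or wrongly typed arguments."""
    if name not in TOOL_SCHEMAS:
        return False, f"unknown tool: {name}"
    schema = TOOL_SCHEMAS[name]
    if set(arguments) != set(schema):
        return False, "argument names do not match schema"
    for key, typ in schema.items():
        if not isinstance(arguments[key], typ):
            return False, f"{key} must be {typ.__name__}"
    return True, "ok"
```

A model can still *describe* a fictional action in prose, but it cannot get one executed: `validate_call("cancel_everything", {})` returns `(False, ...)` before any system code runs.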
Palantir also pushes the idea that AIP is built "on top of the Ontology and developer toolchain." That's the right dependency direction: you don't want agents inventing business objects; you want them operating over a typed model with defined actions.
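"Typed model with defined actions" is worth making concrete. A toy sketch, with all names invented and no relation to Palantir's actual Ontology types: the noun is a dataclass, and the verbs are a closed set of state transitions.

```python
from dataclasses import dataclass

# The closed verb set: which actions are legal from which state.
TRANSITIONS = {
    "scheduled": {"delay", "deliver"},
    "delayed":   {"reschedule", "cancel"},
}
NEXT_STATE = {"delay": "delayed", "deliver": "delivered",
              "reschedule": "scheduled", "cancel": "cancelled"}

@dataclass
class Shipment:
    shipment_id: str
    status: str = "scheduled"

    def apply(self, action):
        """Only transitions declared for the current state can run;
        an agent cannot invent a new verb or skip a state."""
        if action not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"{action!r} not allowed from {self.status!r}")
        self.status = NEXT_STATE[action]
        return self.status
```

The dependency direction shows up in the code: the agent layer would sit on top of `apply`, never beside it.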
## An end-to-end AIP-style workflow (in real operational terms)
Take a common planning task: "Reschedule delayed shipments in the Northeast."
| Step | Who does it | What happens |
|---|---|---|
| 1 | user | asks for rescheduling |
| 2 | model | selects relevant tools/commands rather than drafting a prose plan |
| 3 | system | runs command(s) in app context to pull the current shipment set |
| 4 | model | proposes reschedule operations, parameterized per shipment |
| 5 | user/system | approval gate fires (default); user approves or rejects |
| 6 | system | executes the command; downstream hooks/webhooks can propagate changes where configured in the operational layer |
| 7 | system/model | audit trail + outcome summary |
This is the boring, correct architecture: intent → plan → gated execution → audit.
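The seven steps compress into a short control loop. This is plain illustrative Python, not AIP's API; every callback name is an assumption standing in for a real component (the model proposes, the user approves, the system executes and audits).

```python
def reschedule_workflow(region, fetch_shipments, propose, approve, execute, audit):
    """intent -> plan -> gated execution -> audit, as function composition."""
    shipments = fetch_shipments(region)            # step 3: pull current state
    plan = [propose(s) for s in shipments]         # step 4: parameterized ops
    results = []
    for op in plan:
        if approve(op):                            # step 5: approval gate
            results.append(execute(op))            # step 6: execute command
        else:
            results.append({"op": op, "status": "rejected"})
    audit(plan, results)                           # step 7: audit trail
    return results

results = reschedule_workflow(
    "northeast",
    fetch_shipments=lambda region: ["SH-1", "SH-2"],
    propose=lambda s: {"shipment": s, "new_date": "next-slot"},
    approve=lambda op: op["shipment"] != "SH-2",   # user rejects one op
    execute=lambda op: {"op": op, "status": "done"},
    audit=lambda plan, results: None,
)
```

The gate sits per-operation, not per-workflow: the user can approve the reschedule of one shipment and reject another without restarting the whole task.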
## The cold-start paradox is real, and it's not "AI"
Tool calling scales with the richness of the action surface. If your organization hasn't defined operational actions—what can be changed, by whom, with what approvals—AI has nothing safe to do.
Palantir's own docs underline the dependency stack:
| Dependency | What it provides | What breaks if it's missing |
|---|---|---|
| Ontology (typed objects) | stable nouns and relationships | agents operate on messy tables/files |
| Actions/commands | safe verbs (state transitions) | agents can only "recommend," not execute |
| Security/governance | constraints and accountability | automation becomes a liability |
| Evals/observability | measurement and debugging | you can't improve reliably |
## What I'd copy if I were building this elsewhere
I'd steal three ideas, because they're measurable and they work:
| Design choice | Why I like it |
|---|---|
| approval by default for state-changing tools | prevents "silent automation" disasters |
| strict, schema-driven tools | makes invalid states unrepresentable; improves reliability |
| justification checkpoints for sensitive actions | forces operators (human or agent) to leave a rationale |
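The third row is the cheapest to adopt: refuse sensitive actions that arrive without a rationale, whether the caller is a human or an agent. A sketch; the `SENSITIVE` set and the return shape are invented for illustration.

```python
SENSITIVE = {"cancel_order", "delete_record"}

def checkpoint(action, justification=None):
    """Sensitive actions must carry a non-empty rationale,
    and the rationale is stored alongside the action in the audit row."""
    if action in SENSITIVE and not (justification and justification.strip()):
        return {"allowed": False, "reason": "justification required"}
    return {"allowed": True,
            "audit": {"action": action, "justification": justification}}
```

The point is not that the rationale is verified; it is that every sensitive action leaves a reviewable sentence behind it.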
If your "AI platform" can't do these, it's a demo environment.