Most "enterprise AI" products start with a chat box and bolt on retrieval. AIP is interesting for a different reason: Palantir is productizing the control plane around tool use—permissions, approvals, audit trails, and (crucially) a constrained set of actions that map to real operations. Their docs for AIP Agent Studio are explicit that agents can use application commands as tools, and that command execution is approval-gated by default.
That puts AIP closer to "function calling with guardrails" than "RAG with vibes."
## LLM as orchestrator: the only architecture that survives contact with production
If you want an AI system that changes state, you need to separate "model intent" from "system execution." OpenAI's function calling docs describe the canonical multi-step flow: model emits a tool call → your system executes it → you feed the result back → model responds. OpenAI's safety guidance is blunt about requiring human review in high-stakes domains.
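That loop can be sketched in a few lines. Everything below is illustrative: `call_model` is a stub standing in for a real chat-completions call with tool definitions, and the tool name and shipment ID are invented, not any vendor's API.

```python
import json

# Declared tools: the model may only call what's in this registry.
TOOLS = {
    "get_shipment_status": lambda args: {"shipment_id": args["shipment_id"],
                                         "status": "delayed"},
}

def call_model(messages):
    """Stub for a model call. A real model returns either a tool call
    (when it needs data or wants to act) or a final text answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_call": {"name": "get_shipment_status",
                              "arguments": {"shipment_id": "SH-1042"}}}
    status = json.loads(last["content"])["status"]
    return {"content": f"Shipment SH-1042 is {status}."}

def run_turn(user_text):
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = call_model(messages)
        if "tool_call" not in reply:      # model produced a final answer
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](call["arguments"])  # system executes
        messages.append({"role": "tool",                 # feed result back
                         "content": json.dumps(result)})
```

The key property is that the model never touches state directly: it only emits structured intent, and your system decides whether and how to execute it.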
Palantir's product choices line up with that model:
- Agents can use commands as tools (tool access is configured).
- Commands run in the user's application context (they can access application state/screen).
- The default UX asks the user to Approve/Reject before executing a command.
- There is an explicit toggle to let an agent auto-run commands (disabled by default).
- Agents using commands as tools have a retention window that expires after 24 hours of inactivity.
None of that is accidental. It's the scaffolding you need when "the AI" is doing more than summarizing text.
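The shape of that scaffolding reduces to a small gate. This is a sketch under loose assumptions, not AIP's implementation: the class and the approver callback are invented; AIP's actual mechanism is an Approve/Reject step in the UI.

```python
from dataclasses import dataclass, field

@dataclass
class CommandGate:
    auto_run: bool = False            # mirrors the auto-run toggle: off by default
    audit_log: list = field(default_factory=list)

    def execute(self, command, args, approver):
        """Run a state-changing command only if auto-run is enabled
        or a human approver says yes. Every attempt is logged."""
        approved = self.auto_run or approver(command.__name__, args)
        self.audit_log.append({"command": command.__name__,
                               "args": args, "approved": approved})
        if not approved:
            return {"status": "rejected"}
        return {"status": "ok", "result": command(**args)}

def reschedule(shipment_id, new_date):
    return f"{shipment_id} moved to {new_date}"

gate = CommandGate()  # approval required by default
out = gate.execute(reschedule,
                   {"shipment_id": "SH-1042", "new_date": "next-slot"},
                   approver=lambda name, args: True)  # user clicked Approve
```

Note that rejected attempts still land in the audit log; "the user said no" is itself a signal worth keeping.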
## Tool calling beats RAG when the output is a mutation
Where RAG is sufficient:

- ✓ Read-only information retrieval
- ✓ Summarizing large documents
- ✓ Q&A over knowledge bases

Where tool calling is required:

- → State-changing operations
- → Structured data operations
- → Multi-step workflows
| Problem | RAG-first approach | Tool/function calling approach |
|---|---|---|
| Hallucinated actions | model can describe actions that don't exist | model can only call declared tools, with schema constraints |
| Permission enforcement | often implemented ad hoc in app code | centralized: tool availability + user permissions + approvals |
| Auditability | "prompt logs" are not a compliance artifact | tool calls are discrete events you can log + review |
| Context window pressure | stuffing data chunks into prompts | fetch structured state via tools, not raw text |
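The "hallucinated actions" row is concrete: with declared tools, an out-of-schema call fails closed instead of executing something invented. A minimal sketch with hand-rolled validation (a real system would use JSON Schema; the tool name and fields here are made up):

```python
# Declared tool surface: name -> expected argument names and types.
TOOL_SCHEMAS = {
    "reschedule_shipment": {"shipment_id": str, "new_date": str},
}

def validate_call(name, arguments):
    """Reject calls to undeclared tools, or calls with missing,
    extra, or wrongly typed arguments."""
    if name not in TOOL_SCHEMAS:
        return False, f"unknown tool: {name}"
    schema = TOOL_SCHEMAS[name]
    if set(arguments) != set(schema):
        return False, "argument names do not match schema"
    for key, typ in schema.items():
        if not isinstance(arguments[key], typ):
            return False, f"{key} must be {typ.__name__}"
    return True, "ok"
```

A model can still *describe* a fictional action in prose, but it cannot get one executed: `validate_call("cancel_everything", {})` returns `(False, ...)` before any system code runs.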
Palantir also pushes the idea that AIP is built "on top of the Ontology and developer toolchain." That's the right dependency direction: you don't want agents inventing business objects; you want them operating over a typed model with defined actions.
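"Typed model with defined actions" is worth making concrete. A toy sketch, with all names invented and no relation to Palantir's actual Ontology types: the noun is a dataclass, and the verbs are a closed set of state transitions.

```python
from dataclasses import dataclass

# The closed verb set: which actions are legal from which state.
TRANSITIONS = {
    "scheduled": {"delay", "deliver"},
    "delayed":   {"reschedule", "cancel"},
}
NEXT_STATE = {"delay": "delayed", "deliver": "delivered",
              "reschedule": "scheduled", "cancel": "cancelled"}

@dataclass
class Shipment:
    shipment_id: str
    status: str = "scheduled"

    def apply(self, action):
        """Only transitions declared for the current state can run;
        an agent cannot invent a new verb or skip a state."""
        if action not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"{action!r} not allowed from {self.status!r}")
        self.status = NEXT_STATE[action]
        return self.status
```

The dependency direction shows up in the code: the agent layer would sit on top of `apply`, never beside it.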
## An end-to-end AIP-style workflow (in real operational terms)
Take a common planning task: "Reschedule delayed shipments in the Northeast."
| Step | Who does it | What happens |
|---|---|---|
| 1 | user | asks for rescheduling |
| 2 | model | selects relevant tools/commands rather than drafting a prose plan |
| 3 | system | runs command(s) in app context to pull the current shipment set |
| 4 | model | proposes reschedule operations, parameterized per shipment |
| 5 | user/system | approval gate fires (default); user approves or rejects |
| 6 | system | executes the command; downstream hooks/webhooks can propagate changes where configured in the operational layer |
| 7 | system/model | audit trail + outcome summary |
This is the boring, correct architecture: intent → plan → gated execution → audit.
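The seven steps compress into a short control loop. This is plain illustrative Python, not AIP's API; every callback name is an assumption standing in for a real component (the model proposes, the user approves, the system executes and audits).

```python
def reschedule_workflow(region, fetch_shipments, propose, approve, execute, audit):
    """intent -> plan -> gated execution -> audit, as function composition."""
    shipments = fetch_shipments(region)            # step 3: pull current state
    plan = [propose(s) for s in shipments]         # step 4: parameterized ops
    results = []
    for op in plan:
        if approve(op):                            # step 5: approval gate
            results.append(execute(op))            # step 6: execute command
        else:
            results.append({"op": op, "status": "rejected"})
    audit(plan, results)                           # step 7: audit trail
    return results

results = reschedule_workflow(
    "northeast",
    fetch_shipments=lambda region: ["SH-1", "SH-2"],
    propose=lambda s: {"shipment": s, "new_date": "next-slot"},
    approve=lambda op: op["shipment"] != "SH-2",   # user rejects one op
    execute=lambda op: {"op": op, "status": "done"},
    audit=lambda plan, results: None,
)
```

The gate sits per-operation, not per-workflow: the user can approve the reschedule of one shipment and reject another without restarting the whole task.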
## The cold-start paradox is real, and it's not "AI"
Tool calling scales with the richness of the action surface. If your organization hasn't defined operational actions—what can be changed, by whom, with what approvals—AI has nothing safe to do.
Palantir's own docs underline the dependency stack:
| Dependency | What it provides | What breaks if it's missing |
|---|---|---|
| Ontology (typed objects) | stable nouns and relationships | agents operate on messy tables/files |
| Actions/commands | safe verbs (state transitions) | agents can only "recommend," not execute |
| Security/governance | constraints and accountability | automation becomes a liability |
| Evals/observability | measurement and debugging | you can't improve reliably |
## What I'd copy if I were building this elsewhere
I'd steal three ideas, because they're measurable and they work:
| Design choice | Why I like it |
|---|---|
| approval by default for state-changing tools | prevents "silent automation" disasters |
| strict, schema-driven tools | makes invalid states unrepresentable; improves reliability |
| justification checkpoints for sensitive actions | forces operators (human or agent) to leave a rationale |
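The third row is the cheapest to adopt: refuse sensitive actions that arrive without a rationale, whether the caller is a human or an agent. A sketch; the `SENSITIVE` set and the return shape are invented for illustration.

```python
SENSITIVE = {"cancel_order", "delete_record"}

def checkpoint(action, justification=None):
    """Sensitive actions must carry a non-empty rationale,
    and the rationale is stored alongside the action in the audit row."""
    if action in SENSITIVE and not (justification and justification.strip()):
        return {"allowed": False, "reason": "justification required"}
    return {"allowed": True,
            "audit": {"action": action, "justification": justification}}
```

The point is not that the rationale is verified; it is that every sensitive action leaves a reviewable sentence behind it.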
If your "AI platform" can't do these, it's a demo environment.