flowchart TB
subgraph Experience["Experience layer"]
UI[Adaptive UI · chat, inline, ambient]
end
subgraph Capability["Capability layer"]
ORC[Orchestrator / agent harness]
EVAL[Eval + guardrail gate]
end
subgraph Knowledge["Knowledge & memory"]
RET[Retrieval]
MEM[Memory]
TOOLS[Tools / actions]
end
subgraph Foundation["Foundation"]
MODELS[Model gateway · multi-model]
OBS[Observability · traces, evals, cost]
GOV[Governance · authz, PII, audit]
end
UI --> ORC --> EVAL
ORC --> RET & MEM & TOOLS
ORC --> MODELS
EVAL -.-> OBS
ORC -.-> GOV
TL;DR
- “Adding AI” to a product usually means a chat box that calls one model. That’s a feature. An AI-native product is organized around capabilities the model makes possible, with the model in the critical path of the core workflow.
- The architecture is four layers: experience, capability, knowledge, and foundation. The interesting engineering is in the capability and foundation layers — orchestration, evals, and governance — not the prompt.
- You don’t build the autonomous version first. You earn it by climbing a maturity ladder — assistive → co-pilot → AI-native — as your evals and guardrails get good enough to trust.
The problem: the bolt-on plateau
Almost every enterprise “AI initiative” starts the same way. There’s a working product — a CRM, a ticketing system, an analytics dashboard — and someone adds an “Ask AI” button. It opens a chat panel, sends the user’s question plus a system prompt to a single model, and pastes the answer back. The demo gets applause. Six weeks later, usage is flat.
The reason is structural, not a prompt problem. The chatbot sits beside the product, not inside its workflows. It can describe what the user could do, but it can’t do it. It has no durable memory of the user’s context, no permissions-aware access to the data that matters, no ability to take the action the user actually wanted, and no way for the product’s UI to respond to what the model figured out. It’s a smart intern with their hands tied behind their back.
This is the bolt-on plateau, and you can’t prompt your way off it. Getting past it requires changing the architecture so the model is a first-class participant in the core loop — which is what “AI-native” actually means.
flowchart LR
subgraph BoltOn["AI-as-feature (bolt-on)"]
direction TB
A1[CRUD app] --> A2[New button: Ask AI]
A2 --> A3[One prompt to one model]
A3 --> A4[Paste text back into UI]
end
subgraph Native["AI-native"]
direction TB
B1[User intent] --> B2[Orchestrator decides]
B2 --> B3[Retrieve, act, reason, verify]
B3 --> B4[UI adapts to the result]
B4 --> B2
end
Feature vs capability: the distinction that organizes everything
The mental shift is from shipping features to exposing capabilities.
A feature is a fixed path: a button that does a predefined thing. You enumerate them, build each one, and the product is the sum of its features. A capability is a general power — “summarize any document the user can see,” “draft a response grounded in our policies,” “take a multi-step action on the user’s behalf and verify it.” One capability spans what would have been dozens of features, and it composes with the others.
| | Feature-oriented | Capability-oriented (AI-native) |
|---|---|---|
| Unit of work | a specific button/screen | a general power the model wields |
| How it scales | linearly — build each one | combinatorially — capabilities compose |
| Where logic lives | hardcoded in app code | in orchestration + model reasoning |
| What the UI does | shows fixed flows | adapts to the model’s output |
| What breaks it | edge cases you didn’t build | context the model can’t see |
The reference architecture
Four layers, each with a clear job. The diagram at the top of this recipe shows them; here’s what each one is responsible for and the decisions that matter.
Experience layer
The UI stops being a fixed set of screens and becomes adaptive: chat where conversation fits, inline suggestions where the user is already working, and ambient actions that happen without being asked. The key design principle is provenance — every AI-produced result carries where it came from (sources, confidence, what action it took) so the user can trust and verify it. An AI-native UI that hides its reasoning erodes trust the first time it’s wrong.
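One way to make provenance hard to forget is to bake it into the type the UI receives, so a bare string never reaches the screen. The sketch below is illustrative only: `Source`, `AIResult`, and `provenance_summary` are hypothetical names, and how you estimate `confidence` is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One piece of evidence a result was grounded in (hypothetical shape)."""
    title: str
    uri: str


@dataclass(frozen=True)
class AIResult:
    """An AI-produced result that always travels with its provenance."""
    text: str
    sources: tuple[Source, ...] = ()
    confidence: float = 0.0              # 0.0 to 1.0, however your system estimates it
    actions_taken: tuple[str, ...] = ()  # names of actions performed on the user's behalf

    def provenance_summary(self) -> str:
        """Render a short, user-facing provenance line for the UI."""
        parts = [f"confidence {self.confidence:.0%}"]
        if self.sources:
            parts.append("sources: " + ", ".join(s.title for s in self.sources))
        if self.actions_taken:
            parts.append("actions: " + ", ".join(self.actions_taken))
        return " | ".join(parts)


result = AIResult(
    text="Refund approved under the 30-day policy.",
    sources=(Source("Refund policy", "kb://policies/refund"),),
    confidence=0.82,
    actions_taken=("created_refund",),
)
print(result.provenance_summary())
```

Whether the UI renders this as a footnote, a hover card, or an expandable panel is a design choice; the point is that the data is always there to render.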
Capability layer
The heart of the system: an orchestrator (the agent harness generalized) that turns intent into a sequence of retrieval, reasoning, and actions, and an eval + guardrail gate that checks every output before it reaches the user or the world. This layer is where “the model decides, the system disposes” lives. It’s also where most of your engineering effort will go, and rightly so.
Knowledge & memory layer
The model is only as good as the context you feed it. This layer provides retrieval over enterprise knowledge (permissions-aware — see the enterprise RAG approach), memory of the user and prior interactions, and tools/actions the model can invoke to actually do things. Without this layer you have a clever chatbot; with it you have a product.
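To make "permissions-aware" concrete, here is a minimal sketch in which the access check runs before ranking, so restricted text is never a candidate for the model's context. Everything here is an assumption for illustration: the `Document` shape, the group-based ACL, and the naive term-overlap scoring, which stands in for whatever vector or hybrid search you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL: groups allowed to read this doc


def retrieve(query: str, corpus: list[Document], user_groups: set[str], k: int = 3) -> list[Document]:
    """Return up to k documents the user may see, ranked by naive term overlap."""
    terms = set(query.lower().split())
    # Filter on permissions BEFORE ranking so restricted text never reaches the model.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    # Keep only documents that share at least one term with the query.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]


corpus = [
    Document("a", "refund policy for enterprise customers", frozenset({"support"})),
    Document("b", "refund escalation notes for legal", frozenset({"legal"})),
]
print([d.doc_id for d in retrieve("refund policy", corpus, {"support"})])
```

Filtering after generation is too late: once a restricted passage is in the prompt, you are relying on the model not to repeat it.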
Foundation layer
The unglamorous platform that everything depends on: a model gateway that lets you route across models (and swap them as the frontier moves), observability for traces, evals, and cost, and governance for authorization, PII handling, and audit. Enterprises buy or reject products on this layer. Skimp on it and you’ll fail security review no matter how good the demo was.
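As a rough illustration of the gateway idea, the sketch below routes by task name to interchangeable providers and records each call. It is a toy under stated assumptions: `ModelGateway` and its methods are hypothetical, and the providers are plain functions standing in for real SDK clients. A production gateway would also handle retries, timeouts, and cost accounting.

```python
from typing import Callable

# A provider is anything that maps a prompt to a completion (assumed interface).
Provider = Callable[[str], str]


class ModelGateway:
    """Single entry point for model calls, so app code never imports a provider SDK."""

    def __init__(self) -> None:
        self._providers: dict[str, Provider] = {}
        self._routes: dict[str, str] = {}        # task name -> provider name
        self.calls: list[tuple[str, str]] = []   # (task, provider) log for observability

    def register(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def route(self, task: str, provider_name: str) -> None:
        if provider_name not in self._providers:
            raise KeyError(f"unknown provider: {provider_name}")
        self._routes[task] = provider_name

    def complete(self, task: str, prompt: str) -> str:
        name = self._routes[task]
        self.calls.append((task, name))
        return self._providers[name](prompt)


gateway = ModelGateway()
gateway.register("small", lambda prompt: "small:" + prompt)
gateway.register("large", lambda prompt: "large:" + prompt)
gateway.route("summarize", "small")
print(gateway.complete("summarize", "Q3 report"))

# Swapping models is a routing change, not a code change in the app.
gateway.route("summarize", "large")
print(gateway.complete("summarize", "Q3 report"))
```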
How a single request flows
sequenceDiagram
participant U as User
participant O as Orchestrator
participant K as Knowledge (retrieval/memory)
participant M as Model gateway
participant G as Eval + guardrail gate
U->>O: Intent (typed, clicked, or ambient)
O->>K: Gather context (permissions-aware)
K-->>O: Grounded context
O->>M: Reason / plan / act
M-->>O: Proposed output or action
O->>G: Check (policy, PII, confidence)
G-->>O: Approve / block / escalate
O-->>U: Result + provenance
Notice what’s not in that flow: the user picking a feature. They express intent; the orchestrator composes the capabilities. Notice also that the guardrail gate is non-optional and sits between the model and the user: in an enterprise, an unchecked model output reaching a customer is an incident waiting to happen.
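The same flow can be sketched as a single function. This is a minimal sketch, not a real harness: `handle_request`, `Verdict`, and `Response` are hypothetical names, and the three callables stand in for the knowledge layer, the model gateway, and the guardrail gate. The one property it tries to show is structural: the model's raw output has no path to the user except through the gate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVE = "approve"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class Response:
    text: str
    verdict: Verdict
    context_used: list[str]  # provenance: what the answer was grounded in


def handle_request(
    intent: str,
    gather_context: Callable[[str], list[str]],
    call_model: Callable[[str, list[str]], str],
    gate: Callable[[str], Verdict],
) -> Response:
    """Run one request: gather context, call the model, then gate the output."""
    context = gather_context(intent)       # O ->> K
    proposed = call_model(intent, context)  # O ->> M
    verdict = gate(proposed)                # O ->> G
    if verdict is Verdict.APPROVE:
        return Response(proposed, verdict, context)
    if verdict is Verdict.ESCALATE:
        # The proposed output is withheld until a human has looked at it.
        return Response("Sent to a human reviewer.", verdict, context)
    return Response("This request could not be completed.", verdict, context)


# Stub collaborators, for illustration only.
def stub_context(intent: str) -> list[str]:
    return ["policy: refunds within 30 days"]

def stub_model(intent: str, context: list[str]) -> str:
    return f"Answer to '{intent}' using {len(context)} source(s)"

def stub_gate(output: str) -> Verdict:
    return Verdict.BLOCK if "ssn" in output.lower() else Verdict.APPROVE

print(handle_request("Can I get a refund?", stub_context, stub_model, stub_gate).text)
```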
Don’t build the autonomous version first
A common way AI-native products fail is overreaching: shipping autonomy before the evals and guardrails exist to make it safe. The way through is a maturity ladder, where each rung is unlocked by measurement, not optimism.
flowchart LR
C[Crawl: assistive<br/>suggest, human approves] --> W[Walk: co-pilot<br/>drafts + acts with review]
W --> R[Run: AI-native<br/>autonomous within guardrails]
classDef now fill:#f5e7e3,stroke:#b1361e;
class R now;
Crawl: assistive. The model suggests; the human always approves before anything happens. Low risk, immediate value, and, crucially, every approval/rejection is a labeled data point for your evals.
Walk: co-pilot. The model drafts and can take reversible actions, with the human reviewing. You graduate to this rung per capability, only once its eval scores clear a bar you set in advance.
Run: AI-native. The model acts autonomously within guardrails for capabilities where your evals show it’s more reliable than the human baseline, with monitoring and the ability to roll back.
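A promotion rule like this is easier to honor when it is written down as code rather than decided in a meeting. The sketch below is one possible shape, with loud assumptions: the thresholds in `PROMOTION_BARS`, the `MIN_SAMPLES` floor, and the single scalar `eval_score` are placeholders, not recommendations. It promotes one rung at a time, per capability, and requires beating the human baseline before going autonomous.

```python
from enum import IntEnum


class Rung(IntEnum):
    ASSISTIVE = 1   # crawl
    COPILOT = 2     # walk
    AUTONOMOUS = 3  # run


# Bars set in advance, per target rung. The numbers are placeholders.
PROMOTION_BARS = {Rung.COPILOT: 0.90, Rung.AUTONOMOUS: 0.98}
MIN_SAMPLES = 200  # assumed floor so a handful of lucky evals cannot promote


def next_rung(current: Rung, eval_score: float, n_samples: int, human_baseline: float) -> Rung:
    """Return the rung a capability should run at, promoting at most one step."""
    if current is Rung.AUTONOMOUS or n_samples < MIN_SAMPLES:
        return current
    target = Rung(current + 1)
    if eval_score < PROMOTION_BARS[target]:
        return current
    # Autonomy additionally requires beating the measured human baseline.
    if target is Rung.AUTONOMOUS and eval_score <= human_baseline:
        return current
    return target


print(next_rung(Rung.ASSISTIVE, eval_score=0.93, n_samples=500, human_baseline=0.95).name)
```

Demotion (rolling a capability back down when monitoring degrades) is deliberately left out here; in practice you would want the same kind of explicit rule for it.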
Build vs buy, layer by layer
You won’t build all four layers from scratch, and you shouldn’t. A rough heuristic:
| Layer | Default | Rationale |
|---|---|---|
| Foundation (models) | Buy | Model gateways and hosted models are commodities; don’t build a model. |
| Foundation (governance/obs) | Buy or adopt | Mature tools exist; integrate rather than reinvent audit/PII. |
| Knowledge & memory | Mix | Buy vector infra; build the retrieval logic specific to your data and permissions. |
| Capability | Build | This is your product’s differentiation — the orchestration and evals are yours. |
| Experience | Build | The adaptive UX is where users feel the difference; own it. |
The pattern: buy the undifferentiated foundation, build the capability and experience layers that are actually your product.
Trade-offs and honest costs
AI-native is not free. The capability and foundation layers are real systems that need real engineering. You take on non-determinism (the same input can produce different outputs), latency (multi-step reasoning is slower than a CRUD read), cost (tokens add up), and a new failure mode (confidently wrong). For some products — especially ones where the workflow is genuinely fixed and well-understood — a few good bolt-on features are the right answer, and pretending otherwise is just fashion.
The decision rule: go AI-native when the core value of the product is something the model uniquely enables (synthesis, judgment, open-ended action), and stay feature-oriented when the model is a convenience on top of a fundamentally deterministic workflow.
Pitfalls
- Chatbot-as-strategy — bolting a chat panel on and calling it AI-native; the model has to be in the workflow, not beside it.
- Skipping the foundation — shipping capabilities before governance and observability exist; you’ll hit a wall at enterprise security review.
- All-or-nothing autonomy — flipping the whole product to autonomous instead of promoting capabilities up the maturity ladder individually.
- No provenance in the UI — hiding sources and reasoning, so users can’t verify and don’t trust the output.
- Model lock-in — wiring one provider’s SDK through your whole codebase instead of routing through a gateway; the frontier moves every few months.
- Ignoring the human-in-the-loop data — every approval and correction is training/eval signal; throwing it away is throwing away your moat.
How to adopt this
- Write down your product’s core value and ask: does it survive removing the model? That tells you whether AI-native is even the right call.
- Sketch the four layers for your product; identify which you’ll build vs buy.
- Put a model gateway in front of every model call from day one (no direct SDK calls in app code).
- Stand up observability (traces + cost) before shipping any capability.
- Ship your first capability at the assistive rung; instrument every human approval as eval data.
- Define, in advance, the eval bar a capability must clear to be promoted to co-pilot, then to autonomous.
- Put a guardrail gate between every model output and the user; never let unchecked output reach a customer.
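For the step about instrumenting human approvals, a minimal sketch of what "eval data" might look like follows. The `ReviewEvent` fields, the eval-case dictionary format, and `approval_rate` are all hypothetical; adapt them to whatever your evaluation harness expects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewEvent:
    """One human decision on one model suggestion (hypothetical schema)."""
    capability: str
    user_input: str
    model_output: str
    approved: bool
    corrected_output: Optional[str] = None  # what the human wrote instead, if anything


def to_eval_case(event: ReviewEvent) -> dict:
    """Turn a review into a labeled example: approved output, or the human's correction."""
    expected = event.model_output if event.approved else event.corrected_output
    return {
        "capability": event.capability,
        "input": event.user_input,
        "expected": expected,  # None when the human rejected without correcting
        "label": "accept" if event.approved else "reject",
    }


def approval_rate(events: list[ReviewEvent], capability: str) -> float:
    """Share of suggestions for a capability that humans approved as-is."""
    relevant = [e for e in events if e.capability == capability]
    if not relevant:
        return 0.0
    return sum(e.approved for e in relevant) / len(relevant)


events = [
    ReviewEvent("draft_reply", "Where is my order?", "It ships Monday.", True),
    ReviewEvent("draft_reply", "Cancel my plan", "Done.", False, "I've started the cancellation."),
]
print(approval_rate(events, "draft_reply"))
print(to_eval_case(events[1])["expected"])
```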
References
This recipe is the architectural frame for the rest of the site. The capability layer is the agent harness at scale; the promotion gates depend on the evaluation harness; and disciplined delivery of each capability uses spec-driven development. The reusable building blocks live in the cookbook.