Applied GenAI

Architecting an AI-native product

Most 'AI features' are a chatbot bolted onto a CRUD app. An AI-native product is built the other way around — capabilities first, with the model in the critical path and the UI, data, and org reshaped to match. Here's the reference architecture.

System design · LLM platform · Any backend

#enterprise #architecture #product #patterns

flowchart TB
  subgraph Experience["Experience layer"]
    UI[Adaptive UI · chat, inline, ambient]
  end
  subgraph Capability["Capability layer"]
    ORC[Orchestrator / agent harness]
    EVAL[Eval + guardrail gate]
  end
  subgraph Knowledge["Knowledge & memory"]
    RET[Retrieval]
    MEM[Memory]
    TOOLS[Tools / actions]
  end
  subgraph Foundation["Foundation"]
    MODELS[Model gateway · multi-model]
    OBS[Observability · traces, evals, cost]
    GOV[Governance · authz, PII, audit]
  end
  UI --> ORC --> EVAL
  ORC --> RET & MEM & TOOLS
  ORC --> MODELS
  EVAL -.-> OBS
  ORC -.-> GOV
A four-layer AI-native architecture: experience on top, a capability layer (orchestrator + eval gate), a knowledge/memory/tools layer, and a foundation of model gateway, observability, and governance.

TL;DR

The problem: the bolt-on plateau

Almost every enterprise “AI initiative” starts the same way. There’s a working product — a CRM, a ticketing system, an analytics dashboard — and someone adds an “Ask AI” button. It opens a chat panel, sends the user’s question plus a system prompt to a single model, and pastes the answer back. The demo gets applause. Six weeks later, usage is flat.

The reason is structural, not a prompt problem. The chatbot sits beside the product, not inside its workflows. It can describe what the user could do, but it can’t do it. It has no durable memory of the user’s context, no permissions-aware access to the data that matters, no ability to take the action the user actually wanted, and no way for the product’s UI to respond to what the model figured out. It’s a smart intern with their hands tied behind their back.

This is the bolt-on plateau, and you can’t prompt your way off it. Getting past it requires changing the architecture so the model is a first-class participant in the core loop — which is what “AI-native” actually means.

flowchart LR
    subgraph BoltOn["AI-as-feature (bolt-on)"]
      direction TB
      A1[CRUD app] --> A2[New button: Ask AI]
      A2 --> A3[One prompt to one model]
      A3 --> A4[Paste text back into UI]
    end
    subgraph Native["AI-native"]
      direction TB
      B1[User intent] --> B2[Orchestrator decides]
      B2 --> B3[Retrieve, act, reason, verify]
      B3 --> B4[UI adapts to the result]
      B4 --> B2
    end
Left: a bolt-on where a CRUD app gets an Ask-AI button calling one model. Right: AI-native, where an orchestrator turns intent into retrieve/act/reason/verify and the UI adapts.

Feature vs capability: the distinction that organizes everything

The mental shift is from shipping features to exposing capabilities.

A feature is a fixed path: a button that does a predefined thing. You enumerate them, build each one, and the product is the sum of its features. A capability is a general power — “summarize any document the user can see,” “draft a response grounded in our policies,” “take a multi-step action on the user’s behalf and verify it.” One capability spans what would have been dozens of features, and it composes with the others.

| | Feature-oriented | Capability-oriented (AI-native) |
|---|---|---|
| Unit of work | a specific button/screen | a general power the model wields |
| How it scales | linearly: build each one | combinatorially: capabilities compose |
| Where logic lives | hardcoded in app code | in orchestration + model reasoning |
| What the UI does | shows fixed flows | adapts to the model’s output |
| What breaks it | edge cases you didn’t build | context the model can’t see |
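
To make "capabilities compose" concrete, here is a minimal Python sketch. Everything in it is hypothetical: `Capability`, `compose`, `summarize`, and `draft_reply` are illustrative names, and the lambdas are stand-ins for what would be model-backed calls in a real system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Capability:
    """A general power: a named function from text to text."""
    name: str
    run: Callable[[str], str]


def compose(*caps: Capability) -> Capability:
    """Chain capabilities so each one's output feeds the next."""
    def run(text: str) -> str:
        for cap in caps:
            text = cap.run(text)
        return text
    return Capability(name=" -> ".join(c.name for c in caps), run=run)


# Stand-ins for model-backed capabilities (hypothetical, not real model calls).
summarize = Capability("summarize", lambda t: t.split(".")[0].strip() + ".")
draft_reply = Capability(
    "draft_reply", lambda t: f"Thanks for reaching out. Summary: {t}"
)

# Two capabilities yield a third without any new "feature" code.
summarize_then_reply = compose(summarize, draft_reply)
result = summarize_then_reply.run("Invoice 12 is overdue. Customer asked twice.")
```

The point of the sketch is the shape, not the bodies: adding one capability adds every composition it can take part in, which is where the combinatorial scaling in the table comes from.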

The reference architecture

Four layers, each with a clear job. The diagram at the top of this recipe shows them; here’s what each one is responsible for and the decisions that matter.

Experience layer

The UI stops being a fixed set of screens and becomes adaptive: chat where conversation fits, inline suggestions where the user is already working, and ambient actions that happen without being asked. The key design principle is provenance — every AI-produced result carries where it came from (sources, confidence, what action it took) so the user can trust and verify it. An AI-native UI that hides its reasoning erodes trust the first time it’s wrong.
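
One way to enforce provenance is to make it part of the result type, so the UI cannot render an answer without it. The sketch below is an assumption about how that might look; the `Provenance` and `AIResult` types and the `render` helper are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Provenance:
    sources: list[str]   # document IDs or URLs the answer drew on
    confidence: float    # 0.0 to 1.0, however your evals define it
    actions_taken: list[str] = field(default_factory=list)


@dataclass
class AIResult:
    text: str
    provenance: Provenance  # required: no result without provenance


def render(result: AIResult) -> str:
    """Render a result so the user always sees where it came from."""
    prov = result.provenance
    lines = [
        result.text,
        f"Sources: {', '.join(prov.sources) or 'none'}",
        f"Confidence: {prov.confidence:.0%}",
    ]
    if prov.actions_taken:
        lines.append(f"Actions: {', '.join(prov.actions_taken)}")
    return "\n".join(lines)
```

Because `provenance` has no default, constructing an `AIResult` without it fails at the call site rather than silently producing an unattributed answer.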

Capability layer

The heart of the system: an orchestrator (a generalized agent harness) that turns intent into a sequence of retrieval, reasoning, and actions, and an eval + guardrail gate that checks every output before it reaches the user or the world. This layer is where “the model decides, the system disposes” lives. It’s also where most of your engineering effort will go, and rightly so.

Knowledge & memory layer

The model is only as good as the context you feed it. This layer provides retrieval over enterprise knowledge (permissions-aware — see the enterprise RAG approach), memory of the user and prior interactions, and tools/actions the model can invoke to actually do things. Without this layer you have a clever chatbot; with it you have a product.
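
"Permissions-aware" can be read as: filter by what the user may see before ranking, so restricted text never enters the model's context. The sketch below assumes a simple group-based ACL and uses naive keyword overlap in place of a real vector search; `Doc`, `allowed_groups`, and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def retrieve(query: str, docs: list[Doc], user_groups: set[str], k: int = 3) -> list[Doc]:
    """Return up to k docs the user may see, ranked by keyword overlap."""
    terms = set(query.lower().split())
    # Filter BEFORE ranking: documents the user cannot see are never scored.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]
```

Filtering after ranking would also hide the documents from the final list, but it risks leaking restricted content through scores, snippets, or logs, which is why the filter comes first here.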

Foundation layer

The unglamorous platform that everything depends on: a model gateway that lets you route across models (and swap them as the frontier moves), observability for traces, evals, and cost, and governance for authorization, PII handling, and audit. Enterprises buy or reject products on this layer. Skimp on it and you’ll fail security review no matter how good the demo was.
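
As an illustration of routing across models, here is a small in-process gateway with ordered fallbacks. It is a sketch under stated assumptions: `ModelGateway` and its methods are hypothetical, and each "model" is just a callable, standing in for whatever hosted or self-run model client you use.

```python
from typing import Callable

ModelFn = Callable[[str], str]


class ModelGateway:
    """Routes a task to a model by name, with ordered fallbacks."""

    def __init__(self) -> None:
        self._models: dict[str, ModelFn] = {}
        self._routes: dict[str, list[str]] = {}

    def register(self, name: str, fn: ModelFn) -> None:
        self._models[name] = fn

    def route(self, task: str, model_names: list[str]) -> None:
        """Set the ordered list of models to try for a task."""
        self._routes[task] = model_names

    def complete(self, task: str, prompt: str) -> tuple[str, str]:
        """Try each configured model in order; return (model_name, output)."""
        errors: list[str] = []
        for name in self._routes.get(task, []):
            try:
                return name, self._models[name](prompt)
            except Exception as exc:  # a real gateway would catch narrower errors
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"no model succeeded for task {task!r}: {errors}")
```

Callers ask for a task, not a model, so swapping or reordering models is a change to the route table rather than to application code.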

How a single request flows

sequenceDiagram
    participant U as User
    participant O as Orchestrator
    participant K as Knowledge (retrieval/memory)
    participant M as Model gateway
    participant G as Eval + guardrail gate
    U->>O: Intent (typed, clicked, or ambient)
    O->>K: Gather context (permissions-aware)
    K-->>O: Grounded context
    O->>M: Reason / plan / act
    M-->>O: Proposed output or action
    O->>G: Check (policy, PII, confidence)
    G-->>O: Approve / block / escalate
    O-->>U: Result + provenance
A request flows from user intent through the orchestrator, which gathers permissions-aware context, calls the model, checks the output at a guardrail gate, and returns a result with provenance.

Notice what’s not in that flow: the user picking a feature. They express intent; the orchestrator composes the capabilities. Notice also that the guardrail gate is non-optional and sits between the model and the user — in an enterprise, an unchecked model output reaching a customer is an incident waiting to happen.
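
The sequence above can be sketched as a single function with the gate in the only path to the user. This is a minimal illustration, not a definitive implementation: `handle`, `GateDecision`, and `simple_gate` are hypothetical names, the dependencies are injected as plain callables, and the email regex is a deliberately crude stand-in for real PII detection.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateDecision:
    verdict: str  # "approve", "block", or "escalate"
    reason: str = ""


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def simple_gate(output: str) -> GateDecision:
    """Toy guardrail: block anything that looks like an email address."""
    if EMAIL.search(output):
        return GateDecision("block", "possible PII (email address)")
    return GateDecision("approve")


def handle(
    intent: str,
    gather: Callable[[str], list[str]],
    model: Callable[[str, list[str]], str],
    gate: Callable[[str], GateDecision],
) -> dict:
    """One pass through the flow: gather context, call the model, gate the output."""
    context = gather(intent)
    proposed = model(intent, context)
    decision = gate(proposed)
    if decision.verdict == "approve":
        return {"status": "ok", "text": proposed, "sources": context}
    if decision.verdict == "escalate":
        return {"status": "needs_review", "text": proposed,
                "sources": context, "reason": decision.reason}
    # Blocked output never reaches the user; only the reason is returned.
    return {"status": "blocked", "reason": decision.reason}
```

There is no code path from `model` to the return value that skips `gate`, which is the structural version of "non-optional".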

Don’t build the autonomous version first

The single most common way AI-native products fail is overreaching — shipping autonomy before the evals and guardrails exist to make it safe. The way through is a maturity ladder, where each rung is unlocked by measurement, not optimism.

flowchart LR
    C[Crawl: assistive<br/>suggest, human approves] --> W[Walk: co-pilot<br/>drafts + acts with review]
    W --> R[Run: AI-native<br/>autonomous within guardrails]
    classDef now fill:#f5e7e3,stroke:#b1361e;
    class R now;
A maturity ladder: crawl (assistive, human approves), walk (co-pilot, drafts and acts with review), run (AI-native, autonomous within guardrails).

Crawl — assistive. The model suggests; the human always approves before anything happens. Low risk, immediate value, and — crucially — every approval/rejection is a labeled data point for your evals.

Walk — co-pilot. The model drafts and can take reversible actions, with the human reviewing. You graduate to this rung per capability, only once its eval scores clear a bar you set in advance.

Run — AI-native. The model acts autonomously within guardrails for capabilities where your evals show it’s more reliable than the human baseline, with monitoring and the ability to roll back.
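
A promotion gate for the ladder might look like the following sketch. The threshold value and the names (`Rung`, `WALK_BAR`, `next_rung`, `approval_rate`) are assumptions for illustration; the real bars are whatever you commit to in advance for each capability.

```python
from enum import IntEnum


class Rung(IntEnum):
    CRAWL = 1  # assistive: human approves everything
    WALK = 2   # co-pilot: reversible actions with review
    RUN = 3    # autonomous within guardrails


# Hypothetical bar, set in advance; pick your own per capability.
WALK_BAR = 0.90


def next_rung(current: Rung, eval_score: float, human_baseline: float) -> Rung:
    """Promote at most one rung, and only when measurement clears the bar."""
    if current == Rung.CRAWL and eval_score >= WALK_BAR:
        return Rung.WALK
    if current == Rung.WALK and eval_score > human_baseline:
        return Rung.RUN
    return current


def approval_rate(labels: list[bool]) -> float:
    """Crawl-stage approvals (True) and rejections (False) double as eval labels."""
    return sum(labels) / len(labels) if labels else 0.0
```

Promotion is evaluated per capability, so one capability can be at Run while another is still at Crawl.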

Build vs buy, layer by layer

You won’t build all four layers from scratch, and you shouldn’t. A rough heuristic:

| Layer | Default | Rationale |
|---|---|---|
| Foundation (models) | Buy | Model gateways and hosted models are commodities; don’t build a model. |
| Foundation (governance/obs) | Buy or adopt | Mature tools exist; integrate rather than reinvent audit/PII. |
| Knowledge & memory | Mix | Buy vector infra; build the retrieval logic specific to your data and permissions. |
| Capability | Build | This is your product’s differentiation: the orchestration and evals are yours. |
| Experience | Build | The adaptive UX is where users feel the difference; own it. |

The pattern: buy the undifferentiated foundation, build the capability and experience layers that are actually your product.

Trade-offs and honest costs

AI-native is not free. The capability and foundation layers are real systems that need real engineering. You take on non-determinism (the same input can produce different outputs), latency (multi-step reasoning is slower than a CRUD read), cost (tokens add up), and a new failure mode (confidently wrong). For some products — especially ones where the workflow is genuinely fixed and well-understood — a few good bolt-on features are the right answer, and pretending otherwise is just fashion.
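
The latency and cost terms are easy to estimate up front for a multi-step request. The sketch below is back-of-the-envelope arithmetic only; `Step` and `estimate` are hypothetical names, the steps are assumed to run sequentially, and any prices you pass in are your own inputs, not real vendor rates.

```python
from dataclasses import dataclass


@dataclass
class Step:
    input_tokens: int
    output_tokens: int
    latency_s: float


def estimate(
    steps: list[Step], usd_per_1k_in: float, usd_per_1k_out: float
) -> tuple[float, float]:
    """Return (cost in USD, total latency in seconds) for sequential steps."""
    cost = sum(
        s.input_tokens / 1000 * usd_per_1k_in
        + s.output_tokens / 1000 * usd_per_1k_out
        for s in steps
    )
    latency = sum(s.latency_s for s in steps)
    return cost, latency
```

Multiplying the per-request cost by expected request volume, and comparing the summed latency with what a plain CRUD read takes in your system, gives a first check on whether a capability is affordable before it is built.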

The decision rule: go AI-native when the core value of the product is something the model uniquely enables (synthesis, judgment, open-ended action), and stay feature-oriented when the model is a convenience on top of a fundamentally deterministic workflow.

Pitfalls

How to adopt this

References

This recipe is the architectural frame for the rest of the site. The capability layer is the agent harness at scale; the promotion gates depend on the evaluation harness; and disciplined delivery of each capability uses spec-driven development. The reusable building blocks live in the cookbook.

Mohit Mittal
Writes Applied GenAI — practical recipes for building with generative AI. Code lives in the cookbook.