
19 April 2026


AI Agent Architecture: Reference Patterns for Production Systems

Seven production-grade AI agent architecture patterns, when each works, when each breaks, and how to pick the right one for your use case. From an operator who ships them.

Analysis by Amjid Ali.

Every month a new blog post announces that “multi-agent architectures are the future”. Every month a new team ships one and discovers that what they actually needed was a carefully designed single agent. Every month an enterprise reference architecture gets posted, absorbed, then quietly abandoned when the team hits real production traffic.

This is the architecture reference I use when I scope agent work for clients. Seven patterns, what each does, when each earns its keep, and when each breaks. Grounded in shipping 55+ production agents across finance, HR, sales, operations, and voice, not a slide deck.

If you are still orienting, start with the primer and the platform comparison. This post assumes you are choosing a shape, not learning what an agent is.

The architecture question you should actually ask

Most teams start the architecture conversation with “should we use multi-agent?” That is the wrong first question. The right ones, in order:

  1. What decision are we asking software to make that we previously asked a human to make?
  2. Does the input shape, the decision, or the output shape change much between invocations?
  3. How many steps separate the trigger from the outcome, and do those steps require different specialist knowledge?
  4. What is the failure cost, and what is the recovery mechanism?

Those answers pick the pattern. Not the vendor brochure.

Pattern 1: Single-agent loop (the ReAct pattern)

Shape. One agent. One LLM. A tool list. A loop that reasons, picks a tool, calls it, observes the result, reasons again, until a stopping condition is met.

Code sketch (pseudocode):

loop:
  thought = llm(system_prompt + history + "what should I do next?")
  if thought.is_final:
    return thought.output
  tool_call = thought.tool_call
  result = execute(tool_call)
  history.extend([thought, tool_call, result])

When it earns its keep. Single-task automations. Booking appointments, processing invoices, answering support questions, running a qualification call. 70-80% of production agents I ship are this shape.

When it breaks. Long-horizon tasks with many required sub-steps where the agent forgets earlier context or loses focus. Tasks requiring fundamentally different specialist knowledge at different steps.

What to watch for. Loop budgets (cap the iterations), retry policy, structured stopping conditions, memory compaction as history grows.
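The loop budget and stopping condition can be made concrete. A minimal runnable sketch, where `fake_llm`, `execute`, and `max_iterations=8` are all illustrative stand-ins, not a real model or tool layer:

```python
# Single-agent loop with a hard iteration cap and a structured stop.
from dataclasses import dataclass

@dataclass
class Thought:
    is_final: bool
    output: str = ""
    tool_call: str = ""

def fake_llm(history):
    # Stub reasoner: finishes after it has seen two tool results.
    if sum(1 for h in history if h.startswith("result:")) >= 2:
        return Thought(is_final=True, output="done")
    return Thought(is_final=False, tool_call="lookup")

def execute(tool_call):
    return f"result of {tool_call}"

def run_agent(goal, max_iterations=8):
    history = [f"goal: {goal}"]
    for _ in range(max_iterations):          # loop budget: hard cap
        thought = fake_llm(history)
        if thought.is_final:                 # structured stopping condition
            return thought.output
        result = execute(thought.tool_call)
        history.append(f"tool: {thought.tool_call}")
        history.append(f"result: {result}")
    raise RuntimeError("loop budget exhausted")  # fail loud, not silent
```

The hard cap is the point: without it, a confused model can loop indefinitely, and the `RuntimeError` is what your retry policy and alerting hang off.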

Frameworks that implement this cleanly. Claude Agent SDK, OpenAI Agent SDK, Pydantic AI, Vapi for voice.

Pattern 2: Plan-and-execute

Shape. A planner agent produces a written plan. An executor agent runs the plan step by step. The executor can replan if a step fails.

plan = planner.create_plan(goal)
while not plan.complete:
  step = plan.next_step()
  result = executor.execute(step)
  if result.failed:
    plan = planner.replan(plan, step, result)

When it earns its keep. Multi-step tasks with predictable shape where writing down the plan improves reliability: research tasks, complex ticket handling, multi-system data migrations. Particularly useful when you want an auditable plan you can show to a human before execution.

When it breaks. Trivial tasks (the planning step adds latency and cost for no benefit). Highly dynamic tasks where the plan changes at every step (pattern 1 fits better).

What to watch for. Planner quality matters most. A bad plan is worse than no plan. Checkpoint plans with a human for high-stakes work.
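A runnable sketch of the replanning loop, with stub planner and executor; the work queue ensures replanned steps actually run, and the "flaky" step simulates a single failure:

```python
from collections import deque

def planner(goal):
    # Stub plan; a real planner would be an LLM call.
    return ["step-a", "step-b-flaky", "step-c"]

def replan(failed_step):
    # Stub repair; a real replanner sees the plan, the step, and the error.
    return [failed_step.replace("flaky", "fixed")]

def executor(step, attempts):
    if "flaky" in step and attempts[step] == 0:
        attempts[step] += 1
        return None                       # simulate a failed step
    return f"done: {step}"

def run(goal):
    queue = deque(planner(goal))
    attempts, results = {}, []
    while queue:
        step = queue.popleft()
        attempts.setdefault(step, 0)
        result = executor(step, attempts)
        if result is None:
            queue.extendleft(reversed(replan(step)))  # repaired steps run next
        else:
            results.append(result)
    return results
```

A human checkpoint for high-stakes work slots in naturally: surface `queue` for approval before the while loop starts.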

Frameworks. LangGraph implements this elegantly. Claude Agent SDK supports the pattern idiomatically with its thinking mode.

Pattern 3: Orchestrator-worker

Shape. An orchestrator agent receives the goal, decomposes it, and dispatches sub-tasks to specialist worker agents. Workers execute, return results, and the orchestrator assembles the final output.

subtasks = orchestrator.decompose(goal)
results = parallel([worker[s.specialist].execute(s) for s in subtasks])
output = orchestrator.assemble(results)

When it earns its keep. When the sub-tasks are genuinely specialist and parallelisable. Research pipelines (searcher + summariser + fact-checker + writer). Customer-support triage where specialists for billing, technical, and account work exist. Complex document processing (extract + classify + route).

When it breaks. When the “specialists” are actually doing similar work and a single well-designed agent would outperform them. Multi-agent debugging is genuinely harder than single-agent debugging.

What to watch for. Worker scope creep. If one worker starts doing 80% of the real work, you have a single-agent pattern dressed up.
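The fan-out step can be sketched with a thread pool; `decompose`, the worker table, and `assemble` are illustrative stubs, not any framework's API:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub specialists; real workers would be agents with their own tools.
WORKERS = {
    "search": lambda task: f"search({task})",
    "summarise": lambda task: f"summary({task})",
}

def decompose(goal):
    return [("search", goal), ("summarise", goal)]

def assemble(results):
    return " | ".join(results)

def orchestrate(goal):
    subtasks = decompose(goal)
    with ThreadPoolExecutor() as pool:    # genuine parallelism across workers
        futures = [pool.submit(WORKERS[kind], task) for kind, task in subtasks]
        results = [f.result() for f in futures]   # preserves subtask order
    return assemble(results)
```

The scope-creep check falls out of this shape: if one entry in `WORKERS` does most of the work, collapse back to pattern 1.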

Frameworks. CrewAI optimises for this pattern. LangGraph and the Anthropic Agent SDK both support it without forcing it.

Pattern 4: RAG-integrated agent

Shape. An agent whose primary job involves retrieving context from a knowledge base and reasoning over it. The retrieval is itself a tool call (or a set of tool calls), and the agent decides when to retrieve, what to retrieve, and how to combine results.

loop:
  query = agent.formulate_query(user_goal, history)
  context = retriever.search(query)
  response = agent.reason(user_goal, context, history)
  if response.needs_more_context:
    continue
  return response

When it earns its keep. Question-answering against documentation, policies, code, or domain literature. Customer support over a product knowledge base. Legal or compliance agents reasoning over regulations.

When it breaks. Tasks where retrieval is incidental (a RAG pipeline is overhead). Tasks where the knowledge base is stale (fix the data, not the agent).

What to watch for. Chunk strategy, embedding model choice, reranker quality, citation discipline. Hybrid (keyword + semantic) retrieval usually beats pure vector on enterprise content.
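One common way to combine keyword and semantic results is reciprocal-rank fusion. A minimal sketch, with hard-coded rankings standing in for real BM25 and vector-search output:

```python
def rrf(rankings, k=60):
    # Reciprocal-rank fusion: each list contributes 1 / (k + rank) per doc.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["policy-v2", "faq-7", "policy-v1"]    # stand-in BM25 ranking
semantic_hits = ["faq-7", "policy-v2", "guide-3"]     # stand-in vector ranking
merged = rrf([keyword_hits, semantic_hits])
```

Documents that both retrievers rank highly float to the top without any score normalisation, which is why this fusion is popular for hybrid enterprise retrieval.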

Frameworks. LlamaIndex Agents, LangGraph with retrieval nodes, any framework with a vector-store integration.

Pattern 5: MCP-centric agent

Shape. The agent’s tool layer is entirely MCP servers. The agent is thin; the intelligence about how to reach systems lives in the MCP servers. Tools are portable across agents and across models.

agent = create_agent(model, mcp_servers=[crm, calendar, support, audit])
agent.run(goal)

When it earns its keep. Enterprise deployments where tools need to be shared across many agents. Multi-model environments where the same tools must work with Claude, GPT, and Gemini. Regulated environments where tool-layer audit and access control matter.

When it breaks. Prototype or one-off agents where the MCP overhead is not earned back.

What to watch for. MCP server governance is non-trivial at scale; I wrote a handbook on it. Don’t build MCP servers as afterthoughts.

Frameworks. Any agent SDK in 2026 supports MCP natively. Claude Agent SDK is the most opinionated about it.

Pattern 6: Hierarchical multi-agent (the team pattern)

Shape. A manager agent supervises several sub-agents, which may themselves supervise further sub-agents. Tasks cascade down; results cascade up.

manager.delegate(task)
  sub_manager_1.delegate(subtask_a)
    worker_1a.execute()
    worker_1b.execute()
  sub_manager_2.delegate(subtask_b)
    ...
manager.aggregate(results)

When it earns its keep. Rare, in my experience. Occasionally legitimate for complex research or long-horizon programme-management-style work (code migration at scale, content production pipelines).

When it breaks. Most of the time. Debugging a hierarchy of agents is genuinely painful. Latency compounds. Cost compounds. The orchestrator-worker pattern (pattern 3) usually covers the same problem with less complexity.

What to watch for. “We need a team of agents” is often a hint that the problem is not yet well-decomposed. Resist.

Frameworks. CrewAI supports this explicitly. LangGraph supports it with sub-graphs.

Pattern 7: Event-driven continuous agent

Shape. The agent runs continuously, listening to a queue, a stream, an inbox, or a schedule. It triggers on events and operates in short bursts.

on event:
  context = build_context(event)
  action = agent.decide(event, context)
  execute(action)

When it earns its keep. IT operations (ticket auto-remediation), outbound voice campaigns, compliance monitoring, market-signal trading (with heavy human oversight), social-listening.

When it breaks. Tasks with ambiguous triggers (when should the agent act?) or tasks needing wide-context reasoning (single events rarely give enough context on their own).

What to watch for. Cost runaway. An always-on agent with a loose trigger can burn budget fast. Strict budgets, deduplication, and circuit breakers are not optional.
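Deduplication and a spend circuit breaker can be sketched together; the class shape, method names, and budget numbers here are illustrative:

```python
class EventGuard:
    """Admission control for an always-on agent: dedupe + hard spend cap."""

    def __init__(self, budget_cents):
        self.seen = set()
        self.spent = 0
        self.budget = budget_cents

    def admit(self, event_id, est_cost_cents):
        if event_id in self.seen:                      # deduplication
            return False
        if self.spent + est_cost_cents > self.budget:  # circuit breaker
            return False
        self.seen.add(event_id)
        self.spent += est_cost_cents
        return True

guard = EventGuard(budget_cents=100)
admitted = [guard.admit(e, 40) for e in ["evt-1", "evt-1", "evt-2", "evt-3"]]
# evt-1 admitted, its duplicate rejected, evt-2 admitted, evt-3 tripped the cap
```

The guard runs before the agent is invoked at all, so a loose trigger costs you a set lookup, not an LLM call.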

Frameworks. n8n excels here (scheduled triggers, webhook triggers, queue consumers). LangGraph for code-first implementations.

Picking by shape, at a glance

If your problem is… start with:

  • One specific automation, clearly scoped → Pattern 1 (single-agent loop)
  • Multi-step task with a useful written plan → Pattern 2 (plan-and-execute)
  • Genuine specialist decomposition → Pattern 3 (orchestrator-worker)
  • Question answering over a knowledge base → Pattern 4 (RAG-integrated)
  • Enterprise deployment with shared tools → Pattern 5 (MCP-centric), overlaid on 1-4
  • Complex programme with sub-programmes → Pattern 6 (hierarchical multi-agent), reluctantly
  • Always-on monitor or campaign runner → Pattern 7 (event-driven)

Note that patterns 5 and 7 usually overlay one of 1-4 rather than replace them. An MCP-centric single-agent loop is still a single-agent loop. An event-driven plan-and-execute agent is still plan-and-execute.

Cross-cutting concerns that apply to every pattern

Regardless of which pattern you pick, these seven concerns decide whether the agent actually works in production.

1. Identity and permissions

Every agent acts on behalf of somebody. Get the identity model right or security will veto at go-live. Interactive agents: user identity flows through to tools (OAuth 2.0 with PKCE, scoped tokens). Non-interactive: service principals with least privilege.
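Least privilege can be enforced as a scope check before every tool call. A sketch where the tool table and scope names are illustrative, not any identity provider's API:

```python
# Each tool declares the scopes it requires; calls are rejected when the
# principal's token lacks them.
TOOL_SCOPES = {
    "read_invoice": {"finance:read"},
    "issue_refund": {"finance:read", "finance:write"},
}

def authorise(tool_name, token_scopes):
    required = TOOL_SCOPES[tool_name]
    if not required <= set(token_scopes):   # subset check: least privilege
        raise PermissionError(f"{tool_name} needs scopes {sorted(required)}")
    return True

authorise("read_invoice", ["finance:read"])        # passes
# authorise("issue_refund", ["finance:read"])      # would raise PermissionError
```

The important property is that the check sits in the tool layer, not the prompt: a jailbroken agent still cannot call a tool its token does not cover.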

2. Memory architecture

Short-term memory (current task history), long-term memory (learned facts, user preferences, past outcomes), shared memory (across agents in the same org). Decide what lives where. Most failure modes I see are “the agent forgot” or “the agent remembered the wrong thing”.
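One common compaction strategy for short-term memory: keep the last few turns verbatim and summarise the rest. A sketch with a stand-in summariser (a real system would make an LLM call here):

```python
def compact(history, keep_last=4,
            summarise=lambda msgs: f"[summary of {len(msgs)} turns]"):
    # Collapse everything older than the last `keep_last` turns into one entry.
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    return [summarise(old)] + recent

history = [f"turn-{i}" for i in range(10)]
compacted = compact(history)
# → one summary entry followed by turn-6 … turn-9
```

Run it every N turns and the context window stays bounded while recent detail, which the model relies on most, survives verbatim.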

3. Observability

Every tool call, every argument, every result, every principal, every cost, every latency. OpenTelemetry. Shipped to your existing stack (Datadog, Grafana, Splunk, Azure Monitor). Build this on day one, not day 90.
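A stdlib-only sketch of the shape: wrap every tool call so that arguments, principal, and latency are emitted as structured JSON. In production this would be OpenTelemetry spans shipped to your tracing backend; the tool and principal here are illustrative:

```python
import json
import logging
import time

log = logging.getLogger("agent.tools")

def observed(tool, principal):
    # Wrap a tool so every invocation is recorded with its full context.
    def call(**kwargs):
        start = time.monotonic()
        result = tool(**kwargs)
        log.info(json.dumps({
            "tool": tool.__name__,
            "principal": principal,
            "args": kwargs,
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
        }))
        return result
    return call

def lookup_invoice(invoice_id):
    return {"id": invoice_id, "status": "paid"}

call = observed(lookup_invoice, principal="svc-finance-agent")
result = call(invoice_id="INV-42")
```

Because the wrapper sits between agent and tool, it captures everything regardless of which pattern the agent uses, which is why this belongs on day one.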

4. Cost control

Per-run budget, per-principal budget, per-tenant budget, hard ceilings. Agents loop. Without guards, one bad prompt burns a quarter’s spend.

5. Escalation to humans

Every production agent knows when to stop and call a human. Frustrated caller, high-value decision, compliance boundary, out-of-scope request. Build the escalation paths before the happy path.

6. Evaluation

You cannot improve what you do not measure. Structured evals on realistic inputs, ideally with LLM-as-judge for subjective qualities, regularly run, versioned like code. The teams that ship great agents treat evals the way normal teams treat unit tests.
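A minimal eval harness can be this small; the cases and stub agent are illustrative, and a real harness would add LLM-as-judge scoring for subjective qualities:

```python
def run_evals(agent, cases):
    # Exact-match scoring over fixed, versioned cases; returns pass rate.
    passed = sum(1 for question, expected in cases if agent(question) == expected)
    return passed / len(cases)

cases = [
    ("reset my password", "auth_flow"),
    ("where is my invoice", "billing_flow"),
]
stub_agent = lambda q: "auth_flow" if "password" in q else "billing_flow"
score = run_evals(stub_agent, cases)
```

Check `cases` into the repo, run the harness in CI, and fail the build when the score regresses, exactly as you would with unit tests.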

7. Governance

Audit trail, retention policy, PII handling, jurisdictional requirements (EU AI Act, Australian Privacy Act, sector-specific regs), model versioning. Write this down before the agent ships. Retrofit is expensive.

The architecture mistakes I see most often

Starting with multi-agent when single-agent would have worked. Multi-agent is not “more powerful”. It is more complex. Pick it when genuine specialisation exists, not because it sounds impressive.

Skipping the planner. A short explicit plan before a long sequence of tool calls usually dramatically improves reliability. Teams skip it because “it adds latency”. The latency you save is less than the retries you avoid.

Letting the agent see everything. Feeding the entire knowledge base, all documents, all tools, all history into every prompt. Context explodes, latency explodes, cost explodes, and the model gets dumber as context bloats. Retrieve what’s needed; drop the rest.

Treating guardrails as phase 2. Guardrails are part of the agent, not a wrapper. Bake them in from the first prototype or you ship something dangerous.

Building the platform before the agent. Teams spend six weeks building an “agent platform” before shipping any agent. Ship one agent. Learn. Generalise. Build the platform from the reality, not from the imagined reality.

Ignoring the cost envelope. “We’ll monitor costs” is not a cost control. Hard caps are. Circuit breakers are.

Coupling agents to models. If your agent only works on one model, you have a platform risk you did not need to take. MCP and clean tool contracts solve this.

Australian-specific architectural considerations

  • Data sovereignty. If your client contract says AU-resident, all seven patterns still work. Host the agent, the tools, the memory, and the logs in AU regions of your chosen cloud.
  • Privacy Act and consent. Call recordings, chat transcripts, and tool-call audit must respect consent notices given at the start. Bake this into the agent’s opening behaviour, not a privacy PDF on a separate page.
  • Sector regs. Financial services, healthcare, and critical infrastructure have sector-specific obligations. Your architecture needs to accommodate them without retrofit.

Reference stacks for common use cases

A few concrete combinations that I have shipped and would ship again.

Enterprise internal automation agent. Pattern 1 (single-agent loop) + Pattern 5 (MCP-centric). Claude Sonnet as the reasoner. MCP servers for each internal system (ERP, CRM, ticketing). n8n as the operator-facing control plane. Audit to Splunk. Deploy to AU-resident AWS.

Customer-facing support agent over a knowledge base. Pattern 4 (RAG-integrated) + Pattern 1. Claude Sonnet as reasoner, Haiku for cheap sub-tasks. LlamaIndex for retrieval. Tight scope (one product, one policy domain). Escalation to human chat in real time.

Voice reception agent for Australian business. Pattern 1 (single-agent loop) + Pattern 5 (MCP for CRM/calendar tools) + Pattern 7 (event-driven, trigger is an inbound call). Vapi + Twilio AU + Claude or GPT for reasoning. Escalation to Twilio warm transfer. Full transcripts + cost per call tracked.

Outbound lead qualification campaign. Pattern 7 (event-driven on a CRM list) + Pattern 1. Vapi + Twilio outbound. Claude for reasoning + tool calls. Qualification output (BANT-style) written back to CRM. Human handoff for qualified leads above a threshold.

Complex research agent. Pattern 2 (plan-and-execute) + Pattern 3 (orchestrator-worker) + Pattern 4 (RAG). Claude Opus for the plan, Sonnet for execution, Haiku for cheap sub-tasks. Search grounding on Gemini for current web data. Results assembled by the orchestrator.

What to do next

If you are scoping an agent for the first time: resist the multi-agent drum. Pattern 1 or Pattern 4 handles most first agents cleanly. Pick the simplest pattern that serves the goal. You can always evolve.

If you have an agent in production already: audit it against the seven cross-cutting concerns. Most of the real issues live in 1-7 above, not in the model choice or the framework.

If you are designing an agent platform for an org: pick MCP as the tool layer on day one, pick a single primary framework, pick a single primary observability stack, and pick a governance document that covers all seven cross-cutting concerns. The rest follows.

If you want help: I run agent architecture engagements that cover pattern selection, framework and model choice, governance, and the first agent build. Two-to-four-week discovery followed by a production build.

The best agent architecture is the simplest one that solves the specific problem you actually have. Everything above is how to figure out what simple looks like for your shape.


Further reading: What is an AI agent?, Best AI agent platforms and frameworks, The MCP Server Handbook for Enterprise, How we deployed 55 AI agents in production.

Frequently asked.

What is an AI agent architecture pattern?
An AI agent architecture pattern is a reusable design for how an agent receives inputs, makes decisions, calls tools, and produces outputs. Seven patterns recur in production: the single-agent loop (ReAct), plan-and-execute, orchestrator-worker, RAG-integrated, MCP-centric, hierarchical multi-agent, and event-driven continuous agents. Each has clear use cases and failure modes.
Single agent vs multi-agent, which is better?
Single agent wins for 80% of enterprise use cases. Multi-agent adds coordination overhead and more failure modes; it is worth it only when you have genuine role specialisation and can measure that the split improves quality or latency. Most 'multi-agent' systems we audit could be simplified to a well-designed single agent with better tools.
How do you choose the right agent architecture for an enterprise use case?
Start with the data flow: what triggers the agent, what does it decide, what does it need to call, what does it produce? If it is one scoped automation, use a single-agent loop. If it is multi-step with a useful written plan, use plan-and-execute. If it decomposes into genuine specialists, use orchestrator-worker. The pattern should fall out of the problem, not be chosen first.
