OpenClaw is deliberately model-agnostic. The gateway, agents, skills, and channels all work the same regardless of which LLM sits underneath. That is one of the reasons to pick OpenClaw in the first place: you get to choose the model, and you get to swap it when a better one ships.
This is the practical guide to wiring up the four model families I see most often in production: Claude (Anthropic), GPT (OpenAI), Gemini (Google), and Ollama (local, open-source models). How to configure each, when to pick which, how to combine them sensibly, and what the cost and latency tradeoffs actually look like.
Canonical configuration details live at docs.openclaw.ai; this piece gives the operator context around those details. If you are still figuring out what OpenClaw is, start with the plain operator’s guide. If you are installing for the first time, the install guide goes first.
Why model flexibility matters
Three reasons, in order of how often they come up.
Model releases move the ceiling. Claude Sonnet gets better at tool calling; GPT-5 drops the price of a class of work; Gemini 2.x lands with a 2M-token context that finally handles your document pile. If your agent stack is welded to one provider, every release is either a happy day or a blocker. Model-agnostic means every release is just a knob to turn.
Cost and latency vary per task. A cheap small model for classification and routing, a premium model for reasoning, a local Ollama model for anything involving sensitive data that cannot leave your tenant. Single-provider lock-in makes this calculus harder.
Compliance and sovereignty. Some workloads must run on-prem or in a specific region. Ollama on your own hardware unlocks them. SaaS-only agent platforms cannot.
OpenClaw treats the model as a configuration value, not an architectural commitment. That is the point of this post.
The configuration pattern, at a glance
Conceptually, every model provider plugs into OpenClaw the same way:
- Pick the provider.
- Provide credentials (API key for cloud; endpoint URL for Ollama).
- Pick a specific model (e.g. `claude-sonnet-4.6`, `gpt-5`, `gemini-2.x-pro`, `llama3.1:70b`).
- Set sensible defaults (temperature, max tokens, timeout).
- Restart the gateway.
The canonical configuration file shapes are documented at docs.openclaw.ai. Walk this guide to understand what to put in them and why; walk the docs to see the exact field names for your installed version.
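As an illustration, here is what that pattern can look like in env-var form. Every field name below is illustrative, not OpenClaw's canonical schema; the docs have the exact names for your installed version:

```
# 1. provider  2. credentials  3. model  4. defaults — then restart the gateway
OPENCLAW_MODEL_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
OPENCLAW_MODEL=claude-sonnet-4.6
OPENCLAW_MODEL_TEMPERATURE=0.2
OPENCLAW_MODEL_MAX_TOKENS=4096
OPENCLAW_MODEL_TIMEOUT_SECONDS=60
```

Swapping providers is the same shape with different values, which is the whole point.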
Claude (Anthropic)
My default recommendation for most OpenClaw deployments. The reasons are covered in depth in Claude vs ChatGPT vs Gemini agents, but the short version is: best-in-class tool-calling reliability, long coherent context, MCP-native, and safety defaults that pass enterprise review.
What you need
- An Anthropic API key (or Claude via AWS Bedrock, or Claude via GCP Vertex AI if you want in-cloud hosting).
- A chosen model. In 2026 the standard picks are:
- Sonnet for most agent work (reasoning, tool use, general-purpose).
- Haiku for cheap high-volume tasks (classification, routing, simple transforms).
- Opus for the hardest reasoning where cost is a secondary concern.
The configuration
In your OpenClaw config or environment:
```
ANTHROPIC_API_KEY=sk-ant-...
```
Point the agent at a specific Claude model in the OpenClaw agent definition. Sonnet is a sensible default for a first agent; move to Haiku for high-volume sub-tasks once you have evals in place.
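A minimal sketch of that, assuming the same env-style config as above (the variable name is illustrative, not necessarily OpenClaw's actual field; check docs.openclaw.ai for your version):

```
# Pin an explicit model version rather than an alias like "latest",
# so upgrades happen when you choose them (see gotchas below)
OPENCLAW_AGENT_MODEL=claude-sonnet-4.6
```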
When Claude wins inside OpenClaw
- Agents with many tools where tool-calling reliability matters.
- Long-context agents (reading big documents or code).
- Agents that need to reason and plan, not just reply.
- Agents touching regulated data where refusal behaviour matters.
- Voice agents where the reasoning step is on the critical path of latency.
Claude gotchas
- Rate limits on new API keys. Anthropic ramps limits as you demonstrate usage. A brand-new key will not run a high-volume production agent on day one.
- Region choice. Bedrock in Sydney (`ap-southeast-2`) is the Australian-resident option. The native Anthropic API is US/EU hosted.
- Model deprecation. Pin specific model versions in config so upgrades are intentional, not a surprise.
GPT (OpenAI)
The most-deployed model family globally. Broad ecosystem, hosted tools, Agent Builder UI if you want to bypass OpenClaw for parts of the stack. Inside OpenClaw, GPT is a reliable general-purpose reasoner.
What you need
- An OpenAI API key (or GPT via Azure OpenAI in Australia East for AU residency).
- A chosen model:
- GPT-5 or equivalent current flagship for reasoning agents.
- GPT-5 mini or equivalent small model for cheap high-volume tasks.
- o-series for deep reasoning tasks where cost is not an issue.
The configuration
```
OPENAI_API_KEY=sk-...
```
Point the agent at the model in the OpenClaw agent definition. Same pattern as Claude: a flagship for reasoning, a small model for high-volume.
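If you route through Azure OpenAI instead, the conventional Azure environment variables look like the following (these are the names Azure's own SDK samples use; OpenClaw's exact fields are in docs.openclaw.ai):

```
# Azure OpenAI in Australia East for AU residency
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com
```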
When GPT wins inside OpenClaw
- Agents whose team already lives on OpenAI and has credits on file.
- Agents that benefit from OpenAI’s built-in tools, such as the OpenAI-hosted code interpreter, reached via function calling.
- Workloads where fine-tuning is valuable (OpenAI’s fine-tuning is mature).
- Voice agents using OpenAI’s realtime API (though usually I prefer Vapi on top of any model, see the voice shootout).
- Azure-hosted enterprise deployments.
GPT gotchas
- Cost can spike on reasoning-heavy tasks (o-series pricing is higher). Measure per-agent cost in production.
- Assistants API vs Responses API vs direct chat completions. OpenAI’s API surface has evolved. Use the one OpenClaw’s current version supports; do not hand-craft calls against a deprecated surface.
- Australia East availability. Azure OpenAI in Australia East sometimes lags global model releases by weeks. Plan upgrades around that delay.
Gemini (Google)
Google’s agent and multimodal story has tightened significantly through 2025 and 2026. Inside OpenClaw, Gemini shines on multimodal and long-context workloads.
What you need
- A Google AI Studio API key (or Gemini via Vertex AI in `australia-southeast1` for AU residency).
- A chosen model:
- Gemini 2.x Pro for reasoning agents that benefit from multimodal input.
- Gemini 2.x Flash for cheap high-volume tasks.
The configuration
```
GOOGLE_API_KEY=...
```
Or, if you are on Vertex AI, use the Vertex-specific configuration with a service account key; docs.openclaw.ai has the exact fields.
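A sketch of the Vertex path, assuming the standard GCP service-account variables (these are Google-standard names used by Google's own SDKs; OpenClaw's own field names may differ):

```
# Standard GCP application-default-credentials pattern
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=australia-southeast1
```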
When Gemini wins inside OpenClaw
- Agents processing PDFs, images, or video. Gemini is ahead of the others on multimodal.
- Agents reaching into Google Workspace (Docs, Sheets, Gmail, Drive). Native integration matters.
- Long-context agents pushing beyond 1M tokens (Gemini’s context is the longest).
- GCP-hosted deployments where Vertex AI is the path of least friction.
- Agents that benefit from Google Search grounding for current web information.
Gemini gotchas
- Safety filter surprises. Gemini’s safety filters can block legitimate business content (legal analysis, medical context) without obvious warning. Tune the filter thresholds at configuration time.
- Tool-calling reliability is improving fast but still trails Claude on complex tool graphs. Test, measure.
- Region availability. Vertex in `australia-southeast1` is the AU-resident path. Some newer Gemini model versions land there later than in the US/EU regions; pin your config to a version available in your region.
Ollama (local, open-source models)
The self-hosted option. Run Llama, Qwen, Mistral, DeepSeek, or any Ollama-supported model on your own hardware, no cloud API calls, no per-token billing. Privacy and sovereignty win here; raw capability usually loses.
What you need
- A machine that can actually run the model. For a usable 7B-class model: 16GB RAM minimum, 32GB comfortable. For 70B-class models: 64GB+ RAM, or a GPU with enough VRAM. Mac minis with M-series chips are genuinely good hosts for mid-sized models.
- Ollama installed.
brew install ollamaon Mac, the installer on Windows, orcurl -fsSL https://ollama.com/install.sh | shon Linux. - A pulled model.
ollama pull llama3.1:70bfor a substantial general model;ollama pull qwen2.5:32bfor strong multilingual and reasoning;ollama pull deepseek-v3for strong coding on capable hardware.
The configuration
Point OpenClaw at the Ollama endpoint. Default is http://localhost:11434. In Docker on Mac or Windows, use http://host.docker.internal:11434 to reach Ollama running on the host. On Linux Docker, use the Docker bridge IP or --network host.
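Before wiring OpenClaw to it, confirm the endpoint answers. Ollama's `/api/tags` endpoint lists the models you have pulled:

```
# From the gateway host: should return your pulled models as JSON
curl http://localhost:11434/api/tags
# From inside a Docker container on Mac or Windows
curl http://host.docker.internal:11434/api/tags
```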
When Ollama wins inside OpenClaw
- Strict data-sovereignty requirements where audio, transcripts, or sensitive documents cannot leave your infrastructure.
- Offline environments (air-gapped, intermittent connectivity).
- High-volume simple tasks where per-token cloud billing is uneconomic.
- Privacy-first personal or small-business deployments.
- Development and testing where burning cloud tokens for iteration is wasteful.
Ollama gotchas
- Quality gap. A 7B or 13B local model is meaningfully weaker than Claude Sonnet or GPT-5 on tool-calling and complex reasoning. A 70B model closes much of the gap; below that, expect more failures. Evaluate honestly.
- Concurrency limits. A single Ollama host processes one request at a time by default. Scale out across multiple Ollama instances for concurrent agents, or set `OLLAMA_NUM_PARALLEL` to allow parallelism if the model and RAM permit (example after this list).
- Latency. Usually higher than hosted models, especially on CPU-only hosts. Budget 2-5x the latency of a cloud API for comparable work on modest hardware.
- Model quality varies widely. Llama, Qwen, Mistral, and DeepSeek each have strengths. Test on your specific workload; do not pick based on Twitter posts.
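For reference, the parallelism knob mentioned above is an environment variable on the Ollama server itself:

```
# Allow up to 4 concurrent requests per loaded model, memory permitting
OLLAMA_NUM_PARALLEL=4 ollama serve
```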
The multi-model pattern
Mature OpenClaw deployments rarely use one model for everything. A pattern I ship often:
- Planner agent on Claude Sonnet or Opus. It decomposes the goal, writes the plan, and supervises.
- Worker agents on Claude Haiku or GPT-5 mini. They execute sub-tasks cheaply.
- Classification and routing on Ollama locally. Very high volume, very simple judgement, zero per-call cost.
- Multimodal document analysis on Gemini. When PDF or video content hits the queue.
OpenClaw’s agent definitions let you pick a model per agent. The gateway coordinates. The cost curve bends quickly in your favour once you are willing to think about which agent deserves which tier.
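A sketch of what that division of labour can look like in agent definitions. The YAML shape and field names here are illustrative, not OpenClaw's actual schema (the docs have that); the point is one model per agent:

```
agents:
  planner:
    provider: anthropic
    model: claude-sonnet-4.6     # decomposes the goal, writes the plan, supervises
  worker:
    provider: anthropic
    model: claude-haiku          # executes sub-tasks cheaply
  router:
    provider: ollama
    model: llama3.1:70b          # high-volume classification, zero per-call cost
  doc-analyst:
    provider: google
    model: gemini-2.x-flash      # multimodal PDF/video analysis
```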
Cost and latency tradeoffs
Rough ranges for a typical per-call unit of work (a few thousand tokens in, a few hundred out, one tool call):
| Model | Typical cost per call | Typical latency |
|---|---|---|
| Claude Haiku | Fractions of a cent | 1-2s |
| Claude Sonnet | 1-4 cents | 2-4s |
| Claude Opus | 5-15 cents | 4-8s |
| GPT-5 mini | Fractions of a cent | 1-2s |
| GPT-5 | 1-4 cents | 2-5s |
| Gemini Flash | Fractions of a cent | 1-2s |
| Gemini Pro | 1-3 cents | 2-4s |
| Ollama (local, 7B-13B, CPU) | ~$0 | 5-15s |
| Ollama (local, 70B, good GPU) | ~$0 | 3-6s |
Exact numbers drift. The shape does not. The two insights that matter in production:
- Small models are dramatically cheaper for simple tasks. Use them where the task permits.
- Latency is dominated by the model call, not OpenClaw. Tune model choice for latency before you tune anything else.
Structured outputs and tool-calling consistency
All four providers now support structured outputs (JSON schema or equivalent) and typed tool calls. In OpenClaw, prefer this pattern:
- Define tool schemas tightly, with descriptions that would help a model reading them cold.
- Require structured outputs when the agent is talking back to a downstream system.
- Never parse free-form text to extract structured data; make the model emit structured data directly.
Claude and GPT are excellent at this. Gemini is very good. Ollama varies by model; 70B-class models handle it reliably, smaller models less so.
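As an example of a tightly defined tool schema. The tool itself is hypothetical, and the wrapper varies per provider (Anthropic calls the schema `input_schema`, OpenAI calls it `parameters`), but the JSON Schema core is shared:

```
{
  "name": "log_expense",
  "description": "Record one business expense in the ledger. Call once per expense. Amounts are GST-inclusive cents.",
  "parameters": {
    "type": "object",
    "properties": {
      "category": {
        "type": "string",
        "enum": ["travel", "software", "meals", "other"],
        "description": "Expense category; use 'other' only when nothing else fits"
      },
      "amount_cents": {
        "type": "integer",
        "minimum": 1,
        "description": "Amount in cents, e.g. 2750 for $27.50"
      },
      "memo": {
        "type": "string",
        "description": "One-line human-readable note for the bookkeeper"
      }
    },
    "required": ["category", "amount_cents"],
    "additionalProperties": false
  }
}
```

Descriptions like these are what "help a model reading them cold" means in practice: the enum, the units, and the calling convention are all spelled out.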
Picking a model for a specific agent type
A rough map of what I reach for in practice:
- Accounting and finance agents. Claude Sonnet. Reasoning reliability and safety defaults matter.
- Customer support triage. GPT-5 mini or Claude Haiku. High volume, moderate judgement.
- Content research and drafting. Claude Sonnet for drafting, Gemini for web grounding when it helps.
- Code-generation agents. Claude Sonnet or Opus. Code quality leads.
- Document and PDF processing at scale. Gemini Flash or Pro. Multimodal strength.
- Classification and routing at very high volume. Ollama on a local host, or Haiku/Flash/mini cloud options.
- Sensitive or sovereign deployments. Ollama on your hardware, no cloud calls.
- Voice-agent reasoning layers. Claude Haiku or Sonnet for latency-sensitive work, paired with Vapi or LiveKit for the voice loop; see the voice agent guide.
Governance across providers
Regardless of which model sits under OpenClaw, three governance items apply:
- Data handling. Know what each provider does with your prompts and completions. Anthropic, OpenAI, and Google all offer data-handling terms that exclude training on your data; set those terms in your enterprise agreement rather than assuming defaults.
- Region and residency. Use Bedrock, Azure OpenAI, or Vertex in Australian regions for AU residency. Ollama for full control.
- Audit. OpenClaw’s gateway logs the prompt, the tool calls, the completion, and the identity. Ship those logs where your compliance team expects them. See AI governance on ISO 9001 for the pattern.
When to swap models
Good signals for a model swap:
- A new release materially outperforms the current model on your evals. Upgrade.
- Cost per outcome creeps up. Check whether a cheaper model does the task adequately.
- Latency regresses past what users tolerate. Check whether a faster model is now available.
- A compliance or data-handling requirement changes. Move to a provider or region that matches.
Bad signals for a model swap:
- Social media hype about a release. Wait for evals.
- A demo worked well. Demos are not your workload.
- Another team at another company picked it. Their constraints are not yours.
Decide on evidence, not noise.
Troubleshooting the most common provider issues
“Agent does not respond.” Usually a missing or wrong API key, or the key’s region does not match the model. Check logs for the provider’s error message.
“Agent responds slowly.” First suspect: the model call is slow (check the provider’s status page). Second: a tool inside the loop is slow. OpenClaw’s audit log shows which.
“Agent hallucinates tool calls.” Usually thin tool descriptions. Rewrite the tool schemas with clearer descriptions. Claude is the most tolerant; the others benefit most from the fix.
“Agent costs too much.” Move the high-volume sub-tasks to a small model. Measure. Repeat.
“Agent suddenly stops working after an upgrade.” Pin the model version in config. Do not let “latest” be your default.
What to do next
If you have the gateway running already: pick a primary model for your first agent. Claude Sonnet is the safe default; your context may say otherwise. Wire the credentials, point the agent definition at the model, restart the gateway, test.
If you are not running yet: the install guide goes first.
If you want a deeper treatment of model choice specifically: Claude vs ChatGPT vs Gemini agents: which to pick when.
If you want a guided end-to-end walkthrough: the Udemy course covers the model-wiring piece with live examples.
If you want help architecting a multi-model OpenClaw deployment: I run agent engagements that include model selection, cost modelling, and governance.
The best model under OpenClaw is the one whose strengths match your agent’s job, whose governance fits your context, and whose cost curve makes sense at your volume. Everything above is how to figure out which that is, faster than a three-week bake-off.
Further reading: What is OpenClaw?, How to install OpenClaw, Claude vs ChatGPT vs Gemini agents, The comprehensive OpenClaw 2026 guide. Canonical docs: docs.openclaw.ai. Source: github.com/openclaw/openclaw.
Disclosure: the link to the OpenClaw AI Agents Install and Setup Guide is a Udemy referral link. I may earn a commission if you enrol, at no extra cost to you.