If you ask a vendor for agentic AI ROI data, you’ll get forecasts. If you ask an analyst, you’ll get ranges. If you ask a CFO who’s already paid for a stalled pilot, you’ll get a raised eyebrow.
There’s a reason the eyebrow goes up. RAND’s 2025 study put the AI-project value-failure rate at 80.3%. Beam.ai’s analysis found that 42% of enterprise AI projects show zero ROI, and another 18.1% deliver value but cannot justify their cost.
Put differently: more than 60% of enterprise AI projects currently return less than they cost. That’s the base rate your business case is fighting.
This piece is for finance and strategy readers. What actually pays back. What realistically doesn’t. How I built the 300% ROI outcome on a real P&L, and how you should frame your own.
What the 300% number actually means
Let me define what “300% ROI” means at the Oman conglomerate where we ran the AI Factory from 2021 to 2025. It’s not a forecast. It’s a retrospective calculation agreed between the Group CIO office and Group Finance.
Inputs to the numerator (value):
- Fully-loaded hours saved per role, measured by SOP analysis before and after automation, audited against time-tracking and payroll data.
- Error-reduction value (re-work hours avoided, compliance penalties avoided, customer-refund delta).
- Revenue impact where an agent directly accelerated a sales or collections cycle, booked to the relevant functional P&L.
- Inventory-carrying and working-capital impact where supply-chain agents improved stock-turn.
Inputs to the denominator (cost):
- Build cost: AI team headcount (partially allocated), platform subscriptions, model API spend, one-off integration cost.
- Run cost: ongoing model API spend, infrastructure, MCP server operation, human oversight hours, governance overhead.
- Opportunity cost: what the same headcount would have delivered on non-AI work.
Time horizon: three fiscal years. Earlier than that and you’re front-loading benefits against capex; later, and model-layer drift starts distorting the run-cost line.
The 300% figure is (value / cost) × 100 − 100, measured across those three years. It is not a pre-deployment projection; it is a post-deployment measurement.
35% operational cost reduction is the other headline. That’s a function-weighted average across the 10 departments where agents were deployed, measured against the pre-AI operating cost baseline of those functions. Some functions (finance, shared services) hit 50%+. Others (CX, marketing) hit 15–20%. 35% is the blended number.
These are real numbers from a real enterprise. They are also upper-quartile. The median agentic deployment will not hit 300%. More on that below.
What actually pays back
Four categories, in order of return predictability.
1. Direct labour substitution (highest, most predictable)
Agents that replace well-defined, high-volume, low-variance human work. Typical targets:
- Invoice reconciliation and three-way matching.
- Bank and sub-ledger reconciliation.
- Standardised customer queries (tier-1 service).
- Routine procurement tasks (PO creation, vendor onboarding, contract extraction).
- HR ticket triage and basic policy Q&A.
Typical payback: 6–12 months. ROI range: 180–400% over three years.
These are the easiest wins and should be the first cohort. Not because they’re glamorous, but because the unit economics are legible. Hours saved per run × runs per year × fully-loaded cost. Minus agent build and run cost. You can defend the number at board level in two slides.
2. Exception handling and quality uplift (strong, but longer payback)
Agents that catch errors, route exceptions, or pre-screen human decisions. The value is avoided re-work and avoided downstream cost, not eliminated headcount.
- Anomaly detection in financial close.
- Contract compliance checks.
- Insurance-claims triage.
- Regulatory-submission pre-review.
Typical payback: 9–18 months. ROI range: 150–250% over three years.
These are harder to build a business case for because “errors avoided” is a counterfactual. The discipline is to measure before you deploy: what’s the current error rate, what’s the cost per error? Then measure the delta. If you don’t baseline the pre-state, you can’t claim the value.
3. Revenue acceleration (variable, but potentially enormous)
Agents that accelerate revenue: lead qualification, outbound cadence management, cross-sell recommendation, collections prioritisation.
- Lead-to-opportunity velocity improvements.
- Collections days-sales-outstanding reduction.
- Upsell/cross-sell attach rate increase.
Typical payback: 6–24 months (high variance). ROI range: 100–600% (very high variance).
These require hard attribution discipline. A/B control groups or matched-pairs analysis. Revenue outcomes are sensitive to too many other variables to claim the lift without controlled measurement. When it works, it’s the category with the largest absolute dollar impact. When it doesn’t, it’s the category where “it feels like it’s helping” survives longest past the point where the agent should be retired.
4. Capability leverage (indirect, hardest to claim)
Agents that unlock work the business wasn’t doing at all: structured analysis on unstructured data, compliance monitoring at scale, market-sensing, internal knowledge retrieval.
Typical payback: 12–36 months, often claimed as “new capability” rather than traditional ROI. ROI range: effectively unboundable, which is why auditors push back on it.
This is real value but hard to claim in a business case. I’d treat it as a bonus tier: don’t lead a CFO proposal with it; lead with categories 1 and 2 and let category 4 accumulate.
The unit economics of an agent
A concrete model. Assumptions typical of a mid-market Australian enterprise deploying its first five agents through a managed-service partner.
Single agent, direct labour substitution case:
- Process volume: 24,000 runs/year (~100/business day)
- Hours saved per run: 0.25 (15 min)
- Fully-loaded labour cost: A$85/hour
- Gross annual value: 24,000 × 0.25 × 85 = A$510,000
Costs:
- Build cost: A$35,000 one-off (discovery, integration, evaluation harness, deployment)
- Run cost: A$18,000/year (model API, platform share, observability, ~10% human-oversight hours in first year dropping to 4% thereafter)
- Governance overhead: ~A$6,000/year (periodic review, evaluation runs, audit contribution)
Year 1 net: A$510k − (A$35k + A$18k + A$6k) = A$451k
Year 2+ net: A$510k − (A$18k + A$6k) = A$486k
3-year cumulative net value: A$451k + A$486k + A$486k ≈ A$1.42M
3-year cumulative cost: A$35k + (A$24k × 3) = A$107k
3-year ROI: (A$1.42M / A$107k) × 100 ≈ 1,330%
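The worked example above fits in a few lines. A minimal sketch, using only the assumptions already stated (volume, hours saved, rates, costs); the variable names are mine, not a pricing model:

```python
# Single direct-labour-substitution agent, figures in A$.
RUNS_PER_YEAR = 24_000
HOURS_SAVED_PER_RUN = 0.25            # 15 minutes
LOADED_RATE = 85                      # fully-loaded A$/hour
BUILD_COST = 35_000                   # one-off
ANNUAL_RUN_COST = 18_000 + 6_000      # run cost + governance overhead

# Gross annual value: hours saved per run x runs per year x loaded cost.
gross_value = RUNS_PER_YEAR * HOURS_SAVED_PER_RUN * LOADED_RATE

year1_net = gross_value - (BUILD_COST + ANNUAL_RUN_COST)
year2_net = gross_value - ANNUAL_RUN_COST        # build cost is one-off

cumulative_net = year1_net + 2 * year2_net       # three fiscal years
cumulative_cost = BUILD_COST + 3 * ANNUAL_RUN_COST

# Net value / cost x 100 is the same as (gross / cost) x 100 - 100.
roi_pct = cumulative_net / cumulative_cost * 100
print(f"3-year ROI: {roi_pct:.0f}%")             # ≈ 1330%
```

Swapping any assumption (volume, rate, run cost) and re-running is the whole point: the business case should be this legible.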
That number is going to look suspicious. Three reasons it’s real:
- The build cost is genuinely one-off. Production agents on a platform that’s already stood up cost a fraction of the first one.
- The run cost assumes standard small-to-mid-model usage, not frontier-model-only runs. Right-sizing the model tier is 70% of the cost optimisation.
- The volume is realistic for a common mid-market back-office process. If the volume is 10× lower, the ROI drops to roughly 40% over three years (holding the cost lines flat), and the agent is borderline.
That sensitivity is the honest part. The unit economics collapse at low volume: below ~2,000 runs/year for a 15-minute task, most agents lose money net of build cost. A scored process inventory exists to keep you from building those agents.
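That break-even claim is checkable. A sketch that sweeps volume under the worked example’s cost structure, holding build and run cost flat (in practice API spend falls somewhat with volume, so treat the low-volume figures as conservative):

```python
def three_year_roi(runs_per_year: int,
                   hours_per_run: float = 0.25,
                   rate: float = 85.0,
                   build: float = 35_000,
                   annual_run: float = 24_000) -> float:
    """3-year ROI % for a labour-substitution agent, costs held flat."""
    gross = runs_per_year * hours_per_run * rate * 3
    cost = build + annual_run * 3
    return (gross - cost) / cost * 100

# Sweep from the worked-example volume down to the collapse point.
for volume in (24_000, 2_400, 2_000, 1_600):
    print(f"{volume:>6} runs/yr -> {three_year_roi(volume):6.0f}% ROI")
```

The sweep shows the cliff: ROI goes negative a little below 2,000 runs/year for a 15-minute task, which is exactly why low-volume processes get screened out of the build queue.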
What kills the ROI in practice
Five patterns I’ve watched destroy otherwise-sensible business cases.
1. Wrong-model default
Teams default to the most capable frontier model for everything. That’s ~10–30× the run cost of a small model that would have done the job. The fix: model tiering by task complexity, measured, revisited quarterly as the open-source models close in.
2. No evaluation harness
Without an eval harness, you can’t swap models confidently. So you stay on the expensive one. Eval infrastructure pays for itself in the first model-swap cycle.
3. Under-priced human oversight
Plans assume 2% human oversight. Reality is 15% in year one, dropping to 5% by year two, 2–3% from year three. Pricing the human-in-the-loop cost honestly in years one and two is what separates defensible business cases from surprises.
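The size of that gap is easy to put a number on. A sketch with illustrative assumptions (a review rate expressed as a fraction of runs, a hypothetical 6-minute human check per reviewed run, the worked example’s volume and rate):

```python
RUNS_PER_YEAR = 24_000     # worked-example volume
REVIEW_MINUTES = 6         # assumed human check per reviewed run
LOADED_RATE = 85           # fully-loaded A$/hour

def oversight_cost(review_fraction: float) -> float:
    """Annual A$ cost of human-in-the-loop review at a given sample rate."""
    return RUNS_PER_YEAR * review_fraction * (REVIEW_MINUTES / 60) * LOADED_RATE

planned = oversight_cost(0.02)   # the optimistic plan: 2%
actual = oversight_cost(0.15)    # typical year-one reality: 15%
print(f"planned A${planned:,.0f}/yr vs actual A${actual:,.0f}/yr")
```

A 7.5× gap on a single cost line. Multiply it across five agents and the surprise is material; price it at 15% for year one and the case survives contact with reality.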
4. Build-run imbalance
Organisations budget for the build, not the ongoing operations. They then cut the ops line, governance degrades, an incident costs more than the annual savings, and the programme gets pulled. The healthy ratio is roughly 40% build / 60% run over a three-year horizon.
5. Over-attribution to AI
The revenue-acceleration category is especially prone to this. “Sales is up 12%, we deployed an AI agent, therefore AI = 12% lift.” No. Controlled measurement or the number is fiction. CFOs who’ve been burned once by over-attribution never trust the next business case.
How to write a defensible business case
Five principles.
- Categorise every claim. Direct labour, exception handling, revenue acceleration, capability leverage. Each with its own confidence level.
- Baseline the pre-state. Current cost, current error rate, current volume. No baseline, no claim.
- Price the run cost honestly. Include human oversight, evaluation runs, governance, model drift budget.
- Use three-year cumulative, not year-one. Year one is skewed by build cost; year three reflects operating reality.
- Publish the sensitivities. Volume down 50%, model costs up 2×, oversight stays at 10%. If the case survives plausible stress, it’s defensible.
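Those three stresses can be applied to the worked example in one pass. A sketch, not a planning tool; oversight is modelled here as a fraction of the hours the agent handles, billed at the loaded rate, which is an assumption of mine:

```python
def three_year_roi(runs, hours=0.25, rate=85,
                   build=35_000, model_run=18_000, governance=6_000,
                   oversight_fraction=0.0):
    """3-year ROI % with an explicit human-oversight cost line."""
    gross = runs * hours * rate * 3
    oversight = runs * hours * rate * oversight_fraction * 3  # assumption
    cost = build + (model_run + governance) * 3 + oversight
    return (gross - cost) / cost * 100

base = three_year_roi(24_000)
stressed = three_year_roi(
    24_000 // 2,              # volume down 50%
    model_run=18_000 * 2,     # model costs up 2x
    oversight_fraction=0.10,  # oversight stays at 10%
)
print(f"base {base:.0f}%, stressed {stressed:.0f}%")
```

Under these assumptions the stressed case still clears 200% over three years: the kind of number that survives an audit conversation, which is the test that matters.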
If you build the business case this way, your own CFO becomes an advocate. If you build it the vendor way (pre-loaded benefits, under-estimated run cost, no baseline, no sensitivities), you’ll be in the 60% that fails.
The blunt version
Agentic AI has real, large, measurable ROI in the categories it was designed for. The 300% outcome I’ve written about is not unique; plenty of well-run programmes hit it. But the base rate is what RAND and Beam say: most projects return less than they cost, because most projects skip the unglamorous inputs (process inventory, scored backlog, honest unit economics, evaluation, governance) that generate the upside.
If your AI business case would survive an hour with your own internal audit team, it’s probably real. If it wouldn’t, it’s an optimism document.
If you’d like an operator’s read on your AI business case before it goes to the board, book a discovery call. Or see the full 300% ROI case study.