19 April 2026

Voice AI

Retell vs Vapi vs LiveKit: The Voice AI Shootout for 2026

Honest operator comparison of Retell, Vapi, and LiveKit Agents. Pricing, latency, features, Australian support, and which one to pick for inbound, outbound, and self-hosted voice agents.

By Amjid Ali.

Three platforms now dominate the voice AI agent conversation: Vapi, Retell, and LiveKit Agents. Between them they handle the overwhelming majority of production voice deployments I see in Australian mid-market work. They look similar on the landing page. They behave very differently under a production call load.

This is the shootout: honest, written from the operator's seat, with no vendor allegiance, and grounded in builds I have actually shipped. If you need the background on what a voice agent is and what it costs to run, the voice agent build playbook is the prerequisite.

TL;DR verdict

  • Pick Vapi for most Australian businesses: best Twilio integration, cleanest developer experience, broadest model and voice choice, affiliate-friendly if you are building as an agency. Try it: Vapi.
  • Pick Retell if you want the most opinionated workflow builder and you value tightly-coupled call analytics over model flexibility.
  • Pick LiveKit Agents if you need to self-host, own the stack, or hit data sovereignty requirements that rule out a hosted orchestrator.

Everything else below is the detail that decides the choice at the margin.

Meet the contenders

Vapi

Launched in 2023, matured through 2024-2025. A fully hosted voice agent orchestrator: you bring the phone number (or buy one from them), your chosen LLM, your chosen TTS voice, and Vapi handles the real-time voice loop, turn-taking, interruption handling, barge-in, endpointing, and the plumbing to Twilio or other telephony. Developer-first API, assistant configuration via REST or their dashboard, strong webhook and function-call story.

Retell

Also launched in 2023. Hosted platform, similar scope to Vapi. Strong emphasis on low latency and a structured conversation flow builder. Tighter defaults, less flexibility on the model/voice combo in exchange for fewer knobs to tune. Good call analytics out of the box.

LiveKit Agents

Open source, built on LiveKit’s WebRTC infrastructure. Python and Node SDKs. Fully self-hostable, or use LiveKit Cloud for the transport. Model-agnostic by design: you wire up ASR, LLM, and TTS yourself. More work to stand up, more control once running.

Head-to-head at a glance

Dimension | Vapi | Retell | LiveKit Agents
Hosting | Fully hosted | Fully hosted | Self-host or LiveKit Cloud
End-to-end latency (typical) | 500-800ms | 500-900ms | 400-900ms depending on setup
Model flexibility | Very high (Claude, GPT, Gemini, Groq, custom) | Moderate | Full (you pick everything)
TTS voice flexibility | Very high (ElevenLabs, Cartesia, Deepgram, PlayHT, OpenAI) | Good | Full (you pick)
ASR | Deepgram, Whisper, AssemblyAI, Gladia | Tightly integrated | Full (you pick)
Telephony | Twilio, Vonage, Telnyx, SIP | Twilio, SIP | Bring your own via SIP
Phone numbers | Provisioned via Vapi or BYO | Twilio BYO | BYO
Function/tool calling | First-class | First-class | First-class
Workflow builder UI | Yes (late 2024 onward) | Yes, strong | No, code-first
Call analytics | Good | Very good | Build your own
Typical pricing (orchestration only) | ~US$0.05-0.10/min | ~US$0.07-0.10/min | Raw cost of your stack
Australian support | Twilio AU numbers, good | Twilio AU numbers, good | Fully your choice
SOC 2 / HIPAA | Yes | Yes | Your problem
Data sovereignty control | Limited (hosted) | Limited (hosted) | Full
Time to first working agent | Hours | Hours | Days

Exact pricing drifts; check the vendor pages. What does not drift is the relative shape: hosted platforms trade control for speed, self-hosted trades speed for control.

Where Vapi wins

Developer experience. Vapi’s API is the most consistent of the three. Create an assistant, assign a phone number, attach tools, go. The abstractions map cleanly to how you think about the problem: assistant, tool, voice, model. No surprises.

Model choice. You can swap between Claude 4.x, GPT-5, Gemini 2.x, Groq-hosted Llama, and several others without re-architecting. This matters when a model family has a bad week, or when a new release (faster, cheaper, better at your use case) lands.

TTS voice diversity. ElevenLabs, Cartesia, Deepgram, PlayHT, OpenAI voices, all first-class. Australian-accent voices are accessible. Voice cloning works. Vapi is the platform where voice brand-fit is easiest to get right.

Twilio integration quality. Vapi’s Twilio integration, particularly for AU numbers and SIP trunking, is the most production-ready of the three. The bits that matter at scale (warm transfer, call recording, DTMF fallback, webhooks on call events) all work cleanly.

Ecosystem and docs. More third-party tutorials, more n8n-community integrations, more YouTube walkthroughs. For an operator learning the platform, this compounds.

Affiliate programme. If you are building as an agency or reseller, Vapi has a clean referral programme. This post, for example, uses my referral link: Vapi.

Where it is merely OK: call analytics. Better than DIY, not as deep as Retell out of the box. Workflow-builder UX is newer and less polished than Retell’s.

Where Retell wins

Workflow builder UX. Retell’s visual flow builder is noticeably more thought-through than Vapi’s. If your team’s primary builder is a non-developer and the agent has branching conversation logic, this matters.

Latency consistency. Retell has invested in low-variance latency. The average number in the table is similar to Vapi’s, but the p99 on Retell tends to be better on the deployments I have compared side-by-side.
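To make the mean-versus-tail distinction concrete, here is a minimal sketch that summarises per-turn response latencies with a nearest-rank percentile. The sample values are invented for illustration and are not measurements from either platform; the function names are my own.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the mean, median, and p99 of a list of latencies in ms."""
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
    }


# Two hypothetical deployments with the same average latency.
steady = [700] * 100               # every turn answers in 700 ms
spiky = [650] * 98 + [3150] * 2    # mostly faster, with two very slow turns

steady_stats = latency_summary(steady)
spiky_stats = latency_summary(spiky)
print(steady_stats)
print(spiky_stats)
```

Both lists average 700 ms, but the second has a far worse p99. Callers notice the slow turns, which is why the tail is worth comparing alongside the average.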

Call analytics. Out of the box, Retell’s analytics are more operator-friendly. Conversion dashboards, call-type breakdowns, failure categorisation. Vapi gets there, but you wire more of it yourself.

Dispositioning and structured outputs. Retell’s schema for what a call “produced” at the end (qualified, booked, escalated, failed, with structured reasons) is slightly more opinionated and slightly more useful, especially for outbound campaigns.
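For readers who have not worked with end-of-call dispositions, the sketch below shows one way such a record could be modelled. It is an assumption-laden illustration: the class and field names are hypothetical and do not reproduce Retell's (or anyone's) actual schema; only the four outcome labels come from the paragraph above.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Outcome(Enum):
    QUALIFIED = "qualified"
    BOOKED = "booked"
    ESCALATED = "escalated"
    FAILED = "failed"


@dataclass
class CallDisposition:
    """Hypothetical end-of-call record: what the call produced and why."""
    call_id: str
    outcome: Outcome
    reason: str
    duration_seconds: int

    def __post_init__(self):
        # Escalations and failures are only useful if they say why.
        needs_reason = self.outcome in (Outcome.ESCALATED, Outcome.FAILED)
        if needs_reason and not self.reason.strip():
            raise ValueError(f"{self.outcome.value} calls need a reason")

    def to_json(self) -> str:
        payload = asdict(self)
        payload["outcome"] = self.outcome.value  # store the plain string
        return json.dumps(payload, sort_keys=True)


booked = CallDisposition("call-001", Outcome.BOOKED, "accepted Tuesday slot", 184)
print(booked.to_json())
```

Forcing a reason on failed and escalated calls is the design choice that makes later failure categorisation possible; an outcome label on its own tells you little about what to fix.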

Opinionated defaults. Retell gives you fewer choices, which means faster to a working agent if you don’t need the flexibility. Vapi gives you more choices, which means more power but also more tuning.

Where it is merely OK: model and voice flexibility. If you want to swap to a specific Claude version or a specific Cartesia voice, Retell is harder to bend than Vapi.

Where LiveKit Agents wins

Data sovereignty. Self-hosted, run anywhere. If Australian data sovereignty, sovereign-region requirements, or customer-specific hosting is in play, LiveKit is often the only sensible option.

Cost at scale. Once volume crosses a threshold (roughly 20,000+ minutes a month in my experience), the hosted platforms’ per-minute markup adds up. LiveKit’s cost is the raw cost of your ASR, LLM, TTS, and compute. Below the threshold, hosted wins on total cost including operator time. Above it, self-host wins.
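The threshold logic can be sketched as a simple break-even calculation: self-hosting removes the per-minute platform markup but adds a fixed monthly operating cost. Every number below is a placeholder I have assumed for illustration (the markup sits at the low end of the orchestration pricing in the table above; the fixed cost is invented), so substitute your own quotes before drawing conclusions.

```python
def monthly_cost_hosted(minutes, stack_rate, markup):
    """Hosted: you pay the underlying stack plus the platform markup per minute."""
    return minutes * (stack_rate + markup)


def monthly_cost_self_hosted(minutes, stack_rate, fixed_ops):
    """Self-hosted: the same stack cost per minute plus a fixed monthly
    cost for compute and engineering attention."""
    return minutes * stack_rate + fixed_ops


def break_even_minutes(markup, fixed_ops):
    """Monthly minutes at which both options cost the same."""
    if markup <= 0:
        raise ValueError("markup must be positive")
    return fixed_ops / markup


# Assumed placeholder figures, all in US$.
STACK_RATE = 0.06    # ASR + LLM + TTS per minute (assumption)
MARKUP = 0.05        # hosted orchestration fee per minute (assumption)
FIXED_OPS = 1500.0   # monthly self-host overhead (assumption)

print(f"break-even: {break_even_minutes(MARKUP, FIXED_OPS):,.0f} minutes/month")
for minutes in (10_000, 50_000):
    hosted = monthly_cost_hosted(minutes, STACK_RATE, MARKUP)
    self_hosted = monthly_cost_self_hosted(minutes, STACK_RATE, FIXED_OPS)
    print(f"{minutes:>6} min: hosted ${hosted:,.0f} vs self-hosted ${self_hosted:,.0f}")
```

The break-even point moves in direct proportion to the fixed cost, which is why an honest estimate of engineering time matters more than the per-minute rates.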

Full stack control. Every choice is yours. Specific Deepgram model, specific Llama fine-tune on your own inference, specific TTS provider, specific VAD settings. Granularity is a feature when you need it.

Multimodal. LiveKit was built for real-time video first. Voice + video agents (retail assistants, support with screen share, training simulators) are genuinely better on LiveKit than on Vapi or Retell.

Open source. No lock-in. No platform risk. If LiveKit disappeared tomorrow (unlikely, it is Series B funded with major customers), your agent still runs on your infrastructure.

Where it is merely OK: time-to-first-agent. Days, not hours. The learning curve is real. Operating costs include a real engineer’s attention, not just a monthly subscription.

Honourable mentions

Four entries worth naming even if they are not the main event:

Pipecat. Open source, Daily-backed, Python-first. A real competitor to LiveKit Agents for self-hosted voice. Lighter footprint, faster to prototype, somewhat less mature operationally. I often reach for Pipecat for spike/prototype work and LiveKit for production self-host.

ElevenLabs Conversational AI. ElevenLabs shipped a first-party voice agent platform in 2024 and has iterated fast. Beautiful voices (it is ElevenLabs), decent workflow tooling, tightly coupled to their TTS. Worth a serious look for consumer-facing voice apps where voice quality is the differentiator.

Bland AI. Developer-focused voice platform, strong on outbound campaign tooling and concurrency. Good if high-volume outbound is the primary use case and you want to skip building your own dialer on top of Vapi or Twilio.

Synthflow / Voicerr / Autocalls / Insighto. White-label layers built on top of Vapi or similar. Useful if you are building a reseller business and want the multi-tenant wrapper out of the box, rather than building it yourself on Vapi’s API.

Decision framework

Five questions, answered in order:

1. Hosted or self-hosted?

If your answer is self-hosted (data sovereignty, sovereign region, regulated industry, cost at volume), the choice is LiveKit Agents or Pipecat. Skip to question 4.

If hosted is fine, continue.

2. Developer-led or business-builder-led?

Developer-led, with code and APIs: Vapi or Retell. Vapi wins on flexibility and ecosystem.

Business-builder-led, with a visual workflow canvas: Retell wins on UX. Vapi’s builder is improving fast.

3. How important is model and voice flexibility?

Very important (you want to swap Claude for GPT-5 when it helps, or switch TTS voices for brand fit): Vapi. Moderate: Retell is fine. Low: either.

4. What telephony shape do you need?

AU numbers via Twilio with SIP trunking to an office PBX: Vapi’s integration is the cleanest I have deployed.

Global inbound/outbound with intensive outbound dialer logic: Vapi or Bland. Retell is adding capabilities but started inbound-first.

Self-hosted telephony, your own SIP trunk: LiveKit or Pipecat.

5. What is your volume?

Under 10,000 minutes/month: hosted wins every time. Pick Vapi or Retell by the answers above.

10,000-50,000 minutes/month: hosted still usually wins, but the economics get interesting.

Over 50,000 minutes/month: run the numbers properly. LiveKit or Pipecat self-hosted probably saves money, but only if you have an engineer dedicated to operating it.
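The questions above can be condensed into a small function. This is a deliberately simplified sketch of the framework rather than a complete encoding: it skips the telephony question, ends early on the self-hosted branch, and the parameter names and string values are my own assumptions.

```python
def shortlist(self_hosted, builder, flexibility, minutes_per_month):
    """Return (pick, notes) following the questions in order.

    self_hosted: True if hosting is ruled out by a hard requirement.
    builder: "developer" or "business".
    flexibility: "high", "moderate", or "low" need for model/voice choice.
    minutes_per_month: expected call volume.
    """
    notes = []

    # Question 1: hosted or self-hosted?
    if self_hosted:
        return "LiveKit Agents or Pipecat", ["hosted platforms ruled out"]

    # Questions 2 and 3: who builds, and how much flexibility is needed?
    if builder == "business":
        pick = "Retell"
        if flexibility == "high":
            notes.append("Retell is harder to bend on model and voice choice")
    elif flexibility == "high":
        pick = "Vapi"
    else:
        pick = "Vapi or Retell"

    # Question 5: volume.
    if minutes_per_month < 10_000:
        notes.append("hosted wins at this volume")
    elif minutes_per_month <= 50_000:
        notes.append("hosted usually wins; keep an eye on the economics")
    else:
        notes.append("run the numbers on self-hosting")

    return pick, notes


print(shortlist(False, "developer", "high", 8_000))
print(shortlist(False, "business", "low", 60_000))
```

Treat the output as a starting shortlist to test against a real build, in keeping with the advice not to bake off every platform first.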

Australian considerations

A few AU-specific notes that rarely appear in vendor comparisons.

AU phone numbers. All three platforms work with Twilio AU numbers (both mobile and landline). The Twilio integration quality differs; Vapi’s is the most polished in my builds. Vonage AU works on Vapi; Telnyx AU works on Vapi.

Latency from Australia. Vapi and Retell both have data-plane regions that can serve AU callers acceptably. Neither has an Australian-primary region at time of writing. Most of the end-to-end latency is in the model calls (ASR + LLM + TTS), and that latency depends on where each model you choose is hosted. Plan the regional choice end-to-end, not just the orchestrator.
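A rough latency budget helps show why the orchestrator's region is only part of the picture. The stage names and millisecond values below are assumptions chosen for illustration, not measurements; replace them with timings from your own call traces.

```python
def latency_budget(stages_ms):
    """Return the total latency and each stage's share of that total."""
    total = sum(stages_ms.values())
    if total <= 0:
        raise ValueError("stage latencies must sum to a positive number")
    shares = {name: ms / total for name, ms in stages_ms.items()}
    return total, shares


# Hypothetical single-turn budget for a caller in Australia.
stages = {
    "network_to_orchestrator": 120,
    "asr": 150,
    "llm_first_token": 350,
    "tts_first_audio": 130,
    "network_to_caller": 50,
}

total_ms, shares = latency_budget(stages)
model_share = shares["asr"] + shares["llm_first_token"] + shares["tts_first_audio"]

print(f"total: {total_ms} ms")
for name, share in shares.items():
    print(f"  {name}: {share:.0%}")
print(f"model calls: {model_share:.0%} of the turn")
```

With numbers shaped like these, moving the orchestrator closer to the caller trims only the two network stages, while the model stages still dominate the turn.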

Data sovereignty. If your contract requires AU-only data residency, hosted platforms are a problem. Call audio, transcripts, and metadata transit their infrastructure. Self-host on AWS ap-southeast-2 via LiveKit is the clean answer.

Privacy Act and consent. Recording calls in Australia has state-level nuances (two-party vs one-party consent). Your consent announcement and your recording retention policy matter regardless of platform. All three let you control recording; none of them write the consent policy for you.

Do Not Call Register. For outbound work, compliance with the register is a legal obligation, not a platform feature. Your list hygiene is your responsibility, not Vapi’s or Retell’s.

My pick for Australian SMB and mid-market

Vapi + Twilio AU is where I land for most Australian builds. The reasons:

  • Model choice lets me pick Claude or GPT per-use-case and swap fast when a better release ships.
  • TTS voice diversity lets me get Australian-accent brand fit.
  • Twilio integration is the most production-ready for AU numbers.
  • The affiliate programme is clean if I am building for an agency to resell.
  • Time-to-first-agent is hours, not days.

Where I will pick Retell instead: the client’s builder is a non-developer who will operate the workflow canvas day-to-day, and the call-type analytics dashboard is important enough to justify fewer model options.

Where I will pick LiveKit instead: data sovereignty is a hard requirement, or volume is high enough that hosted per-minute costs outweigh self-host engineering cost, or the use case is voice-plus-video.

The honest takes few comparison posts make

A few things I will say out loud that most shootout posts won’t.

“Latency” numbers in vendor marketing are often best-case. The 500ms figures are on a great day, with a great connection, with fast ASR and TTS, and a simple prompt. Real conversations land at 700-1,000ms most of the time. All three platforms are within the “feels human” window when tuned.

The orchestrator is rarely the bottleneck. Most bad voice agents are bad because the prompt is bad, the tools are flaky, or the escalation logic is missing. Switching from Vapi to Retell because “Retell has lower latency” almost never fixes a bad agent.

You will not make the wrong choice forever. The prompt and tool layer is mostly portable. Moving an agent from Vapi to Retell (or vice versa) is a one-to-two-week job, not a rewrite. Pick one, ship, and move later if you need to.

Voice quality matters more than framework. Spend an afternoon picking the voice (ElevenLabs Australian voices on Vapi, or Cartesia, or a cloned voice) before you spend a week comparing platforms. Your callers will remember the voice, not the orchestrator.

The evaluation bar is shifting. In 2024, “it works on the happy path” was acceptable. In 2026, production voice agents are evaluated on conversion rate, transfer rate, caller satisfaction, and cost per qualified outcome. Pick the platform whose analytics help you measure what matters.
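As a sketch of what measuring those outcomes could look like, the function below computes conversion rate, transfer rate, and cost per qualified outcome from a list of call records. The record fields and outcome labels are hypothetical, and treating both "qualified" and "booked" as conversions is my assumption; map them to whatever your platform's analytics export actually contains.

```python
CONVERTED = {"qualified", "booked"}  # assumption: both count as conversions


def campaign_metrics(calls):
    """Compute headline metrics from call records.

    Each record is a dict with an "outcome" string and a "cost_usd" float.
    """
    total = len(calls)
    if total == 0:
        raise ValueError("no calls to measure")
    converted = sum(1 for call in calls if call["outcome"] in CONVERTED)
    transferred = sum(1 for call in calls if call["outcome"] == "escalated")
    spend = sum(call["cost_usd"] for call in calls)
    return {
        "conversion_rate": converted / total,
        "transfer_rate": transferred / total,
        # Undefined when nothing converted, so report None rather than divide by zero.
        "cost_per_qualified_outcome": spend / converted if converted else None,
    }


sample_calls = [
    {"outcome": "booked", "cost_usd": 0.50},
    {"outcome": "qualified", "cost_usd": 0.40},
    {"outcome": "escalated", "cost_usd": 0.30},
    {"outcome": "failed", "cost_usd": 0.20},
]

metrics = campaign_metrics(sample_calls)
print(metrics)
```

Note that the cost of failed and escalated calls is included in the spend, so the cost per qualified outcome rises as the agent's hit rate falls.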

What to do next

If you are starting a voice agent build: try Vapi first. If it does not feel right within a day, try Retell. If neither fits, you probably need LiveKit. Do not bake off all three before building anything.

If you are operating a voice agent on one of these and considering a switch: measure before you move. The cost of a switch is rarely worth it unless a specific unmet requirement is causing a specific loss.

If you are building a voice-agent practice and want help getting the architecture right: I run AI Voice Agent engagements that include platform selection against your specific shape. Two-week discovery, three-to-four-week build, transparent pricing, Australian infrastructure.

The best voice platform in 2026 is the one where your agent sounds like your brand, answers every call, and writes back to your CRM without tripping over itself. All three in this post can do that. The difference is how long it takes you and how much you pay to get there.


Prerequisite: AI Voice Agents with VAPI and Twilio: The Build Playbook. Companion: Best AI Agent Platforms and Frameworks in 2026. Or jump to AI Voice Agent services if you want a build.

Disclosure: the links to Vapi in this post are referral links. If you sign up through them I may earn a commission at no extra cost to you. The recommendation is not biased by the link; it is biased by shipping on the platform.

Frequently asked.

Retell vs Vapi vs LiveKit, which voice AI platform should I pick?
Vapi for fastest time-to-production on inbound/outbound phone bots, best prompt ergonomics, strong Twilio integration. Retell for sharpest latency on outbound campaigns and the cleanest analytics. LiveKit for self-hosted, custom media pipeline, or WebRTC-first use cases. For most Australian mid-market voice work, Vapi is the default recommendation.
What does a production voice AI agent actually cost?
Rough 2026 numbers: US$0.07–0.15 per minute all-in (LLM + STT + TTS + telephony) on Vapi or Retell. A 5-minute average call costs 35–75 cents. For a 1,000-call-per-month line, that is roughly A$500–1,000/month in platform cost, plus engineering. Self-hosted LiveKit can go lower per minute but carries devops overhead.
Are voice AI agents suitable for Australian businesses?
Yes, but with specific choices: use Australian Twilio numbers, pick TTS voices tested on Australian accents (ElevenLabs and Cartesia both have acceptable options), and route STT through providers with strong Aussie English support. Compliance-wise, the Scams Prevention Framework and Privacy Act obligations apply, so build consent and disclosure into the call flow from day one.
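The per-minute arithmetic in the cost answer above can be reproduced in a few lines. The rates, call length, and call volume come from that answer; the exchange rate of 1.5 AUD per USD is my assumption for illustration only, so check the current rate before budgeting.

```python
def cost_range(minutes, low_rate, high_rate):
    """Return the (low, high) cost for a number of minutes at two per-minute rates."""
    return minutes * low_rate, minutes * high_rate


LOW_RATE_USD = 0.07       # all-in per-minute rate, low end
HIGH_RATE_USD = 0.15      # all-in per-minute rate, high end
AVG_CALL_MINUTES = 5
CALLS_PER_MONTH = 1_000
AUD_PER_USD = 1.5         # assumed exchange rate, not a quoted figure

per_call_usd = cost_range(AVG_CALL_MINUTES, LOW_RATE_USD, HIGH_RATE_USD)
monthly_usd = cost_range(
    AVG_CALL_MINUTES * CALLS_PER_MONTH, LOW_RATE_USD, HIGH_RATE_USD
)
monthly_aud = tuple(amount * AUD_PER_USD for amount in monthly_usd)

print(f"per call:  US${per_call_usd[0]:.2f} to US${per_call_usd[1]:.2f}")
print(f"per month: US${monthly_usd[0]:,.0f} to US${monthly_usd[1]:,.0f}")
print(f"per month: A${monthly_aud[0]:,.0f} to A${monthly_aud[1]:,.0f}")
```

At the assumed exchange rate the monthly range lands close to the rough A$500–1,000 figure quoted above, before any engineering cost.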
