April 29, 2026

Commotion Outperforms Other Leading Voice AI Providers in Independent Voice AI Benchmark

INDEPENDENT BENCHMARK REPORT

A rigorous blind human evaluation by Josh Talks Research placed Commotion ahead of three globally recognized providers across the dimensions that matter in real-world customer support voice AI.

Read the full independent benchmark report HERE.

The Evaluation: Rigorous, Blind, and Built for the Real World

When it comes to voice AI for customer support, synthetic demos and lab benchmarks tell only part of the story. The conditions that define the real world (telephony audio at 8 kHz, distressed callers, and complex resolution paths) are exactly where most providers struggle.

Josh Talks Research, a specialist in voice AI evaluation, conducted a fully blind, head-to-head human evaluation comparing Commotion Laya v1.5 against three (3) other leading commercial providers in a call center setting. The word “blind” is key: listeners had no knowledge of which model produced each audio sample, eliminating brand bias from the results entirely.

The scale of the study gives its findings statistical weight:

  • 10,546 preference votes
  • 332 unique evaluators
  • 8 languages
  • 10 industry use cases: insurance, banking, telecom, travel, healthcare, e-commerce, logistics, ride-hailing, broadband, and general support

Commotion supports over 40 languages. For this benchmark, Josh Talks Research selected eight: English, the most universally sought language in enterprise voice AI, alongside seven linguistically complex Indian languages: Hindi, Tamil, Malayalam, Marathi, Kannada, Telugu, and Bangla.

This combination is a deliberate stress test. The Dravidian and Indo-Aryan language families represented here are phonologically diverse, widely spoken, and notoriously difficult for Text-to-Speech (TTS) systems to handle well. A model that performs strongly across this range is demonstrating genuine linguistic sophistication, not just polished performance on a narrow set of easy inputs. In other words, a model that performs well across these eight languages is likely to perform well in other languages too.

All audio was rendered at 8 kHz to replicate true telephony conditions. Evaluation prompts were drawn from real CX-style workflows, grounded in specific intents and resolution paths such as claim rejection, delivery delay, and SIM activation. The evaluation was designed not to test “good-sounding text-to-speech” in the abstract, but to test which model sounds right when a distressed customer is on the other end of the line.

The Results: Commotion Consistently Outperformed All Three Competitors

The headline finding is unambiguous. On decisive votes, where participants expressed a clear preference, listeners chose Commotion Laya v1.5 in the main telephony benchmark 74% to 79% of the time.

To quote directly from the Josh Talks Research report: “In our evaluation, we observed that Commotion Laya v1.5 was preferred over all three competitors in the main telephony benchmark. On decisive votes, it won 78.7% against Cartesia Sonic v3, 74.3% against Sarvam Bulbul v3, and 79.0% against ElevenLabs Turbo 2.5.”

Source: Josh Talks Research, independent benchmark on Call Center CX TTS, April 2026
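For readers unfamiliar with the term, a decisive-vote win rate counts only the votes where a listener picked one of the two voices, and leaves ties out of the denominator. The short Python sketch below illustrates that arithmetic. The function name and the vote counts are hypothetical and chosen only for illustration; they are not taken from the Josh Talks Research report, and the report's exact computation may differ.

```python
def decisive_win_rate(wins: int, losses: int, ties: int = 0) -> float:
    """Share of decisive (non-tied) votes won by the first model.

    Hypothetical helper for illustration. `ties` is accepted only to make
    explicit that tied votes are excluded from the denominator.
    """
    decisive = wins + losses
    if decisive == 0:
        raise ValueError("no decisive votes to compare")
    return wins / decisive


# Made-up counts, not figures from the report.
rate = decisive_win_rate(wins=300, losses=100, ties=50)
print(f"{rate:.1%}")  # 75.0%
```

Because ties are dropped, a decisive-vote rate will generally be higher than the same model's share of all votes cast, which is worth keeping in mind when comparing figures across benchmarks.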

In the vote-based industry benchmark results, with tied votes excluded, the margin is even wider: Commotion received 83% to 85% of votes head-to-head against the three providers. These figures held consistently across every language included in each comparison, so the result was not driven by strength in one region or one language.

Why Commotion Outperformed: Reliability, Not Just Preference

Preference data tells you what listeners choose. Issue-rate data tells you why. In this benchmark, the competing models were tagged far more frequently for problems that directly impact customer experience in a live call:

  • Irregular pacing
  • Mispronunciation
  • Robotic delivery
  • Missing-word errors


Commotion Laya v1.5 carried an issue rate of approximately 31%, compared to more than 61% for the other providers. In other words, Commotion’s output was flagged for problems at roughly half the rate of its nearest rival. That reliability advantage is not cosmetic. In a live call-center environment, every mispronounced name, every robotic pause, and every dropped word affects customer trust.

Empathy: The Dimension That Separates Good from Great

Standard TTS benchmarks measure whether a voice sounds pleasant. This evaluation went further by testing whether Commotion Laya v1.5’s voice sounded appropriately human, considerate and emotionally aware in challenging situations: claims disputes, hospitalization follow-ups, and reassurance scenarios.

The empathy benchmark asked listeners not just which voice they preferred, but which one they perceived as more empathetic. Commotion Laya v1.5 outperformed on both measures.


The explanation, again, comes down to delivery breakdowns. When a model mispronounces a word mid-sentence or stumbles in pacing at a moment of high emotional weight, it doesn’t just sound wrong. It sounds uncaring. Commotion’s lower rate of delivery errors means its voice remains coherent and steady precisely when it matters most.

Consistent Across Every Use Case

One of the most operationally significant findings is the breadth of Commotion’s strong performance. The evaluation grouped results not only by language, but by support workflow type, covering the full range of scenarios a modern contact center handles.

A model that wins in one scenario but falters in another is a liability for any enterprise deploying voice AI across multiple lines of business. Commotion’s consistency across all ten use case categories indicates a model that is operationally ready for the full spectrum of customer interactions, not just the easy ones.

What This Means for Organizations Evaluating Voice AI

Voice AI is no longer a novelty. It is infrastructure. Every mispronounced policy number, every robotic pause during a claims call, every missed word on a payment confirmation is a moment where a customer’s confidence in your brand erodes. The providers that will win in this space are those that are reliable under real-world conditions, not just impressive in demos.

This benchmark, conducted by an independent specialist with no stake in the outcome, provides exactly that kind of real-world signal. Across 10,546 preference votes, 332 unique evaluators, 8 languages, and 10 industry use cases, Commotion Laya v1.5 came out ahead on every dimension tested: overall preference, industry-specific performance, and empathy recognition.

Consistently outperforming across use cases is a meaningful signal about where Commotion’s technology stands.

One further finding deserves a post of its own.

India is one of the most linguistically complex markets on earth, with 22 constitutionally recognized languages, dozens of major dialects, and code-switching patterns that defeat most voice AI systems. Commotion achieved favorable outcomes across all seven Indian languages tested in this evaluation. The implications of that result, for what it says about Commotion’s technology and its readiness for the world’s most demanding multilingual markets, are explored in our next post.

Disclaimer: The study reported in this blog post was commissioned by Commotion, Inc. and independently conducted by Josh Talks Research. The views and conclusions expressed in this blog post represent the views of Commotion, Inc. on the report. Commotion, Inc. is not responsible or liable for the content, interpretations, methodology, or conclusions presented in the report. Click [here] to access the report dated April 12, 2026, published by Josh Talks Research, to form your own views. All third-party trademarks belong to their respective owners. Source: “Call-Center CX TTS Benchmark: Commotion Laya v1.5 vs Sarvam Bulbul v3, Cartesia Sonic v3, and ElevenLabs Turbo 2.5,” Josh Talks Research, April 12, 2026.
