ElevenLabs Launches Scribe v2 Realtime for Live Transcription

The AI audio company introduces a speech-to-text model that transcribes in under 150 milliseconds with industry-leading accuracy.

NewDecoded

Published Nov 15, 2025

6 min read

Image from ElevenLabs

Breaking Real-Time Barriers

ElevenLabs introduced Scribe v2 Realtime on November 11, 2025, positioning it as the most accurate low-latency speech-to-text model available. According to the company, the system achieves 93.5% accuracy across 30 commonly used European and Asian languages and delivers state-of-the-art accuracy in over 92 languages at roughly 150 ms latency. Built specifically for live environments such as voice agents, meeting assistants, and real-time captioning, the model targets applications that need near-instant responses.

Technical Advantages

On 500 hard samples containing background noise and complex information, ElevenLabs reports that Scribe v2 Realtime significantly outperforms competing models. The system uses what the company calls negative latency: it predicts upcoming words and punctuation before they are fully spoken, which reduces perceived delay. The model transcribes speech in under 150 ms across English, French, German, Italian, Spanish, Portuguese, and 90 other languages. Additional capabilities include automatic language detection, which allows mid-conversation language switching; voice activity detection; and manual commit controls for deciding when a transcript segment is finalized.
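To illustrate how voice activity detection and manual commit can interact, here is a minimal Python sketch. It is not ElevenLabs' implementation: it swaps in a simple energy-threshold detector, and the `SegmentBuffer` class, its method names, and the threshold values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


@dataclass
class SegmentBuffer:
    """Collects speech frames; closes a segment on silence or on request.

    Hypothetical helper: a plain energy threshold stands in for a real VAD.
    """

    energy_threshold: float = 0.01  # assumed tuning value
    max_silent_frames: int = 3      # assumed tuning value
    _frames: List[Sequence[float]] = field(default_factory=list, init=False)
    _silent_run: int = field(default=0, init=False)

    def push(self, frame: Sequence[float]) -> Optional[List[Sequence[float]]]:
        """Add a frame; return the finished segment if silence closed it."""
        if frame_energy(frame) >= self.energy_threshold:
            self._frames.append(frame)
            self._silent_run = 0
            return None
        if not self._frames:
            return None  # leading silence, nothing to close yet
        self._silent_run += 1
        if self._silent_run >= self.max_silent_frames:
            return self.commit()  # automatic commit after enough silence
        return None

    def commit(self) -> List[Sequence[float]]:
        """Manual commit: close the current segment immediately."""
        segment, self._frames = self._frames, []
        self._silent_run = 0
        return segment


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    buf = SegmentBuffer()
    for frame in (loud, loud, quiet, quiet, quiet):
        closed = buf.push(frame)
    print("frames in auto-committed segment:", len(closed))
```

In this sketch the caller can either wait for `push` to return a segment once enough silent frames accumulate, or call `commit` directly to finalize early, which is the kind of trade-off a manual commit control exposes.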

Enterprise Deployment

The model supports 11 Indian languages: Hindi, Tamil, Malayalam, Telugu, Gujarati, Kannada, Odia, Bengali, Marathi, Punjabi, and Sindhi. ElevenLabs has prioritized data localization with India data residency options, enabling organizations to deploy speech-to-text solutions in compliance with India's data regulations. The platform meets SOC 2, ISO 27001, PCI DSS Level 1, HIPAA, and GDPR compliance standards, with zero retention mode available for sensitive workloads.

Availability and Integration

Scribe v2 Realtime is available today through the ElevenLabs API and directly within ElevenLabs Agents. Developers can access the documentation to integrate the model into applications ranging from customer support voice agents to medical dictation systems. The release strengthens ElevenLabs' position in the conversational AI space following its recent $6.6 billion valuation.

ElevenLabs announced Scribe v2 Realtime on November 11, 2025, positioning it as the most accurate low-latency speech-to-text model available. The system delivers live transcription in under 150 milliseconds across 90+ languages, including 11 Indian languages, with 93.5% accuracy across 30 commonly used European and Asian languages.

Built for Real-World Conditions

Traditional speech recognition systems often struggle with background noise and diverse accents, so ElevenLabs tested Scribe v2 Realtime on 500 challenging samples containing complex audio environments. The model handles natural speech patterns including filler words, pauses, emotional cues, and low-quality audio while supporting complex domain terminology from medical and financial sectors. It features automatic language detection, allowing users to switch languages mid-conversation without manual reconfiguration.

Key Technical Features

The system introduces "negative latency" through next-word and punctuation prediction, anticipating probable words to enable seamless real-time transcription. Built-in Voice Activity Detection (VAD) segments audio precisely, while manual commit control gives developers full control over when to finalize transcript segments. It supports multiple audio formats including PCM (8-48 kHz) and μ-law encoding for telephony systems, with text conditioning that maintains transcription continuity after connection resets.
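For readers unfamiliar with μ-law, the following Python sketch shows the standard continuous companding formula with μ = 255, the value used in 8-bit telephony. It is a simplified illustration, not the exact G.711 codec (which uses a segmented approximation) and not ElevenLabs code; the function names are chosen here for illustration.

```python
import math

MU = 255  # companding parameter used by 8-bit mu-law telephony


def mulaw_compress(x: float, mu: int = MU) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y: float, mu: int = MU) -> float:
    """Inverse of mulaw_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode_sample(x: float) -> int:
    """Quantize a linear sample to one of 256 levels after companding."""
    return round((mulaw_compress(x) + 1.0) / 2.0 * 255)


def decode_sample(code: int) -> float:
    """Recover an approximate linear sample from an 8-bit code."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)


if __name__ == "__main__":
    for sample in (0.01, 0.3, -0.8):
        code = encode_sample(sample)
        print(sample, "->", code, "->", decode_sample(code))
```

The logarithmic curve spends more of the 256 available levels on quiet samples, which is why μ-law remains common on narrowband phone lines where each sample is stored in a single byte.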

Enterprise Ready and Available Now

Scribe v2 Realtime meets comprehensive compliance standards including SOC 2, ISO 27001, PCI DSS Level 1, HIPAA, and GDPR, with EU and India data residency options and zero retention mode for sensitive workloads. The system is immediately available through the ElevenLabs API and fully integrated into the ElevenLabs Agents platform for building voice assistants, meeting transcription tools, and real-time captioning systems. Real-world deployments are already showing results, with case studies demonstrating automated support systems handling 22,000 calls per month. The system's purpose-built architecture for agentic use cases positions it for applications requiring instant understanding and response capabilities.

Decoded Take

This launch signals a critical inflection point in conversational AI infrastructure. While companies like OpenAI and Google have focused on large language models for reasoning, ElevenLabs is doubling down on the audio layer that makes those models accessible through voice. With sub-150ms transcription, the bottleneck in voice agent conversations shifts from speech recognition to LLM inference, pressuring foundation model providers to match this speed. The emphasis on "agentic use cases" and real-world robustness suggests ElevenLabs sees voice agents moving from demos to production deployments, where handling accents, noise, and domain-specific terminology becomes table stakes. By achieving enterprise compliance and multilingual support at launch, ElevenLabs is positioning itself to capture the enterprise voice AI market before competitors can match its combination of latency and accuracy.
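The bottleneck argument can be made concrete with a rough latency budget. In the Python sketch below, only the 150 ms transcription figure comes from the article; the LLM and text-to-speech numbers are hypothetical placeholders, not measurements, and should be replaced with values from a real pipeline.

```python
from typing import Dict

STT_MS = 150  # upper bound quoted for Scribe v2 Realtime

# Hypothetical placeholder figures for illustration only, not measurements.
ASSUMED_LLM_FIRST_TOKEN_MS = 500
ASSUMED_TTS_FIRST_AUDIO_MS = 200


def latency_shares(stages: Dict[str, float]) -> Dict[str, float]:
    """Return each stage's fraction of the total turn latency."""
    total = sum(stages.values())
    return {name: ms / total for name, ms in stages.items()}


def bottleneck(stages: Dict[str, float]) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=lambda name: stages[name])


if __name__ == "__main__":
    stages = {
        "stt": STT_MS,
        "llm": ASSUMED_LLM_FIRST_TOKEN_MS,
        "tts": ASSUMED_TTS_FIRST_AUDIO_MS,
    }
    for name, share in latency_shares(stages).items():
        print(f"{name}: {share:.0%} of the turn")
    print("largest contributor:", bottleneck(stages))
```

Under these assumed numbers transcription is a minority of the total turn time, which is the sense in which faster speech recognition moves pressure onto the other stages.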
