Voxtral Transcribe 2 February 5, 2026, 3:38 PM

Introducing Voxtral Transcribe 2: Revolutionizing Speech-to-Text Models

Introduction to Voxtral Transcribe 2

We just spotted an update from Mistral that's worth sharing with the community: the release of Voxtral Transcribe 2, a next-generation family of speech-to-text models aimed at voice applications across industries. Here's what caught our attention: state-of-the-art transcription quality, speaker diarization, and ultra-low latency.

Key Features of Voxtral Transcribe 2

The Voxtral Transcribe 2 family includes two models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Some of the key features of these models include:

* Precision diarization, real-time transcription, and a new audio playground
* State-of-the-art transcription quality with speaker diarization, context biasing, and word-level timestamps in 13 languages
* Ultra-low latency, with Voxtral Realtime delivering transcriptions with delay configurable down to sub-200ms
* Best-in-class efficiency, with industry-leading accuracy at a fraction of the cost
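To make diarization with word-level timestamps more concrete, here is a minimal Python sketch of how an application might fold per-word output into speaker turns. The `Word` shape, its field names, and the `group_by_speaker` helper are assumptions made for illustration; they are not the documented Voxtral response format, so check the API reference for the real schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical per-word record: text, timing in seconds, speaker label."""
    text: str
    start: float
    end: float
    speaker: str


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into a single turn."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return turns


if __name__ == "__main__":
    # Made-up sample data, not real model output.
    sample = [
        Word("Hi", 0.00, 0.20, "speaker_1"),
        Word("there", 0.25, 0.50, "speaker_1"),
        Word("Hello", 0.80, 1.10, "speaker_2"),
    ]
    for t in group_by_speaker(sample):
        print(f"[{t['start']:.2f}-{t['end']:.2f}] {t['speaker']}: {t['text']}")
```

Grouping by consecutive speaker label keeps the turn boundaries aligned with the word timestamps, which is usually what meeting notes or subtitle tooling needs.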

Implications and Use Cases

So, why does this matter for developers and the tech community? The implications of Voxtral Transcribe 2 are significant, with potential use cases including:

* Meeting intelligence, with multilingual recordings and speaker diarization
* Voice agents and virtual assistants, with sub-200ms transcription latency
* Contact center automation, with real-time transcription and sentiment analysis
* Media and broadcast, with live multilingual subtitles and minimal latency

Getting Started with Voxtral Transcribe 2

Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute and can be tried out in the new Mistral Studio audio playground. Voxtral Realtime is available via API at $0.006 per minute and as open weights on the Hugging Face Hub. With enterprise-ready features and open weights for Voxtral Realtime, the Voxtral Transcribe 2 family gives developers both a hosted and an open option for building voice applications.
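For a rough sense of what the per-minute pricing means at scale, here is a small Python sketch that multiplies audio minutes by the two API prices quoted above. The dictionary keys are labels invented for this example, not official API model identifiers, and the calculation assumes flat per-minute billing with no rounding, minimums, or volume discounts.

```python
from decimal import Decimal

# Per-minute API prices quoted in this post (USD).
# The keys are hypothetical labels, not official model identifiers.
PRICE_PER_MINUTE_USD = {
    "mini_transcribe_v2_batch": Decimal("0.003"),
    "realtime": Decimal("0.006"),
}


def estimate_cost(model_label, audio_minutes):
    """Return the estimated API cost in USD for a given amount of audio."""
    # Convert via str so float inputs such as 1.5 do not carry binary noise.
    return PRICE_PER_MINUTE_USD[model_label] * Decimal(str(audio_minutes))


if __name__ == "__main__":
    minutes = 1000 * 60  # 1,000 hours of audio
    for label in PRICE_PER_MINUTE_USD:
        print(f"{label}: ${estimate_cost(label, minutes):.2f}")
```

Under those assumptions, 1,000 hours of audio works out to about $180 with the batch model and about $360 with Voxtral Realtime.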