
Grok
The world’s smartest AI (according to Elon)
4.6•12 reviews•1.9K followers
Grok is a free AI assistant designed by xAI to maximize truth and objectivity. Grok offers real-time search, image generation, trend analysis, and more.
This is the 8th launch from Grok.
Grok Voice API
Launching today
Grok now offers standalone Speech-to-Text and Text-to-Speech APIs for developers. The new voice stack covers real-time and batch transcription, multispeaker diarization, multichannel audio, text formatting, expressive TTS with speech tags, multilingual support, and simple usage-based pricing.
Free Options
Hi everyone!
With the new transcription (Speech-to-Text) API now available, combined with their Voice Agent capabilities, it’s clear that @Grok is making a systematic push to capture the entire Voice AI ecosystem.
Looking specifically at the STT model, they have shipped a highly pragmatic feature set. It includes native WebSocket support for real-time streaming, built-in speaker diarization (a must-have for meetings), and intelligent text formatting that automatically handles numbers and currencies (it's cool and pretty useful in production!).
The pricing is also very aggressive: $0.10 per hour for batch and $0.20 per hour for streaming. xAI is once again putting some real price pressure on the market, isn't it?
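To make those rates concrete, here is a rough cost sketch in Python. It uses only the two per-hour figures quoted above; the function name `stt_cost_usd` is hypothetical, and it assumes billing scales linearly with audio duration, with no minimum charge, rounding, or volume tiers (none of which the launch post confirms).

```python
# Per-hour rates as quoted in the comment above (USD per hour of audio).
BATCH_USD_PER_HOUR = 0.10
STREAMING_USD_PER_HOUR = 0.20


def stt_cost_usd(audio_seconds: float, streaming: bool = False) -> float:
    """Estimate transcription cost for a given audio duration.

    Hypothetical helper: assumes purely linear, per-second proration
    of the hourly rate, which is an assumption, not documented billing.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    rate = STREAMING_USD_PER_HOUR if streaming else BATCH_USD_PER_HOUR
    return audio_seconds / 3600 * rate


if __name__ == "__main__":
    hours = 1_000
    batch = stt_cost_usd(hours * 3600)
    stream = stt_cost_usd(hours * 3600, streaming=True)
    print(f"{hours} h batch: ${batch:.2f}, streaming: ${stream:.2f}")
```

Under these assumptions, 1,000 hours of audio comes to roughly $100 in batch mode and roughly $200 when streamed, so streaming costs twice as much for the same volume.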
@zaczuo How are you all handling noisy real-world audio? Does the streaming hold up, or is batch still king for cleaner results?
Athena
@zaczuo Strong move on the pricing.
The multispeaker diarization built right into the STT is a nice touch; that's usually a painful separate step. How's the latency on the real-time streaming? Would love to see benchmarks vs. Whisper and Deepgram.
@zaczuo The pricing puts real pressure on Deepgram and the Whisper API. Curious about multilingual coverage: is speaker diarization accuracy consistent across languages, or is English still the primary target where the model performs best? That's usually where the gap shows up in production.
I've always appreciated the extent to which Grok can use voice for projects. Does the text-to-speech handle a wide range of accents fluently as well?