R1-AQA
Xiaomi's DeepSeek-R1 Inspired Audio AI
R1-AQA, inspired by DeepSeek-R1, is an open-source audio question answering model from Xiaomi. It achieves SOTA performance on MMAU using reinforcement learning (GRPO).




Hi everyone!
Sharing R1-AQA, a new open-source audio question answering (AQA) model from Xiaomi – and it's taking a really interesting approach, inspired by DeepSeek-R1!
What's cool:
🎧 Audio Question Answering: It goes beyond simple transcription, allowing you to ask questions and get answers based on the audio's content.
🧠 Reinforcement Learning (GRPO): They used a technique called Group Relative Policy Optimization (GRPO) – a type of reinforcement learning – to train the model.
🏆 State-of-the-Art: Achieves top results on the MMAU Test-mini benchmark, beating models like GPT-4o and Gemini Pro.
🌱 Small Data: They did this with only 38,000 training samples, using Qwen2-Audio-7B-Instruct as the base model.
🔓 Both the code and the model weights are available.
The use of reinforcement learning is particularly interesting. It seems like a very effective way to train these kinds of models, even with limited data.
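For anyone curious what the "group relative" part of GRPO means in practice, here's a minimal sketch of the core idea: sample several answers to the same question, score each one, and normalize every score against the group's mean and spread. This is not R1-AQA's training code. The function name, the 0/1 reward values, and the use of the population standard deviation are my own assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (hypothetical helper).

    `rewards` holds one score per sampled answer to the SAME question.
    Answers above the group mean get a positive advantage, answers
    below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps avoids division by zero when every answer got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: 4 sampled answers to one multiple-choice audio
# question, rewarded 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If all answers score the same, there is nothing to prefer within the group.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

The policy update then pushes up the probability of answers with positive advantage and pushes down the rest, so no separate value model is needed to provide a baseline.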
To try it yourself, upload an audio or video file here.