R1-AQA
Xiaomi's DeepSeek-R1 Inspired Audio AI
R1-AQA, inspired by DeepSeek-R1, is an open-source audio question answering model from Xiaomi. It achieves state-of-the-art (SOTA) performance on the MMAU benchmark using reinforcement learning (GRPO).




Flowtica Scribe
Hi everyone!
Sharing R1-AQA, a new open-source audio question answering (AQA) model from Xiaomi. It takes a really interesting approach, inspired by DeepSeek-R1!
What's cool:
Audio Question Answering: It goes beyond simple transcription, letting you ask questions about an audio clip and get answers based on its content.
Reinforcement Learning (GRPO): They trained the model with Group Relative Policy Optimization (GRPO), a reinforcement learning method.
State-of-the-Art: It achieves top results on the MMAU Test-mini benchmark, beating models like GPT-4o and Gemini Pro.
Small Data: They did this with only 38,000 training samples, starting from Qwen2-Audio-7B-Instruct as the base model.
Open Source: Both the code and the model weights are available.
The use of reinforcement learning is particularly interesting. It seems like a very effective way to train these kinds of models, even with limited data.
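The core idea of GRPO can be sketched in a few lines. For each question, the policy samples a group of responses and scores each one with a reward. Each response's advantage is then its reward normalized by the mean and standard deviation of its own group, so no separate value (critic) model is needed. The sketch below assumes a simple rule-based reward for multiple-choice questions that checks an <answer>...</answer> tag. The tag format and the helper names choice_reward and group_relative_advantages are hypothetical illustrations, not R1-AQA's actual code. It also covers only the advantage computation, not the full clipped policy-gradient objective with its KL penalty.

```python
import re
from statistics import mean, pstdev


def choice_reward(response: str, correct_choice: str) -> float:
    """Hypothetical rule-based reward (an assumption, not R1-AQA's code):
    1.0 if the text inside <answer>...</answer> matches the correct option, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        # Malformed output with no answer tag earns nothing.
        return 0.0
    return 1.0 if match.group(1).strip() == correct_choice.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each response's reward by the mean and
    standard deviation of its group (all responses sampled for the same question)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against an all-equal group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to one hypothetical audio question.
    responses = [
        "<answer>Dog barking</answer>",
        "<answer>Car horn</answer>",
        "<answer>Dog barking</answer>",
        "I think it's a dog.",  # no <answer> tag, so the reward is 0
    ]
    rewards = [choice_reward(r, "Dog barking") for r in responses]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Responses that beat their group average get a positive advantage and are reinforced, while the rest are pushed down. If every response in a group earns the same reward, all advantages are zero and that question contributes no learning signal.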