
MiniCPM
Ultra-efficient on-device AI, now even faster
282 followers
MiniCPM 4.0 is a family of ultra-efficient, open-source models for on-device AI. It offers significant speed-ups on edge chips and strong performance, and includes highly quantized BitCPM versions.
This is the 6th launch from MiniCPM.

VoxCPM2
Launched this week
VoxCPM2 is a 2B open-source TTS model with 30-language support, 48kHz output, voice design from text alone, controllable voice cloning, and real-time streaming fast enough for production voice workflows.

Free

Flowtica Scribe
Hi everyone!
VoxCPM2 is the next-generation open-source audio model from the @MiniCPM family, and it continues their signature trait of incredible "capability density": all of these features are packed into a model with only 2B parameters!
Despite its highly compact size, the feature set it brings to the table is quite rare for an open-source release:
Voice Design: Instead of hunting for the perfect reference audio to clone, you can describe the voice directly in the prompt, e.g. "(A young woman, gentle and sweet voice) Hello world." The model then generates a completely novel voice on the fly.
Native 48kHz Output: It has a built-in super-resolution VAE, meaning no external upsamplers are needed to get studio-quality audio.
Controllable Voice Cloning: You can clone a voice from a short clip, but still steer the emotion, pacing, and style using text prompts.
Production-Ready: It hits a real-time factor (RTF) of ~0.13 for real-time streaming, meaning audio is generated several times faster than it plays back, and it is fully open-source under the Apache-2.0 license.
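For anyone unfamiliar with the RTF figure, here is a minimal sketch of the arithmetic, assuming the usual definition (time spent synthesizing divided by the duration of the audio produced). The function names are hypothetical helpers for illustration, not part of VoxCPM2, and the 0.13 default is simply the number quoted above, which will vary with hardware.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio.

    Values below 1.0 mean synthesis runs faster than playback.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.13) -> float:
    """Estimated compute time for a clip, given an RTF (0.13 is the quoted figure)."""
    return audio_seconds * rtf


if __name__ == "__main__":
    clip = 10.0  # seconds of speech to generate
    print(f"{estimated_synthesis_seconds(clip):.2f} s to synthesize {clip:.0f} s of audio")
    print(f"speed-up over playback: {1 / 0.13:.1f}x")
```

Under that definition, a 10-second clip would take roughly 1.3 seconds to generate, which is what leaves headroom for streaming.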
It is incredibly refreshing to see this level of controllable, high-fidelity audio hit the open-source ecosystem in such a lightweight package.
Try it out here!
@zaczuo Have you seen folks using it yet for quick custom podcast intros or branded voiceovers in marketing?
2B params delivering 48kHz + voice design + cloning is impressive capability density. As someone building an audio/video editing tool that relies on audio analysis for precise segment boundaries, I appreciate how much source quality matters.
Curious: how does VoxCPM2 handle multilingual switching within a single utterance — e.g. Japanese with embedded English terms?
Voice design from text prompts instead of hunting for a reference clip is the thing I didn't know I needed. "A tired middle-aged man reading terms of service" and it just... makes that? 2B parameters for this is wild. Will try it locally today.
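For trying prompts like this locally, a small sketch of assembling the voice-design string in the format quoted in the launch post: the voice description in parentheses, followed by the text to speak. `voice_design_prompt` is a hypothetical helper; it only builds the string, and how that string is passed to the model is not shown here because it depends on the VoxCPM2 API.

```python
def voice_design_prompt(description: str, text: str) -> str:
    """Build a "(voice description) text to speak" prompt string.

    The format follows the example in the launch post; this helper
    itself is illustrative and not part of any VoxCPM2 package.
    """
    description = description.strip()
    text = text.strip()
    if not description or not text:
        raise ValueError("both a voice description and text are required")
    return f"({description}) {text}"


if __name__ == "__main__":
    print(voice_design_prompt("A young woman, gentle and sweet voice", "Hello world."))
    print(voice_design_prompt(
        "A tired middle-aged man reading terms of service",
        "By continuing, you agree to the following conditions.",
    ))
```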