AVTR-1 Real-Time Open Weights Model - Generating uncanny AI avatars is now open source

The best real-time avatar model in the world is now open source with open weights. Take the model, tweak it, and use it at $0 cost. What's unique:
• Our model listens while you speak (full-duplex); the avatar reacts in real time, with minimal latency.
• Every frame is generated, avoiding annoying animation loops from pre-rendered playback.
• Full streaming infrastructure included so you can get started right away.

Congrats on this amazing release! I’ve been in the real-time avatar business for the past 20 years. My first real-time avatar engine, a Windows desktop app called Virtual Assistant Denise, was released in 2008, and finding out that you’re open-sourcing your framework is unbelievably good news, considering the high quality and competitiveness of your engine. You may right now be disrupting a very competitive market, where most big players will have to review their pricing. I have great respect for all the current avatar companies, as I’ve worked with most of them and know how skilled their people are and how much effort and investment it takes for them to stay competitive and stay alive. On the other hand, they need to look back at what companies like Unreal did to survive when Unity came along and disrupted the game market. Thanks to that, today we have small groups of people releasing amazing games! Congrats again on this intrepid move, as developers can now focus on creating the final product rather than on finding ways to pay for subscriptions. TTS, STT, LLMs, memory, and database frameworks were, until recently, in the hands of a few companies, and today they have become commodities. Your decision to open-source this engine is a big step toward democratizing this important piece of software and making it possible for everyone to build interactive human-machine products.

 Thank you so much! We are excited to see where this will go with our model available to everyone!

Open-sourcing the weights for a real-time avatar model is a much bigger deal than the headline suggests. The closed-stack incumbents in this space charge per minute and effectively gate experimentation to whoever can afford to burn API credits playing with use cases. I make finance educational content on Mod3Loop (YouTube), and the choice between "talking head on camera" and "avatar reading a script" has been completely blocked by per-minute cost economics for indie creators; free weights change that math entirely. Full-duplex listening while speaking is the unsexy part that actually makes these feel like presence rather than playback.

 Really appreciate this perspective and fully agree. The per-minute pricing model makes experimentation almost impossible for indie creators.

And yes, the active listening part is underrated. That’s what makes it feel like an actual interaction instead of just a generated video.

Full-duplex with "minimal latency" and every-frame-generated is the hard combo — what's the actual end-to-end latency, and on what GPU? Open weights only matter if a small team can self-host without a rack of H100s, so the number that decides adoption is "real-time on a single 4090" versus "real-time only on datacenter silicon." Which is it today?

 A decent gaming GPU is enough to actually have a conversation with the avatar. Here is more info from our GH repo

I’m curious if you guys have a preferred way that you want users to frame or position the avatars they deploy. In other words, do you want people to think they’re real, or should they always reveal up front that these are AI?

 Our goal isn’t to trick people into thinking they’re talking to a real human. It’s to make AI conversations feel more natural, emotionally responsive, and engaging.