Pop - Everyday messaging, voice first

Pop makes voice notes first class in everyday messaging. Amazing transcripts, a magic editor to summarise or clean up, edit the audio of your notes by editing the transcript & more.

Hello everybody!

I'm very excited to be launching Pop here today, the best way to voice message.

Now, rather than tell you about how great voice notes are in Pop, let me show you!

Looking forward to hearing all your feedback.

For voice, are local LLMs (smaller LLMs) sufficient?

There is always a tension between wanting better transcription and the upsides of local models. For us the tradeoff right now is definitely in favour of using frontier models, but I think this will change over time. We could probably already ship local transcription as an option for those who want it.

Google’s Edge Eloquent (cringe, I know) does this. They use a local model when necessary and a cloud model when available. Local models are getting better at a pretty decent rate!

During usage, does Pop keep the voice sounding natural after cuts, or does it get glitchy?

It works very well for day-to-day usage in our experience, but currently editing is just a straight cut, so in some cases you can notice a little glitch. I wouldn’t edit a podcast with it yet! We do have a way to improve this though, and it should become close to perfect from an audio point of view in most cases.

One question though: would you want a voice model or other AI processing to actually change how you sound, or even a bit of what you said? That would enable totally smooth edits, but on the other hand it can feel a bit inauthentic. What do you think?

the async focus is the right call to start with. the hardest part of voice messaging has always been that it's not editable in any meaningful way, so the transcript-based editing is a genuinely useful unlock. on the authenticity question you raised, i'd lean toward keeping it as a straight cut rather than ai-smoothing the audio. a slight glitch feels more human than a seamlessly stitched voice that isn't quite yours anymore.

Hey! I really like the idea! Congrats on the launch. I use voice messages all the time, so this is super interesting. I am curious, what's the appeal of an app like this over traditional voice messaging in WhatsApp or Messages? Especially since users are able to go back and forth there with voice messages if they want.

The key advantage is that we have way better transcripts, and we then use those transcripts for two main things: a) to let you skip around in a long message by tapping where you want to listen from, and b) to let you delete / insert sections of audio using the transcript (e.g. you can select a section of the transcript and delete it, which deletes the audio as well as the text).
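To make the idea concrete, here is a minimal sketch of how tap-to-seek and transcript-driven deletion could work. It is not Pop's actual implementation: it assumes the transcript carries word-level timestamps, and the names `Word`, `seek_time` and `delete_words` are made up for illustration. The deletion is a plain straight cut with no smoothing.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    """One transcript word with its position in the audio (hypothetical shape)."""
    text: str
    start: float  # seconds from the start of the message
    end: float    # seconds from the start of the message


def seek_time(words: Sequence[Word], index: int) -> float:
    """Tap-to-seek: playback starts where the tapped word starts."""
    return words[index].start


def delete_words(
    words: Sequence[Word],
    samples: Sequence[float],
    sample_rate: int,
    first: int,
    last: int,
) -> Tuple[List[Word], List[float]]:
    """Delete words[first..last] (inclusive) and the matching audio.

    This is a straight cut: the samples between the start of the first
    deleted word and the end of the last one are simply dropped.
    """
    cut_start = words[first].start
    cut_end = words[last].end

    # Convert the time range to sample indices.
    a = int(round(cut_start * sample_rate))
    b = int(round(cut_end * sample_rate))
    new_samples = list(samples[:a]) + list(samples[b:])

    # Words after the cut move earlier by the removed duration.
    removed = cut_end - cut_start
    kept = list(words[:first]) + [
        Word(w.text, w.start - removed, w.end - removed)
        for w in words[last + 1:]
    ]
    return kept, new_samples


if __name__ == "__main__":
    # Toy data: 3 words, 1.5 s of audio at an unrealistically low rate.
    words = [Word("hello", 0.0, 0.5), Word("um", 0.5, 1.0), Word("world", 1.0, 1.5)]
    samples = [float(i) for i in range(15)]
    kept, audio = delete_words(words, samples, sample_rate=10, first=1, last=1)
    print(" ".join(w.text for w in kept), len(audio))
```

Because the text and the audio are edited from the same timestamps, the transcript stays aligned with what you hear after the cut, which is also what keeps tap-to-seek working on the edited message.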

The "edit audio by editing the transcript" is the standout. Descript made it work for podcasts, but I haven't seen it for messaging yet. Does the cut audio sound natural after a delete, or are there obvious seams?

There can be some obvious seams in some cases. We can & will make this better, and it should get to near perfect from an audio point of view, but the actual words and speech tone being perfect is a different matter. There is a question about authenticity: we could have a voice model totally smoothly fix up your voice, even fix grammar around edits, but is that what you want for talking to friends and loved ones? We're not sure that level of polish and editing fits.

Honestly, I'd rather hear the seams than a voice that doesn't sound like the person. Voice notes are intimate by default. Friends would notice the polish more than they'd notice a small cut.

Voice-first messaging is having a moment. The async problem is always the challenge: what happens when someone sends you a voice message and you're in a meeting? Does it auto-transcribe?

Yes, every voice message has a state-of-the-art transcription.

Auto-transcription is the feature that makes async voice actually work. Good call making it standard on every message.

Voice-first messaging is the interface nobody has built properly yet. Does Pop work async like voice notes, or is it expecting real-time conversation?

Totally focused on async right now. We do have some plans for a chat to seamlessly switch between async and realtime, but that's a tricky design problem.