Launched this week

Gemini Omni Flash
High-quality video generation and conversational editing
208 followers
Gemini Omni Flash (gemini-omni-flash-preview) just rolled out to developers via the Gemini API and Google AI Studio, natively supporting high-quality video generation and conversational editing from a combination of text, image and video inputs. This model is priced competitively at $0.10 per second of video output, which is the same as Veo 3.1 Fast.



Hey PH fam 👋
Video creation has always meant stitching five tools together.
A script model here, a text-to-image model there, an image-to-video tool, a separate lip-sync app, a voice generator.
Each one its own contract, its own learning curve, its own headache.
Now Google's latest offering, Gemini Omni Flash, collapses all of that into one model. It's the first release in Google's new Omni family, and it does something most video models can't: it actually holds a conversation with you while you edit. You don't regenerate from scratch every time you want a tweak. You just talk to it.
How it works:
→ Feed it text, images, or short video clips as references
→ It generates a clip grounded in Gemini's real-world knowledge (history, biology, narrative logic, all of it)
→ Ask for changes in plain English: "make the lighting warmer," "swap the product," "extend the camera pan"
→ It remembers the last few turns, so your edits build instead of starting over
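To make the "remembers the last few turns" idea concrete, here is a toy sketch of a bounded edit history. It is not the Gemini API and says nothing about how the model actually stores context: the `EditSession` class, its method names, and the `max_turns=3` window are all hypothetical, chosen only for illustration.

```python
from collections import deque


class EditSession:
    """Toy model of a conversational edit session with a bounded turn memory.

    Hypothetical illustration only; not how Gemini Omni Flash is implemented.
    """

    def __init__(self, initial_prompt, max_turns=3):
        self.initial_prompt = initial_prompt
        # Only the most recent `max_turns` edit requests are kept;
        # older ones fall off the left end of the deque automatically.
        self.recent_edits = deque(maxlen=max_turns)

    def request_edit(self, instruction):
        """Record a plain-English edit request."""
        self.recent_edits.append(instruction)

    def context(self):
        """Return what the next turn would see: the prompt plus remembered edits."""
        return [self.initial_prompt, *self.recent_edits]


session = EditSession("A product shot of a ceramic mug on a wooden table")
for instruction in [
    "make the lighting warmer",
    "swap the product",
    "extend the camera pan",
    "slow the pan down",
]:
    session.request_edit(instruction)

print(session.context())
```

With a window of three, the fourth request pushes "make the lighting warmer" out of the remembered context, which is the kind of limit worth testing before relying on long edit chains.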
Why it's worth your attention:
→ Priced at $0.10 per second of 720p output, matching Veo 3.1 Fast
→ Launched at #1 on LMArena's Text-to-Video Arena
→ Every clip carries SynthID watermarking and C2PA credentials baked in, so provenance isn't an afterthought
→ Pairs naturally with Nano Banana 2 Lite: generate a still image, then animate it straight into video
What strikes me most isn't the generation quality; it's the editing model.
Most AI video tools still treat you like a one-shot prompt engineer. This treats you like a director who gets to say "no, try that again, but..."
Curious what you'd build first: a product explainer, a localized training video, or something nobody's tried yet?
@thisiskp_ If I generate an 8-second video, then edit just 1 second of it, will I be billed for the full 8 seconds again or only for the edited second?
The SynthID/C2PA point is what caught my eye more than the pricing. Once you go a few conversational edit turns deep on the same clip, does the credential chain track the full edit history back to the original generation, or does each new turn just stamp a fresh credential with no link to what it started from? For anything used in a context where provenance actually matters, that distinction seems like it would decide whether this is usable at all.
The conversational editing part is what I'd poke at first. At $0.10 per second of output, if I generate a 20-second clip and then say "make the second shot slower," am I re-rendering and paying for the full 20 seconds each turn, or does it diff against the previous render? Iterative editing is where costs quietly balloon on these tools, so whether an edit re-bills the whole clip really changes the economics of building on it.
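To put rough numbers on the re-billing question, here is a small sketch comparing two billing policies. Both policies are assumptions, since the post does not say how edits are billed; only the $0.10 per second price comes from the launch text. The function name and the example edit lengths are made up for illustration.

```python
PRICE_CENTS_PER_SECOND = 10  # $0.10 per second of output, as quoted in the launch post


def session_cost_cents(clip_seconds, edited_seconds_per_turn, rebill_full_clip):
    """Total cost in cents for one initial generation plus a series of edit turns.

    rebill_full_clip=True  -> hypothetical policy A: every edit bills the whole clip again.
    rebill_full_clip=False -> hypothetical policy B: each edit bills only the seconds it changes.
    """
    total = clip_seconds * PRICE_CENTS_PER_SECOND  # initial generation
    for edited in edited_seconds_per_turn:
        billed = clip_seconds if rebill_full_clip else edited
        total += billed * PRICE_CENTS_PER_SECOND
    return total


# A 20-second clip followed by three edits touching 5, 5, and 2 seconds.
edits = [5, 5, 2]
full = session_cost_cents(20, edits, rebill_full_clip=True)
partial = session_cost_cents(20, edits, rebill_full_clip=False)
print(f"full re-bill: ${full / 100:.2f}, partial re-bill: ${partial / 100:.2f}")
# prints: full re-bill: $8.00, partial re-bill: $3.20
```

The same function covers the 8-second case asked about earlier in the thread: `session_cost_cents(8, [1], True)` comes to 160 cents ($1.60), while `session_cost_cents(8, [1], False)` comes to 90 cents ($0.90). Which policy applies, if either, is something only the official pricing documentation can settle.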
How does the conversational editing actually track changes across longer videos? Like, if I tweak a scene midway, does it regenerate everything after or just hold the rest steady?
How well does it hold up when I ask it to swap out a single object across multiple clips while keeping the lighting consistent? Curious if that kind of multi-shot edit actually feels coherent or if it still breaks halfway through.
Curious how the conversational editing handles continuity across longer clips. Do the earlier frames stay consistent when you ask for revisions halfway through a generated video?
Picked it up yesterday and was honestly surprised how natural the conversational editing feels. You can tweak a scene just by asking. Video quality holds up well for the price too.