Gemini Omni Flash - High-quality video generation and conversational editing
Gemini Omni Flash (gemini-omni-flash-preview) just rolled out to developers via the Gemini API and Google AI Studio, natively supporting high-quality video generation and conversational editing from a combination of text, image and video inputs. This model is priced competitively at $0.10 per second of video output, which is the same as Veo 3.1 Fast.


Replies
Hey PH fam 👋
Video creation has always meant stitching five tools together.
A script model here, a text-to-image model there, an image-to-video tool, a separate lip-sync app, a voice generator.
Each one its own contract, its own learning curve, its own headache.
Now Google's latest offering, Gemini Omni Flash, collapses all of that into one model. It's the first release in Google's new Omni family, and it does something most video models can't: it actually holds a conversation with you while you edit. You don't regenerate from scratch every time you want a tweak. You just talk to it.
How it works:
→ Feed it text, images, or short video clips as references
→ It generates a clip grounded in Gemini's real-world knowledge (history, biology, narrative logic, all of it)
→ Ask for changes in plain English: "make the lighting warmer," "swap the product," "extend the camera pan"
→ It remembers the last few turns, so your edits build instead of starting over
Why it's worth your attention:
→ Priced at $0.10 per second of 720p output, matching Veo 3.1 Fast
→ Launched at #1 on LMArena's Text-to-Video Arena
→ Every clip carries SynthID watermarking and C2PA credentials baked in, so provenance isn't an afterthought
→ Pairs naturally with Nano Banana 2 Lite: generate a still image, then animate it straight into video
What strikes me most isn't the generation quality; it's the editing model.
Most AI video tools still treat you like a one-shot prompt engineer. This treats you like a director who gets to say "no, try that again, but..."
Curious what you'd build first: a product explainer, a localized training video, or something nobody's tried yet?
@thisiskp_ If I generate an 8-second video, then edit just 1 second of it, will I be billed for the full 8 seconds again or only for the edited second?
Looks interesting! Will try it out!
The conversational editing part is what I'd poke at first. At $0.10 per second of output, if I generate a 20-second clip and then say "make the second shot slower," am I re-rendering and paying for the full 20 seconds each turn, or does it diff against the previous render? Iterative editing is where costs quietly balloon on these, so whether an edit re-bills the whole clip really changes the economics of building on it.
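To put rough numbers on that question, here is a minimal back-of-the-envelope sketch in Python. Only the $0.10 per second price comes from the launch post. How edits are actually billed is not stated anywhere in this thread, so both billing modes below are assumptions, and `session_cost_cents` is a hypothetical helper, not part of any Google API.

```python
# Price from the launch post: $0.10 per second of video output.
# Kept in integer cents to avoid floating-point rounding surprises.
PRICE_CENTS_PER_SECOND = 10


def session_cost_cents(clip_seconds, edited_seconds_per_turn, rebill_full_clip):
    """Estimate the cost of one generation plus a series of edit turns.

    clip_seconds: length of the generated clip, in whole seconds.
    edited_seconds_per_turn: seconds of footage each edit turn touches.
    rebill_full_clip: ASSUMPTION flag. True models "every edit re-renders
        and re-bills the whole clip"; False models "only the edited
        seconds are billed". Which one applies is not confirmed.
    """
    # The initial generation is always billed for the full clip.
    total = clip_seconds * PRICE_CENTS_PER_SECOND
    for edited_seconds in edited_seconds_per_turn:
        billed_seconds = clip_seconds if rebill_full_clip else edited_seconds
        total += billed_seconds * PRICE_CENTS_PER_SECOND
    return total


if __name__ == "__main__":
    # A 20-second clip followed by three edits that each touch 5 seconds.
    edits = [5, 5, 5]
    full = session_cost_cents(20, edits, rebill_full_clip=True)
    partial = session_cost_cents(20, edits, rebill_full_clip=False)
    print(f"full re-bill per edit: ${full / 100:.2f}")
    print(f"edited seconds only:   ${partial / 100:.2f}")
```

Under these assumptions, three small edits on a 20-second clip come to $8.00 if every turn re-bills the whole clip, versus $3.50 if only the edited seconds are charged, which is why the answer matters for anything iterative.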
How does the conversational editing actually track changes across longer videos? Like if I tweak a scene midway, does it regenerate everything after or just hold the rest steady?
The conversational editing on video is such a smart move. It makes iterating on outputs feel way less clunky than re-prompting from scratch.
Picked it up yesterday and was honestly surprised how natural the conversational editing feels, you can tweak a scene by just asking. Video quality holds up well for the price too.
Pulled it into AI Studio yesterday and the conversational editing actually feels natural, like it understood I wanted to swap the background without rebuilding the whole clip. Pricing matching Veo 3.1 Fast makes it easy to justify experimenting more.
How well does it hold up when I ask it to swap out a single object across multiple clips while keeping the lighting consistent? Curious if that kind of multi-shot edit actually feels coherent or if it still breaks halfway through.
The conversational editing from a video input feels really natural, you can nudge a scene and it actually listens. Pricing matches Veo 3.1 Fast so it slots in nicely for quick iterations.
Curious how the conversational editing handles continuity across longer clips. Do the earlier frames stay consistent when you ask for revisions halfway through a generated video?