I cut a 2-hour NVIDIA keynote into 121 clips with one prompt, no hands
We clipped every moment Jensen Huang said "AI" in his CES keynote. It turns out he said it 121 times. One prompt handled the entire workflow.
How it worked:
- Dive chained two MCPs together (yt-dlp + ffmpeg)
- Download video → parse subtitle timestamps → cut 121 clips → merge
- All processed locally
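For anyone curious what the download, cut, and merge steps boil down to on the command line, here is a rough sketch in Python that only builds the argument lists. It is not taken from Dive or from the MCP servers; the helper names, the file names, and the choice to re-encode each clip are our assumptions for illustration.

```python
def ms_to_seconds(ms):
    """Format integer milliseconds as a seconds string, e.g. 1325 -> '1.325'."""
    return f"{ms // 1000}.{ms % 1000:03d}"


def download_cmd(url, stem="keynote"):
    """yt-dlp call: video capped at 720p plus auto-generated English subtitles as JSON3."""
    return [
        "yt-dlp",
        "-f", "bestvideo[height<=720]+bestaudio/best[height<=720]",
        "--write-auto-subs", "--sub-langs", "en", "--sub-format", "json3",
        "--merge-output-format", "mp4",
        "-o", f"{stem}.%(ext)s",
        url,
    ]


def cut_cmd(src, start_ms, end_ms, out):
    """ffmpeg call for one clip. Re-encoding (an assumption here) keeps cuts
    accurate, since stream copy can only start on keyframes."""
    return [
        "ffmpeg", "-y",
        "-ss", ms_to_seconds(start_ms),
        "-i", src,
        "-t", ms_to_seconds(end_ms - start_ms),
        "-c:v", "libx264", "-c:a", "aac",
        out,
    ]


def concat_list(paths):
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def concat_cmd(list_file, out="Jensen_CES_AI.mp4"):
    """ffmpeg call that joins the clips in list order without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once yt-dlp and ffmpeg are installed. Everything stays on your machine, same as in the demo.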
One prompt:
Task: Create a compilation video of every exact moment Jensen Huang says "AI".
Video source:
Instructions:
Download video in 720p + subtitles in JSON3 format (word-level timestamps)
Parse JSON3 to find every "AI" instance with precise start/end times
Use ffmpeg to cut clips (~50-100ms padding for natural sound)
Concatenate all clips chronologically
Output: Jensen_CES_AI.mp4
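The parsing step in that prompt is the interesting part, so here is a minimal sketch of how it could be done by hand. It assumes the JSON3 layout we have seen in YouTube auto-captions: a top-level `events` list, where each event carries `tStartMs`, `dDurationMs`, and `segs`, and each segment has `utf8` text plus an optional `tOffsetMs`. That layout is not formally documented, and `find_word_spans` is a hypothetical helper, not something Dive ships.

```python
import re


def find_word_spans(captions, word="AI", pad_ms=75):
    """Return padded (start_ms, end_ms) spans for every segment equal to `word`."""
    # Flatten all segments into (absolute_start_ms, text, event_end_ms).
    words = []
    for event in captions.get("events", []):
        start = event.get("tStartMs", 0)
        event_end = start + event.get("dDurationMs", 0)
        for seg in event.get("segs", []):
            text = seg.get("utf8", "").strip()
            if text:  # skip newline-only or empty segments
                words.append((start + seg.get("tOffsetMs", 0), text, event_end))

    spans = []
    for i, (begin, text, event_end) in enumerate(words):
        # Drop punctuation so "AI," and "AI." count; matching is case-sensitive.
        if re.sub(r"[^\w]", "", text) != word:
            continue
        # No per-word end time is stored (assumed), so use the earlier of
        # the next word's start and the end of the enclosing event.
        next_start = words[i + 1][0] if i + 1 < len(words) else event_end
        later = [t for t in (next_start, event_end) if t > begin]
        end = min(later) if later else begin
        spans.append((max(0, begin - pad_ms), end + pad_ms))
    return spans
```

In practice you would load the subtitle file with `json.load` and pass the result in. The 75 ms default sits in the middle of the 50-100 ms padding range from the prompt.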
What is Dive?
Open-source MCP client that helps you install and use MCPs without touching the terminal. Local-first by design.
What is OAPHub?
Cloud MCP platform for tools you can't run locally — official MCPs (Figma, Sentry, Atlassian) and Pro MCPs for generative AI (Seedream, Kling, Flux).
We believe in local-first. Dive handles what you can run yourself. OAPHub fills the gaps.
If you want to see how it runs:
📦 GitHub: github.com/OpenAgentPlatform/Dive
