OpenAI has dipped its toes, or should I say its whole body, into the world of video generation.
Following in the footsteps of startups like RunwayML, the company announced Sora, a text-to-video AI model capable of producing some stunning, almost concerning, results.
Sora works like the rest of OpenAI’s offerings: enter a prompt as simple or as detailed as you like, and it will generate a minute-long 1080p video in whatever style you want, populated with objects, people, animals, and all kinds of environments. You can also craft your blockbuster just by dropping in a still image, which the AI will animate, or an existing video, which Sora can extend.
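There’s no public API for Sora yet, so any code here is pure speculation, but if OpenAI follows the conventions of its existing Python client, a request might one day look something like the sketch below. To be clear, the videos.generate method, the “sora” model name, and the size and duration parameters are all my assumptions, not a real endpoint.

```python
# Purely hypothetical sketch: Sora has no public API at the time of writing.
# The videos.generate method, the "sora" model name, and the size/duration
# parameters are assumptions modeled on OpenAI's existing client conventions
# (compare images.generate for DALL-E 3); none of them exist today.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

video = client.videos.generate(          # hypothetical method
    model="sora",                        # assumed model identifier
    prompt="A corgi surfing a wave at sunset, shot on 35mm film",
    size="1920x1080",                    # 1080p output, per the announcement
    duration=60,                         # up to a minute of footage
)
print(video.url)                         # assumed response field
```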
According to OpenAI, Sora was trained on around 10,000 hours of “high quality video” and is built on a transformer architecture, which the company says gives the model superior scaling properties. It also uses the same “recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data.”
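OpenAI hasn’t published its captioner, but the general recipe is easy to illustrate: run a captioning model over the visual training data and use the generated text for training. Below is a minimal sketch of that idea using BLIP, an off-the-shelf open-source captioning model; both the choice of BLIP and the detail of captioning individual sampled frames are my assumptions, not OpenAI’s actual pipeline.

```python
# Minimal sketch of the recaptioning idea, not OpenAI's pipeline: generate a
# descriptive caption for a frame sampled from a training clip, then keep the
# caption as paired text. BLIP here is an assumed stand-in for Sora's captioner.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

frame = Image.open("frame_0001.jpg").convert("RGB")  # one sampled video frame
inputs = processor(frame, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60)    # longer output = more descriptive
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)  # e.g. "a dog running across a grassy field at sunset"
```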
Safety was a big concern for the team as well, so the model isn’t open to the public yet. Instead, the company is working with “red-teamers,” experts in areas like misinformation, hateful content, and bias, who will test the model thoroughly before any release to the wider public.
For all of its mind-blowing capabilities, Sora isn’t perfect, and the team recognizes its weaknesses, particularly when it comes to physics: “It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect.”
As mentioned, Sora isn’t currently available to the wider public, and there’s no release date yet. However, you can keep replying to Sam Altman on X and maybe he’ll generate your prompt, or you can take a look at this curated gallery of examples put together by a maker.