UPDATED 18:55 EDT / FEBRUARY 15 2024


OpenAI’s Sora joins text-to-video AI content generation race

OpenAI today announced Sora, a new text-to-video model that can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.

Text-to-video is arguably the next big thing in artificial intelligence, and OpenAI isn’t the first to the party. Meta Platforms Inc., Google LLC and Runway AI Inc., among others, also offer similar services. The challenge for all of these services has been quality: Though videos from some existing services are highly impressive, the Holy Grail is realistic video, and not all of them come close.

Sora is a diffusion model, a generative machine learning model that creates data such as images or videos by gradually refining random noise into structured patterns based on learned data distributions. Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. OpenAI says the model understands not only what the user has asked for in the prompt but also how those things exist in the physical world.
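
OpenAI has not published Sora’s internals here, so the Python sketch below is only a toy illustration of the general diffusion idea described above, not Sora’s method. It assumes one-dimensional “data” drawn from a known Gaussian, a DDPM-style linear noise schedule, and a closed-form noise predictor standing in for the trained neural network a real model would use. All names and parameters (`DATA_MEAN`, `DATA_STD`, `T`, the schedule values) are hypothetical choices for illustration.

```python
import math
import random
import statistics

# Hypothetical toy setup: the "training data" is a 1-D Gaussian
# N(DATA_MEAN, DATA_STD**2). Because it is known, the ideal noise
# predictor can be written in closed form instead of being learned.
DATA_MEAN, DATA_STD = 3.0, 0.5
T = 200  # number of denoising steps (assumed for illustration)

# Linear noise schedule beta_1..beta_T and cumulative products alpha_bar_t.
betas = [1e-4 + (0.05 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)


def predict_noise(x_t: float, t: int) -> float:
    """Ideal estimate of the noise in x_t (stand-in for a trained network).

    Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps,
    so for Gaussian data E[eps | x_t] is linear in x_t.
    """
    ab = alpha_bars[t]
    var_t = ab * DATA_STD**2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * DATA_MEAN) / var_t


def sample(rng: random.Random) -> float:
    """DDPM-style ancestral sampling: start from pure noise, refine step by step."""
    x = rng.gauss(0.0, 1.0)  # pure random noise
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        # Remove the predicted noise for this step and rescale.
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
        if t > 0:  # inject fresh noise on every step except the last
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample(rng) for _ in range(2000)]
    print(f"mean={statistics.mean(samples):.2f} std={statistics.stdev(samples):.2f}")
```

Run as a script, it should report a mean close to 3.0 and a standard deviation close to 0.5: repeated small denoising steps turn pure noise into samples that follow the data distribution. A video model applies the same principle to vastly higher-dimensional data with a learned predictor.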

According to OpenAI, the model has a deep understanding of language, enabling it to interpret prompts accurately and generate “compelling characters that express vibrant emotions.” The service can also create multiple shots within a single generated video while keeping characters and visual style consistent across those shots.

To its credit, OpenAI has been open about the model’s flaws as well. Sora, at least as it stands in testing, has weaknesses: It may struggle to accurately simulate the physics of a complex scene and may not understand specific instances of cause and effect. In OpenAI’s example, a person might take a bite out of a cookie, but afterward the cookie may not show a bite mark. The model may also confuse spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that take place over time, such as following a specific camera trajectory.

Those flaws matter, but the model is young, and some of the first demonstrations are stunning.

The video above was made using the prompt, “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”

Although Sora looks great, ChatGPT users will have to wait to get their hands on it. As of today, Sora is available only to “red teamers,” who are assessing critical areas for harms or risks. OpenAI is also granting access to a number of visual artists, designers and filmmakers to get feedback on how to make the model most helpful for creative professionals.

“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” OpenAI said.

Image: OpenAI
