UPDATED 14:30 EST / MAY 13 2024

Image: The letters "GPT-4o" on an abstract pink and blue background

OpenAI unleashes GPT-4o, a new flagship model with real-time multimodal capabilities

OpenAI upped its artificial intelligence game today with a new flagship AI model named GPT-4o that can respond in real time to text, audio and image inputs, enabling more natural human-computer interaction.

The company says GPT-4o, with the “o” standing for “omni,” is a step toward making talking to an AI model feel more like speaking to or working with another human being. It can respond to voice inputs in an average of 320 milliseconds, which is similar to human response time in conversation. It also matches GPT-4 Turbo in performance on English text, with significant improvements in non-English languages.

“This is the first time that we’re making a huge step forward when it comes to the ease of use,” said Mira Murati, chief technology officer of OpenAI. “Until now, with voice mode, we had three models that come together to deliver this experience. We had transcription, intelligence and then text-to-speech all together in orchestration to deliver voice mode. This also brings a lot of latency to the experience, which breaks the immersion in collaboration with ChatGPT. Now, with GPT-4o, this all happens natively.”
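
For context, the three-stage pipeline Murati describes can be roughly approximated with OpenAI’s public API. The following is a minimal sketch, assuming the “whisper-1” transcription and “tts-1” speech models and hypothetical file names:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Stage 1: transcription -- speech in, text out.
    with open("question.wav", "rb") as audio:
        text_in = client.audio.transcriptions.create(model="whisper-1", file=audio).text

    # Stage 2: "intelligence" -- a text-only model generates the reply.
    reply = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": text_in}],
    ).choices[0].message.content

    # Stage 3: text-to-speech -- text in, audio out. Each hop adds latency.
    client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply
    ).stream_to_file("reply.mp3")

GPT-4o collapses those three hops into a single natively multimodal model, which is where the latency savings come from.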

The new model will soon be accessible to ChatGPT users for free, as it is rolled out to power the chatbot’s experience under the hood. OpenAI announced a version of ChatGPT that users can access without an account in April, and today the company announced a desktop version for macOS for both free and paid users.

In a demonstration, OpenAI researchers showed onstage how the new model under the hood of ChatGPT is capable of real-time voice conversation, providing the sensation of a real person on the other end of the line with near-instant, emotive responses. The model can also produce a broad range of emotional responses that it incorporates into its voice, including chuckling, the sensation of a “smile” in speech, soft sighs and other verbal cues that people associate with a human speaker.

During the demonstration, OpenAI asked the model to tell a bedtime story and to introduce drama into the tale, at which point the model became more bombastic and grandiose in its tone. It told a bedtime story about a robot, and while it was doing so the presenters continuously asked it to update its tone – right up until they requested that the model tell the story in a “robotic voice” and finish it in a “singsong voice.” The model complied adroitly each time, shifting its tone and even playfully responding with “Initiating dramatic robotic voice.”

The demo also showed that a user can interrupt the model while it’s speaking, meaning there is no need to wait for it to finish a sentence before asking about something else. This ability makes interacting with the model much more like a conversation, where interruptions are sometimes needed just to get a point across.

Since the model is “multimodal,” it’s also able to “see” images and video, which means it can hold conversations about what’s happening on the screen or through the camera. To show off this capability, OpenAI researchers asked the model to watch as a math equation was written on a piece of paper.

The researchers showed it “3x + 1 = 4” and asked the model to help them solve for x without telling them the answer. It then tutored them through the steps of isolating the variable (subtracting 1 from both sides to get 3x = 3, then dividing both sides by 3), arriving at the value x = 1. Throughout the demo, ChatGPT managed to be a patient and thoughtful tutor.
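
The same vision capability is exposed to developers through the chat completions API, which accepts image parts alongside text. A minimal sketch of the equation-tutoring demo, with a hypothetical photo file and an illustrative prompt:

    import base64
    from openai import OpenAI

    client = OpenAI()

    # Encode a photo of the handwritten equation "3x + 1 = 4".
    with open("equation.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Help me solve this for x, but don't tell me the answer."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)  # step-by-step hints rather than the answer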

The ChatGPT app can also be used to help with coding. Even when it cannot see what’s on the screen, it’s possible to copy code and send it to the app; from there, a developer can hold a conversation out loud with the model about the code. It’s also possible to share the entire screen with the model, allowing it to discuss what’s displayed in context.

Another use for GPT-4o’s voice capability within ChatGPT is as a real-time cross-language translator. The model has improved quality and speed in 50 different languages, covering 97% of the world’s population, so a user could ask the model, “Could you translate Italian into English and vice versa for me and my friend?” and it could provide that service. In the OpenAI demonstration, it even added a little personal touch with statements such as, “Your friend asked.”
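
In text form, that interpreter behavior amounts to a simple system prompt. A minimal sketch follows; the prompt wording is an assumption, and the live demo ran end to end in voice:

    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are a live interpreter between two friends. "
                        "Translate Italian input into English and English input into Italian."},
            {"role": "user", "content": "Ciao! Come sta andando la demo?"},
        ],
    )
    print(resp.choices[0].message.content)  # e.g. "Hi! How is the demo going?"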

Although access to GPT-4o will be free as OpenAI rolls it out in ChatGPT, paid users will still have five times the capacity limits of free users. GPT-4o is also available through the application programming interface for developers, where it’s twice as fast, 50% cheaper and provides five times higher rate limits than the GPT-4 Turbo model.
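
Because GPT-4o sits behind the same chat completions endpoint, switching from GPT-4 Turbo is a one-line model change. A minimal streaming sketch with an illustrative prompt:

    from openai import OpenAI

    client = OpenAI()

    stream = client.chat.completions.create(
        model="gpt-4o",  # swapped in for "gpt-4-turbo"
        messages=[{"role": "user", "content": "Explain what 'omni' means in GPT-4o."}],
        stream=True,  # print tokens as they are generated
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")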

Image: OpenAI
