UPDATED 17:15 EDT / AUGUST 07 2025


OpenAI unveils GPT-5, a new flagship AI model with high accuracy and coding power

In a highly anticipated announcement today, OpenAI released GPT-5, the company’s most recent state-of-the-art artificial intelligence model that outperforms previous models on intelligence benchmarks and answers questions with high accuracy.

“GPT-5 is a major upgrade over GPT-4o and a significant step along our path to AGI,” said co-founder and Chief Executive Sam Altman. “GPT-3 was sort of like talking to a high school student. There were flashes of brilliance, lots of annoyance, but people started to use it and get some value out of it.”

OpenAI emphasized GPT-5’s significant leap in capabilities, particularly in coding, front-end design and debugging large codebases. The model also delivers a deeper contextual understanding and expressive depth for writing and report generation, areas where earlier models occasionally faltered.

Under the hood, GPT-5 is a hybrid system. It routes between a standard model for direct answers and a “thinking” model for deeper reasoning. Depending on the complexity of the user’s prompt, GPT-5 automatically decides which model to engage. Users can also manually enable the “thinking” mode via the model picker or by typing instructions such as “think hard about this.”
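To make the routing idea concrete, here is a minimal sketch in Python. OpenAI has not published how its router decides, so everything in it is an assumption for illustration: the trigger phrases, the word-count threshold, and the route labels are hypothetical and are not real model identifiers.

```python
from dataclasses import dataclass

# Hypothetical trigger phrases; the article only mentions "think hard about this".
THINK_PHRASES = ("think hard", "think carefully", "think step by step")


@dataclass
class Route:
    model: str   # "standard" or "thinking" (illustrative labels, not real model IDs)
    reason: str  # short explanation of why this route was chosen


def route_prompt(prompt: str, force_thinking: bool = False,
                 long_prompt_words: int = 150) -> Route:
    """Pick a route for a prompt using simple, made-up heuristics."""
    text = prompt.lower()
    if force_thinking:
        # Stands in for the user choosing thinking mode in a model picker.
        return Route("thinking", "user selected thinking mode")
    if any(phrase in text for phrase in THINK_PHRASES):
        return Route("thinking", "prompt asks for deeper reasoning")
    if len(text.split()) > long_prompt_words:
        # Assumption: prompt length is used as a crude proxy for complexity.
        return Route("thinking", "prompt is long, treated as complex")
    return Route("standard", "default direct answer")


if __name__ == "__main__":
    print(route_prompt("What is the capital of France?"))
    print(route_prompt("Think hard about this: why does my build fail?"))
```

A production router would presumably rely on learned signals rather than keyword matching. The sketch only shows the shape of the decision: explicit user choice first, then cues in the prompt, then a default.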

In evaluations, OpenAI said, GPT-5 showed a significant increase in intelligence over previous models on performance benchmarks, especially in math, coding, visual perception and health.

In math, the company noted, the model sets a new state-of-the-art bar with 94.6% on AIME 2025 without tools. In coding, it scored 74.9% on SWE-bench Verified and 88% on Aider Polyglot; in multimodal understanding, 84.2% on MMMU; and in health, 46.2% on HealthBench Hard. The company said these gains show up in everyday use, not just on benchmarks.

This is nice to see in comparison to the previous models, but how does it hold up against the competition? Anthropic’s recent Claude Opus 4.1 model scored 74.5% on SWE-bench Verified, slightly below GPT-5, and Google LLC’s Gemini 2.5 Pro scored 59.6%.
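For readers who want the gaps spelled out, the short Python sketch below computes how far each model sits from the top SWE-bench Verified score, in percentage points. The scores are the ones quoted above; the function name is an illustrative choice.

```python
# SWE-bench Verified scores as reported in the article (percent of tasks resolved).
scores = {
    "GPT-5": 74.9,
    "Claude Opus 4.1": 74.5,
    "Gemini 2.5 Pro": 59.6,
}


def gaps_to_leader(results: dict[str, float]) -> dict[str, float]:
    """Return each model's gap to the top score, in percentage points."""
    best = max(results.values())
    # Round to one decimal place to hide floating-point noise.
    return {name: round(best - value, 1) for name, value in results.items()}


if __name__ == "__main__":
    ranked = sorted(gaps_to_leader(scores).items(), key=lambda item: item[1])
    for name, gap in ranked:
        print(f"{name}: {gap:.1f} points behind the leader")
```

The 0.4-point gap between the top two models is small enough that differences in test setup could plausibly account for it, while the gap to the third is far larger.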

“GPT-5 as a language model shows continued progress in making AI more useful for real work,” Arvind Jain, founder and CEO of Glean Technologies Inc., told SiliconANGLE in an email. “What’s notable is that reasoning over data, not just planning, is what drives the model’s accuracy.”

Comparatively, on Humanity’s Last Exam — a benchmark testing general intelligence across disciplines — a version of GPT-5 with superior reasoning, GPT-5 Pro, scored 42% with tools, just behind xAI Inc.’s Grok 4 Heavy at around 44%.

However, users might be more interested in what OpenAI has done about issues that affect them more directly, such as hallucinations, the model’s tendency to confabulate and make up falsehoods. According to the company, with web search enabled, GPT-5 is about 45% less likely to generate factual errors than GPT-4o, and with thinking enabled it is about 80% less likely to do so than o3.
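Relative figures like these are easy to misread, so here is a small worked example in Python. The baseline error rates are hypothetical numbers chosen only to show the arithmetic; OpenAI's actual rates are not given in this article. Note also that the two percentages use different baselines (GPT-4o and o3), so they cannot simply be multiplied together.

```python
def reduced_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.45 means "45% less likely") to a baseline rate."""
    if not 0.0 <= baseline_rate <= 1.0:
        raise ValueError("baseline_rate must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baseline: suppose GPT-4o made a factual error in 10% of responses.
HYPOTHETICAL_GPT4O_RATE = 0.10
# Hypothetical baseline: suppose o3 did so in 5% of responses.
HYPOTHETICAL_O3_RATE = 0.05

if __name__ == "__main__":
    with_search = reduced_rate(HYPOTHETICAL_GPT4O_RATE, 0.45)
    with_thinking = reduced_rate(HYPOTHETICAL_O3_RATE, 0.80)
    print(f"45% below the hypothetical GPT-4o rate: {with_search:.3f}")
    print(f"80% below the hypothetical o3 rate: {with_thinking:.3f}")
```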

The company also said it addressed the “sycophancy problem,” an issue that struck GPT-4o earlier this year in which the model would agree with users excessively, sometimes to their detriment. The fix included making GPT-5 less agreeable and having it use fewer unnecessary emojis, unless users ask for them.

GPT-5 is rolling out today as the new default model for signed-in ChatGPT users, replacing GPT-4o. It auto-switches between reasoning and non-reasoning modes, while paid users can manually enable deeper reasoning.

Agentic coding and the rise of vibe development

The OpenAI team noted that AI models are beginning to saturate benchmarks and that not everything can be demonstrated by sheer numbers. To showcase how GPT-5 has become a better “brain” for agentic coding and developing applications, they demonstrated it in a real-world situation where it was tasked with debugging a software audio problem.

After only a minute of work and thinking on the problem, the model came back with a working, bug-free solution, complete with code changes.

“GPT-5 is the smartest coding model we’ve used. Our team has found GPT-5 to be remarkably intelligent, easy to steer, and even to have a personality we haven’t seen in any other model,” said Michael Truell, co-founder and CEO of Anysphere Inc., the maker of Cursor, an AI agent-based coding platform. “It not only catches tricky, deeply hidden bugs but can also run long, multi-turn background agents to see complex tasks through to the finish.”

According to OpenAI, GPT-5 has been designed to become a better collaborator, particularly in agentic coding products such as Cursor, Windsurf, GitHub Copilot and Codex CLI. These are platforms where users prompt an AI model with a description of what they want it to do and then let it loose on their codebase, or a blank slate, and allow it to run on its own.

AI agents are capable of autonomously performing tasks by breaking down complex workflows into step-by-step plans and then executing them with little or no human intervention. They can also collaborate with a human developer, much like a pair programmer, to help build a product or handle an intricate task.
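The plan-then-execute pattern can be sketched in a few lines of Python. This is a toy illustration, not how any of the products named here work: the plan is hard-coded rather than generated by a model, and the step functions and their names are hypothetical stand-ins for real tool calls.

```python
from typing import Callable

# A step takes the current state and returns an updated copy of it.
Step = Callable[[dict], dict]


def reproduce_bug(state: dict) -> dict:
    """Stand-in for running the program and confirming the failure."""
    return {**state, "bug_reproduced": True}


def write_fix(state: dict) -> dict:
    """Stand-in for the model proposing a code change."""
    return {**state, "patch": "placeholder patch"}


def run_tests(state: dict) -> dict:
    """Stand-in for a test run; passes only if a patch exists."""
    return {**state, "tests_passed": "patch" in state}


def run_plan(goal: str, steps: list[tuple[str, Step]], max_steps: int = 10) -> dict:
    """Execute named steps in order, logging each one, up to max_steps."""
    state: dict = {"goal": goal, "log": []}
    for name, step in steps[:max_steps]:
        state = step(state)
        state["log"].append(name)
    return state


if __name__ == "__main__":
    plan = [("reproduce", reproduce_bug), ("fix", write_fix), ("test", run_tests)]
    print(run_plan("fix the audio bug", plan))
```

In a real agent, the model would produce the plan itself and revise it when a step fails. The `max_steps` cap is one simple way to keep an autonomous loop from running indefinitely.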

GPT-5 in particular, OpenAI researchers said, has been designed to improve tool calling and follow instructions to swiftly perform coding tasks according to natural language prompts. This makes it ideal for what is known as “vibe coding,” an emerging software development trend where developers use AI to assist them by generating code from prompts, rather than writing it manually.

The recent changes and additional capabilities have also made the model better at reasoning about creative work. This includes understanding color, user interface design and user intent.

“GPT-5 really brings the power of beautiful and effective code to everyone,” said Yan Dubois, a solutions architect at OpenAI. “For me it’s the first time I trust a model to do my most important work. This is beyond vibe coding.”

For developers, GPT-5 comes in three tiers — GPT-5, GPT-5-mini and GPT-5-nano — offered via application programming interface with options for cost, latency and reasoning depth. Tool call preambles, verbosity controls and regex-enforced outputs are now part of the package, making it even more reliable and tunable than previous models.
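As a rough sketch of how a developer might choose among these options, the Python snippet below builds a request payload as a plain dictionary without sending it anywhere. The lowercase model IDs and the field names (`reasoning`, `effort`, `text`, `verbosity`) are assumptions modeled loosely on OpenAI's published API and should be checked against the official documentation before use. The function name is hypothetical.

```python
import json

# Assumed lowercase IDs for the three tiers named in the article.
TIERS = ("gpt-5", "gpt-5-mini", "gpt-5-nano")


def build_request(prompt: str, tier: str = "gpt-5-mini",
                  reasoning_effort: str = "low", verbosity: str = "low") -> dict:
    """Assemble an illustrative request body; field names are assumptions."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return {
        "model": tier,                              # trades cost and latency for capability
        "input": prompt,
        "reasoning": {"effort": reasoning_effort},  # assumed knob for reasoning depth
        "text": {"verbosity": verbosity},           # assumed knob for answer length
    }


if __name__ == "__main__":
    print(json.dumps(build_request("Summarize this stack trace."), indent=2))
```

Keeping the payload construction separate from the network call makes it easy to log or test the chosen settings before any tokens are spent.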

Images: OpenAI
