

OpenAI today launched o3 and o4-mini, the latest additions to its lineup of reasoning-optimized language models.
The product milestone came against the backdrop of reports that the company may acquire Windsurf for $3 billion. Windsurf, officially Exafunction Inc., sells popular artificial intelligence coding tools. The company uses OpenAI models to power some of its features.
The ChatGPT developer’s first new algorithm, o3, is described as its most advanced reasoning model yet. The other addition to OpenAI’s portfolio, o4-mini, trades off some output quality for faster performance and lower pricing. Both models are described as being more cost-efficient than their predecessors across “most real-world” tasks.
OpenAI says o3 has set new records across several popular AI performance benchmarks. One of them is SWE-bench, which evaluates AI models’ coding capabilities by asking them to fix issues in open-source projects. MMMU, another benchmark on which o3 demonstrated state-of-the-art performance, includes college-level questions spanning topics such as science and business.
One contributor to the model’s output quality is that it’s better at tool use. That’s the process whereby a language model uses an external system, such as a code editor or a search engine, to carry out tasks it may not be capable of performing on its own. OpenAI says o3 can analyze and generate images, run Python code, search the web and interact with custom tools that customers connect via an application programming interface.
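For readers unfamiliar with the pattern, tool use can be illustrated with a minimal sketch. The code below is not OpenAI’s implementation or API. It assumes a hypothetical setup in which the model emits a request naming a tool and its arguments, and the host application looks up the matching function, runs it and hands back the result. The tool names, the request format and the `handle_tool_request` helper are all invented for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to ordinary Python functions.
# None of these names come from OpenAI; they are illustrative only.
TOOLS: Dict[str, Callable[..., Any]] = {}


def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that exposes a function to the model under the given name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = func
        return func
    return decorator


@register_tool("add_numbers")
def add_numbers(a: float, b: float) -> float:
    return a + b


@register_tool("word_count")
def word_count(text: str) -> int:
    return len(text.split())


def handle_tool_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Run the tool named in an (assumed) model request and wrap the outcome."""
    name = request.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        result = TOOLS[name](**request.get("arguments", {}))
    except TypeError as exc:
        # Raised when the request's arguments do not match the tool's signature.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "result": result}


if __name__ == "__main__":
    # Stand-in for a request that a model might produce mid-conversation.
    print(handle_tool_request({"tool": "add_numbers", "arguments": {"a": 2, "b": 3}}))
```

In a real deployment the result would be passed back to the model so it can continue its answer. The sketch stops at the dispatch step, which is the part the host application controls.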
“In evaluations by external experts, o3 makes 20 percent fewer major errors than OpenAI o1 on difficult, real-world tasks,” OpenAI staffers detailed in today’s launch announcement.
The second new model that the company launched today, o4-mini, shares many of o3’s tool use features. The difference is that it’s smaller, which means it supports a narrower set of tasks but can complete them faster and more cost-efficiently. OpenAI says that this cost efficiency will enable it to provide significantly higher usage limits than for o3.
The company’s internal tests indicate that o4-mini is particularly useful for tasks that involve math, coding and visual input. Without tool use, the model can outperform the more advanced o3 across AIME 2024 and AIME 2025, two qualifying exams for the U.S. Math Olympiad. “In expert evaluations, it also outperforms its predecessor, o3‑mini, on non-STEM tasks as well as domains like data science,” OpenAI’s staffers detailed.
The company launched the models alongside a new open-source project dubbed Codex CLI. It’s an AI agent optimized for coding tasks that developers can run on their desktops. It’s accessible via the terminal, the part of a computer’s operating system that allows users to perform tasks by running scripts rather than navigating graphical interfaces.
OpenAI’s ambitions in the coding assistant market may extend beyond open-source programming agents. Citing sources familiar with the matter, Bloomberg and CNBC reported that the company is in talks to acquire Windsurf. It’s believed the deal could be worth $3 billion.
Windsurf, which until recently did business as Codeium, provides an AI programming assistant that can generate new code, explain existing code and perform related tasks. The assistant can be embedded into popular code editors via plugins. Windsurf also offers its own custom editor that was specifically built to help developers incorporate AI into their work.