UPDATED 18:52 EDT / APRIL 14 2025


OpenAI launches new GPT-4.1 series of language models for developers

OpenAI today made a trio of new language models available to developers through its application programming interface.

The flagship algorithm in the series, GPT-4.1, is described as being “significantly better” than its predecessor at coding. The two other models, GPT-4.1 mini and GPT-4.1 nano, trade off some output quality for lower pricing. All three models can process prompts with up to 1 million tokens, which allows them to analyze large uploads such as GitHub repositories.
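To give a sense of scale, a 1-million-token window can hold a sizable code repository. The sketch below uses the common rough heuristic of about four characters per token to estimate whether a source tree fits; the heuristic and the script are illustrative assumptions, not OpenAI tooling, and real tokenizer counts vary by language and content:

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenizers vary
CONTEXT_LIMIT = 1_000_000  # token limit cited for the GPT-4.1 series

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def repo_fits(root: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Walk a source tree and compare the estimated token total to the limit."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    total += estimate_tokens(f.read())
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
    return total <= limit
```

By this estimate, a repository of roughly 4 megabytes of text would approach the window's capacity.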

When developers ask a language model to help with a coding task, they often send not one prompt but several. Additionally, prompts often reference earlier input. OpenAI says that GPT-4.1 can “pick out information from past messages in the conversation” 10.5% better than its predecessor, which makes it more useful for advanced programming tasks.

Programming-related prompts usually comprise a code file and an instruction to change certain parts of it. In response, OpenAI’s earlier models often returned not only the requested changes but also the unmodified parts of the original file. That drove up costs because the company charges customers based on model output volume.

According to OpenAI, its engineers configured GPT-4.1 to output only the changed code lines instead of entire files. Prompt caching cuts costs further by discounting input tokens that repeat across requests, such as a code file resent with each follow-up question. OpenAI has boosted its caching discount from 50% to 75% as part of today’s product update.
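The effect of the larger discount can be sketched with simple arithmetic. The per-token price below is a placeholder, not an actual OpenAI rate; only the 75% discount figure comes from the announcement:

```python
def prompt_cost(tokens: int, cached_tokens: int, price_per_token: float,
                cache_discount: float = 0.75) -> float:
    """Cost of an input prompt when some of its tokens hit the cache.

    cache_discount=0.75 reflects the rate cited in the announcement;
    price_per_token is a hypothetical figure for illustration only.
    """
    uncached = tokens - cached_tokens
    discounted = cached_tokens * price_per_token * (1 - cache_discount)
    return uncached * price_per_token + discounted

# A 100,000-token prompt at a hypothetical $2 per million input tokens:
price = 2.0 / 1_000_000
no_cache = prompt_cost(100_000, 0, price)            # $0.20
mostly_cached = prompt_cost(100_000, 90_000, price)  # $0.065 when 90% is cached
```

With 90% of the prompt cached, the input cost drops by about two-thirds under these assumed prices.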

The company says GPT-4.1 also brings several other enhancements for developers. It’s better at generating user interfaces and is less likely to produce unnecessary code, which reduces the amount of time that software teams must spend filtering the model’s output.

GPT-4.1 mini, the second new model that OpenAI launched today, is a more hardware-efficient algorithm with less advanced capabilities. Nevertheless, it offers performance competitive with GPT-4o, the predecessor to GPT-4.1. “It matches or exceeds GPT‑4o in intelligence evals while reducing latency by nearly half and reducing cost by 83%,” OpenAI staffers detailed in a blog post today.

The third addition to the company’s language model lineup is GPT‑4.1 nano. It’s designed for relatively simple tasks such as sorting documents based on topic or powering the code autocomplete features of a programming tool. Besides costing less, it also promises to provide significantly lower latency than OpenAI’s two other new models.

“We’ve improved our inference stack to reduce the time to first token, and with prompt caching, you can cut latency even further while saving on costs,” OpenAI’s staffers wrote. “In our initial testing, the p95 latency to first token for GPT‑4.1 is approximately 15 seconds with 128,000 tokens of context, and up to half a minute for a million tokens of context.”
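Time to first token is straightforward to measure client-side when responses are streamed. The helper below is a generic sketch that works with any iterator of response chunks, such as the stream the OpenAI SDK returns when streaming is enabled; it is illustrative, not OpenAI code:

```python
import time
from typing import Iterable, Optional, Tuple

def time_to_first_token(stream: Iterable) -> Tuple[Optional[object], float]:
    """Return the first chunk of a response stream and the seconds spent waiting for it."""
    start = time.monotonic()
    for chunk in stream:
        return chunk, time.monotonic() - start
    return None, time.monotonic() - start  # stream produced nothing
```

Running this against the same prompt with and without a warm cache would show the latency savings the blog post describes.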

OpenAI doesn’t plan to make GPT-4.1 available in ChatGPT. Instead, the company has opted to refine the coding and instruction-following capabilities of the earlier GPT‑4o model that powers the chatbot service.

Image: OpenAI
