

Anthropic PBC today debuted its newest large language model, Claude Sonnet 4.5, and a toolkit for building artificial intelligence agents.
The company describes the LLM as the world’s best coding model. Additionally, it says that Sonnet 4.5 has set a record on a benchmark designed to evaluate neural networks’ tool use capabilities.
Sonnet 4.5 is a hybrid reasoning model, which means it has two modes. When users enter relatively simple queries, the LLM quickly generates a response using a limited amount of computing power. When it receives a more complicated question, Sonnet 4.5 can spend a significant amount of time working on an answer. That approach boosts output quality at the expense of higher hardware usage.
Anthropic evaluated the model’s programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new industry record with a 82% score. The next two highest scores were also achieved by Anthropic models while the fourth place went to GPT-5 Codex, which answered 74.5% of the questions correctly.
Sonnet 4.5 also set a record on a second benchmark called OSWorld. It’s used to measure how well neural networks interact with external applications such as databases. Sonnet 4.5 achieved a record score of 61.4%, a nearly 20% improvement over the Sonnet 4 model Anthropic released four months ago.
The company claims that its latest LLM also outperformed the competition across more than a half-dozen other benchmarks. According to Anthropic, those tests evaluate AI models’ ability to perform tasks such as interpreting graphs and analyzing financial data.
Sonnet 4.5 is available through Anthropic’s Claude chatbot service, Claude Code programming assistant and its application programming interface. The latter two products received updates today in conjunction with the LLM launch.
Developers interact with Claude Code by entering instructions into a command line interface. Anthropic has made several usability improvements to that interface as part of today’s update. Additionally, it’s rolling out an extension that embeds Claude Code in the popular Visual Studio Code programming tool. The extension is currently available in beta.
The other major addition to Claude Code is a feature that automatically saves the user’s code after every major change. If an error finds its way into the workflow, developers can rewind their code to an earlier, reliable version.
The upgrades are rolling out alongside a development toolkit called the Claude Agent SDK. According to Anthropic, its engineers originally built the toolkit to power Claude Code. Customers can use it to build AI agents.
Claude Agent SDK enables an agent to delegate work to so-called subagents that can perform multiple tasks in parallel, which speeds up processing. Additionally, the toolkit makes it easier to build AI applications that can interact with external systems. To reduce the risk of hallucinations, agents built with Claude Agent SDK can check their output for accuracy issues.
The toolkit can be used with the Claude API, which now provides access to Sonnet 4.5. The LLM is joined by several other enhancements.
According to Anthropic, developers can now give its AI models access to a “dedicated memory directory” with information that can help them answer prompts. When the information is no longer needed, it can be removed from a model’s context window using a new context editing tool. Anthropic says that the enhancements will enable the Claude API to tackle more complicated tasks than before.
Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.
Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.