Anthropic PBC today opened access to Claude Opus 4.7, the latest addition to its popular line of large language models.
The company says that the LLM is significantly better than its predecessor at coding tasks. Opus 4.7 scored 64.3% on the SWE-Bench Pro programming benchmark, nearly 10% higher than Opus 4.6. The new model also solved more of the tasks in the Terminal-Bench 2.0 dataset, which comprises coding challenges that involve the command line.
Although Opus 4.7 is better than its predecessor in multiple respects, it’s not Anthropic’s most capable LLM. Last month, the company previewed a model called Claude Mythos that is significantly more adept at code generation. The company has not made the latter LLM broadly available over concerns that it could be misused by hackers.
Opus 4.7 features a mechanism that detects attempts to harness the model for cyberattacks. According to Anthropic, its engineers will collect data about the mechanism’s effectiveness and use the findings to build guardrails for Mythos. The hope is that those guardrails will enable the company to make “Mythos-class models” broadly available to customers in a safe manner.
Cybersecurity professionals often research threats by simulating hacker tactics. As a result, the prompts they send to Opus 4.7 have a good chance of being blocked by Anthropic. The company plans to address the issue with a new initiative called the Cyber Verification Program. It will see Anthropic loosen the guardrails around cybersecurity professionals’ accounts to allow a broader range of prompts.
Coding is not the only area where Opus 4.7 performs better than the company’s earlier models. According to Anthropic, it’s also better at visual reasoning tasks. Opus 4.7 can “see images in greater resolution” and is more adept at generating visual assets such as user interface designs.
The model performs some tasks nearly as well as Mythos. Opus 4.7 came within 1% of the frontier model’s score on GPQA Diamond, a collection of graduate-level science questions. OpenAI Group PBC’s GPT-5.4, meanwhile, topped Mythos’ score on BrowseComp, a benchmark designed to test LLMs’ online research skills.
Anthropic is rolling out Opus 4.7 alongside a number of other product updates.
The company’s application programming interface enables developers to set a so-called effort level for its LLMs. Increasing the effort level boosts output quality but also raises inference costs. Anthropic today introduced a new tier called xhigh that sits between the highest and second-highest effort levels. According to the company, the addition will give developers more fine-grained control over their workloads’ cost-performance ratio.
Anthropic has also added a second cost management feature to its API. Customers can now set task budgets, parameters that define the maximum number of tokens Claude may process while carrying out a task. Token usage directly influences the cost of inference runs.
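Taken together, the two controls might appear in a request payload along these lines. This is a minimal sketch for illustration only: the `effort` and `task_budget_tokens` parameter names and the tier names are assumptions, not Anthropic's documented API, and only `max_tokens` is a long-standing Messages API field.

```python
# Hypothetical sketch of a Messages API payload with the new cost controls.
# "effort" and "task_budget_tokens" are assumed parameter names for
# illustration; they are not confirmed by Anthropic's API documentation.

def build_request(prompt: str, effort: str = "xhigh",
                  task_budget: int = 50_000) -> dict:
    """Assemble a hypothetical request combining an effort level
    with a per-task token budget."""
    allowed_efforts = {"low", "medium", "high", "xhigh", "max"}  # assumed tier names
    if effort not in allowed_efforts:
        raise ValueError(f"unknown effort tier: {effort}")
    return {
        "model": "claude-opus-4-7",        # assumed model identifier
        "max_tokens": 4096,                # caps the length of one response
        "effort": effort,                  # hypothetical effort-level knob
        "task_budget_tokens": task_budget, # hypothetical cap on tokens per task
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_request("Review this function for bugs.")
```

The idea is that `max_tokens` bounds a single response, while a task budget would bound the cumulative tokens Claude consumes across an entire multi-step task, since token usage directly drives inference cost.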
Claude Code, Anthropic’s programming assistant, has received a slash command called ultrareview that instructs the tool to scan a code file for bugs and other issues. Claude Code customers with a Max subscription can use the feature alongside auto mode, another newly added capability that enables the assistant to complete long-running programming tasks more quickly.