

Researchers from Anthropic PBC today published two papers that shed new light on how large language models process information internally.
According to the company, the findings provide a better understanding of LLMs’ reasoning workflows. Additionally, the new research could improve how developers audit the reliability of their models. The ability to check that an LLM generates accurate output is a core requirement of enterprise machine learning projects.
During the research initiative that produced the two papers, Anthropic staffers asked one of the company’s Claude LLMs for the “opposite of small.” They then repeated the question in multiple languages. The goal, the company detailed, was to determine how the LLM goes about processing prompts.
Anthropic found that some of the internal components Claude used to answer the question are specific to a single language. Other components, meanwhile, are language-agnostic. Claude also appears to have significantly more components of the latter variety than smaller LLMs do.
The language-agnostic components provide “additional evidence for a kind of conceptual universality — a shared abstract space where meanings exist and where thinking can happen before being translated into specific languages,” Anthropic researchers explained in a blog post. “More practically, it suggests Claude can learn something in one language and apply that knowledge when speaking another.”
That’s important because the ability to apply concepts from one domain to another is a key element of reasoning. “Studying how the model shares what it knows across contexts is important to understanding its most advanced reasoning capabilities,” the researchers explained.
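Anthropic’s interpretability tooling for inspecting Claude’s internals is not publicly available, but the general idea of a shared representation space can be illustrated with an open-weights model. The sketch below is only an illustration of the concept, not a reproduction of Anthropic’s method: it assumes the Hugging Face transformers library, uses GPT-2 purely as a stand-in model, and arbitrarily picks a middle layer to compare how the same question is represented in English and French.

```python
# Illustrative sketch only: Anthropic's interpretability tooling is not public.
# This compares a mid-layer representation of the same question asked in two
# languages using an open-weights stand-in model (GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model, not the one studied in the research
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompts = {
    "en": "The opposite of small is",
    "fr": "Le contraire de petit est",
}

reps = {}
with torch.no_grad():
    for lang, text in prompts.items():
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Use the last token's activation at a middle layer as a crude summary
        # of how the model represents the question at that point.
        middle_layer = len(outputs.hidden_states) // 2
        reps[lang] = outputs.hidden_states[middle_layer][0, -1]

similarity = torch.cosine_similarity(reps["en"], reps["fr"], dim=0)
print(f"Cosine similarity of mid-layer representations: {similarity.item():.3f}")
```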
The ability to plan ahead is another requisite of advanced reasoning. Claude can do that too, Anthropic found. Its researchers made the discovery by studying how the LLM generates poetry.
On paper, Claude should generate the first line of a poem, generate the first part of the second line and then find a way to make the second line’s ending rhyme. In practice, however, the model starts thinking about the second line’s ending much earlier. This indicates that Claude can plan future tasks ahead of time when doing so is useful.
Anthropic determined that the LLM can also adjust its plans when necessary. After the company disabled one of the components that Claude used to produce a rhyme, the model found a way of generating it using a different component. “This demonstrates both planning ability and adaptive flexibility — Claude can modify its approach when the intended outcome changes,” the researchers explained.
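The intervention Anthropic describes relied on its own tools for identifying and disabling learned features inside Claude, which outside developers can’t access. As a rough analogue, the sketch below “disables a component” in an open-weights model by zeroing a slice of one MLP layer’s activations with a forward hook and comparing the text generated before and after. The model, the layer, the slice of units and the poetry prompt are all arbitrary choices for illustration, not details from the papers.

```python
# Rough analogue only: zero out part of one layer's activations in an
# open-weights model and compare generations before and after the ablation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model, not the one used in the research
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def ablate_units(module, inputs, output):
    # Zero an arbitrary slice of hidden units. Which units form a meaningful
    # "component" would normally be identified with interpretability tools.
    output[..., :256] = 0
    return output

prompt = "He saw a carrot and had to grab it,\nHis hunger was like a starving"
inputs = tokenizer(prompt, return_tensors="pt")

def generate():
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=5, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])

baseline = generate()
hook = model.transformer.h[6].mlp.register_forward_hook(ablate_units)
ablated = generate()
hook.remove()

print("Before ablation:", baseline)
print("After ablation: ", ablated)
```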
In another evaluation, Anthropic studied how Claude tackles questions that could be answered by “memorizing” training data. The company found that the model didn’t simply recall the information but rather generated its answers through a multistep reasoning workflow.
One way developers can check an LLM’s reliability is by asking it to explain how it answers prompts. While studying Claude’s reasoning capabilities, Anthropic determined that the explanations the model provides don’t always reflect its thought process.
The company’s researchers asked the LLM to answer a series of simple math questions. Claude claimed it solved them using standard methods. Upon closer inspection, however, Anthropic’s researchers discovered that the model took an entirely different approach than the one it described.
“This may reflect the fact that the model learns to explain math by simulating explanations written by people, but that it has to learn to do math ‘in its head’ directly, without any such hints, and develops its own internal strategies to do so,” Anthropic’s researchers detailed.
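Developers working against the public API can run this kind of audit only from the outside, by comparing a model’s answers and stated methods against ground truth. The sketch below shows the basic pattern using Anthropic’s Python SDK; the model alias and the arithmetic question are illustrative assumptions, and the key caveat is the one the research highlights: a correct answer and a plausible explanation still don’t reveal which method the model actually used internally.

```python
# Minimal sketch of an external audit: ask for an answer and for the claimed
# method, then verify the answer independently. A plausible explanation does
# not prove the model actually used that method. Model name is illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

question = "What is 36 + 59? Reply with the number only."
followup = "Explain, step by step, how you computed it."

answer = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed model alias; substitute as needed
    max_tokens=50,
    messages=[{"role": "user", "content": question}],
).content[0].text

explanation = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=300,
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
        {"role": "user", "content": followup},
    ],
).content[0].text

print("Answer correct:", answer.strip() == str(36 + 59))
print("Claimed method:\n", explanation)
```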
Tracing how Claude answers a prompt with a few dozen words currently takes several hours of manual work. According to Anthropic, understanding the way LLMs process more complex requests will require improvements to the observation methods it detailed today. The company’s researchers believe that it might be possible to use AI to speed up the workflow.