UPDATED 21:08 EDT / MAY 09 2023

AI

OpenAI is using GPT-4 to explain the behavior of large language models

ChatGPT creator OpenAI LP is developing a tool that it says will eventually help it understand which parts of a large language model are responsible for its behavior.

The tool is far from finished, but the company has open-sourced the code on GitHub for others to explore and refine.

In a blog post today, OpenAI explained that LLMs are sometimes said to be akin to a “black box.” It’s difficult to understand why a generative artificial intelligence model responds in the way it does to certain kinds of prompts. The aim of its “interpretability research” is to try to shed more light on why LLMs behave as they do.

“Language models have become more capable and more broadly deployed, but our understanding of how they work internally is still very limited,” OpenAI’s researchers explained. “For example, it might be difficult to detect from their outputs whether they use biased heuristics or engage in deception.”

Somewhat ironically, OpenAI’s new tool relies on an LLM itself to try to figure out the various functions of the components of other, less sophisticated LLMs. In the case of its research, OpenAI attempted to use GPT-4, its latest and most advanced LLM, to try to understand one of its predecessors, GPT-2.

To understand how, it helps to first understand how LLMs work. They’re loosely modeled on the human brain, made up of many “neurons” that each respond to a specific pattern in text and influence the model’s response to a prompt. So if a model is asked which superheroes have the best superpowers, a neuron geared toward Marvel superheroes may increase the probability of the LLM naming characters from the Marvel comic and movie universe.

OpenAI’s researchers said it’s possible to exploit this neuron-based architecture to break GPT-2 down into its individual components. The tool works by running text sequences through GPT-2 and looking for examples where a specific neuron is frequently activated. It then shows GPT-4 the text passages where that neuron is most active and asks it to generate an explanation of what the neuron responds to.
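
As a rough illustration of that first step, the sketch below uses the open-source Hugging Face transformers library to record one of GPT-2’s MLP neurons and surface the text where it fires most strongly. The layer and neuron indices and the sample sentences are arbitrary placeholders for illustration, not details taken from OpenAI’s pipeline.

```python
# Minimal sketch: record one GPT-2 "neuron" (an MLP hidden unit) and find the
# text where it activates most strongly. Layer/neuron indices and the sample
# texts are illustrative choices, not OpenAI's actual setup.
import torch
from transformers import GPT2TokenizerFast, GPT2Model

LAYER, NEURON = 5, 131  # which MLP hidden unit to inspect (arbitrary choice)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

captured = {}

def hook(module, inputs, output):
    # Output of the MLP activation function: shape (batch, seq_len, 3072).
    captured["acts"] = output[0, :, NEURON].detach()

# Hook the post-GELU activations of one transformer block's MLP.
model.h[LAYER].mlp.act.register_forward_hook(hook)

texts = [
    "Spider-Man swung between the towers of the Marvel universe.",
    "The quarterly earnings report beat analyst expectations.",
]

records = []
for text in texts:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**enc)
    records.append((captured["acts"].max().item(), text))

# The snippets where this neuron fires most strongly are what would be shown
# to GPT-4 in the explanation step.
for score, text in sorted(records, reverse=True):
    print(f"{score:.2f}  {text}")
```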

Specifically, the tool then asks GPT-4 to use that explanation to predict how the neuron will behave on other text. It compares those predictions with the neuron’s real activations to see how accurate they are. OpenAI said the methodology lets it both explain the behavior of each neuron within GPT-2 and score each explanation against the neuron’s actual behavior when prompted.
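
Below is a simplified sketch of that explain-then-score loop. The ask_gpt4() helper is hypothetical, standing in for a call to GPT-4, and the prompts are paraphrases of the idea rather than OpenAI’s actual templates; only the scoring idea, correlating simulated activations with the real ones, follows the approach described above.

```python
# Sketch of the explain-then-score loop. ask_gpt4() is a hypothetical helper
# that sends a prompt to GPT-4 and returns its reply; the prompts here are
# simplified illustrations, not OpenAI's real prompt templates.
from typing import Callable, List
import numpy as np

def explain_neuron(ask_gpt4: Callable[[str], str],
                   excerpts: List[str],
                   activations: List[List[float]]) -> str:
    """Ask GPT-4 for a short natural-language explanation of a neuron,
    given text excerpts and the neuron's per-token activations on them."""
    examples = "\n".join(f"{t}\nactivations: {a}"
                         for t, a in zip(excerpts, activations))
    prompt = ("Here are text excerpts and one neuron's activation on each token.\n"
              f"{examples}\n"
              "In one sentence, what pattern does this neuron respond to?")
    return ask_gpt4(prompt)

def score_explanation(ask_gpt4: Callable[[str], str],
                      explanation: str,
                      held_out_text: str,
                      real_acts: List[float]) -> float:
    """Have GPT-4 simulate the neuron's activations from the explanation alone,
    then score the explanation by how well the simulation matches reality."""
    prompt = (f"A neuron responds to: {explanation}\n"
              f"Predict its activation (0-10) for each token in: {held_out_text}\n"
              "Reply with comma-separated numbers only.")
    simulated = [float(x) for x in ask_gpt4(prompt).split(",")]
    # Correlation between simulated and actual activations: closer to 1.0 means
    # the explanation predicts the neuron's behavior better.
    return float(np.corrcoef(simulated, real_acts)[0, 1])
```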

GPT-2 is made up of 307,200 neurons in total, and OpenAI’s researchers said they were able to generate explanations for all of them. These explanations were then compiled into a dataset that has been released alongside the tool itself.

The idea is that the research may one day help to improve the performance of LLMs by reducing negative aspects such as “bias” or “toxicity,” OpenAI’s researchers said. However, the team behind it admitted it will be some time before the tool becomes genuinely useful for this purpose.

As the results demonstrate, the tool was able to explain the behavior of only about 1,000 of GPT-2’s neurons with a high degree of confidence. For the roughly 306,000 remaining neurons, there’s a lot of work to be done before their behavior can be understood and predicted more accurately.

OpenAI also said there’s a lot of room for improvement in its research. For instance, though it focused on short natural language explanations, it conceded that some neurons may have much more complex behavior that’s impossible to describe so succinctly. “For example, neurons could be highly polysemantic (representing many distinct concepts) or could represent single concepts that humans don’t understand or have words for,” the researchers said.

In the long term, OpenAI said one of its goals is to go beyond individual neurons to find and explain entire neural circuits responsible for implementing more complex behaviors, spanning both neurons and the “attention heads” that work with them. The researchers would also like to explain the mechanisms that cause each neuron to behave in certain ways.

“We explained the behavior of neurons without attempting to explain the mechanisms that produce that behavior,” the researchers wrote. “This means that even high-scoring explanations could do very poorly on out-of-distribution texts, since they are simply describing a correlation.”

Although there’s a long way to go, OpenAI said it’s excited about the progress it has made in using LLMs to form, test and iterate on general hypotheses, just as a human interpretability researcher would do.

Photo: Andrew Neel/Pexels
