OpenAI is using GPT-4 to explain the behavior of large language models
ChatGPT creator OpenAI LP is developing a tool that it says will eventually help it understand which parts of a large language model are responsible for its behavior.
The tool is far from finished, but the company has open-sourced the code and made it available on GitHub for others to explore and refine.
In a blog post today, OpenAI explained that LLMs are sometimes said to be akin to a “black box.” It’s difficult to understand why a generative artificial intelligence model responds in the way it does to certain kinds of prompts. The aim of its “interpretability research” is to try to shed more light on why LLMs behave as they do.
“Language models have become more capable and more broadly deployed, but our understanding of how they work internally is still very limited,” OpenAI’s researchers explained. “For example, it might be difficult to detect from their outputs whether they use biased heuristics or engage in deception.”
Somewhat ironically, OpenAI’s new tool relies on an LLM itself to try to figure out the various functions of the components of other, less sophisticated LLMs. In the case of its research, OpenAI attempted to use GPT-4, its latest and most advanced LLM, to try to understand one of its predecessors, GPT-2.
To understand how that's possible, it helps to first look at how LLMs work. They're loosely modeled on the human brain, made up of many "neurons," each of which observes a specific pattern in text and influences the model's response to a given prompt. So if a model is asked which superheroes have the best superpowers, a neuron attuned to Marvel superheroes may increase the probability of the LLM naming characters from Marvel comics and movies.
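To make that concrete, here's a minimal sketch, not OpenAI's code, of how one might read a single MLP neuron's activations out of GPT-2 using the Hugging Face transformers library and a PyTorch forward hook. The layer and neuron indices are arbitrary placeholders chosen purely for illustration.

```python
# Minimal sketch: inspect one GPT-2 MLP neuron's activations per token.
# The layer/neuron indices below are hypothetical, picked for illustration.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 5, 1234   # arbitrary example indices
captured = {}

def hook(module, inputs, output):
    # output shape: (batch, seq_len, 3072) post-GELU MLP activations
    captured["acts"] = output[..., NEURON].detach()

handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(hook)

tokens = tokenizer("Spider-Man swung between the skyscrapers.", return_tensors="pt")
with torch.no_grad():
    model(**tokens)
handle.remove()

# One activation value per token: large values suggest the pattern this
# neuron responds to is present at that position in the text.
for tok, act in zip(tokenizer.convert_ids_to_tokens(tokens["input_ids"][0]),
                    captured["acts"][0].tolist()):
    print(f"{tok!r:>15}  {act:+.3f}")
```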
OpenAI’s researchers said it’s possible to exploit this neuron-based architecture to break GPT-2 down into its individual components. The tool works by running text sequences through the model being analyzed and looking for passages where a specific neuron activates strongly and frequently. It then shows these highly activating passages to GPT-4 and asks it to generate an explanation of what the neuron appears to respond to.
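A hedged sketch of that "explain" step might look like the following; the prompt wording, helper name and model identifier are illustrative assumptions, not OpenAI's actual research prompts.

```python
# Sketch of the "explain" step: show GPT-4 the excerpts on which a neuron
# fired hardest and ask for a short natural-language explanation.
# The prompt text and helper function are illustrative, not OpenAI's own.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_neuron(top_excerpts: list[tuple[str, float]]) -> str:
    """top_excerpts: (text snippet, activation) pairs where the neuron fired most."""
    listing = "\n".join(f"- (activation {act:.2f}) {text}" for text, act in top_excerpts)
    prompt = (
        "The following text snippets strongly activate a single neuron in a "
        "language model. In one short sentence, describe the pattern this "
        "neuron appears to detect:\n" + listing
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```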
Specifically, the tool then asks GPT-4 to use that explanation to predict how the neuron should behave on other text. It compares these predictions with the neuron’s real-world behavior to see how accurate the explanation is. OpenAI said the methodology allows it to explain the behavior of each neuron within GPT-2 and also score each explanation against the neuron’s actual activations.
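The scoring idea can be sketched in a few lines: compare the activations GPT-4 predicts from its explanation with the activations the neuron actually produced on the same tokens. Using a simple correlation coefficient here mirrors the spirit of OpenAI's simulate-and-score approach, but it's a simplification rather than the exact metric.

```python
# Sketch of explanation scoring: correlate the activations GPT-4 predicted
# from its explanation with the neuron's real activations on the same tokens.
import numpy as np

def explanation_score(predicted: list[float], actual: list[float]) -> float:
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    if predicted.std() == 0 or actual.std() == 0:
        return 0.0  # a constant prediction carries no information
    return float(np.corrcoef(predicted, actual)[0, 1])

# An explanation scoring near 1.0 predicts the neuron well; one near 0.0
# says little about when the neuron actually fires.
print(explanation_score([0.1, 0.9, 0.0, 0.7], [0.2, 0.8, 0.1, 0.6]))
```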
GPT-2 is made up of 307,200 neurons in total, and OpenAI’s researchers said they were able to generate explanations for all of them. These explanations were then compiled into a database that has been made open-source alongside the actual tool.
The idea is that the research may one day help to improve the performance of LLMs by reducing negative aspects such as “bias” or “toxicity,” OpenAI’s researchers said. However, the team behind it admitted it will be some time before the tool becomes genuinely useful for this purpose.
As the results demonstrate, the tool was able to explain the behavior of only about 1,000 of GPT-2’s neurons with a high degree of confidence. For the remaining 306,000 or so neurons, there’s a lot of work to be done to understand and predict their behavior more accurately.
OpenAI also said there’s a lot of room for improvement in its research. For instance, though it focused on short natural language explanations, it conceded that some neurons may have much more complex behavior that’s impossible to describe so succinctly. “For example, neurons could be highly polysemantic (representing many distinct concepts) or could represent single concepts that humans don’t understand or have words for,” the researchers said.
In the long term, OpenAI said one of its goals is to go beyond single neurons to try to find and explain entire neural circuits responsible for implementing more complex behaviors, covering both the neurons and the “attention heads” that work with them. The researchers would also like to explain the mechanisms that cause each neuron to behave in certain ways.
“We explained the behavior of neurons without attempting to explain the mechanisms that produce that behavior,” the researchers wrote. “This means that even high-scoring explanations could do very poorly on out-of-distribution texts, since they are simply describing a correlation.”
Although there’s a long way to go, OpenAI said it’s excited about the progress it has made in using LLMs to form, test and iterate on general hypotheses, just as a human interpretability researcher would do.
Photo: Andrew Neel/Pexels