UPDATED 17:48 EST / DECEMBER 14 2023

OpenAI details automated approach to supervising AI models

OpenAI’s Superalignment team, which focuses on addressing the risks posed by advanced artificial intelligence models, today published its first research paper.

The group was formed in June under the leadership of OpenAI Chief Scientist Ilya Sutskever and head of alignment Jan Leike. Over the next four years, it will have access to 20% of the compute capacity the company has “secured to date,” which is reportedly worth billions of dollars. The Superalignment team’s goal is to develop new ways of preventing advanced AI models from generating harmful output.

Current approaches to supervising AI models depend on human input. OpenAI’s concern is that human feedback may be insufficient to regulate a hypothetical future neural network with superhuman reasoning capabilities. If such a neural network were to generate output that is difficult for engineers to understand, such as a file containing millions of lines of code, manually checking that output for risks could become impossible.

That’s the challenge the paper published today by OpenAI’s Superalignment team seeks to tackle. The paper proposes to address the limitations of manual AI model supervision by automating the process. OpenAI argues that an advanced neural network could be prevented from generating harmful output by a second, less advanced neural network acting as its supervisor.

The paper’s authors refer to their automated supervision method as weak-to-strong generalization. Before publicly detailing the method, they carried out an experiment to test its effectiveness. They used an AI model on par with GPT-2, a relatively simple language model OpenAI released in 2019, to supervise the company’s latest GPT-4 model.
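
To illustrate the setup, here is a toy sketch of weak-to-strong training: a weak model is trained on ground-truth labels, it then labels data it has never seen, and a more capable model is trained on those imperfect labels. The classifiers and dataset below are stand-ins chosen to keep the example self-contained; they are not the GPT-2-class and GPT-4 models used in OpenAI’s experiments.

```python
# Toy analogue of weak-to-strong supervision (illustrative only; OpenAI's
# experiments use GPT-2-class and GPT-4 language models, not these classifiers).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=20, n_informative=12,
                           random_state=0)
# Split: ground-truth data for the weak supervisor, data the weak model will
# label for the strong model, and a held-out test set.
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=2000, random_state=0)
X_unlab, X_test, y_unlab, y_test = train_test_split(X_rest, y_rest, train_size=2000, random_state=0)

# 1. Train the weak supervisor on ground-truth labels (it sees only 4 features,
#    so it is genuinely weaker than the strong model).
weak = LogisticRegression(max_iter=1000).fit(X_weak[:, :4], y_weak)

# 2. The weak supervisor labels data it has never seen; its labels are imperfect.
weak_labels = weak.predict(X_unlab[:, :4])

# 3. Train ("fine-tune") the strong model on those imperfect weak labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_unlab, weak_labels)

# 4. Compare against the weak supervisor and a strong "ceiling" trained on
#    ground truth, to see how much capability the weak labels elicited.
ceiling = GradientBoostingClassifier(random_state=0).fit(X_unlab, y_unlab)
print("weak supervisor accuracy:", weak.score(X_test[:, :4], y_test))
print("weak-to-strong accuracy: ", strong.score(X_test, y_test))
print("strong ceiling accuracy: ", ceiling.score(X_test, y_test))
```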

The researchers identified an implementation challenge during the test: the quality of an advanced AI model’s output can be reduced by the less advanced model that supervises it. The issue is most likely to emerge when a user asks a question that is too complicated for the less advanced AI. If the advanced neural network answers a question correctly but the less advanced model believes the answer is incorrect, the supervising model may block the output.

OpenAI’s researchers developed an algorithm to address the challenge. “We use a simple method that encourages the strong model to be more confident — including confidently disagreeing with the weak supervisor if necessary,” they wrote in a blog post that accompanied the paper.
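
The blog post does not spell out the full training objective, but the idea can be sketched as an auxiliary loss that blends imitation of the weak supervisor with the strong model’s own confident predictions. The weighting and hardening scheme below are illustrative assumptions, not OpenAI’s released code.

```python
# Sketch of a confidence-encouraging loss in the spirit of the blog post's
# description: the strong model is partly trained against the weak labels and
# partly against its own hardened predictions, letting it confidently disagree
# with the weak supervisor. The blend weight `alpha` is an assumption.
import torch
import torch.nn.functional as F

def weak_to_strong_loss(strong_logits, weak_labels, alpha=0.5):
    """Cross-entropy against weak labels, blended with cross-entropy against
    the strong model's own argmax predictions used as pseudo-labels."""
    # Standard term: imitate the weak supervisor.
    loss_weak = F.cross_entropy(strong_logits, weak_labels)
    # Confidence term: reinforce the strong model's own best guess,
    # even where it disagrees with the weak supervisor.
    hardened = strong_logits.argmax(dim=-1).detach()
    loss_self = F.cross_entropy(strong_logits, hardened)
    return (1 - alpha) * loss_weak + alpha * loss_self

# Example usage with random tensors standing in for a batch.
logits = torch.randn(8, 2, requires_grad=True)   # strong model outputs
weak = torch.randint(0, 2, (8,))                 # weak supervisor's labels
loss = weak_to_strong_loss(logits, weak)
loss.backward()
print(float(loss))
```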

The researchers applied the method to the experiment in which they used an AI model on par with GPT-2 to supervise GPT-4. The quality of the latter model’s output was reduced, but to a lesser extent than before. “We show that we can use a GPT-2-level model to elicit most of GPT-4’s capabilities — close to GPT-3.5-level performance — generalizing correctly even to hard problems where the small model failed,” they detailed. 
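
One way to read a claim like “eliciting most of GPT-4’s capabilities” is as the fraction of the gap between the weak supervisor’s performance and the strong model’s ceiling that weak-to-strong training recovers. The helper and the numbers below are hypothetical and purely illustrative.

```python
# Fraction of the weak-to-ceiling performance gap recovered by weak-to-strong
# training. The metric name and the example numbers are illustrative, not
# figures taken from the paper.
def performance_gap_recovered(weak_acc, weak_to_strong_acc, strong_ceiling_acc):
    """Return 1.0 when weak-to-strong matches the ceiling, 0.0 when it is
    no better than the weak supervisor alone."""
    return (weak_to_strong_acc - weak_acc) / (strong_ceiling_acc - weak_acc)

# Hypothetical example: weak model 60%, weak-to-strong 78%, ceiling 82%.
print(performance_gap_recovered(0.60, 0.78, 0.82))  # ≈ 0.82
```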

OpenAI has released a collection of open-source code files designed to help developers test and refine its automated AI supervision method. Additionally, the company is launching a $10 million grants program to support research in this area. OpenAI will also support research projects that explore other approaches to supervising advanced AI models.

Image: OpenAI
