Google’s DeepMind creates generative AI model with fact-checker to crack unsolved math problem
Google LLC’s DeepMind artificial intelligence research unit claims to have cracked a previously unsolved math problem using a large language model-based chatbot equipped with a fact-checker to filter out useless outputs.
Thanks to that filter, DeepMind researchers say, the LLM can generate millions of responses but submit only those that can be verified as accurate.
It’s a milestone achievement, as previous DeepMind breakthroughs have generally relied on AI models that were specifically created to solve the task at hand, such as predicting the weather or designing new protein shapes. Those models were trained on very accurate and specific datasets, which makes them quite different from LLMs such as OpenAI’s GPT-4 or Google’s Gemini.
Those LLMs are trained on vast and varied datasets, enabling them to perform a wide range of tasks and talk about almost any subject. But the approach carries risks, as LLMs are susceptible to so-called “hallucinations,” which is the term for producing false outputs.
Hallucinations are a big problem for LLMs. Gemini, which was only released this month and is said to be Google’s most capable LLM ever, has already shown it’s vulnerable, inaccurately answering fairly simple questions such as who won this year’s Oscars.
Researchers believe that hallucinations can be fixed by adding a layer above the AI model that verifies the accuracy of its outputs before passing them on to users. But this kind of safety net is tricky to build when LLMs have been trained to discuss such a wide range of topics.
At DeepMind, Alhussein Fawzi and his team created FunSearch, a system built around a general-purpose LLM based on Google’s PaLM 2 model, and added a fact-checking layer called an “evaluator.” FunSearch is geared toward solving only math and computer science problems by generating computer code. According to DeepMind, that narrow focus makes the fact-checking layer easier to build, because the generated code’s outputs can be rapidly verified.
Although the FunSearch model is still susceptible to hallucinations and can generate inaccurate or misleading results, the evaluator can easily filter them out and ensure the user receives only reliable outputs.
“We think that perhaps 90% of what the LLM outputs is not going to be useful,” Fawzi said. “Given a candidate solution, it’s very easy for me to tell you whether this is actually a correct solution and to evaluate the solution, but actually coming up with a solution is really hard. And so mathematics and computer science fit particularly well.”
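Fawzi’s point is the classic asymmetry between generating and verifying: executing a candidate program and checking its output is cheap, while inventing the program is hard. A minimal sketch of that filtering step in Python, with hypothetical run_candidate and verify stand-ins rather than DeepMind’s actual code:

```python
def filter_candidates(candidates, run_candidate, verify):
    """Keep only the candidate programs whose outputs pass verification.

    run_candidate and verify are hypothetical stand-ins for executing a
    generated program and checking its output against the problem spec.
    """
    reliable = []
    for program in candidates:
        try:
            output = run_candidate(program)
        except Exception:
            continue                      # discard candidates that crash
        if verify(output):                # cheap check, unlike generation
            reliable.append(program)
    return reliable
```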
According to Fawzi, FunSearch is able to generate new scientific knowledge and ideas, which is a new milestone for LLMs.
The researchers tested its abilities by giving it a problem, plus a very basic solution in source code, as an input. The model then generated a pool of new solutions, which the evaluator checked for accuracy. The most reliable of those solutions were fed back into the LLM as inputs, together with a prompt asking it to improve on them. According to Fawzi, working this way, FunSearch produces millions of potential solutions that eventually converge on the most efficient result.
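A minimal sketch of that generate, evaluate and reselect loop, assuming hypothetical generate_candidates (the LLM call) and evaluate (the problem-specific scorer) functions; the actual FunSearch system is considerably more elaborate:

```python
import random

def funsearch_style_loop(seed_program, generate_candidates, evaluate,
                         iterations=100, pool_size=10):
    """Evolutionary sketch: generate candidates with an LLM, score them with
    an evaluator, and feed the best ones back as prompts for the next round."""
    # Assumes the seed program can be scored; evaluate() returns None on failure.
    pool = [(evaluate(seed_program), seed_program)]
    for _ in range(iterations):
        _, parent = random.choice(pool)          # pick a strong program to improve on
        for candidate in generate_candidates(parent):
            score = evaluate(candidate)
            if score is not None:                # keep only verified, scoreable code
                pool.append((score, candidate))
        # Retain only the highest-scoring programs for the next round.
        pool = sorted(pool, key=lambda p: p[0], reverse=True)[:pool_size]
    return pool[0][1]                            # best program found
```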
When tasked with mathematical problems, FunSearch writes computer code that can find the solution, rather than trying to tackle it directly.
Fawzi and his team tasked FunSearch with finding a solution to the cap set problem, which involves finding the largest possible set of points in a grid such that no three of them lie on a straight line. As the grid’s dimension grows, the problem becomes vastly more complex.
However, FunSearch was able to create a solution consisting of 512 points in eight dimensions, larger than any a human mathematician has managed. The results of the experiment were published in the journal Nature.
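In its usual formulation, a cap set is a set of points in the n-dimensional grid Z_3^n in which no three distinct points lie on a line, which in that setting is equivalent to no three distinct points summing to the zero vector modulo 3. That is what makes verification cheap: any candidate set, even the 512-point one, can be checked with a few lines of code like the following sketch (illustrative only, not DeepMind’s evaluator):

```python
from itertools import combinations

def is_cap_set(points):
    """Check that no three distinct points of Z_3^n sum to zero mod 3,
    i.e. that no three of them are collinear."""
    point_set = set(points)
    if len(point_set) != len(points):
        return False                              # duplicates are not allowed
    for a, b in combinations(point_set, 2):
        # The unique third point on the line through a and b is -(a + b) mod 3.
        c = tuple((-(x + y)) % 3 for x, y in zip(a, b))
        if c in point_set and c != a and c != b:
            return False
    return True

# Tiny example in two dimensions (the grid {0, 1, 2}^2):
cap = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_cap_set(cap))                 # True: a valid 4-point cap set
print(is_cap_set(cap + [(2, 2)]))      # False: (0,0), (1,1), (2,2) are collinear
```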
Although most people are unlikely ever to come across the cap set problem, let alone attempt to solve it, it’s an important achievement. Even the best human mathematicians do not agree on the best way to tackle the challenge. According to Terence Tao, a mathematics professor at the University of California, Los Angeles, who describes the cap set problem as his “favorite open question,” FunSearch is an extremely “promising paradigm” since it can potentially be applied to many other math problems.
FunSearch proved as much when tasked with the bin-packing problem, where the goal is to fit objects of different sizes into the fewest containers possible. Fawzi said FunSearch was able to find solutions that outperform the best algorithms created to solve this particular problem. Its results could have significant implications for industries such as transport and logistics.
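The article doesn’t show the heuristics FunSearch evolved, but the baseline they are measured against is easy to state: the classic first-fit rule places each item into the first bin with room for it, and the evaluator’s score is simply the number of bins used. A minimal Python sketch (illustrative, not FunSearch’s output):

```python
def first_fit(items, capacity):
    """Classic first-fit heuristic: put each item into the first bin it fits,
    opening a new bin when none has room."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

packing = first_fit([6, 4, 7, 3, 5], capacity=10)
print(len(packing), packing)   # 3 [[6, 4], [7, 3], [5]]
```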
FunSearch is also notable because users can see how it goes about generating its outputs and learn from the process, setting it apart from other LLMs, where the AI is more akin to a “black box.”