UPDATED 20:42 EST / JANUARY 09 2025


Diffbot boosts LLM accuracy by tapping into its vast Knowledge Graph of up-to-date information

Knowledge graph startup Diffbot Technologies Corp., which maintains one of the largest online knowledge indexes, is looking to tackle the problem of hallucinations in artificial intelligence chatbots by ensuring the accuracy of their responses.

The company has just launched a fine-tuned version of Meta Platforms Inc.’s Llama 3.3, saying that its responses are enhanced using a new technique called graph retrieval-augmented generation.

Diffbot’s large language model differs from typical AI models, which encode knowledge from vast training datasets. Instead, it’s trained on a small amount of data and taught how to search for information within the company’s Knowledge Graph, which contains more than 1 trillion interconnected facts and is constantly updated.

Diffbot has been crawling the public internet for the last eight years to build its Knowledge Graph, categorizing web pages into different groups, such as people, companies, articles and products. It extracts the most recent information from these sites using natural language processing and computer vision to keep its database up to date.

That database is updated every four to five days with “millions of new data points,” and it’s what’s being used to fuel Diffbot’s latest AI model to ensure its responses are grounded in the most up-to-date and accurate information.

That’s different from most other LLMs, which rely on static information that’s encoded into their training data. According to Diffbot, this makes its AI model much more accurate than others. If it’s asked about a recent news event, for example, it will search the Knowledge Graph for the most recent updates, extract the most relevant data, and cite the sources of that information to the user. So it’s not only more accurate but also more transparent than other chatbots.
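The retrieve-then-cite pattern described above can be sketched in a few lines. This is a minimal illustration of graph retrieval-augmented generation, not Diffbot's actual API: the `Fact` tuples, helper names and example URLs are all hypothetical placeholders standing in for the real Knowledge Graph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """One knowledge-graph edge: subject --predicate--> object, plus its source."""
    subject: str
    predicate: str
    obj: str
    source: str

# Toy stand-in for a knowledge graph; real facts would come from a live index.
GRAPH = [
    Fact("Acme Corp", "headquartered_in", "Berlin", "https://example.com/acme"),
    Fact("Acme Corp", "ceo", "Jane Doe", "https://example.com/acme-leadership"),
    Fact("Jane Doe", "born_in", "Lyon", "https://example.com/jane-doe"),
]

def retrieve(query: str, graph=GRAPH):
    """Return facts whose subject or object is mentioned in the query."""
    q = query.lower()
    return [f for f in graph if f.subject.lower() in q or f.obj.lower() in q]

def grounded_prompt(query: str) -> str:
    """Build a prompt that grounds the model in retrieved facts and
    asks it to cite each source, rather than answering from parameters."""
    facts = retrieve(query)
    lines = [f"- {f.subject} {f.predicate} {f.obj} [source: {f.source}]" for f in facts]
    return (
        "Answer using ONLY the facts below and cite their sources.\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {query}"
    )

print(grounded_prompt("Who is the CEO of Acme Corp?"))
```

Because every retrieved fact carries a source URL, the model's answer can link each claim back to where it was found, which is the transparency property the article highlights.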

Diffbot founder and Chief Executive Mike Tung told VentureBeat that he believes the AI industry will shift toward a standard that will see most general-purpose reasoning bots distilled to about 1 billion parameters, rather than the multibillion-parameter LLMs being developed today. He argues that it’s unsustainable to try to integrate all of the latest knowledge within AI models. Rather, it’s better to teach the models to use the tools necessary to search for external knowledge.

The startup hopes to finally solve the problem of so-called “hallucinations,” which occur when AI models cannot find the answer to a user’s question and, instead of saying they don’t know, fabricate their responses. This tendency makes it risky to deploy AI, and Diffbot believes the solution is to ground AI systems in “verifiable facts” rather than trying to cram as much knowledge as possible into them.

Tung provided an example of users wanting to know the latest weather forecast in their area. “Instead of generating an answer based on outdated training data, our model queries a live weather service and provides a response grounded in real-time information,” he explained.
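The weather example amounts to tool use: the model routes the question to a live service instead of answering from its weights. Here is a hedged sketch of that routing step; `weather_service` and its canned data are mock placeholders for whatever real API a deployment would call.

```python
import re

def weather_service(city: str) -> str:
    """Mock 'live weather service'; a real system would call an external API."""
    live_data = {"paris": "12°C, light rain", "tokyo": "18°C, clear"}  # placeholder data
    return live_data.get(city.lower(), "no data")

TOOLS = {"get_weather": weather_service}

def answer(query: str) -> str:
    """Route weather questions to the live tool rather than generating
    an answer from (possibly stale) model parameters."""
    m = re.search(r"weather in (\w+)", query.lower())
    if m:
        city = m.group(1)
        return f"Current weather in {city.title()}: {TOOLS['get_weather'](city)}"
    return "I don't have a verified source for that, so I won't guess."

print(answer("What's the weather in Paris?"))
```

The fallback branch is the anti-hallucination behavior the article describes: when no grounded source is available, the system declines instead of fabricating.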

Diffbot says benchmarks show that its method is far more reliable. It achieved an 81% score on the FreshQA benchmark, which is designed to test AI models on real-time factual knowledge, beating both Gemini and ChatGPT. In addition, it achieved a 70.36% score on MMLU-Pro, which tests AI models for their academic knowledge.

The best thing about Diffbot’s model is that it’s being made open source, so companies will be able to download it and run it on their own machines and fine-tune it for their needs. For instance, they’ll be able to customize it to search their own databases, as well as Diffbot’s Knowledge Graph.

“You can run it locally on your machine,” Tung said, adding that this also makes it superior from a privacy perspective. “There’s no way you can run Google Gemini without sending your data over to Google and shipping it outside of your premises.”

Diffbot hopes that its LLM will be used by enterprises for workloads that require exceptional accuracy and full accountability, and it has made some inroads there, providing data services to Duck Duck Go Inc., Cisco Systems Inc. and Snap Inc.

Its model can be downloaded via GitHub now, and there’s a public demo available at diffy.chat. Companies that want to deploy it on their own hardware can choose the smallest 8 billion-parameter version, which can run on a single Nvidia A100 graphics processing unit. The biggest 70 billion-parameter model requires two H100 GPUs.

Image: Diffbot
