UPDATED 20:08 EDT / MAY 16 2024

OpenAI agrees to deal with Reddit to scrape its content for AI training

In the latest development in the war for data among artificial intelligence model builders, ChatGPT is getting a new source of up-to-date content thanks to a deal between OpenAI and Reddit Inc. that was announced today.

The partnership will enable OpenAI’s large language models, including GPT-3.5 and GPT-4, to “better understand and showcase Reddit content, especially on recent topics,” the companies said in a joint statement. In addition, the deal will see OpenAI become an advertising partner to Reddit, running ads on its popular website and app.

The deal is said to be valued at about $60 million, though a spokesperson for Reddit declined to disclose the terms of the deal when asked by Reuters.

OpenAI announced the partnership on the same day as it rolled out a number of updates to ChatGPT aimed at enhancing its data analysis capabilities, giving users the ability to interact with tables and charts and upload files from Google Drive and Microsoft OneDrive.

The AI phenom has struck a number of deals with publishers to bring more training data to its artificial intelligence models. In recent weeks, it has announced similar partnerships with the likes of the Financial Times and Dotdash Media Inc. Those initiatives followed a deal that was struck with the German publisher Axel Springer SE last year to enable ChatGPT to be trained on content from publications such as Business Insider and Politico in the U.S., and Bild and Die Welt in Germany.

By partnering with Reddit, OpenAI will be able to access that company’s Data API and obtain “real-time, structured and unique content” directly from Reddit. In addition, Reddit will add some new “AI-powered features” to its platform, but it hasn’t said what they might be.

Reddit caused some controversy last year when it announced that it would start charging developers to access its application programming interface, which provides access to its rich repository of human-generated content, including high-quality information. The move led a number of popular third-party Reddit clients to shut down and triggered protests in many popular subreddits.

The company said at the time it had made the decision because a number of large AI companies were scraping its data without paying anything for it. It consequently began a policy of making money from its trove of content, notably striking a deal with Google LLC first, and then OpenAI today.

For OpenAI, the main advantage of the deal is access to a wealth of rich, up-to-date content that can aid in the training of its LLMs. Like other AI firms, OpenAI wants to diversify its sources of training data beyond simple internet scraping, which has become a contentious practice that potentially violates a lot of copyrights. By partnering with Reddit, it knows it won’t have any legal issues if its chatbots lean too much on Reddit’s content.

For Reddit, the deal brings the company a nice new revenue stream at a time when it’s facing heavy competition for advertising dollars from social media rivals such as Facebook, Instagram and TikTok.

Holger Mueller of Constellation Research Inc. said the deal makes sense for OpenAI because, like most AI vendors, it has little to no data of its own that it can use to train its AI models. As such, the company needs access to rich sources of content such as Reddit, he said.

“Moreover, Reddit itself benefits from the remuneration it receives for its data, and perhaps also an attribution to its content when it’s used by OpenAI’s models to inform an answer, bringing more traffic to its site,” Mueller explained.

The analyst said the controversy around this deal is what it means for the people who actually create Reddit’s content, namely its users. “Where this will leave Reddit’s users in terms of intellectual property ownership is another story,” he said. “But it’s a reminder that when technology disruption happens, data protection and privacy are often the weakest link.”

ChatGPT gets better at analyzing data

ChatGPT is also getting a number of enhancements that aim to improve its data analysis skills, giving users the ability to interact with tables and charts via a new, expandable view, OpenAI said in a blog post. In addition, users will be able to feed files to ChatGPT directly from Google Drive and Microsoft OneDrive, and customize and download charts to embed into their presentations and documents.

OpenAI said the improvements build on ChatGPT’s existing ability to understand datasets and complete tasks associated with them. To get started, users simply upload one or more datasets, enabling ChatGPT to analyze them by writing and running Python code on their behalf.

The company says it can work with data in a number of ways. For instance, it can merge and clean large datasets, create charts based on the information in Excel files, uncover insights, create summaries and so on. The idea is that novices can perform more in-depth analyses, while advanced users can save time on tasks such as cleaning up their data.
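To give a sense of what “merging and cleaning” a dataset in Python can look like, here is a minimal sketch that uses only the standard library. It is an illustration, not the code ChatGPT actually generates: the sample data, column names and function names are all hypothetical, and a real analysis would likely read uploaded files rather than inline strings.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for two uploaded files.
SALES_CSV = """region,month,revenue
north,Jan,1200
north,Feb,
south,Jan,900
south,Feb,1100
"""

REGIONS_CSV = """region,manager
north,Avery
south,Blake
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_sales(rows):
    """Drop rows with a missing revenue value and convert revenue to float."""
    cleaned = []
    for row in rows:
        value = row["revenue"].strip()
        if not value:
            continue  # skip rows where revenue was left blank
        cleaned.append({**row, "revenue": float(value)})
    return cleaned


def merge_on_region(sales, regions):
    """Left-join sales rows with region metadata on the 'region' column."""
    lookup = {r["region"]: r for r in regions}
    return [{**row, **lookup.get(row["region"], {})} for row in sales]


def average_revenue_by_region(rows):
    """Group rows by region and average the revenue in each group."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["region"], []).append(row["revenue"])
    return {region: mean(values) for region, values in grouped.items()}


sales = clean_sales(load_rows(SALES_CSV))
merged = merge_on_region(sales, load_rows(REGIONS_CSV))
summary = average_revenue_by_region(merged)
print(summary)
```

The blank revenue cell for the north region in February is dropped during cleaning, so the summary averages only the rows that carry a value.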

The improved data analysis capabilities will be made available within OpenAI’s newest flagship model, GPT-4o, for ChatGPT Plus, Team and Enterprise subscribers only.

Image: Mike Wheatley

