UPDATED 14:48 EDT / JULY 25 2024

Reddit blocks Bing, several other search engines from indexing its platform on AI training concerns

Reddit Inc. has blocked some search engines’ access to content from its namesake forum platform.

The development was first reported by 404 Media on Wednesday and Reddit confirmed the move in a blog post this morning. The company’s new search engine restrictions don’t apply to Google LLC, with which it recently inked a content licensing deal reportedly worth $60 million per year.

According to 404 Media, new Reddit posts no longer appear in Bing and DuckDuckGo. The change is also believed to affect smaller Google rivals such as Mojeek and Qwant, two search engines popular in the U.K. and France, respectively. At least some new Reddit content is still accessible to customers of Kagi, a paid search engine that uses Google technology to power some of its results.

The reduced visibility of Reddit content stems from a recent change to the platform’s robots.txt file. That file contains instructions for search engine crawlers and other automated services that may wish to scrape Reddit content. The company has updated its robots.txt file with code designed to block unauthorized scrapers, as well as a text snippet that reads “Reddit believes in an open Internet, but not the misuse of public content.”
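For readers unfamiliar with how a robots.txt file is interpreted, the following is a minimal sketch using Python's standard-library parser. The rules shown are a hypothetical blanket disallow written for illustration, not a copy of Reddit's actual file, and the crawler name `ExampleBot` and the `can_crawl` helper are likewise assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's actual file.
ROBOTS_TXT = """\
# Example: tell every crawler to stay away from the whole site
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a made-up crawler name.
    url = "https://www.reddit.com/r/example/"
    print(can_crawl(ROBOTS_TXT, "ExampleBot", url))
```

A robots.txt file is advisory: it only has an effect when a crawler chooses to read and honor it, which is why well-behaved search engines stop indexing pages that the file disallows.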

The development comes about five months after word emerged that Reddit had inked a content licensing deal with Google. Under the contract, the search giant may incorporate user posts on Reddit into the training datasets with which it builds artificial intelligence models. In exchange, Reddit will reportedly receive $60 million per year.

“This is not at all related to our recent partnership with Google,” a Reddit spokesperson told The Verge. “We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”

The statement leaves open the possibility that Reddit posts could eventually reappear in Bing and other Google alternatives. Additionally, the social network will make its content accessible for researchers and certain other parties that intend to use it for noncommercial purposes. The nonprofit behind the Internet Archive, an archive of the web, is one of the organizations that has been granted access.

Google is licensing content from not only Reddit but also other partners to support its AI development efforts. In February, the company inked a deal to license user-generated content from the operator of the Stack Overflow programming forum. The agreement gives Google permission to make Stack Overflow posts available in the Google Cloud console and the Gemini for Google Cloud chatbot.

Rival OpenAI has signed a series of similar licensing deals. The ChatGPT developer signed content agreements with both Reddit and Stack Overflow earlier this year. OpenAI is also licensing articles from the Associated Press, Axel Springer, the Financial Times and a range of other publishers. 

Photo: Unsplash
