UPDATED 17:51 EDT / JANUARY 08 2024

OpenAI argues New York Times’ AI copyright lawsuit is ‘without merit’

OpenAI claims that a copyright lawsuit brought against it and Microsoft Corp. by The New York Times is “without merit.”

The artificial intelligence developer made the argument in a blog post published today. The response comes less than two weeks after the Times filed its lawsuit, which accuses OpenAI of using millions of the paper’s articles to train its AI models. The suit also alleges that ChatGPT displayed paywalled content in response to some user prompts.

Rumors that the Times may pursue legal action against OpenAI first emerged in August. That month, the paper updated its terms of service with a provision prohibiting companies from scraping its content for AI training purposes. According to NPR, the Times began weighing litigation after negotiations with OpenAI about a potential content licensing deal became “contentious.”

The AI developer has not detailed what datasets it used to train its latest large language models. However, OpenAI did disclose that LLMs released before GPT-3.5 drew on an open-source dataset called Common Crawl. That dataset, the Times’ lawsuit states, contains about 16 million records from websites operated by the paper.

A second argument included in the lawsuit is that ChatGPT sometimes displays paywalled articles when prompted to do so by users. The issue allegedly also affects Microsoft’s Bing Chat service, which is based on the same GPT-4 model as ChatGPT.

In addition to claiming that the lawsuit is without merit, the blog post OpenAI published today pushes back against two of the core copyright concerns the Times has raised.

The AI developer argues that training AI models using publicly available content is fair use. Its blog post goes on to state that the Times’ “content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training.”

The blog post also addresses the Times’ concerns about ChatGPT providing access to paywalled articles. According to OpenAI, the phenomenon is a “rare bug” that it’s currently working to fix. The AI developer’s blog post goes on to claim that “The New York Times is not telling the full story.”

“The regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites,” OpenAI stated. “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates.”

The lawsuit is the latest in a series of legal complaints brought against generative AI developers over the past few quarters. Previously, OpenAI and Microsoft were sued over the manner in which they used open-source code to train the GitHub Copilot programming assistant. A separate lawsuit filed in January 2023 accused Stability AI Ltd., DeviantArt Inc. and Midjourney Inc. of using copyrighted images to develop their AI models.
