UPDATED 11:10 EDT / NOVEMBER 22 2023

Microsoft, OpenAI sued over alleged unauthorized use of nonfiction authors’ work in AI training

Artificial intelligence startup OpenAI and Microsoft Corp. have been hit with a new lawsuit alleging that the companies violated copyright by using the works of nonfiction authors to train AI models, including OpenAI’s ChatGPT.

Julian Sancton, a reporter and author of the New York Times bestseller “Madhouse at the End of the Earth,” is the lead plaintiff in the class-action suit, filed in New York federal court Tuesday. It’s one of several lawsuits brought by authors against OpenAI and other AI firms over alleged copyright misuse, with plaintiffs including notable writers such as George R.R. Martin and John Grisham.

According to the lawsuit, filed by law firm Susman Godfrey LLP, OpenAI scraped the content of hundreds of thousands of nonfiction books to train its AI models. Large language models such as OpenAI’s ChatGPT can understand and produce humanlike text. To do this, the models need to ingest large bodies of text that resemble human interaction, and the more diverse the better. As a result, companies that produce LLMs gather as much data and text as possible, especially naturally written language, which often comes from books.

“Defendants took these works; they made unlicensed copies of them; and they used those unlicensed copies to digest and analyze the copyrighted expression in them, all for commercial gain,” the complaint reads. “The end result is a computer model that is not only built on the work of thousands of creators and authors, but also built to generate a wide range of expression — from short-form articles to book chapters — that mimics the syntax, style, and themes of the copyrighted works on which it was trained.”

As for the basis of the alleged infringement, the lawsuit says nonfiction authors spend years of their lives conceiving, researching and writing their work. As such, it argues, scraping and then transforming that work without compensation constitutes wide-scale theft.

The lawsuit claims that OpenAI and Microsoft collaborated closely on the production and deployment of the models and stressed that Microsoft’s relationship made it a partner in the infringement. Microsoft has also made substantial investments, to the tune of $13 billion, in the AI startup and deeply incorporated OpenAI’s models into its products with its AI-powered Copilot capabilities and across its cloud offerings.

The plaintiffs in the class action are asking the court to restrain OpenAI and Microsoft from continuing to use their nonfiction works to train the AI models. The lawsuit also seeks damages and restitution for the alleged copyright infringement already committed.

The plaintiffs in this case may face an uphill battle, as this isn’t the first case that has sought to bring AI developers to heel over the use of copyrighted works to train models. Most recently, Sarah Silverman’s lawsuit against Meta Platforms Inc. over its alleged unauthorized use of authors’ books to train its Llama 2 generative AI model hit a roadblock when U.S. District Judge Vince Chhabria trimmed her lawsuit on Monday.

The judge dismissed a number of the lawsuit’s claims, including the core theory that the AI system was itself an infringing derivative work simply because it was trained on information scraped from copyrighted material. “This is nonsensical,” he wrote in the order. “There is no way to understand the Llama models themselves as a recasting or adaptation of any of the plaintiffs’ books.”

That ruling built on an earlier decision by a federal judge that clipped the wings of another lawsuit, filed by three artists against generative AI image providers Stability AI Ltd., DeviantArt Inc. and Midjourney Inc. In that decision, U.S. District Judge William Orrick found that copyright infringement claims could not proceed because the plaintiffs failed to show that the generators produced substantially similar artwork and that the lawsuit was “defective in numerous respects.”

In Silverman’s case, Chhabria said that to prevail, she would need to show that the model’s outputs “incorporate some portion of” her books, echoing part of Orrick’s decision.

Going forward, plaintiffs suing AI model developers will most likely have to provide clear evidence that their works can be reproduced, in whole or in part, in some closely similar form before judges will allow their lawsuits to proceed. Merely alleging that their works were scraped or read as part of the training process is insufficient to support a copyright infringement claim.

Photo: Pixabay
