UPDATED 12:50 EDT / JULY 10 2023

Comedian Sarah Silverman sues OpenAI and Meta over copyright infringement

Comedian and author Sarah Silverman and two other authors are suing ChatGPT developer OpenAI LP and Mark Zuckerberg’s Meta Platforms Inc., claiming that the companies used copyrighted material from their books to train their artificial intelligence models.

A pair of class action lawsuits have been filed by Silverman, as well as authors Chris Golden and Rich Kadrey, alleging that the two companies have remixed portions of their books without consent, compensation or credit. According to the lawsuits, their books were used to train OpenAI’s GPT-3.5 and GPT-4, which underlie its ChatGPT chatbot, and Meta’s LLaMA large language model.

Large language model AI chatbots have wowed the world with their ability to understand prompts and respond conversationally in what sounds like human speech. They do this by being trained on large bodies of text, adjusting themselves so that their output more closely resembles the material they ingest, and the more diverse the data, the better. As a result, companies pull in as much text as they can, especially natural written language such as human conversations, written interviews and books.

As part of the OpenAI lawsuit, the plaintiffs offered exhibits showing that ChatGPT could readily summarize their books, which they argue demonstrates that it had ingested portions of the text. This goes beyond simply providing “back matter” summaries of what’s publicly available from marketing materials. Examples included asking ChatGPT to summarize entire chapters of Sarah Silverman’s memoir, “The Bedwetter.”

“When ChatGPT was prompted to summarize books written by each of the Plaintiffs, it generated very accurate summaries,” the lawsuit said. “The summaries get some details wrong. This is expected, since a large language model mixes together expressive material derived from many sources. Still, the rest of the summaries are accurate, which means that ChatGPT retains knowledge of particular works in the training dataset.”

The lawsuits allege that OpenAI and Meta trained their LLMs on large datasets of books drawn from what are known as “shadow libraries,” collections of copyrighted works sourced from websites such as Library Genesis (also known as Libgen), Z-Library, Sci-Hub and Bibliotik. Shadow library websites provide access to research papers, magazines, nonfiction and fiction books, images, comics and audiobooks for mass download through link aggregation, without regard to copyright.

The Meta complaint explains that the authors believe their books were included in one of these shadow libraries, which was then incorporated into a dataset called The Pile, assembled by a research organization called EleutherAI. That data was in turn included in the training set for Meta’s LLaMA large language model. “These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host,” the complaint read. “For that reason, these shadow libraries are also flagrantly illegal.”

Lawyers Joseph Saveri and Matthew Butterick, who are representing Silverman and the other authors, filed a similar lawsuit against OpenAI on behalf of two other authors alleging the same issue. In 2022, they teamed up to file suit alleging that GitHub Copilot violated copyright.

The same lawyers are also behind the lawsuit against art-generating AI providers Stability AI Ltd., Midjourney Inc. and DeviantArt Inc., filed by three artists who allege that their artwork was used without their permission or credit. Separately, Getty Images also filed a lawsuit against Stability AI, alleging it used more than 12 million copyrighted images in its training set.

All of this is part of an ongoing trend: AI models need vast amounts of training data to produce results, that data has to be sourced from somewhere, and so the models can potentially run afoul of the rights of artists. Regulators and legal professionals, who are playing catch-up with the technology, are only beginning to take notice.

Legislators in the European Union have sought to adjust to this new paradigm with the upcoming passage of the Artificial Intelligence Act, which includes a requirement that AI developers disclose copyrighted material used to train their models. It would provide a path for copyright holders to be compensated when their work is used by AI models.

Image: Pixabay

