Comedian Sarah Silverman has joined a class-action lawsuit against OpenAI and another against Meta accusing the companies of copyright infringement, alleging that they “copy and ingest” her protected work to train their artificial intelligence programs, according to court documents.
The lawsuits, in which she joined authors Christopher Golden and Richard Kadrey, were filed Friday in the San Francisco Division of the U.S. District Court for the Northern District of California. Each lawsuit says the company in question made unauthorized copies of the authors’ works, including Silverman’s memoir, “The Bedwetter,” by scraping illegal online “shadow libraries” that contain the texts of thousands of books.
The lawsuit against Meta cites the company’s own research paper on LLaMA, its large language model. According to the paper, which was published in February, Meta’s researchers incorporated text from The Pile into their training dataset; according to the lawsuit, some of that text comes from shadow libraries.
“Their copyrighted materials were copied and included as part of training,” the lawsuit alleges. “Many of the plaintiffs’ books appear in the data set that Meta admitted to using.”
Neither OpenAI nor Meta responded to requests for comment on the lawsuits on Monday. The plaintiffs are seeking damages and injunctive relief, which may include changes to the LLaMA and ChatGPT programs.
Less is known about the sources of the training datasets for OpenAI’s ChatGPT program. But the lawsuit argues that ChatGPT’s ability to generate summaries of the plaintiffs’ works “is only possible if ChatGPT is trained on the plaintiffs’ copyrighted works.”
The lawsuit includes as an exhibit the text ChatGPT generated when asked to summarize Silverman’s memoir, “The Bedwetter.”
“One of the main topics in the first part of the memoir is Silverman’s struggle with enuresis, or bed-wetting, that extended into her teens,” the program wrote. “This problem caused her great distress and embarrassment, but also nurtured her resilience and ability to cope with adversity.”
The attorneys representing the three authors, Joseph Saveri and Matthew Butterick, are also representing other creators in separate lawsuits challenging Copilot, GitHub’s AI-powered coding assistant, and an image generator made by Stability AI.
On a website announcing their lawsuits against AI companies, the lawyers claim that “much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works, including books written by plaintiffs, that were copied by OpenAI and Meta without permission, without credit, and without compensation.”
The twin lawsuits are among a growing number of legal challenges that could define the boundaries of how artificial intelligence learns, and what role copyright law will play in the material used to train AI models.
“I expect more to follow,” said Robert deBrauwere, a partner at the law firm Pryor Cashman who specializes in digital media and intellectual property. He is not involved in the Silverman lawsuits.