OpenAI Attys Must Share Internal Comms In Copyright MDL

A New York federal magistrate judge on Monday ordered OpenAI's in-house attorneys to share their internal communications regarding deleted training datasets with authors suing over the alleged use of copyrighted works to train ChatGPT, rejecting OpenAI's argument that the communications are privileged.

U.S. Magistrate Judge Ona T. Wang said OpenAI's argument that the "reasons" for deleting the datasets are all privileged "strains credulity."

"Attorney-client privilege does not protect facts, even when they are shared with an attorney in otherwise privileged communications," the judge said.

Beyond that, OpenAI's state of mind is critical for determining whether the alleged infringement was willful, and OpenAI has maintained that it did not willfully infringe, Judge Wang said.

"A jury is entitled to know the basis for OpenAI's purported good faith," she said, adding that "[w]hat matters is that OpenAI has put its state of mind at issue, and OpenAI may not selectively use attorney-client privilege to restrict class plaintiffs' inquiry into evidence concerning OpenAI's purported good faith in this way."

Specifically, Judge Wang ordered OpenAI to produce all written communications with in-house counsel in 2022 regarding the reasons the datasets were deleted, as well as all internal references to Library Genesis that OpenAI has redacted or withheld on the basis of attorney-client privilege.

LibGen is an online library that OpenAI and other AI companies have been accused of using to train their large language models, or LLMs. It has been described as a "shadow library" of pirated works.

Judge Wang on Monday also gave the plaintiffs in the case the ability to depose in-house lawyers on those communications and their personal knowledge.

"Given OpenAI's ongoing improper privilege assertions, class plaintiffs are entitled to depositions of up to two hours for each OpenAI attorney who participated in such communications in 2022, which will not count against the total deposition hours cap previously set by the court," Judge Wang said.

OpenAI has until Dec. 8 to turn over the communications and until Dec. 19 to make those lawyers available for depositions, according to the order.

Judge Wang is overseeing discovery in sprawling multidistrict litigation accusing OpenAI and its financial backer Microsoft of infringing copyrights by training LLMs on copyright-protected works without permission. OpenAI is also accused of violating the Digital Millennium Copyright Act.

The cases involve numerous plaintiffs and include a putative class action brought in California. The litigation also includes lawsuits in New York brought by The New York Times Co. and other news organizations, as well as litigation brought by the Authors Guild along with award-winning writers including George Saunders, Scott Turow, Jonathan Franzen and Jia Tolentino.

In April, the Judicial Panel on Multidistrict Litigation centralized the pretrial work for the cases in the Southern District of New York.

A proposed class of authors in the litigation has argued that Microsoft had been attempting for months to avoid turning over the documents through misleading statements. According to a September letter, Microsoft's counsel previously said there was no "LibGen dataset" to be turned over to the authors. A different representation was given during a subsequent deposition of Microsoft, the authors said at the time.

"This material doesn't just go to the heart of plaintiffs' claims against Microsoft; it is, alone, a smoking gun of copyright infringement," the authors said. "The court should not allow Microsoft to continue to bury it."

In Monday's order, Judge Wang said it is undisputed that an OpenAI employee downloaded pirated copies of books from LibGen in 2018. The authors contend OpenAI used those downloaded LibGen books to create two datasets, "Books1" and "Books2." During the discovery process, they said they learned OpenAI deleted those datasets in 2022.

When the authors asked OpenAI for materials related to the reasons for deleting the datasets, it refused, pointing to attorney-client privilege, Judge Wang said.

She noted that "OpenAI's position on whether the reasons for the deletion are privileged has shifted several times." In an order last month, after reviewing the messages in camera, or privately, the judge had already determined that a number of disputed documents containing Slack messages between OpenAI employees regarding the datasets' deletion were not subject to attorney-client privilege.

Judge Wang said a number of other Slack messages regarding the datasets are also not privileged and must be produced, though some messages are clearly lawyers giving legal advice and can be withheld.

She also held that OpenAI has waived privilege over "non-use" as a reason for the datasets' deletion. For one, OpenAI didn't object when she ruled in October that many communications were not privileged, according to the order.

"Thus, if the 'reasons' are not privileged, then there is no waiver of privilege because there is no privilege to waive, and OpenAI cannot refuse to testify about any of the reasons on the basis of attorney-client privilege," Judge Wang said. "Even if a 'reason' like 'non-use' could be privileged, OpenAI has waived privilege by making a moving target of its privilege assertions."

The company openly stated that the datasets were deleted "due to their non-use," and that statement remained on the docket for 15 months, she said. OpenAI then asserted that all of the reasons for deleting the datasets were privileged, and later that not all aspects of the reasons were privileged, the judge said.

"OpenAI's later assertions that all 'reasons' are privileged compels a finding of waiver by disclosure," Judge Wang said. "If all 'reasons' are privileged, then 'non-use,' as a reason for the deletion, is privileged, but was disclosed multiple times."

OpenAI has also waived privilege over all communications "by putting its good faith and state of mind at issue," the judge said.

"As courts in this district have recognized, there is a fundamental conflict where a party asserts a good faith defense based on advice of counsel but then blocks inquiry into their state of mind by asserting attorney-client privilege," Judge Wang said.

OpenAI continues to make factual assertions that its conduct wasn't willful, and that means it has put its good faith and state of mind at issue in the case, the judge said.

"It is not a stretch for class plaintiffs to posit that communications regarding the reasons for deleting Books1 and Books2 could be probative of OpenAI's willfulness," Judge Wang said. "Indeed, the court's in-camera review … confirm that such communications are likely probative of willfulness."

In a separate order on Monday, Judge Wang denied OpenAI's request for a protective order to limit future testimony to a list of just 52 topics and exclude hundreds of other topics.

An OpenAI spokesperson said in a statement Monday, "We disagree with the ruling and intend to appeal."

Counsel for the authors did not immediately respond to a request for comment late on Monday.

The authors are represented by Anna Freymann, Wesley Dozier, Danna Elmasry, Kenneth Byrd, Rachel Geman and Reilly Stoler of Lieff Cabraser Heimann & Bernstein LLP, Scott Sholder of Cowan DeBaets Abrahams & Sheppard LLP, and Justin Nelson, Amber Magee, Charlotte Lepic, Jordan Connors and Rohit Nath of Susman Godfrey LLP.

The New York Times is represented in-house by Karen A. Chesley and Ian Crosby, by Genevieve Vose Wallace, Katherine M. Peaslee, Davida Brook, Elisha Barron, Zachary B. Savage, Tamar Lusztig, Eudokia Spanos, Scarlett Collings, Emily K. Cronin and Alexander Frawley of Susman Godfrey LLP, and by Steven Lieberman, Jennifer B. Maisel and Kristen J. Logan of Rothwell Figg Ernst & Manbeck PC.

The Alden newspapers are represented by Steven Lieberman, Jennifer B. Maisel, Robert Parker, Jenny L. Colgate, Mark Rawls, Kristen J. Logan and Bryan B. Thompson of Rothwell Figg Ernst & Manbeck PC.

The Center for Investigative Reporting is represented by Jon Loevy, Michael Kanovitz, Stephen Stich Match, Matthew Topic, Thomas Kayes and Steven Art of Loevy + Loevy.

OpenAI is represented by Andrew F. Dawson, Robert A. Van Nest, R. James Slaughter, Paven Malhotra, Michelle S. Ybarra, Nicholas S. Goldberg, Thomas E. Gorman, Katie Lynn Joyce, Sarah Salomon, Christopher S. Sun, Andrew S. Bruns and Edward A. Bayley of Keker Van Nest & Peters LLP, Andrew M. Gass, Joseph R. Wetzel, Sarang V. Damle, Elana Nightingale Dawson, Michael David, Allison L. Stillman, Rachel R. Blitzer, Herman Yue and Luke A. Budiardjo of Latham & Watkins LLP, and Joseph C. Gratz, Rose S. Lee, Andrew L. Perito, Carolyn Homer, Eric K. Nikolaides and Emily C. Wood of Morrison Foerster LLP.

Microsoft is represented by Annette Hurst, Christopher Cariello, Marc Shapiro, Sheryl Garko and Laura Najemy of Orrick Herrington & Sutcliffe LLP and Jeffrey Jacobson, Jared Briant, Kristin Stoll-DeBell, Carrie Beyer and Elizabeth Scheibel of Faegre Drinker Biddle & Reath LLP.

The case is In re: OpenAI Inc. Copyright Infringement Litigation, case number 1:25-md-03143, in the U.S. District Court for the Southern District of New York.


Hailey Konnath
