Explore the intersection of artificial intelligence and law. This hub offers insights into regulations, confidentiality, and copyright in the context of AI.

Regulation of AI

The evolving legal framework for AI

AI & Confidentiality

Data security and privilege considerations

AI & Copyright

Explore intellectual property implications

Training Data Cases

Is training AI models on copyrighted data infringement?

Creators of AI models need to train their models on data. High-quality data, and tons of it. Large language models are supposed to learn world knowledge and culture from their training data. And multimodal models learn how to create music, images, or movies from the highest quality existing examples. But these materials are often copyrighted.

The Fair Use Argument

AI developers consistently argue that using copyrighted material as training data constitutes fair use and is therefore not infringement. Good AI models are not supposed to output copies of the training data, but should learn the way a human learns from reading books or watching films. The point of AI models is to generalize beyond the training data to create novel output: a transformative use that provides a public benefit different from that of the underlying material. A user might infringe if they intentionally generate output too similar to a copyrighted work. But the developer has made only a tool and should not be liable for anything.

The Counter Argument

Rightsholders counter that AI developers make actual copies of works during training, and that the resulting models can often reproduce portions of training data verbatim. Even if the use is "transformative," the copying process itself is commercial and substitutes for properly licensing the works. And AI models harm relevant markets by generating similar content at scale.

This page lists legal cases in the US that involve the issue of whether using copyrighted data for AI training constitutes infringement or fair use. Many cases have been filed, but no US court has squarely addressed the issue.

The "fair use" defense will be the core issue in many of these cases, and for the AI community in general. Defendants have not moved to dismiss on fair use grounds for procedural reasons: fair use is an affirmative defense that the defendant bears the burden of proving, whereas on a motion to dismiss, courts must accept the plaintiffs' allegations as true.

Doe v. GitHub, Inc.

Filed: Nov 3, 2022

Developers assert DMCA and breach-of-contract claims over GitHub's use of software uploaded to its platform to train AI coding models.

Current Status:

Trial court dismissed the DMCA claims, but refused to dismiss the contract claims. Case is stayed pending the plaintiffs' request for interlocutory appeal of the DMCA dismissal.

Andersen v. Stability AI Ltd.

Filed: Jan 23, 2023

Visual artists allege creators of Stable Diffusion and Midjourney infringed copyrights by using their works as training data.

Current Status:

The court has dismissed multiple claims but a few remain. The court has not answered the core question of whether training on copyrighted data constitutes infringement.

Getty Images v. Stability AI

Filed: Feb 3, 2023

Getty Images accuses Stability AI of infringing its copyrights in millions of photographs by using them to train Stable Diffusion.

Current Status:

Jurisdictional motion to dismiss is pending.

Tremblay v. OpenAI, Inc.

Filed: Jun 28, 2023

Authors allege OpenAI infringed their copyrighted books in training GPT models.

Current Status:

Case is in discovery. Certain claims were dismissed, but the claim for direct copyright infringement remains at issue.

Kadrey v. Meta Platforms, Inc.

Filed: Jul 7, 2023

Authors allege Meta infringed their copyrighted books by using them to train LLaMA models.

Current Status:

Meta successfully moved to dismiss all claims except the core claim of copyright infringement. Case is in discovery.

Leovy v. Google

Filed: Jul 11, 2023

Authors allege Google violated copyright by training AI models on the authors' works.

Current Status:

Hearing on motion to dismiss set for December 18, 2024, but the motion does not present the core copyright issues.

Huckabee v. Bloomberg

Filed: Oct 17, 2023

Authors allege unauthorized use of books in AI training by Meta, Bloomberg, Microsoft, and EleutherAI.

Current Status:

Motion to dismiss is pending. The case is unique in that the trained model, BloombergGPT, was never publicly released.

Concord Music Group, Inc. v. Anthropic PBC

Filed: Oct 18, 2023

Music publishers sue Anthropic for copyright infringement for using lyrics in AI training.

Current Status:

Preliminary injunction hearing set for November 25, 2024.

Alter v. OpenAI

Filed: Nov 21, 2023

Authors allege OpenAI infringed their copyrighted books in training GPT models.

Current Status:

Case is in discovery.

The New York Times Company v. Microsoft Corporation

Filed: Dec 28, 2023

The New York Times claims OpenAI infringed its copyrights by training GPT models on its news articles.

Current Status:

A fully briefed motion to dismiss is pending. The case is in discovery.

The Intercept Media and Raw Story Media v. OpenAI

Filed: Feb 28, 2024

News organizations allege OpenAI stripped copyright management information from journalism works used to train ChatGPT, and therefore violated the DMCA.

Current Status:

Awaiting the court's ruling on the motion to dismiss, which was heard on November 1, 2024.

Nazemian and Dubus v. NVIDIA Corporation

Filed: Mar 8, 2024

Authors allege unauthorized use of books to train NVIDIA's NeMo LLM.

Current Status:

Case is in discovery.

Daily News v. Microsoft

Filed: Apr 30, 2024

Newspaper publishers sue Microsoft and OpenAI for copyright infringement.

Current Status:

Case is in discovery.

UMG Recordings, Inc. v. Suno, Inc.

Filed: Jun 24, 2024

Record labels claim Suno infringed their copyrighted recordings by using them to train AI music generation models.

Current Status:

Case is in discovery.

UMG Recordings, Inc. v. Uncharted Labs, Inc.

Filed: Jun 24, 2024

Record labels claim Udio infringed their copyrighted recordings by using them to train AI music generation models.

Current Status:

Case is in discovery.

Center for Investigative Reporting v. OpenAI

Filed: Jun 27, 2024

Nonprofit news organization alleges copyright infringement in OpenAI's training of GPT models.

Current Status:

Case is in discovery. Motion to dismiss is pending, but not on the core copyright infringement claim.

Millette v. OpenAI

Filed: Aug 2, 2024

YouTube creators allege OpenAI infringed copyright by using transcriptions of videos as training data for AI models.

Current Status:

Motion to dismiss due December 16, 2024; further papers due in early 2025. The same plaintiff filed cases against Google and NVIDIA.

Dow Jones v. Perplexity

Filed: Oct 21, 2024

Owners of the New York Post and The Wall Street Journal allege Perplexity 'cop[ied] without authorization ... copyrighted works for inclusion into Perplexity's RAG index,' and that its outputs reproduce the plaintiffs' content.

Current Status:

Perplexity has not yet appeared in the case.

