
The AI Fair Use Defense: What Courts Actually Look For in 2026

A deep analysis of how US courts are actually applying the four fair use factors to AI training cases in 2026 — from Thomson Reuters v. Ross to NYT v. OpenAI, with practical implications for developers, creators, and attorneys.


title: "The AI Fair Use Defense: What Courts Actually Look For in 2026"
slug: "ai-fair-use-defense-four-factors-2026"
category: "Analysis"
status: "draft"
author: "AI Copyright Legal Editorial Team"
date: "2026-06-10"
excerpt: "A deep analysis of how US courts are actually applying the four fair use factors to AI training cases in 2026 — from Thomson Reuters v. Ross to NYT v. OpenAI, with practical implications for developers, creators, and attorneys."
tags:
- fair use
- AI training
- copyright law
- legal analysis
- Section 107



Every AI company facing a copyright lawsuit starts with the same two words: fair use. It's the nuclear defense — if it works, the entire case collapses. If it fails, the damages can reach into the billions.

But here's what most coverage misses: fair use in AI cases isn't one question. It's four. And in 2026, those four factors are producing very different answers depending on the facts of each case.

Since the Supreme Court's landmark decision in Andy Warhol Foundation v. Goldsmith (2023) reshaped how courts analyze the first factor, every AI copyright defendant has been forced to argue on new terrain. Meanwhile, the US Copyright Office weighed in with Part 3 of its AI Report in May 2025, and Congress is circling with the Obernolte-Trahan "Great American AI Act" introduced in May 2026.

This article walks through each of the four fair use factors as they're actually being litigated — not in theory, but in the specific cases shaping the law right now.


The Four-Factor Framework: A Quick Refresher

Section 107 of the Copyright Act sets out four factors courts must weigh:

1. Purpose and character of the use — Is it transformative? Commercial? Nonprofit educational?

2. Nature of the copyrighted work — Is the original creative or factual? Published or unpublished?

3. Amount and substantiality — How much was taken, and was it the "heart" of the work?

4. Market effect — Does the use harm the existing or potential market for the original?

No single factor is dispositive. Courts weigh them together. But since Google v. Oracle (2021) and Warhol (2023), the first and fourth factors have dominated the analysis — and that pattern continues in AI cases.


Factor One: Purpose and Character of the Use

The Transformative Use Question

For decades, "transformative use" was the fair use trump card. If you changed the purpose or added something new, you were probably safe. Warhol changed that.

In Andy Warhol Foundation v. Goldsmith, the Supreme Court held that the first factor did not favor the Foundation's 2016 licensing of Warhol's "Orange Prince," a silkscreen based on Goldsmith's photograph, to Condé Nast. Both the photograph and the licensed image served substantially the same purpose, illustrating magazine stories about the musician, and the Foundation's use was commercial. The Court emphasized that adding new expression or meaning is not enough on its own: a use is unlikely to be transformative if it merely supersedes the objects of the original, serving as a substitute for it in the same market.

This ruling directly threatens AI companies. If an AI model trained on copyrighted books can generate new text that competes with those books in the marketplace, the use starts to look less transformative under Warhol.

How the Factor Splits in AI Cases

The strongest fair use argument for AI defendants: Training is an intermediate, non-expressive use. The model isn't "reading" books for pleasure; it's extracting statistical patterns about language. This argument draws support from Authors Guild v. Google (2015), where the Second Circuit held that Google's mass digitization of books for a searchable database was transformative because it served a different purpose — search — rather than reading.

The strongest counter-argument for plaintiffs: Generative AI outputs can and do compete directly with the works they were trained on. When ChatGPT summarizes a news article or Claude generates prose in the style of a specific author, the output substitutes for the original. Under Warhol, this commercial substitution defeats transformative use.

The US Copyright Office's Part 3 Report (May 2025) notably declined to resolve this question. Instead, the Office stated that "the application of existing law to the use of copyrighted works in AI training raises complex factual and legal questions that courts are best positioned to address in the context of specific cases."

Thomson Reuters v. Ross Intelligence: The Preview

The first major ruling on AI training and fair use came from the District of Delaware in Thomson Reuters v. Ross Intelligence (February 2025). Thomson Reuters sued Ross, a legal research startup, for using Westlaw headnotes to train its AI.

Judge Stephanos Bibas, a Third Circuit judge sitting by designation, granted partial summary judgment to Thomson Reuters, holding that Ross's use was not fair use. On Factor One, Bibas found Ross's use was commercial and not meaningfully transformative: Ross was building a competing legal research tool, using Thomson Reuters' copyrighted headnotes to do it. This was the Warhol principle in action: same purpose, same market, no fair use.

Ross Intelligence shut down in 2021 due to litigation costs. The 2025 ruling binds no other court, and Bibas stressed that only non-generative AI was before him, but it gave every other AI training case its first judicial answer on fair use.

NYT v. OpenAI/Microsoft: The Billion-Dollar Factor One

The New York Times v. OpenAI and Microsoft case, filed in December 2023, remains the highest-stakes AI copyright battle. The Times alleges that ChatGPT was trained on millions of NYT articles and that the model can reproduce passages nearly verbatim.

On Factor One, the Times has a compelling argument: OpenAI is a commercial entity that built a product directly competitive with journalism. ChatGPT answers questions that might otherwise drive readers to NYT.com. Under Warhol and Thomson Reuters v. Ross, that's a weak fair use position.

OpenAI's response, filed in early 2024 and supplemented through 2025-2026, argues that training is a transformative intermediate step and that verbatim reproduction is a "bug, not a feature" they're actively working to fix. The case is still in discovery as of June 2026, with no trial date set.

The Music Cases: Factor One in the Creative Domain

The music AI cases — Concord Music Group v. Anthropic (filed October 2023) and UMG v. Suno and UMG v. Udio (both filed June 2024) — push Factor One even harder. Music is among the most creative (and therefore most protected) categories of copyrighted work.

Anthropic argues that training Claude on song lyrics is transformative because the model learns linguistic patterns, not melodies. But the Concord plaintiffs point to evidence that Claude can reproduce lyrics verbatim — sometimes with only minor prompting — which looks a lot like substitution, not transformation.

In May 2026, the tech industry filed an amicus brief in Concord v. Anthropic arguing that AI training on publicly available text is fair use, citing the importance of the technology to American competitiveness. But the brief's existence signals anxiety — if fair use were a slam dunk, nobody would need amicus briefs.


Factor Two: Nature of the Copyrighted Work

This is ordinarily the least important factor, but it matters enormously in AI cases because the training data is so varied.

Creative vs. Factual Works

The law gives stronger protection to creative works (novels, poems, songs, paintings) than to factual works (news articles, databases, scientific papers). Fair use is harder to establish when the copied work is highly creative.

This creates a paradox for AI defendants: The more creative the training data, the more valuable it is for building expressive AI models — and the harder it is to claim fair use.

For instance, in the class action Kadrey v. Meta (filed 2023 alongside Silverman v. OpenAI), the plaintiffs are authors including novelist Richard Kadrey and comedian Sarah Silverman. Factor Two strongly favors them because novels, memoirs, and other expressive writing sit at the peak of copyright protection.

In contrast, Thomson Reuters' Westlaw headnotes, while copyrightable, are more factual in nature. Judge Bibas found that Factor Two favored Ross, but it wasn't enough to overcome Factors One and Four.

Published vs. Unpublished

Unpublished works receive stronger protection. This matters for the cases involving "shadow library" datasets like Books3, which contained many unpublished or out-of-print works. In Hobbs & Stone v. Meta (filed April 2026), the plaintiffs specifically allege Meta knowingly used pirated books — including unpublished manuscripts — to train Llama.


Factor Three: Amount and Substantiality

"The Whole Thing" Problem

AI models are typically trained on entire works. Not excerpts. Not snippets. The whole book, the whole article, the whole song.

Factor Three asks both how much was taken (quantitatively) and what was taken (qualitatively: the "heart" of the work). Training on 100% of a copyrighted book is quantitatively extreme. Google Books also scanned entire volumes, but it displayed only snippets; the Second Circuit held that copying whole books was reasonably necessary for search and that the snippet limits kept users from seeing substantial portions of any book.

AI training makes this factor look terrible for defendants on its face. But there's a wrinkle: the copying is intermediate. The final model doesn't store the books; it stores weighted parameters derived from them.

The NVIDIA Shadow Library Ruling

In Doe v. NVIDIA (2025), Judge William Orrick III ruled that NVIDIA's training scripts for downloading shadow library datasets "have no other purpose than copyright infringement." While this wasn't strictly a Factor Three ruling — it was about contributory infringement — it highlights how courts view the wholesale ingestion of pirated datasets. Copying entire libraries of protected works, without permission, is not a good look under Factor Three.

The Verbatim Output Question

The NYT v. OpenAI complaint included exhibits showing ChatGPT reproducing NYT articles almost verbatim. If the model can output near-copies of training data, that strengthens Factor Three against fair use — because the "amount taken" isn't just what was used during training; it's what the model makes available to users.

OpenAI and other defendants argue that verbatim outputs are errors they're working to eliminate, not an intended feature. But under Factor Three, a court may look at what the model can do, not just what it was designed to do.


Factor Four: Market Effect

The Most Important Factor

The Supreme Court has called Factor Four "undoubtedly the single most important element of fair use" (Harper & Row v. Nation Enterprises, 1985). If the use harms the market for the original — or a potential market the copyright holder might reasonably develop — fair use is unlikely.

In the AI context, market harm is the heart of every plaintiff's case:

  • Authors: AI-generated books on Amazon compete directly with human-authored works. The Authors Guild reported in 2025 that AI-generated content now represents an estimated 3-5% of new Kindle titles — and growing.
  • News publishers: When ChatGPT or Perplexity answers a query by summarizing reporting, the user never visits the publisher's site. The CNN v. Perplexity lawsuit, filed in early 2026, explicitly frames this as market substitution.
  • Music rightsholders: Suno and Udio generate custom songs on demand. If you can type "make me a sad country song about my breakup," you might never stream a real sad country song again.
  • Visual artists: Midjourney and DALL-E generate images in specific artists' styles. The Andersen v. Stability AI class action, which names Midjourney alongside Stability AI, alleges that AI-generated images compete directly with the artists' own commissions and licensing revenue.

The Licensing Market Argument

Under Warhol and American Geophysical Union v. Texaco (1995), courts consider not just actual market harm but harm to potential licensing markets. This is crucial for AI.

Copyright holders are increasingly licensing their works for AI training. The deals between OpenAI and the Financial Times (2024), News Corp (2024), and Reddit (2024) show that a licensing market already exists. In March 2026, OpenAI Chair Bret Taylor testified that the Reddit deal was done specifically "to avoid litigation."

If a viable licensing market exists — and 2026 has proven that it does — then AI companies can't argue there's no way to pay for training data. They can license. If they choose not to, they're undermining a real market, which Factor Four punishes.

The US Copyright Office's Part 3 Report explicitly flagged this: "The existence of an emerging market for training data licenses may weigh against a finding of fair use in some cases."

The Indemnification Market as Evidence

Another interesting data point: Microsoft, Google, Adobe, and OpenAI now all offer some form of copyright indemnification for users of their AI tools. If fair use were clearly settled in favor of AI training, these companies would have little reason to promise to defend customers against infringement claims. The existence of these programs is a market signal that fair use is far from guaranteed.


The Global Picture: Fair Use vs. Fair Dealing

One reason fair use is such a high-stakes fight in the US is that other jurisdictions have different frameworks — and most are less friendly to AI training.

UK: Fair Dealing, TDM Exception

The UK has a "fair dealing" framework that's narrower than US fair use, with specific enumerated purposes. Since 2014, UK law has included a text and data mining (TDM) exception for non-commercial research (section 29A of the Copyright, Designs and Patents Act 1988), but commercial AI training remains outside its scope. The House of Lords Communications and Digital Committee recommended in early 2025 that the UK require licensing for commercial AI training, essentially rejecting a fair use approach.

EU: The DSM Directive and AI Act

The EU's 2019 Copyright in the Digital Single Market (DSM) Directive created TDM exceptions, but crucially, Article 4 allows rightsholders to opt out of TDM for commercial purposes. This opt-out mechanism, combined with the EU AI Act's obligations for general-purpose AI model providers (which began applying in August 2025 and include a copyright compliance policy and a public summary of training content), means EU-based AI training requires navigating a web of licensing obligations.

The CJEU's ruling in Meta v. Italian Press Publishers (April 2026) reinforced that publishers have rights over how their content is used by platforms, a principle that arguably extends to AI training.

Japan: The Most AI-Friendly Regime

Japan's 2018 amendments to its Copyright Act created one of the broadest TDM exceptions in the world, allowing commercial AI training without permission in most cases. Japan has positioned itself as an AI development haven, and several US companies have shifted training operations there. However, Japan's Agency for Cultural Affairs is now reviewing whether to narrow the exception amid creator pushback.

China: Strict Controls

China's approach is fundamentally different. Chinese copyright law has no broad fair use-style exception covering AI training, and the 2023 Interim Measures for the Management of Generative Artificial Intelligence Services require providers to use training data from lawful sources and to respect intellectual property rights. The Chinese model effectively rejects fair use for AI training, instead placing the burden on providers to secure rights to their data.


The Settlement Dynamic: Fair Use's Invisible Pressure

Perhaps the most revealing indicator of fair use's weakness in 2026 is the settlement landscape.

Anthropic's $1.5 billion settlement with authors in Bartz v. Anthropic (announced in September 2025; as of June 2026 still awaiting final approval after Judge Martinez-Olguin demanded more detail on payout structures in May 2026) would be the largest AI copyright settlement in history. A defendant confident in its fair use position would be unlikely to write a billion-dollar check.

Similarly, OpenAI's licensing spree (deals with Axel Springer, Le Monde, Financial Times, News Corp, Reddit, and others) amounts to a de facto admission that training without permission carries legal risk. As Bret Taylor's March 2026 testimony made explicit, the deals are litigation avoidance.

The pattern is clear: AI companies are hedging against a fair use loss. They're settling with powerful plaintiffs, licensing from willing publishers, and reserving their fair use arguments for the cases they can't avoid.


Practical Implications: What This Means for You

For AI Developers

1. Fair use is not a safe harbor. Every major development since 2023 (Warhol, Thomson Reuters v. Ross, the Copyright Office's Part 3 report) points toward a narrower fair use for AI training.

2. License where you can. The cost of licensing is almost certainly lower than the cost of losing a class action.

3. Document your transformative purpose. If you're training an AI for a genuinely different purpose than the training data's original market, document it extensively. That evidence may save you on Factor One.

4. Implement output filters. Preventing verbatim reproduction strengthens your Factor Three argument and reduces the risk of being a test case.
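To make item 4 concrete, here is a minimal sketch of one simple filtering approach: comparing a model's output against reference texts for long verbatim word runs (n-gram overlap). It is a hypothetical illustration, not any company's actual filter; the function names, the 8-word window, and the 20% threshold are assumptions chosen for readability, and a production system would need hashed indexes at far larger scale.

```python
import re

# Hypothetical output filter: flags generated text that repeats long verbatim
# word runs from a set of reference (e.g., licensed or protected) texts.


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only word characters so case and punctuation
    # changes don't hide a copied passage.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # Every contiguous run of n tokens; empty if the text is shorter than n.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(references: list[str], n: int) -> set[tuple[str, ...]]:
    # Union of all reference n-grams, for fast membership checks.
    index: set[tuple[str, ...]] = set()
    for ref in references:
        index |= ngrams(tokenize(ref), n)
    return index


def verbatim_overlap(output: str, index: set[tuple[str, ...]], n: int) -> float:
    # Fraction of the output's distinct n-grams that also occur in the index.
    out_grams = ngrams(tokenize(output), n)
    if not out_grams:
        return 0.0
    return len(out_grams & index) / len(out_grams)


def should_block(output: str, index: set[tuple[str, ...]],
                 n: int = 8, threshold: float = 0.2) -> bool:
    # n=8 and threshold=0.2 are illustrative assumptions, not legal standards;
    # n must match the value used to build the index.
    return verbatim_overlap(output, index, n) >= threshold


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank today"
    index = build_index([reference], n=8)
    print(should_block(reference, index))  # exact copy of the reference
    print(should_block("an entirely original sentence about something else", index))
```

In this sketch an exact copy of a reference passage scores 1.0 and is blocked, while unrelated text scores 0.0. Tuning the window size and threshold trades false positives on common phrases against missed near-copies, and a filter of this kind cannot catch close paraphrases at all.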

For Content Creators and Publishers

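1. Register your copyrights. For US works, you generally can't file an infringement suit until the Copyright Office has registered (or refused) the work, and statutory damages and attorney's fees are available only if the work was registered before the infringement began or within three months of publication.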

2. Opt out where possible. Many datasets and crawlers respect robots.txt and opt-out mechanisms. The EU's TDM opt-out is legally enforceable.

3. Monitor the licensing market. If AI companies are licensing from your peers, they should be licensing from you too. The growing licensing market strengthens Factor Four arguments.

4. Document market harm. If you can show that AI outputs are substituting for your work (lost commissions, declining page views, reduced streaming revenue), you strengthen the case against fair use.
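On item 2: robots.txt is a voluntary convention, so it only stops crawlers that choose to honor it. The sketch below uses Python's standard-library urllib.robotparser to show how a site could disallow two AI-related crawlers while leaving others alone. GPTBot (OpenAI) and CCBot (Common Crawl) are real user-agent tokens used here only as examples, and "ExampleSearchBot" is hypothetical; check each operator's current documentation for the token it honors. Whether a given robots.txt rule counts as a valid EU TDM opt-out is a legal question this code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block two AI-related crawlers site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "ExampleSearchBot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, "https://example.com/articles/1"))
```

Run as a script, this reports that GPTBot and CCBot are disallowed while ExampleSearchBot falls through to the catch-all `*` group and is allowed. Note that the parser matches on the crawler's product token (for example "GPTBot"), not on a full browser-style user-agent string.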

For Attorneys

1. Fair use is fact-intensive. No two AI training cases are the same. The nature of the training data, the purpose of the model, the degree of output similarity, and the licensing context all matter.

2. Warhol changed everything. Pre-2023 fair use precedents must be re-examined through the Warhol lens. What looked transformative before may not anymore.

3. Watch the settlements. The settlement landscape reveals what sophisticated defendants actually think of their fair use odds. Spoiler: they're not confident.

4. Global strategy matters. A fair use win in the US doesn't help in the EU or China. Counsel for global AI deployment must navigate multiple frameworks.


What to Watch: The Next 12 Months

The fair use question won't be resolved in 2026. But several developments will shape the trajectory:

  • Summary judgment in NYT v. OpenAI: If Judge Sidney Stein rules on fair use before trial, it could be the most consequential copyright ruling since Warhol — or since the 1976 Copyright Act itself.
  • The Great American AI Act: Rep. Obernolte's and Rep. Trahan's draft bill, introduced May 2026, proposes a federal framework for AI training that could preempt fair use litigation entirely — or codify a licensing mandate.
  • Anthropic settlement approval: If Judge Martinez-Olguin approves the $1.5B settlement, it sets a price point for training data that will ripple through every pending case.
  • More Copyright Office guidance: Part 3's final version is due later in 2026. If the Office strengthens its language on licensing markets, Factor Four gets even harder for defendants.

The Bottom Line

Fair use for AI training is not dead in 2026. But it's on life support in the commercial context, and the prognosis isn't improving.

Every major ruling since Warhol has narrowed the doctrine's reach. Every settlement and license deal signals defendant anxiety. Every new regulation — from the EU AI Act to California's AB 2013 to the draft Great American AI Act — chips away at the argument that unlicensed training is legally safe.

The four factors still work. But in 2026, they're working for plaintiffs more often than they're working for defendants. And until a Supreme Court ruling or a comprehensive federal statute settles the question, AI fair use will remain the most expensive guessing game in tech.




Last updated: June 10, 2026. This article is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for guidance on your specific situation.
