Is AI Training Fair Use? How Global Copyright Laws Are Evolving in 2026

Is training AI on copyrighted data fair use? The answer depends on where you are. Here's how the US, EU, UK, Japan, and other jurisdictions are handling AI training copyright in 2026.

Feed millions of copyrighted books, articles, images, and songs into a model's training pipeline. Is that fair use, a licensed activity, or plain infringement?

In 2026, the answer depends almost entirely on which country's law applies. The US leans on a flexible fair use test that courts are now actively reshaping. The EU built a structured text-and-data-mining (TDM) regime with an opt-out, then layered transparency duties on top through the AI Act. Japan is the permissive outlier. The UK is still negotiating with itself. China and Singapore carved their own paths.

This guide walks through each major jurisdiction, the landmark rulings that moved the needle, and what any of this means for creators, rights holders, and AI developers right now.

What "AI Training" Actually Involves (Legally)

Before comparing regimes, it helps to separate the distinct copyright acts involved when an AI model is trained.

  • Acquisition and reproduction. Scraping or downloading copies of works into a dataset is a reproduction of the work.
  • Intermediate copying. Preprocessing, tokenization, and storage during training create further copies.
  • Model weights. Whether trained weights themselves contain "copies" of training data is contested.
  • Outputs. Model outputs that closely resemble training works can create separate infringement questions.

Where the law intervenes differs by jurisdiction. Some target the input (acquisition and reproduction). Others focus on output (substantial similarity, style imitation). And a few, like the EU AI Act, layer transparency obligations on top without resolving the underlying copyright question.

United States: Fair Use Under Real Pressure

The US has no dedicated AI-training statute. Instead, courts apply the four-factor fair use test from 17 U.S.C. § 107:

1. Purpose and character (is it transformative?)

2. Nature of the copyrighted work

3. Amount and substantiality used

4. Effect on the potential market

The Transformative Use Question

For years, AI developers leaned on Authors Guild v. Google (2015), which blessed large-scale book digitization for a search index as transformative. The argument: if Google Books is fair use, ingesting text to train an LLM should be too.

The US Copyright Office pushed back in its Part 3 Report on Generative AI Training (released May 2025). The Office concluded that training on copyrighted works can qualify as fair use, but the analysis is "highly fact-specific" and some uses, especially those that produce outputs substituting for the originals, likely fall outside the doctrine. The Office explicitly cautioned against treating Google Books as a blanket license for generative AI.

The Cases Rewriting the Playbook

A wave of 2024-2026 rulings has started filling in the detail:

  • Thomson Reuters v. Ross Intelligence (D. Del., 2025). Judge Bibas granted partial summary judgment to Thomson Reuters, rejecting Ross's fair use defense for training a competing legal-research AI on Westlaw headnotes. The decisive factor was market substitution.
  • Bartz v. Anthropic (N.D. Cal., 2025). Judge Alsup held that training on lawfully purchased books was "quintessentially transformative," but that Anthropic's retention of pirated shadow-library copies was not fair use. A $1.5 billion class settlement followed.
  • Kadrey v. Meta (N.D. Cal., 2025). Judge Chhabria ruled for Meta on the specific record before him but stressed that the plaintiffs failed to develop the market-dilution theory, not that training was categorically lawful.
  • NYT v. OpenAI / Microsoft (S.D.N.Y., ongoing). Summary-judgment briefing in 2026 is focused squarely on whether generative outputs that reproduce NYT articles defeat the fair use defense.

The trend line in US case law is clear: how the data was acquired and whether the output competes with the original are doing most of the work. Training on lawful copies for genuinely new purposes is surviving. Training on pirated corpora, or to build a direct substitute, is not.

European Union: Opt-Out TDM Plus AI Act Transparency

The EU doesn't use a fair use framework at all. It uses structured exceptions.

The Two TDM Exceptions (DSM Directive, 2019)

Articles 3 and 4 of the Copyright in the Digital Single Market Directive (2019/790) created two text-and-data-mining exceptions:

  • Article 3 allows TDM for scientific research by research organizations and cultural-heritage institutions. No opt-out.
  • Article 4 allows TDM for any purpose, including commercial AI training, unless the rights holder has expressly reserved their rights in a machine-readable way.

Article 4 is the provision that matters for generative AI. It turns AI training into a default-legal activity that rights holders can switch off via a reservation, most commonly through robots.txt rules or the TDM Reservation Protocol (TDMRep), which signals a reservation through an HTTP header, an HTML meta tag, or a tdmrep.json file.
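For developers, honoring a reservation starts with reading those signals before a page enters a dataset. The sketch below is a minimal illustration, not a compliance tool: it checks robots.txt rules with Python's standard library and looks for the TDMRep `tdm-reservation: 1` response header. The crawler name `ExampleAIBot` and the function names are hypothetical, and whether a robots.txt rule alone counts as a valid Article 4 reservation is still debated.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: dict[str, str]) -> bool:
    """Return True if the response carries a TDMRep `tdm-reservation: 1` header."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def may_collect_for_training(
    robots_txt: str, headers: dict[str, str], user_agent: str, url: str
) -> bool:
    """Treat either signal as an opt-out (a conservative assumption)."""
    return robots_allows(robots_txt, user_agent, url) and not tdm_reserved(headers)


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
    robots = "User-agent: ExampleAIBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    url = "https://example.com/article"
    print(may_collect_for_training(robots, {}, "ExampleAIBot", url))
    print(may_collect_for_training(robots, {"TDM-Reservation": "1"}, "OtherBot", url))
```

A production crawler would also need to check the HTML meta tag and the site's tdmrep.json file, which this sketch leaves out.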

The AI Act's Copyright Transparency Requirements

The EU AI Act (Regulation 2024/1689) added a second layer. Providers of general-purpose AI models must:

  • Implement a policy to respect EU copyright law, including Article 4 opt-outs, even when training occurs outside the EU if the model is placed on the EU market.
  • Publish a "sufficiently detailed summary" of training content, using a template from the EU AI Office.

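Those obligations began applying to new models on August 2, 2025, and models already on the market must comply by August 2, 2027. The Commission's power to fine providers starts on August 2, 2026. Fines can reach 3% of global annual turnover or EUR 15 million, whichever is higher.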

Practically: if you want to train on EU-origin content commercially, you must honor opt-outs and document what you used. It's more prescriptive than US fair use, but also more predictable.
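"Document what you used" can be as simple as keeping one structured record per source. The sketch below shows one possible shape for such a record and a filter that keeps only sources with a clean basis for use. The field names and license labels are assumptions made for illustration; they are not taken from the AI Office template.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One training-data source. All field names here are hypothetical."""
    url: str
    retrieved_on: str       # ISO date, e.g. "2026-01-15"
    license: str            # "licensed", "public-domain", or "tdm-exception"
    opt_out_checked: bool   # was a reservation check actually run?
    opt_out_found: bool     # did that check find a reservation?


def usable(record: SourceRecord) -> bool:
    """Keep licensed or public-domain sources, or TDM sources with no opt-out."""
    if record.license in {"licensed", "public-domain"}:
        return True
    return record.opt_out_checked and not record.opt_out_found


def manifest(records: list[SourceRecord]) -> str:
    """Serialize the usable records as JSON for audit or disclosure."""
    return json.dumps([asdict(r) for r in records if usable(r)], indent=2)


if __name__ == "__main__":
    records = [
        SourceRecord("https://example.com/a", "2026-01-15", "tdm-exception", True, False),
        SourceRecord("https://example.com/b", "2026-01-15", "tdm-exception", True, True),
    ]
    print(manifest(records))  # only the first record survives the filter
```

Recording that the opt-out check ran, separately from what it found, matters: an unchecked source is treated the same as an opted-out one.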

United Kingdom: Still Arguing

The UK has existing TDM exceptions, but only for non-commercial research (Section 29A of the Copyright, Designs and Patents Act 1988). Commercial AI training sits in a legal gray zone.

The government proposed a broader commercial TDM exception in 2022, withdrew it in 2023 after creator backlash, then opened a consultation in December 2024. That consultation set out four options: do nothing, strengthen copyright by requiring licenses in all cases, a broad commercial TDM exception, or the government's preferred route, an exception with an opt-out similar to the EU's Article 4 backed by transparency measures.

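As of mid-2026, no final rule has landed. Getty Images v. Stability AI, tried in the High Court in June 2025, was expected to give judges a first meaningful look at whether scraping and training violate UK copyright. Getty dropped its primary training claims during trial because the training took place outside the UK, and the November 2025 judgment rejected its remaining secondary-infringement claim. The core training question is still unresolved in UK courts, which keeps the pressure on legislators.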

For now, UK-based commercial AI training has no safe harbor. Most developers structure training outside the UK or rely on licenses.

Japan: The Permissive Outlier

Japan has the most AI-friendly copyright regime among major economies.

Article 30-4 of the Japanese Copyright Act (amended in 2018) allows the use of copyrighted works for purposes not aimed at enjoying the expression of the work, including machine learning. The exception is broad: it covers commercial training, there's no opt-out mechanism, and it applies to any type of work.

The Japan Copyright Office clarified in a March 2024 guidance document that:

  • Training on copyrighted works is generally permitted under Article 30-4.
  • But the exception does not apply if the purpose is to "enjoy" the expression (for example, building a model designed to reproduce a specific artist's style at scale).
  • Outputs that are substantially similar to training works can still infringe separately.

The result: Japan has attracted AI training workloads and faces ongoing pressure from creator groups. Expect refinements, not a reversal.

Other Jurisdictions Worth Watching

Singapore

Singapore's 2021 Copyright Act includes a computational data analysis exception (Section 244) that broadly permits TDM for any purpose, including commercial AI, provided the copy was lawfully accessed. No opt-out. It's structurally similar to Japan's approach.

Canada

Canada relies on fair dealing, which is narrower than US fair use because it requires an enumerated purpose (research, education, etc.). A 2021 consultation on AI and IP floated a TDM exception, but no amendment has been enacted. Ongoing cases against AI developers are testing how far fair dealing stretches.

China

Chinese courts have begun ruling on AI copyright. The Beijing Internet Court held in Li v. Liu (2023) that an AI-generated image could receive copyright protection where the human prompter showed sufficient creative input. On training, China has issued Interim Measures for the Management of Generative AI Services (2023) requiring providers to respect intellectual property rights, but has not enacted a TDM exception. Enforcement remains uneven.

Australia

Australia has no TDM exception. Its Productivity Commission recommended a broad fair use exception in 2016, but Parliament has not acted. A government consultation on "Safe and Responsible AI" in 2024 flagged copyright as an open issue.

Quick Comparison Table

| Jurisdiction | Approach | Commercial AI Training | Opt-Out | Transparency |
|---|---|---|---|---|
| United States | Fair use (case-by-case) | Often defensible, depends on facts | No formal mechanism | None by statute |
| European Union | Article 4 TDM exception | Permitted unless opted out | Available (machine-readable reservation) | Yes (AI Act summary) |
| United Kingdom | Non-commercial TDM only | No safe harbor | Proposed | Proposed |
| Japan | Article 30-4 | Broadly permitted | None | None |
| Singapore | Section 244 TDM | Broadly permitted | None | None |
| Canada | Fair dealing | Uncertain | None | None |
| China | No dedicated exception | Uncertain, some liability | None | Via gen-AI rules |

Where the Law Is Heading

Four trends run across jurisdictions in 2026.

1. Provenance is replacing good faith. Courts and regulators increasingly ask where training data came from. Pirated corpora are a liability across every major jurisdiction. Paid licenses, public-domain collections, and opt-out compliance are the path forward.

2. Transparency is becoming the global baseline. Even where substantive rules differ, disclosure requirements are converging. The EU AI Act's training-content summary template is being studied in the UK, Canada, and several US state legislatures.

3. Outputs are the new battleground. Memorization, style cloning, and verbatim regurgitation raise infringement risks independent of how training was conducted. Expect more output-side rulings in 2026-2027.

4. Licensing is gaining traction. Direct deals (OpenAI with major news organizations), Anthropic's settlement, and music-industry negotiations point toward a licensing market emerging alongside the legal regime, not instead of it.

What This Means for You

If you're a creator or rights holder

  • Implement a machine-readable opt-out. robots.txt entries for known AI crawlers plus a TDM reservation signal are the minimum.
  • Track AI Act transparency summaries. When models disclose what they trained on, use that to decide whether to license, object, or sue.
  • Register works in the US. Timely registration is a prerequisite for statutory damages and attorney's fees, and US works must be registered before an infringement suit can be filed.
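For the first item, the files involved are small. The sketch below generates a robots.txt that blocks a handful of AI crawlers and a TDMRep tdmrep.json file, which the protocol expects at /.well-known/tdmrep.json. The crawler tokens are ones vendors have published, but such lists go stale quickly, and the `/*` location pattern is an assumption to verify against the TDMRep specification before deploying.

```python
import json

# User-agent tokens published by their vendors; check current vendor docs before use.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def build_robots_txt(crawlers: list[str]) -> str:
    """Disallow the listed crawlers site-wide and leave all other agents allowed."""
    blocks = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


def build_tdmrep_json(locations: list[str]) -> str:
    """Build a tdmrep.json body reserving TDM rights for each location pattern."""
    entries = [{"location": loc, "tdm-reservation": 1} for loc in locations]
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS))
    # "/*" is assumed here to mean "every path on the site".
    print(build_tdmrep_json(["/*"]))
```

Publishing both signals covers crawlers that read only one of them, though neither can force a crawler to comply.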

If you're building or deploying AI

  • Know your data provenance. Document sources, licenses, and opt-out compliance.
  • Apply the strictest applicable regime. If your model is available in the EU, EU rules apply even if training happened elsewhere.
  • Build output-side guardrails. Memorization tests, near-duplicate detection, and style filters reduce downstream liability.
  • Prefer licensed or synthetic data where feasible, especially for high-value verticals like news, books, music, and stock imagery.
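One simple form of near-duplicate detection is to measure how many of an output's word n-grams also appear in a known protected text. The sketch below does that with five-word shingles. The function names and the 0.5 threshold are illustrative assumptions; this is a screening heuristic, not a legal test for substantial similarity.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also occur in the reference."""
    out = shingles(output, n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out & shingles(reference, n)) / len(out)


def looks_regurgitated(
    output: str, reference: str, n: int = 5, threshold: float = 0.5
) -> bool:
    """Flag outputs whose n-gram overlap meets a hypothetical threshold."""
    return overlap_ratio(output, reference, n) >= threshold


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    print(looks_regurgitated(reference, reference))
    print(looks_regurgitated("an unrelated sentence about copyright law in europe", reference))
```

At scale, the same idea is usually run against an index of reference texts instead of pairwise, and flagged outputs are blocked or routed to review.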

If you're a business user of AI

  • Check your vendor's indemnification. Major providers now indemnify enterprise users against copyright claims arising from outputs, subject to conditions.
  • Understand your jurisdiction. A tool that's clean in Tokyo may expose you to liability in London.
  • See our AI Copyright Compliance: The 2026 Survival Guide for Businesses for a structured compliance checklist.

Key Takeaways

  • No single answer to "is AI training fair use?" exists globally. The answer depends on jurisdiction, data source, and use case.
  • The US is resolving the question case by case under fair use, with data acquisition and market substitution doing most of the work.
  • The EU has a structured system: Article 4 TDM opt-out plus AI Act transparency.
  • Japan and Singapore remain the most permissive. The UK and Canada are the most uncertain.
  • The direction of travel is toward more transparency, more licensing, and more attention to where training data came from.

This article is for informational purposes and is not legal advice. AI copyright law is evolving rapidly. Consult a qualified attorney for advice on specific situations. Published by the AI Copyright Legal editorial team (AI-assisted, human-reviewed).