AI Training and Copyright: How 10 Countries Are Handling It Differently in 2026

A 2026 comparative analysis of AI training and copyright rules in the United States, EU, UK, Japan, China, India, Brazil, Australia, Singapore, and South Korea.

AI training is not governed by one global copyright rule. In 2026, the practical answer to “can we train on copyrighted works?” depends heavily on where the training happens, where the model is offered, what kind of content is used, whether rightsholders opted out, and whether the output competes with the original market.

That makes global AI copyright compliance harder than most startup playbooks admit. A U.S. team may be thinking in terms of fair use. A European publisher may be thinking in terms of text-and-data mining reservations. A Japanese developer may point to Japan’s unusually broad information-analysis exception. A Chinese platform may be more focused on algorithm filing, synthetic-content labeling, and content governance. None of those frames fully solves the others.

This article compares ten important jurisdictions: the United States, European Union, United Kingdom, Japan, China, India, Brazil, Australia, Singapore, and South Korea. It is written for companies building, fine-tuning, licensing, or deploying generative AI systems across borders. It is not a substitute for local legal advice, but it gives you a working map of the risk.

For background, pair this with our AI fair use defense analysis, the AI copyright compliance checklist, and our guide to proving human authorship in AI-assisted works.


The core split: fair use, TDM exceptions, licensing, and transparency

Countries are not merely choosing “pro-AI” or “anti-AI” positions. They are choosing different legal architectures.

Four models dominate:

1. Open-ended balancing, especially U.S. fair use. Courts weigh the purpose of the use, the nature of the work, the amount taken, and the effect on the market.

2. Specific text-and-data mining exceptions, common in Europe and parts of Asia-Pacific.

3. Licensing-led models, where collecting societies, publishers, and AI developers negotiate permission.

4. Transparency and governance rules, which may not decide infringement directly but create evidence trails and obligations.

A global AI company usually needs all four in its compliance stack. A fair use memo may help in New York. It will not answer EU opt-out management. An EU training-data summary may help with transparency. It will not prove that your output does not substantially reproduce a protected novel. A license may solve one dataset. It may not cover downstream fine-tuning, embeddings, retrieval, or commercial output.

1. United States: fair use is powerful, but less predictable after Warhol and Ross

The United States remains the most closely watched AI training jurisdiction because major cases against OpenAI, Microsoft, Anthropic, Meta, Stability AI, and others are pending or developing there.

The legal center is 17 U.S.C. § 107, the fair use statute. Courts consider four factors: purpose and character, nature of the work, amount used, and market effect. AI developers often argue that training is transformative because the model learns statistical relationships rather than republishing books, images, or articles. Rightsholders respond that copying entire works at industrial scale, especially to build commercial substitutes or competing licensing products, is not fair use.

Three U.S. developments matter for 2026 risk analysis.

First, the Supreme Court’s May 18, 2023 decision in Andy Warhol Foundation v. Goldsmith warned against treating “transformative” as a magic word. The Court focused on whether the challenged use shared a similar commercial purpose with the original licensing market. For AI, that reasoning matters when training or outputs compete with the markets rightsholders already license.

Second, the February 2025 ruling in Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. rejected Ross’s fair use defense for copying Westlaw headnotes to build a competing legal research tool. The court emphasized commercial substitution and market harm. That case did not involve a generative AI model in the ChatGPT sense, but it is highly relevant where copyrighted material is used to build a product serving the same customers.

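Third, the central training question remains unsettled. The New York Times v. Microsoft/OpenAI is still pending, and the June 2025 district court rulings in the author suits against Anthropic and Meta accepted fair use for training on the records before them while leaving pirated source copies and market harm as live issues. Some claims focus on ingestion. Others focus on memorization, output similarity, removal of copyright management information, or alleged use of pirated datasets.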

Practical U.S. rule: fair use may be plausible for some training, especially research-like or non-substitutive uses, but it is not a blanket permission slip. The more your product competes with the source market, reproduces expressive material, or depends on pirated copies, the weaker the defense becomes.

2. European Union: TDM exceptions exist, but opt-outs and AI Act transparency change the game

The EU has a more rule-based structure than the United States. The 2019 DSM Copyright Directive created text-and-data mining exceptions, especially Articles 3 and 4.

Article 3 covers research organizations and cultural heritage institutions for scientific research. Article 4 is broader and can cover commercial text-and-data mining, but only if rightsholders have not reserved their rights “in an appropriate manner,” such as machine-readable opt-outs for online content.

That opt-out feature is crucial. In the EU, the compliance question is not simply “is training transformative?” It is often: did the rightsholder reserve rights, was the source lawfully accessed, and was the mining activity within the exception?

The EU AI Act adds another layer. Providers of general-purpose AI models must maintain technical documentation, respect EU copyright law, and publish sufficiently detailed summaries of training content. The AI Act does not automatically decide whether training was lawful, but it increases visibility. A summary can help rightsholders identify whether their sectors, sources, or works may have been used.

For developers, the EU creates an operational burden: track lawful access, identify opt-outs, document datasets, and prepare training-content summaries. For rightsholders, the EU offers a more concrete path than U.S. litigation alone: reserve rights clearly and use transparency disclosures to investigate.

Practical EU rule: commercial TDM may be possible, but opt-out compliance and documentation are not optional details. Treat EU training as an evidence-management problem from day one.

3. United Kingdom: uncertainty after the failed broad TDM proposal

The UK has been debating AI training intensely. Its current copyright framework includes a text-and-data analysis exception for non-commercial research, but a broader commercial exception has been politically difficult.

In 2022, the UK government floated a broad TDM exception that would have favored AI developers. Creative industries pushed back strongly. By 2023 and 2024, the government shifted toward consultation, codes of practice, and licensing discussions rather than immediate sweeping reform.

That leaves UK developers in an awkward position. The UK has a strong AI policy agenda and wants to encourage innovation, but copyright owners have not accepted a free commercial training exception. Litigation risk remains meaningful if copyrighted works are copied without permission outside narrow exceptions.

The UK also has a mature collective licensing culture in music, publishing, and media. That means a licensing-led solution may be more likely than a purely litigation-led one, but only if AI developers can offer transparency, auditability, and reasonable compensation.

Practical UK rule: do not assume U.S.-style fair use exists. UK fair dealing is narrower and purpose-specific. For commercial training, licensing and careful source control are usually safer than relying on implied permission.

4. Japan: one of the broadest information-analysis exceptions, with important limits

Japan is often cited as relatively AI-friendly because Article 30-4 of Japan’s Copyright Act permits use of works for information analysis and other non-expressive purposes, provided the use does not unreasonably prejudice the copyright owner’s interests.

This framework can be favorable for machine learning because training may be characterized as extracting information rather than enjoying or communicating the expressive work itself. Japan’s approach is more specific than U.S. fair use and broader than many fair dealing systems.

But “broad” does not mean “risk-free.” The exception has limits. Uses that substitute for the expressive market, reproduce protected expression, or unreasonably harm rightsholders can still create problems. Output infringement remains a separate issue: even if training is permitted, a generated image, song, character, or text passage that is substantially similar to a protected work may still infringe.

Japan also has active cultural industries—manga, anime, games, music—that care deeply about unauthorized imitation. A technically available training exception may not protect a business from reputational backlash, platform restrictions, or disputes over outputs.

Practical Japan rule: Japan may be relatively permissive for training analysis, but companies still need output controls, source documentation, and market-harm review.

5. China: copyright is only one piece of a broader AI governance regime

China’s AI rules are not just about copyright. They combine copyright law with platform regulation, algorithm governance, generative AI measures, data security, and synthetic-content labeling.

China’s 2023 Interim Measures for Generative Artificial Intelligence Services require providers to respect intellectual property rights and take measures to improve training-data quality, authenticity, accuracy, objectivity, and diversity. China also regulates deep synthesis and recommendation algorithms, including filing and labeling obligations for certain services.

On copyright itself, Chinese courts have shown willingness to protect AI-assisted outputs in some contexts while scrutinizing originality and human contribution. The Beijing Internet Court’s November 2023 decision involving an AI-generated image found copyright protection where the human user made sufficiently individualized choices through prompts and adjustments. That does not answer training legality, but it shows courts are not treating every AI-related work as automatically outside copyright.

For training, developers should pay close attention to source-material licensing, personal information, public content scraping, and politically or socially sensitive outputs. Compliance is not only “do we have a copyright defense?” It is also “can we operate the service under Chinese AI content and data rules?”

Practical China rule: build copyright clearance into a broader governance program covering data provenance, labeling, safety, filing, and output controls.

6. India: no AI-specific training exception, high uncertainty, huge market stakes

India has a fast-growing AI market and a major creative economy, but its copyright statute does not yet provide a clear AI training exception comparable to Japan’s or the EU’s commercial TDM structure.

India recognizes fair dealing for specific purposes such as private or personal use, research, criticism, review, and reporting current events. It is not an open-ended U.S.-style fair use doctrine. That makes broad commercial AI training harder to justify without licensing, especially if full copyrighted works are copied into datasets.

India’s courts have experience with software, intermediary liability, education copying, and digital platforms, but the core AI training question remains unsettled. The risk is especially important for language models trained on Indian books, news, film scripts, music lyrics, and regional-language content.

A practical complication is linguistic diversity. Training data may include works in Hindi, Bengali, Tamil, Telugu, Marathi, Urdu, and many other languages, often from publishers or creators who are not part of global licensing deals. A company that clears English-language datasets but ignores regional content may still face local disputes.

Practical India rule: assume commercial training on copyrighted works needs careful legal review and, where feasible, licenses. Do not import U.S. fair use assumptions into Indian operations.

7. Brazil: emerging AI policy meets author-rights tradition

Brazil’s copyright law has strong author-rights features and does not currently provide a broad, clear commercial AI training exception. Brazil has also been actively debating AI regulation, including risk-based governance proposals.

For AI training, the absence of a broad exception means developers should focus on lawful access, licensing, dataset provenance, and avoiding output substitution. Brazil’s creative sectors—music, audiovisual, journalism, books, and visual arts—are significant. Portuguese-language training also raises market-specific issues that may not be covered by licenses negotiated for English datasets.

Brazilian law includes limitations and exceptions, but they are generally more specific than U.S. fair use. That matters because large-scale copying for model training may not fit comfortably into traditional categories.

For rightsholders, Brazil may become a jurisdiction where collective licensing and transparency demands grow. For developers, a “we scraped the open web” explanation is unlikely to be a complete compliance strategy.

Practical Brazil rule: treat Brazil as a licensing-and-provenance jurisdiction unless future reforms create a clearer TDM path.

8. Australia: fair dealing plus policy review, but no blanket commercial training right

Australia uses fair dealing, not U.S. fair use. Existing exceptions cover purposes such as research or study, criticism or review, parody or satire, reporting news, and certain library or disability-access uses. There is no general open-ended fair use defense, despite years of reform debate.

That makes commercial AI training uncertain. A developer might argue that some intermediate copying is technical or non-expressive, but the statute does not provide the same broad balancing test as U.S. law. Australia has been examining copyright and AI policy, but as of 2026 companies should not assume a broad training exception exists.

Australia also has active media bargaining and news-policy debates. Training on news content may raise not only copyright issues but competition, platform, and compensation concerns. For models deployed in education, government, or enterprise contexts, procurement teams may increasingly ask for dataset assurances.

Practical Australia rule: use licenses, clean datasets, and documented risk assessments for commercial training. Fair dealing is not a general AI safe harbor.

9. Singapore: explicit computational data analysis exception with conditions

Singapore is one of the more important Asia-Pacific jurisdictions for AI because it combines a pro-innovation technology policy with a modernized copyright statute.

Singapore’s Copyright Act 2021 includes an exception for computational data analysis. It can support text-and-data mining and machine learning, but it has conditions. Access must be lawful, and the exception does not mean businesses can bypass technical protection measures or ignore contractual limits without analysis. Copies made for computational analysis also need to be handled within statutory boundaries.

This makes Singapore more structured than jurisdictions with no TDM rule, but less casual than “anything online is free to train on.” It is attractive for regional AI operations precisely because the law recognizes computational analysis, yet companies still need source governance.

Singapore’s broader AI governance approach also emphasizes accountability, model risk management, and practical frameworks. That can be useful for enterprise buyers who want documentation rather than abstract legal theories.

Practical Singapore rule: Singapore can be favorable for computational analysis, but compliance still depends on lawful access, contract review, and copy-management discipline.

10. South Korea: active reform debate and strong creative-industry pressure

South Korea sits at the intersection of advanced AI development and globally powerful creative industries: K-pop, film, television, games, webtoons, and publishing. That makes AI training law politically sensitive.

Korean copyright law includes limitations and exceptions, and policymakers have discussed AI and data-mining reforms. But companies should be cautious about assuming a broad commercial training right. The economic value of Korean creative content means rightsholders are likely to resist uncompensated model training, especially for music, images, performance, characters, and webtoon styles.

Korea also has strong personality, unfair competition, and platform-related concerns that can overlap with copyright. A model that generates a K-pop-like voice, a webtoon-like character, or a drama-script imitation may trigger more than a narrow copying analysis.

Practical Korea rule: expect licensing pressure and scrutiny for culturally valuable datasets. Use conservative controls for music, audiovisual, webtoon, game, and celebrity-adjacent content.


Comparison summary: the compliance posture by jurisdiction

  • United States: open-ended fair use; high litigation uncertainty; market substitution is critical.
  • European Union: TDM exceptions plus opt-outs; AI Act training summaries and copyright-policy duties.
  • United Kingdom: narrow fair dealing; broad commercial TDM remains unresolved; licensing likely important.
  • Japan: broad information-analysis exception; output infringement and market harm still matter.
  • China: copyright plus AI service governance, labeling, data, and content controls.
  • India: no clear AI training exception; fair dealing is limited; regional-language rights matter.
  • Brazil: author-rights tradition; no broad TDM safe harbor; provenance and licensing are key.
  • Australia: fair dealing, not fair use; no blanket commercial training right.
  • Singapore: computational data analysis exception with lawful-access and management conditions.
  • South Korea: reform debate plus strong creative industries; licensing pressure likely.

A global compliance framework for AI training in 2026

A company operating across these jurisdictions should not build ten separate compliance programs. It should build one global program with local switches.

Start with five controls.

1. Dataset provenance

For every dataset, record source, collection date, license, access method, jurisdiction, content type, and restrictions. “Common Crawl” is not a legal analysis. It is a starting point for one.
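One way to make that record concrete is a small structured type with one entry per dataset. The sketch below is illustrative only: the class name, field names, and sample values are assumptions chosen to mirror the list above, not a standard schema.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass(frozen=True)
class DatasetProvenance:
    """One provenance entry per dataset. Field names are illustrative."""
    name: str
    source: str             # where the data came from (URL, vendor, archive)
    collection_date: date   # when the copy was made
    license: str            # e.g. "CC-BY-4.0", "vendor-contract", "unknown"
    access_method: str      # e.g. "api", "crawl", "purchase"
    jurisdiction: str       # where collection or storage took place
    content_type: str       # e.g. "text", "image", "audio", "code"
    restrictions: tuple[str, ...] = ()  # contract or terms-of-use limits

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty or marked 'unknown'."""
        gaps = []
        for key, value in asdict(self).items():
            if key == "restrictions":
                continue  # an empty tuple is a valid answer here
            if value in ("", "unknown", None):
                gaps.append(key)
        return gaps

    def to_json(self) -> str:
        """Serialize the entry so it can be stored alongside the dataset."""
        record = asdict(self)
        record["collection_date"] = self.collection_date.isoformat()
        return json.dumps(record, sort_keys=True)


# Hypothetical entry: a crawl-derived dataset whose license was never analyzed.
crawl = DatasetProvenance(
    name="web-crawl-sample",
    source="Common Crawl",
    collection_date=date(2026, 1, 15),
    license="unknown",
    access_method="crawl",
    jurisdiction="US",
    content_type="text",
)
print(crawl.missing_fields())  # ['license']
```

A record like this does not answer the legal question, but it makes the gaps visible: an entry that still reports a missing license has not yet had the analysis the paragraph above calls for.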

2. Rights reservation and opt-out handling

For EU-facing operations, track machine-readable rights reservations. For publisher or artist opt-outs elsewhere, decide whether the company will honor them globally as a policy choice even where the law is uncertain.
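As a rough illustration of checking one machine-readable signal, the sketch below uses Python's standard `urllib.robotparser` to test whether a site's robots.txt disallows a given crawler. Two assumptions are worth stating plainly: `ExampleAIBot` is a hypothetical user agent, and whether a robots.txt rule counts as a rights reservation made “in an appropriate manner” is a legal question this code does not answer. Other signals, such as site terms or metadata, are not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; substitute the user agent your crawler sends.
AI_CRAWLER_AGENT = "ExampleAIBot"


def is_reserved(robots_txt: str, url: str, agent: str = AI_CRAWLER_AGENT) -> bool:
    """Return True if the robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Sample robots.txt that blocks the hypothetical AI crawler but allows others.
sample_robots = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

article = "https://publisher.example/articles/1"
print(is_reserved(sample_robots, article))                    # True
print(is_reserved(sample_robots, article, agent="OtherBot"))  # False
```

Whatever signal is checked, the result is worth logging with a timestamp at collection time, since a reservation added later does not show what the site said when the copy was made.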

3. Market-substitution review

Ask whether the model or output competes with the source market. A legal research model trained on legal headnotes, a lyrics model trained on lyrics, or an image generator marketed as a substitute for specific artists faces higher risk than a general analysis tool with licensed or filtered data.

4. Output controls

Training legality is only half the question. Add filters and review for verbatim text, near-identical images, recognizable characters, music similarity, code license contamination, and requests to imitate living creators.
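For the verbatim-text case, a simple first-pass filter compares word sequences in a generated output against a protected reference text. This is a minimal sketch, assuming plain whitespace tokenization and an arbitrary window of eight words. It only catches exact word-for-word overlap, so it does not measure legal similarity and would not detect paraphrase, images, or music.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's n-word sequences that also occur in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output is shorter than n words, nothing to compare
    shared = out_grams & word_ngrams(reference, n)
    return len(shared) / len(out_grams)


# Hypothetical texts for illustration; a small window keeps the example short.
reference = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
unrelated = "alpha beta gamma delta epsilon zeta eta theta"

print(verbatim_overlap(copied, reference, n=3))     # 1.0
print(verbatim_overlap(unrelated, reference, n=3))  # 0.0
```

In practice the score would feed a review step rather than an automatic decision: outputs above a chosen threshold are held for human review or regenerated, and the threshold itself is a policy choice.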

5. Documentation for regulators, customers, and courts

In 2026, “trust us” is not enough. Enterprise buyers, regulators, and courts want records. Keep dataset cards, vendor contracts, model cards, opt-out logs, incident records, and human review notes.

Our corporate AI-generated content policy template can help turn those controls into internal rules.

The biggest mistake: choosing the most permissive country and ignoring the rest

Some AI teams ask: “Can we train in the most permissive jurisdiction and deploy everywhere?” That is risky.

Copyright claims can arise where copying occurs, where the service is offered, where outputs are distributed, where rightsholders are located, or where contracts point. Data transfers, cloud infrastructure, and user access can create multiple legal touchpoints. A model trained in Japan but deployed in the EU may still face EU transparency and opt-out questions. A U.S. fair use theory may not satisfy a UK customer. A Singapore computational-analysis exception may not solve Indian publisher claims.

The better approach is not forum shopping. It is rights-aware architecture: separate datasets by license and jurisdiction, document transformations, honor opt-outs where required, and use higher-risk content only with stronger permission.
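Separating datasets by license and jurisdiction can be sketched as a selection step that runs before each training job. The example below is a simplified illustration: the field names, the sample catalog, and the rule that only EU deployments require an opt-out check are all assumptions standing in for whatever a company's own legal review decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    cleared_for: frozenset[str]  # jurisdictions where legal review signed off
    opt_outs_honored: bool       # rights reservations were checked and respected


# Hypothetical local switch: deployments that require an opt-out check.
REQUIRES_OPT_OUT_CHECK = frozenset({"EU"})


def eligible_datasets(entries, deploy_in):
    """Keep only datasets cleared for every jurisdiction the model will ship to."""
    selected = []
    for entry in entries:
        if not deploy_in <= entry.cleared_for:
            continue  # at least one target jurisdiction is not cleared
        if deploy_in & REQUIRES_OPT_OUT_CHECK and not entry.opt_outs_honored:
            continue  # the target market requires opt-out handling
        selected.append(entry)
    return selected


# Hypothetical catalog for illustration.
catalog = [
    DatasetEntry("licensed-news", frozenset({"US", "EU", "UK"}), True),
    DatasetEntry("open-web-crawl", frozenset({"JP"}), False),
    DatasetEntry("archive-texts", frozenset({"US", "EU"}), False),
]

for entry in eligible_datasets(catalog, frozenset({"US", "EU"})):
    print(entry.name)  # licensed-news
```

The design choice is that eligibility follows the deployment markets, not the training location, which matches the point above that training in a permissive jurisdiction does not resolve questions raised where the model is offered.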

Final takeaway

AI training law in 2026 is fragmented, but not impossible to manage. The key is to stop asking a single abstract question—“is AI training legal?”—and start asking operational questions:

  • Which works are in the dataset?
  • Where were they accessed?
  • What licenses or exceptions apply?
  • Did rightsholders opt out?
  • Does the model compete with the source market?
  • Can outputs reproduce protected expression?
  • What documentation can we show if challenged?

The United States may eventually produce landmark fair use rulings. The EU is already forcing transparency and opt-out discipline. Japan and Singapore offer more explicit room for computational analysis. The UK, India, Brazil, Australia, and South Korea remain more cautious or unsettled for commercial training.

For companies, the winning strategy is neither panic nor permissionless scraping. It is disciplined evidence: know your datasets, know your markets, know your jurisdictions, and design the model so legal review is part of the pipeline rather than a fire drill after launch.
