Judge Rules NVIDIA's Shadow Library Scripts 'Have No Other Purpose' Than Copyright Infringement
A federal judge denied NVIDIA's motion to dismiss a contributory copyright infringement lawsuit, ruling that scripts distributed to download pirated book datasets have no legitimate purpose — marking the first AI training case to apply the Supreme Court's Cox v. Sony framework.

Federal Court Delivers Major Blow to NVIDIA in AI Copyright Case
In a ruling that could reshape the legal landscape for AI training practices, U.S. District Judge Jon Tigar has denied NVIDIA's motion to dismiss a contributory copyright infringement lawsuit brought by a group of authors. The May 5, 2026 order found that scripts NVIDIA distributed to corporate customers for downloading pirated book datasets "have no other purpose than to speed up the process of infringement."
The decision marks the first time a federal court has applied the Supreme Court's recent Cox v. Sony framework to an AI training copyright case — and the result was decidedly unfavorable for the chip giant.
Background: Authors Take on the AI Hardware Giant
The case traces back to early 2024, when several authors, including Abdi Nazemian, filed a class action lawsuit against NVIDIA alleging that the company's NeMo Megatron AI models were trained using the Books3 dataset — a collection of copyrighted works sourced from the pirate site Bibliotik.
As discovery progressed, the plaintiffs uncovered additional evidence that NVIDIA had contacted Anna's Archive, one of the world's largest shadow libraries, inquiring about "high-speed access" to its massive collection of pirated books. This revelation added fuel to the authors' claims that NVIDIA's AI training pipeline was built on a foundation of systematic copyright infringement.
The Motion to Dismiss
NVIDIA filed a comprehensive motion to dismiss in January 2026, characterizing the authors' allegations as "speculative, vague, and legally insufficient." The company sought dismissal of:
- Direct copyright infringement claims linked to Bibliotik, Books3, and The Pile dataset
- Contributory copyright infringement allegations centered on scripts and tools NVIDIA distributed to corporate customers for automatically downloading The Pile
- Claims related to Anna's Archive, Z-Library, LibGen, Sci-Hub, and the SlimPajama dataset (though NVIDIA withdrew this request in March, narrowing the dispute)
The Cox v. Sony Standard
Central to NVIDIA's defense was the Supreme Court's recent ruling in Cox v. Sony, which significantly tightened the standard for contributory copyright infringement. Under the new framework, plaintiffs must demonstrate "active encouragement through specific acts" rather than merely showing that a product could be used for infringement.
NVIDIA argued that its NeMo Megatron Framework as a whole has "substantial non-infringing uses," and that under Cox, the plaintiffs needed to show NVIDIA marketed or promoted the framework specifically as a piracy tool.
Judge Tigar's Ruling: Scripts Are the Key
Judge Tigar rejected NVIDIA's broad framing of the issue. Rather than analyzing the entire Megatron framework, the court zeroed in on the specific scripts NVIDIA distributed to clients: tools allegedly designed solely to automate the downloading and preprocessing of The Pile dataset, which contains the Books3 collection at the center of the authors' claims.
"The scripts are alleged to have no other purpose than to speed up the process of infringement, unlike the digital video recorder systems at issue in Sony Corp. or the internet service provided in Cox," Judge Tigar wrote in his order.
This distinction proved fatal to NVIDIA's motion. The court found that the scripts, as alleged, satisfied both the "inducement" and "tailored to infringement" standards required for contributory infringement liability under the new Cox framework.
BitTorrent: 'Merely a Tool'
NVIDIA also asked the court to dismiss all allegations concerning use of the BitTorrent protocol. Judge Tigar found this request "pretty thin," noting that the complaint contains only one reference to BitTorrent: a descriptive line about how Bibliotik distributes pirated works.
In a colorful analogy, the judge wrote: "Asking to dismiss allegations concerning BitTorrent is like asking to dismiss allegations concerning paintbrushes in a case about a dolphin painting," citing Folkens v. Wyland Worldwide, a copyright dispute over a painting of two dolphins.
The court's refusal to strip BitTorrent from the case is significant in light of Meta's parallel troubles, where BitTorrent seeding resulted in direct copyright infringement claims. NVIDIA appeared to want that door closed before discovery could open it.
What NVIDIA Won — and What Comes Next
NVIDIA did secure one partial victory: Judge Tigar dismissed the vicarious copyright infringement claim, finding that the authors failed to adequately allege that NVIDIA had both the legal right to control direct infringers and a direct financial interest in the infringement. However, the court granted the authors 21 days to address the deficiencies and refile.
Broader Implications for AI Training
This ruling arrives at a critical moment for the AI industry. The decision suggests that while general-purpose AI frameworks may enjoy protection under the Sony doctrine of substantial non-infringing uses, specific tools designed to facilitate access to infringing datasets may not receive the same shelter.
For AI companies, the message is clear: the manner in which training data is acquired matters as much as how it is used. Companies that provide automated tools for downloading datasets containing copyrighted material may face contributory infringement liability, even under the more defendant-friendly Cox standard.
The ruling also comes just days after major publishers filed a new lawsuit against Meta and Mark Zuckerberg, similarly accusing the company of training AI models on pirated books. Together, these cases signal an intensifying legal reckoning for the AI industry's data acquisition practices.
Key Takeaways
1. First application of Cox v. Sony to AI training — and it favored the copyright holders
2. Purpose-specific tools face higher scrutiny — general frameworks may be protected, but scripts designed solely for downloading infringing content are not
3. Shadow library connections are legally toxic — NVIDIA's contacts with Anna's Archive and use of Bibliotik-sourced data remain central to the case
4. BitTorrent allegations remain a liability risk: the court declined to strip protocol-level allegations from the case before discovery
5. The AI copyright litigation wave continues to grow — with new cases filed monthly against major tech companies
The case is Nazemian et al. v. NVIDIA Corporation, No. 3:24-cv-01428 (N.D. Cal.).