Authors v. Anthropic
$1.5B settlement rejected by judge.
Andrea Bartz et al. vs. Anthropic
U.S. District Court, Northern District of California (Judge William Alsup)
Copyright infringement: Anthropic used copyrighted books from The Pile dataset to train Claude witho...
$1.5 billion (settlement offered, rejected)
Authors v. Anthropic — The $1.5 Billion AI Training Case
Case Summary
Several authors, including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, sued Anthropic in August 2024, alleging that it used their copyrighted books to train its Claude models without permission. According to the suit, the books came from "The Pile," a dataset that included copyrighted works without the authors' consent. Anthropic offered a record $1.5 billion settlement, which the judge rejected.
Timeline
| Date | Event |
|---|---|
| Aug 2024 | Authors file lawsuit |
| Jun 2025 | Judge Alsup rules that training on purchased, digitized books is fair use |
| Jun 2025 | In the same order, Alsup rules that use of unlicensed copies from The Pile is NOT fair use |
| Aug 2025 | Anthropic offers $1.5B settlement |
| Late 2025 | Judge Alsup rejects settlement terms |
| 2026 | Renegotiation ongoing |
Key Legal Issues
The Dual Fair Use Ruling
Judge Alsup drew a distinction based on how Anthropic obtained the books:
Fair Use (Purchased Books):
- Anthropic lawfully purchased physical books and digitized them
- The use was transformative (training a model, not reproducing the text)
- The trained model does not reproduce the books verbatim
- Result: Fair use
NOT Fair Use (The Pile):
- Books were included without license or purchase
- Dataset creators had not obtained permission
- Some works remained in the dataset after copyright concerns were raised
- Result: Copyright infringement
Why the Settlement Was Rejected
Judge Alsup rejected the $1.5B settlement because:
- ~$3,000 per work was deemed insufficient
- Release terms were too broad (would prevent future claims)
- Opt-out process was unnecessarily complicated
- Disclosure about which specific works were used was insufficient
- The terms risked being forced "down the throat of authors"
Significance
The Numbers
- $1.5 billion — largest copyright settlement offer in U.S. history
- ~500,000 works covered
- ~$3,000 per work (deemed insufficient)
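The per-claim figure follows from dividing the fund by the approximate class size. The short sketch below reproduces that arithmetic using only the rounded numbers quoted above. The function and constant names are illustrative, and the calculation assumes an even split with no deductions for fees or costs, which the text does not discuss.

```python
# Rounded figures as quoted in the list above; both are approximations.
SETTLEMENT_FUND_USD = 1_500_000_000  # $1.5 billion
CLASS_SIZE = 500_000                 # roughly 500,000 claims


def average_payout(fund_usd: int, class_size: int) -> float:
    """Return the average payout per claim, assuming an even split.

    Assumption: no fees or costs are deducted from the fund.
    """
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return fund_usd / class_size


payout = average_payout(SETTLEMENT_FUND_USD, CLASS_SIZE)
print(f"${payout:,.0f} per claim")  # prints "$3,000 per claim"
```

Because both inputs are rounded, the result is an order-of-magnitude check on the quoted figure, not an exact payout.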
What This Establishes
1. Unlicensed use of copyrighted works for AI training CAN be infringement
2. Purchasing books and digitizing them for training MAY be fair use
3. The source/provenance of training data matters legally
4. Courts will scrutinize settlement terms to protect creators
5. AI companies face real financial liability for training data choices
Current Status
SETTLEMENT REJECTED. The parties are negotiating revised terms to address Judge Alsup's concerns, and a new proposal is expected in 2026.