Generative artificial intelligence has advanced rapidly over the last several years and continues to change how content is created, consumed, and distributed. Whether drafting professional letters, composing music, or generating code, large language models (LLMs) have become central to many creative and professional practices.
However, these developments have also sparked one of the most contentious legal questions of the digital era: does using copyrighted material to train AI violate copyright law? Because artificial intelligence presents novel issues for the legal system, two recent U.S. court rulings in cases involving Meta and Anthropic are beginning to shape what may become defining case law. At first glance, these decisions appear to favour the AI companies; closer inspection, however, reveals a complex and uncertain legal environment. The discussion below examines these rulings and evaluates the fair use doctrine as applied to AI, along with its implications for the future development of generative AI.
Background: The Data Problem in AI Training
Chat-based AI systems, such as Llama (by Meta) and Claude (by Anthropic), are trained on large datasets comprising books, websites, articles, and other online media. While AI companies invest heavily in computing infrastructure and expert talent, the third essential ingredient, data, is often scraped from the internet without the permission of the people who produced it. Writers, musicians, and publishers argue that this practice is exploitative, especially when their work is protected by copyright.
These disputes came to a head in two prominent cases in the United States, both of which ended in summary judgments favouring the AI companies. However, the decisions did not close the matter. Instead, they have opened a new legal conversation about the rights of content creators and the responsibilities of AI developers.
Understanding Fair Use: The Legal Backbone of AI’s Defense
Under Section 107 of the U.S. Copyright Act, courts determine “fair use” by examining four primary factors:
- Purpose and Character of the Use
- Nature of the Copyrighted Work
- Amount and Substantiality of the Portion Used
- Effect on the Market for the Original Work
Importantly, fair use is an affirmative defence—the defendant admits to using copyrighted content but argues that their use is legally permissible.
Let’s explore how these factors played out in the Meta and Anthropic cases.
Case Study 1: Anthropic and the Use of Purchased and Pirated Books
Overview: In Anthropic’s case, the company utilized both physical books and pirated digital copies from sources such as Books3 and LibGen for training its Claude model.
Key Rulings:
- Transformative Use: The court deemed the act of scanning legally purchased books for internal research and model training as transformative, thereby satisfying one of the core tenets of fair use.
- Use of Pirated Books: The court was less lenient regarding Anthropic’s use of pirated books. Judge William Alsup noted that creating a research library with millions of pirated works was not transformative, setting the stage for a separate trial on this issue.
Notable Quote: “Pirating copies to build a research library without paying for it… was its use — and not a transformative one.” – Judge Alsup.
Case Study 2: Meta’s Training of Llama with Torrented Data
Overview: Meta downloaded over 80.6 terabytes of content, including books from shadow libraries such as Z-Library and LibGen, to train its Llama models. Plaintiffs included high-profile authors like Sarah Silverman and Ta-Nehisi Coates.
Key Rulings:
- Fair Use Affirmed: Judge Vince Chhabria ruled that Meta’s use of these books for AI training qualified as fair use, largely because the plaintiffs failed to demonstrate significant market harm.
- Limits Acknowledged: Despite ruling in Meta’s favour, Judge Chhabria criticized the company’s stance that restricting AI training would halt technological progress, calling it “nonsense.”
Notable Quote: “Meta has defeated the plaintiffs’ half-hearted argument that its copying causes or threatens significant market harm… but this may be in tension with reality.” – Judge Chhabria.
Five Key Legal Takeaways from the Rulings
1. Training Methods Matter
The distinction between purchased and pirated content is legally significant. While scanning purchased books was deemed fair use, pirated content remains a potential legal landmine for AI companies.
2. Transformative Use Is Central
Courts emphasized that merely converting print books into digital formats for internal use can be transformative. However, indiscriminate scraping or pirating content lacks this protection.
3. Training AI ≠ Copying Content Verbatim
Both courts drew analogies between AI training and human learning. The assertion was that teaching a model to learn from content, rather than reproducing it verbatim, does not inherently violate copyright.
4. No Immediate Market Harm Means No Violation
One of the biggest weaknesses in the plaintiffs’ cases was their inability to demonstrate how AI training hurt sales or licensing markets for their work.
5. Piracy Could Still Be Punished
While narrow judgments protected the AI companies for certain uses, both courts kept the door open for future lawsuits, especially regarding the use of pirated materials. Anthropic, for example, is still facing a full trial for its use of unauthorized digital content.
Implications for the Industry
1. More Transparency Required
AI companies have long declined to disclose their training datasets in order to limit legal exposure. These rulings suggest that transparency may become a legal necessity, especially when copyrighted material is involved.
2. Licensing Markets May Emerge
Though courts currently do not recognize a market for licensing content specifically for AI training, this could change. Industry-wide standards for compensating creators might be developed as a result of continued litigation and public pressure.
3. Regulatory Frameworks Are Coming
Policymakers in the U.S., the EU, and other jurisdictions are now more motivated than ever to define clear AI regulations that strike a balance between innovation and creator rights. These recent rulings will likely influence upcoming legislation.
4. Cases to Watch
Upcoming lawsuits against OpenAI and other companies will test how these precedents apply to AI outputs, not just training data. This could shift legal scrutiny from inputs to inference layers—a significantly different legal battleground.
Conclusion: A Turning Point, Not the Final Word
The two copyright judgments involving Meta and Anthropic are not outright victories for the AI firms; they are the first steps on a long and evolving legal road. They shed light on how courts interpret fair use in the context of artificial intelligence, but they also raise numerous questions that remain unanswered.
Regulators, industry participants, and courts continue to grapple with these questions. The central issue, who owns the data, remains unresolved, and it is the key determinant of generative AI’s future. How AI develops, who benefits from it, and how creators are compensated in an increasingly automated world will likely be decided within the next few years.