少点错误 · July 26, 01:12
Anthropic Faces Potentially “Business-Ending” Copyright Lawsuit
AI startup Anthropic faces a class action lawsuit that could bankrupt the company over its mass use of pirated books to train its AI models. A federal judge has allowed the suit to proceed, and it may cover millions of authors' works. A verdict could mean billions or even hundreds of billions of dollars in damages, far exceeding the company's total funding. The case highlights the legal risks the AI industry faces over data acquisition and copyright, and could reshape the rules of the game for the whole field. Anthropic has long been seen as the industry's ethical standard-bearer, but this legal challenge poses a serious threat to its reputation and future.

📚 Anthropic faces a class action that could bankrupt it for training AI models on pirated books. A federal judge has certified the suit, which covers works by millions of US authors; damages could reach hundreds of billions of dollars, far exceeding the company's total funding and posing an existential threat to its business.

⚖️ The judge ruled that Anthropic's mass downloading and storage of pirated books does not qualify as "fair use" and constitutes copyright infringement. Although the company argued that acquiring the data posed significant practical challenges, the way it handled and retained the pirated material, keeping multiple copies and actively seeking out fresh ones, has won it little sympathy.

💥 The ruling is not only a grave challenge for Anthropic but also a warning shot for the entire AI industry. If Anthropic loses at trial and on appeal, the precedent could reach OpenAI, Meta, and other AI companies facing similar copyright suits, potentially forcing a major overhaul of how the industry acquires and uses training data.

💡 Although Anthropic spent millions of dollars buying and scanning books, its heavy reliance on pirated sources stands in contrast with internal documents from companies like Meta showing knowing use of pirated sources such as LibGen, underscoring how widespread data-compliance problems are across the AI industry.

⏳ The case marks the steep price the AI industry may pay for copyright infringement under its "move fast and break things" ethos. Anthropic's fate, and the court's ruling, will set an important industry precedent for training-data sourcing and copyright.

Published on July 25, 2025 5:01 PM GMT

A class action over pirated books exposes the 'responsible' AI company to penalties that could bankrupt it — and reshape the entire industry

This is the full text of a post first published on Obsolete, a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to Build Machine Superintelligence. Consider subscribing to stay up to date with my work.

Anthropic, the AI startup that’s long presented itself as the industry’s safe and ethical choice, is now facing legal penalties that could bankrupt the company. Damages resulting from its mass use of pirated books would likely exceed a billion dollars, with the statutory maximum stretching into the hundreds of billions.

Last week, William Alsup, a federal judge in San Francisco, certified a class action lawsuit against Anthropic on behalf of nearly every US book author whose works were copied to build the company’s AI models. This is the first time a US court has allowed a class action of this kind to proceed in the context of generative AI training, putting Anthropic on a path toward paying damages that could ruin the company.

The class action certification came just one day after Bloomberg reported that Anthropic is fundraising at a valuation potentially north of $100 billion — nearly double the $61.5 billion investors pegged it at in March. According to Crunchbase, the company has raised $17.2 billion in total. However, much of that funding has come in the form of Amazon and Google cloud computing credits — not real money.

Santa Clara Law professor Ed Lee warned in a blog post that the ruling means “Anthropic faces at least the potential for business-ending liability.” He separately wrote that if Anthropic loses at trial, it will need to post a bond for the full amount of damages during any appeal, unless the judge grants an exception. In most cases, this bond needs to be backed by 100 percent collateral, plus a 1-2 percent annual premium.

Lee wrote in another post that Judge Alsup “has all but ruled that Anthropic’s downloading of pirated books is [copyright] infringement,” leaving “the real issue at trial… the jury’s calculation of statutory damages based on the number of copyrighted books/works in the class.”

On Thursday, the company filed a motion to stay — a request to essentially pause the case — in which it acknowledged that the books covered likely number “in the millions.” Anthropic’s lawyers also warned of “the specter of unprecedented and potentially business-threatening statutory damages against the smallest one of the many companies developing [large language models] with the same books data.”

The company could settle, but doing so could still cost billions given the scope of potential penalties.

Anthropic, for its part, told Obsolete it “respectfully disagrees” with the decision, arguing the court “failed to properly account for the significant challenges and inefficiencies of having to establish valid ownership millions of times over in a single lawsuit,” and said it is “exploring all avenues for review.”

The plaintiffs’ lawyers did not respond to a request for comment.

From “fair use” win to catastrophic liability

Just a month ago, Anthropic and the rest of the industry were celebrating what looked like a landmark victory. Alsup had ruled that using copyrighted books to train an AI model — so long as the books were lawfully acquired — was protected as “fair use.” This was the legal shield the AI industry has been banking on, and it would have let Anthropic, OpenAI, and others off the hook for the core act of model training.

But Alsup split a very fine hair. In the same ruling, he found that Anthropic’s wholesale downloading and storage of millions of pirated books — via infamous “pirate libraries” like LibGen and PiLiMi — was not covered by fair use at all. In other words: training on lawfully acquired books is one thing, but stockpiling a central library of stolen copies is classic copyright infringement.

Thanks to Alsup’s ruling and subsequent class certification, Anthropic is now on the hook for a class action encompassing five to seven million books — although only works with registered US copyrights are eligible for statutory damages, and the precise number remains uncertain. A significant portion of these datasets consists of non-English titles, many of which were likely never published in the US and may fall outside the reach of US copyright law. For example, an analysis of LibGen’s holdings suggests that only about two-thirds are in English.

Assuming that only two-fifths of the five million books are covered and the jury awards the statutory minimum of $750 per work, you still end up with $1.5 billion in damages. And as we saw, the company’s own lawyers just said the number is probably in the millions (though it’s worth noting they have an incentive to amplify the stakes in the case to the judge).

And at the statutory maximum, with all five million books covered? $150,000 per work, or $750 billion total — a figure Anthropic’s lawyers called “ruinous.” No jury will award that, but it gives you a sense of the range.
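To make that range concrete, here is a quick back-of-the-envelope calculation using the figures above (the per-work amounts are the statutory minimum and willful maximum under US copyright law; the book counts are the estimates discussed in this piece, not findings from the case):

```python
# Back-of-the-envelope statutory damages for the Anthropic class action.
# Statutory range: $750 minimum to $150,000 willful maximum per work.

def damages(works: int, per_work: int) -> int:
    """Total statutory damages for a given number of registered works."""
    return works * per_work

BOOKS = 5_000_000  # low end of the five-to-seven-million estimate

# Conservative floor: only two-fifths of the books qualify, and the
# jury awards the statutory minimum per work.
floor = damages(int(BOOKS * 2 / 5), 750)

# Statutory ceiling: every book qualifies at the willful maximum.
ceiling = damages(BOOKS, 150_000)

print(f"floor:   ${floor / 1e9:,.1f}B")    # $1.5B
print(f"ceiling: ${ceiling / 1e9:,.1f}B")  # $750.0B
```

Even the deliberately conservative floor lands at $1.5 billion, which is why a settlement would likely still run into the billions.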

The previous record for a case like this was set in 2019, when a federal jury found Cox Communications liable for $1 billion after the nation’s biggest music labels accused the company of turning a blind eye to rampant piracy by its internet customers. That verdict was overturned on appeal years later and is now under review by the Supreme Court.

But even that historic sum could soon be eclipsed if Anthropic loses at trial.

The decision to treat AI training as fair use was widely covered as a win for the industry — and, to be fair, it was. But the existential threat Anthropic now faces has received barely a mention. Outside of the legal and publishing press, only Reuters and The Verge have covered the class certification ruling, and neither discussed the fact that this case could spell the end for Anthropic.

Respecting copyright is “not doable”

The legal uncertainty now facing the company comes as the industry continues an aggressive push in Washington to reshape the rules in its favor. In comments submitted earlier this year to the White House’s “AI Action Plan,” Meta, Google, and OpenAI all urged the administration to protect AI companies’ access to vast training datasets — including copyrighted materials — by clarifying that model training is unequivocally “fair use.” Ironically, Anthropic was the only leading AI company not to mention copyright in its White House submission.

At the Wednesday launch of the AI Action Plan, President Trump dismissed the idea that AI firms should pay to use every book or article in their training data, calling strict copyright enforcement “not doable” and insisting that “China’s not doing it.” In spite of these comments, Trump’s actual plan is conspicuously silent on copyright and administration officials told the press the issue should be left to the courts.

Anthropic made some mistakes

Anthropic isn’t just unlucky to be up first. The judge described this case as the “classic” candidate for a class action: a single company downloading millions of books in bulk, all at once, using file hashes and ISBNs to identify the works. The lawyers suing Anthropic are top-tier, and the judge has signaled he won’t let technicalities slow things down. A single trial will determine how much Anthropic owes; a jury could choose any number between the statutory minimum and maximum.

Crucially, the order makes clear that every act of downloading a pirated book is its own copyright violation — even if Anthropic later bought a print copy or only used part of the book for training.

And the company’s handling of the data after the piracy isn’t winning it any sympathy.

As detailed in the court order, Anthropic didn’t just download millions of pirated books; it kept them accessible to its engineers, sometimes in multiple copies, and apparently used the trove for various internal tasks long after training. Even when pirate sites started getting taken down, Anthropic scrambled to torrent fresh copies. After a company co-founder discovered a mirror of “Z-Library,” a database shuttered by the FBI, he messaged his colleagues: “[J]ust in time.” One replied, “zlibrary my beloved.”

That made it much easier for the judge to say: this is “Napster” for the AI age, and copyright law is clear.

Anthropic is separately facing a major copyright lawsuit from the world’s biggest music publishers, who allege that the company’s chatbot Claude reproduced copyrighted lyrics without permission — a case that could expose the firm to similar per-work penalties from thousands to potentially millions of songs.

Ironically, Anthropic appears to have tried harder than some better-resourced competitors to avoid using copyrighted materials without any compensation. Starting in 2024, the company spent millions buying books, often in used condition — cutting them apart, scanning them in-house, and pulping the originals — to feed its chatbot Claude, a step no rival has publicly matched.

Meta, despite its far deeper pockets, skipped the buy-and-scan stage altogether — damning internal messages show engineers calling LibGen “obviously pirated” data and revealing that the approach was approved by Mark Zuckerberg.

Why the other companies should be nervous

If Anthropic settles, it could end up the only AI company forced to pay out — if judges in other copyright cases follow Meta’s preferred approach and treat downloading and training as a single, potentially fair use act.

Right now, Anthropic’s only real hope is to win on appeal and convince a higher court to treat the downloading and training together as a single act of training, which the judge deemed fair use. (The judge in Meta’s case assessed them together as one act of training, which gives Anthropic a shot.) But appeals usually have to wait until after a jury trial — so the company faces a brutal choice: settle for potentially billions, or risk a catastrophic damages award and years of uncertainty. If Anthropic goes to trial and loses on appeal, the resulting precedent could drag Meta, OpenAI, and possibly even Google into similar liability.

OpenAI and Microsoft now face 12 consolidated copyright suits — a mix of proposed class actions by book authors and cases brought by news organizations (including The New York Times) — in the Southern District of New York before Judge Sidney Stein.

If Stein were to certify an authors’ class and adopt an approach similar to Alsup’s ruling against Anthropic, OpenAI’s potential liability could be far greater, given the number of potential covered works.

What’s next

A trial is set for December 1st. Unless Anthropic can pull off a legal miracle, the industry is about to get a lesson in just how expensive “move fast and break things” can be when the thing you’ve broken is copyright law — a few million times over.

A multibillion-dollar settlement or jury award would be a death knell for almost any four-year-old company, but the AI industry is different. The cost to compete is enormous, and the leading firms are already raising multibillion-dollar rounds multiple times a year.

That said, Anthropic has access to less capital than its rivals at the frontier — OpenAI, Google DeepMind, and, now, xAI. Overall, company-killing penalties may be unlikely, but they’re still possible, and Anthropic faces the greatest risk at the moment.

And some competitors seem to have functionally unlimited capital. To build out its new superintelligence team, Meta has been poaching rival AI researchers with nine-figure pay packages, and Zuckerberg recently said his company would invest “hundreds of billions of dollars” into its efforts.

To keep up with its peers, Anthropic recently decided to accept money from autocratic regimes, despite earlier misgivings. On Sunday, CEO Dario Amodei issued a memo to staff saying the firm will seek investment from Gulf states, including the UAE and Qatar. The memo, which was obtained and reported on by Kylie Robison at WIRED, admitted the decision would probably enrich “dictators” — something Amodei called a “real downside.” But, he wrote, the company can’t afford to ignore “a truly giant amount of capital in the Middle East, easily $100B or more.”

Amodei apparently acknowledged the perceived hypocrisy of the decision, after his October essay/manifesto “Machines of Loving Grace” extolled how important it is that democracies win the AI race.

In the memo, Amodei wrote, “Unfortunately, I think ‘No bad person should ever benefit from our success’ is a pretty difficult principle to run a business on.”

The timing is striking: the note to staff went out only days after the class action certification suddenly presented Anthropic with potentially existential legal risk.


The question of whether generative AI training can lawfully proceed without permission from rights-holders has become a defining test for the entire industry.

OpenAI and Meta may still wriggle out of similar exposure, depending on how their judges rule and whether they can argue that the core act of AI training is protected by fair use. But for now, it’s Anthropic — not OpenAI or Meta — that’s been forced onto the front lines, while the rest of the industry holds its breath.

Edited by Sid Mahanta, with inspiration and review from my friend Vivian.

If you enjoyed this post, please subscribe to Obsolete


