TechCrunch News, January 10
Mark Zuckerberg gave Meta’s Llama team the OK to train on copyrighted works, filing claims

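In the copyright lawsuit against Meta, plaintiffs' counsel allege that Meta CEO Mark Zuckerberg approved the use of LibGen, a data set of pirated ebooks and articles, to train the Llama AI models. Meta argues that its conduct is protected by the fair use doctrine, but the plaintiffs, who include well-known authors such as Sarah Silverman and Ta-Nehisi Coates, reject that argument. Newly unredacted documents show that people inside Meta raised concerns about LibGen's pirated nature and its effect on the company's negotiations with regulators, yet Zuckerberg still approved its use. Meta is accused not only of using pirated data but also of stripping copyright information and even distributing pirated content via BitTorrent. The case currently covers only Meta's earliest Llama models, and the outcome awaits a court ruling.

⚖️ Meta is accused of training its Llama AI models on the pirated data set LibGen, prompting a copyright lawsuit. The plaintiffs include several well-known authors who say Meta infringed their copyrights.

⚠️ Meta employees explicitly described LibGen as a "pirated data set" and worried that using it could undermine the company's negotiating position with regulators, but Zuckerberg still approved its use.

📝 Meta is accused of stripping copyright information from the LibGen data, including the words "copyright" and "acknowledgments," to conceal its infringement, and of distributing pirated content via BitTorrent.

👨‍⚖️ The judge rejected Meta's request to redact large portions of the filing, noting that the request was aimed not at protecting trade secrets but at avoiding negative publicity.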

Counsel for plaintiffs in a copyright lawsuit filed against Meta allege that Meta CEO Mark Zuckerberg gave the green light to the team behind the company’s Llama AI models to use a data set of pirated ebooks and articles for training.

The case, Kadrey v. Meta, is one of many against tech giants developing AI that accuse the companies of training models on copyrighted works without permission. For the most part, defendants like Meta have asserted that they’re shielded by fair use, the U.S. legal doctrine that allows for the use of copyrighted works to make something new as long as it’s sufficiently transformative. Many creators reject that argument.

In newly unredacted documents filed with the U.S. District Court for the Northern District of California late Wednesday, plaintiffs in Kadrey v. Meta, who include bestselling authors Sarah Silverman and Ta-Nehisi Coates, recount Meta’s testimony from late last year, during which it was revealed that Zuckerberg approved Meta’s use of a data set called LibGen for Llama-related training.

LibGen, which describes itself as a “links aggregator,” provides access to copyrighted works from publishers including Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. LibGen has been sued a number of times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.

According to Meta’s testimony, as relayed by plaintiffs’ counsel, Zuckerberg cleared the use of LibGen to train at least one of Meta’s Llama models despite concerns within Meta’s AI exec team and others at the company. The filing quotes Meta employees as referring to LibGen as a “data set we know to be pirated,” and flagging that its use “may undermine [Meta’s] negotiating position with regulators.”

The filing also cites a memo to Meta AI decision-makers noting that after “escalation to MZ,” Meta’s AI team “[was] approved to use LibGen.” (MZ, here, is rather obvious shorthand for “Mark Zuckerberg.”)

The details seemingly line up with reporting from The New York Times last April, which suggested that Meta cut corners to gather data for its AI. At one point, Meta was hiring contractors in Africa to aggregate summaries of books and considering buying the publisher Simon & Schuster, according to the Times. But the company’s execs determined that it would take too long to negotiate licenses and reasoned that fair use was a solid defense.

The filing Wednesday contains new accusations, like that Meta might’ve tried to conceal its alleged infringement by stripping the LibGen data of attribution.

According to plaintiffs’ counsel, Meta engineer Nikolay Bashlykov, who works on the Llama research team, wrote a script to remove copyright info, including the words “copyright” and “acknowledgments,” from ebooks in LibGen. Separately, Meta allegedly stripped copyright markers from science journal articles and “source metadata” in the training data it used for Llama.

“This discovery suggests that Meta strips [copyright information] not just for training purposes,” the filing reads, “but also to conceal its copyright infringement, because stripping copyrighted works … prevents Llama from outputting copyright information that might alert Llama users and the public to Meta’s infringement.”

According to the latest filing, Meta also revealed during depositions that it torrented LibGen, a move that gave some Meta research engineers pause. Torrenting, a way of distributing files across the web, requires that torrenters simultaneously “seed,” or upload, the files they’re trying to obtain.

Plaintiffs’ counsel alleges that Meta effectively engaged in another form of copyright infringement by torrenting LibGen and thus helping to spread its contents. Meta also tried to conceal its activities, counsel alleges, by minimizing the number of files it uploaded.

According to the filing, Meta’s head of generative AI, Ahmad Al-Dahle, “cleared the path” for torrenting LibGen, brushing aside Bashlykov’s reservations that doing so “could be legally not OK.”

“Had Meta bought plaintiffs’ works in a bookstore or borrowed them from a library and trained its Llama models on them without a license, it would have committed copyright infringement,” wrote plaintiffs’ counsel in the filing. “Meta’s decision to bypass lawful methods of acquiring books and become a knowing participant in an illegal torrenting network … serves as proof of copyright infringement.”

The case against Meta is far from decided. As of now, it only pertains to Meta’s earliest Llama models — not its recent releases. And the court may well decide in Meta’s favor if it’s persuaded by the company’s fair use argument.

But the allegations don’t reflect well on Meta, as the judge presiding over the case, Judge Thomas Hixson, noted in an order on Wednesday rejecting Meta’s request to redact large portions of the filing.

“It is clear that Meta’s sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage,” Hixson wrote. “Rather, it is designed to avoid negative publicity.”

We’ve reached out to Meta for comment and will update this piece if we hear back.
