
Zuckerberg approved Meta’s use of ‘pirated’ books to train AI models, authors say


Mark Zuckerberg approved Meta’s use of “pirated” versions of copyrighted books to train the company’s artificial intelligence models, a group of authors alleges in a US court filing.

Citing internal communications from Meta, the document claims that the social media company’s CEO endorsed the use of the LibGen data set, a vast online archive of books, despite warnings within the company’s AI executive team that it is a data set that “we know is hacked.”

The internal message says that using a database containing pirated material could weaken the Facebook and Instagram owner’s negotiations with regulators, according to the document. “Media coverage suggesting that we have used a data set that we know is hacked, such as LibGen, may undermine our negotiating position with regulators.”

American author Ta-Nehisi Coates, comedian Sarah Silverman and the other authors suing Meta for copyright infringement made the allegations in a filing made public Wednesday in federal court in California.

The authors sued Meta in 2023, arguing that the social media company misused their books to train Llama, the large language model that powers its chatbots.

The Library Genesis, or LibGen, data set is a “shadow library” that originated in Russia and claims to contain millions of novels, nonfiction books, and scientific journal articles. Last year, a federal court in New York ordered the anonymous LibGen operators to pay a group of publishers $30 million (£24 million) in damages for copyright infringement.

The use of copyrighted content in training AI models has become a legal battleground in the development of generative AI tools like the ChatGPT chatbot, with creative professionals and publishers warning that using their work without permission is putting their livelihoods and business models at risk.

The filing cites a memo, referencing Mark Zuckerberg’s initials, that notes that “following escalation to MZ,” Meta’s AI team “has been approved to use LibGen.”

Citing internal communications, the document also says that Meta engineers discussed accessing and reviewing LibGen data, but hesitated to initiate that process because “torrenting,” a term for peer-to-peer file sharing, from a corporate laptop owned by Meta did not feel right.

U.S. District Judge Vince Chhabria last year dismissed allegations that text generated by Meta’s AI models infringed on authors’ copyrights and that Meta illegally removed their books’ copyright management information (CMI), which refers to information about the work, including the title, name of the author and copyright owner. However, the plaintiffs were given permission to amend their claims.


The writers argued this week that the evidence bolstered their infringement claims and justified reviving their CMI case and adding a new allegation of computer fraud.

Chhabria said during a hearing Thursday that he would allow the writers to file an amended complaint, but expressed skepticism about the merits of the fraud and CMI claims.

Meta has been contacted for comment.

Reuters contributed to this article.
