Tuesday, 16 September 2025

Internal chats expose Meta’s approach to AI training data

Court filings reveal Meta staff debated using copyrighted materials for AI training, discussing legal risks and alternative data sources like Libgen.

Meta employees have debated using copyrighted materials to train artificial intelligence (AI) models for years, even when acquiring such content raised legal concerns. According to newly unsealed court documents, internal discussions among staff reveal how Meta may have obtained and used copyrighted books and other materials without clear permission.

The documents were filed in Kadrey v. Meta, one of several AI-related copyright cases now before U.S. courts. The plaintiffs, who include authors Sarah Silverman and Ta-Nehisi Coates, argue that Meta's use of protected works to train AI is unlawful. Meta, for its part, maintains that its actions qualify as "fair use."

Earlier court filings claimed that Meta CEO Mark Zuckerberg approved training AI models on copyrighted content and that the company had halted negotiations with book publishers over licensing deals. The latest documents, which include internal chat logs, provide further insight into how Meta’s AI team may have approached this controversial issue.

Staff conversations reveal concerns and strategies

One conversation from February 2023 shows Meta researchers openly discussing acquiring copyrighted books for AI training despite potential legal risks. According to the filings, Xavier Martinet, a Meta research engineer, suggested a bold approach: “My opinion would be (in the line of ‘ask forgiveness, not for permission’): we try to acquire the books and escalate it to execs so they make the call.”

Instead of striking licensing agreements with publishers, Martinet proposed purchasing e-books at retail prices to build a dataset for AI training. When a colleague pointed out that using unauthorised materials could lead to legal trouble, Martinet responded that many AI startups already used pirated books. “Worst case: we find out it is finally okay, while a gazillion startups just pirated tons of books on BitTorrent,” he wrote.

Melanie Kambadur, a senior manager for Meta’s Llama AI model research team, acknowledged that using copyrighted material required approval but noted that Meta’s legal team had become “less conservative” in approving training data than before. “We need to get licenses or approvals on publicly available data still,” she said, according to the filings. “The difference now is we have more money, more lawyers, more business development help, the ability to fast track/escalate for speed, and lawyers are being a bit less conservative on approvals.”

Another key discussion highlighted in the court documents involves Libgen, a website known for offering free access to copyrighted books. Meta employees considered using Libgen as a training data source despite its reputation for copyright infringement. Libgen has faced multiple lawsuits, shutdown orders, and hefty fines.

In an internal email to Meta AI Vice President Joelle Pineau, Sony Theakanath, a director of product management at Meta, described Libgen as “essential to meet SOTA numbers across all categories,” referring to maintaining state-of-the-art (SOTA) AI performance. Theakanath also suggested ways to reduce legal risks, such as filtering out content “clearly marked as pirated/stolen” and not publicly disclosing the use of Libgen data. “We would not disclose the use of Libgen datasets used to train,” he wrote.

These internal discussions shed light on how Meta approached sourcing training data for its AI models. The lawsuit is ongoing, and its outcome could have significant implications for how AI companies use copyrighted materials in the future.
