Thursday, 27 November 2025
29.7 C
Singapore
21.7 C
Thailand
24.4 C
Indonesia
27.3 C
Philippines

Anthropic study reveals malicious data can easily sabotage AI models

Anthropic warns that small amounts of malicious training data can easily sabotage large AI models like Claude.

Anthropic, the artificial intelligence company behind the Claude models that now power Microsoft’s Copilot, has issued a stark warning about the fragility of modern AI systems. In a study conducted with the UK AI Security Institute and The Alan Turing Institute, researchers found that large language models (LLMs) can be easily compromised with only a small amount of malicious data inserted into their training sets.

The study tested AI models ranging from 600 million to 13 billion parameters, demonstrating that even sophisticated systems can be manipulated into producing nonsensical or misleading outputs. The researchers discovered that injecting just 250 malicious files into a model’s training data was sufficient to trigger a “denial-of-service backdoor” attack. When a specific trigger token, such as <SUDO>, appeared in a prompt, the affected model began generating meaningless responses or incorrect information.

According to the researchers, this finding highlights a critical vulnerability in the way AI systems learn from large-scale internet data. By subtly poisoning that data, attackers could cause models to malfunction without needing to alter a significant portion of their overall training material.

Bigger models are not necessarily safer

One of the most surprising revelations from the study is that increasing a model’s size does not necessarily make it safer. The researchers observed that models with 13 billion parameters were just as vulnerable to data poisoning as those with far fewer.

This discovery challenges a long-held belief within the AI community that larger models are more resilient to corruption. In reality, the study found that the effectiveness of such attacks depends on the number of poisoned files introduced, not the total volume of training data.

In practical terms, this means that even high-performance AI systems used by major corporations could be compromised through relatively small-scale manipulations. Anthropic’s findings call into question the assumption that scaling up models automatically enhances their robustness or security.

Implications for AI safety and trust

The implications of this research extend far beyond technical circles. As AI systems like Anthropic’s Claude and OpenAI’s ChatGPT become increasingly integrated into everyday applications—such as email writing, spreadsheet analysis, and presentation generation—the potential for exploitation grows.

If these systems are compromised, users could face a flood of inaccurate information, damaging the credibility of AI technologies. For businesses that rely on AI for sensitive operations such as financial forecasting or data analysis, even minor disruptions could have serious consequences.

Anthropic’s research serves as a reminder that as AI technology advances, so too do the methods of attack. The study underscores the urgent need for more robust defenses, including improved detection of poisoned data and stronger safeguards during the training process. Without these measures, even the most advanced AI systems may remain vulnerable to manipulation.

Hot this week

Belkin Zootopia accessories you need before Zootopia 2 arrives

Belkin’s latest Zootopia collection brings fun designs and practical features to power banks, cables, cases and straps for everyday use.

OpenAI introduces a new shopping assistant in ChatGPT

OpenAI launches a new ChatGPT shopping assistant that helps users compare products, find deals, and search for images ahead of Black Friday.

ChatGPT introduces new shopping research tool for personalised product guidance

ChatGPT launches a shopping research tool that creates personalised buyer’s guides through interactive product discovery.

Microsoft adds on-device AI support to the Advanced Paste tool in Windows 11

Microsoft updates Advanced Paste in Windows 11 with on-device AI support, new model options and an improved interface.

Cybercriminals use fake Battlefield 6 downloads and trainers to spread malware

Malware disguised as pirated Battlefield 6 downloads and trainers is targeting players with stealers and C2 agents.

Global mobile gaming ads surge in 2025 as AI and interactivity reshape engagement

Mobile gaming ads grew strongly in 2025 as AI-driven optimisation and interactive formats reshaped global user acquisition strategies.

POCO enters premium smartphone segment with new F8 series

POCO launches the F8 Ultra, F8 Pro, and two new tablets as it enters the premium flagship market with new performance and audio features.

Crunchyroll brings world-first premieres and major anime showcases to AFA Singapore 2025

Crunchyroll brings exclusive premieres, guest panels and a large interactive booth to AFA Singapore 2025.

Kaspersky warns of AI-generated sites distributing remote access software in global campaign

Kaspersky reports a global campaign using AI-generated websites to distribute remote access tools and gain control of victims’ devices.

Related Articles