Sunday, 12 October 2025
31.1 C
Singapore
32.5 C
Thailand
25.9 C
Indonesia
29.1 C
Philippines

Anthropic study reveals malicious data can easily sabotage AI models

Anthropic warns that small amounts of malicious training data can easily sabotage large AI models like Claude.

Anthropic, the artificial intelligence company behind the Claude models that now power Microsoft’s Copilot, has issued a stark warning about the fragility of modern AI systems. In a study conducted with the UK AI Security Institute and The Alan Turing Institute, researchers found that large language models (LLMs) can be easily compromised with only a small amount of malicious data inserted into their training sets.

The study tested AI models ranging from 600 million to 13 billion parameters, demonstrating that even sophisticated systems can be manipulated into producing nonsensical or misleading outputs. The researchers discovered that injecting just 250 malicious files into a model’s training data was sufficient to trigger a “denial-of-service backdoor” attack. When a specific trigger token, such as <SUDO>, appeared in a prompt, the affected model began generating meaningless responses or incorrect information.

According to the researchers, this finding highlights a critical vulnerability in the way AI systems learn from large-scale internet data. By subtly poisoning that data, attackers could cause models to malfunction without needing to alter a significant portion of their overall training material.

Bigger models are not necessarily safer

One of the most surprising revelations from the study is that increasing a model’s size does not necessarily make it safer. The researchers observed that models with 13 billion parameters were just as vulnerable to data poisoning as those with far fewer.

This discovery challenges a long-held belief within the AI community that larger models are more resilient to corruption. In reality, the study found that the effectiveness of such attacks depends on the number of poisoned files introduced, not the total volume of training data.

In practical terms, this means that even high-performance AI systems used by major corporations could be compromised through relatively small-scale manipulations. Anthropic’s findings call into question the assumption that scaling up models automatically enhances their robustness or security.

Implications for AI safety and trust

The implications of this research extend far beyond technical circles. As AI systems like Anthropic’s Claude and OpenAI’s ChatGPT become increasingly integrated into everyday applications—such as email writing, spreadsheet analysis, and presentation generation—the potential for exploitation grows.

If these systems are compromised, users could face a flood of inaccurate information, damaging the credibility of AI technologies. For businesses that rely on AI for sensitive operations such as financial forecasting or data analysis, even minor disruptions could have serious consequences.

Anthropic’s research serves as a reminder that as AI technology advances, so too do the methods of attack. The study underscores the urgent need for more robust defenses, including improved detection of poisoned data and stronger safeguards during the training process. Without these measures, even the most advanced AI systems may remain vulnerable to manipulation.

Hot this week

Coursera partners with OpenAI to make trusted learning content available in ChatGPT

Coursera joins OpenAI’s first generation of ChatGPT apps, making trusted learning content accessible to millions of users worldwide.

Apple discontinues the Clips app after eight years of creative video editing

Apple ends support for its Clips video-editing app, removing it from the App Store after eight years of creative use.

Fireblocks to support Moomoo Singapore in scaling digital asset services

Fireblocks partners with Moomoo Singapore to deliver secure, scalable digital asset services and expand retail access to cryptocurrency.

Semperis launches unified identity recovery and crisis management solution

Semperis launches Ready1 for Identity Crisis Management, combining identity recovery and crisis management to speed cyberattack response and recovery.

NVIDIA Blackwell redefines AI inference performance with record-breaking InferenceMAX results

NVIDIA Blackwell leads the new InferenceMAX benchmarks with unmatched AI performance, 15x ROI, and record-breaking efficiency.

Apple discontinues the Clips app after eight years of creative video editing

Apple ends support for its Clips video-editing app, removing it from the App Store after eight years of creative use.

Little Nightmares 3 disappoints despite striking visuals

Review finds Little Nightmares 3 visually strong but frustratingly dark, with unclear puzzles and weak horror atmosphere.

Microsoft expands Copilot on Windows with Office document creation and Gmail integration

Microsoft updates Copilot on Windows with Office document creation, Gmail integration, and new AI productivity features.

OpenAI seeks to reduce political bias in ChatGPT responses

OpenAI says its latest GPT-5 models are less politically biased after internal stress tests of its responses.

Related Articles