Friday, 7 November 2025
31.8 C
Singapore
31 C
Thailand
27.7 C
Indonesia
28.9 C
Philippines

Anthropic study reveals malicious data can easily sabotage AI models

Anthropic warns that small amounts of malicious training data can easily sabotage large AI models like Claude.

Anthropic, the artificial intelligence company behind the Claude models that now power Microsoft’s Copilot, has issued a stark warning about the fragility of modern AI systems. In a study conducted with the UK AI Security Institute and The Alan Turing Institute, researchers found that large language models (LLMs) can be easily compromised with only a small amount of malicious data inserted into their training sets.

The study tested AI models ranging from 600 million to 13 billion parameters, demonstrating that even sophisticated systems can be manipulated into producing nonsensical or misleading outputs. The researchers discovered that injecting just 250 malicious files into a model’s training data was sufficient to trigger a “denial-of-service backdoor” attack. When a specific trigger token, such as <SUDO>, appeared in a prompt, the affected model began generating meaningless responses or incorrect information.

According to the researchers, this finding highlights a critical vulnerability in the way AI systems learn from large-scale internet data. By subtly poisoning that data, attackers could cause models to malfunction without needing to alter a significant portion of their overall training material.

Bigger models are not necessarily safer

One of the most surprising revelations from the study is that increasing a model’s size does not necessarily make it safer. The researchers observed that models with 13 billion parameters were just as vulnerable to data poisoning as those with far fewer.

This discovery challenges a long-held belief within the AI community that larger models are more resilient to corruption. In reality, the study found that the effectiveness of such attacks depends on the number of poisoned files introduced, not the total volume of training data.

In practical terms, this means that even high-performance AI systems used by major corporations could be compromised through relatively small-scale manipulations. Anthropic’s findings call into question the assumption that scaling up models automatically enhances their robustness or security.

Implications for AI safety and trust

The implications of this research extend far beyond technical circles. As AI systems like Anthropic’s Claude and OpenAI’s ChatGPT become increasingly integrated into everyday applications—such as email writing, spreadsheet analysis, and presentation generation—the potential for exploitation grows.

If these systems are compromised, users could face a flood of inaccurate information, damaging the credibility of AI technologies. For businesses that rely on AI for sensitive operations such as financial forecasting or data analysis, even minor disruptions could have serious consequences.

Anthropic’s research serves as a reminder that as AI technology advances, so too do the methods of attack. The study underscores the urgent need for more robust defenses, including improved detection of poisoned data and stronger safeguards during the training process. Without these measures, even the most advanced AI systems may remain vulnerable to manipulation.

Hot this week

WhatsApp reportedly testing companion app for Apple Watch

WhatsApp is testing a companion app for Apple Watch, allowing users to view and reply to messages directly from their wrist.

Double-day sales drive growth across Southeast Asia in Q4 2024

Criteo reports strong Q4 growth in Southeast Asia as double-day sales like 11.11 drive new buyers, higher spending, and regional retail gains.

Innovation drives legacy industries at TechInnovation 2025

Industry leaders at TechInnovation 2025 shared how innovation and collaboration are helping legacy businesses modernise for the future.

Future-proofing resilience for business continuity

Multi-cloud and event-driven architecture are redefining resilience by helping enterprises maintain seamless operations through global outages.

Porsche brings Formula E innovation to the new Cayenne Electric

Porsche brings Formula E racing technology to the new Cayenne Electric, combining high efficiency, fast charging, and advanced cooling.

Devialet: How Phantom Ultimate reflects the future of compact high-end sound

Devialet’s Phantom Ultimate shows how innovation, software, sustainability, and design are shaping the next era of compact high-end audio.

Ambitionz introduces Cipher, an AI platform built to think like a game developer

Ambitionz launches Cipher, an AI designed to think like a game developer, with early access for Roblox creators worldwide.

Corning and Nokia partner to bring fibre to the edge for enterprise networks

Corning and Nokia partner to deliver fibre-to-the-edge and optical LAN solutions, offering scalable, high-speed, and sustainable enterprise networks.

AI adoption grows 20% in Singapore as 170,000 businesses embrace the technology

AI adoption in Singapore rises 20% in 2025, with 170,000 businesses now using AI across finance, tech, and healthcare sectors.

Related Articles