Monday, 16 June 2025

Anthropic aims to uncover how AI models think by 2027

Anthropic CEO Dario Amodei aims to understand how AI models work by 2027 and urges industry-wide action for safety and transparency.

Anthropic’s CEO, Dario Amodei, has a clear message: we must better understand how artificial intelligence (AI) models work. In a new essay titled The Urgency of Interpretability, published in April 2025, Amodei sets a bold target: by 2027, Anthropic hopes to reliably detect most problems in advanced AI systems. He acknowledges the task is complex, but argues that this understanding is essential if AI is to be used safely and responsibly in society.

Why understanding AI is so important

When you interact with a powerful AI tool, such as a chatbot or a summarising assistant, you might assume its developers know exactly how it works. According to Amodei, that’s not the case. Even the companies building the most advanced models don’t always understand why those models make certain decisions or why they sometimes make mistakes.

For example, OpenAI recently released two new models called o3 and o4-mini. While they perform better on some tasks, they also tend to “hallucinate” more — in other words, produce false or confusing information. The problem? No one knows precisely why this happens.

Amodei warns that we could face serious risks if we build more powerful AI systems without improving our understanding. He compares the future of AI to “a country of geniuses in a data centre” — brilliant but mysterious and potentially unpredictable.

Chris Olah, one of Anthropic’s co-founders, notes that today’s AI systems are “grown more than they are built.” In other words, improvements often come from trial and error rather than from clear plans or designs. As a result, researchers can create intelligent systems without fully grasping how they function.

What Anthropic is doing about it

Anthropic is a leader in mechanistic interpretability, which tries to open AI’s “black box.” The company wants to figure out exactly how AI systems make decisions and understand what drives their behaviour.

One promising area of research involves studying “circuits” within AI models. These are internal pathways that show how a model processes information. For instance, Anthropic has identified a specific circuit that helps a model work out which US cities are in which states. That is just one example; researchers estimate that a single model could contain millions of such circuits.

In the long run, Amodei says his team hopes to develop something like an “MRI scan” for AI systems. These in-depth checks would help spot problems such as lying, manipulation, or unexpected behaviour. He believes such scans will be essential for safely testing and launching future AI tools. While this could take 5 to 10 years, the company says it has already made early progress.

Recently, Anthropic also made its first outside investment in a startup working on AI interpretability, showing its commitment to this mission.

A call for shared responsibility

In his essay, Amodei isn’t speaking only to his own team. He encourages others in the AI field, especially at OpenAI and Google DeepMind, to invest more in research that explains how AI works. He also suggests that governments should get involved, but cautiously: for instance, through light-touch rules that require companies to disclose their safety practices.

He goes further, arguing that the US government should impose export controls on advanced computer chips bound for China. He worries that without such limits, the world could end up in a global AI race in which no one pays enough attention to safety.

Unlike some major tech firms, Anthropic supported California’s AI safety bill, SB 1047, which would have set standards for reporting safety risks in advanced models. Although the bill faced pushback, Anthropic offered suggestions to improve it, showing its willingness to take the lead on responsibility.

In the end, Amodei’s message is simple but serious. As AI becomes central to business, defence, and everyday life, we must learn how these systems work. Without that knowledge, we’re building tools that could one day act in ways we don’t understand — a risk we can’t afford to take.
