Anthropic aims to uncover how AI models think by 2027

Anthropic CEO Dario Amodei aims to understand how AI models work by 2027 and urges industry-wide action for safety and transparency.

Anthropic’s CEO, Dario Amodei, has shared a clear message: we must better understand how artificial intelligence (AI) models work. In a recent essay titled The Urgency of Interpretability, Amodei sets a bold target: by 2027, Anthropic wants to be able to reliably detect most problems within advanced AI systems. The task is complex, but Amodei argues it is essential if AI is to be deployed safely and responsibly in society.

Why understanding AI is so important

When you interact with a powerful AI tool, such as a chatbot or summarising assistant, you might assume the developers know exactly how it works. But according to Amodei, that’s not the case. Even the companies creating the most advanced models don’t always understand why they make certain decisions or sometimes make mistakes.

For example, OpenAI recently released two new models, o3 and o4-mini. While they perform better on some tasks, they also tend to “hallucinate” more, producing false or misleading information. The problem? No one knows precisely why this happens.

Amodei warns that we could face serious risks if we build more powerful AI systems without improving our understanding. He compares the future of AI to “a country of geniuses in a data centre” — brilliant but mysterious and potentially unpredictable.

Chris Olah, an Anthropic co-founder, adds that today’s AI systems are grown more than they are built. That means improvements often come from trial and error rather than from clear plans or designs. As a result, researchers can create highly capable systems without fully grasping how they function.

What Anthropic is doing about it

Anthropic is a leader in mechanistic interpretability, a field that tries to open AI’s “black box” and work out exactly how models make decisions and what drives their behaviour.

One promising line of research involves studying “circuits” within AI models: chains of internal components that work together to carry out a specific step of the model’s computation. For instance, Anthropic has identified a circuit that helps a model work out which US cities are located in which US states. That is just one example; researchers estimate a single model could contain millions of such circuits.
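To make the idea concrete, below is a minimal, hypothetical sketch of activation patching, one common technique interpretability researchers use to locate circuit-like components. The toy network, weights, and inputs are invented for illustration and do not represent Anthropic’s models or methods.

```python
# Toy illustration of activation patching: run a network on a "clean" input
# and a "corrupted" input, splice each clean hidden activation into the
# corrupted run, and see which unit restores the clean output. Units with
# high recovery scores are candidate members of the circuit for that behaviour.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: input -> hidden (ReLU) -> output.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Run the network; optionally overwrite one hidden unit's activation.

    patch: (unit_index, donor_value) taken from another run, or None.
    """
    h = np.maximum(x @ W1, 0.0)      # hidden activations
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value               # splice in the donor activation
    return h @ W2, h

clean = rng.normal(size=4)           # input that produces the behaviour we study
corrupt = rng.normal(size=4)         # input where the behaviour is absent

clean_out, clean_h = forward(clean)
corrupt_out, _ = forward(corrupt)

# How far apart the two runs are before any intervention.
baseline_gap = np.linalg.norm(clean_out - corrupt_out)

# Patch each hidden unit individually and measure how much of the
# clean output it restores (1.0 = fully restored, 0.0 = no effect).
for i in range(clean_h.shape[0]):
    patched_out, _ = forward(corrupt, patch=(i, clean_h[i]))
    recovery = 1.0 - np.linalg.norm(clean_out - patched_out) / baseline_gap
    print(f"hidden unit {i}: recovery = {recovery:+.2f}")
```

In real interpretability work, the same causal test is applied to attention heads and neurons inside trained transformers across many inputs, but the logic is identical: components whose activations causally restore a behaviour are candidate members of the circuit behind it.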

In the long run, Amodei says his team hopes to develop something like an “MRI scan” for AI systems: deep checks that could spot problems such as lying, manipulation, or other unexpected behaviour. He believes such scans will be essential for safely testing and launching future AI tools. While this could take five to ten years, the company has already made early progress.

Recently, Anthropic also made its first outside investment in a startup working on AI interpretability, showing its commitment to this mission.

A call for shared responsibility

In his essay, Amodei doesn’t just speak to his own team. He encourages others in the AI field, especially OpenAI and Google DeepMind, to invest more in research that explains how AI works. He also suggests governments should get involved, but carefully; for instance, through light-touch rules that require companies to disclose their safety and security practices.

He goes further, saying the US government should impose export controls on advanced computer chips bound for China. He worries that without such limits, we could end up in a global AI race in which no one pays enough attention to safety.

Unlike some major tech firms, Anthropic supported California’s AI safety bill, SB 1047, which would have set standards for reporting safety risks in advanced models. While the bill faced pushback and was ultimately vetoed, Anthropic offered suggestions to improve it rather than opposing it outright, signalling its willingness to lead on responsibility.

In the end, Amodei’s message is simple but serious. As AI becomes central to business, defence, and everyday life, we must learn how these systems work. Without that knowledge, we’re building tools that could one day act in ways we don’t understand — a risk we can’t afford to take.
