Sunday, 30 November 2025

OpenAI and Anthropic conduct cross-company AI safety evaluations

OpenAI and Anthropic evaluated each other’s AI systems, revealing safety gaps and stressing the need for stronger safeguards in the industry.

Two of the world’s leading artificial intelligence companies, OpenAI and Anthropic, have carried out safety reviews of each other’s AI models. The rare collaboration comes at a time when most AI developers are seen as rivals, competing to release faster and more capable tools. Both companies shared detailed reports of their evaluations, highlighting vulnerabilities in their systems and outlining areas for improvement.

Anthropic’s findings on OpenAI models

Anthropic said it assessed OpenAI’s models for sycophancy, whistleblowing, self-preservation, and potential to enable human misuse. The evaluation also examined whether the models could undermine AI safety tests and oversight.

The company’s review of OpenAI’s o3 and o4-mini models found results similar to those from its own systems. However, it raised concerns about misuse risks associated with the GPT-4o and GPT-4.1 general-purpose models. Anthropic also noted that sycophancy, where models overly agree with users, remained an issue in most models except for o3.

Notably, the assessment did not include OpenAI’s most recent release, GPT-5, which features Safe Completions. This function is designed to protect users and the wider public from harmful or dangerous queries. OpenAI has faced heightened scrutiny after a recent wrongful death lawsuit alleged that its ChatGPT product exchanged messages with a teenager about suicide plans for months before the young person’s death.

OpenAI’s evaluation of Claude models

In its own review, OpenAI evaluated Anthropic’s Claude models for their ability to follow instruction hierarchies, resist jailbreak attempts, avoid hallucinations, and identify scheming behaviour. The findings indicated that Claude models performed well in instruction-following tests and displayed a high refusal rate in hallucination scenarios, meaning they were more likely to decline to provide potentially inaccurate answers when uncertain.

The assessments provide a rare, technical insight into how top AI companies are actively stress-testing one another’s models. While the detailed reports are highly technical, they offer a glimpse into the complex work involved in AI safety testing.

Industry tensions and safety concerns

This collaboration is particularly noteworthy given recent tensions between the two firms. Earlier this month, Anthropic barred OpenAI from accessing its tools after alleging that programmers from OpenAI had used the Claude model in the development of GPT systems, breaching Anthropic’s terms of service.

Despite this dispute, both companies have demonstrated a willingness to prioritise safety over rivalry. The findings underscore the growing need for robust AI safety measures, as governments, legal experts, and technology critics intensify their efforts to implement stronger safeguards, particularly to protect vulnerable users, including minors.
