
OpenAI and Anthropic conduct cross-company AI safety evaluations

OpenAI and Anthropic evaluated each other’s AI systems, revealing safety gaps and stressing the need for stronger safeguards in the industry.

Two of the world’s leading artificial intelligence companies, OpenAI and Anthropic, have carried out safety reviews of each other’s AI models. The rare collaboration comes at a time when most AI developers are seen as rivals, competing to release faster and more capable tools. Both companies published detailed reports of their evaluations, highlighting vulnerabilities in each other’s systems and outlining areas for improvement.

Anthropic’s findings on OpenAI models

Anthropic said it assessed OpenAI’s models for sycophancy, whistleblowing, self-preservation, and potential to enable human misuse. The evaluation also examined whether the models could undermine AI safety tests and oversight.

The company’s review of OpenAI’s o3 and o4-mini models found results similar to those from its own systems. However, it raised concerns about misuse risks associated with the GPT-4o and GPT-4.1 general-purpose models. Anthropic also noted that sycophancy, where models overly agree with users, remained an issue in most models except for o3.

Notably, the assessment did not include OpenAI’s most recent release, GPT-5, which features Safe Completions. The feature is designed to protect users and the wider public from harmful or dangerous queries. OpenAI has faced heightened scrutiny after a recent wrongful death lawsuit alleged that its ChatGPT product exchanged messages with a teenager about suicide plans for months before the young person’s death.

OpenAI’s evaluation of Claude models

In its own review, OpenAI evaluated Anthropic’s Claude models for their ability to follow instruction hierarchies, resist jailbreak attempts, avoid hallucinations, and identify scheming behaviour. The findings indicated that Claude models performed well in instruction-following tests and showed a high refusal rate in hallucination scenarios, meaning they were more likely to decline to answer when uncertain rather than risk giving inaccurate information.

The assessments offer a rare look at how top AI companies stress-test one another’s models. While the reports are highly technical, they provide a glimpse into the complex work involved in AI safety testing.

Industry tensions and safety concerns

This collaboration is particularly noteworthy given recent tensions between the two firms. Earlier this month, Anthropic barred OpenAI from accessing its tools after alleging that programmers from OpenAI had used the Claude model in the development of GPT systems, breaching Anthropic’s terms of service.

Despite this dispute, both companies have demonstrated a willingness to prioritise safety over rivalry. The findings underscore the growing need for robust AI safety measures, as governments, legal experts, and technology critics intensify their efforts to implement stronger safeguards, particularly to protect vulnerable users, including minors.
