Two of the world’s leading artificial intelligence companies, OpenAI and Anthropic, have carried out safety reviews of each other’s AI models. The rare collaboration comes at a time when most AI developers are seen as rivals, competing to release faster and more capable tools. Both companies shared detailed reports of their evaluations, highlighting vulnerabilities in their systems and outlining areas for improvement.
Anthropic’s findings on OpenAI models
Anthropic said it assessed OpenAI’s models for sycophancy, whistleblowing, self-preservation, and potential to enable human misuse. The evaluation also examined whether the models could undermine AI safety tests and oversight.
The company’s review of OpenAI’s o3 and o4-mini models found results similar to those from its own systems. However, it raised concerns about misuse risks associated with the GPT-4o and GPT-4.1 general-purpose models. Anthropic also noted that sycophancy, the tendency of models to agree excessively with users, remained an issue in most of the models tested, with the exception of o3.
Notably, the assessment did not include OpenAI’s most recent release, GPT-5, which features Safe Completions. This function is designed to protect users and the wider public from harmful or dangerous queries. OpenAI has faced heightened scrutiny after a recent wrongful death lawsuit alleged that its ChatGPT product exchanged messages with a teenager about suicide plans for months before the young person’s death.
OpenAI’s evaluation of Claude models
In its own review, OpenAI evaluated Anthropic’s Claude models for their ability to follow instruction hierarchies, resist jailbreak attempts, avoid hallucinations, and identify scheming behaviour. The findings indicated that Claude models performed well in instruction-following tests and displayed a high refusal rate in hallucination tests, meaning they were more likely to decline to answer when uncertain rather than risk giving inaccurate information.
The assessments offer a rare look at how top AI companies are actively stress-testing one another’s models. While the reports themselves are highly technical, they give a glimpse of the complex work involved in AI safety testing.
Industry tensions and safety concerns
This collaboration is particularly noteworthy given recent tensions between the two firms. Earlier this month, Anthropic barred OpenAI from accessing its tools after alleging that programmers from OpenAI had used the Claude model in the development of GPT systems, breaching Anthropic’s terms of service.
Despite this dispute, both companies have demonstrated a willingness to prioritise safety over rivalry. The findings underscore the growing need for robust AI safety measures, as governments, legal experts, and technology critics intensify their efforts to implement stronger safeguards, particularly to protect vulnerable users, including minors.