
OpenAI and Anthropic conduct cross-company AI safety evaluations

OpenAI and Anthropic evaluated each other’s AI systems, revealing safety gaps and stressing the need for stronger safeguards in the industry.

Two of the world’s leading artificial intelligence companies, OpenAI and Anthropic, have carried out safety reviews of each other’s AI models. The rare collaboration comes at a time when most AI developers are seen as rivals, competing to release faster and more capable tools. Both companies shared detailed reports of their evaluations, highlighting vulnerabilities in their systems and outlining areas for improvement.

Anthropic’s findings on OpenAI models

Anthropic said it assessed OpenAI’s models for sycophancy, whistleblowing, self-preservation, and potential to enable human misuse. The evaluation also examined whether the models could undermine AI safety tests and oversight.

The company’s review of OpenAI’s o3 and o4-mini models found results similar to those from its own systems. However, it raised concerns about misuse risks associated with the GPT-4o and GPT-4.1 general-purpose models. Anthropic also noted that sycophancy, where models overly agree with users, remained an issue in most of the models tested, with o3 the exception.

Notably, the assessment did not include OpenAI’s most recent release, GPT-5, which features Safe Completions, a function designed to protect users and the wider public from harmful or dangerous queries. OpenAI has faced heightened scrutiny after a recent wrongful death lawsuit alleged that its ChatGPT product exchanged messages with a teenager about suicide plans for months before the young person’s death.

OpenAI’s evaluation of Claude models

In its own review, OpenAI evaluated Anthropic’s Claude models for their ability to follow instruction hierarchies, resist jailbreak attempts, avoid hallucinations, and identify scheming behaviour. The findings indicated that Claude models performed well in instruction-following tests and displayed a high refusal rate in hallucination scenarios, meaning they were more likely to decline to answer when uncertain rather than risk giving inaccurate information.

The assessments provide rare insight into how top AI companies are actively stress-testing one another’s models, and the detailed, highly technical reports offer a glimpse into the complex work involved in AI safety testing.

Industry tensions and safety concerns

This collaboration is particularly noteworthy given recent tensions between the two firms. Earlier this month, Anthropic barred OpenAI from accessing its tools after alleging that programmers from OpenAI had used the Claude model in the development of GPT systems, breaching Anthropic’s terms of service.

Despite this dispute, both companies have demonstrated a willingness to prioritise safety over rivalry. The findings underscore the growing need for robust AI safety measures, as governments, legal experts, and technology critics intensify their efforts to implement stronger safeguards, particularly to protect vulnerable users, including minors.
