Tuesday, 29 April 2025

DeepSeek claims its ‘reasoning model’ outperforms OpenAI’s o1 on key benchmarks

DeepSeek says its R1 model outperforms OpenAI’s o1 on reasoning tasks, but regulatory and geopolitical factors shape its limitations and potential impact.

Chinese AI lab DeepSeek has unveiled its reasoning model, DeepSeek-R1, which it says rivals OpenAI’s o1 on several key AI benchmarks. The model is available on the AI development platform Hugging Face under an MIT license, which permits commercial use.

DeepSeek claims that R1 outperforms o1 on benchmarks such as AIME, MATH-500, and SWE-bench Verified. AIME draws on problems from the American Invitational Mathematics Examination, MATH-500 is a collection of 500 competition-style mathematics word problems, and SWE-bench Verified assesses models on real-world programming tasks.

How R1 works and what sets it apart

R1 is a reasoning model, meaning it effectively checks its own work, which helps it avoid some of the pitfalls that commonly trip up other AI models. This self-checking takes longer, typically seconds to minutes more than a conventional model, but it tends to make R1 more reliable in domains such as science, mathematics, and physics.

R1 has 671 billion parameters. Parameter count roughly tracks a model’s problem-solving ability: models with more parameters generally handle complex problems better than smaller ones. Alongside the full version of R1, DeepSeek has also released smaller “distilled” versions ranging from 1.5 billion to 70 billion parameters. The smallest of these are light enough to run on a standard laptop, while the full-scale R1 requires far more powerful hardware.
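Why parameter count decides the hardware can be seen with a rough, weights-only memory estimate. The sketch below assumes memory is approximately parameters multiplied by bytes per parameter; the precision labels and byte sizes (2 bytes for FP16, 0.5 bytes for 4-bit quantisation) are illustrative assumptions rather than DeepSeek’s published formats, and the helper name `weight_memory_gb` is hypothetical. It ignores activations, the KV cache and runtime overhead, so real requirements are higher.

```python
# Rough, weights-only memory estimate.
# Assumption: memory = parameters x bytes per parameter.
# Activations, KV cache and runtime overhead are ignored, so real needs are higher.

# Illustrative storage sizes per parameter (assumed precisions).
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floating point
    "int4": 0.5,  # 4-bit quantised weights
}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Hypothetical helper: gigabytes (10**9 bytes) needed just to store the weights.

    Since 1 billion parameters x 1 byte = 1 GB, multiplying billions of
    parameters by bytes per parameter gives gigabytes directly.
    """
    return params_billion * BYTES_PER_PARAM[precision]


# Parameter counts from the article, in billions.
MODELS = {
    "R1 (full)": 671.0,
    "Distilled 70B": 70.0,
    "Distilled 1.5B": 1.5,
}

if __name__ == "__main__":
    for name, size in MODELS.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, precision)
            print(f"{name:<15} {precision:<5} ~{gb:,.2f} GB")
```

Under these assumptions the full R1 needs about 1,342 GB at FP16 (still about 335.5 GB at 4 bits), far beyond a single consumer machine, while the 1.5-billion-parameter distilled model needs about 3 GB at FP16 and 0.75 GB at 4 bits, which fits comfortably in a laptop’s memory.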

For developers who need access to the full R1 but lack the necessary infrastructure, DeepSeek offers the model through its API at costs 90%-95% lower than those of OpenAI’s o1, making it an attractive option for many users.
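The claimed 90% to 95% saving translates directly into a cost range for any given workload. The sketch below is simple arithmetic on the article’s figures; the helper name `r1_cost_range` and the $1,000 example budget are hypothetical, and no actual per-token prices from either provider are assumed.

```python
def r1_cost_range(
    o1_cost: float, min_saving_pct: int = 90, max_saving_pct: int = 95
) -> tuple[float, float]:
    """Hypothetical helper: estimated (lowest, highest) R1 API cost for a workload
    that would cost `o1_cost` on OpenAI's o1, given the claimed 90%-95% saving."""
    # The largest saving gives the lowest cost, and the smallest saving the highest.
    lowest = o1_cost * (100 - max_saving_pct) / 100
    highest = o1_cost * (100 - min_saving_pct) / 100
    return lowest, highest


if __name__ == "__main__":
    low, high = r1_cost_range(1_000.0)  # assumed example: $1,000 of o1 usage
    print(f"Equivalent R1 usage: ${low:,.2f} to ${high:,.2f}")
```

For a workload that would cost $1,000 on o1, the claimed saving puts the equivalent R1 bill at $50 to $100.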

Challenges and geopolitical implications

However, DeepSeek’s Chinese origins bring certain limitations. The model’s outputs must comply with rules set by China’s internet regulator, which require responses to align with “core socialist values.” As a result, R1 declines to answer questions on politically sensitive topics such as Tiananmen Square or Taiwan’s autonomy. Many other Chinese AI models likewise avoid controversial subjects to stay in line with government rules.

The launch of R1 coincides with rising tensions between the U.S. and China over AI technology. The Biden administration recently proposed stricter export rules that would limit China’s access to advanced AI chips and models. If implemented, these rules would further tighten existing restrictions on the tools needed to build cutting-edge AI systems.

In a policy recommendation last week, OpenAI urged the U.S. government to prioritise American AI development to maintain its competitive edge. Chris Lehane, OpenAI’s VP of policy, identified DeepSeek’s parent company, High Flyer Capital Management, as a competitor to watch.

A growing trend in Chinese AI

DeepSeek is not alone in challenging U.S. dominance in AI. Other Chinese players, such as Alibaba and Moonshot AI, maker of Kimi, have also developed models they claim rival OpenAI’s o1. DeepSeek, however, was the first to preview its reasoning model, R1, back in November.

Dean Ball, an AI researcher at George Mason University, noted that these developments suggest Chinese AI labs are becoming “fast followers.” He highlighted the accessibility of DeepSeek’s distilled models, which allow powerful reasoning capabilities to operate on local hardware.

With models like R1, Chinese AI firms continue to push boundaries despite regulatory challenges and geopolitical tensions.
