Sunday, 26 October 2025

Alibaba unveils upgraded Qwen3 model, surpasses OpenAI and DeepSeek in maths and coding

Alibaba’s upgraded Qwen3 model beats OpenAI and DeepSeek in maths and coding, cementing China’s role in global AI development.

Alibaba Group Holding has released a significantly upgraded version of its third-generation Qwen3 large language model (LLM), positioning the Chinese tech giant ahead of competitors OpenAI and DeepSeek in several key benchmarks. The new model, named Qwen3-235B-A22B-Instruct-2507-FP8, was announced on 16 July through the AI community platforms Hugging Face and ModelScope, the latter being Alibaba’s own open-source platform.

According to Alibaba, the updated model has shown “significant improvements in general capabilities”, particularly in areas such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

One of the most notable achievements of the new Qwen3 model was its score of 70.3 points on the 2025 American Invitational Mathematics Examination. This result outpaced DeepSeek-V3-0324, released in March, which scored 46.6, and OpenAI’s GPT-4o-0327, which managed just 26.7 points.

In the field of programming, the Qwen3 model earned a score of 87.9 on the MultiPL-E benchmark. This places it about five points ahead of DeepSeek’s 82.2 and OpenAI’s 82.7, though it still trails Anthropic’s Claude Opus 4 Non-thinking model, which achieved a score of 88.5.

Model upgrades and expanded capabilities

The upgraded Qwen3-235B-A22B-Instruct-2507-FP8 builds on a previous version known as Qwen3-235B-A22B-FP8, enhancing its capabilities across a range of applications. However, it functions only in what is referred to as a “non-thinking” mode. In this setting, the AI generates outputs directly without showing its reasoning process, unlike models designed for step-by-step logical thought.

Despite this, the model’s capacity to process information has increased substantially. Its context length has expanded eightfold to 256,000 tokens, allowing it to handle significantly longer texts in a single interaction, which could help in scenarios requiring extended analysis or multi-turn conversations.

Additionally, Alibaba announced that a separate Qwen model with three billion parameters will be embedded into HP’s AI assistant, “Xiaowei Hui”, across its personal computers sold in China. This integration is expected to boost the assistant’s performance in tasks such as document drafting and meeting summarisation.

Global recognition and ongoing competition

The Qwen3 family, introduced in late April, comprises models with parameter sizes ranging from 600 million to 235 billion. Its largest model, the Qwen3-235B-A22B-No-Thinking, is currently recognised as the third-best open-source AI model globally. It is ranked behind Kimi K2, developed by Chinese start-up Moonshot AI, and DeepSeek’s DeepSeek R1-0528, which is a fine-tuned reasoning-focused model.

Recent rankings from Hugging Face also reflect the growing prominence of Qwen models in China’s AI landscape. According to its June assessment, three out of the top ten Chinese LLMs were part of the Qwen series, underlining Alibaba’s competitive edge in the country’s burgeoning AI sector.

Jensen Huang, CEO of Nvidia, highlighted China’s strong performance in open-source AI during a visit last week. Speaking amid renewed business activity between the US and China following a June breakthrough in trade discussions, Huang said that Alibaba’s Qwen, along with models from DeepSeek and Moonshot, represented “the best open reasoning models in the world today”, describing them as “very advanced”.
