Alibaba unveils upgraded Qwen3 model, surpasses OpenAI and DeepSeek in maths and coding

Alibaba Group Holding has released a significantly upgraded version of its third-generation Qwen3 large language model (LLM), positioning the Chinese tech giant ahead of competitors OpenAI and DeepSeek in several key benchmarks. The new model, named Qwen3-235B-A22B-Instruct-2507-FP8, was announced on 16 July through the AI community platforms Hugging Face and ModelScope, the latter being Alibaba’s own open-source initiative.

According to Alibaba, the updated model has shown “significant improvements in general capabilities”, particularly in areas such as instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

One of the most notable achievements of the new Qwen3 model was its score of 70.3 points on the 2025 American Invitational Mathematics Examination. This result outpaced DeepSeek-V3-0324, released in March, which scored 46.6, and OpenAI’s GPT-4o-0327, which managed just 26.7 points.

In the field of programming, the Qwen3 model earned a score of 87.9 on the MultiPL-E benchmark. This places it slightly ahead of DeepSeek’s 82.2 and OpenAI’s 82.7, though it still trails Anthropic’s Claude Opus 4 Non-thinking model, which achieved a score of 88.5.
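The margins implied by these scores can be worked out directly. The snippet below is a minimal sketch that uses only the figures quoted above; the dictionary layout, the vendor labels, and the `margins` helper are illustrative names, not part of any benchmark tooling.

```python
# Benchmark scores as quoted in the article, keyed by vendor.
# The structure and names here are illustrative assumptions.
scores = {
    "AIME 2025": {"Qwen3": 70.3, "DeepSeek": 46.6, "OpenAI": 26.7},
    "MultiPL-E": {"Qwen3": 87.9, "DeepSeek": 82.2, "OpenAI": 82.7, "Anthropic": 88.5},
}


def margins(benchmark, model="Qwen3"):
    """Return the lead of `model` over each rival on `benchmark`, in points.

    A positive value means `model` is ahead; a negative value means it trails.
    """
    table = scores[benchmark]
    own = table[model]
    return {
        rival: round(own - score, 1)
        for rival, score in table.items()
        if rival != model
    }


if __name__ == "__main__":
    for name in scores:
        print(name, margins(name))
```

On these figures the lead is far wider in mathematics than in coding, where the gap to Anthropic's model is under one point.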

Model upgrades and expanded capabilities

The upgraded Qwen3-235B-A22B-Instruct-2507-FP8 builds on a previous version known as Qwen3-235B-A22B-FP8, enhancing its capabilities across a range of applications. However, it functions only in what is referred to as a “non-thinking” mode. In this setting, the AI generates outputs directly without showing its reasoning process, unlike models designed for step-by-step logical thought.

Despite this, the model’s capacity to process information has increased substantially. Its token limit has expanded eightfold to 256,000, allowing it to handle significantly longer texts in a single interaction, which could be helpful in scenarios requiring extended analysis or multi-turn conversations.
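As a quick check on the arithmetic, an eightfold increase to 256,000 tokens implies a previous limit of 32,000. The sketch below uses only those two figures from the article; the constant names and the `fits_in_context` helper are hypothetical, and real token counts depend on the tokenizer used.

```python
# Figures taken from the article; names are illustrative assumptions.
NEW_TOKEN_LIMIT = 256_000
GROWTH_FACTOR = 8

# An eightfold expansion implies the earlier limit was one eighth of the new one.
previous_token_limit = NEW_TOKEN_LIMIT // GROWTH_FACTOR


def fits_in_context(token_count, limit):
    """Return True if an input of `token_count` tokens fits within `limit`."""
    return token_count <= limit


if __name__ == "__main__":
    document_tokens = 100_000  # hypothetical long document
    print("previous limit:", previous_token_limit)
    print("fits before upgrade:", fits_in_context(document_tokens, previous_token_limit))
    print("fits after upgrade:", fits_in_context(document_tokens, NEW_TOKEN_LIMIT))
```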

Additionally, Alibaba announced that a separate Qwen model with three billion parameters will be embedded into HP’s AI assistant, “Xiaowei Hui”, across its personal computers sold in China. This integration is expected to boost the assistant’s performance in tasks such as document drafting and meeting summarisation.

Global recognition and ongoing competition

The Qwen3 family, introduced in late April, comprises models with parameter sizes ranging from 600 million to 235 billion. Its largest model, the Qwen3-235B-A22B-No-Thinking, is currently recognised as the third-best open-source AI model globally. It is ranked behind Kimi K2, developed by Chinese start-up Moonshot AI, and DeepSeek’s R1-0528, a fine-tuned, reasoning-focused model.

Recent rankings from Hugging Face also reflect the growing prominence of Qwen models in China’s AI landscape. According to its June assessment, three out of the top ten Chinese LLMs were part of the Qwen series, underlining Alibaba’s competitive edge in the country’s burgeoning AI sector.

Jensen Huang, CEO of Nvidia, highlighted China’s strong performance in open-source AI during a visit last week. Speaking amid renewed business activity between the US and China following a June breakthrough in trade discussions, Huang said that Alibaba’s Qwen, along with models from DeepSeek and Moonshot, represented “the best open reasoning models in the world today”, describing them as “very advanced”.
