Saturday, 13 December 2025

NVIDIA Blackwell Ultra sets new benchmark in MLPerf inference tests

NVIDIA’s Blackwell Ultra architecture sets new records in MLPerf Inference v5.1, boosting AI performance and reducing costs for enterprises.

NVIDIA’s Blackwell Ultra architecture has set new standards in AI performance, achieving record results in the latest MLPerf Inference v5.1 benchmarks. The NVIDIA GB300 NVL72 rack-scale system, powered by Blackwell Ultra, delivered the highest throughput on the new reasoning inference benchmark, with up to 45% higher DeepSeek-R1 inference throughput than the previous-generation GB200 NVL72 system.

Inference performance is vital to the economics of AI infrastructure. Higher throughput means more tokens processed per second, which boosts revenue, lowers total cost of ownership (TCO) and raises overall productivity. NVIDIA’s latest results underscore its ongoing efforts to push the limits of AI factory performance.

Enhanced architecture and full-stack optimisation

The Blackwell Ultra architecture builds on the foundation of its predecessor with significant improvements. Each GPU offers 1.5 times the NVFP4 AI compute and double the attention-layer acceleration of Blackwell. It also features up to 288GB of HBM3e memory per GPU, providing greater capacity for large-scale AI workloads.

A key factor in these results is NVIDIA’s full-stack co-design approach, which integrates hardware and software innovations. Blackwell and Blackwell Ultra incorporate hardware acceleration for NVFP4, NVIDIA’s custom 4-bit floating-point format, which delivers better accuracy than other FP4 formats and results comparable to higher-precision options. NVIDIA TensorRT Model Optimizer was used to quantise models such as DeepSeek-R1, Llama 3.1 405B and Llama 2 70B to NVFP4, and together with the open-source TensorRT-LLM library, it enabled higher performance without sacrificing accuracy.
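To make the idea of 4-bit quantisation concrete, here is a minimal numerical sketch in the spirit of NVFP4: values are snapped to a small grid of 4-bit magnitudes, with one shared scale per block of elements. The grid and block size below are simplifications chosen for illustration; this is not NVIDIA’s actual format definition nor the TensorRT Model Optimizer API.

```python
# Illustrative 4-bit block quantisation, loosely in the spirit of NVFP4.
# A sketch only -- not NVIDIA's format spec or any real library API.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # 4-bit magnitudes

def quantize_block(values, block_size=16):
    """Snap each float to the nearest signed grid value, using one
    shared scale per block (the block's max magnitude maps to 6.0)."""
    out = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        amax = max(abs(x) for x in chunk) or 1.0
        scale = amax / 6.0  # largest representable magnitude
        for x in chunk:
            mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
            out.append((mag if x >= 0 else -mag) * scale)
    return out

print(quantize_block([0.1, -0.4, 6.0]))  # → [0.0, -0.5, 6.0]
```

The point of the per-block scale is that each small group of weights uses the full 4-bit range, which is how block formats keep accuracy close to higher-precision baselines despite the tiny grid.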

The company also highlighted disaggregated serving, a technique that runs a large language model’s context-processing (prefill) and token-generation (decode) phases on separate GPU pools. This method was crucial to the record-breaking results on the Llama 3.1 405B Interactive benchmark, nearly doubling performance per GPU compared with traditional serving approaches.
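The control flow of disaggregated serving can be sketched as two stages connected by a handoff queue: a prefill stage that processes each prompt once and hands over its KV cache, and a decode stage that generates tokens from that cache. Everything model-related below is stubbed with toy arithmetic; only the stage separation is illustrative, and none of this reflects NVIDIA Dynamo or TensorRT-LLM internals.

```python
# Toy sketch of disaggregated LLM serving: prefill and decode run as
# separate stages, as they would on separate GPU pools. Model logic is
# stubbed; only the control flow is the point of the example.

from queue import Queue

def prefill_worker(requests, handoff):
    """Process each prompt once, build a (stubbed) KV cache, hand it off."""
    for req_id, prompt in requests:
        kv_cache = [hash((req_id, tok)) % 97 for tok in prompt.split()]
        handoff.put((req_id, kv_cache))
    handoff.put(None)  # sentinel: no more work

def decode_worker(handoff, max_new_tokens=4):
    """Generate tokens (stubbed as integers) from each handed-off cache."""
    results = {}
    while (item := handoff.get()) is not None:
        req_id, kv_cache = item
        state, tokens = sum(kv_cache), []
        for _ in range(max_new_tokens):
            state = (state * 31 + 7) % 97  # stand-in for one decode step
            tokens.append(state)
        results[req_id] = tokens
    return results

handoff = Queue()
prefill_worker([(1, "hello world"), (2, "deep seek")], handoff)
out = decode_worker(handoff)
print(sorted(out))  # → [1, 2]
```

The benefit in a real deployment is that compute-heavy prefill and memory-bound decode no longer contend for the same GPUs, so each pool can be sized and batched independently.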

Broad industry adoption and availability

NVIDIA’s ecosystem of partners also contributed to the strong benchmark results, with submissions from Azure, Broadcom, Cisco, CoreWeave, Dell Technologies, HPE, Lenovo, Oracle, Supermicro and the University of Florida. These results demonstrate that the market-leading performance of NVIDIA’s AI platform is accessible across a wide range of systems and services.

The company made its first benchmark submissions using the NVIDIA Dynamo inference framework, further strengthening its position in AI optimisation. Organisations deploying AI applications can now leverage these advances through major cloud platforms and server vendors, benefiting from reduced TCO and higher returns on investment.

NVIDIA’s record-breaking results in MLPerf Inference v5.1 reaffirm its leadership in AI computing. By delivering stronger performance and efficiency, Blackwell Ultra provides a compelling platform for enterprises building and scaling next-generation AI applications.
