Friday, 7 November 2025

NVIDIA Blackwell redefines AI inference performance with record-breaking InferenceMAX results

NVIDIA Blackwell leads the new InferenceMAX benchmarks with unmatched AI performance, 15x ROI, and record-breaking efficiency.

NVIDIA’s Blackwell platform has topped the new SemiAnalysis InferenceMAX v1 benchmarks, demonstrating record performance and industry-leading efficiency in real-world AI workloads. The results highlight the platform’s ability to deliver superior return on investment (ROI) and dramatically reduce total cost of ownership (TCO), reinforcing its position as the preferred choice for large-scale AI inference.

InferenceMAX v1 is the first independent benchmark to measure the total cost of compute across a wide range of models and scenarios, reflecting the real-world demands of modern AI. As applications evolve from single-response outputs to complex, multi-step reasoning, the economics of inference are becoming critical. Blackwell’s strong performance underscores how NVIDIA’s full-stack approach meets these demands, combining advanced hardware with continuous software optimisation.

A key result from the benchmarks shows that a US$5 million investment in an NVIDIA GB200 NVL72 system can generate US$75 million in DeepSeek R1 (DSR1) token revenue, representing a 15-fold ROI. NVIDIA’s B200 system also achieves a cost of just two cents per million tokens on the open-source gpt-oss model, a fivefold reduction in cost per token within two months.
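The unit economics behind these headline figures reduce to simple arithmetic. The sketch below reproduces the reported ratios from the article's own numbers; it is an illustration, not benchmark data, and the variable names are our own.

```python
# Unit economics implied by the reported benchmark figures.
# All inputs come from the article; nothing here is measured.

investment_usd = 5_000_000        # GB200 NVL72 system cost (reported)
token_revenue_usd = 75_000_000    # projected DSR1 token revenue (reported)

roi = token_revenue_usd / investment_usd
print(f"ROI: {roi:.0f}x")         # 15x, matching the reported figure

# B200 / gpt-oss result: two cents per million tokens served.
cost_per_million_tokens_usd = 0.02
tokens_per_dollar = 1_000_000 / cost_per_million_tokens_usd
print(f"Tokens per dollar: {tokens_per_dollar:,.0f}")  # 50,000,000
```

At two cents per million tokens, a single dollar of serving cost buys fifty million tokens, which is why small per-token cost reductions compound into the large ROI figures quoted above.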

“Inference is where AI delivers value every day,” said Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA. “These results show that NVIDIA’s full-stack approach gives customers the performance and efficiency they need to deploy AI at scale.”

Full-stack innovation drives performance gains

NVIDIA’s leadership in inference performance is built on deep collaboration with the open-source community and continuous hardware-software co-design. Partnerships with OpenAI, Meta, and DeepSeek AI ensure that major models like gpt-oss 120B, Llama 3.3 70B, and DeepSeek R1 are optimised for NVIDIA’s infrastructure, enabling organisations to run the latest models more efficiently.

The company’s TensorRT-LLM v1.0 library represents a major advance in performance, using parallelisation techniques and NVIDIA NVLink Switch’s 1,800 GB/s bidirectional bandwidth to accelerate inference. New techniques such as speculative decoding in the gpt-oss-120b-Eagle3-v2 model further boost efficiency, tripling throughput to 30,000 tokens per second per GPU while sustaining 100 tokens per second per user.

For large-scale models such as Llama 3.3 70B, which require extensive computational resources, the NVIDIA Blackwell B200 sets new performance records in the InferenceMAX v1 benchmarks. It delivers more than 10,000 tokens per second per GPU and 50 tokens per second per user, quadrupling throughput compared with the previous-generation H200 GPU.

Efficiency reshapes AI economics

As AI deployments scale, efficiency metrics like tokens per watt, cost per million tokens, and tokens per second per user are becoming just as important as raw throughput. The Blackwell architecture delivers 10 times more throughput per megawatt compared with its predecessor, directly translating into increased token revenue and improved operational efficiency.

This efficiency also reduces the cost per million tokens by 15 times, significantly lowering operating costs and enabling wider adoption of AI technologies across industries. The InferenceMAX benchmarks use the Pareto frontier to demonstrate how Blackwell balances cost, power consumption, throughput, and responsiveness to maximise ROI across varied production workloads.
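The Pareto-frontier analysis the benchmark uses can be sketched in a few lines: given candidate configurations scored on two competing metrics, keep only those that no other configuration beats on both axes at once. The configuration names and numbers below are hypothetical, chosen for illustration; they are not InferenceMAX results.

```python
# Minimal Pareto-frontier selection over two metrics:
# cost per million tokens (lower is better) and throughput in
# tokens/s per GPU (higher is better). All data is hypothetical.

configs = {
    "config_a": (0.02, 30_000),  # (cost USD/M tokens, tokens/s per GPU)
    "config_b": (0.05, 32_000),
    "config_c": (0.04, 25_000),  # dominated by config_a: costlier and slower
    "config_d": (0.10, 40_000),
}

def pareto_frontier(points):
    """Keep configurations not dominated on both axes."""
    frontier = {}
    for name, (cost, tput) in points.items():
        dominated = any(
            other_cost <= cost
            and other_tput >= tput
            and (other_cost, other_tput) != (cost, tput)
            for other_cost, other_tput in points.values()
        )
        if not dominated:
            frontier[name] = (cost, tput)
    return frontier

print(sorted(pareto_frontier(configs)))  # ['config_a', 'config_b', 'config_d']
```

Here `config_c` drops out because `config_a` is both cheaper and faster; the surviving points each represent a different cost/throughput trade-off, which is the sense in which the benchmark says no single configuration "wins" across all production workloads.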

While some systems achieve peak performance in specific scenarios, NVIDIA’s holistic approach ensures sustained efficiency and value in real-world environments. This comprehensive optimisation is essential for enterprises shifting from pilot projects to full-scale AI “factories” — infrastructures designed to transform data into tokens, insights, and decisions in real time.

Blackwell’s performance is underpinned by its advanced features, including the NVFP4 low-precision format for improved efficiency without sacrificing accuracy, fifth-generation NVLink for connecting up to 72 GPUs as a unified processor, and the NVLink Switch, which enables high concurrency through advanced parallelisation algorithms. Combined with continuous software updates, these innovations have more than doubled Blackwell’s performance since its initial launch.

With a vast ecosystem of hundreds of millions of GPUs deployed globally, over seven million CUDA developers, and contributions to more than 1,000 open-source projects, NVIDIA’s platform is designed to scale and evolve alongside the AI industry. Its Think SMART framework further supports enterprises in optimising cost per token, managing latency service-level agreements, and adapting to dynamic workloads.

As benchmarks like InferenceMAX continue to evolve, they will remain critical tools for organisations looking to make informed infrastructure decisions. NVIDIA’s results show that performance, efficiency, and ROI are not competing goals — they can be achieved together through a full-stack approach, setting a new standard for the future of AI inference.
