NVIDIA BlueField-4 powers new class of AI-native storage infrastructure for next-generation AI
NVIDIA introduces BlueField-4-powered AI-native storage to boost long-context inference performance and efficiency for next-generation AI systems.
NVIDIA has announced a new class of AI-native storage infrastructure designed to support the next phase of large-scale, agentic artificial intelligence systems. Unveiled at CES, the NVIDIA Inference Context Memory Storage Platform is powered by the NVIDIA BlueField-4 data processing unit (DPU), extending the company’s push beyond compute and networking into storage architectures built specifically for advanced AI workloads.
The announcement reflects a broader shift in how AI systems are designed and deployed. As models grow to trillions of parameters and move beyond single-turn interactions, they increasingly rely on long-term and shared context to reason across multiple steps, tasks, and agents. Traditional storage architectures were not designed for these demands, particularly when real-time inference, energy efficiency, and scale are critical.
By integrating BlueField-4 with a purpose-built storage platform for inference context, NVIDIA is positioning storage as a first-class component of the AI stack. The company says this approach is essential for enabling intelligent AI agents that can retain memory over time, collaborate across systems, and operate efficiently at gigascale.
Addressing the storage bottleneck in long-context AI inference
Modern AI models generate vast amounts of context data during inference, typically stored in what is known as a key-value (KV) cache. This cache plays a central role in maintaining accuracy, continuity, and responsiveness, especially in multi-turn and multi-agent scenarios. However, keeping this data resident in GPU memory for extended periods creates significant bottlenecks, limiting throughput and reducing overall system efficiency.
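Standard transformer arithmetic is enough to show the scale of the problem: the KV cache grows linearly with context length and model depth. The sketch below estimates the footprint of a single long-context session using purely illustrative model dimensions (the layer count, KV-head count, and precision are assumptions, not NVIDIA figures).

```python
# Back-of-the-envelope KV-cache sizing for a hypothetical transformer.
# All model dimensions below are illustrative assumptions, not NVIDIA figures.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to hold the keys and values for one sequence."""
    # Each layer stores one key vector and one value vector per token per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len

# Example: an 80-layer model with grouped-query attention (8 KV heads of
# dimension 128) serving a 128K-token context in FP16 (2 bytes per element).
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
per_sequence = kv_cache_bytes(80, 8, 128, seq_len=131_072)

print(f"{per_token / 2**10:.0f} KiB per token")        # ~320 KiB
print(f"{per_sequence / 2**30:.0f} GiB per sequence")  # ~40 GiB
```

Under these assumptions, a single 128K-token session consumes tens of gigabytes, so only a handful of concurrent long-context sessions can exhaust the high-bandwidth memory of even a large GPU, leaving little room for model weights and activations.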
NVIDIA argues that existing storage systems are not suited to the demands of long-context inference. AI-native applications require fast, scalable infrastructure that can store and share context data across nodes without introducing latency or excessive power consumption. The Inference Context Memory Storage Platform is designed to address this gap by extending effective GPU memory capacity and enabling high-speed sharing of context across clusters of rack-scale AI systems.
According to NVIDIA, the platform can boost tokens processed per second by up to five times while delivering up to five times greater power efficiency compared with traditional storage approaches. These gains are achieved by moving context memory management closer to the network and storage layer, rather than relying solely on GPU memory.
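The shift can be pictured as a tiered cache: recently used KV-cache blocks stay in GPU memory, while older blocks spill to a shared context store and are pulled back when a conversation or agent revisits them. The sketch below is a minimal conceptual illustration of that pattern; the store interface and block keys are hypothetical placeholders, not NVIDIA or DOCA APIs.

```python
# Conceptual sketch of tiered context memory: hot KV-cache blocks stay in GPU
# memory, cold blocks spill to an external context store and are fetched back
# on reuse. The interfaces below (the store, the block keys) are hypothetical
# placeholders used for illustration, not NVIDIA or DOCA APIs.
from collections import OrderedDict


class InMemoryContextStore:
    """Stand-in for a remote, network-attached context store."""
    def __init__(self):
        self._blocks = {}

    def write(self, key, block):
        self._blocks[key] = block

    def read(self, key):
        return self._blocks[key]


class TieredKVCache:
    def __init__(self, store, gpu_budget_blocks):
        self.store = store                   # shared storage tier
        self.gpu_budget = gpu_budget_blocks  # blocks that fit in GPU memory
        self.gpu_blocks = OrderedDict()      # block_key -> data, LRU order

    def put(self, block_key, block):
        self.store.write(block_key, block)   # persist for later turns or agents
        self._admit(block_key, block)

    def get(self, block_key):
        if block_key in self.gpu_blocks:
            self.gpu_blocks.move_to_end(block_key)   # mark as recently used
            return self.gpu_blocks[block_key]
        block = self.store.read(block_key)           # refill from the storage tier
        self._admit(block_key, block)
        return block

    def _admit(self, block_key, block):
        self.gpu_blocks[block_key] = block
        self.gpu_blocks.move_to_end(block_key)
        while len(self.gpu_blocks) > self.gpu_budget:
            self.gpu_blocks.popitem(last=False)      # evict least-recently-used block


cache = TieredKVCache(InMemoryContextStore(), gpu_budget_blocks=2)
cache.put("session-7/block-0", b"...")   # fits in GPU memory
cache.put("session-7/block-1", b"...")
cache.put("session-7/block-2", b"...")   # evicts block-0 from GPU memory
print(cache.get("session-7/block-0"))    # transparently refilled from the store
```

Because every block also lands in the shared store, context produced in one turn or on one node can be reused elsewhere without being recomputed, which is consistent with the throughput and efficiency gains NVIDIA describes.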
Jensen Huang, founder and chief executive officer of NVIDIA, said the shift reflects how AI itself is evolving. “AI is revolutionising the entire computing stack, and now, storage,” Huang said. “AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
How BlueField-4 enables scalable, efficient context memory
At the centre of the new platform is the NVIDIA BlueField-4 DPU, which forms part of the company’s broader BlueField platform spanning hardware, software, and networking. BlueField-4 is designed to offload and accelerate data movement, security, and storage functions, allowing GPUs to focus on compute-intensive inference tasks.
The Inference Context Memory Storage Platform uses BlueField-4 to increase key-value cache capacity at the cluster level, supporting long-context, multi-turn agentic inference at scale. NVIDIA says this is particularly important for future AI factories built on its Rubin architecture, where the ability to share context efficiently across many GPUs becomes a limiting factor.
A key feature of the platform is its ability to accelerate the sharing of context data across AI nodes. This is enabled through tight integration with NVIDIA’s software stack, including the DOCA framework, the NIXL library, and NVIDIA Dynamo software. Together, these components are designed to maximise tokens per second, reduce the time to first token, and improve responsiveness in multi-turn interactions.
BlueField-4 also introduces hardware-accelerated placement of key-value cache data. By managing placement directly in hardware, the platform reduces metadata overhead and data movement, while ensuring secure and isolated access from GPU nodes. NVIDIA says this approach improves both performance and security, particularly in large, multi-tenant AI environments.
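One generic way to keep metadata overhead low in a system like this is deterministic, hash-based placement: any node can compute where a block lives without consulting a central metadata service, and folding the tenant identity into the key keeps each tenant’s blocks in a separate namespace. The snippet below sketches that textbook technique purely for context; it is not a description of how BlueField-4 implements placement.

```python
# Conceptual illustration of metadata-light, tenant-isolated block placement.
# Deterministic hashing lets any node compute a block's location without a
# central metadata lookup. This is a generic technique shown for illustration
# only; it does not describe BlueField-4 internals.
import hashlib

def place_block(tenant_id: str, block_key: str, targets: list[str]) -> str:
    """Pick the storage target for a KV-cache block deterministically."""
    # Including the tenant in the hash keeps tenants' blocks in separate
    # namespaces, so one tenant cannot address another's cached context.
    digest = hashlib.sha256(f"{tenant_id}/{block_key}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]

targets = ["node-a", "node-b", "node-c", "node-d"]
print(place_block("tenant-42", "session-7/block-0013", targets))
```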
Networking plays a central role in the design. The platform is enabled by NVIDIA Spectrum-X Ethernet, which provides a high-performance network fabric for RDMA-based access to AI-native context memory. This allows efficient data sharing and retrieval across nodes, supporting high-bandwidth collaboration between AI agents and improving overall system throughput.
Ecosystem support and availability timeline
NVIDIA is positioning the Inference Context Memory Storage Platform as a foundation for a new generation of AI storage systems, and it is working closely with storage and infrastructure partners to bring the technology to market. A broad group of storage innovators are already building next-generation AI storage platforms based on BlueField-4.
These include AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data, and WEKA. NVIDIA says early engagement from these partners highlights growing industry recognition that AI inference requires storage architectures fundamentally different from those used in traditional enterprise or cloud workloads.
The company expects BlueField-4-powered platforms to be available in the second half of 2026. This timeline aligns with NVIDIA’s broader roadmap for next-generation AI infrastructure, as organisations prepare for increasingly complex AI systems that rely on persistent memory, collaboration across agents, and efficient scaling across large clusters.
By extending its focus to inference-native storage, NVIDIA is signalling that the future of AI performance will depend not only on faster GPUs, but on tightly integrated systems spanning compute, networking, and storage. As agentic AI moves from research to production at scale, infrastructure capable of supporting long-term memory and high-speed context sharing is likely to become a critical differentiator.