NVIDIA introduces Inference Context Memory Storage Platform
NVIDIA introduced the Inference Context Memory Storage Platform, a new AI-native storage infrastructure designed to address growing volumes of context data generated by larger Artificial Intelligence (AI) models and long-context processing.
The company said the platform provided extended Graphics Processing Unit (GPU) memory capacity and faster context sharing across nodes. It said persistent context for multi-turn agents improved responsiveness and increased throughput, delivering up to 5x gains in both tokens per second and power efficiency compared with traditional storage.
BlueField-4 data processors powered the platform. The design used a key-value (KV) cache to hold context outside GPUs, accessed via Remote Direct Memory Access (RDMA) over NVIDIA Spectrum-X Ethernet, which served as the network fabric. NVIDIA described hardware-accelerated KV cache placement managed by BlueField-4, support from the NVIDIA DOCA framework, and integration with the NVIDIA NIXL library and NVIDIA Dynamo software. It said the design eliminated metadata overhead, reducing data movement and providing secure, isolated access from GPU nodes.
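The general idea of holding KV cache outside GPUs can be illustrated with a toy two-tier cache. The sketch below is purely conceptual and is not NVIDIA's implementation or API: the class name `TieredKVCache`, its methods, and the least-recently-used eviction policy are all assumptions made for illustration, and an in-process dictionary stands in for the networked context store.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small 'GPU' tier backed by a larger external tier.

    Hypothetical illustration only; names and eviction policy are assumptions.
    """

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # session_id -> KV data, oldest entry first
        self.external = {}        # stands in for the shared external store

    def put(self, session_id, kv_data):
        """Place a session's context in the GPU tier, spilling old entries."""
        self.gpu[session_id] = kv_data
        self.gpu.move_to_end(session_id)
        while len(self.gpu) > self.gpu_capacity:
            # Evict the least recently used session instead of discarding it.
            victim, data = self.gpu.popitem(last=False)
            self.external[victim] = data

    def get(self, session_id):
        """Return (kv_data, tier), where tier says where the context was found."""
        if session_id in self.gpu:
            self.gpu.move_to_end(session_id)
            return self.gpu[session_id], "gpu"
        if session_id in self.external:
            # Context survives eviction, so it is reloaded rather than recomputed.
            data = self.external.pop(session_id)
            self.put(session_id, data)
            return data, "external"
        return None, "miss"
```

In this sketch, a returning multi-turn session whose context was evicted from the small tier is served from the external tier rather than reported as a miss, which is the behavior the persistent-context claim describes at a high level.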
Early collaborators named included AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data and WEKA, which were building next-generation AI storage platforms using BlueField-4. NVIDIA also cited NVIDIA Rubin cluster-level KV cache capacity as a means of scaling long-context, multi-turn inference.
“AI is revolutionizing the entire computing stack — and now, storage,” said Jensen Huang, founder and CEO of NVIDIA. “AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
NVIDIA said BlueField-4 would be available in the second half of 2026.