
MinIO Introduces MemKV for AI Inference Context Memory

MinIO introduced MemKV, a context memory store built for AI inference workloads. The company said the product addresses context loss during inference, a condition that can force GPUs to repeat work already performed.

MinIO described the result as a "recompute tax": when the infrastructure closest to the GPU cannot retain context, the GPU must regenerate it. According to MinIO, that loss can increase time and compute requirements and affect operating cost at scale, particularly when many GPUs operate concurrently.
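The recompute tax described above can be illustrated with a toy model. The sketch below is not MemKV's API or MinIO's design; the `ContextCache` class, the `prefill` function, and the string standing in for computed context are all hypothetical names chosen for illustration. It only counts how many tokens must be processed when identical context is requested repeatedly, with and without a shared store.

```python
from hashlib import sha256


class ContextCache:
    """Toy in-memory stand-in for a shared context store (hypothetical, not MemKV)."""

    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        # Key the stored context by a hash of the exact token sequence.
        return sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get(self, tokens):
        return self._store.get(self._key(tokens))

    def put(self, tokens, state):
        self._store[self._key(tokens)] = state


def prefill(tokens, cache=None):
    """Return (state, tokens_computed) for one request."""
    if cache is not None:
        hit = cache.get(tokens)
        if hit is not None:
            return hit, 0  # context retrieved, nothing recomputed
    # Placeholder for the expensive step a GPU would perform.
    state = f"kv-state-for-{len(tokens)}-tokens"
    if cache is not None:
        cache.put(tokens, state)
    return state, len(tokens)


def total_prefill_work(requests, cache=None):
    """Sum the tokens that had to be computed across all requests."""
    return sum(prefill(r, cache)[1] for r in requests)


context = list(range(1000))   # one 1,000-token context (illustrative size)
requests = [context] * 4      # the same context requested four times

without_cache = total_prefill_work(requests)
with_cache = total_prefill_work(requests, ContextCache())
print(without_cache, with_cache)
```

In this toy setup the uncached path processes the same context on every request, while the cached path processes it once and retrieves it afterward. Real inference stacks cache at the level of KV blocks and partial prefixes rather than whole sequences, so this is only a simplified picture of the effect.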

MemKV was designed for the NVIDIA BlueField-4 STX architecture and is described as providing microsecond context retrieval at petabyte scale. MinIO said it supports NVIDIA Dynamo and NVIDIA NIXL, runs on STX infrastructure as a single ARM64-native binary embedded in the storage tier, and uses end-to-end RDMA transport from GPU memory to NVMe, bypassing file-system and object-storage protocols. MinIO also said MemKV operates with block sizes of 2 to 16 MB and is built for NVIDIA Spectrum-X Ethernet and PCIe Gen6.

MinIO said MemKV joined AIStor as the second pillar of its product portfolio, extending its data foundation into an inference memory tier where inference runs. The company stated MemKV provides persistent, shared context across GPU clusters and said benchmarks showed improved time-to-first-token at production concurrency. MinIO also cited a typical enterprise deployment with 128 GPUs and a 128K-token context length, describing an increase in GPU utilization from ~50% to over 90% and $2 million in annual compute savings.

“The industry has been papering over context loss for years because at small scale you may be able to absorb the recompute tax and move on. At the GPU density hyperscalers and neoclouds are building toward, that is no longer true. A GPU recomputing context it has already generated is burning power without return, and at a thousand GPUs that is not inefficiency, it is structural drag,” said AB Periasamy, co-founder and CEO, MinIO. “Yield economics at this scale demand something purpose-built for the inference data path. MemKV was designed for exactly this.”
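To put the cited utilization figures in concrete terms, the arithmetic can be sketched as follows. This is a back-of-the-envelope reading of MinIO's numbers, not a calculation MinIO published: it treats "~50%" as exactly 0.50 and "over 90%" as its lower bound of 0.90, and the function name `effective_gpus` is a hypothetical helper.

```python
def effective_gpus(total_gpus, utilization):
    """GPUs' worth of useful work at a given utilization fraction."""
    return total_gpus * utilization


TOTAL_GPUS = 128  # deployment size cited by MinIO

# Assumption: "~50%" taken as 0.50, "over 90%" taken as its floor, 0.90.
before = effective_gpus(TOTAL_GPUS, 0.50)
after = effective_gpus(TOTAL_GPUS, 0.90)
gain = after / before

print(f"before: {before:.1f} GPUs of useful work")
print(f"after:  {after:.1f} GPUs of useful work")
print(f"gain:   {gain:.2f}x")
```

Under these assumptions, the same 128 GPUs would go from roughly 64 to roughly 115 GPUs' worth of useful work, about 1.8 times as much from the same hardware. The sketch does not attempt to reproduce the $2 million savings figure, which depends on pricing inputs the announcement does not give.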

MinIO said MemKV was available today.