NVIDIA launches Dynamo 1.0 open source software for AI inference at scale

NVIDIA has introduced Dynamo 1.0, an open source software platform for running inference at scale. The platform targets complex artificial intelligence (AI) workloads by coordinating graphics processing unit (GPU) and memory resources across clusters.

The release matters operationally because scaling inference workloads within data centers is growing more complex. Dynamo 1.0 acts as a distributed operating system (OS) that orchestrates resources across varied and unpredictable AI requests.

Dynamo splits inference processing across GPUs, adding traffic control and moving data between GPUs and lower-cost storage. For agentic AI and long-prompt tasks, it routes requests to GPUs that already hold the relevant short-term memory and offloads unused memory to improve resource efficiency.
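The routing idea described above can be illustrated with a minimal Python sketch. It is not Dynamo's API or its actual algorithm. It assumes that "short-term memory" means cached state for prompt prefixes a worker has already processed, and that a router trades off cache overlap against current load. The names `Worker`, `block_hashes`, `route`, `BLOCK_SIZE`, and `load_weight` are all hypothetical.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per cache block (hypothetical value for illustration)


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash each full block so a hash identifies the whole prefix up to it."""
    hashes = []
    prev = 0
    for i in range(0, len(tokens) - block_size + 1, block_size):
        prev = hash((prev, tuple(tokens[i:i + block_size])))
        hashes.append(prev)
    return hashes


@dataclass
class Worker:
    """A stand-in for one GPU worker (hypothetical, not a Dynamo type)."""
    name: str
    active_requests: int = 0
    cached_blocks: set = field(default_factory=set)


def cached_prefix_blocks(worker, hashes):
    """Count how many leading blocks of the prompt the worker already caches."""
    count = 0
    for h in hashes:
        if h not in worker.cached_blocks:
            break
        count += 1
    return count


def route(workers, tokens, load_weight=1.0):
    """Pick the worker with the best cache overlap, penalized by its load."""
    hashes = block_hashes(tokens)

    def score(worker):
        return cached_prefix_blocks(worker, hashes) - load_weight * worker.active_requests

    best = max(workers, key=score)  # ties go to the first worker in the list
    best.active_requests += 1
    best.cached_blocks.update(hashes)  # assume the worker now caches this prompt
    return best


if __name__ == "__main__":
    pool = [Worker("gpu-0"), Worker("gpu-1")]
    shared_prefix = list(range(16))  # e.g. a long system prompt, as token IDs
    print(route(pool, shared_prefix + [100, 101, 102, 103]).name)
    print(route(pool, shared_prefix + [200, 201, 202, 203]).name)
    print(route(pool, [900, 901, 902, 903, 904, 905, 906, 907]).name)
```

In this sketch, a second request that shares a long prefix with an earlier one is sent to the same worker, while an unrelated request goes to the less loaded worker. A production router would also need to account for cache eviction and request completion, which are omitted here.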

Dynamo and NVIDIA TensorRT-LLM optimizations are integrated natively into open source frameworks such as LangChain, llm-d, LMCache, SGLang, and vLLM. Core components include KVBM for memory management and NVIDIA NIXL for data transfer. NVIDIA also provides CUDA kernels for open source projects.

NVIDIA's inference platform is used by cloud service providers including Amazon Web Services, Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure. It has also been adopted by cloud partners such as Alibaba Cloud and CoreWeave, AI-focused companies like Cursor and Perplexity, inference endpoint providers, and global enterprises such as ByteDance, PayPal, and Pinterest.

Chen Goldberg of CoreWeave said, “As AI moves from experimental pilots to continuous, large-scale production, the underlying infrastructure must be as dynamic as the models it supports. Supporting NVIDIA Dynamo allows us to offer a more seamless, resilient environment for deploying complex AI agents.”

Danila Shtan of Nebius said, “Delivering reliable AI inference at scale isn’t just about powerful GPUs, it’s about the software that turns that performance into real customer outcomes.”

Matt Madrigal of Pinterest said, “With NVIDIA Dynamo optimizing our deployment, we’re expanding the seamless and personalized experiences we deliver, powered by high-performance AI infrastructure.”

Vipul Ved Prakash of Together AI said, “NVIDIA Dynamo 1.0, combined with cutting-edge inference research from Together AI, helps us deliver a high-performance stack to offer accelerated, cost-effective inference for large-scale production workloads.”

Dynamo 1.0 is available worldwide as a production-ready solution for developers to integrate into their AI inference workflows.