NVIDIA introduces Physical AI Data Factory Blueprint to scale AI training data workflows
NVIDIA has introduced the Physical AI Data Factory Blueprint, an open reference architecture designed to automate and scale the production, augmentation, and evaluation of training data for physical AI systems, including those used in robotics, vision AI agents, and autonomous vehicles.
The blueprint addresses the operational challenges of processing and curating training data at scale. It integrates with cloud service providers such as Microsoft Azure and Nebius, giving developers access to large-scale compute resources for managing training data. Several developers, including FieldAI, Hexagon Robotics, Linker Vision, and Uber, are applying the blueprint to speed up AI model development.
The blueprint incorporates NVIDIA's Cosmos open world foundation models alongside coding agents to convert limited datasets into comprehensive collections that include rare scenarios and edge cases. Its modular workflows include data curation and annotation via NVIDIA Cosmos Curator, data augmentation and diversification through Cosmos Transfer, and automated data validation by NVIDIA Cosmos Evaluator, which ensures physical accuracy.
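The three-stage flow described above (curate, then augment, then validate) can be illustrated with a minimal Python sketch. This is not the blueprint's code and does not call any NVIDIA API: the `Clip` type, the `curate`, `augment`, and `evaluate` functions, the quality threshold, and the condition names are all hypothetical stand-ins chosen only to show how a small dataset might be filtered, expanded into variants, and then screened.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Clip:
    """Hypothetical record for one piece of training data."""
    clip_id: str
    scenario: str
    quality: float          # assumed score in [0, 1]
    synthetic: bool = False


def curate(clips: Iterable[Clip], min_quality: float = 0.5) -> List[Clip]:
    """Stand-in for the curation stage: drop low-quality and duplicate clips."""
    seen = set()
    kept = []
    for clip in clips:
        if clip.quality >= min_quality and clip.clip_id not in seen:
            seen.add(clip.clip_id)
            kept.append(clip)
    return kept


def augment(clips: Sequence[Clip],
            conditions: Sequence[str] = ("rain", "night")) -> List[Clip]:
    """Stand-in for the augmentation stage: add one variant per condition."""
    out = list(clips)
    for clip in clips:
        for cond in conditions:
            out.append(replace(
                clip,
                clip_id=f"{clip.clip_id}-{cond}",
                scenario=f"{clip.scenario}/{cond}",
                synthetic=True,
            ))
    return out


def evaluate(clips: Iterable[Clip],
             is_plausible: Callable[[Clip], bool]) -> List[Clip]:
    """Stand-in for the validation stage: keep clips that pass the check."""
    return [clip for clip in clips if is_plausible(clip)]


def run_pipeline(clips: Iterable[Clip],
                 is_plausible: Callable[[Clip], bool]) -> List[Clip]:
    """Chain the three stages in the order the article describes."""
    return evaluate(augment(curate(clips)), is_plausible)


if __name__ == "__main__":
    raw = [
        Clip("a", "merge", 0.9),
        Clip("a", "merge", 0.9),   # duplicate, removed during curation
        Clip("b", "cut-in", 0.2),  # below the quality threshold
    ]
    # Toy plausibility rule: reject the "night" variants.
    result = run_pipeline(raw, lambda c: not c.scenario.endswith("night"))
    for clip in result:
        print(clip.clip_id, clip.scenario, clip.synthetic)
```

Keeping each stage as a separate function that takes and returns a list mirrors the modular structure the article describes, so any one stage could be swapped out without touching the others.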
Cloud providers are integrating the blueprint into their platforms to establish agent-driven workflows for physical AI training and validation. Microsoft Azure's toolkit supports these workflows with connections to services such as Azure IoT Operations and Microsoft Fabric, while Nebius has embedded the blueprint within its AI Cloud infrastructure. Early adopters such as Milestone Systems and RoboForce use Nebius's platform for applications ranging from video analytics to industrial robotics.
Rev Lebaredian, vice president of Omniverse and simulation technologies at NVIDIA, said, “Physical AI is the next frontier of the AI revolution, where success depends on the ability to generate massive amounts of data. Together with cloud leaders, we’re providing a new kind of agentic engine that transforms compute into the high-quality data required to bring the next generation of autonomous systems and robots to life. In this new era, compute is data.”
The company also highlighted the inclusion of NVIDIA OSMO, an open-source orchestration framework that manages workflows across compute environments, and OSMO's integration with coding agents such as Claude Code and OpenAI Codex to enhance AI-native operations.
The NVIDIA Physical AI Data Factory Blueprint is expected to be available on GitHub in April 2026, providing resources for developers to implement scalable physical AI data workflows across multiple platforms and use cases.