Rafay Systems and Aviz Networks deliver integrated orchestration for GPU cloud infrastructures

Rafay Systems collaborated with Aviz Networks to provide an integrated solution for managing Graphics Processing Unit (GPU) cloud infrastructures, focusing on multi-tenant Artificial Intelligence (AI) fabrics with network visibility and self-service functionality.

The collaboration addresses operational challenges in deploying GPU cloud environments, with the aim of improving resource utilization and reducing deployment times. The joint solution targets environments that struggle with multi-tenancy, orchestration, and network fabric monitoring, and is meant to support self-service consumption models across diverse GPU platforms.

Technically, Rafay contributes enterprise-grade Kubernetes and GPU lifecycle management capabilities. Aviz integrates AI-optimized fabric orchestration, network visibility, and tenant-aware automation over Spectrum-X switches, GPU NICs, and server networking components. This integration ensures east-west traffic performance and workload isolation, while providing Full Stack Observability (FSO) for real-time insights into compute and network layers.

The scope of the partnership includes enabling on-demand access to Central Processing Unit (CPU) and GPU resources through secure and tenant-aware automation, managing tenant-to-GPU binding in Kubernetes clusters, and coordinating logical network segmentation for isolation and visibility. The combined platform allows GPU cloud environments to be deployed within weeks, replacing manual provisioning workflows with integrated APIs.
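To make the idea of tenant-to-GPU binding with per-tenant network segmentation more concrete, here is a minimal Python sketch. It is an illustration only and does not use or represent the Rafay or Aviz APIs: the names `GpuNode`, `TenantBinding`, `GpuCloud`, `provision`, `release`, and the numeric `segment_id` are all hypothetical. It assumes a simple first-fit allocation across nodes and one logical segment per tenant.

```python
from dataclasses import dataclass, field


@dataclass
class GpuNode:
    """A hypothetical cluster node with a count of unallocated GPUs."""
    name: str
    gpus_free: int


@dataclass
class TenantBinding:
    """Records which GPUs a tenant holds and which segment isolates its traffic."""
    tenant: str
    segment_id: int  # hypothetical logical segment number, one per tenant
    gpus_by_node: dict = field(default_factory=dict)


class GpuCloud:
    """Toy allocator: binds GPUs to tenants and gives each tenant its own segment."""

    def __init__(self, nodes, first_segment_id=100):
        self.nodes = {node.name: node for node in nodes}
        self.bindings = {}
        self._next_segment_id = first_segment_id

    def provision(self, tenant, gpus_requested):
        """Bind GPUs to a tenant (first-fit across nodes) and assign a segment."""
        if gpus_requested <= 0:
            raise ValueError("gpus_requested must be positive")
        if tenant in self.bindings:
            raise ValueError(f"tenant {tenant!r} already has a binding")
        total_free = sum(node.gpus_free for node in self.nodes.values())
        if gpus_requested > total_free:
            raise RuntimeError("not enough free GPUs")

        binding = TenantBinding(tenant, self._next_segment_id)
        self._next_segment_id += 1

        remaining = gpus_requested
        for node in self.nodes.values():
            if remaining == 0:
                break
            take = min(node.gpus_free, remaining)
            if take:
                node.gpus_free -= take
                binding.gpus_by_node[node.name] = take
                remaining -= take

        self.bindings[tenant] = binding
        return binding

    def release(self, tenant):
        """Return a tenant's GPUs to the free pool and drop its binding."""
        binding = self.bindings.pop(tenant)
        for node_name, count in binding.gpus_by_node.items():
            self.nodes[node_name].gpus_free += count
```

In this sketch, a single `provision` call stands in for the self-service request described above: it reserves compute and assigns an isolation segment in one step, which is the kind of coordination that would otherwise be handled through separate manual workflows. A real deployment would delegate both steps to the orchestration and fabric layers rather than to in-memory bookkeeping.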

Haseeb Budhani, CEO and Co-Founder of Rafay Systems, said, “Cloud providers and enterprises need a simple way to consume GPU infrastructure without reinventing orchestration stacks. Our partnership with Aviz gives customers not just self-service compute, but the tools and visibility they need to run AI workloads at scale.”

Vishal Shukla, CEO and Co-Founder of Aviz Networks, said, “Aviz was founded to make AI networking simple, open, and multi-vendor - while giving networking teams the best experience in this hyper-evolving world of AI fabrics and exponential bandwidth needs. Together with Rafay, we deliver a powerful combination: Rafay's compute lifecycle automation with Aviz's fabric-level multi-tenant orchestration and observability. This allows GPU cloud providers to achieve AWS-like efficiency with a simple, intuitive stack.”

The two companies plan to continue integrating compute orchestration with AI fabric monitoring and control, with the goal of supporting efficient deployment models across multi-vendor GPU environments.