Cisco details AI network design with 8000 series switches and Aviz ONES

Enterprise IT leaders face growing challenges as Artificial Intelligence (AI) and Large Language Model (LLM) workloads demand networks that deliver lossless, low-latency performance. A recent collaboration between Cisco and Aviz Networks outlined design principles and technologies for constructing high-performance AI fabrics that address these requirements.

Cisco 8000 series and SONiC for AI networking

Cisco presented its 8000 series switches, which pair Cisco Silicon One ASICs with the open-source SONiC network operating system (NOS). This combination supports flexible, programmable data center fabrics tailored for AI clusters built on Graphics Processing Units (GPUs). Cisco emphasized three techniques for keeping traffic lossless and minimizing latency under heavy GPU workloads: Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCE v2), Priority Flow Control (PFC), and Explicit Congestion Notification (ECN).

Key congestion management mechanisms

RDMA lets one server read from or write to another server's memory directly, without involving the Central Processing Unit (CPU) in the data path, which improves throughput and latency in GPU-centric environments. Priority Flow Control (PFC) pauses traffic on a per-priority basis before buffers overflow, so that lossless classes, particularly the one carrying RDMA traffic, do not drop packets. ECN marks packets in-band when congestion builds, signaling endpoints to slow down before queues overflow. Together, these mechanisms help keep AI training job durations predictable.
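The interplay between ECN marking and PFC pausing can be illustrated with a minimal sketch. It assumes a WRED-style linear marking curve for ECN and a simple XOFF/XON hysteresis for PFC. All names, thresholds, and values below are hypothetical and are not taken from the Cisco or Aviz material.

```python
def ecn_mark_probability(queue_bytes, k_min, k_max, p_max):
    """Probability of ECN-marking a packet at the given queue depth.

    Assumed model: no marking below k_min, a linear ramp up to p_max
    between k_min and k_max, and marking every packet at or above k_max.
    """
    if queue_bytes <= k_min:
        return 0.0
    if queue_bytes >= k_max:
        return 1.0
    return p_max * (queue_bytes - k_min) / (k_max - k_min)


class PfcState:
    """Pause state of one priority class with XOFF/XON hysteresis."""

    def __init__(self, xoff, xon):
        if xon >= xoff:
            raise ValueError("xon must be below xoff")
        self.xoff = xoff      # buffer depth that triggers a pause
        self.xon = xon        # buffer depth at which sending resumes
        self.paused = False

    def update(self, buffer_bytes):
        """Update and return the pause state for the current buffer depth."""
        if not self.paused and buffer_bytes >= self.xoff:
            self.paused = True
        elif self.paused and buffer_bytes <= self.xon:
            self.paused = False
        return self.paused
```

In a lossless design, the ECN thresholds are normally set well below the PFC XOFF threshold, so that endpoints slow down in response to marks before the switch needs to pause the link.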

Observability and network validation with Aviz ONES

Aviz Networks showcased its Open Networking Enterprise Suite (ONES), which delivers real-time telemetry and analytics across multi-vendor environments. ONES enhances visibility into congestion points, validates Quality of Service (QoS) policies, and automates troubleshooting within leaf-spine topologies. The suite integrates with Cisco hardware to provide end-to-end observability tailored to AI traffic characteristics and network health.
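One kind of QoS validation this observability enables is checking that lossless priority classes record no drops. The sketch below assumes per-queue counters have already been exported into plain dictionaries. The record fields and the function name are hypothetical and do not represent the ONES API or data model.

```python
def find_lossless_violations(queue_stats, lossless_priorities):
    """Return records for lossless priorities that show packet drops.

    queue_stats: iterable of dicts with the hypothetical keys
                 "switch", "port", "priority", and "drops".
    lossless_priorities: set of priority values expected to be drop-free.
    """
    violations = []
    for rec in queue_stats:
        if rec["priority"] in lossless_priorities and rec["drops"] > 0:
            violations.append(
                (rec["switch"], rec["port"], rec["priority"], rec["drops"])
            )
    return violations


# Hypothetical sample data: priority 3 is assumed to carry RDMA traffic.
sample = [
    {"switch": "leaf1", "port": "Ethernet0", "priority": 3, "drops": 0},
    {"switch": "leaf1", "port": "Ethernet4", "priority": 3, "drops": 12},
    {"switch": "leaf2", "port": "Ethernet0", "priority": 0, "drops": 40},
]
violations = find_lossless_violations(sample, {3})
```

Drops on best-effort priorities are expected under load, so only the lossless class is flagged here.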

Scalable leaf-spine architectures for AI workloads

The Cisco and Aviz bootcamp detailed validated topologies for building scalable, non-blocking leaf-spine fabrics for large AI clusters. These architectures rely on congestion management and open programmability to absorb growing GPU demand while keeping load balanced across links and packet loss minimal.
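Whether a leaf is non-blocking comes down to its oversubscription ratio: total server-facing bandwidth divided by total spine-facing bandwidth. A ratio of 1.0 or lower means the uplinks can carry everything the downlinks can offer. The sketch below shows the arithmetic; the port counts and speeds are hypothetical and are not taken from the validated topologies.

```python
def oversubscription_ratio(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing to spine-facing bandwidth on one leaf."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("a leaf needs at least one uplink with positive speed")
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def is_non_blocking(downlinks, downlink_gbps, uplinks, uplink_gbps):
    """True when uplink capacity is at least equal to downlink capacity."""
    ratio = oversubscription_ratio(downlinks, downlink_gbps, uplinks, uplink_gbps)
    return ratio <= 1.0


# Hypothetical leaf A: 32 x 400G to GPUs and 32 x 400G to spines.
ratio_a = oversubscription_ratio(32, 400, 32, 400)

# Hypothetical leaf B: 48 x 100G down and 8 x 400G up, oversubscribed.
ratio_b = oversubscription_ratio(48, 100, 8, 400)
```

Here leaf A has a 1:1 ratio and is non-blocking, while leaf B is oversubscribed at 1.5:1.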

These presentations and demonstrations provide technical insights relevant to enterprise architects and Security Operations (SecOps) managers tasked with deploying AI-optimized data center networks. The combined approach from Cisco and Aviz emphasizes open standards, integrated telemetry, and robust congestion control designed to meet the specific needs of large-scale AI environments.