Aviz Networks details ONES 2.0 enhancements for SONiC-based AI fabric networking

Aviz Networks has updated its ONES platform to version 2.0, which focuses on monitoring and management of SONiC-based artificial intelligence (AI) fabric networks that use Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE). The release targets the growing need for better visibility and control in AI workload environments, where low latency and reliable data delivery are critical.

Technical Enhancements in ONES 2.0

ONES 2.0 adds metrics collection specifically for RoCE traffic, including Priority Flow Control (PFC) counters, receive/transmit buffer watermarks, and Quality of Service (QoS) drop counters. With these metrics, network administrators can observe how traffic is prioritized, identify congestion points, and evaluate queue usage in real time.
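To make these metrics concrete, the following Python sketch shows one way cumulative per-queue counters could be turned into per-second rates and a buffer utilization ratio. The `QueueSnapshot` fields, the `interval_rates` helper, and the buffer size are hypothetical illustrations; they are not the ONES data model or SONiC's counter schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical cumulative counters for one queue on one interface."""
    timestamp_s: float    # collection time in seconds
    pfc_rx_pause: int     # cumulative PFC pause frames received
    pfc_tx_pause: int     # cumulative PFC pause frames sent
    qos_drops: int        # cumulative packets dropped on this queue
    watermark_bytes: int  # peak buffer occupancy for the interval (assumed semantics)


def _delta(prev: int, curr: int) -> int:
    # Cumulative counters only grow; a smaller value implies a counter reset,
    # so assume the counter restarted from zero and count the new value.
    return curr - prev if curr >= prev else curr


def interval_rates(prev: QueueSnapshot, curr: QueueSnapshot, buffer_bytes: int) -> dict:
    """Convert two snapshots into per-second rates plus a buffer utilization ratio."""
    dt = curr.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return {
        "pfc_rx_pause_per_s": _delta(prev.pfc_rx_pause, curr.pfc_rx_pause) / dt,
        "pfc_tx_pause_per_s": _delta(prev.pfc_tx_pause, curr.pfc_tx_pause) / dt,
        "qos_drops_per_s": _delta(prev.qos_drops, curr.qos_drops) / dt,
        "watermark_utilization": curr.watermark_bytes / buffer_bytes,
    }


prev = QueueSnapshot(0.0, 100, 50, 0, 1000)
curr = QueueSnapshot(10.0, 600, 150, 20, 6000)
print(interval_rates(prev, curr, buffer_bytes=12000))
```

With the sample values, the receive pause rate works out to 50 frames per second and the watermark sits at half of the assumed 12,000-byte buffer. Computing rates rather than reading raw totals is what makes congestion visible as it happens.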

The platform supports proactive congestion management, detecting potential network bottlenecks early so they can be remediated before they degrade performance. This matters for AI model training and inference, which move large data volumes and are sensitive to latency.
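Early detection usually means alerting on a trend rather than on a single sample. As a minimal sketch, assuming a PFC pause rate is already available for each polling interval, an exponentially weighted moving average (EWMA) can smooth short bursts before the value is compared against a threshold. The class name, `alpha`, and threshold are illustrative assumptions, not a description of how ONES implements congestion detection.

```python
class CongestionDetector:
    """Flag sustained congestion when a smoothed PFC pause rate crosses a threshold.

    alpha and threshold are illustrative assumptions, not ONES defaults.
    """

    def __init__(self, alpha: float = 0.3, threshold: float = 100.0) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha          # weight given to the newest sample
        self.threshold = threshold  # pause frames per second treated as congested
        self.ewma = None            # no history until the first sample arrives

    def update(self, pause_rate: float) -> bool:
        """Fold in one sample and report whether the smoothed rate exceeds the threshold."""
        if self.ewma is None:
            self.ewma = pause_rate
        else:
            self.ewma = self.alpha * pause_rate + (1.0 - self.alpha) * self.ewma
        return self.ewma > self.threshold


detector = CongestionDetector(alpha=0.5, threshold=100.0)
flags = [detector.update(rate) for rate in [0, 40, 120, 200, 260]]
print(flags)
```

Here the smoothed rate climbs through 20, 70, 135, and 197.5, so only the last two samples are flagged. A smaller `alpha` reacts more slowly but is less likely to flap on brief bursts; the right trade-off depends on the fabric.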

Multi-Vendor and Integration Support

ONES 2.0 normalizes telemetry from different hardware vendors so that RoCE can be monitored uniformly across heterogeneous SONiC fabrics. It also integrates with external orchestration tools and third-party systems through interfaces such as Representational State Transfer (REST) APIs and Prometheus, providing centralized management and extended monitoring.

User Interface and Operational Insights

The updated user interface visualizes RoCE traffic flows at the node and interface levels, alongside detailed views of QoS configurations and pause frame statistics. It also shows lossless traffic mapping and queue drops, which support granular troubleshooting and performance analysis.
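A lossless-mapping check of the kind described here can be sketched as a simple audit: for each DSCP value carrying RoCE traffic, confirm that it maps to a queue with PFC enabled and that the queue shows no drops. For simplicity, this sketch assumes that traffic class, queue, and PFC priority share the same number; real SONiC configurations map these through separate tables, and all inputs here are hypothetical.

```python
def audit_lossless_mapping(
    dscp_to_tc: dict, pfc_enabled: set, queue_drops: dict, roce_dscp: set
) -> list:
    """Return findings for one interface's RoCE lossless configuration.

    Simplifying assumption: traffic class, egress queue, and PFC priority share
    one number. Real SONiC configs map these through separate tables.
    """
    findings = []
    for dscp in sorted(roce_dscp):
        tc = dscp_to_tc.get(dscp)
        if tc is None:
            findings.append(f"DSCP {dscp}: no traffic-class mapping")
        elif tc not in pfc_enabled:
            # RoCE traffic landing on a lossy queue can be dropped under congestion
            findings.append(f"DSCP {dscp} -> queue {tc}: PFC not enabled (lossy)")
        elif queue_drops.get(tc, 0) > 0:
            # Drops on a PFC-protected queue often suggest buffer or headroom issues
            findings.append(
                f"DSCP {dscp} -> queue {tc}: {queue_drops[tc]} drops on a lossless queue"
            )
    return findings


print(audit_lossless_mapping(
    dscp_to_tc={26: 3, 46: 5},
    pfc_enabled={3, 4},
    queue_drops={3: 0, 5: 7},
    roce_dscp={26, 46, 48},
))
```

In this example, DSCP 26 is healthy, DSCP 46 lands on a lossy queue, and DSCP 48 has no mapping at all; each case points to a different configuration fix.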

Real-time tracking of queue behavior helps operators confirm that RDMA operations over Ethernet keep their low-latency characteristics, even under the heavy data loads typical of AI workloads. This supports operational stability during high-demand processing tasks.

Cross-Functional Benefits

ONES 2.0 connects with third-party systems via widely used APIs, enabling teams to correlate RoCE telemetry with broader application and infrastructure metrics. This integration assists NetOps, DevOps, and AI infrastructure groups in achieving faster issue resolution and aligning network performance with AI workload requirements.

The platform's combined telemetry and visualization tools provide a unified view that aids in managing complex AI fabric networks, enhancing reliability and responsiveness in data-intensive environments.

This Blog Signals brief offers a factual summary of Aviz Networks' ONES 2.0 platform, detailing its enhancements for SONiC-based AI fabrics and their significance for technical decision-makers managing AI network infrastructures.