
Aviz ONES 2.0: Enhancements for SONiC-based AI Fabrics

Aviz ONES 2.0 introduces enhancements designed for SONiC-based Artificial Intelligence (AI) fabric networks, focusing on improved data transfer capabilities and real-time monitoring. This update is pertinent for IT leaders considering advanced networking technologies to support AI workloads.

Technology Overview

ONES 2.0 uses Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) to provide the high-throughput, low-latency communication that real-time AI operations require. Moving data efficiently at lower latency, in turn, speeds up model training.

Traffic Monitoring Features

This version enhances traffic monitoring with detailed metrics, including Priority Flow Control (PFC) counters, Rx/Tx watermarks, and Quality of Service (QoS) drop counters. These metrics give administrators insight into how traffic is being handled and help them mitigate network congestion proactively.
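As an illustration of how counters like these might be read, here is a minimal Python sketch that compares two polls of a single queue and reports possible congestion signals. It is not the ONES implementation: the `QueueSample` fields, the `congestion_signals` helper, and the 80% watermark threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One poll of a port queue. All field names are hypothetical."""
    pfc_rx_pause: int      # cumulative PFC pause frames received
    qos_drops: int         # cumulative packets dropped by QoS policy
    watermark_bytes: int   # peak buffer occupancy since the last poll


def congestion_signals(prev, curr, buffer_bytes, watermark_ratio=0.8):
    """Return a list of warnings for one polling interval."""
    signals = []
    # Counters are cumulative, so only the change between polls matters.
    pause_delta = curr.pfc_rx_pause - prev.pfc_rx_pause
    drop_delta = curr.qos_drops - prev.qos_drops
    if pause_delta > 0:
        signals.append(f"PFC pause frames increased by {pause_delta}")
    if drop_delta > 0:
        signals.append(f"QoS drops increased by {drop_delta}")
    # The watermark is a peak value, so it is compared to the buffer size.
    if curr.watermark_bytes >= watermark_ratio * buffer_bytes:
        signals.append("watermark above threshold")
    return signals


prev = QueueSample(pfc_rx_pause=100, qos_drops=5, watermark_bytes=200_000)
curr = QueueSample(pfc_rx_pause=140, qos_drops=5, watermark_bytes=900_000)
signals = congestion_signals(prev, curr, buffer_bytes=1_000_000)
print(signals)
```

In this sketch, a rising pause-frame count together with a high watermark suggests that a queue is filling before any packets are dropped, which is the kind of early signal proactive monitoring relies on.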

Proactive Congestion Management

With proactive congestion management, ONES 2.0 can identify potential network bottlenecks early, which is particularly beneficial for AI workloads that depend on large data sets for model training and inference tasks.

Multi-Vendor Support

The ONES platform supports SONiC fabrics from multiple vendors, standardizing telemetry metrics and RoCE data collection across them. It integrates with orchestration tools and third-party APIs, allowing for centralized monitoring and configuration in heterogeneous AI environments.
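To make the idea of standardized telemetry concrete, the following Python sketch renames vendor-specific counter keys to one common schema. The vendor names, raw keys, and common field names are invented for illustration; the source does not describe how ONES performs this mapping.

```python
# Hypothetical per-vendor counter names mapped to one common schema.
VENDOR_FIELD_MAP = {
    "vendor_a": {"pfc_rx": "pfc_rx_pause", "q_drop": "qos_drops"},
    "vendor_b": {"rxPfcFrames": "pfc_rx_pause", "dropPkts": "qos_drops"},
}


def normalize(vendor, raw):
    """Rename known vendor-specific keys; ignore keys with no mapping."""
    mapping = VENDOR_FIELD_MAP[vendor]
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


a = normalize("vendor_a", {"pfc_rx": 12, "q_drop": 3, "extra": 1})
b = normalize("vendor_b", {"rxPfcFrames": 7, "dropPkts": 0})
print(a)
print(b)
```

Once both records share the same keys, a single dashboard query or alert rule can cover switches from either vendor.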

User Interface and Visualizations

ONES 2.0 features a user-friendly interface for visualizing RoCE traffic. It supports QoS configuration, displays traffic flow maps, and provides insight into queue behavior, helping network operators optimize performance.

Integration with Existing Systems

Through Representational State Transfer (REST) APIs and Prometheus exporters, ONES connects with third-party systems, allowing teams to correlate telemetry with application metrics, thereby enhancing observability across the network.
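As a rough sketch of what consuming exporter output could look like, the Python snippet below parses a small subset of the Prometheus text exposition format into tuples that can be joined with application metrics. The metric names, label names, and sample values are hypothetical, not actual ONES metrics, and the parser deliberately ignores timestamps and escaped characters in label values.

```python
import re

# Matches lines such as: name{label="value"} 42
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Parse a subset of the Prometheus text format into (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical scrape output; metric and label names are invented.
SCRAPE = """
# TYPE switch_pfc_rx_pause_total counter
switch_pfc_rx_pause_total{port="Ethernet0",priority="3"} 140
switch_qos_drops_total{port="Ethernet0",queue="3"} 5
"""

samples = parse_exposition(SCRAPE)
for name, labels, value in samples:
    print(name, labels, value)
```

In practice a Prometheus server would scrape and store these samples itself; the parser is shown only to make the data shape visible.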

Benefits for Network Teams

Network operations (NetOps), DevOps, and AI infrastructure teams can use ONES’s unified telemetry and real-time dashboards to improve issue resolution and to keep network operations aligned with AI performance objectives.

This summary of the original blog post highlights developments in networking technology for AI applications.