
Aviz Networks outlines ONES 3.0 RoCE observability and QoS features

ONES 3.0 extends RDMA over Converged Ethernet (RoCE) visibility and control with a Priority Flow Control (PFC) watchdog, scheduler profiles, and queue management based on Weighted Random Early Detection (WRED) to maintain lossless traffic for GPU-centric Artificial Intelligence (AI) workloads across enterprise networks.

Research overview

The vendor blog describes ONES 3.0 as an update focused on observability and control for RDMA over Converged Ethernet (RoCE) traffic in AI fabrics, building on prior support for lossless communication and proactive congestion handling.

The release emphasizes real-time telemetry, configurable scheduling, and mechanisms intended to detect and recover from flow-control issues under heavy network load.

Key findings

New capabilities highlighted include a PFC watchdog for flow-control fault detection, scheduler profiles for prioritized packet handling, and WRED profiles to manage queue behavior and prevent buffer overflow.
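
To make the WRED idea concrete, the sketch below shows the textbook linear drop curve that WRED profiles are generally built on. It is not taken from the vendor blog or from ONES; the `WredProfile` fields and the threshold values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WredProfile:
    """Hypothetical WRED profile; field names are illustrative, not ONES settings."""
    min_threshold: int            # average queue depth (bytes) where dropping starts
    max_threshold: int            # average queue depth (bytes) where every packet drops
    max_drop_probability: float   # drop probability reached just below max_threshold


def drop_probability(profile: WredProfile, avg_queue_depth: int) -> float:
    """Return the drop probability for a given average queue depth."""
    if avg_queue_depth < profile.min_threshold:
        return 0.0  # queue is shallow: never drop
    if avg_queue_depth >= profile.max_threshold:
        return 1.0  # queue is at or past the upper threshold: always drop
    # Between the thresholds the probability ramps up linearly.
    span = profile.max_threshold - profile.min_threshold
    return profile.max_drop_probability * (avg_queue_depth - profile.min_threshold) / span


if __name__ == "__main__":
    profile = WredProfile(min_threshold=100_000, max_threshold=200_000,
                          max_drop_probability=0.1)
    for depth in (50_000, 150_000, 250_000):
        print(depth, drop_probability(profile, depth))
```

Dropping (or marking) a small fraction of packets early, before the buffer is full, is what lets a profile like this signal congestion ahead of a buffer overflow.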

The UI additions provide DSCP and 802.1p mappings, PFC counters, transmit and queue-drop metrics, and time-range views for both live and historical analysis.

Technical breakdown

The blog explains that RoCE enables Remote Direct Memory Access (RDMA) over Ethernet, offering low-latency, high-throughput, lossless transfers that are useful for distributed Graphics Processing Unit (GPU) training and inter-node memory access.

ONES 3.0 ties RoCE controls to Quality of Service (QoS) primitives (DSCP mapping, 802.1p priority, PFC, WRED, and congestion notification packets) to maintain packet delivery and scheduling under contention.
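
As a rough illustration of how a DSCP mapping feeds a lossless priority, the sketch below splits the IP ToS/Traffic Class byte into its DSCP and ECN fields and looks up a priority. The bit layout (upper six bits DSCP, lower two bits ECN) is standard; the mapping table and the PFC-enabled priority set are hypothetical examples, not values from the blog or from ONES.

```python
# Hypothetical mapping for illustration only; real deployments choose their own values.
DSCP_TO_PRIORITY = {26: 3, 48: 6}
PFC_ENABLED_PRIORITIES = {3}  # priorities assumed to be treated as lossless


def split_tos(tos_byte: int) -> tuple[int, int]:
    """Split the 8-bit ToS/Traffic Class byte into (DSCP, ECN)."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("ToS byte must fit in 8 bits")
    return tos_byte >> 2, tos_byte & 0b11


def classify(tos_byte: int, default_priority: int = 0) -> dict:
    """Map a packet's ToS byte to a priority and report whether PFC covers it."""
    dscp, ecn = split_tos(tos_byte)
    priority = DSCP_TO_PRIORITY.get(dscp, default_priority)
    return {
        "dscp": dscp,
        "ecn": ecn,
        "priority": priority,
        "lossless": priority in PFC_ENABLED_PRIORITIES,
    }


if __name__ == "__main__":
    # DSCP 26 with ECN bits 0b10 (ECN-capable transport).
    print(classify((26 << 2) | 0b10))
    # Unmapped DSCP falls back to the default priority.
    print(classify(0))
```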

Product update

The interface surfaces comprehensive performance indicators, RoCE configuration visualization, an interactive topology map, and centralized QoS views to reduce reliance on Command-Line Interface (CLI) commands for policy inspection and troubleshooting.

The Rule Engine update adds streamlined rule creation, custom thresholds, and integrations with alerting endpoints to support automated detection and notifications for RoCE and AI-fabric metrics.
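
A threshold rule of the kind described can be pictured as a metric name, a comparator, and a limit. The sketch below is a generic illustration under that assumption; the `Rule` class, the metric names, and the thresholds are hypothetical and do not reflect the ONES Rule Engine's actual schema or API.

```python
import operator
from dataclasses import dataclass

# Comparators a rule may use, keyed by their textual form.
_COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Rule:
    """Hypothetical threshold rule; not the ONES rule format."""
    name: str
    metric: str
    comparator: str
    threshold: float


def evaluate(rules: list[Rule], metrics: dict[str, float]) -> list[str]:
    """Return one alert message per rule whose condition holds for the given metrics."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported in this sample: skip the rule
        if _COMPARATORS[rule.comparator](value, rule.threshold):
            fired.append(f"{rule.name}: {rule.metric}={value} {rule.comparator} {rule.threshold}")
    return fired


if __name__ == "__main__":
    rules = [
        Rule("pfc-pause-burst", "pfc_rx_pause_frames_per_s", ">", 1000),
        Rule("queue-drops", "queue_drop_packets_per_s", ">=", 1),
    ]
    sample = {"pfc_rx_pause_frames_per_s": 2500, "queue_drop_packets_per_s": 0}
    for alert in evaluate(rules, sample):
        print(alert)  # in practice this would be sent to an alerting endpoint
```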

Operational impact

According to the blog, the combined telemetry, visualization and alerting features aim to shorten mean time to detection and provide data for capacity planning and scheduler tuning in GPU-heavy environments.

The vendor also describes automation paths such as continuous monitoring of PFC and Explicit Congestion Notification (ECN) metrics and automated scheduler adjustments to limit manual intervention during congestion events.
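
One way to picture continuous PFC monitoring is a watchdog-style check over polled counters: a queue that keeps receiving pause frames while transmitting nothing for several polls in a row is treated as stalled. The sketch below is a simplified illustration of that idea only; the counter names, the polling model, and the detection rule are assumptions, not the ONES or switch implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueSample:
    """One poll of cumulative counters for a single queue (hypothetical field names)."""
    pfc_rx_pause_frames: int
    tx_packets: int


def first_stall_index(samples: list[QueueSample], detection_polls: int) -> Optional[int]:
    """Return the index of the sample at which the queue has looked stalled for
    `detection_polls` consecutive poll intervals, or None if that never happens."""
    stalled = 0
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        pause_growing = cur.pfc_rx_pause_frames > prev.pfc_rx_pause_frames
        no_progress = cur.tx_packets == prev.tx_packets
        # Count consecutive intervals with pause frames arriving and nothing sent.
        stalled = stalled + 1 if (pause_growing and no_progress) else 0
        if stalled >= detection_polls:
            return i
    return None


if __name__ == "__main__":
    polls = [
        QueueSample(0, 100),
        QueueSample(5, 100),
        QueueSample(9, 100),
        QueueSample(14, 100),
    ]
    print(first_stall_index(polls, detection_polls=3))
```

Requiring several consecutive stalled intervals, rather than reacting to a single one, keeps brief and legitimate pauses from triggering a recovery action.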

This “Blog Signals brief” is a fact-based summary of the vendor blog and is intended to inform enterprise IT and security decision-makers about ONES 3.0's observability and RoCE management features.