
ONES Fabric Manager details SONiC QoS orchestration for AI fabrics

ONES Fabric Manager 3.0 introduces YAML-driven orchestration of SONiC Quality of Service (QoS) settings to provide lossless, priority-aware transport for GPU-cluster traffic, a capability relevant to IT and security leaders who manage Artificial Intelligence (AI) fabrics.

Product Update

ONES Fabric Manager 3.0 automates SONiC QoS provisioning by generating profiles from a YAML template and assigning them across fabric interfaces. The template covers DSCP and DOT1P mappings, Priority Flow Control (PFC) settings, ECN thresholds, and scheduler policies.
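
The source does not publish the template schema, so the following is only a sketch of the kind of pre-flight check a YAML-driven workflow benefits from. The template is shown as the Python dict a YAML loader would return; every key name and value here (dscp_to_tc, pfc_enabled_tcs, queue3, and so on) is a hypothetical assumption, not the ONES format. The range checks themselves rest on the field widths (DSCP is 6 bits, DOT1P is 3 bits).

```python
from typing import Any

# Hypothetical template, shown as the dict a YAML loader would return.
# All key names and values are illustrative assumptions, not ONES's schema.
TEMPLATE: dict[str, Any] = {
    "dscp_to_tc": {0: 0, 26: 3, 46: 5},
    "dot1p_to_tc": {0: 0, 3: 3, 5: 5},
    "pfc_enabled_tcs": [3],
    "ecn": {"min_threshold_kb": 100, "max_threshold_kb": 1000,
            "max_mark_probability": 0.05},
    "scheduler": {
        "queue3": {"type": "DWRR", "weight": 60},
        "queue0": {"type": "DWRR", "weight": 40},
        "queue6": {"type": "STRICT"},
    },
}


def validate_template(tpl: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the template passed."""
    errors: list[str] = []
    # DSCP is a 6-bit field (0-63); DOT1P is 3 bits (0-7); TCs assumed 0-7.
    for field_name, max_code in (("dscp_to_tc", 63), ("dot1p_to_tc", 7)):
        for code, tc in tpl[field_name].items():
            if not 0 <= code <= max_code:
                errors.append(f"{field_name}: code {code} outside 0-{max_code}")
            if not 0 <= tc <= 7:
                errors.append(f"{field_name}: code {code} maps to invalid TC {tc}")
    # A lossless TC is pointless unless some DSCP/DOT1P value maps into it.
    mapped_tcs = set(tpl["dscp_to_tc"].values()) | set(tpl["dot1p_to_tc"].values())
    for tc in tpl["pfc_enabled_tcs"]:
        if tc not in mapped_tcs:
            errors.append(f"PFC enabled on TC {tc}, but no DSCP/DOT1P value maps to it")
    ecn = tpl["ecn"]
    if not 0 < ecn["min_threshold_kb"] < ecn["max_threshold_kb"]:
        errors.append("ECN min threshold must be positive and below max threshold")
    if not 0.0 < ecn["max_mark_probability"] <= 1.0:
        errors.append("ECN max mark probability must be in (0, 1]")
    for queue, policy in tpl["scheduler"].items():
        kind = policy.get("type")
        if kind not in {"STRICT", "DWRR", "WRR"}:
            errors.append(f"{queue}: unknown scheduler type {kind!r}")
        elif kind != "STRICT" and policy.get("weight", 0) <= 0:
            errors.append(f"{queue}: {kind} needs a positive weight")
    return errors


print(validate_template(TEMPLATE))  # []
```

Catching an unmapped lossless class or an inverted ECN threshold pair before anything is pushed is cheaper than diagnosing it on a live fabric.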

Fabric Manager binds the generated profiles to interfaces and supports SONiC deployments on multi-vendor hardware, reducing manual per-interface configuration steps.
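
Binding amounts to writing the same profile references onto each selected port. The sketch below assumes field names loosely modeled on SONiC's CONFIG_DB PORT_QOS_MAP table (dscp_to_tc_map, tc_to_queue_map, pfc_enable). The exact keys and values ONES writes are not stated in the source, so treat the output shape as illustrative. The profile names DSCP_TC_PROFILE and TC_QUEUE_PROFILE come from the brief; the interface names are hypothetical.

```python
import json


# Hypothetical field names, loosely modeled on SONiC's CONFIG_DB PORT_QOS_MAP
# table; the exact keys ONES writes are not stated in the source.
def bind_profiles(interfaces: list[str], profiles: dict[str, str],
                  pfc_tcs: list[int]) -> dict[str, dict[str, str]]:
    """Attach the same named QoS profiles and PFC priorities to each interface."""
    # PFC priorities are rendered as a sorted, de-duplicated comma list, e.g. "3,4".
    pfc_enable = ",".join(str(tc) for tc in sorted(set(pfc_tcs)))
    return {ifname: {**profiles, "pfc_enable": pfc_enable} for ifname in interfaces}


port_qos_map = bind_profiles(
    ["Ethernet0", "Ethernet4"],
    {"dscp_to_tc_map": "DSCP_TC_PROFILE", "tc_to_queue_map": "TC_QUEUE_PROFILE"},
    pfc_tcs=[4, 3],
)
print(json.dumps({"PORT_QOS_MAP": port_qos_map}, indent=2))
```

Generating every port entry from one function call is what keeps a fabric of many ports consistent; a hand-edited port is where drift usually starts.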

Technical breakdown

Priority Flow Control (PFC) makes selected interface queues lossless: when one of those queues becomes congested, the switch sends pause frames for that priority to the upstream sender, which holds the traffic instead of letting it be dropped.
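
The pause/resume behavior can be pictured as a buffer with two thresholds: pause upstream when occupancy reaches XOFF, and resume once it falls back to XON. The toy model below is a sketch of that hysteresis only. The threshold values and event strings are hypothetical, and it ignores headroom sizing, pause quanta, and link delay.

```python
from dataclasses import dataclass, field


@dataclass
class PfcIngressQueue:
    """Toy model of one lossless priority's ingress buffer with XOFF/XON hysteresis."""
    xoff_bytes: int   # occupancy at which a pause is sent upstream
    xon_bytes: int    # occupancy at which the upstream may resume
    occupancy: int = 0
    paused: bool = False
    events: list[str] = field(default_factory=list)

    def arrive(self, nbytes: int) -> None:
        """A packet lands in the buffer (this toy has no capacity limit, so no drops)."""
        self.occupancy += nbytes
        if not self.paused and self.occupancy >= self.xoff_bytes:
            self.paused = True
            self.events.append(f"PAUSE at {self.occupancy} B")

    def drain(self, nbytes: int) -> None:
        """Egress forwards bytes; resume upstream once at or below the XON threshold."""
        self.occupancy = max(0, self.occupancy - nbytes)
        if self.paused and self.occupancy <= self.xon_bytes:
            self.paused = False
            self.events.append(f"RESUME at {self.occupancy} B")


q = PfcIngressQueue(xoff_bytes=8_000, xon_bytes=4_000)
for _ in range(3):
    q.arrive(3_000)   # the third arrival reaches 9000 B, crossing XOFF
q.drain(3_000)        # 6000 B: still above XON, stays paused
q.drain(3_000)        # 3000 B: at or below XON, resume
print(q.events)  # ['PAUSE at 9000 B', 'RESUME at 3000 B']
```

The gap between XOFF and XON keeps the link from flapping between pause and resume on every packet.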

Explicit Congestion Notification (ECN) marks packets once queue occupancy crosses configured buffer thresholds; receivers respond with congestion notification packets (ECN-CNP) that prompt senders to lower their transmission rates. Egress schedulers such as DWRR, WRR, and STRICT determine how queues are serviced under contention.
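
To show how a STRICT queue and DWRR queues interact under contention, here is a minimal scheduler sketch. It is a textbook Deficit Weighted Round Robin, not SONiC's or any ASIC's implementation, and the queue names and quanta are hypothetical. Plain WRR would count packets per round rather than bytes of credit.

```python
from collections import deque
from typing import Optional


def schedule(queues: dict[str, deque], strict: Optional[str],
             quanta: dict[str, int], rounds: int) -> list[tuple[str, int]]:
    """Serve a STRICT queue first, then share the rest by Deficit Weighted Round Robin.

    queues maps queue name -> deque of packet sizes (bytes); quanta maps each
    DWRR queue to the bytes of credit it earns per round (its weight).
    Returns the transmission order as (queue, size) pairs.
    """
    deficit = {name: 0 for name in quanta}
    sent: list[tuple[str, int]] = []
    for _ in range(rounds):
        # STRICT always wins. No packets arrive mid-simulation in this toy,
        # so draining it at the top of each round is sufficient.
        if strict is not None:
            while queues[strict]:
                sent.append((strict, queues[strict].popleft()))
        for name, quantum in quanta.items():
            q = queues[name]
            if not q:
                deficit[name] = 0  # an idle queue does not bank credit
                continue
            deficit[name] += quantum
            # Send head-of-line packets while the accumulated credit covers them.
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not q:
                deficit[name] = 0
    return sent


queues = {
    "cnp": deque([64, 64]),             # hypothetical STRICT queue
    "lossless": deque([1_000] * 6),     # DWRR, 3000 B per round
    "best_effort": deque([1_000] * 6),  # DWRR, 1000 B per round
}
order = schedule(queues, "cnp", {"lossless": 3_000, "best_effort": 1_000}, rounds=2)
```

Over two rounds the STRICT queue goes out first, then the lossless queue sends three packets for every one from best_effort, matching the 3:1 quanta.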

Key findings

ONES maps DSCP and DOT1P values to traffic classes, and traffic classes to queues and priority groups, then creates named profiles (for example, DSCP_TC_PROFILE and TC_QUEUE_PROFILE) that are attached to the relevant interfaces.
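
The mapping chain is a two-step lookup: code point to traffic class, then traffic class to egress queue and ingress priority group. In the sketch below, only the names DSCP_TC_PROFILE and TC_QUEUE_PROFILE come from the brief. Their contents, DOT1P_TC_PROFILE, TC_PG_PROFILE, the default class, and the rule that DSCP wins over DOT1P are all hypothetical assumptions.

```python
from typing import Optional

# Hypothetical contents; only the DSCP_TC_PROFILE and TC_QUEUE_PROFILE
# names come from the source, the values and the other maps are illustrative.
DSCP_TC_PROFILE = {0: 0, 26: 3, 46: 5}
DOT1P_TC_PROFILE = {0: 0, 3: 3, 5: 5}
TC_QUEUE_PROFILE = {0: 0, 3: 3, 5: 5}
TC_PG_PROFILE = {0: 0, 3: 3, 5: 0}  # lossless TC 3 gets its own priority group

DEFAULT_TC = 0  # assumed fallback for unmapped code points


def classify(dscp: Optional[int] = None,
             dot1p: Optional[int] = None) -> tuple[int, int, int]:
    """Resolve a packet's (traffic class, egress queue, ingress priority group).

    Assumed precedence: DSCP is used when present, otherwise DOT1P.
    """
    if dscp is not None:
        tc = DSCP_TC_PROFILE.get(dscp, DEFAULT_TC)
    elif dot1p is not None:
        tc = DOT1P_TC_PROFILE.get(dot1p, DEFAULT_TC)
    else:
        tc = DEFAULT_TC
    return tc, TC_QUEUE_PROFILE[tc], TC_PG_PROFILE[tc]


print(classify(dscp=26))  # (3, 3, 3)
print(classify(dot1p=5))  # (5, 5, 0)
print(classify(dscp=10))  # (0, 0, 0)
```

Keeping the traffic class as the pivot means a change to queue assignment touches one table rather than every code-point entry.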

The orchestration supports WRED-style ECN profiles on PFC-enabled queues and can designate a STRICT-scheduled queue for ECN-CNP traffic so congestion notifications can traverse the fabric even when other queues are congested.
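
"WRED-style" refers to the classic RED curve: no marking below a minimum threshold, a linearly rising probability up to a maximum threshold, and marking of every packet above it, all evaluated against a smoothed average queue depth. The sketch below shows that curve applied to ECN marking. The threshold values and the 0.002 averaging weight are illustrative, not ONES or SONiC defaults.

```python
def ecn_mark_probability(avg_queue_kb: float, min_kb: float, max_kb: float,
                         max_prob: float) -> float:
    """Classic RED/WRED curve, used here to decide ECN marking instead of drops.

    Below min: never mark. Between min and max: probability rises linearly
    from 0 to max_prob. At or above max: mark every ECN-capable packet.
    """
    if avg_queue_kb <= min_kb:
        return 0.0
    if avg_queue_kb >= max_kb:
        return 1.0
    return max_prob * (avg_queue_kb - min_kb) / (max_kb - min_kb)


def update_avg(avg_kb: float, sample_kb: float, weight: float = 0.002) -> float:
    """Exponentially weighted average queue depth, which RED uses to smooth bursts."""
    return (1.0 - weight) * avg_kb + weight * sample_kb


for depth in (50, 550, 1200):
    print(depth, ecn_mark_probability(depth, min_kb=100, max_kb=1000, max_prob=0.1))
```

A low minimum threshold signals senders early, before queues fill far enough to trigger PFC pauses; tuning that trade-off is the main knob these profiles expose.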

Operational impact

Day-2 operations are supported via updated YAML templates or a NetOps Application Programming Interface (API), allowing administrators to modify mappings, PFC settings, ECN thresholds, and scheduler weights as traffic patterns change.
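
The source does not describe the NetOps API, so nothing below reflects its calls. As a hedged sketch of one Day-2 step, this computes the leaf-level changes between an old and a new template before anything is pushed. The template keys are hypothetical, and diff_templates is an illustrative helper, not a ONES function.

```python
from typing import Any


def diff_templates(old: dict[str, Any], new: dict[str, Any],
                   path: str = "") -> list[tuple[str, Any, Any]]:
    """List (dotted.path, old_value, new_value) for every leaf that changed."""
    changes: list[tuple[str, Any, Any]] = []
    for key in sorted(set(old) | set(new), key=str):
        where = f"{path}.{key}" if path else str(key)
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            changes.extend(diff_templates(before, after, where))
        elif before != after:
            changes.append((where, before, after))  # added, removed, or modified
    return changes


old = {"scheduler": {"queue3": {"type": "DWRR", "weight": 60}},
       "ecn": {"min_threshold_kb": 100, "max_threshold_kb": 1000}}
new = {"scheduler": {"queue3": {"type": "DWRR", "weight": 70}},
       "ecn": {"min_threshold_kb": 150, "max_threshold_kb": 1000}}
for change in diff_templates(old, new):
    print(change)
# ('ecn.min_threshold_kb', 100, 150)
# ('scheduler.queue3.weight', 60, 70)
```

Reviewing a short change list like this makes it easier to confirm that a weight or threshold tweak touches only what was intended.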

ONES also exposes observability into QoS metrics such as queue utilization, PFC events, and ECN counters to assist administrators in tuning configurations for GPU-centric workloads.
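
Switch counters are usually cumulative, so tuning decisions rest on rates between two snapshots. The sketch below is hypothetical throughout. The snapshot fields are illustrative, not the ONES metric names. "Utilization" is interpreted here as the share of link bandwidth a queue used, one possible reading of the brief. Counter wraparound is not handled.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Cumulative counters for one queue at one instant (hypothetical field names)."""
    timestamp_s: float
    tx_bytes: int
    pfc_pause_frames: int
    ecn_marked_packets: int


def queue_rates(prev: QueueSnapshot, cur: QueueSnapshot,
                link_bps: float) -> dict[str, float]:
    """Turn two cumulative snapshots into per-interval rates."""
    dt = cur.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return {
        # share of link capacity this queue used over the interval
        "utilization": (cur.tx_bytes - prev.tx_bytes) * 8 / (dt * link_bps),
        "pfc_pauses_per_s": (cur.pfc_pause_frames - prev.pfc_pause_frames) / dt,
        "ecn_marks_per_s": (cur.ecn_marked_packets - prev.ecn_marked_packets) / dt,
    }


before = QueueSnapshot(0.0, 0, 0, 0)
after = QueueSnapshot(10.0, 62_500_000_000, 20, 50)
print(queue_rates(before, after, link_bps=100e9))
# {'utilization': 0.5, 'pfc_pauses_per_s': 2.0, 'ecn_marks_per_s': 5.0}
```

A rising pause rate alongside few ECN marks, for example, would suggest the ECN thresholds sit too close to the PFC trigger.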

Enterprise IT and security leaders can use these orchestration capabilities to align QoS settings with AI workload requirements. This “Blog Signals brief” is a fact-based summary of the vendor blog.