ONES details rule engine capabilities for SONiC network monitoring

ONES 2.0 introduces a Rule Engine designed to improve monitoring and alerting within SONiC network environments by enabling device- and interface-specific rule configurations and integrating with communication platforms such as Slack and Zendesk. This development is relevant for enterprise IT and security professionals managing multi-vendor SONiC fabrics who require detailed, low-noise alerts and streamlined incident response.

Research overview

The ONES Rule Engine integrates with network operations to offer comprehensive observability covering Central Processing Unit (CPU), memory, power supply units, fans, and traffic metrics. It supports precision rule creation targeting hardware SKUs, device roles, and Operating System (OS) versions, allowing for tailored alerting that minimizes irrelevant notifications.

Operator-defined severity levels enable prioritization of critical and warning alerts, while real-time notifications and summaries aid collaboration across teams. The system's architecture supports dynamic inclusion and exclusion of devices from rule scopes, enhancing flexibility in diverse network environments.
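The scoping behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the ONES configuration format: the `Device` and `RuleScope` names, their fields, and the sample device names and SKUs are all hypothetical, and the precedence (explicit exclusion first, then explicit inclusion, then attribute matching) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Device:
    # Hypothetical device record; field names are assumptions.
    name: str
    sku: str
    role: str
    os_version: str


@dataclass
class RuleScope:
    # An empty set means "match any value" for that attribute.
    skus: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    os_versions: set = field(default_factory=set)
    include: set = field(default_factory=set)  # device names always in scope
    exclude: set = field(default_factory=set)  # device names never in scope

    def applies_to(self, device: Device) -> bool:
        # Assumed precedence: exclusion wins, then inclusion, then attributes.
        if device.name in self.exclude:
            return False
        if device.name in self.include:
            return True
        return (
            (not self.skus or device.sku in self.skus)
            and (not self.roles or device.role in self.roles)
            and (not self.os_versions or device.os_version in self.os_versions)
        )


scope = RuleScope(roles={"leaf"}, skus={"sku-a"}, exclude={"leaf-03"})
leaf1 = Device("leaf-01", "sku-a", "leaf", "os-1")
leaf3 = Device("leaf-03", "sku-a", "leaf", "os-1")
spine1 = Device("spine-01", "sku-b", "spine", "os-1")
in_scope = [d.name for d in (leaf1, leaf3, spine1) if scope.applies_to(d)]
```

Here only `leaf-01` ends up in scope: `leaf-03` matches the attributes but is explicitly excluded, and `spine-01` matches neither the role nor the SKU.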

Technical breakdown

The Rule Engine monitors system health indicators like CPU usage, memory consumption, and core temperatures, with thresholds displayed in the user interface. It continuously tracks component health, including fans and power supplies, issuing immediate alerts on failures to facilitate rapid response.
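As a rough illustration of threshold-based health checks with operator-defined severities, the sketch below maps a metric reading to a severity level. The metric names and threshold values are invented for the example and are not taken from ONES; the source does not describe how the product evaluates thresholds internally.

```python
from typing import Optional

# Hypothetical thresholds (percent for CPU and memory, degrees Celsius for
# temperature). Real values would be set by the operator.
THRESHOLDS = {
    "cpu_percent": {"warning": 75.0, "critical": 90.0},
    "memory_percent": {"warning": 80.0, "critical": 95.0},
    "core_temp_c": {"warning": 70.0, "critical": 85.0},
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a single reading."""
    levels = THRESHOLDS[metric]
    # Check the most severe level first so it takes priority.
    if value >= levels["critical"]:
        return "critical"
    if value >= levels["warning"]:
        return "warning"
    return None
```

With these sample thresholds, `classify("cpu_percent", 92.0)` yields a critical severity, while a core temperature of 40 degrees produces no alert.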

Traffic monitoring encompasses link utilization and error rates, with specific focus on Application-Specific Integrated Circuit (ASIC) IPv4/IPv6 table usage to prevent software fallback scenarios. Health parameters of transceivers such as voltage and temperature are also assessed to anticipate potential issues.
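The ASIC table check above amounts to comparing table occupancy against a limit before the hardware table fills up. A minimal sketch, assuming each table is reported as a pair of used and available entry counts; the table names, counts, and the 85 percent limit are hypothetical.

```python
def table_utilization(used: int, available: int) -> float:
    """Return the used share of a table as a percentage."""
    total = used + available
    if total == 0:
        return 0.0
    return 100.0 * used / total


def tables_over_limit(tables: dict, limit_percent: float = 85.0) -> list:
    """Return names of tables whose utilization meets or exceeds the limit."""
    return [
        name
        for name, (used, available) in tables.items()
        if table_utilization(used, available) >= limit_percent
    ]


# Hypothetical (used, available) entry counts per table.
sample_tables = {"ipv4_route": (9000, 1000), "ipv6_route": (2000, 6000)}
flagged = tables_over_limit(sample_tables)
```

In this sample the IPv4 route table is at 90 percent and is flagged, while the IPv6 table at 25 percent is not.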

Product update

ONES 2.0 extends alerting functionalities through integrations with Slack and Zendesk, enabling real-time alerts in communication channels and automatic ticket creation to centralize issue tracking. Alert summaries provide detailed contexts including metric values, severity, timestamps, and metadata such as device IP, role, region, and hardware specifications.
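To make the alert summary concrete, the sketch below assembles the fields listed above into a text message and wraps it in a JSON body with a `text` field, which is the basic shape Slack incoming webhooks accept. The alert dictionary keys, the sample values, and the message layout are assumptions; the source does not show the format ONES actually sends, and nothing is posted here.

```python
import json


def build_alert_text(alert: dict) -> str:
    """Render one alert as a short multi-line summary."""
    lines = [
        f"[{alert['severity'].upper()}] {alert['metric']} = {alert['value']}",
        f"time: {alert['timestamp']}",
        (
            f"device: {alert['device_ip']} role={alert['role']} "
            f"region={alert['region']} hw={alert['hardware']}"
        ),
    ]
    return "\n".join(lines)


# Hypothetical alert record; every value is made up for illustration.
alert = {
    "severity": "critical",
    "metric": "cpu_percent",
    "value": 93.5,
    "timestamp": "2025-01-01T00:00:00Z",
    "device_ip": "192.0.2.10",
    "role": "leaf",
    "region": "lab",
    "hardware": "sku-a",
}

# JSON body suitable for a webhook-style POST (the request itself is omitted).
payload = json.dumps({"text": build_alert_text(alert)})
```

Keeping message construction separate from delivery makes it straightforward to reuse the same summary for a chat message and a ticket description.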

The Rule Engine incorporates anti-noise guardrails by capping maximum alert counts per metric-device combination, reducing alert fatigue. Its extensible framework allows integration with other platforms beyond Slack and Zendesk as operational needs evolve.
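A cap per metric-device combination can be modeled as a counter keyed on the pair. This is a minimal sketch of the idea, not the ONES implementation: the `AlertCap` class, its method names, and the reset behavior are hypothetical, and the source does not say whether or when counts reset.

```python
from collections import defaultdict


class AlertCap:
    """Suppress alerts once a (metric, device) pair reaches its cap."""

    def __init__(self, max_alerts: int):
        self.max_alerts = max_alerts
        self._counts = defaultdict(int)

    def allow(self, metric: str, device: str) -> bool:
        key = (metric, device)
        if self._counts[key] >= self.max_alerts:
            return False  # cap reached: drop this alert
        self._counts[key] += 1
        return True

    def reset(self, metric: str, device: str) -> None:
        # Assumed hook for clearing the count, e.g. after the issue resolves.
        self._counts.pop((metric, device), None)


cap = AlertCap(max_alerts=2)
decisions = [cap.allow("cpu_percent", "leaf-01") for _ in range(4)]
```

With a cap of two, the first two alerts for `cpu_percent` on `leaf-01` pass and the next two are suppressed, while the same metric on another device is counted independently.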

Operational impact

By offering device- and interface-level granularity in monitoring and alerting, ONES 2.0 facilitates proactive issue detection and faster mean time to repair in complex SONiC networks. Rich alert metadata and platform integrations support improved collaboration and streamlined workflows.

The configuration flexibility and noise reduction features support scaling in multi-vendor deployments while preserving service levels and operational efficiency.

Further enhancements are planned to expand visibility into Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) traffic, security compliance, and Service Level Agreement (SLA) measurements, informing broader network management strategies.

Testing of ONES Center is recommended prior to widespread deployment to assess compatibility across preferred vendor platforms.

1) The ONES Rule Engine improves SONiC monitoring by enabling detailed rule-based alerting on key network and device metrics, promoting proactive management.

2) It integrates with Slack and Zendesk to deliver real-time alerts and automate ticket creation, enhancing incident handling.

3) Alert caps per metric-device combination limit redundant notifications in large-scale settings, reducing administrator workload.

4) Alerts cover system health, component failures, traffic conditions, ASIC capacity, transceiver parameters, and SONiC service statuses with comprehensive contextual data.

5) The engine normalizes observability across different platforms, supporting tailored policies critical for maintaining Service Level Agreements (SLAs).

6) Customization by hardware SKU, role, and OS version ensures vendor-aware monitoring and relevance.

7) Continuous component health tracking enables timely detection and remediation of fan and power supply issues.

8) Traffic and capacity monitoring help identify network bottlenecks and maintain forwarding stability.

9) Rich alert context combined with Slack and Zendesk integrations supports efficient visualization and end-to-end resolution.

Overall, ONES 2.0’s Rule Engine enhances the monitoring and alerting capabilities within SONiC networks by delivering configurable, context-rich alerts integrated with collaboration and ticketing platforms, facilitating efficient operations and incident response. This summary provides an objective representation of the vendor's blog content for enterprise IT decision-makers.