ONES 2.0 details rule engine and alert system for SONiC network monitoring

ONES 2.0 introduces an enhanced Rule Engine that provides detailed alerting and monitoring for Software for Open Networking in the Cloud (SONiC) environments. It is aimed at enterprise IT and security teams that need to manage multi-vendor network fabrics effectively.

Research overview

The ONES Rule Engine combines alerting and notification with monitoring of a wide range of device and interface metrics. Users can define custom rules that target specific hardware, roles, and operating system (OS) versions, so that alerts are tailored to the devices they apply to, with the aim of reducing operational noise.
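To make the idea of rule targeting concrete, the following is a minimal sketch of how a rule scope might be matched against device attributes. It is not the ONES configuration format, which the vendor post does not show; the `Device` and `RuleScope` names, their fields, and the sample SKU and OS version strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Device:
    # Hypothetical device record; field names are illustrative only.
    hostname: str
    sku: str          # hardware model
    role: str         # for example "leaf" or "spine"
    os_version: str


@dataclass
class RuleScope:
    # An empty set means "no restriction on this attribute".
    skus: Set[str] = field(default_factory=set)
    roles: Set[str] = field(default_factory=set)
    os_versions: Set[str] = field(default_factory=set)

    def matches(self, device: Device) -> bool:
        """Return True when the device satisfies every non-empty selector."""
        return (
            (not self.skus or device.sku in self.skus)
            and (not self.roles or device.role in self.roles)
            and (not self.os_versions or device.os_version in self.os_versions)
        )


leaf = Device("leaf-01", sku="hypothetical-sku-a", role="leaf", os_version="202311")
spine = Device("spine-01", sku="hypothetical-sku-b", role="spine", os_version="202305")

# A rule that should only fire for leaf switches on one OS version.
leaf_only = RuleScope(roles={"leaf"}, os_versions={"202311"})
scoped = [d.hostname for d in (leaf, spine) if leaf_only.matches(d)]
```

Treating an empty selector as "match everything" keeps a broad rule short, while each added selector narrows the set of devices the rule applies to.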

The platform supports fine-grained controls and severity levels to help prioritize issues, contributing to proactive network management across diverse SONiC deployments.

Key findings

The Rule Engine monitors central processing unit (CPU) load, memory use, power supply unit (PSU) status, fan operation, and network traffic parameters such as RX/TX rates. It also tracks Application-Specific Integrated Circuit (ASIC) table usage, so that tables nearing capacity can be caught before performance degrades, and monitors transceiver metrics relevant to link stability.

Alerts are categorized by severity, such as critical and warning, and are enriched with context including device IP, role, region, SKU, serial number, OS, and interface specifics like speed and transceiver type.
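The severity and context model described above can be sketched as follows. This is an illustrative example under stated assumptions, not the ONES data model: the `Threshold` class, the `classify` and `build_alert` helpers, the metric name, the threshold values, and every context value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    # Hypothetical two-level threshold; the values used below are made up.
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> Optional[str]:
    """Return "critical", "warning", or None when the value is below both levels."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return None


def build_alert(
    metric: str, value: float, threshold: Threshold, context: Dict[str, str]
) -> Optional[Dict[str, object]]:
    """Attach device context to a reading that crosses a threshold."""
    severity = classify(value, threshold)
    if severity is None:
        return None  # nothing to report
    return {"metric": metric, "value": value, "severity": severity, **context}


# Context fields mirror the ones listed in the brief; the values are placeholders.
context = {
    "device_ip": "192.0.2.10",  # documentation-range address
    "role": "leaf",
    "region": "example-region",
    "sku": "hypothetical-sku-a",
    "serial": "SN-EXAMPLE-001",
    "os": "SONiC",
}
cpu = Threshold(warning=75.0, critical=90.0)
alert = build_alert("cpu_utilization_percent", 93.5, cpu, context)
quiet = build_alert("cpu_utilization_percent", 40.0, cpu, context)
```

Here `alert` carries the severity together with the device context, while `quiet` is `None` because the reading stays below the warning level.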

Technical breakdown

The Rule Engine lets administrators dynamically include devices in, or exclude them from, particular alerting rules. Administrators can also cap the number of alerts generated per metric per device, which reduces redundant notifications. The engine integrates with communication platforms such as Slack for real-time alert delivery and weekly summaries.
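One way to picture device exclusion and the per-metric, per-device alert cap is the sketch below. The `AlertLimiter` class and `slack_payload` helper are hypothetical and do not reflect how ONES implements these features. The only external detail relied on is that Slack incoming webhooks accept a JSON body with a `text` field; the example builds that body but does not send it.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


class AlertLimiter:
    """Hypothetical cap on alerts per (device, metric) pair, with an exclusion list."""

    def __init__(self, max_alerts: int, excluded: Iterable[str] = ()) -> None:
        self.max_alerts = max_alerts
        self.excluded = set(excluded)
        self._sent: Counter = Counter()

    def allow(self, device: str, metric: str) -> bool:
        """Return True if an alert for this device and metric may still be sent."""
        if device in self.excluded:
            return False  # device is excluded from this rule
        key: Tuple[str, str] = (device, metric)
        if self._sent[key] >= self.max_alerts:
            return False  # cap reached; suppress the duplicate
        self._sent[key] += 1
        return True


def slack_payload(device: str, metric: str, severity: str) -> str:
    """Build the JSON body for a Slack incoming webhook (not sent here)."""
    text = f"[{severity.upper()}] {metric} on {device}"
    return json.dumps({"text": text})


limiter = AlertLimiter(max_alerts=2, excluded={"lab-01"})
results = [limiter.allow("leaf-01", "cpu") for _ in range(3)]
other_metric = limiter.allow("leaf-01", "memory")
excluded_device = limiter.allow("lab-01", "cpu")
message = slack_payload("leaf-01", "cpu", "critical")
```

The third CPU alert for `leaf-01` is suppressed once the cap of two is reached, a different metric on the same device is counted separately, and the excluded device never produces an alert.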

Integration with ticketing systems such as Zendesk allows automatic creation of tickets based on alerts, aiming to centralize response workflows and track resolution progress effectively.
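As a rough illustration of alert-to-ticket conversion, the sketch below builds a request body in the shape used by the Zendesk Tickets API (a `ticket` object with `subject`, `comment.body`, `priority`, and `tags`). How ONES itself constructs tickets is not described in the source; the severity-to-priority mapping, the `ones_alert` tag, and the sample alert are assumptions, and the payload is only constructed, not submitted.

```python
from typing import Dict

# Assumed mapping from alert severity to a Zendesk priority value.
PRIORITY_BY_SEVERITY = {"critical": "urgent", "warning": "normal"}


def zendesk_ticket_payload(alert: Dict[str, str]) -> Dict[str, object]:
    """Build a ticket-creation body from an alert dictionary."""
    subject = f"[{alert['severity'].upper()}] {alert['metric']} on {alert['device_ip']}"
    # Put every alert field in the ticket comment so responders see the full context.
    body = "\n".join(f"{key}: {value}" for key, value in sorted(alert.items()))
    return {
        "ticket": {
            "subject": subject,
            "comment": {"body": body},
            "priority": PRIORITY_BY_SEVERITY.get(alert["severity"], "normal"),
            "tags": ["ones_alert", alert["severity"]],  # hypothetical tag names
        }
    }


sample_alert = {
    "metric": "psu_status",
    "severity": "critical",
    "device_ip": "192.0.2.10",  # documentation-range address
    "role": "spine",
}
payload = zendesk_ticket_payload(sample_alert)
```

Carrying the alert context into the ticket body keeps the device details next to the resolution history, which is the kind of centralized workflow the integration is meant to support.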

Operational impact

By combining detailed telemetry with customizable rules, ONES 2.0's Rule Engine supports multi-vendor SONiC operations: it consolidates observability across vendors and is intended to scale with the deployment. It facilitates faster identification and remediation of network issues, which can contribute to a lower mean time to repair (MTTR).

The alerting system's extensibility allows organizations to connect with additional platforms beyond Slack and Zendesk as operational requirements evolve.

Leadership perspective

ONES 2.0 presents the Rule Engine as a comprehensive approach to monitoring and alerting that standardizes observability across diverse hardware and software configurations. This approach is intended to assist network managers in maintaining service quality and operational efficiency within complex environments.

The system's ability to generate precise, low-noise alerts tailored to device conditions underlines an effort to balance visibility with operational workload.

Integration with widely used collaboration and ticketing tools further supports coordinated incident management efforts.

In summary, ONES 2.0's Rule Engine advances alerting and monitoring functionalities for SONiC networks, offering customizable, context-rich alerts and integrations that support enterprise-scale operations. This Blog Signals brief reflects a neutral, fact-based summary of the capabilities described in the vendor's blog post.