Vitrage - Decision Insights

Vitrage is an OpenStack service for Root Cause Analysis (RCA) and deduced alarms. It builds and analyzes a unified view of cloud events, alarms, and topology to identify problems and their likely causes (cloud operations / observability).

Aggregates alarms, events, and resource data from multiple sources into a unified Entity Relationship (ER) graph (observability / telemetry correlation).
Performs RCA on OpenStack cloud incidents using correlation rules and topology context (incident analysis / RCA).
Generates deduced alarms and identifies potential risks based on inferred relationships and conditions (event enrichment / risk detection).
Provides templates and configuration for defining correlation logic, RCA rules, and scenarios that map conditions to actions such as raising deduced alarms or setting resource states (policy-based analytics / rule engine).
Integrates with other OpenStack services and external monitoring systems through drivers and data sources (cloud management / ecosystem integration).

More About Vitrage

Vitrage is an OpenStack project focused on RCA and deduced alarms for cloud environments. It addresses the problem of fragmented telemetry and alarm data in OpenStack deployments by building a consolidated view of resources, their relationships, and operational events. Through this model, operators can analyze incidents in context and distinguish between root causes and downstream symptoms across complex stacks of services and infrastructure.

At the core of Vitrage is its ER graph (graph analytics / observability), which represents resources, alarms, events, and their dependencies. Vitrage consumes information from multiple data sources, such as OpenStack services and external monitoring systems, and normalizes it into a single topology-aware model. It then applies correlation rules and templates to this model to detect patterns, propagate alarm states, and classify which alerts are root causes and which are derived effects.
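To make the graph model concrete, here is a minimal Python sketch of an entity graph with state propagation and root-cause classification. It is not Vitrage's implementation: the `EntityGraph` class, the `propagate_state` and `classify_alarms` helpers, and the edge labels and state names are hypothetical, chosen only to illustrate two ideas from the paragraph above: a failed host degrades the instances it contains, and an alarm with no incoming causal edge is treated as a root cause.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Hypothetical, minimal entity graph: attribute dicts plus labeled edges."""
    vertices: dict = field(default_factory=dict)  # vertex id -> attributes
    edges: set = field(default_factory=set)       # (source id, label, target id)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = dict(attrs)

    def add_edge(self, src, label, dst):
        self.edges.add((src, label, dst))

    def targets(self, src, label):
        return sorted(d for s, lbl, d in self.edges if s == src and lbl == label)

    def has_incoming(self, dst, label):
        return any(d == dst and lbl == label for _, lbl, d in self.edges)


def propagate_state(graph):
    """Mark everything reachable via 'contains' from an ERROR vertex as SUBOPTIMAL."""
    queue = deque(v for v, a in graph.vertices.items() if a.get("state") == "ERROR")
    while queue:
        parent = queue.popleft()
        for child in graph.targets(parent, "contains"):
            attrs = graph.vertices[child]
            if attrs.get("state") not in ("ERROR", "SUBOPTIMAL"):
                attrs["state"] = "SUBOPTIMAL"  # derived, not directly observed
                queue.append(child)


def classify_alarms(graph):
    """Split alarms into root causes (no incoming 'causes' edge) and derived ones."""
    alarms = sorted(v for v, a in graph.vertices.items() if a.get("type") == "alarm")
    roots = [a for a in alarms if not graph.has_incoming(a, "causes")]
    derived = [a for a in alarms if graph.has_incoming(a, "causes")]
    return roots, derived


# Illustrative topology: one failed host containing two instances.
g = EntityGraph()
g.add_vertex("host-1", type="nova.host", state="ERROR")
g.add_vertex("vm-1", type="nova.instance", state="OK")
g.add_vertex("vm-2", type="nova.instance", state="OK")
g.add_edge("host-1", "contains", "vm-1")
g.add_edge("host-1", "contains", "vm-2")
for alarm, resource in [("a-host-down", "host-1"),
                        ("a-vm1-unreachable", "vm-1"),
                        ("a-vm2-unreachable", "vm-2")]:
    g.add_vertex(alarm, type="alarm")
    g.add_edge(alarm, "on", resource)
g.add_edge("a-host-down", "causes", "a-vm1-unreachable")
g.add_edge("a-host-down", "causes", "a-vm2-unreachable")

propagate_state(g)
roots, derived = classify_alarms(g)
print(roots, derived)  # ['a-host-down'] ['a-vm1-unreachable', 'a-vm2-unreachable']
```

Keeping edges as labeled triples makes the same traversal code reusable for any relationship type, which is the property a topology-aware model relies on.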

Vitrage uses a template-driven rule engine (policy-based automation) for defining RCA logic, deduced alarms, and operational scenarios. Templates describe conditions over the graph, such as specific combinations of alarms, resource types, and relationships, and define actions or conclusions that should follow. This allows operators to encode domain knowledge, capture recurring fault patterns, and maintain consistency in incident handling. Templates can be updated as infrastructure, monitoring tools, or operational practices evolve.
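The condition/action structure of a template can be sketched as follows. Real Vitrage templates are YAML files, and this Python sketch does not reproduce their format. The `Scenario` fields, the `evaluate` function, and the alarm names are hypothetical. The sketch supports only a single trigger alarm plus one topology hop, whereas real template conditions can combine several clauses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical rule: when `trigger_alarm` is active on a `parent_type` resource,
    raise `deduced_alarm` on every `child_type` resource linked by `relationship`."""
    trigger_alarm: str
    parent_type: str
    relationship: str
    child_type: str
    deduced_alarm: str


def evaluate(scenarios, resources, edges, alarms):
    """Return (deduced alarms, causal links) implied by the scenarios.

    resources: {resource_id: resource_type}
    edges:     set of (source_id, label, target_id) topology edges
    alarms:    set of (alarm_name, resource_id) currently active
    """
    deduced, causal = set(), set()
    for sc in scenarios:
        for name, res in alarms:
            # Condition part: the right alarm on the right kind of resource.
            if name != sc.trigger_alarm or resources.get(res) != sc.parent_type:
                continue
            for src, label, dst in edges:
                if (src == res and label == sc.relationship
                        and resources.get(dst) == sc.child_type):
                    # Action part: raise a deduced alarm and record why.
                    new_alarm = (sc.deduced_alarm, dst)
                    deduced.add(new_alarm)
                    causal.add(((name, res), new_alarm))
    return deduced, causal


scenarios = [Scenario("host_nic_down", "nova.host", "contains",
                      "nova.instance", "instance_connectivity_lost")]
resources = {"host-1": "nova.host", "host-2": "nova.host",
             "vm-1": "nova.instance", "vm-2": "nova.instance", "vm-3": "nova.instance"}
edges = {("host-1", "contains", "vm-1"), ("host-1", "contains", "vm-2"),
         ("host-2", "contains", "vm-3")}
active = {("host_nic_down", "host-1")}

deduced, causal = evaluate(scenarios, resources, edges, active)
print(sorted(deduced))
```

Separating the rule data (`Scenario`) from the evaluator mirrors the point made above: operators can add or update rules without changing the engine that applies them.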

In enterprise and institutional environments, Vitrage is typically deployed as part of an OpenStack control plane (cloud infrastructure management). It runs as an OpenStack service with APIs for querying the graph, alarms, and RCA results. Operations teams use it to filter noisy alarm streams, identify probable root causes more quickly, and assess the impact of infrastructure failures on higher-level services. By correlating alarms from various sources with OpenStack topology, Vitrage supports analysis of cascading failures and cross-service dependencies.
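Impact assessment amounts to walking the dependency graph upward from a failed component. The sketch below does not call the Vitrage API; it assumes a hypothetical list of `(dependent, dependency)` pairs and uses a breadth-first search to collect everything that directly or transitively relies on the failed resource.

```python
from collections import defaultdict, deque


def impacted_by(failed, depends_on):
    """Return every entity that directly or transitively depends on `failed`.

    depends_on: iterable of (dependent, dependency) pairs, e.g. ("vm-1", "host-1").
    """
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for dependent, dependency in depends_on:
        dependents[dependency].add(dependent)

    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for d in dependents[node]:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen - {failed}  # guard against cycles leading back to the failed node


# Hypothetical stack: an application on two VMs, a database on a third.
deps = [("web-app", "vm-1"), ("web-app", "vm-2"), ("db", "vm-3"),
        ("vm-1", "host-1"), ("vm-2", "host-1"), ("vm-3", "host-2")]
print(sorted(impacted_by("host-1", deps)))  # ['vm-1', 'vm-2', 'web-app']
```

The `db` service is not reported because it depends only on `host-2`; this is the kind of scoping that helps separate real impact from unrelated alarm noise.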

Vitrage integrates with other OpenStack components and external systems through dedicated data sources and drivers (systems integration). Data sources can include resource inventory and topology services (for example Nova, Neutron, and Cinder), alarm providers (for example Aodh, Nagios, or Zabbix), and event streams. Vitrage aligns this incoming data with its internal model to maintain an updated representation of the environment. This interoperability allows organizations to leverage existing monitoring and alerting tools while adding correlation and RCA capabilities on top.
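The role of a data source driver can be illustrated with a small normalization layer. Everything here is hypothetical: the `Driver` base class, the two drivers, their raw input formats, and the `merge` helper are assumptions for illustration and do not mirror Vitrage's actual driver classes. The point is that each source keeps its own format while the graph only sees one `Entity` shape, and source timestamps let newer snapshots win over stale ones.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Entity:
    """Normalized vertex shared by all hypothetical drivers."""
    vid: str
    kind: str         # "resource" or "alarm"
    entity_type: str
    attrs: dict
    sampled_at: int   # source timestamp, used to discard stale updates


class Driver(ABC):
    """Hypothetical driver contract: turn one raw source record into an Entity."""

    @abstractmethod
    def normalize(self, raw: dict) -> Entity: ...


class InventoryDriver(Driver):
    """Normalizes records shaped like a compute inventory listing (assumed format)."""

    def normalize(self, raw):
        return Entity(vid=raw["id"], kind="resource", entity_type="instance",
                      attrs={"name": raw["name"], "host": raw["host"]},
                      sampled_at=raw["ts"])


class MonitorDriver(Driver):
    """Normalizes alerts shaped like an external monitor's payload (assumed format)."""

    def normalize(self, raw):
        return Entity(vid=f"alarm:{raw['check']}:{raw['target']}", kind="alarm",
                      entity_type=raw["check"],
                      attrs={"on": raw["target"], "severity": raw["level"].upper()},
                      sampled_at=raw["time"])


def merge(graph, entity):
    """Upsert into the graph, ignoring updates older than what is stored."""
    current = graph.get(entity.vid)
    if current is None or entity.sampled_at >= current.sampled_at:
        graph[entity.vid] = entity


graph = {}
inventory, monitor = InventoryDriver(), MonitorDriver()
merge(graph, inventory.normalize({"id": "vm-1", "name": "web", "host": "host-1", "ts": 100}))
merge(graph, monitor.normalize({"check": "ping", "target": "vm-1", "level": "critical", "time": 105}))
# A late, older snapshot from the inventory source is ignored.
merge(graph, inventory.normalize({"id": "vm-1", "name": "web", "host": "host-2", "ts": 90}))
print(sorted(graph))  # ['alarm:ping:vm-1', 'vm-1']
```

Adding a new monitoring tool in this design means writing one more `normalize` method; the graph and any correlation logic built on it stay unchanged.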

From a directory and taxonomy perspective, Vitrage fits into the categories of observability, incident analysis, and cloud operations analytics. It functions as a correlation and RCA layer for OpenStack clouds, using graph modeling, rules, and templates to interpret alarms and events in context. For enterprises running OpenStack, it provides a structured way to reduce alarm noise, understand dependencies, and improve operational visibility across heterogeneous infrastructure and services.