OpenStack Vitrage - Decision Insights

OpenStack Vitrage is an OpenStack service for Root Cause Analysis (RCA) and deduced alarms across cloud infrastructures (observability/operations analytics).

Topological analysis of OpenStack resources and their relationships (cloud infrastructure modeling).
Correlation of alarms and events from multiple monitors into contextualized views (observability/monitoring).
RCA of faults to distinguish primary failures from derivative problems (operations analytics).
Definition and execution of business or operational scenarios over the resource graph (policy-driven operations).
Integration with other OpenStack services and external monitoring tools through plugins and drivers (platform integration).

More About OpenStack Vitrage

OpenStack Vitrage is a service within the OpenStack ecosystem that focuses on RCA (operations analytics) and deduced alarms for OpenStack-based clouds. A deduced alarm is one that Vitrage infers from the topology and from other alarms, rather than one reported directly by a monitor. Vitrage addresses the problem of understanding how alarms, events, and resource states across a complex OpenStack deployment relate to each other, so that operators can identify the underlying cause of an incident instead of treating each alert as an isolated issue.

Vitrage builds and maintains a topology graph of the cloud environment (cloud infrastructure modeling), representing resources such as compute instances, hosts, volumes, networks, and their dependencies. This graph can incorporate data from multiple OpenStack services, as well as external monitoring systems, and serves as the foundation for correlation, reasoning, and policy evaluation. The topology view allows operators and automated workflows to see how faults in one component may propagate to dependent components.
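The propagation idea described above can be sketched in a few lines of Python. This is a minimal illustration and not Vitrage's own graph API: the `DEPENDENTS` map, the resource names, and the `affected_by` helper are all hypothetical, and the graph is a plain dictionary rather than Vitrage's entity graph.

```python
from collections import deque

# Hypothetical topology: each key maps a resource to the resources
# that depend on it. Names are illustrative, not real Vitrage data.
DEPENDENTS = {
    "network-1": ["host-1"],
    "host-1": ["instance-a", "instance-b"],
    "volume-x": ["instance-a"],
    "instance-a": [],
    "instance-b": [],
}


def affected_by(resource, dependents=DEPENDENTS):
    """Return every resource a fault on `resource` could reach, in BFS order."""
    seen = []
    queue = deque(dependents.get(resource, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another path
        seen.append(node)
        queue.extend(dependents.get(node, []))
    return seen


print(affected_by("network-1"))
print(affected_by("volume-x"))
```

A breadth-first walk is used here only because it lists the closest dependents first; any graph traversal would show the same reachability.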

The service consumes alarms and events from various sources (observability/monitoring), such as OpenStack services and third-party monitoring tools. Using predefined templates and rules, Vitrage correlates these alarms with the topology graph to raise deduced alarms and to identify probable root causes. This correlation separates primary faults from secondary symptoms, which reduces noise and focuses operations teams on remediation steps that address the source of the problem.
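As a rough sketch of this correlation step, the following Python fragment takes a list of monitor alarms and a containment map, raises a deduced alarm on every instance hosted by a failed host, and links existing instance alarms back to the host alarm as their probable root cause. The alarm names (`host_down`, `instance_at_risk`, `instance_unreachable`), the `CONTAINS` map, and the `correlate` function are assumptions made for illustration; they are not taken from Vitrage's code or templates.

```python
# Hypothetical containment map and monitor output; names are illustrative.
CONTAINS = {"host-1": ["instance-a", "instance-b"]}
RAW_ALARMS = [
    {"resource": "host-1", "name": "host_down"},
    {"resource": "instance-a", "name": "instance_unreachable"},
]


def correlate(raw_alarms, contains):
    """Return (deduced alarms, map of symptom alarm -> root-cause alarm)."""
    deduced = []
    root_causes = {}
    for alarm in raw_alarms:
        if alarm["name"] != "host_down":
            continue  # this sketch only knows one rule: host failure
        root = (alarm["resource"], alarm["name"])
        children = contains.get(alarm["resource"], [])
        # Raise a deduced alarm on every instance the failed host contains.
        for child in children:
            deduced.append(
                {"resource": child, "name": "instance_at_risk", "caused_by": root}
            )
        # Mark monitor alarms on those instances as symptoms of the host fault.
        for other in raw_alarms:
            if other["resource"] in children:
                root_causes[(other["resource"], other["name"])] = root
    return deduced, root_causes


deduced, root_causes = correlate(RAW_ALARMS, CONTAINS)
print(deduced)
print(root_causes)
```

In this example the operator would see one primary fault (`host_down` on `host-1`) and would know that the `instance_unreachable` alarm is a derivative symptom, not a separate incident.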

Vitrage supports a template-based mechanism for modeling business or operational scenarios (policy-driven operations). Templates describe how specific conditions in the topology and alarm set should be interpreted, including which alarms to generate, how to classify root cause versus dependent issues, and which actions or notifications to trigger. This approach allows enterprises to encode domain knowledge and operational policies directly into the analysis engine.
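To make the template idea concrete, here is a small Python sketch in which a scenario is plain data (a condition plus a list of actions) and a generic function evaluates it against an alarm and a topology. The dictionary layout, the field names (`condition`, `actions`, `kind`, `target_type`, `channel`), and the `evaluate` function are hypothetical and do not reproduce Vitrage's actual template syntax; consult the Vitrage template documentation for the real format.

```python
# Hypothetical scenario template expressed as data. This is NOT the real
# Vitrage template format; it only illustrates the condition/action split.
TEMPLATE = {
    "name": "host_down_affects_instances",
    "condition": {"alarm": "host_down", "resource_type": "host"},
    "actions": [
        {"kind": "raise_alarm", "name": "instance_at_risk", "target_type": "instance"},
        {"kind": "notify", "channel": "ops-team"},
    ],
}

# Hypothetical topology: resource id -> type and contained resources.
TOPOLOGY = {
    "host-1": {"type": "host", "children": ["instance-a"]},
    "instance-a": {"type": "instance", "children": []},
}


def evaluate(template, alarm, topology):
    """Return the actions the template requests for `alarm`, or [] if no match."""
    cond = template["condition"]
    resource = topology[alarm["resource"]]
    if alarm["name"] != cond["alarm"] or resource["type"] != cond["resource_type"]:
        return []
    requested = []
    for action in template["actions"]:
        if action["kind"] == "raise_alarm":
            # Apply the alarm to each child of the matching type.
            for child in resource["children"]:
                if topology[child]["type"] == action["target_type"]:
                    requested.append(("raise_alarm", action["name"], child))
        elif action["kind"] == "notify":
            requested.append(("notify", action["channel"], alarm["resource"]))
    return requested


print(evaluate(TEMPLATE, {"resource": "host-1", "name": "host_down"}, TOPOLOGY))
```

Keeping the scenario as data rather than code is the point of the design: operators can add or change policies without modifying the evaluation engine.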

In enterprise environments, Vitrage operates as an integrated OpenStack service (cloud management), communicating with core components through standard OpenStack APIs and messaging. It can interoperate with services such as telemetry, orchestration, and other monitoring or alarming tools, making it part of broader operations and management workflows. Its plugin-based architecture (platform integration) allows operators to extend topology sources, alarm sources, and actions, aligning the service with existing monitoring stacks and IT service management processes.
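One common way to structure such a plugin mechanism is a registry of classes that share a small interface. The sketch below assumes that pattern for alarm sources; the `AlarmSource` base class, the `register` decorator, and the `StaticAlarmSource` plugin are hypothetical names chosen for illustration and are not Vitrage's real datasource or driver classes.

```python
from abc import ABC, abstractmethod

# Hypothetical plugin registry: plugin name -> plugin class.
REGISTRY = {}


class AlarmSource(ABC):
    """Minimal interface every (hypothetical) alarm-source plugin implements."""

    @abstractmethod
    def fetch(self):
        """Return a list of alarm dictionaries."""


def register(name):
    """Class decorator that records a plugin under `name`."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("static")
class StaticAlarmSource(AlarmSource):
    """Toy plugin that returns a fixed list, standing in for a real monitor."""

    def __init__(self, alarms):
        self._alarms = list(alarms)

    def fetch(self):
        return list(self._alarms)


def collect(sources):
    """Merge alarms from every configured source into one list."""
    alarms = []
    for source in sources:
        alarms.extend(source.fetch())
    return alarms


source = REGISTRY["static"]([{"resource": "host-1", "name": "host_down"}])
print(collect([source]))
```

With this shape, supporting a new monitoring tool means adding one class that implements `fetch`; the collection and correlation code stays unchanged.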

From a directory perspective, OpenStack Vitrage fits within observability, fault management, and RCA for OpenStack clouds. It combines infrastructure modeling, alarm correlation, and policy execution to support operators in diagnosing issues and managing complex cloud deployments.