Masakari
Masakari is an OpenStack service (infrastructure resiliency) that provides automated recovery of Virtual Machine (VM) workloads when compute hosts fail in an OpenStack cloud.
- Automated instance recovery and failover for OpenStack compute host failures (infrastructure resiliency)
- Detection and handling of Nova compute host failures, process faults, and instances in an error state (cloud operations)
- Integration with OpenStack Nova and other controller services through APIs and notifications (cloud orchestration)
- Configurable recovery workflows for instance evacuation, restart, and shutdown actions (infrastructure automation)
- Support for high availability designs in OpenStack-based private and public clouds (cloud infrastructure)
More About Masakari
Masakari is an OpenStack project (infrastructure resiliency) focused on providing automated recovery for VM instances when compute hosts encounter failures in an OpenStack environment. It addresses scenarios where the underlying Nova compute node or associated services become unavailable, and instances on that node must be recovered or evacuated to maintain workload continuity.
The service operates by monitoring the health of compute hosts and related processes (cloud monitoring) and triggering recovery workflows when faults are detected. Masakari works with OpenStack Nova (compute management) to identify affected instances and to execute actions such as instance evacuation, restart, or power-off, depending on configuration and failure type. This enables operators to reduce manual intervention during host failures and to enforce consistent recovery behavior across the cloud.
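The mapping from a detected fault to a recovery action can be sketched as a small policy lookup. This is a minimal illustration, not Masakari's engine code: the fault kinds, action labels, `Fault` class, and `plan_recovery` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Illustrative fault kinds and actions; these labels are assumptions for the
# sketch, not Masakari configuration values.
DEFAULT_POLICY = {
    "host": "evacuate",
    "instance": "restart",
    "process": "restart",
}


@dataclass(frozen=True)
class Fault:
    kind: str    # "host", "instance", or "process"
    target: str  # host name or instance ID the fault was reported for


def plan_recovery(fault, instances_by_host, policy=DEFAULT_POLICY):
    """Return (target, action) pairs for one detected fault."""
    # Unknown fault kinds fall back to the most conservative action.
    action = policy.get(fault.kind, "power_off")
    if fault.kind == "host":
        # A host failure affects every instance placed on that host.
        targets = instances_by_host.get(fault.target, [])
    else:
        # Instance and process faults act on the reported target only.
        targets = [fault.target]
    return [(target, action) for target in targets]


placement = {"compute-1": ["vm-a", "vm-b"], "compute-2": ["vm-c"]}
for step in plan_recovery(Fault("host", "compute-1"), placement):
    print(step)
```

Keeping the policy in a plain dictionary means the same planning function can serve different failure types and operator preferences without code changes.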
Masakari consists of an Application Programming Interface (API) service, an engine, and notification mechanisms (cloud services architecture). The API service receives failure notifications, usually from monitoring systems or agents, and the engine coordinates the appropriate recovery steps based on defined policies. Recovery actions are carried out through Nova and other core OpenStack services using their public APIs. This design lets Masakari fit into existing OpenStack control plane deployments without manipulating the underlying hypervisors beyond what Nova exposes.
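A failure report sent to the API service might look like the body built below. This is a sketch under stated assumptions: the field names and values are modeled on the host notification example in the Masakari API reference and should be verified against the deployed release, and `build_host_notification` is a hypothetical helper. The example only constructs the JSON body and sends nothing over the network.

```python
import json
from datetime import datetime, timezone


def build_host_notification(hostname, event="STOPPED", generated_time=None):
    """Build a JSON-serializable body for a compute-host failure report."""
    if generated_time is None:
        generated_time = datetime.now(timezone.utc)
    return {
        "notification": {
            # Assumed notification type for a failed compute host.
            "type": "COMPUTE_HOST",
            "hostname": hostname,
            "generated_time": generated_time.isoformat(),
            # Assumed payload keys; check them against the API reference.
            "payload": {
                "event": event,
                "host_status": "NORMAL",
                "cluster_status": "OFFLINE",
            },
        }
    }


body = build_host_notification(
    "compute-1", generated_time=datetime(2024, 1, 1, tzinfo=timezone.utc)
)
print(json.dumps(body, indent=2))
```

A monitoring agent would send a body like this to the notifications endpoint of the Masakari API with a valid authentication token, after which the engine takes over the recovery steps.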
In enterprise and institutional environments, Masakari is used to support high availability requirements for virtualized workloads running on OpenStack clouds (enterprise infrastructure). Operators configure host monitoring, define which instances are protected, and specify recovery priorities and behaviors. The project is suited to private, public, and hybrid OpenStack deployments where host-level failures must be addressed within automated operational runbooks instead of ad hoc manual procedures.
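Selecting protected instances and ordering them for recovery can be sketched as a filter followed by a sort. The example is illustrative only: the `HA_Enabled` metadata key is assumed here as the marker for protected instances, and the `priority` field and `select_protected` function are hypothetical and not part of any Masakari or Nova API.

```python
def select_protected(instances, metadata_key="HA_Enabled"):
    """Return protected instances, highest recovery priority first."""
    protected = [
        inst
        for inst in instances
        # Treat an instance as protected when the flag is the string "true".
        if str(inst.get("metadata", {}).get(metadata_key, "")).lower() == "true"
    ]
    # "priority" is a made-up field for this sketch; missing values count as 0.
    return sorted(protected, key=lambda inst: inst.get("priority", 0), reverse=True)


instances = [
    {"id": "vm-a", "metadata": {"HA_Enabled": "True"}, "priority": 1},
    {"id": "vm-b", "metadata": {}, "priority": 5},
    {"id": "vm-c", "metadata": {"HA_Enabled": "true"}, "priority": 3},
]
print([inst["id"] for inst in select_protected(instances)])
```

Here `vm-b` is skipped because it carries no protection flag, and the remaining instances are ordered so that the higher-priority workload is recovered first.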
Masakari interacts with other OpenStack components through standard Representational State Transfer (REST) APIs and OpenStack messaging patterns (cloud integration). It fits into architectures that also use external monitoring tools or agents to detect host failures and forward events into the Masakari API. Because it focuses on host and instance recovery workflows rather than basic compute scheduling, Masakari is generally positioned alongside Nova as an add-on service to enhance resilience for workloads that require higher availability.
From a directory and taxonomy perspective, Masakari can be categorized under OpenStack ecosystem services, focusing on high availability and automated recovery for VM instances (cloud infrastructure, infrastructure resiliency, operations automation). It is relevant to platform engineers, cloud operators, and Site Reliability Engineering (SRE) teams responsible for continuity of virtualized applications on OpenStack-based platforms.