Litmus
Litmus is an open-source, cloud-native chaos engineering platform (resilience testing) that validates the reliability of Kubernetes and other modern infrastructure through controlled fault injection and experiment orchestration.
- Chaos experiment orchestration and fault injection workflows for Kubernetes workloads (resilience testing).
- Chaos experiment catalog and reusable test scenarios for common Kubernetes and infrastructure failure modes (test libraries).
- Observability and result collection for chaos runs, including experiment status and system behavior (monitoring and reporting).
- Custom chaos experiment authoring and integration with Continuous Integration and Continuous Deployment (CI/CD) pipelines and GitOps workflows (DevOps automation).
- Extensible architecture with operators, CRDs, and integrations to run chaos at cluster, application, and infrastructure layers (cloud-native operations).
More About Litmus
Litmus is an open-source chaos engineering platform (resilience testing) under the Cloud Native Computing Foundation that validates the reliability of Kubernetes environments and related cloud-native infrastructure through controlled experiments. It targets enterprises that need to understand how applications and platforms behave under faults such as pod failures, node outages, resource pressure, or network issues, and it lets teams simulate these conditions systematically.
The project provides a framework for defining, scheduling, and observing chaos experiments (test orchestration) that run against Kubernetes clusters and associated services. At its core, Litmus uses Kubernetes-native constructs such as Custom Resource Definitions (CRDs) and operators (cloud-native control plane) to manage the lifecycle of chaos workflows. Experiments are described as Kubernetes resources, which allows them to be versioned, automated, and integrated into existing cluster management and DevOps practices.
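To make "experiments are described as Kubernetes resources" concrete, the sketch below builds a ChaosEngine-style manifest as a plain Python dictionary and prints it as JSON. The field names (`appinfo`, `engineState`, `chaosServiceAccount`, `experiments`) follow the ChaosEngine schema as commonly shown in Litmus documentation, but they should be checked against the Litmus version in use. The helper `build_chaos_engine`, the `checkout-chaos` engine, the `shop` namespace, and the `litmus-admin` service account are illustrative assumptions, not values from this page.

```python
import json


def build_chaos_engine(name, namespace, app_label, experiment, duration_seconds):
    """Return a ChaosEngine-style manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # "active" asks the operator to run the experiment; "stop" halts it.
            "engineState": "active",
            # Selects the workload that the fault is injected into.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Assumption: a service account with chaos permissions already exists.
            "chaosServiceAccount": "litmus-admin",
            "experiments": [
                {
                    "name": experiment,
                    "spec": {
                        "components": {
                            # Tunables are passed as string-valued env entries.
                            "env": [
                                {
                                    "name": "TOTAL_CHAOS_DURATION",
                                    "value": str(duration_seconds),
                                }
                            ]
                        }
                    },
                }
            ],
        },
    }


manifest = build_chaos_engine(
    name="checkout-chaos",
    namespace="shop",
    app_label="app=checkout",
    experiment="pod-delete",
    duration_seconds=30,
)
print(json.dumps(manifest, indent=2))
```

Because the result is ordinary structured data, it can be committed to version control and applied with the same tooling as any other Kubernetes resource, which is what makes the versioning and automation described above possible.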
Litmus includes a catalog of predefined chaos experiments (test libraries) that cover common failure scenarios, such as pod deletion, container kill, disk stress, Central Processing Unit (CPU) or memory stress, and network latency or loss, subject to the environments the project supports. These experiments are packaged to be reusable and composable, so platform and application teams can build more complex scenarios that reflect real production failure conditions. Because the experiments rely on standard Kubernetes resources and APIs (container orchestration), they can be applied across different clusters and distributions wherever Kubernetes is supported.
For enterprise environments, Litmus supports integration into CI/CD pipelines and GitOps workflows (DevOps automation), enabling chaos experiments to be run as part of continuous testing or pre-production validation. This supports practices where reliability checks are embedded into release processes, so regressions in resiliency can be detected alongside functional or performance issues. Litmus also exposes experiment status, results, and metrics (observability) that can be consumed by monitoring and logging systems, helping Site Reliability Engineering (SRE) and platform teams correlate failures injected by chaos tests with application and infrastructure behavior.
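A pipeline gate built on experiment results could look like the sketch below. It assumes the results have already been fetched as parsed JSON (for example from `kubectl get chaosresult -o json`) and that each ChaosResult carries its verdict at `status.experimentStatus.verdict` with values such as `Pass`, `Fail`, or `Awaited`. That layout matches ChaosResult resources as commonly documented, but it should be verified against the installed Litmus version. The function `gate_on_verdicts` and the sample result names are hypothetical.

```python
def gate_on_verdicts(chaos_results):
    """Return (exit_code, failures) for a list of ChaosResult-style dicts.

    exit_code is 0 only when every result reports the verdict "Pass".
    A missing verdict is treated as "Awaited", so unfinished runs fail the gate.
    """
    failures = []
    for result in chaos_results:
        name = result.get("metadata", {}).get("name", "<unnamed>")
        verdict = (
            result.get("status", {})
            .get("experimentStatus", {})
            .get("verdict", "Awaited")
        )
        if verdict != "Pass":
            failures.append((name, verdict))
    return (0 if not failures else 1), failures


# Hypothetical sample data standing in for results fetched from the cluster.
sample_results = [
    {
        "metadata": {"name": "checkout-chaos-pod-delete"},
        "status": {"experimentStatus": {"verdict": "Pass"}},
    },
    {
        "metadata": {"name": "checkout-chaos-pod-cpu-hog"},
        "status": {"experimentStatus": {"verdict": "Fail"}},
    },
]

exit_code, failures = gate_on_verdicts(sample_results)
for name, verdict in failures:
    print(f"chaos gate failed: {name} reported {verdict}")
```

In a real pipeline the returned code would typically be passed to `sys.exit` so that a failed or unfinished experiment blocks the release step, in the same way a failing functional test would.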
The project’s architecture is designed around Kubernetes operators, experiment CRDs, and workflows (cloud-native operations), which provides an extensible model for adding new fault types or integrating with additional infrastructure components where supported by Litmus. This extensibility is relevant for organizations that operate heterogeneous environments but use Kubernetes as a control plane. Within the enterprise tooling landscape, Litmus is a chaos engineering and resilience testing platform for cloud-native ecosystems, particularly Kubernetes-based platforms, that makes structured failure testing part of reliability engineering and platform operations practices.