Kuberhealthy

Kuberhealthy is a Kubernetes-native synthetic monitoring system (observability) that runs periodic, test-like workloads inside a cluster to detect and report operational issues from the perspective of applications.

  • Provides Kubernetes-native synthetic checks that run as pods and validate cluster and application health (observability).
  • Exposes check results and metrics via Prometheus-compatible endpoints for ingestion by monitoring stacks (monitoring/metrics).
  • Uses Kubernetes Custom Resource Definitions (CRDs) to configure and manage checks declaratively (Kubernetes platform operations).
  • Supports custom check containers so teams can implement domain-specific health tests (extensibility/tooling).
  • Integrates with Kubernetes deployments and namespaces to validate core components such as Domain Name System (DNS), networking, and Application Programming Interface (API) behavior (infrastructure reliability).

More About Kuberhealthy

Kuberhealthy is a synthetic monitoring framework (observability) designed to run inside Kubernetes clusters and continuously verify that cluster services and workloads behave as expected. Instead of only scraping system metrics, it executes real workloads as checks, observing the cluster from the perspective of applications and users. The project is hosted under the Cloud Native Computing Foundation (foundation) and follows Kubernetes conventions for deployment and configuration.

The core of Kuberhealthy consists of a controller and a set of checks (observability/health checking). Checks are implemented as Kubernetes workloads—typically pods or jobs—that run on a schedule to perform specific tasks such as creating resources, resolving DNS, or performing Hypertext Transfer Protocol (HTTP) operations. Each check reports its status, including success or failure and any error text, back to the Kuberhealthy controller. This model allows platform and Site Reliability Engineering (SRE) teams to identify failures that may not be visible in static metrics, such as issues with scheduling, networking, or cluster-level services.
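The check-and-report cycle described above can be illustrated with a minimal Python sketch of a DNS check. It rests on several assumptions: the controller's reporting address is read from an environment variable named `KH_REPORTING_URL`, the report is a JSON object with `OK` and `Errors` fields, and the hostname is only an example. The real reporting protocol may require more than this, such as identifying headers, so the project's documentation should be treated as the authority.

```python
import json
import os
import socket
import urllib.request


def run_dns_check(hostname):
    """Return a list of error strings; an empty list means the check passed."""
    try:
        socket.getaddrinfo(hostname, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"DNS lookup for {hostname} failed: {exc}"]
    return []


def build_report(errors):
    """Build the status payload: a success flag plus any error text.

    The field names "OK" and "Errors" are assumptions made for this sketch.
    """
    return {"OK": not errors, "Errors": list(errors)}


def send_report(report, url):
    """POST the report as JSON to the controller's reporting URL."""
    request = urllib.request.Request(
        url,
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Example hostname; a real check would take this from its configuration.
    errors = run_dns_check("kubernetes.default.svc.cluster.local")
    report = build_report(errors)
    # KH_REPORTING_URL is assumed to be injected into the check pod.
    reporting_url = os.environ.get("KH_REPORTING_URL")
    if reporting_url:
        send_report(report, reporting_url)
    else:
        print(json.dumps(report))
```

Keeping the check logic, the payload construction, and the HTTP call in separate functions lets the first two be tested without a cluster or a running controller.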

Kuberhealthy uses Kubernetes Custom Resource Definitions (CRDs) (Kubernetes extensibility) to define and manage checks declaratively. Administrators create custom resources that specify which check image to run, how often to run it, and which configuration parameters to pass. The controller watches these custom resources, orchestrates the check pods, and tracks their results. This CRD-based approach aligns with GitOps and Infrastructure-as-Code (IaC) workflows, where check definitions can be stored, versioned, and reviewed alongside application manifests.
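As a rough illustration of what such a check definition might contain, the Python sketch below builds a manifest as a dictionary and serializes it to JSON, which `kubectl apply -f` accepts alongside YAML. The API group `comcast.github.io/v1`, the kind `KuberhealthyCheck`, and the `runInterval`, `timeout`, and `podSpec` fields are assumptions that should be verified against the installed CRD; the image name and environment variable are hypothetical.

```python
import json


def make_check(name, namespace, image, run_interval, timeout, env=None):
    """Build a check manifest as a plain dictionary.

    The apiVersion, kind, and spec field names are assumptions for this
    sketch and should be compared with the CRD installed in the cluster.
    """
    env_list = [{"name": k, "value": v} for k, v in sorted((env or {}).items())]
    return {
        "apiVersion": "comcast.github.io/v1",
        "kind": "KuberhealthyCheck",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "runInterval": run_interval,  # how often the check runs
            "timeout": timeout,           # how long one run may take
            "podSpec": {
                "containers": [
                    {"name": name, "image": image, "env": env_list},
                ],
            },
        },
    }


if __name__ == "__main__":
    check = make_check(
        name="dns-internal",
        namespace="kuberhealthy",
        image="registry.example.com/checks/dns:1.0",  # hypothetical image
        run_interval="2m",
        timeout="1m",
        env={"HOSTNAME_TO_RESOLVE": "kubernetes.default"},  # hypothetical
    )
    print(json.dumps(check, indent=2))
```

Because the output is ordinary structured data, it can be committed to a repository and reviewed like any other manifest.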

The system exposes check results and operational data as Prometheus metrics (monitoring/metrics), enabling integration with existing observability stacks. Each check’s status is represented as metric values that can be scraped by Prometheus and visualized in dashboards or used to trigger alerts in tools that consume Prometheus-format data. Kuberhealthy also provides an HTTP status endpoint that summarizes overall health from all configured checks.
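The aggregation idea can be sketched in a few lines of Python: per-check results are folded into one overall status and also rendered as gauge samples in the Prometheus text exposition format. This is an illustration of the concept rather than Kuberhealthy's implementation. The `OK`, `Errors`, and `CheckDetails` field names are assumptions, and the metric name `example_check_ok` is hypothetical, not the name Kuberhealthy exports.

```python
def summarize(check_results):
    """Fold per-check results into a single overall status.

    check_results maps a check name to {"OK": bool, "Errors": [str, ...]}.
    The field names are assumptions made for this sketch.
    """
    errors = []
    for name in sorted(check_results):
        result = check_results[name]
        if not result.get("OK", False):
            # Fall back to a generic message when a failing check has no error text.
            messages = result.get("Errors") or ["check failed without error text"]
            errors.extend(f"{name}: {message}" for message in messages)
    return {"OK": not errors, "Errors": errors, "CheckDetails": check_results}


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_metric_lines(check_results, metric="example_check_ok"):
    """Render one gauge sample per check: 1 for passing, 0 for failing."""
    lines = [f"# TYPE {metric} gauge"]
    for name in sorted(check_results):
        value = 1 if check_results[name].get("OK", False) else 0
        lines.append(f'{metric}{{check="{escape_label(name)}"}} {value}')
    return lines


if __name__ == "__main__":
    results = {
        "dns": {"OK": True, "Errors": []},
        "deployment": {"OK": False, "Errors": ["rollout timed out"]},
    }
    print(summarize(results)["OK"])
    print("\n".join(to_metric_lines(results)))
```

An alert rule would then typically fire when a check's gauge stays at 0 for longer than a few run intervals, which avoids paging on a single transient failure.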

Kuberhealthy supports built-in checks for common cluster behaviors (observability), such as DNS resolution or Kubernetes API interactions, and it supports custom check images for organization-specific scenarios. Teams can write checks in their preferred language, package them as containers, and register them by creating custom resources. This extensible model allows enterprises to standardize synthetic monitoring patterns across multiple clusters, environments, and namespaces.

In enterprise environments, Kuberhealthy is typically deployed as a cluster-level monitoring component (platform reliability). It runs in one or more namespaces and is granted the permissions its checks need to create and remove Kubernetes resources. Its alignment with Kubernetes-native APIs, CRDs, and Prometheus metrics places it in the observability and platform operations category, where it complements metric collection, logging, and tracing tools by continuously validating that the cluster can perform critical operations under real conditions.