Thanos

Thanos is an open-source, CNCF-hosted project that provides a highly available, multi-tenant, long-term storage and query layer for Prometheus metrics (observability/monitoring).

  • Global query view across multiple Prometheus deployments and historical metric data (observability/monitoring)
  • Long-term, durable storage for Prometheus time-series in object storage systems such as S3-compatible backends (data storage/observability)
  • High-availability metric ingestion and querying through redundant components and deduplication of replicas (reliability/observability)
  • Modular components including sidecar, store, compactor, ruler, and query for composable architectures (observability/platform infrastructure)
  • Multi-tenant and multi-cluster metric aggregation for centralized monitoring environments (observability/monitoring)

More About Thanos

Thanos addresses the problem of scaling Prometheus-based monitoring (observability/monitoring) across multiple clusters, environments, and long retention periods while maintaining a global, consistent query layer. Prometheus by design focuses on local scraping and short- to medium-term retention; Thanos extends these capabilities to support durable storage, federated queries, and high availability without discarding the Prometheus data model or query language.

At its core, Thanos introduces a set of components that integrate with existing Prometheus servers. The Thanos Sidecar (observability/agent) runs alongside each Prometheus instance, uploads its time-series blocks to an object storage backend, and exposes the instance's data through the gRPC Store API (network protocol/observability). The Thanos Store Gateway (observability/service) reads historical data from object storage and serves it to query components. The Thanos Query component (observability/query engine) aggregates data from sidecars, store gateways, and other sources to provide a global query view using PromQL (query language/observability).
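The fan-out described above can be pictured with a small toy model. This is a sketch under stated assumptions, not how Thanos is implemented: `ToyStore` and `fan_out` are hypothetical names, the real Store API is a gRPC service rather than an in-memory class, and timestamps here are arbitrary integers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for any component that serves series data."""
    name: str
    min_time: int  # earliest timestamp this store holds
    max_time: int  # latest timestamp this store holds
    samples: dict = field(default_factory=dict)  # metric name -> [(ts, value)]

    def series(self, metric, start, end):
        # Return only the samples that fall inside the requested range.
        return [(t, v) for t, v in self.samples.get(metric, []) if start <= t <= end]


def fan_out(stores, metric, start, end):
    """Ask every store whose time range overlaps [start, end], then merge."""
    merged = {}
    for store in stores:
        if store.max_time < start or store.min_time > end:
            continue  # this store cannot hold data for the requested range
        for t, v in store.series(metric, start, end):
            merged.setdefault(t, v)  # keep the first value seen per timestamp
    return sorted(merged.items())


# Recent data lives next to Prometheus; older data lives in object storage.
sidecar = ToyStore("sidecar", 100, 200, {"up": [(100, 1.0), (150, 1.0), (200, 0.0)]})
gateway = ToyStore("store-gateway", 0, 100, {"up": [(0, 1.0), (50, 1.0), (100, 1.0)]})

result = fan_out([sidecar, gateway], "up", 25, 175)
print(result)
```

The point of the sketch is that the caller issues one query and does not need to know which backend holds which part of the time range.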

Thanos also includes the Compactor (data management/observability), which compacts and downsamples historical blocks in object storage to control storage usage and improve query performance over long time ranges. The Ruler component (alerting/observability) evaluates recording and alerting rules over the data stored in Thanos, supporting rule evaluation beyond a single Prometheus instance. These components interact through well-defined APIs and are typically deployed as containerized services in Kubernetes (container orchestration/platform), though the architecture is not limited to Kubernetes environments.
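To make the idea of downsampling concrete, the following minimal sketch groups raw samples into fixed windows and keeps a few aggregates per window. The function name `downsample`, the choice of aggregates, and the five-minute window are illustrative assumptions; the Compactor's actual block format and algorithm are more involved.

```python
def downsample(samples, window_ms):
    """Group (timestamp_ms, value) pairs into fixed windows and aggregate them."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window_ms  # align the timestamp to its window start
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return [(start, buckets[start]) for start in sorted(buckets)]


FIVE_MINUTES_MS = 5 * 60 * 1000  # illustrative window size

# One sample per minute; the first three fall in window 0, the rest in window 1.
raw = [(0, 1.0), (60_000, 3.0), (120_000, 2.0), (300_000, 5.0), (360_000, 7.0)]
coarse = downsample(raw, FIVE_MINUTES_MS)

for start, agg in coarse:
    print(start, agg)
```

Keeping `count` and `sum` rather than a precomputed average means an average over any larger range can still be derived correctly from the coarse data.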

In enterprise and institutional environments, Thanos is used to centralize monitoring data from multiple Kubernetes clusters, data centers, or regions. Organizations deploy Thanos to achieve durable metric retention in object stores such as S3-compatible services, Google Cloud Storage (GCS), or other supported backends (object storage/data infrastructure). This enables uniform PromQL queries across live and historical data, independent of where the metrics were originally scraped. Its deduplication feature makes it possible to run redundant Prometheus pairs for high availability without double-counting metrics.
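The deduplication of redundant Prometheus pairs can be illustrated with a simplified sketch. It assumes each replica tags its series with a label named `replica` (the label name is an assumption here; in Thanos the replica label is configurable on the query component). The function `deduplicate` is hypothetical, and the real merging logic handles gaps and scrape-timing differences far more carefully than this.

```python
def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in the replica label."""
    merged = {}
    for labels, samples in series_list:
        # Series identity is the label set with the replica label removed.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for ts, value in samples:
            bucket.setdefault(ts, value)  # first replica to report a timestamp wins
    return {key: sorted(points.items()) for key, points in merged.items()}


# Two replicas scrape the same target; each one missed a different scrape.
replica_a = ({"__name__": "up", "job": "api", "replica": "a"}, [(10, 1.0), (20, 1.0)])
replica_b = ({"__name__": "up", "job": "api", "replica": "b"}, [(10, 1.0), (30, 1.0)])

deduped = deduplicate([replica_a, replica_b])
for key, points in deduped.items():
    print(dict(key), points)
```

The result is a single series with three samples: the shared timestamp appears once, and each replica fills the gap left by the other.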

Thanos belongs in the observability and monitoring (observability) category, specifically as a scalable metrics storage and querying layer built around Prometheus. It interoperates closely with the Prometheus scraping model and remote storage ecosystem and is part of the Cloud Native Computing Foundation (open-source foundation/cloud-native). For enterprises, Thanos provides a structured way to build a multi-tenant, multi-cluster metrics platform with long-term retention, object-storage-based durability, and a consistent global query interface that aligns with existing Prometheus tooling and workflows.