Fluid

Fluid is an open-source Kubernetes data orchestration and acceleration project that virtualizes and manages data access for data-intensive applications in cloud-native environments (data orchestration / data access acceleration).

  • Dataset abstraction and lifecycle management on Kubernetes for data-intensive workloads (data orchestration).
  • Co-location of compute and data through node-local caching to reduce data access latency (data access acceleration).
  • Pluggable runtime engine framework integrating distributed cache and storage systems such as Alluxio and JindoFS, exposed through AlluxioRuntime and JindoRuntime (storage integration).
  • Data access optimization for Artificial Intelligence (AI), Machine Learning (ML), big data, and analytics workloads running on Kubernetes (AI/ML and analytics infrastructure).
  • Multi-tenant operation, namespace-scoped resources, and integration with Kubernetes APIs and controllers (Kubernetes-native data management).

More About Fluid

Fluid is a CNCF project that addresses data access bottlenecks for data-intensive workloads running on Kubernetes by orchestrating and accelerating data access through a Kubernetes-native abstraction called Dataset (data orchestration). It targets scenarios such as AI, ML, deep learning, big data processing, and analytics where containers need to read large volumes of data from various storage backends with predictable performance (AI/ML and analytics infrastructure).

At the core of Fluid is the Dataset custom resource, which describes data sources and caching policies and decouples application pods from underlying storage systems (storage virtualization). Fluid introduces Runtime custom resources that represent cache engines or data access runtimes deployed on Kubernetes nodes. Through these resources and associated controllers, Fluid manages the lifecycle of distributed cache clusters that sit between applications and remote storage, enabling node-local or cluster-local caching and reducing dependence on direct remote storage access (data access acceleration).
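The pairing of a Dataset with a Runtime can be sketched as a pair of manifests. The following is a minimal illustration, not a definitive configuration: it assumes the `data.fluid.io/v1alpha1` API group, an AlluxioRuntime with a single in-memory cache tier, and the convention that a Runtime is bound to the Dataset of the same name in the same namespace. The bucket URL, the names `demo` and `default`, and the helper function names are hypothetical, and a real object-storage mount would normally also need credentials and endpoint options.

```python
import json


def dataset_manifest(name, namespace, mount_point):
    """Build a Dataset manifest describing one remote data source."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API version
        "kind": "Dataset",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def alluxio_runtime_manifest(name, namespace, replicas, quota):
    """Build an AlluxioRuntime manifest with one memory-backed cache tier."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API version
        "kind": "AlluxioRuntime",
        # Same name and namespace as the Dataset, so the two can be bound.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": quota}
                ]
            },
        },
    }


# Hypothetical values chosen only for illustration.
dataset = dataset_manifest("demo", "default", "s3://example-bucket/train")
runtime = alluxio_runtime_manifest("demo", "default", replicas=2, quota="2Gi")

# kubectl accepts JSON as well as YAML, so a v1 List can carry both objects.
manifest_list = {"apiVersion": "v1", "kind": "List", "items": [dataset, runtime]}
print(json.dumps(manifest_list, indent=2))
```

Saving the printed JSON to a file and passing it to `kubectl apply -f` would be one way to submit both resources together, provided the Fluid CRDs are already installed in the cluster.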

Fluid supports a pluggable runtime engine architecture, with implementations such as AlluxioRuntime and JindoRuntime that integrate with corresponding distributed caching or storage acceleration systems (storage integration). These runtimes can mount data from object storage or distributed file systems and expose it to applications through POSIX-compatible interfaces or mount points inside containers, while managing caching, prefetching, and data eviction according to policies defined in the Dataset (storage access optimization).
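From the application side, the mount point described above typically appears as an ordinary volume. The sketch below assumes that Fluid exposes a bound Dataset through a PersistentVolumeClaim with the same name as the Dataset, so a pod only needs to reference that claim. The pod name, image, dataset name `demo`, and mount path are hypothetical.

```python
import json


def pod_manifest(pod_name, dataset_name, image, mount_path):
    """Build a Pod that reads a Dataset through its PersistentVolumeClaim."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    # The cached data shows up as a regular directory here.
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    # Assumption: the claim name matches the Dataset name.
                    "persistentVolumeClaim": {"claimName": dataset_name},
                }
            ],
        },
    }


pod = pod_manifest("trainer", "demo", "python:3.12-slim", "/data")
print(json.dumps(pod, indent=2))
```

The application code inside the container then reads files under `/data` with normal file APIs and does not need to know which storage backend or cache engine sits behind the mount.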

In enterprise Kubernetes clusters, Fluid is deployed via standard Kubernetes manifests or Helm charts and operates through controllers, Custom Resource Definitions (CRDs), and DaemonSets that coordinate cache daemons and sidecars on worker nodes (Kubernetes platform services). Platform teams can define Datasets and Runtimes in namespaces, bind them to applications, and use Kubernetes scheduling features to place workloads on nodes that host cached data. This placement reduces network traffic to external storage and stabilizes read performance for training jobs, batch analytics, and streaming consumers (resource and workload orchestration).
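The idea of aligning workloads with nodes that host cached data can be illustrated with a toy ranking function. This is a simplified model for intuition only and is not Fluid's scheduling logic: the node names, the `cached_fraction` and `free_cpu` inputs, and the tie-breaking rule are all assumptions made for the example.

```python
def rank_nodes(cached_fraction, free_cpu, cpu_request):
    """Rank feasible nodes, preferring those that hold more of the dataset.

    cached_fraction: node name -> fraction of the dataset cached there (0..1)
    free_cpu:        node name -> CPU cores currently available
    cpu_request:     CPU cores the workload needs
    """
    # Filter step: drop nodes that cannot fit the workload at all.
    feasible = [node for node, cpu in free_cpu.items() if cpu >= cpu_request]
    # Score step: more cached data first, node name as a stable tie-breaker.
    return sorted(feasible, key=lambda node: (-cached_fraction.get(node, 0.0), node))


# Hypothetical cluster state.
cached = {"node-a": 0.9, "node-b": 0.0, "node-c": 0.4}
free = {"node-a": 2, "node-b": 8, "node-c": 4}

print(rank_nodes(cached, free, cpu_request=1))  # ['node-a', 'node-c', 'node-b']
print(rank_nodes(cached, free, cpu_request=4))  # ['node-c', 'node-b']
```

In the second call the node with the most cached data is excluded because it lacks capacity, which shows why data locality is usually treated as a preference alongside resource constraints and not as a hard requirement.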

Fluid integrates with the broader cloud-native ecosystem by relying on Kubernetes APIs, Role-Based Access Control (RBAC), and namespace scoping, and by supporting multi-tenancy where different teams manage independent Datasets and Runtimes within the same cluster (multi-tenant data management). It provides mechanisms for data preloading, warm-up, and fine-grained configuration of cache capacity and topology, enabling operators to balance storage utilization and application performance (performance engineering tooling). Within a technical taxonomy, Fluid fits in the categories of Kubernetes data orchestration, data access acceleration, and cache-based data virtualization for AI/ML and big data platforms.
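Data preloading can likewise be expressed declaratively. The sketch below builds a manifest for a DataLoad resource that asks for selected paths of a Dataset to be warmed up before jobs start. The field names (`dataset`, `loadMetadata`, `target`, `path`, `replicas`) reflect the `v1alpha1` API as understood here and should be checked against the CRDs installed in the cluster; the resource names and paths are hypothetical.

```python
import json


def dataload_manifest(name, namespace, dataset_name, paths, replicas=1):
    """Build a DataLoad manifest that preloads the given paths of a Dataset."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API version
        "kind": "DataLoad",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "dataset": {"name": dataset_name, "namespace": namespace},
            # Sync file metadata from the backing store before loading data.
            "loadMetadata": True,
            # One entry per path to warm up, with the desired cache replicas.
            "target": [{"path": p, "replicas": replicas} for p in paths],
        },
    }


warmup = dataload_manifest(
    "demo-warmup", "default", "demo", ["/train", "/validation"], replicas=2
)
print(json.dumps(warmup, indent=2))
```

Limiting the targets to the directories a job actually reads is one way to balance cache capacity against warm-up time.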