CubeFS

CubeFS is a cloud-native Distributed File System (DFS) (storage infrastructure) designed to provide scalable, high-throughput, and highly available storage for containerized and data-intensive workloads.

  • DFS with separation of metadata and data services (storage infrastructure).
  • POSIX-compatible access through a FUSE-based client, supporting hierarchical directories and standard file operations (file storage).
  • Object storage compatibility with S3-like interfaces for cloud-native applications (object storage).
  • Kubernetes integration for persistent volumes and containerized workloads (container storage).
  • Erasure coding and data replication capabilities for fault tolerance and durability (data protection).

More About CubeFS

CubeFS is an open-source DFS (storage infrastructure) under the Cloud Native Computing Foundation (CNCF) that targets large-scale, cloud-native environments. It is designed for scenarios such as big data analytics, Artificial Intelligence (AI) and Machine Learning (ML) workloads, media processing, and general-purpose shared storage where applications require a POSIX-style file system or S3-compatible object access with horizontal scalability and high throughput.

The architecture of CubeFS separates metadata and data into distinct services (distributed storage architecture). Dedicated metadata nodes handle directory hierarchies, file attributes, and namespace operations, while distributed data nodes store file contents in data partitions. Because the two layers are separate, metadata operations and data I/O scale independently, and cluster operators can size and tune each layer according to workload characteristics.
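The separation described above can be illustrated with a toy in-memory model. This is only a sketch of the general pattern, not CubeFS code: the class names, the round-robin placement, and the `piece_size` parameter are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataService:
    """Toy namespace: maps a path to its size and the locations of its pieces."""
    namespace: dict = field(default_factory=dict)

    def create(self, path, size, locations):
        self.namespace[path] = {"size": size, "locations": locations}

    def lookup(self, path):
        return self.namespace[path]


@dataclass
class DataNode:
    """Toy data node: stores opaque byte pieces keyed by an identifier."""
    pieces: dict = field(default_factory=dict)

    def put(self, piece_id, payload):
        self.pieces[piece_id] = payload

    def get(self, piece_id):
        return self.pieces[piece_id]


def write_file(meta, nodes, path, payload, piece_size=4):
    """Split the payload, spread pieces round-robin, then record the layout."""
    locations = []
    for index, offset in enumerate(range(0, len(payload), piece_size)):
        node_id = index % len(nodes)  # hypothetical placement policy
        piece_id = f"{path}#{index}"
        nodes[node_id].put(piece_id, payload[offset:offset + piece_size])
        locations.append((node_id, piece_id))
    meta.create(path, len(payload), locations)


def read_file(meta, nodes, path):
    """Ask the metadata service for the layout, then fetch from data nodes."""
    entry = meta.lookup(path)
    return b"".join(nodes[node_id].get(piece_id)
                    for node_id, piece_id in entry["locations"])
```

In this model the metadata service never touches file bytes and the data nodes never see paths, which is why either side could be given more nodes without changing the other.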

CubeFS supports POSIX semantics through a FUSE-based client (file storage), so applications can mount a volume and access it like a traditional file system with standard operations such as read, write, rename, and directory traversal. In addition, CubeFS exposes S3-compatible object storage interfaces (object storage), allowing cloud-native and microservice-based applications that already integrate with S3 APIs to store and retrieve data without changes to application code.
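Because the file interface is POSIX-style, ordinary standard-library file calls should work unchanged against a mounted volume. The sketch below exercises the operations named above (write, rename, read, directory traversal) under a root directory. The file names are made up, and the demo runs against a temporary local directory; pointing `root` at an actual mount point (for example a hypothetical `/mnt/cubefs`) is an assumption left to the reader.

```python
import os
import tempfile


def exercise_posix_ops(root):
    """Run write, rename, read, and directory traversal under `root`."""
    data_dir = os.path.join(root, "dataset", "raw")
    os.makedirs(data_dir, exist_ok=True)

    tmp_path = os.path.join(data_dir, "part-0000.tmp")
    final_path = os.path.join(data_dir, "part-0000.csv")

    # Write to a temporary name first, then rename into place.
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,42\n")
    os.rename(tmp_path, final_path)

    with open(final_path, encoding="utf-8") as f:
        content = f.read()

    # Walk the tree and collect file paths relative to the root.
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return content, sorted(found)


if __name__ == "__main__":
    # A temporary directory stands in for the mount point in this demo.
    with tempfile.TemporaryDirectory() as root:
        print(exercise_posix_ops(root))
```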

The system provides data protection through replication and erasure coding mechanisms (data protection). Replication maintains multiple copies of data across nodes for availability, while erasure coding reduces storage overhead for cold or archival workloads by splitting data into fragments with parity. These approaches help maintain data durability and service continuity in the presence of node or disk failures.
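The storage trade-off between the two approaches comes down to simple arithmetic, sketched below. The copy count and fragment counts used in the usage note are illustrative assumptions, not CubeFS defaults. The XOR helper is a deliberately simplified single-parity scheme that shows how a lost fragment can be rebuilt; production erasure codes typically use Reed-Solomon or similar codes that tolerate more than one loss.

```python
from functools import reduce


def raw_bytes_replication(logical_bytes, copies):
    """Raw capacity consumed when every byte is stored `copies` times."""
    return logical_bytes * copies


def raw_bytes_erasure(logical_bytes, data_fragments, parity_fragments):
    """Raw capacity consumed by a k data + m parity fragment layout."""
    total = data_fragments + parity_fragments
    return logical_bytes * total / data_fragments


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(fragments):
    """Toy scheme: one parity fragment equal to the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def recover_missing(surviving_fragments, parity):
    """Rebuild the single missing data fragment from the survivors and parity."""
    return reduce(xor_bytes, surviving_fragments, parity)
```

Under these assumptions, 100 bytes stored as three replicas occupy 300 raw bytes, while a 4 data + 2 parity layout occupies 150. That difference is the overhead reduction that makes erasure coding attractive for cold data, at the cost of extra computation when fragments must be rebuilt.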

In Kubernetes environments, CubeFS integrates as a storage backend for persistent volumes (container storage). It can be consumed through standard Kubernetes storage interfaces, enabling stateful workloads to share the same DFS or object storage space. This supports deployment models where a single CubeFS cluster serves multiple namespaces, tenants, or applications.

From an operational perspective, CubeFS provides management and monitoring components (storage operations) for cluster configuration, node management, and health monitoring. Administrators scale capacity by adding or removing data and metadata nodes, and can configure placement policies and redundancy strategies appropriate to their infrastructure topology.

Within an enterprise technology taxonomy, CubeFS can be categorized as a cloud-native distributed file and object storage platform (storage infrastructure) that supports POSIX file access, S3-compatible object interfaces, and Kubernetes-integrated persistent storage. It is relevant for organizations standardizing on CNCF-aligned components for data platforms, AI infrastructure, and container platforms where shared, horizontally scalable storage is required across heterogeneous workloads.