Volcano
Volcano is an open-source batch scheduling system (workload orchestration) built on Kubernetes for high-performance computing (HPC), AI/ML, and big data workloads that require queueing, gang scheduling, and co-scheduling semantics.
- Kubernetes-native batch workload scheduler with queue and job management (workload orchestration).
- Support for gang scheduling, co-scheduling, and priority-based scheduling for multi-pod jobs (batch computing).
- Job-level features such as dependencies, retries, lifecycle management, and fairness across queues and tenants (resource management).
- Integration with Kubernetes custom resources and controllers to manage batch jobs, queues, and related policies (Kubernetes extension).
- Support for high-performance computing (HPC), AI/ML, and big data scenarios that require coordinated resource allocation across nodes (high-performance computing).
More About Volcano
Volcano is a batch scheduling system (workload orchestration) designed for Kubernetes environments that run HPC, artificial intelligence and machine learning (AI/ML), and big data workloads. It orchestrates jobs that need resources allocated in a coordinated way across multiple pods and nodes, a requirement that the default Kubernetes scheduler, which places pods one at a time, does not cover for batch use cases.
The project introduces a scheduling framework (batch computing) that focuses on job queues, fairness, and co-scheduling semantics. Volcano supports gang scheduling and co-scheduling (scheduling policy), where a job is scheduled only when all of its required pods, or a configured minimum number of them, can be placed together. This matters for parallel computing frameworks and distributed training workloads, where a partially started job holds resources without making progress. It also supports priority and preemption policies, resource sharing strategies, and quota management across queues.
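The all-or-nothing idea behind gang scheduling can be illustrated with a minimal sketch. This is not Volcano's scheduler code: the `Node` type, the `gang_place` function, the CPU-only resource model, and the first-fit placement are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # free CPU in millicores (hypothetical, CPU-only model)


def gang_place(pod_cpu_requests, nodes, min_available):
    """Return a pod-index -> node-name plan, or None if the gang cannot start.

    Placement is all-or-nothing: if fewer than `min_available` pods fit,
    no plan is returned and the caller binds nothing.
    """
    # Work on a copy of the free capacity so a rejected gang changes nothing.
    free = {n.name: n.free_cpu for n in nodes}
    plan = {}
    for i, cpu in enumerate(pod_cpu_requests):
        # First-fit: take the first node with enough free CPU for this pod.
        target = next((name for name, avail in free.items() if avail >= cpu), None)
        if target is None:
            continue  # this pod does not fit anywhere
        free[target] -= cpu
        plan[i] = target
    if len(plan) < min_available:
        return None  # reject the whole gang rather than start a partial job
    return plan


nodes = [Node("node-a", 4000), Node("node-b", 2000)]
print(gang_place([2000, 2000, 2000], nodes, min_available=3))
print(gang_place([2000, 2000, 2000, 2000], nodes, min_available=4))
```

In this sketch the first job fits entirely and receives a placement plan, while the second job needs four pods but only three fit, so it is rejected as a whole and no capacity is consumed.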
Volcano uses Kubernetes CustomResourceDefinitions (Kubernetes extension) to model batch jobs, queues, and related scheduling objects. Typical resources include custom job definitions that encapsulate multiple pods, dependencies between tasks, and policies such as the minimum number of available tasks. These resources are managed by Volcano controllers and a scheduler component that integrates with the Kubernetes control plane to make placement decisions based on the defined policies.
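As a rough illustration of such a custom job resource, the sketch below builds a manifest as a Python dictionary and prints it as JSON. The helper name `build_volcano_job`, the job name, the image, and the queue are hypothetical, and the field names (`batch.volcano.sh/v1alpha1`, `schedulerName`, `minAvailable`, `queue`, `tasks`) should be checked against the CRD version installed in the cluster.

```python
import json


def build_volcano_job(name, queue, workers, image):
    """Build a Volcano-style Job manifest (illustrative field set only)."""
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",  # hand the pods to the Volcano scheduler
            "queue": queue,              # queue the job is submitted to
            "minAvailable": workers,     # gang size: start only if all workers fit
            "tasks": [
                {
                    "name": "worker",
                    "replicas": workers,
                    "template": {        # ordinary Kubernetes pod template
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": "worker", "image": image}],
                        }
                    },
                }
            ],
        },
    }


# Hypothetical values for illustration only.
job = build_volcano_job("train-demo", "default", workers=4, image="example.com/trainer:latest")
print(json.dumps(job, indent=2))
```

The JSON output could be passed to a tool such as `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML.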
In enterprise and institutional environments, Volcano is used to run HPC-style jobs, AI/ML training and inference pipelines, and data processing jobs on Kubernetes clusters (enterprise workload management). It enables organizations to share cluster resources across multiple teams and tenants while enforcing queue-level and job-level policies. Capabilities such as job retries, time-to-live handling, and job status tracking support day-two operations for batch workloads.
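One way to picture sharing capacity across queues is weighted max-min fairness: each queue is offered a share proportional to its weight, and capacity that a queue does not need is redistributed to the others. The sketch below is a generic version of that technique, not Volcano's implementation. The `fair_shares` function, the queue names, and the single abstract resource are assumptions for illustration.

```python
def fair_shares(capacity, queues):
    """Weighted max-min fair split of one resource.

    queues maps a queue name to a (weight, demand) pair.
    Returns a dict of queue name -> allocated amount.
    """
    alloc = {name: 0.0 for name in queues}
    active = {name for name, (w, d) in queues.items() if w > 0 and d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(queues[n][0] for n in active)
        satisfied = set()
        spent = 0.0
        for n in active:
            weight, demand = queues[n]
            offer = remaining * weight / total_weight  # weight-proportional offer
            grant = min(offer, demand - alloc[n])      # never exceed the demand
            alloc[n] += grant
            spent += grant
            if demand - alloc[n] <= 1e-9:
                satisfied.add(n)
        remaining -= spent
        if not satisfied:
            break  # every queue took its full offer, so capacity is used up
        active -= satisfied  # redistribute the leftovers among the rest
    return alloc


# Hypothetical queues: name -> (weight, demand), in arbitrary resource units.
shares = fair_shares(100, {"research": (2, 80), "prod": (1, 10), "dev": (1, 50)})
print({name: round(amount, 2) for name, amount in shares.items()})
```

With these example numbers, the `prod` queue needs less than its weighted offer, so the unused part is split between `research` and `dev` in a 2:1 ratio.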
From an architectural perspective, Volcano operates as an add-on scheduler and controller set (cluster orchestration) that works alongside the core Kubernetes components. It relies on standard Kubernetes APIs, pods, and nodes, which allows it to interoperate with existing cluster tooling and observability systems. Its use of CRDs and plug-in style scheduling policies allows extension and customization for domain-specific workloads.
Within an enterprise technology catalog, Volcano fits into categories such as batch workload orchestration for Kubernetes, HPC job scheduling, and AI/ML and data engineering pipeline scheduling. It is relevant for platform teams building multi-tenant Kubernetes-based compute platforms that must support parallel jobs, resource-aware scheduling, and fair sharing of cluster capacity.