Carina is an open-source project that automates the management and scheduling of Graphics Processing Unit (GPU) resources across Kubernetes clusters for Artificial Intelligence (AI) and Machine Learning (ML) workloads (infrastructure orchestration).

Automated GPU resource scheduling and allocation across Kubernetes clusters (infrastructure orchestration).
Support for co-locating multiple workloads onto shared GPU resources where feasible (resource optimization).
Integration with Kubernetes cluster management and scheduling primitives (container orchestration).
Awareness of GPU topology and characteristics to inform placement decisions (infrastructure optimization).
Targeted at AI, deep learning, and high-performance workloads that rely on GPU acceleration (AI/ML infrastructure).

More About Carina

Carina is an open-source GPU resource management and scheduling system (infrastructure orchestration) designed to operate within Kubernetes environments and support AI, ML, and other GPU-accelerated workloads. It focuses on efficient utilization of GPU resources spread across one or more Kubernetes clusters, coordinating placement and sharing so that workloads can use GPU capacity with minimal manual intervention from cluster operators.

The project addresses the problem of fragmented and underutilized GPU resources in containerized environments. Traditional Kubernetes scheduling is not tailored for GPUs and often treats them as coarse-grained, indivisible resources. Carina adds GPU-aware scheduling (resource optimization) that can take into account specific GPU attributes and usage patterns and can schedule multiple containers or pods onto a single GPU when appropriate. This helps operators run mixed workloads, such as different AI training or inference jobs, on the same GPU estate while maintaining control over allocation.
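The idea of co-locating several pods on one GPU can be illustrated with a minimal sketch. It is not Carina's implementation or API: the `Gpu` class, the `place` function, the memory-based sharing model, and the best-fit policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gpu:
    """Hypothetical record of one physical GPU and what is already placed on it."""
    node: str
    index: int
    memory_mib: int
    used_mib: int = 0
    pods: List[str] = field(default_factory=list)

    @property
    def free_mib(self) -> int:
        return self.memory_mib - self.used_mib


def place(pod_name: str, request_mib: int, gpus: List[Gpu]) -> Optional[Gpu]:
    """Best-fit placement: choose the GPU with the least free memory that still fits.

    Returns the chosen GPU, or None when no GPU has enough free memory.
    """
    candidates = [g for g in gpus if g.free_mib >= request_mib]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: g.free_mib)
    best.used_mib += request_mib
    best.pods.append(pod_name)
    return best


# Illustrative inventory: two nodes, one 16 GiB GPU each (assumed values).
gpus = [Gpu("node-a", 0, 16384), Gpu("node-b", 0, 16384)]
first = place("train-job", 10240, gpus)   # lands on node-a
second = place("infer-a", 4096, gpus)     # shares node-a's GPU with train-job
third = place("infer-b", 8192, gpus)      # no longer fits on node-a, goes to node-b
```

Best-fit packing is only one possible policy; it tends to keep whole GPUs free for large jobs, at the cost of concentrating load on already busy devices.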

Carina integrates with Kubernetes as an add-on component (container orchestration extension). It uses Kubernetes APIs and scheduling hooks to track GPU resources, understand which nodes host which GPUs, and place pods accordingly. The system maintains metadata about GPU topology and capabilities (infrastructure optimization), which can include aspects such as GPU type or capacity, and uses this information during scheduling decisions. This approach lets Carina function within existing Kubernetes-based platforms without replacing the native control plane.
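As a rough sketch of how GPU metadata could inform placement, the snippet below filters nodes by GPU type and capacity. The `GpuInfo` record, the `filter_nodes` helper, and the inventory layout are hypothetical and do not reflect Carina's actual data model; the GPU model names are used only as sample values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GpuInfo:
    """Hypothetical per-GPU metadata: model name and memory capacity."""
    model: str
    memory_gib: int


def filter_nodes(
    inventory: Dict[str, List[GpuInfo]],
    model: Optional[str] = None,
    min_memory_gib: int = 0,
) -> List[str]:
    """Return the sorted names of nodes that host at least one matching GPU."""
    matches = []
    for node, gpus in inventory.items():
        if any(
            (model is None or g.model == model) and g.memory_gib >= min_memory_gib
            for g in gpus
        ):
            matches.append(node)
    return sorted(matches)


# Illustrative inventory keyed by node name (assumed values).
inventory = {
    "node-a": [GpuInfo("T4", 16)],
    "node-b": [GpuInfo("A100", 40), GpuInfo("A100", 40)],
    "node-c": [],  # a node without GPUs never matches
}

a100_nodes = filter_nodes(inventory, model="A100")
large_enough = filter_nodes(inventory, min_memory_gib=16)
```

A filter step like this would typically be followed by a scoring step that ranks the remaining nodes, which is omitted here.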

In enterprise environments, Carina can be used by platform engineering teams, Machine Learning Operations (MLOps) groups, or infrastructure administrators who run multi-tenant AI or high-performance computing (HPC) workloads on Kubernetes clusters. By exposing GPU resources through Kubernetes abstractions and policies, Carina supports centralized governance (platform management) while enabling different teams to submit jobs that consume GPU capacity according to configured limits and priorities. It supports deployment in multi-tenant clusters where isolation and predictable access to GPU resources are operational requirements.
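The interaction between per-team limits and priorities can be sketched as follows. This is an illustrative model only: the `Job` class, the `admit` function, and the per-namespace quota dictionary are assumptions and are not taken from Carina's configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical job request: GPUs wanted, owning namespace, and priority."""
    name: str
    namespace: str
    gpus: int
    priority: int


def admit(jobs: List[Job], quotas: Dict[str, int]) -> List[str]:
    """Admit jobs in descending priority order while each namespace stays in quota.

    Namespaces without a configured quota are treated as having a limit of zero.
    """
    used = {ns: 0 for ns in quotas}
    admitted = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        limit = quotas.get(job.namespace, 0)
        current = used.get(job.namespace, 0)
        if current + job.gpus <= limit:
            used[job.namespace] = current + job.gpus
            admitted.append(job.name)
    return admitted


# Illustrative quotas: GPUs per namespace (assumed values).
quotas = {"team-a": 4, "team-b": 2}
jobs = [
    Job("a-train", "team-a", 3, priority=10),
    Job("a-sweep", "team-a", 2, priority=5),   # would exceed team-a's quota
    Job("b-infer", "team-b", 2, priority=7),
    Job("c-adhoc", "team-c", 1, priority=9),   # team-c has no quota configured
]
admitted = admit(jobs, quotas)
```

In this sketch a rejected job is simply skipped; a real system would more likely queue it until capacity within the quota becomes available.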

From a taxonomy perspective, Carina fits within the Kubernetes ecosystem (cloud-native infrastructure) as a GPU scheduling and resource management extension. It interoperates with Kubernetes workloads, pods, and namespaces and aligns with common AI and ML runtimes that depend on GPU acceleration. For directory positioning, Carina can be categorized under infrastructure orchestration, container orchestration extensions, and AI/ML infrastructure tooling, with a focus on GPU-aware scheduling and resource sharing.