Kaito

Kaito is an open-source Cloud Native Computing Foundation (CNCF) project that automates the deployment and operation of Large Language Model (LLM) workloads on Kubernetes clusters (AI/ML infrastructure orchestration).

  • Automates provisioning and deployment of LLM workloads onto Kubernetes clusters (infrastructure automation).
  • Abstracts GPU-aware scheduling, resource management, and placement of LLM inference and training jobs (container orchestration).
  • Provides opinionated workflows and templates for running LLM services on existing Kubernetes infrastructure (AI platform operations).
  • Integrates with cloud-native tooling and practices for running AI workloads within enterprise Kubernetes environments (cloud-native AI infrastructure).
  • Targets platform engineering teams that operate GPU clusters for LLM workloads, aligning with CNCF cloud-native patterns (platform engineering).

More About Kaito

Kaito is a CNCF project focused on running LLM workloads on Kubernetes (AI/ML infrastructure orchestration). It addresses the operational tasks enterprises face when running GPU-based LLM inference or training on existing Kubernetes clusters, including deployment, scheduling, and lifecycle management.

The project concentrates on automating the deployment of LLM workloads onto Kubernetes clusters (infrastructure automation). It provides workflows and configuration patterns that enable platform and infrastructure teams to run LLM services as Kubernetes-native resources. By aligning with Kubernetes constructs, Kaito reduces the need for bespoke orchestration logic for GPU nodes and LLM-serving components.
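
To make "Kubernetes-native resource" more concrete, the sketch below builds a custom resource manifest as a plain Python dictionary and prints it as JSON, a format that `kubectl apply -f` accepts alongside YAML. The `kaito.sh/v1beta1` API version, the `Workspace` kind, and the `resource` / `inference` field layout are assumptions made for illustration and are not confirmed by this page; the instance type, label, and preset name are hypothetical placeholders. Check the project's own documentation for the actual schema.

```python
import json


def build_workspace(name: str, instance_type: str, preset: str) -> dict:
    """Return a Workspace-style custom resource as a plain dict.

    The apiVersion, kind, and field names below are assumptions used
    for illustration only.
    """
    return {
        "apiVersion": "kaito.sh/v1beta1",  # assumed API group and version
        "kind": "Workspace",               # assumed resource kind
        "metadata": {"name": name},
        # Assumed section: which GPU node type to use and how to label it.
        "resource": {
            "instanceType": instance_type,
            "labelSelector": {"matchLabels": {"apps": name}},
        },
        # Assumed section: which preset model the service should run.
        "inference": {"preset": {"name": preset}},
    }


if __name__ == "__main__":
    manifest = build_workspace(
        name="demo-llm",                       # hypothetical name
        instance_type="example-gpu-instance",  # hypothetical placeholder
        preset="example-model",                # hypothetical placeholder
    )
    # The printed JSON could be saved to a file and applied with kubectl.
    print(json.dumps(manifest, indent=2))
```

The point of the sketch is the shape of the request: a team declares the model and the kind of GPU capacity it wants in one resource, and a controller is expected to handle the rest.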

Kaito abstracts GPU-aware scheduling and resource management for LLM jobs (container orchestration). This includes selecting appropriate GPU nodes, managing resource requests and limits, and coordinating placement strategies for LLM services. The abstraction allows application and data teams to request LLM capabilities without direct interaction with low-level cluster scheduling details.
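
For context, the following is a minimal sketch of the low-level scheduling detail that such an abstraction hides. It is generic Kubernetes, not Kaito's own implementation. It assumes a cluster where the NVIDIA device plugin exposes GPUs as the `nvidia.com/gpu` extended resource; the node label, the taint key in the toleration, and the image name are hypothetical.

```python
import json


def gpu_pod_spec(name: str, image: str, gpus: int, node_labels: dict) -> dict:
    """Build a Pod manifest that asks the scheduler for `gpus` NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Restrict scheduling to nodes that carry these labels.
            "nodeSelector": dict(node_labels),
            # Tolerate a taint that operators often place on GPU nodes
            # (the key used here is an assumption about the cluster setup).
            "tolerations": [
                {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
            ],
            "containers": [
                {
                    "name": "inference",
                    "image": image,
                    "resources": {
                        # GPUs are declared under limits; Kubernetes uses
                        # the limit as the request for extended resources.
                        "limits": {"nvidia.com/gpu": gpus},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = gpu_pod_spec(
        name="llm-inference",
        image="registry.example.com/llm-server:latest",  # hypothetical image
        gpus=1,
        node_labels={"accelerator": "example-gpu"},      # hypothetical label
    )
    print(json.dumps(pod, indent=2))
```

Every team that runs a GPU workload by hand has to get the node selector, toleration, and resource limit right; an operator-style abstraction moves those decisions into one place.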

For platform operators, Kaito offers opinionated workflows and templates for running LLM services on existing Kubernetes infrastructure (AI platform operations). These workflows standardize how LLM inference endpoints and related services are defined, deployed, and scaled within an organization’s cluster environment. Because the resulting services run as ordinary Kubernetes workloads, teams can reuse the observability, networking, and security practices already in place for their other workloads.

Kaito is designed for environments where Kubernetes is the central control plane for compute resources, including GPU-accelerated nodes (cloud-native AI infrastructure). Enterprises can integrate Kaito into existing platform engineering stacks, aligning LLM workloads with established Continuous Integration and Continuous Deployment (CI/CD) pipelines, Role-Based Access Control (RBAC), and policy enforcement. This places Kaito within the broader CNCF ecosystem as a project that connects AI/ML workloads to cloud-native operational models.

From a taxonomy perspective, Kaito fits into categories such as AI/ML infrastructure orchestration, Kubernetes-based GPU workload management, and platform engineering for LLM services. It is relevant for technical stakeholders who operate or design Kubernetes platforms that must support LLM inference and training at scale, while maintaining consistent operational patterns across heterogeneous workloads.