Apache Kyuubi

Apache Kyuubi is a distributed, multi-tenant SQL gateway for big data and lakehouse engines. It provides unified, JDBC/ODBC-accessible analytical services over data lakes and warehouses (data processing / analytics).

  • Distributed SQL gateway service exposing JDBC/ODBC-compatible endpoints for Apache Spark and related engines (data processing / analytics).
  • Multi-tenant, high-availability architecture with server-side session and workload management (platform services / resource management).
  • Support for standard SQL-based access to data lakes, warehouses, and lakehouse architectures (data access / query federation).
  • Integration with existing authentication and authorization mechanisms for secure data access (identity and access management).
  • Deployment options for on-premises and cloud environments, including integration with Kubernetes and YARN-based clusters (infrastructure orchestration).

More About Apache Kyuubi

Apache Kyuubi is a “distributed and multi-tenant gateway to provide unified access to large-scale data processing and analytics”. It is designed as a shared SQL service layer: clients connect through standard JDBC and ODBC interfaces, and query execution is delegated to underlying compute engines such as Apache Spark. This architecture places Kyuubi in the data processing / analytics category, where it provides a stable, shared endpoint for interactive analytics and batch workloads over data lakes and data warehouses.

At its core, Kyuubi operates as a SQL gateway (data access / query gateway) that exposes a unified endpoint compatible with widely used database connectivity standards. Clients connect via JDBC, ODBC, or other standard SQL interfaces, and Kyuubi manages sessions, statements, and operations on their behalf. The gateway translates client requests into jobs executed by the configured engine, typically Apache Spark, and returns results using familiar database interaction patterns. This decouples client applications from direct engine management and improves reuse of compute resources across multiple users and applications.
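As a rough illustration of the single endpoint that clients connect to, the sketch below assembles a HiveServer2-style JDBC URL. It rests on several assumptions not stated in this listing: that the gateway is reached with the `jdbc:hive2://` scheme, that 10009 is the frontend port, and that session-level settings follow the `?key=value;key=value` convention used by Hive-compatible drivers. The function name `build_gateway_jdbc_url` and the host `kyuubi.example.com` are hypothetical.

```python
def build_gateway_jdbc_url(host, port=10009, database="default", confs=None):
    """Assemble a HiveServer2-style JDBC URL for a SQL gateway endpoint.

    Assumed layout: jdbc:hive2://<host>:<port>/<database>?<k=v;k=v>
    The default port and the "?" section for session settings are
    assumptions made for this sketch, not taken from the listing.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if confs:
        # Session-level settings are appended as ';'-separated pairs.
        pairs = ";".join(f"{key}={value}" for key, value in confs.items())
        url = f"{url}?{pairs}"
    return url


if __name__ == "__main__":
    # Hypothetical host; the printed URL would be handed to a JDBC client or BI tool.
    print(build_gateway_jdbc_url("kyuubi.example.com"))
```

A BI tool or JDBC client would receive a URL of this shape and would not need to know which engine ultimately runs the query.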

The project provides a multi-tenant architecture (platform services / multi-tenancy) in which multiple users and applications share a common service while maintaining isolation through session-level configuration and resource control. Kyuubi servers can be deployed in a clustered mode to support high availability and scalability (infrastructure services / high availability). The service can manage lifecycle aspects such as engine startup, reuse, and teardown, which reduces overhead for individual users and simplifies administration of shared Spark environments.
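The engine reuse described above can be pictured with a small conceptual model. This is only a sketch, not Kyuubi's implementation: the `Session`, `engine_key`, and `EnginePool` names are hypothetical, and the four sharing levels (CONNECTION, USER, GROUP, SERVER) are assumed to correspond to the values of an engine share level setting. The idea is that each session is mapped to a key, and sessions with the same key share one engine instead of launching a new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    session_id: str
    user: str
    group: str


def engine_key(session, share_level):
    """Return the key that decides which sessions share an engine."""
    level = share_level.upper()
    if level == "CONNECTION":
        return (level, session.session_id)  # one engine per session
    if level == "USER":
        return (level, session.user)        # one engine per user
    if level == "GROUP":
        return (level, session.group)       # one engine per group
    if level == "SERVER":
        return (level, "shared")            # one engine for everyone
    raise ValueError(f"unknown share level: {share_level}")


class EnginePool:
    """Toy registry that launches an engine only when no reusable one exists."""

    def __init__(self):
        self._engines = {}
        self.launched = 0

    def acquire(self, session, share_level):
        key = engine_key(session, share_level)
        if key not in self._engines:
            self.launched += 1
            self._engines[key] = f"engine-{self.launched}"
        return self._engines[key]

    def release(self, session, share_level):
        # Teardown: drop the engine registered under this session's key.
        self._engines.pop(engine_key(session, share_level), None)
```

In this model, two sessions from the same user at the USER level receive the same engine, while the CONNECTION level launches one engine per session. That is the trade-off between isolation and startup overhead.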

In enterprise deployments, Kyuubi is typically used as the SQL entry point for data lake and lakehouse platforms (data platform / lakehouse access). It can sit between BI tools, data science notebooks, or custom applications and the underlying data processing engine. Because it exposes standard JDBC and ODBC interfaces, it interoperates with many business intelligence and analytics tools without requiring engine-specific drivers. Kyuubi also integrates with cluster resource managers such as Kubernetes and Apache Hadoop YARN (infrastructure orchestration), enabling operators to schedule and scale compute resources according to organizational policies.

Security and governance are addressed through integration with existing authentication and authorization mechanisms (security / identity and access management). Kyuubi can be configured to use enterprise identity systems, map users to engine-level access controls, and align with existing security policies around data access. Centralizing access through the gateway can simplify auditing and control of who runs which workloads on shared analytics infrastructure.

From a directory and taxonomy perspective, Apache Kyuubi fits into the categories of SQL gateway and query service for big data engines (data processing / analytics), connectivity middleware exposing JDBC/ODBC endpoints (data access / connectivity), and multi-tenant shared service for Spark and related engines (platform services). It is relevant for organizations that want to standardize analytical access to data lakes and lakehouse systems while managing compute engines centrally and providing familiar SQL-based connectivity to downstream tools.