Apache Kafka
Apache Kafka is a distributed event streaming platform (event streaming, data infrastructure) for publishing, storing, and subscribing to continuous data streams at scale.
- Distributed, partitioned, replicated commit log for event storage and streaming (event streaming, data infrastructure).
- High-throughput, low-latency message publishing and subscribing between producers and consumers (messaging, integration middleware).
- Durable, fault-tolerant storage across a Kafka cluster with replication and configurable retention (data durability, high availability).
- Stream processing support via Kafka Streams and integrations for building real-time applications (stream processing, application integration).
- Connector framework (Kafka Connect) for integrating external systems such as databases, key-value stores, search indexes, and file systems (data integration, Extract, Transform, Load (ETL)).
More About Apache Kafka
Apache Kafka is a distributed event streaming platform (event streaming, data infrastructure) designed to handle high-throughput, real-time data feeds by decoupling data producers from consumers. It addresses the problem of reliably transporting and storing continuous streams of records, such as logs, metrics, transactions, and event data, across different applications and services in an enterprise environment.
Kafka organizes data into topics, which are split into partitions and replicated across a cluster of brokers (distributed systems, cluster computing). This partitioned and replicated log structure enables horizontal scalability and resilience. Producers write records to topics, while consumers subscribe to those topics and read data at their own pace, which supports both real-time streaming and batch-style consumption patterns (messaging and data integration).
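The partitioning and independent-consumer behavior described above can be illustrated with a small in-memory model. This is a conceptual sketch, not the Kafka client API: the Topic and Consumer classes are hypothetical, and crc32 is an assumed stand-in for the key hash (Kafka's default partitioner uses murmur2). It shows that records with the same key land in the same partition and keep their append order, and that each consumer tracks its own position.

```python
import zlib
from collections import defaultdict


class Topic:
    """Hypothetical in-memory stand-in for a topic: a fixed set of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Assumption: crc32 replaces Kafka's murmur2 hash purely for illustration.
        # Equal keys always map to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class Consumer:
    """Hypothetical consumer that keeps its own read position per partition."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.positions: dict[int, int] = defaultdict(int)

    def poll(self, partition: int, max_records: int) -> list[tuple[str, str]]:
        # Read from this consumer's current position; other consumers are unaffected.
        start = self.positions[partition]
        batch = self.topic.partitions[partition][start:start + max_records]
        self.positions[partition] += len(batch)
        return batch


orders = Topic("orders", num_partitions=3)
for i in range(4):
    orders.produce("customer-42", f"order-{i}")

p = orders.partition_for("customer-42")
fast, slow = Consumer(orders), Consumer(orders)
fast_batch = fast.poll(p, max_records=10)  # all four records, in append order
slow_batch = slow.poll(p, max_records=1)   # only the first; its position is independent
```

Because both consumers read the same stored records, a slow consumer simply lags behind rather than blocking the producer or the faster consumer.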
At its core, Kafka provides a durable, append-only commit log, with configurable retention policies that control how long data is stored. Because records are not removed when they are read, consumers can replay events from an earlier offset, which is useful for rebuilding state in downstream systems or reprocessing data for new applications (data engineering, event sourcing). Ordering is guaranteed within a partition but not across the partitions of a topic, and consumer groups spread a topic's partitions across group members, each partition being read by one member at a time, for parallel processing and load balancing (stream processing, workload management).
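Retention, replay, and consumer-group assignment can be sketched as follows. This is a simplified model under stated assumptions: PartitionLog and assign_partitions are hypothetical names, retention is applied per record (real Kafka deletes whole log segments), and the assignment function is only loosely modeled on Kafka's range assignment strategy (sorted members, with earlier members taking any remainder).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    timestamp_ms: int
    value: str


@dataclass
class PartitionLog:
    """Hypothetical model of one partition: an append-only log with time-based retention."""
    retention_ms: int
    records: list[Record] = field(default_factory=list)
    next_offset: int = 0

    def append(self, value: str, timestamp_ms: int) -> int:
        # Offsets only grow; existing records are never modified in place.
        self.records.append(Record(self.next_offset, timestamp_ms, value))
        self.next_offset += 1
        return self.next_offset - 1

    def enforce_retention(self, now_ms: int) -> None:
        # Simplification: drop individual records older than the retention window.
        self.records = [r for r in self.records if now_ms - r.timestamp_ms < self.retention_ms]

    def read_from(self, offset: int) -> list[Record]:
        # Replay: start again from any earlier offset that is still retained.
        return [r for r in self.records if r.offset >= offset]


def assign_partitions(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Range-style sketch: every partition goes to exactly one group member."""
    members = sorted(members)
    per_member, extra = divmod(len(partitions), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


log = PartitionLog(retention_ms=1000)
for i, ts in enumerate([0, 500, 1200]):
    log.append(f"event-{i}", ts)
replayed = [r.value for r in log.read_from(1)]      # re-read from offset 1
log.enforce_retention(now_ms=1600)                  # events older than 1000 ms are dropped
group = assign_partitions([0, 1, 2, 3, 4], ["consumer-b", "consumer-a"])
```

Note that offsets keep increasing after retention removes old records, so a consumer's saved offset never silently points at a different record.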
Kafka Streams is a client library (stream processing) that enables developers to build applications and microservices that consume, process, and produce data stored in Kafka topics. It supports operations such as filtering, joining, aggregating, and windowing over streams and tables, and runs inside the application process without requiring a separate processing cluster. The library builds on Kafka's storage and consumer protocols to provide exactly-once processing semantics when the application is configured for it (for example, by setting processing.guarantee to exactly_once_v2).
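To make the filtering and windowed-aggregation ideas concrete, the following plain-Python sketch computes per-key counts over tumbling (fixed-size, non-overlapping) time windows. It is not Kafka Streams code; in a Kafka Streams topology the same idea would be expressed with operators such as filter(), groupByKey(), windowedBy(...), and count(). The function name windowed_counts is hypothetical, and the model ignores late-arriving records, grace periods, and state stores.

```python
from collections import Counter
from typing import Callable, Iterable


def windowed_counts(
    events: Iterable[tuple[str, int]],
    window_ms: int,
    keep: Callable[[str], bool] = lambda key: True,
) -> dict[tuple[str, int], int]:
    """Count (key, timestamp_ms) events per key and tumbling window start."""
    counts: Counter = Counter()
    for key, ts in events:
        if not keep(key):                    # filter step
            continue
        window_start = ts - ts % window_ms   # align the event to its window
        counts[(key, window_start)] += 1     # aggregate step
    return dict(counts)


page_views = [("home", 0), ("home", 999), ("cart", 500), ("home", 1000), ("cart", 2500)]
per_second = windowed_counts(page_views, window_ms=1000)
home_only = windowed_counts(page_views, window_ms=1000, keep=lambda key: key == "home")
```

Here the view at 999 ms falls into the window starting at 0, while the view at 1000 ms opens the next window.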
Kafka Connect is a framework (data integration, ETL) for streaming data between Kafka and external systems such as relational databases, key-value stores, search indexes, and file systems. It defines a pluggable connector architecture and standardizes configuration, management, and scaling of data pipelines. Connect workers run either in standalone mode, as a single process suited to development or small jobs, or in distributed mode, where a group of workers shares connector tasks and tolerates worker failures, which supports centralized operation in enterprise environments.
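In distributed mode, connectors are typically created by sending a JSON body with a name and a config map to a worker's REST endpoint (POST /connectors). The sketch below builds such a request for the file source connector used in Kafka's quickstart, without sending it. Assumptions: the worker address http://localhost:8083 (8083 is Connect's default REST port), the file path, and the topic name are illustrative, and recent Kafka releases may require the file connector to be added to the worker's plugin.path. The helper name file_source_connector_request is hypothetical.

```python
import json
import urllib.request


def file_source_connector_request(name: str, path: str, topic: str, tasks_max: int = 1) -> str:
    """Build the JSON body for creating a file source connector via the Connect REST API."""
    config = {
        "connector.class": "FileStreamSource",
        "tasks.max": str(tasks_max),  # config values are written as strings
        "file": path,
        "topic": topic,
    }
    return json.dumps({"name": name, "config": config}, indent=2)


body = file_source_connector_request("local-file-source", "/tmp/test.txt", "connect-test")

# Assumed worker address; constructing the Request does not contact the server.
req = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit it to a running Connect worker.
```

Keeping the connector definition as data, rather than code, is what lets Connect manage, scale, and restart pipelines uniformly across workers.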
In enterprise use, Kafka often acts as a central backbone for event data, application logs, and inter-service communication (integration middleware). It interoperates with a range of stream processing frameworks, monitoring tools, and storage systems through official and community connectors and clients. In a software directory, it fits categories such as event streaming platform, distributed log, message broker, and stream processing runtime.