Apache Pulsar

Apache Pulsar is a distributed messaging and streaming platform (event streaming) that provides multi-tenant, high-throughput pub-sub messaging with persistent storage.

  • Distributed pub-sub messaging system with persistent log storage (event streaming, messaging middleware).
  • Multi-tenant architecture with isolation for topics, namespaces, and tenants (multi-tenant infrastructure).
  • Segmented, tiered storage for messages with Apache BookKeeper-based persistence (data storage, log management).
  • Built-in support for both messaging and streaming workloads, including queuing and event processing (data streaming, messaging middleware).
  • Geo-replication and clustering across data centers for high availability and disaster recovery (DR) (distributed systems, replication).

More About Apache Pulsar

Apache Pulsar is an open-source distributed messaging and streaming platform (event streaming, messaging middleware) developed under The Apache Software Foundation. It is designed to handle publish-subscribe messaging with persistent storage across clusters of servers, supporting high-throughput data ingestion and event processing for applications that require durable, ordered message delivery and horizontal scalability.

Pulsar organizes data as topics within namespaces and tenants (multi-tenant infrastructure), allowing organizations to isolate workloads, manage quotas, and apply access controls across different teams or applications. Producers publish messages to topics, and consumers subscribe using multiple subscription modes such as exclusive, shared, and failover (messaging middleware), which gives architects options for load balancing, work queue patterns, and at-least-once delivery semantics.
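The difference between the three subscription modes is easiest to see in a toy model. The sketch below is plain Python with no Pulsar client involved; the `dispatch` function and the consumer names are hypothetical and only imitate how each mode hands messages to consumers. It assumes the first consumer in the list is the active one in failover mode and that shared mode distributes round-robin.

```python
from collections import defaultdict
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return {consumer: [messages]} for a toy subscription.

    Hypothetical helper: it models delivery only and is not the Pulsar API.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    received = defaultdict(list)
    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) > 1:
            raise ValueError("an exclusive subscription allows one consumer")
        received[consumers[0]].extend(messages)
    elif mode == "failover":
        # Several consumers may attach, but only the active one (assumed to
        # be the first in the list) receives messages; the rest are standbys.
        received[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are spread round-robin, which yields a work-queue pattern.
        for consumer, message in zip(cycle(consumers), messages):
            received[consumer].append(message)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(received)


msgs = ["m1", "m2", "m3", "m4"]
print(dispatch(msgs, ["a"], "exclusive"))
print(dispatch(msgs, ["a", "b"], "failover"))
print(dispatch(msgs, ["a", "b"], "shared"))
```

In the shared case each consumer sees only part of the stream, so per-topic ordering is no longer guaranteed for any single consumer. That is the usual trade-off when choosing a work-queue pattern over exclusive or failover consumption.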

At the storage layer, Pulsar separates compute from storage using a broker and bookie architecture. Brokers handle client connections and routing (message brokering), while Apache BookKeeper bookies manage the underlying ledgers that store message data (log storage). This design supports segment-based log storage, compaction, and configurable retention policies (data storage), enabling long-term storage of message streams as well as short-lived queues. Tiered storage integrations allow older data segments to move to lower-cost storage backends (data lifecycle management) while keeping recent data on primary BookKeeper nodes.
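The segment-based layout and tiered offload described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not BookKeeper's implementation: the class names, the tier labels `"bookkeeper"` and `"offloaded"`, and the two tuning parameters are all hypothetical. A segment is sealed once it reaches a fixed entry count, and only sealed segments older than the newest few are marked as moved to cheaper storage.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    entries: list = field(default_factory=list)
    tier: str = "bookkeeper"  # hypothetical label for the primary tier


class SegmentedLog:
    """Toy append-only log split into fixed-size segments."""

    def __init__(self, max_entries_per_segment=3, hot_segments=2):
        if max_entries_per_segment < 1 or hot_segments < 1:
            raise ValueError("both parameters must be at least 1")
        self.max_entries = max_entries_per_segment
        self.hot_segments = hot_segments
        self.segments = [Segment()]

    def append(self, entry):
        if len(self.segments[-1].entries) >= self.max_entries:
            # The current segment is full: seal it and start a new one.
            self.segments.append(Segment())
            self._offload_old_segments()
        self.segments[-1].entries.append(entry)

    def _offload_old_segments(self):
        # Keep the newest `hot_segments` on the primary tier and mark
        # everything older as offloaded to lower-cost storage.
        for segment in self.segments[:-self.hot_segments]:
            segment.tier = "offloaded"

    def read_all(self):
        # Readers see one continuous log regardless of where segments live.
        return [e for s in self.segments for e in s.entries]


log = SegmentedLog(max_entries_per_segment=3, hot_segments=2)
for i in range(8):
    log.append(i)
print([s.tier for s in log.segments])
print(log.read_all())
```

The point of the model is that offloading changes where a segment lives without changing what a reader sees, which is why older data can move to a cheaper backend while the log stays logically intact.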

Pulsar supports geo-replication across multiple clusters and data centers (distributed replication), allowing topics to be replicated asynchronously between regions. This capability is used in enterprise scenarios for DR, regional access, and data locality. The system uses a cluster-based architecture with ZooKeeper coordination (cluster coordination) for metadata, leadership, and configuration management in typical deployments.

Enterprises use Pulsar for log collection, event-driven microservices, streaming pipelines, and messaging between backend services (enterprise integration). Its protocol and client libraries (application integration) support multiple programming languages and allow integration into existing application stacks and data platforms. Features such as message batching, compression, and backpressure handling (performance optimization) are exposed to help operators tune throughput and latency characteristics.
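As a rough illustration of why batching and compression affect throughput, the following sketch groups messages into batches and compresses each batch as a unit. It uses only the Python standard library; the helper names are hypothetical, JSON stands in for the wire format, and `zlib` stands in for whichever codec a deployment selects. Batching here is by message count only, whereas a real producer may also flush on size or elapsed time.

```python
import json
import zlib


def make_batches(messages, max_messages):
    """Group messages into lists of at most `max_messages` items."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    return [
        messages[i:i + max_messages]
        for i in range(0, len(messages), max_messages)
    ]


def encode_batch(batch):
    """Serialize a batch as JSON and compress it as one payload."""
    return zlib.compress(json.dumps(batch).encode("utf-8"))


def decode_batch(payload):
    """Reverse of encode_batch: decompress and parse the JSON list."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


events = [f"sensor-reading-{i % 3}" for i in range(100)]
batches = make_batches(events, max_messages=25)
payloads = [encode_batch(b) for b in batches]

raw_size = sum(len(e.encode("utf-8")) for e in events)
sent_size = sum(len(p) for p in payloads)
print(len(batches), raw_size, sent_size)
```

Larger batches generally compress better and reduce per-message overhead, at the cost of holding messages longer before they are sent. That is the throughput-versus-latency trade-off operators tune.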

Pulsar's functions and connectors provide stream-native processing and integration with external systems (stream processing, data integration). Pulsar Functions run lightweight compute tasks directly on message streams, while Pulsar IO connectors move data between Pulsar and external storage or processing systems. This positions Pulsar in enterprise reference architectures as both a central event bus and a backbone for streaming data pipelines.
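A lightweight compute task of this kind can be as small as a single function that takes one message and returns one result. The sketch below assumes the language-native Python style, where the function is named `process` and receives the message payload as a string; the log-normalizing logic and the sample messages are made up for illustration. The lists at the bottom are a local stand-in for an input and output topic, so nothing here connects to a broker.

```python
def process(input):
    """Normalize a log line: trim whitespace and upper-case the level."""
    level, _, rest = input.strip().partition(" ")
    return f"{level.upper()} {rest}".strip()


# Local stand-in for a topic-to-topic pipeline; no broker is involved.
input_topic = ["  info service started", "warn disk at 91%", "error"]
output_topic = [process(message) for message in input_topic]
print(output_topic)
```

Keeping the function free of client or connection code means it can be unit-tested locally and leaves topic wiring to the runtime that deploys it.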

Within a technical taxonomy, Apache Pulsar fits primarily in the categories of event streaming platform, publish-subscribe messaging system, and distributed log storage. Its multi-tenant design, persistent segmented storage, and support for mixing queuing and streaming patterns make it a candidate for consolidated messaging and streaming infrastructure in complex environments where multiple teams and workloads share a common data backbone.