Apache Kafka is an open-source, distributed event streaming platform for building real-time, event-driven applications, scalable real-time data pipelines and systems for data integration across services.
It is one of the most popular open-source data processing systems available, supporting the real-time data streaming needs of financial institutions, retail giants, music and video streamers, video game innovators and more.
A large part of Kafka’s appeal is its architecture: producers and consumers exchange data through a publish–subscribe messaging model built on durable, partitioned logs.
As a distributed system, Kafka operates as a cluster of broker nodes that coordinate to store and transmit data across multiple machines. This design makes Kafka highly fault-tolerant because it can cope with the loss of a single node or machine in the system and still function.
Kafka also supports asynchronous communication, which decouples data producers from consumers. This feature makes it easier to build scalable, loosely coupled systems and enables more flexible system design in complex architectures.
Kafka is widely used for building event-driven and microservices-based architectures and is often deployed on Kubernetes in cloud-native environments to support scalable and automated data processing systems.
Developing with Kafka offers several key advantages:
Apache Kafka works on several underlying concepts. Here’s a brief look at how they work together to give Apache Kafka its core capabilities.
An event is a record of something that happened in a system, such as a user action, a sensor reading, a payment transaction or an application log entry. Events contain information describing the occurrence and can be produced by applications, devices or services.
Kafka ‘topics’ are named categories or streams of messages. Each topic is subdivided into ‘partitions,’ and applications read data from a topic by reading from its partitions. Kafka brokers are the servers that store topic partitions, handle requests and replicate data across the cluster.
Historically, Kafka relied on Apache ZooKeeper for cluster coordination, configuration management and leader election. Newer Kafka deployments increasingly use KRaft (Kafka Raft) mode, which removes the dependency on ZooKeeper and simplifies cluster management.
A ‘producer,’ in Apache Kafka architecture, is anything that can create data (also referred to as records, events or messages), for example a web server, an application or application component, an Internet of Things (IoT) device and many others. A ‘consumer’ is any component that needs the data that’s been created by the producer to function.
For example, in an IoT app, the data could be information from sensors connected to the Internet, such as a temperature gauge or a sensor in a driverless vehicle that detects a traffic light has changed.
Producers publish messages to Kafka topics; consumers subscribe to topics and read messages from partitions.
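The relationship between topics, partitions, producers and consumers can be illustrated with a small in-memory model. This is only a sketch in plain Python, not Kafka client code: the `Topic` and `Consumer` classes, the topic name and the sample readings are all hypothetical, and a real cluster adds networking, replication and persistence on top of this idea.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory stand-in for a topic: a set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, value):
        """Append a record to one partition and return its offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition, offset, max_records=10):
        """Return records from `offset` onward; reading never deletes anything."""
        return self.partitions[partition][offset:offset + max_records]


class Consumer:
    """Tracks its own position per partition, independently of other consumers."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = defaultdict(int)

    def poll(self, partition):
        records = self.topic.read(partition, self.positions[partition])
        self.positions[partition] += len(records)
        return records


# A producer publishes sensor readings (hypothetical data).
topic = Topic("sensor-readings", num_partitions=2)
topic.append(0, "temp=21.5")
topic.append(0, "temp=21.7")
topic.append(1, "light=green")

# Two independent consumers subscribe to the same topic.
dashboard = Consumer(topic)
archiver = Consumer(topic)

first_poll = dashboard.poll(0)    # both records in partition 0
second_poll = dashboard.poll(0)   # nothing new since the last poll
archive_poll = archiver.poll(0)   # the archiver still sees the full history
```

Because each consumer keeps its own position, one consumer reading a record does not remove it for the others.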
One of Apache Kafka’s defining capabilities is event streaming: the continuous capture, storage and distribution of data as events occur. Unlike traditional batch-processing systems, Kafka enables applications to react to new information in near real time.
Events generated by applications, devices and services are published to topics and made available to multiple consumers simultaneously. Kafka’s distributed architecture is designed to handle high volumes of event data while providing scalability, durability and fault tolerance, making it well suited for real-time applications, data pipelines and analytics.
Apache Kafka’s event-driven architecture is designed to store, process and distribute event streams in real time. Its scalability, durability and fault tolerance make it well suited for the following core use cases:
Apache Kafka enables applications, services, and systems to exchange data through a distributed publish-subscribe model. Producers publish messages to Kafka topics, while consumers subscribe to those topics and process events independently.
Like traditional message brokers such as RabbitMQ, Kafka decouples data producers from consumers. However, Kafka differs in that messages are stored in durable, partitioned logs and can be consumed multiple times. Message keys are commonly used to determine partition placement and preserve ordering for related events.
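As a rough illustration of how a message key can determine partition placement, the sketch below hashes each key and takes the result modulo the partition count. The function name, the order keys and the use of CRC32 are assumptions made for this example; Kafka's Java client uses a different hash (murmur2), so real partition numbers will not match.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically.

    CRC32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Hypothetical order events, listed in the order they are produced.
events = [
    ("order-17", "created"),
    ("order-42", "created"),
    ("order-17", "paid"),
    ("order-17", "shipped"),
    ("order-42", "cancelled"),
]

for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# All events for one key land in a single partition, in production order.
target = partitions[choose_partition("order-17", NUM_PARTITIONS)]
order_17 = [value for key, value in target if key == "order-17"]
```

Ordering is preserved only within a partition, which is why related events should share a key.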
Kafka was originally developed to handle large-scale activity tracking and user event collection. It can capture high volumes of user interactions, such as page views, clicks, registrations, purchases and application events, in real time.
These events are organized into topics and distributed across Kafka’s cluster, enabling organizations to collect, store and analyze user activity at scale while maintaining low latency and high throughput.
Kafka is widely used to collect and centralize operational data from distributed applications, infrastructure and services. By aggregating metrics, logs and system events into a single event stream, organizations gain real-time visibility into system health and performance.
Kafka itself exposes operational metrics through Java Management Extensions (JMX), allowing teams to monitor broker health, throughput, latency, storage utilization and consumer lag. These insights support effective troubleshooting, capacity planning and performance optimization.
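Consumer lag is commonly understood as the gap between the newest offset written to a partition and the offset a consumer group has committed. The arithmetic can be sketched as follows; the function name and the offset figures are hypothetical, and in practice these values would come from Kafka's metrics or admin tooling, not from hard-coded dictionaries.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: records written but not yet committed as processed.

    Simplification: a partition with no committed offset is treated as
    starting from offset 0.
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical snapshot for a three-partition topic.
log_end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200, 2: 1505}

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

In this snapshot, partition 1 accounts for almost all of the lag, which would point to a slow or stalled consumer on that partition.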
Many organizations use Kafka as a central platform for log aggregation. Traditionally, log aggregation involves collecting log files from multiple servers and storing them in a central repository for analysis.
Kafka abstracts log data into streams of events, making it easier to ingest, process, and distribute logs from multiple sources. This approach supports real-time analysis, multiple downstream consumers and lower processing latency. Compared with traditional log collection systems, Kafka also provides strong durability through data replication and distributed storage.
Many organizations also use Kafka to ingest and transport data into a centralized data lake for long-term storage, analytics and reporting.
One of Kafka’s most powerful capabilities is real-time stream processing. Rather than relying solely on scheduled batch processing, Kafka enables applications to process continuous streams of data as events occur.
Using tools such as Kafka Streams, ksqlDB (formerly KSQL), Apache Flink and Apache Spark Structured Streaming, organizations can transform, enrich, aggregate and analyze data in real time. This capability supports use cases such as Internet of Things (IoT) applications, fraud detection, recommendation engines, machine learning pipelines and real-time analytics.
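One common stream-processing operation is a windowed aggregation, for example counting events per key in fixed one-minute windows as a fraud-detection signal. The sketch below shows the idea in plain Python; it does not use the Kafka Streams, ksqlDB, Flink or Spark APIs, and the function name, card IDs and timestamps are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 60


def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return counts


# (timestamp in seconds, card ID) pairs, hypothetical data.
events = [
    (0, "card-1"),
    (12, "card-1"),
    (59, "card-2"),
    (60, "card-1"),
    (61, "card-1"),
    (119, "card-1"),
]

counts = tumbling_window_counts(events)
```

A real stream processor computes the same kind of result incrementally as events arrive, and also has to deal with late or out-of-order events, which this sketch ignores.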
Kafka is commonly used to implement event sourcing architectures, where changes to an application’s state are stored as a sequence of immutable events rather than as direct updates to a database.
Because Kafka durably stores events in order, applications can reconstruct current state by replaying historical events. This approach improves auditability, supports system recovery, and enables multiple services to consume and react to the same business events.
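Rebuilding state by replaying events can be sketched with a simple account-balance example. The event shapes, account names and function names below are assumptions made for illustration; in a real system the events would be read from a Kafka topic in offset order.

```python
def apply_event(balances, event):
    """Apply one immutable event to the in-memory state."""
    kind, account, amount = event
    if kind == "deposited":
        balances[account] = balances.get(account, 0) + amount
    elif kind == "withdrawn":
        balances[account] = balances.get(account, 0) - amount
    return balances


def replay(events):
    """Reconstruct state from scratch by applying events in order."""
    balances = {}
    for event in events:
        apply_event(balances, event)
    return balances


# Hypothetical event log, oldest first.
event_log = [
    ("deposited", "alice", 100),
    ("withdrawn", "alice", 30),
    ("deposited", "bob", 50),
    ("deposited", "alice", 5),
]

current_state = replay(event_log)        # state after all events
state_after_two = replay(event_log[:2])  # state as of an earlier point
```

Replaying a prefix of the log yields the state as it stood at that point, which is what makes this pattern useful for audits and recovery.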
Kafka can also serve as a distributed commit log for data replication and recovery. In this model, all changes are recorded sequentially in Kafka topics, creating a durable record of system activity.
Kafka’s log compaction feature retains the latest value for each key while removing outdated records, making it particularly useful for maintaining application state and synchronizing data across distributed systems. This approach is similar to the role performed by distributed log systems such as Apache BookKeeper.
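The effect of log compaction can be approximated in a few lines: for each key, only the most recent record survives. This is a simplified sketch with hypothetical keys and values; it ignores details of Kafka's actual implementation, such as compaction running in the background on log segments and the handling of delete markers.

```python
def compact(log):
    """Keep only the most recent record per key, preserving the order of survivors."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset
    ]


# Hypothetical changelog of user email addresses, oldest first.
log = [
    ("user-1", "a@example.com"),
    ("user-2", "b@example.com"),
    ("user-1", "a2@example.com"),
    ("user-3", "c@example.com"),
    ("user-2", "b2@example.com"),
]

compacted = compact(log)
```

A consumer that reads the compacted log from the beginning still ends up with the current value for every key, without processing the superseded records.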
Apache Kafka’s core capability of real-time data processing has thrown open the floodgates in terms of what apps can do across many industries. Using Kafka, enterprises are exploring new ways to leverage streaming data to increase revenue, drive digital transformation and create delightful experiences for their customers. Here are a few of the most striking examples.
The Internet of Things (IoT), a network of devices embedded with sensors allowing them to collect and share data over the Internet, relies heavily on Apache Kafka architecture.
For example, sensors connected to a windmill use IoT capabilities to transmit data on things like wind speed, temperature and humidity over the Internet. In this architecture, each sensor is a producer, generating data every second that it sends to a backend server or database, the consumer, for processing.
Kafka architecture facilitates this back-and-forth transmission and receipt of data, as well as its processing, in real-time, allowing scientists and engineers to track weather conditions from hundreds or thousands of miles away. Kafka’s record-keeping and message-queue capabilities ensure the quality and accuracy of the data that’s being gathered.
In the same way that Kafka enables the gathering of data via IoT devices that can be streamed to consumers in real-time, it also enables the gathering and analysis of information from the stock market.
Kafka has been used for many business-critical, high-volume workloads that are essential to trading stocks and monitoring financial markets.
Some of the world’s largest banks and financial institutions, such as PayPal, ING and JPMorgan Chase, use it for real-time data analysis, financial fraud detection, risk management in banking operations, regulatory compliance, market analysis and more.
Online retailers and e-commerce sites must process thousands of orders from their app or website every day, and Kafka plays a central role in making this happen for many businesses, including Walmart, Lowe’s and Domino’s. Response time and customer relationship management (CRM) are key to success in the retail industry, so it’s important that orders are processed quickly and accurately.
Kafka helps simplify the communication between customers and businesses, using its data pipeline to accurately record events and keep records of orders and cancellations, alerting all relevant parties in real-time. In addition to processing orders, Kafka generates accurate data that can be analyzed to assess business performance and uncover valuable insights.
The healthcare industry relies on Kafka to connect hospitals to critical electronic health records (EHR) and confidential patient information. Kafka facilitates two-way communication that powers healthcare apps that rely on data that’s being generated in real time by several different sources. Kafka’s capabilities also allow knowledge to be shared in real time; for example, promptly sharing a patient’s allergy to a certain medication can save lives.
In addition to helping doctors get real-time data that informs how they treat patients, Kafka is also critical to the medical research community. Its data storage and analytics capabilities help researchers scour medical data for insights into diseases and patient care, speeding medical breakthroughs.
Telecommunications providers generate massive volumes of data from mobile networks, broadband infrastructure and subscriber services. Apache Kafka is widely used to collect, transport and process this data in real time, helping providers monitor network performance and maintain service reliability.
By streaming network telemetry, usage records, customer activity and operational events to analytics and monitoring systems, Kafka enables faster issue detection, improved operational visibility and more responsive customer experiences. Its scalability and fault tolerance make it well suited for handling the high-throughput workloads common in modern telecommunications networks.
Today’s most advanced gaming platforms rely on real-time communication between players hundreds and even thousands of miles apart. If there’s any lag time in a game where players’ reaction time is key to their success, performance will suffer. What’s more, the gaming industry has been booming of late, growing at a compound annual growth rate (CAGR) of 13.4% and increasing the scrutiny of its key operational metrics.
Kafka powers the lightning-fast communication and interaction between players that makes hyper-real gaming ecosystems so popular. New games rely on Kafka’s real-time streaming abilities as well as its real-time analytics and data-storage functions. Furthermore, Kafka’s streaming pipeline helps players keep track of each other in real time by ensuring that player movements are transmitted to other players with minimal delay.
Cybersecurity teams use Kafka to process large volumes of security events generated by firewalls, endpoints, identity systems and cloud infrastructure. Kafka helps organizations build real-time threat detection pipelines, enrich security telemetry and deliver data to SIEM platforms for monitoring and incident response.
Kafka is widely used for event streaming, data integration and large-scale data pipelines. But when should you not use Kafka?
The answer boils down to the “Kafka tax,” which is a phrase used to describe the added cost of running and maintaining Kafka in production as use cases grow. In some cases, the operational complexity and required expertise can outweigh the benefits.
Scenarios where Kafka may not be the best choice include:
In some instances, enterprise-grade managed offerings such as those from Confluent can reduce operational burden. These platforms offer benefits such as reduced total cost of ownership, deployment flexibility, a rich partner ecosystem and consumption optimization.
IBM Event Streams is event streaming software built on open source Apache Kafka. It is available as a fully managed service on IBM Cloud or for self-hosting.