Apache Kafka is a distributed streaming platform for real-time data. This Apache Kafka Tutorial builds an understanding of distributed streaming, the building blocks of the Kafka framework, the core APIs of Kafka, its scalability dimensions, the use cases it addresses, and many more interesting topics.
The Apache Kafka architecture makes it a good choice for scalable, real-time distributed streaming.
Kafka was originally developed at LinkedIn, was open-sourced in 2011, and became a top-level Apache project in 2012.
The main goal of Apache Kafka is to be a unified, scalable platform for handling real-time data streams.
Design Goals of Apache Kafka
Following are some of the design goals of the Kafka framework:
- Scalability – The framework should scale along all four dimensions: event producers, event processors, event consumers, and event connectors (we shall learn about these in due course of this tutorial).
- High-Volume – It should be capable of working with huge volumes of data streams.
- Data Transformations – Kafka should support deriving new data streams from the data streams received from producers.
- Low Latency – It should offer the low latency required by traditional messaging use cases.
- Fault Tolerance – The Kafka cluster should handle failures, such as failures of master nodes and databases.
Installing Apache Kafka
Apache Kafka can be set up on your machine easily. Go through one of the following installation guides, based on your operating system:
- Install Apache Kafka on Ubuntu
- Install Apache Kafka on MacOS
Kafka Framework – Core APIs
Kafka Framework (Kafka Cluster) contains the following five actors:
- Topic [not mentioned in the below picture, but the soul of Kafka]
- Producers
- Consumers
- Stream Processors
- Connectors
A topic is a category in which streams of events/records are stored in the Kafka cluster. A Kafka cluster can have multiple topics.
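As a sketch of how topics are managed, assuming a local Kafka installation with a broker running at localhost:9092 (the topic name, partition count, and replication factor here are illustrative, not taken from this tutorial), a topic can be created and listed from the command line:

```shell
# Create a topic with 3 partitions (Kafka 2.2+ syntax; older
# versions use --zookeeper localhost:2181 instead of --bootstrap-server).
bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic my-first-topic \
  --partitions 3 \
  --replication-factor 1

# List all topics in the cluster.
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```

These commands require a running Kafka broker, so they are shown here only as a usage sketch.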
Producers are applications that send streams of records to topics in the Kafka cluster. A producer can send streams of records to multiple topics. The Apache Kafka Producer API enables an application to become a producer.
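The Producer API can be sketched as follows. This is an illustrative example only: it assumes the kafka-clients library is on the classpath, a broker is reachable at localhost:9092, and the topic name "my-first-topic" is hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        // Minimal configuration: broker address and string serializers.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // The record key influences which partition the record lands in.
                producer.send(new ProducerRecord<>("my-first-topic", "key-" + i, "value-" + i));
            }
        }
    }
}
```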
Consumers are applications that consume data streams from topics in the Kafka cluster. A consumer can receive streams of records from multiple topics through subscription. The Apache Kafka Consumer API enables an application to become a consumer.
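The consuming side can be sketched in the same spirit, again assuming the kafka-clients library and a local broker; the topic and group names are hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Consumers with the same group.id share the topic's partitions.
        props.put("group.id", "tutorial-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to one topic; a consumer may subscribe to several.
            consumer.subscribe(Collections.singletonList("my-first-topic"));
            while (true) {
                // Poll for new records, waiting up to 500 ms per call.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s = %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```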
Stream Processors are applications that transform the data streams of input topics into data streams of output topics in the Kafka cluster. The Apache Kafka Streams API enables an application to become a stream processor.
Learn Kafka Stream Processor with Java Example.
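As a minimal sketch of the Streams API (assuming the kafka-streams library; the application id and topic names "input-topic"/"output-topic" are hypothetical), a processor that uppercases every record value could look like this:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from one topic, transform each value, write to another topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase()).to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}
```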
Connectors are responsible for streaming data between Kafka topics and external systems, such as files or databases – importing external data into topics, and exporting data from topics out to those systems.
Apache Kafka Connect API helps to realize connectors.
Learn Kafka Connector with an example that imports data from text file to Kafka Cluster.
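A file-import connector like the one mentioned above is typically configured through a properties file and run in standalone mode. The following is a sketch; the file paths and names are illustrative:

```properties
# connect-file-source.properties (hypothetical example)
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/input.txt
topic=connect-test
```

It would then be launched with the connect-standalone.sh script that ships with Kafka, passing the worker config followed by this connector config.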
Kafka Command-Line Tools
Kafka provides Command-Line tools to start a Console Producer, Console Consumer, Console Connector etc.
Kafka Console Producer and Consumer Example is a Kafka Tutorial demonstrating how to start a producer and a consumer from the console.
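As a quick sketch of those command-line tools (assuming a broker at localhost:9092; the topic name is illustrative), a console producer and consumer can be started in two terminals:

```shell
# Terminal 1: type lines; each line is sent as a record to the topic.
# (Older Kafka versions use --broker-list instead of --bootstrap-server.)
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic sampleTopic

# Terminal 2: print records from the topic, starting from the beginning.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic sampleTopic --from-beginning
```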
Monitoring your Kafka cluster has also become easy. Kafka Monitor is a relatively new package released by LinkedIn that can be used for long-running tests and regression tests. No changes to an existing Kafka cluster are required in order to use Kafka Monitor.
The biggest advantage Kafka Monitor brings is that long-running tests can detect problems that develop over time. With Kafka Monitor you can run tests that talk to the Kafka cluster and report the impact of your changes on the cluster during continuous development and regression testing.
Learn to build a Kafka multi-broker cluster with three nodes running on your local machine.
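For a local multi-broker setup, each broker typically gets its own copy of server.properties with a unique id, port, and log directory. A sketch for one of the additional brokers (values are illustrative):

```properties
# server-1.properties — second broker on the same machine
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1
```

A third broker would use broker.id=2, another port such as 9094, and its own log directory; all brokers in the cluster share the same coordination configuration.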