In this article, I’d like to share some basic information about Apache Kafka: how to install it and how to use the basic client tools that ship with Kafka to create a topic and to produce and consume messages. The version used is Apache Kafka 1.0.0.
1. Concept
Apache Kafka is a publish-subscribe messaging system that originated at LinkedIn in 2009 and was open sourced in 2011. In terms of architecture, Apache Kafka is similar to other messaging systems such as ActiveMQ and RabbitMQ. However, Apache Kafka has some characteristics that make it especially popular nowadays:
- Kafka was designed to be inherently distributed. This makes Kafka very easy to scale out.
- High throughput and high performance.
- Fault tolerance in the face of machine failures.
Apache Kafka has become very popular for use cases related to real-time data processing, together with related technologies such as Apache Spark, Apache Storm, and Hadoop.
2. Prerequisite
You will need Java 7 or a newer version installed first.
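You can confirm the installed version with `java -version`. Since that command prints to stderr and the exact format varies, here is a minimal sketch of extracting the major version number; it uses a hard-coded sample string in place of the real command output, so substitute `$(java -version 2>&1)` when running it for real:

```shell
# Sample of what `java -version 2>&1` typically prints on Java 8.
sample='java version "1.8.0_151"'

# The version string is the quoted field; for "1.x"-style versions
# the major number is the second dot-separated component.
version=$(echo "$sample" | awk -F '"' '{print $2}')
major=$(echo "$version" | cut -d. -f2)

echo "major=$major"   # Kafka 1.0.0 needs 7 or higher
```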
3. Download and Install Apache Kafka
3.1. Obtain a desired version of Apache Kafka from its site
You can download a binary version of Apache Kafka with a web browser, or get the mirror link and download it with command-line tools such as wget or curl. For example, the wget command below will download Apache Kafka 1.0.0 and put it in the /tmp folder:
wget http://www.trieuvan.com/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz -P /tmp
3.2. Extract the downloaded file
Let’s extract the downloaded file to the /opt directory:
sudo tar -xvzf /tmp/kafka_2.11-1.0.0.tgz -C /opt/
3.3. Go to the extracted Kafka directory
cd /opt/kafka_2.11-1.0.0/
Let’s take a look at the Kafka directory:
- bin
Contains all the scripts needed for working with Apache Kafka, such as starting and stopping ZooKeeper and the Kafka server, and creating Kafka producers and consumers.
- libs
Contains all libraries used by Kafka.
- config
Contains all configuration files for ZooKeeper, logging, the Kafka server, etc.
- site-docs
The Apache Kafka documentation.
4. Start Apache Kafka server
Apache Kafka can be deployed as three types of clusters:
- Single node, single broker
For this type, we deploy only a single Kafka broker on a single node.
- Single node, multiple brokers
This type runs multiple brokers on a single node. We just need to start the Kafka brokers on different ports.
- Multiple nodes, multiple brokers
This type combines the above types: we deploy multiple brokers on each node and connect those nodes together.
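For the single-node, multiple-brokers case, each broker needs its own copy of the server configuration with a unique broker id, port, and log directory. As a sketch, a second broker’s file (say, config/server-1.properties, a hypothetical copy of config/server.properties) might override just these settings:

```properties
# Must be unique for each broker in the cluster
broker.id=1

# Listen on a different port than the default 9092
listeners=PLAINTEXT://:9093

# Each broker needs its own log directory
log.dirs=/tmp/kafka-logs-1
```

Each broker is then started with its own file, e.g. bin/kafka-server-start.sh config/server-1.properties.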
In this tutorial, we will use the first type: a single Kafka broker deployed on a single node, which is our PC.
Starting Apache Kafka requires two steps:
- Start ZooKeeper
Apache Kafka uses ZooKeeper to store cluster metadata and consumer information. So we have two choices: deploy a separate ZooKeeper server and configure Kafka to use it, or simply use the ZooKeeper bundled with the Kafka distribution. In this article, we will choose the second option: the ZooKeeper bundled with Kafka.
- Start Kafka Broker
4.1. Start ZooKeeper
Let’s start the ZooKeeper server by issuing the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
A successful start ends with a log line like: binding to port 0.0.0.0/0.0.0.0:2181
By default, ZooKeeper listens on port 2181. We can change it to another port by editing the file config/zookeeper.properties.
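The top of config/zookeeper.properties contains the settings mentioned above; the defaults look roughly like this:

```properties
# the directory where ZooKeeper stores its snapshot data
dataDir=/tmp/zookeeper
# the port at which clients will connect
clientPort=2181
# disable the per-ip connection limit (non-production config)
maxClientCnxns=0
```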
4.2. Start Kafka Broker
To start Kafka broker, we can issue the below command:
bin/kafka-server-start.sh config/server.properties
A successful start ends with a log line like:
Registered broker 0 at path /brokers/ids/0 with address: PLAINTEXT --> EndPoint(rio-vmb,9092,PLAINTEXT)
where rio-vmb is the hostname of the current server, 9092 is the default port on which Apache Kafka listens, and the default protocol is PLAINTEXT.
We can change this information by editing config/server.properties. Let’s take a look at its content:
Some of the settings near the top of the file are broker.id and the listener/port configuration.
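A few of the key broker settings in config/server.properties, shown roughly as they appear with Kafka 1.0.0 defaults:

```properties
# The id of the broker; must be a unique integer per broker.
broker.id=0

# The address the socket server listens on; 9092 is the default port.
#listeners=PLAINTEXT://:9092

# A comma-separated list of directories for the commit log files.
log.dirs=/tmp/kafka-logs

# ZooKeeper connection string.
zookeeper.connect=localhost:2181
```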
5. Working with Apache Kafka
5.1. Create a Kafka Topic
Let’s create a topic called HelloKafkaTopic by issuing the following command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
  --partitions 1 --topic HelloKafkaTopic
The output on the console should be:
Created topic "HelloKafkaTopic".
5.2. To verify that the topic was created, we can list all the topics on the Kafka cluster
bin/kafka-topics.sh --list --zookeeper localhost:2181
The output for this command should be:
HelloKafkaTopic
5.3. Start a Kafka Producer
By default, Apache Kafka provides some client tools (scripts) to start a producer and send messages to the broker. We will use bin/kafka-console-producer.sh to connect to the topic HelloKafkaTopic on the broker localhost:9092:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic HelloKafkaTopic
Note that after issuing the above command, the console will wait for input from the user. We can type any message, and hitting Enter will push the message to the topic.
To exit the producer client, simply press: ctrl-c.
5.4. Start a Kafka Consumer to consume the messages produced by the above producer
Similar to the producer, we will use the default consumer client (script) provided by Kafka:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic HelloKafkaTopic --from-beginning
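Note: as of Kafka 0.10, this ZooKeeper-based console consumer is deprecated. The newer consumer connects to the broker directly, so the equivalent command would be:

```shell
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic HelloKafkaTopic --from-beginning
```

Both forms work with Kafka 1.0.0; the --zookeeper form simply prints a deprecation warning.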
5.5. Produce and Consume message
Go back to the Producer console, which is still waiting for keyboard input, type any message, and hit Enter. For example:
Hello
It's me
I was wondering
You can see all those messages are consumed and displayed on the Consumer console.
The conversation looks like the images below: the top console is the Producer console, while the bottom is the Consumer console.
6. Conclusion
Those are all the basic steps to get started with Apache Kafka. In the next article, I will share additional configuration for the Apache Kafka broker and show how to use other client libraries, such as Java and C#, to produce and consume messages with a Kafka cluster.
Below are related articles on the Apache Kafka topic. If you’re interested in them, you can refer to the following links:
Apache Kafka 0.9 Java Client API Example
How To Write A Custom Serializer in Apache Kafka
Write An Apache Kafka Custom Partitioner
Apache Kafka Command Line Interface
Apache Kafka Connect MQTT Source Tutorial
Apache Flume Kafka Source And HDFS Sink Tutorial