Continue the series about Apache Kafka, in this post, I’d like to share some knowledge about Apache Kafka topic partition and how to write an Apache Kafka Custom Partitioner.

1. Basic about Apache Kafka Topic Partition.

There are many reasons why Apache Kafka is being adopted and used more widely today. Two of them which are related to the Topic Partition, I’d like to mention in this post are:

  • Scalable
  • High Performance

A Kafka topic likes other topics in other traditional messaging brokers. It’s simply a category, a feed name, a pipe to which messages can be published. There are a few points we should note that: by default, Apache Kafka stores the topic data on the disk directly rather than relational database. And only a  consumer of a consumer group can be consumed messages from a Topic. These seems to lead us to several thoughts as following:

  • The topic data can not be scaled out of a single machine
  • If the producer rate is higher than the consumer rate, then the consumer tends to fall further behind the producer day by day.

Apache Kafka solves this by introducing the concept of Topic Partition. A topic now can be divided into many partitions depended on our application business logic. Partitions data can be stored on different machines of the cluster. And each partition of the topic can be consumed by different consumers of a consumer group. This  improves the ingestion rate the consumer group.

2. Write An Apache Kafka Custom Partitioner.

By default, Apache Kafka producer will distribute the messages to different partitions by round-robin fashion. We can change this by using our custom partitioner. In below example, assume that we’re implementing a basic notification application which allow users to subscribe to receive notifications from other users. We will create a Kafka topic with many partitions. Each partition will hold the messages/notifications of a user.

Write An Apache Kafka Custom Partitioner - User Custom Patitioner Example

Apache Kafka Custom Patitioner Example – User Partitioner

Note that the example implementation will not create multiple many producers, consumers as above image. We just use 1 producer and 1 consumer with custom partitioner.

2.1. Prerequisites

  • Apache Kafka 0.9 broker installed on local machine or remote. You can refer to this link for setting up.
  • JDK 7/8 installed on your development PC.
  • Eclipse 4 (I am using Eclipse Mars 4.5)
  • Maven 3

2.2. Source code structure

The example source code is added in the Github. You can use git to pull the repository to your PC or simple download the zip version of the example and extract to your PC.

After having the source code, you can import the source code into Eclipse and run the test.

To import:

  •  Menu File –> Import –> Maven –> Existing Maven Projects
  • Browse to your source code location
  • Click Finish button to finish the importing

Here is the project structure in my Eclipse:

Apache Kafka Custom Patitioner Example - Source code

Apache Kafka Custom Patitioner Example – Source code

2.3. Maven Pom.xml

We use the kafka-clients-0.9.0.1 library for this example. The Java compiler 1.8

2.4. Classes Descriptions

The source code includes 6 classes:

IUserService.java, UserServiceImpl.java. These classes are used to handle on operations related to users. Currently they provide 2 methods: findUserId and findAllUsers on the list of 5 users: Tom, Mary, Alice, Daisy, Helen.

UserProducerThread.java. This is a simple producer, produces 5 messages to the UserMessageTopic topic.

UserConsumerThread.java. This is a simple consumer. It subscribe for the UserMessageTopic topic and print out the messages which contain information about partition, to console.

KafkaUserCustomPartitioner.java. This is our custom partitioner which will divide the messages based on user id.

KafkaCustomPartitionerMain.java. The entry point class. Initialize producer, consumer for our testing purpose.

2.4.1. IUserService.java, UserServiceImpl.java

The findUserId will retrieve id of user based on the given userName parameter. The findAllUsers will return all the users in the system.

2.4.2. UserProducerThread.java

This producer class, registered with our custom partitioner:
When run, it will produce a “Hello” messages to all users in the system. The custom partitioner will distribute them to the correct partition. When message is delivered successfully to the broker, we will print it out to the console for easy tracking.

2.4.3. KafkaUserCustomPartitioner.java

This is the custom partitioner. We need to implement the Partitioner interface. Our implementation will get the key which is the userName,  retrieve the according user Id and return it as the partition number.  The messages sent to Tom will be stored on the partition #1, Mary: #2, Alice: #3, Daisy: #4 and Helen: #5 or partition #0 otherwise.

2.4.4. UserConsumerThread.java

This consumer class, simply subscribe for the above topic. When the messages come, it simply print out them to the console.

2.4.5. KafkaCustomPartitionerMain.java

This class contains the main method. Initializes the producer, consumer for our testing purpose.

2.4.6 Run the application.

Step 1. Execute the script: create-topics.sh or create-topics.bat (on Windows) to create the UserMessageTopic topic on the Kafka broker.

Step 2. Open the KafkCustomPartitionerMain.java on the eclipse. Right click –> Run As –> Java Application or use the shortcut: Alt+Shift+x, j to start the main method.

The output on my eclipse is as below:

Produces 5 messages
Sent:Hello Tom, User: Tom, Partition: 1
Sent:Hello Alice, User: Alice, Partition: 3
Sent:Hello Daisy, User: Daisy, Partition: 4
Receive message: Hello Tom, Partition: 1, Offset: 0
Receive message: Hello Alice, Partition: 3, Offset: 0
Receive message: Hello Daisy, Partition: 4, Offset: 0
Sent:Hello Helen, User: Helen, Partition: 5
Receive message: Hello Helen, Partition: 5, Offset: 0
Sent:Hello Mary, User: Mary, Partition: 2
Receive message: Hello Mary, Partition: 2, Offset: 0

We can see that messages were delivered and received as our expectation.

Tom: partition #1.

Alice: partition #3.

Daisy: partition #4.

Helen: partition #5.

Mary: partition #2.

3. Summary

Topic Partition is the key unit of parallelism in Apache Kafka. We can use partition to support us in scaling out not only storage but also operations. In this example, we have tried to write An Apache Kafka Custom Partitioner which heps distribute the user messages to correct partitions of the Topic. This is just an very simple example for reference. Hopefully can help you guys quickly reuse to your own purpose.

Below are the articles related to Apache Kafka topic. If you’re interested in them, you can refer to the following links:

Apache Kafka Tutorial

Getting started with Apache Kafka 0.9

Using Apache Kafka Docker

Apache Kafka 0.9 Java Client API Example

Create Multi-threaded Apache Kafka Consumer

Apache Kafka Command Line Interface

Apache Kafka Connect Example

Apache Kafka Connect MQTT Source Tutorial

Apache Flume Kafka Source And HDFS Sink Tutorial

Spring Kafka Tutorial

0 0 vote
Article Rating