1. Introduction to Apache Kafka Connect
Apache Kafka, a publish/subscribe messaging system, has gained a lot of traction recently. We can see many use cases where Apache Kafka sits alongside Apache Spark and Apache Storm in Big Data architectures that need real-time processing and analytics capabilities.
To integrate Kafka with other applications and systems, we traditionally need to write producers to feed data into Kafka and consumers to read the data back out. However, Apache Kafka Connect, one of the new features introduced in Apache Kafka 0.9, simplifies the integration between Apache Kafka and other systems. Kafka Connect lets us quickly define connectors that move large collections of data from other systems into Kafka and from Kafka to other systems. Let’s take a look at an overview of Apache Kafka Connect:

Apache Kafka Connect – Overview
The Sources in Kafka Connect are responsible for ingesting data from other systems into Kafka, while the Sinks are responsible for writing data from Kafka to other systems. Note that another new feature introduced in Apache Kafka 0.9 is Kafka Streams, a client library for processing and analyzing data stored in Kafka. With it we can filter, transform, and aggregate data streams. By combining Kafka Connect with Kafka Streams, we can build complete data pipelines.
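To make that combination concrete, here is a minimal Kafka Streams sketch that reads the connect-test topic (which we will populate with a source connector later in this post), drops empty lines, upper-cases the values, and writes the result to another topic. It is only an illustration under a few assumptions: it uses the StreamsBuilder DSL from Kafka client versions newer than the 0.9/0.10 releases discussed here, the application id and the output topic connect-test-transformed are made up for this example, and the topic is assumed to carry plain string values.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ConnectTestFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application id and broker address are assumptions for this sketch
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "connect-test-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // read the topic fed by a source connector, keep only non-empty lines,
        // upper-case them and write the result to a hypothetical output topic
        KStream<String, String> lines = builder.stream("connect-test");
        lines.filter((key, value) -> value != null && !value.isEmpty())
             .mapValues(value -> value.toUpperCase())
             .to("connect-test-transformed");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}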
2. Some Apache Kafka Connectors
Currently, there are a number of open-source Source and Sink connectors available from the community, listed below:
Connector | References
Apache Ignite | Source, Sink
Elasticsearch | Sink1, Sink2, Sink3
Cassandra | Source1, Source2, Sink1, Sink2
MongoDB | Source
HBase | Sink
Syslog | Source
MQTT | Source
Twitter | Source, Sink
S3 | Sink1, Sink2
Alternatively, you can find certified connectors from Confluent.io via this link.
3. Example
3.1. File Connectors
I’d like to take an example from the Apache Kafka 0.10.0.0 distribution and elaborate on it. The example demonstrates how to use Kafka Connect to stream data from a source, the file test.txt, to a destination, the file test.sink.txt. Note that the example runs in standalone mode.

Apache Kafka Connect Example – FileStream
We will need to use 2 connectors:
- FileStreamSource, which reads data from the test.txt file and publishes it to the Kafka topic connect-test
- FileStreamSink, which consumes data from the connect-test topic and writes it to the test.sink.txt file.
Let’s see the configuration file for the Source at kafka_2.11-0.10.0.0\config\connect-file-source.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
We need to define the connector.class, the maximum number of tasks that will be created, the name of the file that will be read by the connector, and the topic where the data will be published.
Here is the configuration file for the Sink at kafka_2.11-0.10.0.0\config\connect-file-sink.properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=connect-test
Similar to the Source, we need to define the connector.class, the number of tasks, the destination file where the data will be written, and the topic from which the data will be consumed.
One more important configuration file is located at kafka_2.11-0.10.0.0\config\connect-standalone.properties, where we need to define the address of the Kafka broker and the key and value converters.
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
...
3.2. Run the example
Make sure you have Apache Kafka 0.9.x or 0.10.x deployed and ready. Assume that we have a Kafka distribution at /opt/kafka_2.11-0.10.0.0. Note that we will use some basic Kafka command-line tools below. If you’re not familiar with them, you can refer to another post: Apache Kafka Command Line Interface
3.2.1. Start Kafka broker
We first need to cd (change directory) to the Kafka distribution folder.
cd /opt/kafka_2.11-0.10.0.0
Start ZooKeeper
./bin/zookeeper-server-start.sh config/zookeeper.properties &
Start Kafka Server
./bin/kafka-server-start.sh config/server.properties
3.2.2. Start the Source and Sink connectors
./bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
Note that after this command, the connectors are running and the Source is ready to read content from the test.txt file, which should be located in the execution folder: /opt/kafka_2.11-0.10.0.0
3.2.3. Feed data to the Source connector
Write some content to the test.txt file
echo 'hello' >> test.txt
echo 'halo' >> test.txt
echo 'salut' >> test.txt
3.2.4. Check whether the Source connector feeds the test.txt content into the topic connect-test
./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic connect-test
The output on the console is:
{"schema":{"type":"string","optional":false},"payload":"hello"}
{"schema":{"type":"string","optional":false},"payload":"halo"}
{"schema":{"type":"string","optional":false},"payload":"salut"}
3.2.5. Check whether the Sink connector writes the content to the test.sink.txt file
cat test.sink.txt
The output on my console:
hello
halo
salut
4. Conclusions
We have seen an overview of Apache Kafka Connect and a simple example using the FileStream connectors. We can leverage Kafka connectors to quickly ingest data from many sources, do some processing, and write the results to other destinations. For many integration scenarios, everything can be done with Apache Kafka itself; we don’t need other libraries or frameworks like Apache Flume, nor custom producers. In upcoming posts, I will introduce other types of Kafka connectors, such as the HDFS sink and JDBC source, and show how to implement a custom Kafka connector.
Below are articles related to Apache Kafka. If you’re interested, you can refer to the following links:
Getting started with Apache Kafka 0.9
Apache Kafka 0.9 Java Client API Example
Apache Kafka Command Line Interface
How To Write A Custom Serializer in Apache Kafka
Write An Apache Kafka Custom Partitioner
Apache Flume Kafka Source And HDFS Sink Tutorial
Apache Kafka Connect MQTT Source Tutorial