Savic and Easha Abid
The author selected Apache Software Foundation to receive a donation as part of the Write for DOnations program.
Apache Kafka is an open-source distributed event and stream-processing platform written in Java, built to process demanding real-time data feeds. It is designed to be fault-tolerant with support for hundreds of nodes per cluster. Running a large number of nodes efficiently requires containerization and an orchestration platform, such as Kubernetes, for optimal resource usage.
In this tutorial, you’ll learn how to deploy Kafka using Docker Compose. You’ll also learn how to deploy it on DigitalOcean Kubernetes using Strimzi, which integrates into Kubernetes and allows you to configure and maintain Kafka clusters using regular Kubernetes manifests without manual overhead.
To follow this tutorial, you will need:
A DigitalOcean Kubernetes cluster with kubectl configured to use it by default. Instructions on how to configure kubectl are shown under the Connect to your Cluster step when you create your cluster. To create a Kubernetes cluster on DigitalOcean, read the Kubernetes Quickstart.
In this section, you’ll learn how to run Kafka using Docker Compose in KRaft mode. Using KRaft streamlines the overall configuration and reduces resource usage, as no ZooKeeper instances are required.
First, you’ll define a Docker image that contains an unpacked Kafka release. You’ll use it to test the connection to the Kafka container by using the included scripts.
You’ll store the necessary commands in a Dockerfile. Create and open it for editing:
Add the following lines:
The container is based on the latest Ubuntu version. After updating the package cache and installing curl and Java, you download a Kafka release package. At the time of writing, the latest version of Kafka was 3.7.0. You can look up the latest version on the official Downloads page and replace the version number if required.
Then, you set the WORKDIR (working directory) to /kafka-test, to which you download and extract the Kafka release. The --strip-components=1 parameter is passed into tar to skip the first directory of the archive, which is named after the archive itself.
Save and close the file.
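The Dockerfile described above could look roughly like the following sketch, written here with a shell heredoc. The download URL, the Scala build (2.13), and the default-jre package are assumptions; only the 3.7.0 version, the /kafka-test working directory, and the --strip-components=1 flag come from the text.

```shell
# Sketch: write a Dockerfile matching the description above.
# Assumptions: the archive.apache.org URL, the 2.13 Scala build, and default-jre.
cat > Dockerfile <<'EOF'
FROM ubuntu:latest

# Refresh the package cache, then install curl and a Java runtime
RUN apt-get update && apt-get install -y curl default-jre

# All following commands run inside /kafka-test
WORKDIR /kafka-test

# Download the Kafka release (adjust the version and mirror as needed)
RUN curl -fL -o kafka.tgz https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz

# Unpack it, skipping the archive's top-level directory
RUN tar -xzf kafka.tgz --strip-components=1
EOF
```

If you use a newer Kafka release, change the version in both places in the download URL.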
Next, you’ll define the Docker Compose configuration in a file named kafka-compose.yaml. Create and open it for editing by running:
Add the following lines:
Here you define two services, kafka and kafka-test. The kafka service is based on the latest Bitnami Kafka image. Under the environment section, you pass in the necessary environment variables and their values, which configure the Kafka node to be standalone with an ID of 0.
For kafka-test, you pass in the Dockerfile you’ve just created as the base for building the image of the container. By setting tty to true, you leave a session open with the container. This is necessary to keep it alive, as it would otherwise exit immediately after startup with no further commands.
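A Compose file along these lines might look like the sketch below. The KAFKA_CFG_* variable names follow the Bitnami image's configuration convention as far as I know, and the listener ports (9092 for clients, 9093 for the controller) are assumptions; check them against the image documentation for the tag you pull.

```shell
# Sketch: write kafka-compose.yaml with the two services described above.
# Assumptions: the exact KAFKA_CFG_* variables and the 9092/9093 ports.
cat > kafka-compose.yaml <<'EOF'
services:
  kafka:
    image: bitnami/kafka:latest
    environment:
      # Single standalone node (ID 0) acting as both KRaft controller and broker
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-test:
    # Build the test image from the Dockerfile in the current directory
    build:
      context: .
      dockerfile: Dockerfile
    # Keep a terminal session open so the container does not exit
    tty: true
EOF
```

With this file in place, `docker compose -f kafka-compose.yaml up -d` starts both services in the background, and `docker compose -f kafka-compose.yaml down` stops them.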
Save and close the file, then run the following command to bring up the services in the background:
The output will be long because kafka-test will be built for the first time. The end of the output will be:
You can list the running containers with:
The output will look like the following:
Open a shell in the kafka-test container by running:
The shell will already be positioned in the /kafka-test directory:
Then, try creating a topic using kafka-topics.sh:
Note that you refer to Kafka by its name in the Docker Compose configuration (kafka).
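As a sketch, the topic creation step can be wrapped in a small shell function to run inside the kafka-test container. The function name create_topic, the example topic name, and port 9092 are assumptions for illustration; kafka is the Compose service name from the text.

```shell
# Sketch: create a topic from inside the kafka-test container (run from /kafka-test).
# Assumptions: the helper name and port 9092; "kafka" is the Compose service name.
create_topic() {
  # $1 is the name of the topic to create
  bin/kafka-topics.sh --create \
    --topic "$1" \
    --bootstrap-server kafka:9092
}
```

For example, `create_topic first-topic` would ask the broker to create a topic named first-topic.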
The output will be:
You’ve successfully connected to the Kafka deployment from within the Docker Compose service. Type in exit and press Enter to close the shell.
To stop the Docker Compose deployment, run the following command:
In this step, you’ve deployed Kafka using Docker Compose. You’ve also tested that Kafka is available from within other containers by deploying a custom image that contains shell scripts for connecting to it. In the rest of the tutorial, you’ll learn how to deploy Kafka on DigitalOcean Kubernetes.
In this section, you’ll install Strimzi to your Kubernetes cluster. This entails adding its repository to Helm and creating a Helm release.
First, add the Strimzi repository, which contains the Strimzi chart, to Helm:
The output will be:
Then, refresh Helm’s cache to download its contents:
You’ll see the following output:
Finally, install Strimzi to your cluster by running:
The output will look like this:
You now have Strimzi installed in your Kubernetes cluster. In the next section, you’ll use it to deploy Kafka to your cluster.
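The three Helm steps in this section can be collected into one shell function, sketched below. The function name install_strimzi and the release name strimzi are assumptions; the repository URL and chart name are the ones the Strimzi project publishes, to the best of my knowledge.

```shell
# Sketch: add the Strimzi Helm repository, refresh the cache, and install the operator.
# Assumptions: the helper name and the release name "strimzi".
install_strimzi() {
  helm repo add strimzi https://strimzi.io/charts/ &&
    helm repo update &&
    helm install strimzi strimzi/strimzi-kafka-operator
}
```

Call `install_strimzi` once kubectl points at your cluster. The `&&` chaining stops at the first step that fails.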
In this section, you’ll deploy a one-node Kafka cluster with ZooKeeper to your Kubernetes cluster. At the time of writing, support for deploying Kafka using KRaft was not generally available in Strimzi.
You’ll store the Kubernetes manifest for the deployment in a file named kafka.yaml. Create and open it for editing:
Add the following lines to your file:
The first block of the spec is related to Kafka itself. You set the version, as well as the number of replicas. Then, you define two listeners, which are ports that the Kafka deployment will use to communicate. The second listener is encrypted because you set tls to true. Since listener ports can’t collide, you assign 9093 as the port number for the second one.
Since you’re deploying only one Kafka node, in the config section you set the various replication factors (for the topics, events, and replicas) to 1. For storage, you set the type to jbod (meaning “just a bunch of disks”), which allows you to specify multiple volumes. Here, you define one volume of type persistent-claim with a size of 100GB. This will create a DigitalOcean Volume and assign it to Kafka. You also set deleteClaim to false to ensure that data isn’t deleted when the Kafka cluster is destroyed.
To configure the zookeeper deployment, you set its number of replicas to 1 and provide it with a single persistent-claim of 100GB, as only Kafka supports the jbod storage type. The two definitions under entityOperator instruct Strimzi to create cluster-wide operators for handling Kafka topics and users.
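A manifest matching this description might look like the following sketch. The listener names (plain, tls), the first listener's port (9092), the exact config keys, and the 100Gi quantity spelling are assumptions; the cluster name my-cluster is the one used later in the tutorial.

```shell
# Sketch: write kafka.yaml for a one-node Kafka cluster with ZooKeeper.
# Assumptions: listener names, port 9092, config keys, and 100Gi for "100GB".
cat > kafka.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      # Unencrypted listener for in-cluster clients
      - name: plain
        port: 9092
        type: internal
        tls: false
      # TLS-encrypted listener on a separate port
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      # One node only, so every replication factor is 1
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF
```

You can then apply it with `kubectl apply -f kafka.yaml`.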
Save and close the file, then apply it by running:
kubectl will display the following output:
You can watch the deployment become available by running:
After a few minutes, both Kafka and ZooKeeper pods will become available and ready:
To list Kafka deployments, run the following command:
You’ll see output similar to this:
Now that Kafka is running, you’ll create a topic in it. Open a file called kafka-topic.yaml for editing:
Add the following lines:
This KafkaTopic defines a topic called my-topic in the cluster you’ve just deployed (my-cluster).
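For reference, a KafkaTopic manifest of this shape could be written as in the sketch below. The partition and replica counts of 1 are assumptions chosen to fit a one-node cluster; the strimzi.io/cluster label is how Strimzi associates a topic with a cluster.

```shell
# Sketch: write kafka-topic.yaml defining my-topic in my-cluster.
# Assumptions: one partition and one replica.
cat > kafka-topic.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    # Ties the topic to the Kafka cluster managed by Strimzi
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 1
EOF
```

After `kubectl apply -f kafka-topic.yaml`, the topic should appear in the output of `kubectl get kafkatopic`.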
Save and close the file, then apply it by running:
The output will be:
Then, list all Kafka topics in the cluster:
kubectl will show the following output:
In this step, you’ve deployed Kafka to your Kubernetes cluster using Strimzi, which takes care of the actual resources and ZooKeeper instances. You’ve also created a topic, which you’ll use in the next step when connecting to Kafka.
In this section, you’ll learn how to connect to a Kafka cluster deployed on Kubernetes from within the cluster.
Thanks to Strimzi, your Kafka deployment is already available to pods in the cluster. Any app running inside the cluster can connect to the my-cluster-kafka-bootstrap endpoint, which will automatically be resolved to the my-cluster cluster.
You’ll now deploy a temporary pod to Kubernetes based on a Docker image that Strimzi provides. The image contains a Kafka installation with shell scripts for producing and consuming textual messages (kafka-console-producer.sh and kafka-console-consumer.sh).
Run the following command to run the producer script in-cluster:
The temporary pod will be called kafka-producer and will use the image from the Strimzi project. It will be deleted after the command finishes executing (--rm=true) and will never be restarted, as it’s a one-time job. Then, you pass in the command to run the kafka-console-producer.sh script. As noted previously, you pass in the my-cluster-kafka-bootstrap designator for the server and my-topic as the topic name.
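The producer invocation described above can be sketched as a shell function. The image tag (0.40.0-kafka-3.7.0), port 9092, and the function name kafka_producer are assumptions; pick the Strimzi image tag that matches your operator and Kafka versions.

```shell
# Sketch: run the console producer in a temporary in-cluster pod.
# Assumptions: the image tag, port 9092, and the helper name.
kafka_producer() {
  kubectl run kafka-producer -ti \
    --image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 \
    --rm=true --restart=Never \
    -- bin/kafka-console-producer.sh \
    --bootstrap-server my-cluster-kafka-bootstrap:9092 \
    --topic my-topic
}
```

A consumer pod follows the same pattern: use a different pod name, swap in bin/kafka-console-consumer.sh, and add --from-beginning to read messages that are already in the topic.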
The output will look like this:
You can input any text message and press Enter to send it to the topic:
To exit, press CTRL+C and confirm with Enter. Then, run the following command to run the consumer script in-cluster:
You may need to press Enter for the command to proceed. The output will be similar to this:
You’ve learned how to connect to your Kafka deployment from within the cluster. You’ll now expose Kafka to the outside world.
In this step, you’ll expose your Kafka deployment externally using a load balancer.
Strimzi has a built-in way of creating and configuring a load balancer for Kafka. Open kafka.yaml for editing by running:
Add the following lines to the listeners section:
The added lines define a new listener with the type loadbalancer that will accept connections at port 9094 without TLS encryption.
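A listener entry of that kind might look like the fragment below, saved here to a scratch file so it can be pasted under spec.kafka.listeners in kafka.yaml. The listener name external and the file name external-listener.yaml are assumptions for illustration.

```shell
# Sketch: save the extra listener entry to a scratch file for pasting into kafka.yaml.
# Assumptions: the listener name "external" and the scratch file name.
cat > external-listener.yaml <<'EOF'
      # Append under spec.kafka.listeners, keeping the same indentation as the other entries
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
EOF
```

After adding the entry to kafka.yaml, re-applying the manifest lets Strimzi provision the load balancer.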
Save and close the file, then apply the new manifest by running:
The output will be:
Run the following command to watch it become available:
When the load balancer that fronts traffic for Kafka becomes available, the output will be its IP address.
As part of the prerequisites, you downloaded and extracted the latest Kafka release to your machine. Navigate to that directory and run the console consumer, replacing your_lb_ip with the IP address from the output of the previous command:
You’ll soon see the messages being read from the topic, meaning that you’ve connected successfully:
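The lookup and the external consumer call can be sketched as two shell helpers. The service name my-cluster-kafka-external-bootstrap assumes the load balancer listener is named external, and both function names are hypothetical.

```shell
# Sketch: look up the load balancer IP, then consume from outside the cluster.
# Assumptions: the listener is named "external" (which determines the service name)
# and both helper names.
kafka_lb_ip() {
  kubectl get service my-cluster-kafka-external-bootstrap \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Run from the directory where you extracted the Kafka release.
consume_external() {
  # $1 is the load balancer IP address (your_lb_ip)
  bin/kafka-console-consumer.sh \
    --bootstrap-server "$1:9094" \
    --topic my-topic --from-beginning
}
```

For example, `consume_external "$(kafka_lb_ip)"` combines the two once the load balancer has been assigned an address.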
To delete all Strimzi-related resources from your cluster (such as Kafka deployments and topics), run the following command:
In this article, you’ve deployed Kafka using Docker Compose and verified that you can connect to it. You’ve also learned how to install Strimzi to your DigitalOcean Kubernetes cluster and deployed a Kafka cluster using the provided manifests.