How To Deploy Kafka on Docker and DigitalOcean Kubernetes

Published on March 26, 2024

By Savic and Easha Abid

The author selected the Apache Software Foundation to receive a donation as part of the Write for DOnations program.

Introduction

Apache Kafka is an open-source distributed event and stream-processing platform written in Java, built to process demanding real-time data feeds. It is designed to be fault-tolerant with support for hundreds of nodes per cluster. Running a large number of nodes efficiently calls for containerization and an orchestration platform, such as Kubernetes, to make the best use of resources.

In this tutorial, you’ll learn how to deploy Kafka using Docker Compose. You’ll also learn how to deploy it on DigitalOcean Kubernetes using Strimzi, which integrates into Kubernetes and allows you to configure and maintain Kafka clusters using regular Kubernetes manifests without manual overhead.

Prerequisites

To follow this tutorial, you will need:

Docker installed on your machine. For Ubuntu, visit How To Install and Use Docker on Ubuntu. You only need to complete Step 1 and Step 2.
Docker Compose installed on your machine. For Ubuntu, visit How To Install and Use Docker Compose on Ubuntu. You only need to complete Step 1 and Step 2.
A DigitalOcean Kubernetes v1.23+ cluster with your connection configured as the kubectl default. Instructions on how to configure kubectl are shown under the Connect to your Cluster step when you create your cluster. To create a Kubernetes cluster on DigitalOcean, read the Kubernetes Quickstart.
The Helm 3 package manager installed on your local machine. Complete Step 1 of the How To Install Software on Kubernetes Clusters with the Helm 3 Package Manager tutorial.
An understanding of Kafka, including topics, producers, and consumers. For more information, please visit Introduction to Kafka.

Step 1 - Running Kafka Using Docker Compose

In this section, you’ll learn how to run Kafka using Docker Compose in KRaft mode. Utilizing KRaft streamlines the overall configuration and resource usage as no ZooKeeper instances are required.

First, you’ll define a Docker image that contains an unpacked Kafka release. You’ll use it to test the connection to the Kafka container by using the included scripts.

You’ll store the necessary commands in a Dockerfile. Create and open it for editing:

nano Dockerfile

Add the following lines:

Dockerfile

FROM ubuntu:latest AS build
RUN apt-get update && apt-get install -y curl default-jre
WORKDIR /kafka-test
RUN curl -fL -o kafka.tgz https://dlcdn.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
RUN tar -xzf kafka.tgz --strip-components=1

The container is based on the latest Ubuntu version. After updating the package cache and installing curl and Java, you download a Kafka release package. At the time of writing, the latest version of Kafka was 3.7.0. You can look up the latest version on the official Downloads page and replace the version number in the download URL if required.

Before the download, you set the WORKDIR (working directory) to /kafka-test, so that is where the Kafka release is downloaded and extracted. The --strip-components=1 parameter is passed to tar to skip the first directory level of the archive, which is named after the archive itself.

Save and close the file.
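If you'd like to see what --strip-components=1 does before building the image, the following sketch reproduces it on a throwaway archive. The directory and file names (kafka_demo-1.0, demo.sh) are made up for the illustration and are not part of a Kafka release:

```shell
# Build a small archive whose contents sit under a single top-level
# directory, mirroring the layout of a release tarball (names are made up).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/kafka_demo-1.0/bin"
echo 'echo hello' > "$workdir/src/kafka_demo-1.0/bin/demo.sh"
tar -czf "$workdir/demo.tgz" -C "$workdir/src" kafka_demo-1.0

# Extract it while dropping the leading kafka_demo-1.0/ path component.
mkdir "$workdir/out"
tar -xzf "$workdir/demo.tgz" -C "$workdir/out" --strip-components=1

# Lists "bin" directly, rather than "kafka_demo-1.0".
ls "$workdir/out"
```

Without the flag, the bin directory would end up one level deeper, under kafka_demo-1.0.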

Next, you’ll define the Docker Compose configuration in a file named kafka-compose.yaml. Create and open it for editing by running:

nano kafka-compose.yaml

Add the following lines:

kafka-compose.yaml

version: '3'

services:
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-test:
    build:
      dockerfile: Dockerfile
      context: .
    tty: true

Here you define two services, kafka and kafka-test. The kafka service is based on the latest Bitnami Kafka image. Under the environment section, you pass in the necessary environment variables and their values, which configure the Kafka node to be standalone with an ID of 0.

For kafka-test, you pass in the Dockerfile you’ve just created as the base for building the image of the container. By setting tty to true, you leave a session open with the container. This is necessary to keep it alive, as it would otherwise exit immediately after startup with no further commands.
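The KAFKA_CFG_ variables in the kafka service follow a naming convention documented for the Bitnami image: the prefix is dropped, the rest is lowercased, and underscores become dots, yielding a Kafka configuration key. The sketch below only mimics that convention so you can see which key each variable corresponds to. The cfg_to_property function is a hypothetical helper written for this illustration, not something the image provides:

```shell
# Illustration only: mimic the KAFKA_CFG_* naming convention.
# cfg_to_property is a hypothetical helper, not part of the Bitnami image.
cfg_to_property() {
  name="${1%%=*}"    # text before the first '='
  value="${1#*=}"    # text after the first '='
  # Drop the prefix, lowercase the rest, and turn underscores into dots.
  key=$(printf '%s' "${name#KAFKA_CFG_}" | tr 'A-Z_' 'a-z.')
  printf '%s=%s\n' "$key" "$value"
}

cfg_to_property 'KAFKA_CFG_NODE_ID=0'
cfg_to_property 'KAFKA_CFG_PROCESS_ROLES=controller,broker'
cfg_to_property 'KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093'
```

The three calls print node.id=0, process.roles=controller,broker, and controller.quorum.voters=0@kafka:9093, which are the settings you would otherwise write in a Kafka properties file.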

Save and close the file, then run the following command to bring up the services in the background:

docker-compose -f kafka-compose.yaml up -d

The output will be long because kafka-test will be built for the first time. The end of the output will be:

Output
...
Creating docker_kafka_1      ... done
Creating docker_kafka-test_1 ... done

You can list the running containers with:

docker ps

The output will look like the following:

Output
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS      NAMES
3ce3e3190f6e   bitnami/kafka:latest   "/opt/bitnami/script…"   4 seconds ago   Up 3 seconds   9092/tcp   docker_kafka_1
2a0cd13859e3   docker_kafka-test      "/bin/bash"              4 seconds ago   Up 3 seconds              docker_kafka-test_1

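Open a shell in the kafka-test container by running the following command. Compose prefixes container names with the project name, which defaults to the name of the directory holding the compose file (docker in this example), so use the name shown in your docker ps output: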

docker exec -it docker_kafka-test_1 bash

The shell will already be positioned in the /kafka-test directory:

root@2a0cd13859e3:/kafka-test#

Then, try creating a topic using kafka-topics.sh:

bin/kafka-topics.sh --create --topic first-topic --bootstrap-server kafka:9092

Note that you refer to Kafka by its name in the Docker Compose configuration (kafka).

The output will be:

Output
Created topic first-topic.

You’ve successfully connected to the Kafka deployment from within the Docker Compose service. Type in exit and press Enter to close the shell.

To stop the Docker Compose deployment, run the following command:

docker-compose -f kafka-compose.yaml down

In this step, you’ve deployed Kafka using Docker Compose. You’ve also tested that Kafka is available from within other containers by deploying a custom image that contains shell scripts for connecting to it. In the rest of the tutorial, you’ll learn how to deploy Kafka on DigitalOcean Kubernetes.

Step 2 - Installing Strimzi to Kubernetes

In this section, you’ll install Strimzi to your Kubernetes cluster. This entails adding its repository to Helm and creating a Helm release.

You’ll first need to add the Strimzi Helm repository to Helm, which contains the Strimzi chart:

helm repo add strimzi https://strimzi.io/charts

The output will be:

Output
"strimzi" has been added to your repositories

Then, refresh Helm’s cache to download its contents:

helm repo update

You’ll see the following output:

Output
...Successfully got an update from the "strimzi" chart repository
Update Complete. ⎈Happy Helming!⎈

Finally, install Strimzi to your cluster by running:

helm install strimzi strimzi/strimzi-kafka-operator

The output will look like this:

Output
NAME: strimzi
LAST DEPLOYED: ...
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing strimzi-kafka-operator-0.40.0

To create a Kafka cluster refer to the following documentation.

https://strimzi.io/docs/operators/latest/deploying.html#deploying-cluster-operator-helm-chart-str

You now have Strimzi installed in your Kubernetes cluster. In the next section, you’ll use it to deploy Kafka to your cluster.

Step 3 - Deploying a Kafka Cluster to Kubernetes

In this section, you’ll deploy a one-node Kafka cluster with ZooKeeper to your Kubernetes cluster. At the time of writing, support for deploying Kafka using KRaft was not generally available in Strimzi.

You’ll store the Kubernetes manifest for the deployment in a file named kafka.yaml. Create and open it for editing:

nano kafka.yaml

Add the following lines to your file:

kafka.yaml

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

The first block of the spec is related to Kafka itself. You set the version, as well as the number of replicas. Then, you define two listeners, which are ports that the Kafka deployment will use to communicate. The second listener is encrypted because you set tls to true. Since each listener needs its own port, you assign 9093 as the port number for the second one.

Since you’re deploying only one Kafka node, you set the replication factors in the config section (for the offsets topic, the transaction state log, and newly created topics), along with the minimum in-sync replica counts, to 1. For storage, you set the type to jbod (meaning “just a bunch of disks”), which allows you to specify multiple volumes. Here, you define one volume of type persistent-claim with a size of 100Gi. This will create a DigitalOcean Volume and assign it to Kafka. You also set deleteClaim to false to ensure that data isn’t deleted when the Kafka cluster is destroyed.

To configure the zookeeper deployment, you set its number of replicas to 1 and provide it with a single persistent-claim of 100Gi, as only Kafka supports the jbod storage type. The two definitions under entityOperator instruct Strimzi to deploy the Topic Operator and User Operator, which handle Kafka topics and users for this cluster.
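The size values in the manifest use the Kubernetes Gi suffix, which is a binary unit (1Gi is 1024^3 bytes) and therefore slightly larger than a decimal gigabyte. As a quick check of what 100Gi amounts to, you can do the arithmetic in your shell:

```shell
# 100Gi in Kubernetes notation is 100 * 1024^3 bytes.
bytes=$((100 * 1024 * 1024 * 1024))
echo "100Gi = $bytes bytes"

# The same amount in whole decimal gigabytes (10^9 bytes), rounded down.
echo "about $((bytes / 1000000000)) GB"
```

This prints 107374182400 bytes, or about 107 GB, which is worth keeping in mind when estimating volume costs.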

Save and close the file, then apply it by running:

kubectl apply -f kafka.yaml

kubectl will display the following output:

Output
kafka.kafka.strimzi.io/my-cluster created

You can watch the deployment become available by running:

kubectl get strimzipodset -w

After a few minutes, both Kafka and ZooKeeper pods will become available and ready:

Output
NAME                   PODS   READY PODS   CURRENT PODS   AGE
...
my-cluster-kafka       1      1            1              28s
my-cluster-zookeeper   1      1            1              61s

To list Kafka deployments, run the following command:

kubectl get kafka

You’ll see output similar to this:

Output
NAME         DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS   READY   METADATA STATE   WARNINGS
my-cluster   1                        1

Now that Kafka is running, you’ll create a topic in it. Open a file called kafka-topic.yaml for editing:

nano kafka-topic.yaml

Add the following lines:

kafka-topic.yaml

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: "my-cluster"
spec:
  partitions: 1
  replicas: 1

This KafkaTopic defines a topic called my-topic in the cluster you’ve just deployed (my-cluster).

Save and close the file, then apply it by running:

kubectl apply -f kafka-topic.yaml

The output will be:

Output
kafkatopic.kafka.strimzi.io/my-topic created

Then, list all Kafka topics in the cluster:

kubectl get kafkatopic

kubectl will show the following output:

Output
NAME       CLUSTER      PARTITIONS   REPLICATION FACTOR   READY
my-topic   my-cluster   1            1                    True
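If you need more than one topic, writing a file per topic gets repetitive. One possible approach, sketched here, is a small shell function that prints a KafkaTopic manifest with the same fields as kafka-topic.yaml. The kafka_topic_manifest function and the orders topic name are hypothetical and only serve the illustration:

```shell
# Hypothetical helper: print a KafkaTopic manifest for the given topic name,
# cluster name, partition count (default 1), and replica count (default 1).
kafka_topic_manifest() {
  cat <<EOF
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: $1
  labels:
    strimzi.io/cluster: "$2"
spec:
  partitions: ${3:-1}
  replicas: ${4:-1}
EOF
}

# Print a manifest for a three-partition topic on the single-node cluster.
kafka_topic_manifest orders my-cluster 3 1
```

To create the topic instead of printing it, you could pipe the output into kubectl apply -f -. The replica count stays at 1 here because the cluster from this step has only one Kafka node.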

In this step, you’ve deployed Kafka to your Kubernetes cluster using Strimzi, which takes care of the actual resources and ZooKeeper instances. You’ve also created a topic, which you’ll use in the next step when connecting to Kafka.

Step 4 - Connecting to Kafka in Kubernetes

In this section, you’ll learn how to connect to a Kafka cluster deployed on Kubernetes from within the cluster.

Thanks to Strimzi, your Kafka deployment is already available to pods in the cluster. Any app from within can connect to the my-cluster-kafka-bootstrap endpoint, which will automatically be resolved to the my-cluster cluster.
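The short name my-cluster-kafka-bootstrap resolves for pods running in the same namespace as the Kafka cluster. For pods in other namespaces, the usual Kubernetes service DNS form (service.namespace.svc) applies. As a sketch, assuming the bootstrap service naming seen above and the default namespace from the Helm output, a hypothetical helper could assemble the address like this:

```shell
# Hypothetical helper: build a bootstrap address from the Kafka cluster name,
# the namespace (default: default), and the listener port (default: 9092).
bootstrap_address() {
  cluster="$1"
  namespace="${2:-default}"
  port="${3:-9092}"
  printf '%s-kafka-bootstrap.%s.svc:%s\n' "$cluster" "$namespace" "$port"
}

# Prints my-cluster-kafka-bootstrap.default.svc:9092
bootstrap_address my-cluster
```

You would pass the resulting value to --bootstrap-server, or to the equivalent setting of your Kafka client library.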

You’ll now deploy a temporary pod to Kubernetes based on a Docker image that Strimzi provides. The image contains a Kafka installation with shell scripts for producing and consuming textual messages (kafka-console-producer.sh and kafka-console-consumer.sh).

Run the following command to run the producer script in-cluster:

kubectl run kafka-producer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic

The temporary pod will be called kafka-producer and will use the image from the Strimzi project. It will be deleted after the command finishes executing (--rm=true) and will never be restarted (--restart=Never), as it’s a one-time job. Then, you pass in the command to run the kafka-console-producer.sh script. As noted previously, you pass in the my-cluster-kafka-bootstrap designator for the server and my-topic as the topic name.

The output will look like this:

Output
If you don't see a command prompt, try pressing enter.
>

You can input any text message and press Enter to send it to the topic:

Output
If you don't see a command prompt, try pressing enter.
>Hello World!
>

To exit, press CTRL+C and confirm with Enter. Then, run the following command to run the consumer script in-cluster:

kubectl run kafka-consumer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning

You may need to press Enter for the command to proceed. The output will be similar to this:

Output
Hello World!
...

You’ve learned how to connect to your Kafka deployment from within the cluster. You’ll now expose Kafka to the outside world.

Step 5 - Exposing Kafka Outside of Kubernetes

In this step, you’ll expose your Kafka deployment externally using a load balancer.

Strimzi has a built-in way of creating and configuring a load balancer for Kafka. Open kafka.yaml for editing by running:

nano kafka.yaml

Add the following lines to the listeners section:

kafka.yaml

...
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
...

The new external entry defines a listener of type loadbalancer that will accept connections at port 9094 without TLS encryption.

Save and close the file, then apply the new manifest by running:

kubectl apply -f kafka.yaml

The output will be:

Output
kafka.kafka.strimzi.io/my-cluster configured

Run the following command to watch it become available:

kubectl get service my-cluster-kafka-external-bootstrap -w -o=jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'

When the load balancer that fronts traffic for Kafka becomes available, the output will be its IP address.
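If you want to script this wait instead of watching the output, one option is to poll until the command prints something. The sketch below is a generic retry loop, not a Strimzi or kubectl feature. The wait_for_output and fake_lookup functions are hypothetical, and fake_lookup stands in for the kubectl call so that the example runs without a cluster:

```shell
# Hypothetical helper: run a command up to N times, pausing between tries,
# until it prints non-empty output. Prints that output and returns 0,
# or returns 1 if every attempt came back empty.
wait_for_output() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null) || true
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Stand-in for the kubectl lookup; 203.0.113.10 is a documentation-only address.
fake_lookup() { echo '203.0.113.10'; }

lb_ip=$(wait_for_output 5 1 fake_lookup)
echo "Load balancer IP: $lb_ip"
```

Against a real cluster, you would replace fake_lookup with the kubectl get service command from above, without the -w flag, so that each attempt returns immediately.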

To connect from outside the cluster, you’ll need a Kafka release downloaded and extracted on your local machine, such as the archive used in the Dockerfile in Step 1. Navigate to that directory and run the console consumer, replacing your_lb_ip with the IP address from the output of the previous command:

bin/kafka-console-consumer.sh --bootstrap-server your_lb_ip:9094 --topic my-topic --from-beginning

You’ll soon see the messages being read from the topic, meaning that you’ve successfully connected:

Output
Hello World!
...

To delete all Strimzi-related resources from your cluster (such as Kafka deployments and topics), run the following command:

kubectl delete $(kubectl get strimzi -o name)

Conclusion

In this article, you’ve deployed Kafka using Docker Compose and verified that you can connect to it. You’ve also learned how to install Strimzi to your DigitalOcean Kubernetes cluster and deployed a Kafka cluster using the provided manifests.

About the author(s)

Savic, Author. Expert in cloud topics including Kafka, Kubernetes, and Ubuntu.

Easha Abid, Editor. Technical Writer.

Leonid M

April 1, 2024

Currently, I really struggle to fix the issue with the installation of the bitnami Kafka on the kubernetes. I don’t know what and why you did it, but now, for some strange reason, using the bitnami Kafka helm chart, I cannot make it work no matter what configuration I’m using. For some reason, port 9092 is not accessible… or just blocked… I’ve been running Kafka for 2 years on your platform, but now I’ll need to give up here and go to another vendor…