Savic and Easha Abid
The author selected Apache Software Foundation to receive a donation as part of the Write for DOnations program.
Apache Kafka is an open-source distributed event and stream-processing platform written in Java, built to process demanding real-time data feeds. It is designed to be fault-tolerant, with support for hundreds of nodes per cluster. Running a large number of nodes efficiently calls for containerization and orchestration, such as with Kubernetes, to ensure optimal resource usage.
In this tutorial, you’ll learn how to deploy Kafka using Docker Compose. You’ll also learn how to deploy it on DigitalOcean Kubernetes using Strimzi, which integrates into Kubernetes and allows you to configure and maintain Kafka clusters using regular Kubernetes manifests without manual overhead.
To follow this tutorial, you will need:
- Docker and Docker Compose installed on your machine.
- A DigitalOcean Kubernetes cluster, with your connection configured as the kubectl default. Instructions on how to configure kubectl are shown under the Connect to your Cluster step when you create your cluster. To create a Kubernetes cluster on DigitalOcean, read the Kubernetes Quickstart.
- The Helm 3 package manager installed on your machine.
- The latest Kafka release downloaded and extracted on your machine, which you’ll use later to connect to the exposed deployment.

In this section, you’ll learn how to run Kafka using Docker Compose in KRaft mode. Utilizing KRaft streamlines the overall configuration and resource usage, as no ZooKeeper instances are required.
First, you’ll define a Docker image that contains an unpacked Kafka release. You’ll use it to test the connection to the Kafka container by using the included scripts.
You’ll store the necessary commands in a Dockerfile. Create and open it for editing:
nano Dockerfile
Add the following lines:
FROM ubuntu:latest AS build
# Refresh the package cache and install curl and a Java runtime
RUN apt-get update
RUN apt-get install curl default-jre -y
WORKDIR /kafka-test
# Download the Kafka release and extract it in place, dropping the top-level directory
RUN curl -o kafka.tgz https://dlcdn.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz
RUN tar -xzf kafka.tgz --strip-components=1
The container is based on the latest Ubuntu version. After updating the package cache and installing curl and Java, you download a Kafka release package. At the time of writing, the latest version of Kafka was 3.7.0. You can look up the latest version on the official Downloads page and replace the version in the URL if required.

Then, you set the WORKDIR (working directory) to /kafka-test, into which you download and extract the Kafka release. The --strip-components=1 parameter is passed to tar to skip the first directory of the archive, which is named after the archive itself.
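If you’re curious why that flag is needed, you can list the archive’s contents on any machine where you’ve downloaded the same release; the leading directory shown here is illustrative and depends on the version:

tar -tzf kafka.tgz | head -n 3

The entries all start with kafka_2.13-3.7.0/, so extracting without --strip-components=1 would place everything one directory level deeper than /kafka-test.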
Save and close the file.
Next, you’ll define the Docker Compose configuration in a file named kafka-compose.yaml. Create and open it for editing by running:
nano kafka-compose.yaml
Add the following lines:
version: '3'

services:
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-test:
    build:
      dockerfile: Dockerfile
      context: .
    tty: true
Here you define two services, kafka and kafka-test. The kafka service is based on the latest Bitnami Kafka image. Under the environment section, you pass in the necessary environment variables and their values, which configure the Kafka node to be standalone with an ID of 0.

For kafka-test, you pass in the Dockerfile you’ve just created as the base for building the image of the container. By setting tty to true, you leave a session open with the container. This is necessary to keep it alive, as it would otherwise exit immediately after startup with no further commands.
Save and close the file, then run the following command to bring up the services in the background:
docker-compose -f kafka-compose.yaml up -d
The output will be long because kafka-test will be built for the first time. The end of the output will be:
Output
...
Creating docker_kafka_1 ... done
Creating docker_kafka-test_1 ... done
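If the services fail to come up, a common culprit is a malformed Compose file. You can ask Docker Compose to validate the file and print the resolved configuration, which surfaces indentation mistakes early:

docker-compose -f kafka-compose.yaml config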
You can list the running containers with:
docker ps
The output will look like the following:
Output
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS      NAMES
3ce3e3190f6e   bitnami/kafka:latest   "/opt/bitnami/script…"   4 seconds ago   Up 3 seconds   9092/tcp   docker_kafka_1
2a0cd13859e3   docker_kafka-test      "/bin/bash"              4 seconds ago   Up 3 seconds              docker_kafka-test_1
Open a shell in the kafka-test container by running:
docker exec -it docker_kafka-test_1 bash
The shell will already be positioned in the /kafka-test directory:
root@2a0cd13859e3:/kafka-test#
Then, try creating a topic using kafka-topics.sh:
bin/kafka-topics.sh --create --topic first-topic --bootstrap-server kafka:9092
Note that you refer to Kafka by its name in the Docker Compose configuration (kafka).
The output will be:
Output
Created topic first-topic.
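As an optional sanity check, you can describe the topic you’ve just created from the same shell:

bin/kafka-topics.sh --describe --topic first-topic --bootstrap-server kafka:9092

The output lists the topic’s partition count, replication factor, and leader, all handled by the single kafka node.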
You’ve successfully connected to the Kafka deployment from within the Docker Compose service. Type in exit and press Enter to close the shell.
To stop the Docker Compose deployment, run the following command:
docker-compose -f kafka-compose.yaml down
In this step, you’ve deployed Kafka using Docker Compose. You’ve also tested that Kafka is available from within other containers by deploying a custom image that contains shell scripts for connecting to it. In the rest of the tutorial, you’ll learn how to deploy Kafka on DigitalOcean Kubernetes.
In this section, you’ll install Strimzi to your Kubernetes cluster. This entails adding its repository to Helm and creating a Helm release.
You’ll first need to add the Strimzi Helm repository to Helm, which contains the Strimzi chart:
helm repo add strimzi https://strimzi.io/charts
The output will be:
Output
"strimzi" has been added to your repositories
Then, refresh Helm’s cache to download its contents:
helm repo update
You’ll see the following output:
Output
...
Successfully got an update from the "strimzi" chart repository
Update Complete. ⎈Happy Helming!⎈
Finally, install Strimzi to your cluster by running:
helm install strimzi strimzi/strimzi-kafka-operator
The output will look like this:
Output
NAME: strimzi
LAST DEPLOYED: ...
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing strimzi-kafka-operator-0.40.0
To create a Kafka cluster refer to the following documentation.
https://strimzi.io/docs/operators/latest/deploying.html#deploying-cluster-operator-helm-chart-str
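Before moving on, you can optionally confirm that the operator pod is running. The label selector below assumes the default labels applied by the chart; if it returns nothing, list all pods instead:

kubectl get pods -l name=strimzi-cluster-operator

You should see a single strimzi-cluster-operator pod with a STATUS of Running.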
You now have Strimzi installed in your Kubernetes cluster. In the next section, you’ll use it to deploy Kafka to your cluster.
In this section, you’ll deploy a one-node Kafka cluster with ZooKeeper to your Kubernetes cluster. At the time of writing, support for deploying Kafka using KRaft was not generally available in Strimzi.
You’ll store the Kubernetes manifest for the deployment in a file named kafka.yaml. Create and open it for editing:
nano kafka.yaml
Add the following lines to your file:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.7"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
The first block of the spec relates to Kafka itself. You set the version, as well as the number of replicas. Then, you define two listeners, which are ports that the Kafka deployment will use to communicate. The second listener is encrypted because you set tls to true. Since listeners can’t collide, you assign 9093 as the port number for the second one.

Since you’re deploying only one Kafka node, in the config section you set the various replication factors (for the offsets topic, the transaction state log, and topics in general) to 1. For storage, you set the type to jbod (meaning “just a bunch of disks”), which allows you to specify multiple volumes. Here, you define one volume of type persistent-claim with a size of 100GB. This will create a DigitalOcean Volume and assign it to Kafka. You also set deleteClaim to false to ensure that the data isn’t deleted when the Kafka cluster is destroyed.
To configure the zookeeper deployment, you set its number of replicas to 1 and provide it with a single persistent-claim of 100GB, as only Kafka supports the jbod storage type. The two definitions under entityOperator instruct Strimzi to create cluster-wide operators for handling Kafka topics and users.
Save and close the file, then apply it by running:
kubectl apply -f kafka.yaml
kubectl will display the following output:
Output
kafka.kafka.strimzi.io/my-cluster created
You can watch the deployment become available by running:
kubectl get strimzipodset -w
After a few minutes, both the Kafka and ZooKeeper pods will become available and ready:
Output
NAME                   PODS   READY PODS   CURRENT PODS   AGE
...
my-cluster-kafka       1      1            1              28s
my-cluster-zookeeper   1      1            1              61s
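You can also inspect the underlying pods directly. Strimzi labels the resources it manages with the cluster name, so the following selector should match them:

kubectl get pods -l strimzi.io/cluster=my-cluster

The listing will include the Kafka pod, the ZooKeeper pod, and the entity operator pod once each is ready.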
To list Kafka deployments, run the following command:
kubectl get kafka
You’ll see output similar to this:
Output
NAME         DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS   READY   METADATA STATE   WARNINGS
my-cluster   1                        1
Now that Kafka is running, you’ll create a topic in it. Open a file called kafka-topic.yaml for editing:
nano kafka-topic.yaml
Add the following lines:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: "my-cluster"
spec:
  partitions: 1
  replicas: 1
This KafkaTopic defines a topic called my-topic in the cluster you’ve just deployed (my-cluster).
Save and close the file, then apply it by running:
kubectl apply -f kafka-topic.yaml
The output will be:
Output
kafkatopic.kafka.strimzi.io/my-topic created
Then, list all Kafka topics in the cluster:
kubectl get kafkatopic
kubectl will show the following output:
Output
NAME       CLUSTER      PARTITIONS   REPLICATION FACTOR   READY
my-topic   my-cluster   1            1                    True
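For more detail about the topic, such as the status conditions reported by the operator, you can describe the resource:

kubectl describe kafkatopic my-topic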
In this step, you’ve deployed Kafka to your Kubernetes cluster using Strimzi, which takes care of the actual resources and ZooKeeper instances. You’ve also created a topic, which you’ll use in the next step when connecting to Kafka.
In this section, you’ll learn how to connect to a Kafka cluster deployed on Kubernetes from within the cluster.
Thanks to Strimzi, your Kafka deployment is already available to pods in the cluster. Any app from within the cluster can connect to the my-cluster-kafka-bootstrap endpoint, which automatically resolves to the brokers of the my-cluster cluster.
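You can see the Kubernetes Service behind this endpoint by running the command below; its fully qualified name (my-cluster-kafka-bootstrap.default.svc, assuming the cluster lives in the default namespace) also works from other namespaces:

kubectl get service my-cluster-kafka-bootstrap

The output should show a ClusterIP service exposing the ports of the internal listeners you defined (9092 and 9093).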
You’ll now deploy a temporary pod to Kubernetes based on a Docker image that Strimzi provides. The image contains a Kafka installation with shell scripts for producing and consuming textual messages (kafka-console-producer.sh and kafka-console-consumer.sh).
Run the following command to run the producer script in-cluster:
kubectl run kafka-producer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic
The temporary pod will be called kafka-producer and will use the image from the Strimzi project. It will be deleted after the command finishes executing (--rm=true) and will never be restarted, as it’s a one-time job. Then, you pass in the command to run the kafka-console-producer.sh script. As noted previously, you pass in the my-cluster-kafka-bootstrap designator for the server and my-topic as the topic name.
The output will look like this:
Output
If you don't see a command prompt, try pressing enter.
>
You can input any text message and press Enter to send it to the topic:
Output
If you don't see a command prompt, try pressing enter.
>Hello World!
>
To exit, press CTRL+C and confirm with Enter. Then, run the following command to run the consumer script in-cluster:
kubectl run kafka-consumer -ti \
--image=quay.io/strimzi/kafka:0.40.0-kafka-3.7.0 --rm=true --restart=Never \
-- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
You may need to press Enter for the command to proceed. The output will be similar to this:
Output
Hello World!
...
You’ve learned how to connect to your Kafka deployment from within the cluster. You’ll now expose Kafka to the outside world.
In this step, you’ll expose your Kafka deployment externally using a load balancer.
Strimzi has a built-in way of creating and configuring a load balancer for Kafka. Open kafka.yaml for editing by running:
nano kafka.yaml
Add the following lines to the listeners section:
...
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
...
The newly added part defines a listener of type loadbalancer that will accept connections at port 9094 without TLS encryption.
Save and close the file, then apply the new manifest by running:
kubectl apply -f kafka.yaml
The output will be:
Output
kafka.kafka.strimzi.io/my-cluster configured
Run the following command to watch it become available:
kubectl get service my-cluster-kafka-external-bootstrap -w -o=jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'
When the load balancer that fronts traffic for Kafka becomes available, the output will be its IP address.
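Strimzi provisions one load balancer for the bootstrap address and one per broker, so you can also list all of the cluster’s services, internal and external, with the cluster label used earlier:

kubectl get services -l strimzi.io/cluster=my-cluster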
As part of the prerequisites, you downloaded and extracted the latest Kafka release to your machine. Navigate to that directory and run the console consumer, replacing your_lb_ip with the IP address from the output of the previous command:
bin/kafka-console-consumer.sh --bootstrap-server your_lb_ip:9094 --topic my-topic --from-beginning
You’ll soon see the messages being read from the topic, meaning that you’ve been successfully connected:
Output
Hello World!
...
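You can also produce messages from outside the cluster through the same listener. From the extracted release directory, run the console producer in a separate terminal, again replacing your_lb_ip, and anything you type will show up in the consumer:

bin/kafka-console-producer.sh --bootstrap-server your_lb_ip:9094 --topic my-topic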
To delete all Strimzi-related resources from your cluster (such as Kafka deployments and topics), run the following command:
kubectl delete $(kubectl get strimzi -o name)
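Note that because you set deleteClaim to false, the persistent volume claims (and the DigitalOcean Volumes behind them) survive this cleanup. You can list what remains and remove it manually once you no longer need the data:

kubectl get pvc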
In this article, you’ve deployed Kafka using Docker Compose and verified that you can connect to it. You’ve also learned how to install Strimzi to your DigitalOcean Kubernetes cluster and deployed a Kafka cluster using the provided manifests.