How To Install Apache Kafka on Ubuntu 20.04

Updated on February 2, 2023

bsder and Matt Abrams

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Apache Kafka is a popular distributed message broker designed to handle large volumes of real-time data. A Kafka cluster is highly scalable and fault-tolerant. It also has a much higher throughput compared to other message brokers like ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, many organizations also use it for log aggregation because it offers persistent storage for published messages.

A publish/subscribe messaging system allows one or more producers to publish messages without considering the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This system is more efficient and scalable than systems where clients poll periodically to determine if new messages are available.

In this tutorial, you will install and configure Apache Kafka 2.8.2 on Ubuntu 20.04.

Prerequisites

To follow along, you will need:

An Ubuntu 20.04 server with at least 4 GB of RAM and a non-root user with sudo privileges. You can set this up by following our Initial Server Setup guide if you do not have a non-root user set up. On servers with less than 4 GB of RAM, the Kafka service may fail to start.
OpenJDK 11 installed on your server. To install this version, follow our tutorial on How To Install Java with APT on Ubuntu 20.04. Kafka is written in Java, so it requires a JVM.
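
Before continuing, it can help to confirm both prerequisites from the command line. The following is a minimal sketch, not part of the original tutorial: the helper names mem_total_mb and java_version_line are hypothetical, and the sketch assumes a Linux system that exposes /proc/meminfo.

```shell
# Print total memory in MB, read from a meminfo-style file
# (defaults to /proc/meminfo when no argument is given).
mem_total_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Print the first line of the Java version banner,
# or a hint if no java binary is on the PATH.
java_version_line() {
  if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
  else
    echo "java not found"
  fi
}
```

Running mem_total_mb followed by java_version_line prints the values to compare against the requirements above. A server sold as 4 GB usually reports slightly less than 4096 MB, because part of the memory is reserved by the kernel.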

Step 1 — Creating a User for Kafka

Because Kafka can handle requests over a network, your first step is to create a dedicated user for the service. This minimizes damage to your Ubuntu machine in the event that someone compromises the Kafka server. You will create a dedicated kafka user in this step.

sudo adduser kafka

Follow the prompts to set a password and create the kafka user.

Next, add the kafka user to the sudo group with the adduser command. You need these privileges to install Kafka’s dependencies:

sudo adduser kafka sudo

Your kafka user is now ready. Log in to the kafka account using su:

su -l kafka

Now that you’ve created a Kafka-specific user, you are ready to download and extract the Kafka binaries.

Step 2 — Downloading and Extracting the Kafka Binaries

In this step, you’ll download and extract the Kafka binaries into dedicated folders in your kafka user’s home directory.

To start, create a directory in /home/kafka called Downloads to store your downloads:

mkdir ~/Downloads

Use curl to download the Kafka binaries:

curl "https://downloads.apache.org/kafka/2.8.2/kafka_2.13-2.8.2.tgz" -o ~/Downloads/kafka.tgz
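
If the download URL is no longer valid, curl may save an HTML error page under the name kafka.tgz, and the extraction step later fails with "not in gzip format". As a hedged sketch (the helper names magic_bytes and is_gzip are hypothetical), you can check the file before extracting it. A gzip file always starts with the two bytes 1f 8b:

```shell
# Print the first two bytes of a file as hex digits.
magic_bytes() {
  head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

# Succeed only when the file starts with the gzip magic number 1f8b.
is_gzip() {
  [ "$(magic_bytes "$1")" = "1f8b" ]
}
```

For example, is_gzip ~/Downloads/kafka.tgz || echo "not a gzip archive" reports a bad download. Adding curl's -f option makes curl exit with an error on an HTTP failure instead of saving the error page. Releases that are no longer on downloads.apache.org are, as far as we know, kept under https://archive.apache.org/dist/kafka/, but treat that location as an assumption and confirm it on the official Kafka downloads page.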

Create a directory called kafka and move to this directory. You’ll use this directory as the base directory of the Kafka installation:

mkdir ~/kafka && cd ~/kafka

Extract the archive you downloaded using the tar command:

tar -xvzf ~/Downloads/kafka.tgz --strip 1

You specify the --strip 1 flag to ensure that the archive’s contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.13-2.8.2/) inside of it.
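
To see the effect of this flag without touching your Kafka download, you can try it on a throwaway archive. This sketch is illustrative only: the demo_1.0 folder and the temporary paths are made up, and it uses the long form --strip-components, which --strip abbreviates in GNU tar.

```shell
# Build a throwaway archive whose contents sit inside a top-level demo_1.0/ folder.
demo=$(mktemp -d)
mkdir -p "$demo/src/demo_1.0/bin" "$demo/dest"
echo "echo hello" > "$demo/src/demo_1.0/bin/run.sh"
tar -czf "$demo/demo.tgz" -C "$demo/src" demo_1.0

# Extract it while dropping the first path component of every entry.
tar -xzf "$demo/demo.tgz" -C "$demo/dest" --strip-components=1

ls "$demo/dest"   # lists bin, not demo_1.0
```

Without the flag, the same extraction would create dest/demo_1.0/bin/run.sh instead of dest/bin/run.sh.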

Now that you’ve downloaded and extracted the binaries successfully, you can start configuring your Kafka server.

Step 3 — Configuring the Kafka Server

A Kafka topic is the category, group, or feed name to which messages can be published. Depending on its version and configuration, Kafka may not allow you to delete a topic. To make sure topic deletion is enabled, you will set this option explicitly in the configuration file in this step.

Kafka’s configuration options are specified in server.properties. Open this file with nano or your favorite editor:

nano ~/kafka/config/server.properties

First, add a setting that will allow you to delete Kafka topics. Add the following line to the bottom of the file:

~/kafka/config/server.properties

delete.topic.enable = true

Second, you’ll change the directory where the Kafka logs are stored by modifying the log.dirs property. Find the log.dirs property and replace the existing path with the following path:

~/kafka/config/server.properties

log.dirs=/home/kafka/logs

Save and close the file.
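
If you prefer to script these two changes instead of editing the file by hand, the following sketch shows one way to do it. The set_property helper is hypothetical and not part of Kafka. It assumes GNU sed (the default on Ubuntu) and values that do not contain the characters |, & or \.

```shell
# Set KEY=VALUE in a Java-style properties file: replace an existing
# uncommented KEY line, or append the setting if the key is absent.
set_property() {
  prop_file=$1
  prop_key=$2
  prop_value=$3
  # Escape dots so that "log.dirs" is matched literally.
  key_re=$(printf '%s' "$prop_key" | sed 's/\./\\./g')
  if grep -q "^${key_re}[[:space:]]*=" "$prop_file"; then
    sed -i "s|^${key_re}[[:space:]]*=.*|${prop_key}=${prop_value}|" "$prop_file"
  else
    printf '%s=%s\n' "$prop_key" "$prop_value" >> "$prop_file"
  fi
}
```

With this helper, the edits in this step become set_property ~/kafka/config/server.properties delete.topic.enable true and set_property ~/kafka/config/server.properties log.dirs /home/kafka/logs. Running either command twice leaves a single line for that key.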

Now that you’ve configured Kafka, you can create systemd unit files for running and enabling the Kafka server on startup.

Step 4 — Creating `systemd` Unit Files and Starting the Kafka Server

In this section, you will create systemd unit files for the Kafka service. These files will help you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

Kafka uses Zookeeper to manage its cluster state and configurations. Zookeeper is used in many distributed systems, and you can read more about the tool in the official Zookeeper docs. You’ll run Zookeeper as a service with these unit files.

Create the unit file for zookeeper:

sudo nano /etc/systemd/system/zookeeper.service

Enter the following unit definition into the file:

/etc/systemd/system/zookeeper.service

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.

The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that Zookeeper should be restarted if it exits abnormally.

After adding this content, save and close the file.

Next, create the systemd service file for kafka:

sudo nano /etc/systemd/system/kafka.service

Enter the following unit definition into the file:

/etc/systemd/system/kafka.service

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The [Unit] section specifies that this unit file depends on zookeeper.service, which will ensure that zookeeper gets started automatically when the kafka service starts.

The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service. It also specifies that Kafka should be restarted if it exits abnormally.

Save and close the file.
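
A mistyped path in ExecStart or ExecStop is a common reason for a unit failing to start. As a rough sketch, the hypothetical helpers below read a value out of a unit file so that you can check it. They are a simple text match for files laid out like the ones above, not a full systemd parser.

```shell
# Print the value of the first line in a unit file that starts with KEY=.
unit_value() {
  awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }' "$1"
}

# Print the first word ending in .sh from a KEY= line,
# that is, the script which the start or stop command points at.
unit_script() {
  unit_value "$1" "$2" | tr -s " '" '\n\n' | grep '\.sh$' | head -n 1
}
```

For example, test -x "$(unit_script /etc/systemd/system/kafka.service ExecStart)" && echo "start script found" confirms that the start script exists and is executable. If you edit a unit file after systemd has already loaded it, run sudo systemctl daemon-reload so that systemd picks up the change.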

Now that you have defined the units, start Kafka with the following command:

sudo systemctl start kafka

To ensure that the server has started successfully, check the status of the kafka unit:

sudo systemctl status kafka

You will receive output like this:

Output
● kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset>
     Active: active (running) since Wed 2023-02-01 23:44:12 UTC; 4s ago
   Main PID: 17770 (sh)
      Tasks: 69 (limit: 4677)
     Memory: 321.9M
     CGroup: /system.slice/kafka.service
             ├─17770 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /ho>
             └─17793 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMill>

You now have a Kafka server listening on port 9092, which is the default port the Kafka server uses.
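
Kafka can take a few seconds after the service reports active (running) before it accepts connections. The sketch below polls the port until it opens. The helper names port_open and wait_for_port are hypothetical, and the sketch assumes that bash and the coreutils timeout command are available, as they are on a default Ubuntu install.

```shell
# Succeed if a TCP connection to HOST PORT can be opened within two seconds.
# Uses bash's built-in /dev/tcp redirection.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Try up to ATTEMPTS times, one second apart, and succeed as soon as the port opens.
wait_for_port() {
  wait_host=$1
  wait_port=$2
  wait_attempts=$3
  i=0
  while [ "$i" -lt "$wait_attempts" ]; do
    port_open "$wait_host" "$wait_port" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_port localhost 9092 30 && echo "Kafka is listening" waits up to about 30 seconds for the broker, and the same call with port 2181 checks Zookeeper's default client port.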

You have started the kafka service. But if you reboot your server, Kafka will not restart automatically. To start both services on server boot, first enable the zookeeper service:

sudo systemctl enable zookeeper

You’ll receive a response that a symlink was created:

Output
Created symlink /etc/systemd/system/multi-user.target.wants/zookeeper.service → /etc/systemd/system/zookeeper.service.

Then enable the kafka service:

sudo systemctl enable kafka

You’ll receive a response that a symlink was created:

Output
Created symlink /etc/systemd/system/multi-user.target.wants/kafka.service → /etc/systemd/system/kafka.service.

In this step, you started and enabled the kafka and zookeeper services. In the next step, you will check the Kafka installation.

Step 5 — Testing the Kafka Installation

In this step, you will test your Kafka installation. You will publish and consume a Hello World message to make sure the Kafka server is behaving as expected.

Publishing messages in Kafka requires:

A producer, which publishes records and data to topics.
A consumer, which reads messages and data from topics.

To begin, create a topic named TutorialTopic:

~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

You’ll receive a response that the topic was created:

Output
Created topic TutorialTopic.

You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server’s hostname, a port, and a topic as arguments.

Now publish the string "Hello, World" to the TutorialTopic topic:

echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments. The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

If there are no configuration issues, you will receive a Hello, World response in your terminal:

Output
Hello, World

The script will continue to run, waiting for more messages to be published. To test this, open a new terminal window and log in to your server. Remember to log in as your kafka user:

su -l kafka

In this new terminal, start a producer to publish a second message:

echo "Hello World from Sammy at DigitalOcean!" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

This message will load in the consumer’s output in your original terminal:

Output
Hello, World
Hello World from Sammy at DigitalOcean!

When you are done testing, press CTRL+C to stop the consumer script in your original terminal.

You have now installed and configured a Kafka server on Ubuntu 20.04. In the next step, you will perform a few quick tasks to harden the security of your Kafka server.

Step 6 — Hardening the Kafka Server

With your installation complete, you can remove the kafka user’s admin privileges and harden the Kafka server.

Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session that you started this tutorial with, type exit.

Remove the kafka user from the sudo group:

sudo deluser kafka sudo

To further improve your Kafka server’s security, lock the kafka user’s password using the passwd command. This action ensures that nobody can directly log into the server using this account:

sudo passwd kafka -l

The -l flag locks the password of the kafka account, so the password can no longer be used to log in.

At this point, only root or a sudo user can log in as kafka with the following command:

sudo su - kafka

In the future, if you want to unlock the password again, use passwd with the -u option:

sudo passwd kafka -u
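
To confirm whether the password is currently locked or unlocked, you can read the status line that passwd -S prints. On Ubuntu, the second field of that line is L for a locked password, P for a usable password, and NP for no password. The helper below is a hypothetical sketch that checks this field:

```shell
# Succeed when a `passwd -S` status line reports a locked password,
# that is, when its second field is "L".
status_is_locked() {
  [ "$(printf '%s\n' "$1" | awk '{ print $2 }')" = "L" ]
}
```

For example, status_is_locked "$(sudo passwd -S kafka)" && echo "kafka password is locked" prints the message only while the lock is in place.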

You have now successfully restricted the kafka user’s admin privileges. You are ready to begin using Kafka. You can optionally follow the next step, which will add KafkaT to your system.

Step 7 — Installing KafkaT (Optional)

KafkaT was developed to improve your ability to view details about your Kafka cluster and to perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to build the other gems that KafkaT depends on.

Install Ruby and the build-essential package using apt:

sudo apt install ruby ruby-dev build-essential

You can now install KafkaT with the gem command:

sudo CFLAGS=-Wno-error=format-overflow gem install kafkat

The -Wno-error=format-overflow compilation flag is required to suppress Zookeeper’s warnings and errors during kafkat’s installation process.

When the installation has finished, you’ll receive a response that it is done:

Output
...
Done installing documentation for json, colored, retryable, highline, trollop, zookeeper, zk, kafkat after 3 seconds
8 gems installed

KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

Create a new file called .kafkatcfg:

nano ~/.kafkatcfg

Add the following lines to specify the required information about your Kafka server and Zookeeper instance:

~/.kafkatcfg

{
  "kafka_path": "~/kafka",
  "log_path": "/home/kafka/logs",
  "zk_path": "localhost:2181"
}

Save and close the file. You are now ready to use KafkaT.

To view details about all Kafka partitions, try running this command:

kafkat partitions

You will receive the following output:

Output
[DEPRECATION] The trollop gem has been renamed to optimist and will no longer be supported. Please switch to optimist as soon as possible.
/var/lib/gems/2.7.0/gems/json-1.8.6/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
...
Topic                 Partition   Leader      Replicas        ISRs
TutorialTopic         0           0           [0]             [0]
__consumer_offsets    0           0           [0]             [0]
...

The output will include TutorialTopic and __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

To learn more about KafkaT, refer to its GitHub repository.

Conclusion

You now have Apache Kafka running securely on your Ubuntu server. You can integrate Kafka into your favorite programming language using Kafka clients.

To learn more about Kafka, you can also consult its documentation.

10 Comments

ANelson82 • June 11, 2021

Thanks for tutorial. I forgot to install Java like a dummy.

MASuwandi • March 1, 2022

command: tar -xvzf ~/Downloads/kafka.tgz --strip 1

return: gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now

Why is this happening?

Samuel Kyama • March 30, 2022

Use the following command to download the correct file: curl "https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz" -o ~/Downloads/kafka.tgz

murraymacdonald • May 4, 2022

I am getting the same error as MASuwandi

$tar -xvzf ~/Downloads/kafka.tgz --strip 1

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Anyone know a solution?

kaiohken1982 • June 3, 2022

For everyone hitting the gzip error: I think something changed on the Kafka website regarding source downloads. Currently the URL is https://dlcdn.apache.org/kafka/3.2.0/kafka-3.2.0-src.tgz (taken from the official download page).

This will solve the gzip error.

Also, in 3.2.0, you cannot start the service if you do not build it with "./gradlew jar -PscalaVersion=2.13.6".

njmsaikat • July 23, 2022

The Kafka version has been updated, so the tutorial’s URL is trying to download a previous version (https://downloads.apache.org/kafka/2.6.3/kafka_2.13-2.6.3.tgz). That’s why the curl command doesn’t work anymore.

You can see a list of the Kafka versions available now here: https://downloads.apache.org/kafka/

Then change the URL to the version you want.

In short, you can use one of these curl commands as an alternative to the above instruction to download the latest version for now:

curl "https://downloads.apache.org/kafka/3.2.0/kafka-3.2.0-src.tgz" -o ~/Downloads/kafka.tgz

curl "https://dlcdn.apache.org/kafka/3.2.0/kafka-3.2.0-src.tgz" -o ~/Downloads/kafka.tgz

Richard Vogt • July 25, 2022

Forgot to read the fine-print: “at least 4 gigs of memory is needed or the server won’t start…”

You don’t get a helpful error message, like “not enough memory to start kafka server.” It just fails with status=1.

Azaretdodo • August 10, 2022

Hello, I don’t understand why, but the service can’t create its log file:

sudo systemctl status kafka
× kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-08-10 10:29:05 CEST; 2min 21s ago
    Process: 7519 ExecStart=/bin/sh -c /home/dorianrosse/programs/kafka_2.13-3.2.1/bin/kafka-server-start.sh /home/dorianrosse/programs/kafka_2.13-3.2.1/config/server.properties > /home/dorianrosse/programs/kafka_2.13-3.2.1/kafka.log 2>&1 (code=exited, status=2)
   Main PID: 7519 (code=exited, status=2)
        CPU: 3ms

août 10 10:29:05 Ubuntu-ThinkPad-X250 systemd[1]: Started kafka.service.
août 10 10:29:05 Ubuntu-ThinkPad-X250 sh[7519]: /bin/sh: 1: cannot create /home/dorianrosse/programs/kafka_2.13-3.2.1/kafka.log: Permission denied
août 10 10:29:05 Ubuntu-ThinkPad-X250 systemd[1]: kafka.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
août 10 10:29:05 Ubuntu-ThinkPad-X250 systemd[1]: kafka.service: Failed with result 'exit-code'.

thanks you in advance to help myself fully repair apache kafka,

regards.

Azaretdodo.

habuitrago • August 25, 2022

Most of the steps are still valid. Some updates need to be considered.

For example:

The current version of Kafka is 3.2.1.
WSL needs special consideration, as systemctl is not supported there to start and stop Kafka.
Topic creation did not work; I think --zookeeper is not a valid option in the latest versions.

For the rest, this is quite generic and useful. Thanks.

atarikaltunn • November 22, 2022

The gzip error occurs because the file is not really a gzipped file. I think the problem is that the source link has expired or been deprecated. I got past this issue by taking the link from https://kafka.apache.org/downloads.