The author selected the Wikimedia Foundation to receive a donation as part of the Write for DOnations program.
etcd is a distributed key-value store relied on by many platforms and tools, including Kubernetes, Vulcand, and Doorman. Within Kubernetes, etcd is used as a global configuration store that holds the state of the cluster. Knowing how to administer etcd is essential to administering a Kubernetes cluster. While there are many managed Kubernetes offerings, also known as Kubernetes-as-a-Service, that remove this administrative burden from you, many companies still choose to run self-managed Kubernetes clusters on-premises because of the flexibility it brings.
The first half of this article will guide you through setting up a 3-node etcd cluster on Ubuntu 18.04 servers. The second half will focus on securing the cluster using Transport Layer Security, or TLS. To run each setup in an automated manner, we will use Ansible throughout. Ansible is a configuration management tool similar to Puppet, Chef, and SaltStack; it allows us to define each setup step in a declarative manner, inside files called playbooks.
At the end of this tutorial, you will have a secure 3-node etcd cluster running on your servers. You will also have an Ansible playbook that allows you to repeatedly and consistently recreate the same setup on a fresh set of servers.
Before you begin this guide you’ll need the following:
Python, pip, and the pyOpenSSL package installed on your local machine. To learn how to install Python 3, pip, and Python packages, refer to How To Install Python 3 and Set Up a Local Programming Environment on Ubuntu 18.04.
Three Ubuntu 18.04 servers on the same local network, with at least 2GB of RAM and root SSH access. You should also configure the servers to have the hostnames etcd1, etcd2, and etcd3. The steps outlined in this article would work on any generic server, not necessarily DigitalOcean Droplets. However, if you’d like to host your servers on DigitalOcean, you can follow the How to Create a Droplet from the DigitalOcean Control Panel guide to fulfil this requirement. Note that you must enable the Private Networking option when creating your Droplet. To enable private networking on existing Droplets, refer to How to Enable Private Networking on Droplets.
Warning: Since the purpose of this article is to provide an introduction to setting up an etcd cluster on a private network, the three Ubuntu 18.04 servers in this setup were not tested with a firewall and are accessed as the root user. In a production setup, any node exposed to the public internet would require a firewall and a sudo user to adhere to security best practices. For more information, check out the Initial Server Setup with Ubuntu 18.04 tutorial.
An SSH key pair allowing your local machine access to the etcd1, etcd2, and etcd3 servers. If you do not know what SSH is, or do not have an SSH key pair, you can learn about it by reading SSH Essentials: Working with SSH Servers, Clients, and Keys.
Ansible installed on your local machine. For example, if you’re running Ubuntu 18.04, you can install Ansible by following Step 1 of the How to Install and Configure Ansible on Ubuntu 18.04 article. This will make the ansible and ansible-playbook commands available on your machine. You may also want to keep this How to Use Ansible: A Reference Guide handy. The commands in this tutorial should work with Ansible v2.x; we have tested them on Ansible v2.9.7 running Python v3.8.2.
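If you are unsure which versions you have installed, you can check with the ansible command itself, which prints both the Ansible and Python versions:
- ansible --version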
Ansible is a tool used to manage servers. The servers Ansible manages are called the managed nodes, and the machine running Ansible is called the control node. Ansible works by using the SSH keys on the control node to gain access to the managed nodes. Once an SSH session is established, Ansible will run a set of scripts to provision and configure the managed nodes. In this step, we will test that we are able to use Ansible to connect to the managed nodes and run the hostname command.
A typical day for a system administrator may involve managing different sets of nodes. For instance, you may use Ansible to provision some new servers, but later on use it to reconfigure another set of servers. To allow administrators to better organize the set of managed nodes, Ansible provides the concept of host inventory (or inventory for short). You can define every node that you wish to manage with Ansible inside an inventory file, and organize them into groups. Then, when running the ansible and ansible-playbook commands, you can specify which hosts or groups the command applies to.
By default, Ansible reads the inventory file from /etc/ansible/hosts; however, we can specify a different inventory file by using the --inventory flag (or -i for short).
To get started, create a new directory on your local machine (the control node) to house all the files for this tutorial:
- mkdir -p $HOME/playground/etcd-ansible
Then, change into the directory you just created:
- cd $HOME/playground/etcd-ansible
Inside the directory, create and open a blank inventory file named hosts using your editor:
- nano $HOME/playground/etcd-ansible/hosts
Inside the hosts file, list out each of your managed nodes in the following format, replacing the highlighted placeholders with the actual public IP addresses of your servers:
[etcd]
etcd1 ansible_host=etcd1_public_ip ansible_user=root
etcd2 ansible_host=etcd2_public_ip ansible_user=root
etcd3 ansible_host=etcd3_public_ip ansible_user=root
The [etcd] line defines a group called etcd. Under the group definition, we list all our managed nodes. Each line begins with an alias (e.g., etcd1), which allows us to refer to each host using an easy-to-remember name instead of a long IP address. ansible_host and ansible_user are Ansible variables; in this case, they are used to provide Ansible with the public IP addresses and SSH usernames to use when connecting via SSH.
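Before testing connectivity, you can optionally confirm that Ansible parses the inventory as expected. The --list-hosts flag prints the hosts matched by a pattern without contacting them:
- ansible etcd -i hosts --list-hosts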
To ensure Ansible is able to connect with our managed nodes, we can test for connectivity by using Ansible to run the hostname command on each of the hosts within the etcd group:
- ansible etcd -i hosts -m command -a hostname
Let us break down this command to learn what each part means:

- etcd: specifies the host pattern to use to determine which hosts from the inventory are being managed with this command. Here, we are using the group name as the host pattern.
- -i hosts: specifies the inventory file to use.
- -m command: the functionality behind Ansible is provided by modules. The command module takes the argument passed in and executes it as a command on each of the managed nodes. This tutorial will introduce a few more Ansible modules as we progress.
- -a hostname: the argument to pass into the module. The number and types of arguments depend on the module.

After running the command, you will find the following output, which means Ansible is configured correctly:
Output
etcd2 | CHANGED | rc=0 >>
etcd2
etcd3 | CHANGED | rc=0 >>
etcd3
etcd1 | CHANGED | rc=0 >>
etcd1
Each command that Ansible runs is called a task. Using ansible on the command line to run tasks is called running ad-hoc commands. The upside of ad-hoc commands is that they are quick and require little setup; the downside is that they run manually, and thus cannot be committed to a version control system like Git.
A slight improvement would be to write a shell script and run our commands using Ansible’s script module. This would allow us to record the configuration steps we took into version control. However, shell scripts are imperative, which means we are responsible for figuring out the commands to run (the "how"s) to configure the system to the desired state. Ansible, on the other hand, advocates for a declarative approach, where we define “what” the desired state of our server should be inside configuration files, and Ansible is responsible for getting the server to that desired state.
The declarative approach is preferred because the intent of the configuration file is immediately conveyed, meaning it’s easier to understand and maintain. It also places the onus of handling edge cases on Ansible instead of the administrator, saving us a lot of work.
Now that you have configured the Ansible control node to communicate with the managed nodes, in the next step, we will introduce you to Ansible playbooks, which allow you to specify tasks in a declarative way.
In this step, we will replicate what was done in Step 1—printing out the hostnames of the managed nodes—but instead of running ad-hoc tasks, we will define each task declaratively as an Ansible playbook and run it. The purpose of this step is to demonstrate how Ansible playbooks work; we will carry out much more substantial tasks with playbooks in later steps.
Inside your project directory, create a new file named playbook.yaml using your editor:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Inside playbook.yaml, add the following lines:
- hosts: etcd
  tasks:
    - name: "Retrieve hostname"
      command: hostname
      register: output
    - name: "Print hostname"
      debug: var=output.stdout_lines
Close and save the playbook.yaml file by pressing CTRL+X followed by Y.
The playbook contains a list of plays; each play contains a list of tasks that should be run on all hosts matching the host pattern specified by the hosts key. In this playbook, we have one play that contains two tasks. The first task runs the hostname command using the command module and registers the output in a variable named output. In the second task, we use the debug module to print out the stdout_lines property of the output variable.
We can now run this playbook using the ansible-playbook command:
- ansible-playbook -i hosts playbook.yaml
You will find the following output, which means your playbook is working correctly:
Output
PLAY [etcd] ***********************************************************************************************************************
TASK [Gathering Facts] ************************************************************************************************************
ok: [etcd2]
ok: [etcd3]
ok: [etcd1]
TASK [Retrieve hostname] **********************************************************************************************************
changed: [etcd2]
changed: [etcd3]
changed: [etcd1]
TASK [Print hostname] *************************************************************************************************************
ok: [etcd1] => {
    "output.stdout_lines": [
        "etcd1"
    ]
}
ok: [etcd2] => {
    "output.stdout_lines": [
        "etcd2"
    ]
}
ok: [etcd3] => {
    "output.stdout_lines": [
        "etcd3"
    ]
}
PLAY RECAP ************************************************************************************************************************
etcd1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
etcd2 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
etcd3 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Note: ansible-playbook sometimes uses cowsay as a playful way to print the headings. If you find a lot of ASCII-art cows printed on your terminal, now you know why. To disable this feature, set the ANSIBLE_NOCOWS environment variable to 1 prior to running ansible-playbook by running export ANSIBLE_NOCOWS=1 in your shell.
In this step, we’ve moved from running imperative ad-hoc tasks to running declarative playbooks. In the next step, we will replace these two demo tasks with tasks that will set up our etcd cluster.
In this step, we will show you the commands to install etcd manually and demonstrate how to translate those same commands into tasks inside our Ansible playbook.
etcd and its client etcdctl are available as binaries, which we’ll download, extract, and move to a directory that’s part of the PATH environment variable. When configured manually, these are the steps we would take on each of the managed nodes:
- mkdir -p /opt/etcd/bin
- cd /opt/etcd/bin
- wget -qO- https://storage.googleapis.com/etcd/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz | tar --extract --gzip --strip-components=1
- echo 'export PATH="$PATH:/opt/etcd/bin"' >> ~/.profile
- echo 'export ETCDCTL_API=3" >> ~/.profile
The first three commands download and extract the binaries into the /opt/etcd/bin/ directory, and the fourth appends this directory to the PATH environment variable so the binaries can be run from anywhere. By default, the etcdctl client will use API v2 to communicate with the etcd server. Since we are running etcd v3.x, the last command sets the ETCDCTL_API environment variable to 3.
Note: Here, we are using etcd v3.3.13, built for machines with processors that use the AMD64 instruction set. You can find binaries for other systems and other versions on the official GitHub Releases page.
To replicate the same steps in a standardized format, we can add tasks to our playbook. Open the playbook.yaml playbook file in your editor:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Replace the entirety of the playbook.yaml file with the following contents:
- hosts: etcd
  become: True
  tasks:
    - name: "Create directory for etcd binaries"
      file:
        path: /opt/etcd/bin
        state: directory
        owner: root
        group: root
        mode: 0700
    - name: "Download the tarball into the /tmp directory"
      get_url:
        url: https://storage.googleapis.com/etcd/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
        dest: /tmp/etcd.tar.gz
        owner: root
        group: root
        mode: 0600
        force: True
    - name: "Extract the contents of the tarball"
      unarchive:
        src: /tmp/etcd.tar.gz
        dest: /opt/etcd/bin/
        owner: root
        group: root
        mode: 0600
        extra_opts:
          - --strip-components=1
        decrypt: True
        remote_src: True
    - name: "Set permissions for etcd"
      file:
        path: /opt/etcd/bin/etcd
        state: file
        owner: root
        group: root
        mode: 0700
    - name: "Set permissions for etcdctl"
      file:
        path: /opt/etcd/bin/etcdctl
        state: file
        owner: root
        group: root
        mode: 0700
    - name: "Add /opt/etcd/bin/ to the $PATH environment variable"
      lineinfile:
        path: /etc/profile
        line: export PATH="$PATH:/opt/etcd/bin"
        state: present
        create: True
        insertafter: EOF
    - name: "Set the ETCDCTL_API environment variable to 3"
      lineinfile:
        path: /etc/profile
        line: export ETCDCTL_API=3
        state: present
        create: True
        insertafter: EOF
Each task uses a module; for this set of tasks, we are making use of the following modules:

- file: to create the /opt/etcd/bin directory, and to later set the file permissions for the etcd and etcdctl binaries.
- get_url: to download the gzipped tarball onto the managed nodes.
- unarchive: to extract and unpack the etcd and etcdctl binaries from the gzipped tarball.
- lineinfile: to add entries to the /etc/profile file.

To apply these changes, close and save the playbook.yaml file by pressing CTRL+X followed by Y. Then, on the terminal, run the same ansible-playbook command again:
- ansible-playbook -i hosts playbook.yaml
The PLAY RECAP section of the output will show only ok and changed:
Output
...
PLAY RECAP ************************************************************************************************************************
etcd1 : ok=8 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
etcd2 : ok=8 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
etcd3 : ok=8 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
To confirm a correct installation of etcd, manually SSH into one of the managed nodes and run etcd and etcdctl:
- ssh root@etcd1_public_ip
etcd1_public_ip is the public IP address of the server named etcd1. Once you have gained SSH access, run etcd --version to print out the version of etcd installed:
- etcd --version
You will find output similar to the following, which means the etcd binary is successfully installed:
Output
etcd Version: 3.3.13
Git SHA: 98d3084
Go Version: go1.10.8
Go OS/Arch: linux/amd64
To confirm etcdctl is successfully installed, run etcdctl version:
- etcdctl version
You will find output similar to the following:
Output
etcdctl version: 3.3.13
API version: 3.3
Note that the output says API version: 3.3, which also confirms that our ETCDCTL_API environment variable was set correctly.
Exit out of the etcd1 server to return to your local environment.
We have now successfully installed etcd and etcdctl on all of our managed nodes. In the next step, we will add more tasks to our play to run etcd as a background service.
The quickest way to run etcd with Ansible may appear to be to use the command module to run /opt/etcd/bin/etcd. However, this will not work because it will make etcd run as a foreground process. Using the command module will cause Ansible to hang as it waits for the etcd command to return, which it never will. So in this step, we are going to update our playbook to run our etcd binary as a background service instead.
Ubuntu 18.04 uses systemd as its init system, which means we can create new services by writing unit files and placing them inside the /etc/systemd/system/ directory.
First, inside our project directory, create a new directory named files/:
- mkdir files
Then, using your editor, create a new file named etcd.service within that directory:
- nano files/etcd.service
Next, copy the following code block into the files/etcd.service file:
[Unit]
Description=etcd distributed reliable key-value store
[Service]
Type=notify
ExecStart=/opt/etcd/bin/etcd
Restart=always
This unit file defines a service that runs the executable at /opt/etcd/bin/etcd, notifies systemd when it has finished initializing, and always restarts if it ever exits.
Note: If you’d like to understand more about systemd and unit files, or want to tailor the unit file to your needs, read the Understanding Systemd Units and Unit Files guide.
Close and save the files/etcd.service file by pressing CTRL+X followed by Y.
Next, we need to add a task to our playbook that copies the local files/etcd.service file to the path /etc/systemd/system/etcd.service on every managed node. We can do this using the copy module.
Open up your playbook:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Append the following highlighted task to the end of our existing tasks:
- hosts: etcd
  become: True
  tasks:
    ...
    - name: "Set the ETCDCTL_API environment variable to 3"
      lineinfile:
        path: /etc/profile
        line: export ETCDCTL_API=3
        state: present
        create: True
        insertafter: EOF
    - name: "Create an etcd service"
      copy:
        src: files/etcd.service
        remote_src: False
        dest: /etc/systemd/system/etcd.service
        owner: root
        group: root
        mode: 0644
By copying the unit file to /etc/systemd/system/etcd.service, the service is now defined.
Save and exit the playbook.
Run the same ansible-playbook command again to apply the new changes:
- ansible-playbook -i hosts playbook.yaml
To confirm the changes have been applied, first SSH into one of the managed nodes:
- ssh root@etcd1_public_ip
Then, run systemctl status etcd to query systemd about the status of the etcd service:
- systemctl status etcd
You will find the following output, which states that the service is loaded:
Output
● etcd.service - etcd distributed reliable key-value store
Loaded: loaded (/etc/systemd/system/etcd.service; static; vendor preset: enabled)
Active: inactive (dead)
...
Note: The last line (Active: inactive (dead)) of the output states that the service is inactive, which means it would not automatically run when the system starts. This is expected and not an error.
Press q to return to the shell, and then run exit to exit out of the managed node and return to your local shell:
- exit
In this step, we updated our playbook to run the etcd binary as a systemd service. In the next step, we will continue to set up etcd by providing it space to store its data.
etcd is a key-value data store, which means we must provide it with space to store its data. In this step, we are going to update our playbook to define a dedicated data directory for etcd to use.
Open up your playbook:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Append the following task to the end of the list of tasks:
- hosts: etcd
  become: True
  tasks:
    ...
    - name: "Create an etcd service"
      copy:
        src: files/etcd.service
        remote_src: False
        dest: /etc/systemd/system/etcd.service
        owner: root
        group: root
        mode: 0644
    - name: "Create a data directory"
      file:
        path: /var/lib/etcd/{{ inventory_hostname }}.etcd
        state: directory
        owner: root
        group: root
        mode: 0755
Here, we are using /var/lib/etcd/hostname.etcd as the data directory, where hostname is the hostname of the current managed node. inventory_hostname is a variable that represents the hostname of the current managed node; its value is populated by Ansible automatically. The curly-braces syntax (i.e., {{ inventory_hostname }}) is used for variable substitution, supported by the Jinja2 template engine, which is the default templating engine for Ansible.
Close the text editor and save the file.
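If you'd like to see what inventory_hostname resolves to for each node before using it in a template, one quick check is an ad-hoc call to the debug module:
- ansible etcd -i hosts -m debug -a "var=inventory_hostname"
Each host will print its own alias from the inventory file (etcd1, etcd2, and etcd3).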
Next, we need to instruct etcd to use this data directory. We do this by passing the data-dir parameter to etcd. To set etcd parameters, we can use a combination of environment variables, command-line flags, and configuration files. For this tutorial, we will use a configuration file, as it is much neater to isolate all configurations into a file, rather than have them littered across our playbook.
In your project directory, create a new directory named templates/:
- mkdir templates
Then, using your editor, create a new file named etcd.conf.yaml.j2 within the directory:
- nano templates/etcd.conf.yaml.j2
Next, copy the following line and paste it into the file:
data-dir: /var/lib/etcd/{{ inventory_hostname }}.etcd
This file uses the same Jinja2 variable substitution syntax as our playbook. To substitute the variables and upload the result to each managed host, we can use the template module. It works in a similar way to copy, except it will perform variable substitution prior to upload.
Exit from etcd.conf.yaml.j2, then open up your playbook:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Append the following tasks to the list of tasks to create a directory and upload the templated configuration file into it:
- hosts: etcd
  become: True
  tasks:
    ...
    - name: "Create a data directory"
      file:
        ...
        mode: 0755
    - name: "Create directory for etcd configuration"
      file:
        path: /etc/etcd
        state: directory
        owner: root
        group: root
        mode: 0755
    - name: "Create configuration file for etcd"
      template:
        src: templates/etcd.conf.yaml.j2
        dest: /etc/etcd/etcd.conf.yaml
        owner: root
        group: root
        mode: 0600
Save and close this file.
Because we’ve made this change, we need to update our service’s unit file to pass it the location of our configuration file (i.e., /etc/etcd/etcd.conf.yaml).
Open the etcd service file on your local machine:
- nano files/etcd.service
Update the files/etcd.service file by adding the --config-file flag highlighted in the following:
[Unit]
Description=etcd distributed reliable key-value store
[Service]
Type=notify
ExecStart=/opt/etcd/bin/etcd --config-file /etc/etcd/etcd.conf.yaml
Restart=always
Save and close this file.
In this step, we used our playbook to provide a data directory for etcd to store its data. In the next step, we will add a couple more tasks to restart the etcd service and have it run on startup.
Whenever we make changes to the unit file of a service, we need to restart the service to have them take effect. We can do this by running the systemctl restart etcd command. Furthermore, to make the etcd service start automatically on system startup, we need to run systemctl enable etcd. In this step, we will run those two commands using the playbook.
To run commands, we can use the command module:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Append the following tasks to the end of the task list:
- hosts: etcd
  become: True
  tasks:
    ...
    - name: "Create configuration file for etcd"
      template:
        ...
        mode: 0600
    - name: "Enable the etcd service"
      command: systemctl enable etcd
    - name: "Start the etcd service"
      command: systemctl restart etcd
Save and close the file.
Run ansible-playbook -i hosts playbook.yaml once more:
- ansible-playbook -i hosts playbook.yaml
To check that the etcd service is now restarted and enabled, SSH into one of the managed nodes:
- ssh root@etcd1_public_ip
Then, run systemctl status etcd to check the status of the etcd service:
- systemctl status etcd
You will find enabled and active (running) as highlighted in the following; this means the changes we made in our playbook have taken effect:
Output
● etcd.service - etcd distributed reliable key-value store
Loaded: loaded (/etc/systemd/system/etcd.service; static; vendor preset: enabled)
Active: active (running)
Main PID: 19085 (etcd)
Tasks: 11 (limit: 2362)
In this step, we used the command module to run systemctl commands that restart and enable the etcd service on our managed nodes. Now that we have set up an etcd installation, we will, in the next step, test out its functionality by carrying out some basic create, read, update, and delete (CRUD) operations.
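As an aside, Ansible also ships a dedicated systemd module that can express the same intent declaratively, including reloading systemd's unit definitions. We did not use it in this tutorial, but a minimal sketch of an equivalent task might look like:

- name: "Enable and restart the etcd service"
  systemd:
    name: etcd
    enabled: True
    state: restarted
    daemon_reload: True

Using a module like this keeps the playbook declarative and avoids shelling out to systemctl.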
Although we have a working etcd installation, it is insecure and not yet ready for production use. But before we secure our etcd setup in later steps, let’s first understand what etcd can do in terms of functionality. In this step, we are going to manually send requests to etcd to add, retrieve, update, and delete data from it.
By default, etcd exposes an API that listens on port 2379 for client communication. This means we can send raw API requests to etcd using an HTTP client. However, it’s quicker to use the official etcd client, etcdctl, which allows you to create/update, retrieve, and delete key-value pairs using the put, get, and del subcommands, respectively.
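If you are curious what a raw API request looks like, etcd v3.3 ships a gRPC gateway that translates JSON-over-HTTP requests into gRPC calls. As a rough sketch (assuming the /v3beta gateway prefix used by the v3.3 release, with keys and values base64-encoded), a put could be issued with curl:
- curl -L http://127.0.0.1:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'
Here, Zm9v and YmFy are the base64 encodings of foo and bar. For the rest of this step, we will stick with etcdctl.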
Make sure you’re still inside the etcd1 managed node, and run the following etcdctl commands to confirm your etcd installation is working.
First, create a new entry using the put subcommand, which has the following syntax:
etcdctl put key value
On etcd1, run the following command:
- etcdctl put foo "bar"
The command we just ran instructs etcd to write the value "bar" to the key foo in the store.
You will then find OK printed in the output, which indicates the data persisted:
Output
OK
We can then retrieve this entry using the get subcommand, which has the syntax etcdctl get key:
- etcdctl get foo
You will find this output, which shows the key on the first line and the value you inserted earlier on the second line:
Output
foo
bar
We can delete the entry using the del subcommand, which has the syntax etcdctl del key:
- etcdctl del foo
You will find the following output, which indicates the number of entries deleted:
Output
1
Now, let’s run the get subcommand once more in an attempt to retrieve the deleted key-value pair:
- etcdctl get foo
You will not receive any output, which means etcdctl is unable to retrieve the key-value pair. This confirms that after an entry is deleted, it can no longer be retrieved.
Now that you’ve tested the basic operations of etcd and etcdctl, let’s exit the managed node and return to your local environment:
- exit
In this step, we used the etcdctl client to send requests to etcd. At this point, we are running three separate instances of etcd, each acting independently from the others. However, etcd is designed as a distributed key-value store, which means multiple etcd instances can group up to form a single cluster; each instance then becomes a member of the cluster. After forming a cluster, you would be able to retrieve a key-value pair that was inserted from a different member of the cluster. In the next step, we will use our playbook to transform our three single-node clusters into a single 3-node cluster.
To create one 3-node cluster instead of three 1-node clusters, we must configure these etcd installations to communicate with each other. This means each one must know the IP addresses of the others. This process is called discovery. Discovery can be done using either static configuration or dynamic service discovery. In this step, we will discuss the difference between the two, as well as update our playbook to set up an etcd cluster using static discovery.
Discovery by static configuration is the method that requires the least setup; this is where the endpoints of each member are passed into the etcd command before it is executed. To use static configuration, the following conditions must be met prior to the initialization of the cluster:

- the number of members is known
- the endpoints of each member are known
- the IP addresses for all endpoints are static

If these conditions cannot be met, then you can use a dynamic discovery service. With dynamic service discovery, all instances would register with the discovery service, which allows each member to retrieve information about the location of other members.
Since we know we want a 3-node etcd cluster, and all our servers have static IP addresses, we will use static discovery. To initiate our cluster using static discovery, we must add several parameters to our configuration file. Use an editor to open up the templates/etcd.conf.yaml.j2 template file:
- nano templates/etcd.conf.yaml.j2
Then, add the following highlighted lines:
data-dir: /var/lib/etcd/{{ inventory_hostname }}.etcd
name: {{ inventory_hostname }}
initial-advertise-peer-urls: http://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2380
listen-peer-urls: http://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2380,http://127.0.0.1:2380
advertise-client-urls: http://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2379
listen-client-urls: http://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2379,http://127.0.0.1:2379
initial-cluster-state: new
initial-cluster: {% for host in groups['etcd'] %}{{ hostvars[host]['ansible_facts']['hostname'] }}=http://{{ hostvars[host]['ansible_facts']['eth1']['ipv4']['address'] }}:2380{% if not loop.last %},{% endif %}{% endfor %}
Close and save the templates/etcd.conf.yaml.j2 file by pressing CTRL+X followed by Y.
Here’s a brief explanation of each parameter:

- name: a human-readable name for the member. By default, etcd uses a unique, randomly generated ID to identify each member; however, a human-readable name allows us to reference it more easily inside configuration files and on the command line. Here, we will use the hostnames as the member names (i.e., etcd1, etcd2, and etcd3).
- initial-advertise-peer-urls: a list of IP address/port combinations that other members can use to communicate with this member. In addition to the API port (2379), etcd also exposes port 2380 for peer communication between etcd members, which allows them to send messages to each other and exchange data. Note that these URLs must be reachable by its peers (and must not be local IP addresses).
- listen-peer-urls: a list of IP address/port combinations where the current member will listen for communication from other members. This must include all the URLs from the initial-advertise-peer-urls parameter, but also local URLs like 127.0.0.1:2380. The destination IP address/port of incoming peer messages must match one of the URLs listed here.
- advertise-client-urls: a list of IP address/port combinations that clients should use to communicate with this member. These URLs must be reachable by the client (and must not be local addresses). If the client is accessing the cluster over the public internet, this must be a public IP address.
- listen-client-urls: a list of IP address/port combinations where the current member will listen for communication from clients. This must include all the URLs from the advertise-client-urls parameter, but also local URLs like 127.0.0.1:2379. The destination IP address/port of incoming client messages must match one of the URLs listed here.
- initial-cluster: a list of endpoints for each member of the cluster. Each endpoint must match one of the corresponding member’s initial-advertise-peer-urls URLs.
- initial-cluster-state: either new or existing.

To ensure consistency, etcd can only make decisions when a majority of the nodes are healthy. This is known as establishing quorum. In other words, in a three-member cluster, quorum is reached if two or more of the members are healthy.
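In general, for a cluster of n members, quorum requires floor(n/2) + 1 healthy members: a 3-member cluster tolerates one failure (quorum is 2), while a 5-member cluster tolerates two (quorum is 3). This is also why etcd clusters are typically sized with an odd number of members.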
If the initial-cluster-state parameter is set to new, etcd will know that this is a new cluster being bootstrapped, and will allow members to start in parallel, without waiting for quorum to be reached. More concretely, after the first member is started, it will not have quorum, because one out of three (33.3%) is less than 50%. Normally, etcd would halt, refuse to commit any more actions, and the cluster would never be formed. However, with initial-cluster-state set to new, it will ignore the initial lack of quorum.
If set to existing, the member will try to join an existing cluster, and expects quorum to already be established.
Note: You can find more details about all supported configuration flags in the Configuration section of etcd’s documentation.
In the updated templates/etcd.conf.yaml.j2 template file, there are a few instances of hostvars. When Ansible runs, it collects variables from a variety of sources. We have already made use of the inventory_hostname variable, but there are many more available. These variables are available under hostvars[inventory_hostname]['ansible_facts']. Here, we are extracting the private IP address of each node and using it to construct our parameter values.
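If you'd like to inspect the facts Ansible gathers for each node, the setup module can print them; for example, the following ad-hoc command uses its filter parameter to show only the facts for the eth1 interface:
- ansible etcd -i hosts -m setup -a "filter=ansible_eth1"
Inside a playbook or template, the same data is reachable under hostvars[inventory_hostname]['ansible_facts']['eth1'].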
Note: Because we enabled the Private Networking option when we created our servers, each server has three IP addresses associated with it:

- A loopback address (e.g., 127.0.0.1)
- A public IP address (e.g., 178.128.169.51)
- A private IP address (e.g., 10.131.82.225)

Each of these IP addresses is associated with a different network interface: the loopback address with the lo interface, the public IP address with the eth0 interface, and the private IP address with the eth1 interface. We are using the eth1 interface so that all traffic stays within the private network, without ever reaching the internet.

An understanding of network interfaces is not required for this article, but if you’d like to learn more, An Introduction to Networking Terminology, Interfaces, and Protocols is a great place to start.
The {% %} Jinja2 syntax defines a for loop that iterates through every node in the etcd group to build the initial-cluster string into the format required by etcd.
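For the three nodes in this tutorial, the rendered initial-cluster line would end up looking similar to the following (with each placeholder replaced by that node's actual private IP address):
initial-cluster: etcd1=http://etcd1_private_ip:2380,etcd2=http://etcd2_private_ip:2380,etcd3=http://etcd3_private_ip:2380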
To form the new three-member cluster, you must first stop the etcd service and clear the data directory before launching the cluster. To do this, use an editor to open up the playbook.yaml file on your local machine:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Then, before the "Create a data directory" task, add a task to stop the etcd service:
- hosts: etcd
  become: True
  tasks:
    ...
        group: root
        mode: 0644
    - name: "Stop the etcd service"
      command: systemctl stop etcd
    - name: "Create a data directory"
      file:
        ...
Next, update the "Create a data directory" task to first delete the data directory and then recreate it:
- hosts: etcd
  become: True
  tasks:
    ...
    - name: "Stop the etcd service"
      command: systemctl stop etcd
    - name: "Create a data directory"
      file:
        path: /var/lib/etcd/{{ inventory_hostname }}.etcd
        state: "{{ item }}"
        owner: root
        group: root
        mode: 0755
      with_items:
        - absent
        - directory
    - name: "Create directory for etcd configuration"
      file:
        ...
The with_items property defines a list of strings that this task will iterate over. It is equivalent to repeating the same task twice, but with different values for the state property. Here, we are iterating over the list with the items absent and directory, which ensures that the data directory is deleted first and then re-created after.
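Note: with_items predates the loop keyword that Ansible introduced in version 2.5; on a recent Ansible release, the same iteration could also be written as in this equivalent sketch:

- name: "Create a data directory"
  file:
    path: /var/lib/etcd/{{ inventory_hostname }}.etcd
    state: "{{ item }}"
    owner: root
    group: root
    mode: 0755
  loop:
    - absent
    - directory

Both forms behave identically here; this tutorial sticks with with_items.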
Close and save the playbook.yaml file by pressing CTRL+X followed by Y. Then, run ansible-playbook again. Ansible will now create a single, 3-member etcd cluster:
- ansible-playbook -i hosts playbook.yaml
You can check this by SSH-ing into any etcd member node:
- ssh root@etcd1_public_ip
Then run etcdctl endpoint health --cluster:
- etcdctl endpoint health --cluster
This will list out the health of each member of the cluster:
Output
http://etcd2_private_ip:2379 is healthy: successfully committed proposal: took = 2.517267ms
http://etcd1_private_ip:2379 is healthy: successfully committed proposal: took = 2.153612ms
http://etcd3_private_ip:2379 is healthy: successfully committed proposal: took = 2.639277ms
We have now successfully created a 3-node etcd cluster. We can confirm this by adding an entry to etcd on one member node, and retrieving it on another member node. On one of the member nodes, run etcdctl put:
- etcdctl put foo "bar"
Then, use a new terminal to SSH into a different member node:
- ssh root@etcd2_public_ip
Next, attempt to retrieve the same entry using the key:
- etcdctl get foo
You will be able to retrieve the entry, which proves that the cluster is working:
Output
foo
bar
Lastly, exit out of each of the managed nodes and back to your local machine:
- exit
- exit
In this step, we provisioned a new 3-node cluster. At the moment, communication between etcd members and their peers and clients is conducted over HTTP. This means the communication is unencrypted, and any party who can intercept the traffic can read the messages. This is not a big issue if the etcd cluster and clients are all deployed within a private network or virtual private network (VPN) that you fully control. However, if any of the traffic needs to travel through a shared network (private or public), then you should ensure this traffic is encrypted. Furthermore, a mechanism needs to be put in place for a client or peer to verify the authenticity of the server.
In the next step, we will look at how to secure client-to-server as well as peer communication using TLS.
To encrypt messages between member nodes, etcd uses Hypertext Transfer Protocol Secure, or HTTPS, which is a layer on top of the Transport Layer Security, or TLS, protocol. TLS uses a system of private keys, certificates, and trusted entities called Certificate Authorities (CAs) to authenticate with, and send encrypted messages to, each other.
In this tutorial, each member node needs to generate a certificate to identify itself, and have this certificate signed by a CA. We will configure all member nodes to trust this CA, and thus also trust any certificates it signs. This allows member nodes to mutually authenticate with each other.
The certificate that a member node generates must allow other member nodes to identify it. All certificates include the Common Name (CN) of the entity they are associated with, which is often used as the identity of the entity. However, when verifying a certificate, client implementations may compare whether the information they collected about the entity matches what is given in the certificate. For example, when a client downloads a TLS certificate with the subject of CN=foo.bar.com, but is actually connecting to the server using an IP address (e.g., 167.71.129.110), then there’s a mismatch and the client may not trust the certificate. By specifying a subject alternative name (SAN) in the certificate, the verifier is informed that both names belong to the same entity.
Because our etcd members are peering with each other using their private IP addresses, when we define our certificates, we’ll need to provide these private IP addresses as the subject alternative names.
To find out the private IP address of a managed node, SSH into it:
- ssh root@etcd1_public_ip
Then run the following command:
- ip -f inet addr show eth1
You’ll find output similar to the following lines:
Output
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
inet 10.131.255.176/16 brd 10.131.255.255 scope global eth1
valid_lft forever preferred_lft forever
In our example output, 10.131.255.176 is the private IP address of the managed node, and the only piece of information we are interested in. To filter out everything else apart from the private IP, we can pipe the output of the ip command to the sed utility, which is used to filter and transform text:
- ip -f inet addr show eth1 | sed -En -e 's/.*inet ([0-9.]+).*/\1/p'
Now, the only output is the private IP address itself:
Output
10.131.255.176
Once you’re satisfied that the preceding command works, exit out of the managed node:
- exit
To incorporate the preceding commands into our playbook, first open up the playbook.yaml file:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Then, add a new play with a single task before our existing play:
...
- hosts: etcd
  tasks:
    - shell: ip -f inet addr show eth1 | sed -En -e 's/.*inet ([0-9.]+).*/\1/p'
      register: privateIP
- hosts: etcd
  tasks:
    ...
The task uses the shell module to run the ip and sed commands, which fetch the private IP address of the managed node. It then registers the return value of the shell command in a variable named privateIP, which we will use later.
In this step, we added a task to the playbook to obtain the private IP address of the managed nodes. In the next step, we are going to use this information to generate certificates for each member node, and have these certificates signed by a Certificate Authority (CA).
In order for a member node to receive encrypted traffic, the sender must use the member node’s public key to encrypt the data, and the member node must use its private key to decrypt the ciphertext and retrieve the original data. The public key is packaged into a certificate and signed by a CA to ensure that it is genuine.
Therefore, we will need to generate a private key and certificate signing request (CSR) for each etcd member node. To make it easier for us, we will generate all key pairs and sign all certificates locally, on the control node, and then copy the relevant files to the managed hosts.
First, create a directory called artifacts/, where we’ll place the files (keys and certificates) generated during the process. Open the playbook.yaml file with an editor:
- nano $HOME/playground/etcd-ansible/playbook.yaml
In it, use the file module to create the artifacts/ directory:
...
    - shell: ip -f inet addr show eth1 | sed -En -e 's/.*inet ([0-9.]+).*/\1/p'
      register: privateIP
- hosts: localhost
  gather_facts: False
  become: False
  tasks:
    - name: "Create ./artifacts directory to house keys and certificates"
      file:
        path: ./artifacts
        state: directory
- hosts: etcd
  tasks:
    ...
Next, add another task to the end of the play to generate the private key:
...
- hosts: localhost
  gather_facts: False
  become: False
  tasks:
    ...
    - name: "Generate private key for each member"
      openssl_privatekey:
        path: ./artifacts/{{item}}.key
        type: RSA
        size: 4096
        state: present
        force: True
      with_items: "{{ groups['etcd'] }}"
- hosts: etcd
  tasks:
    ...
Creating private keys and CSRs can be done using the openssl_privatekey and openssl_csr modules, respectively. The force: True attribute ensures that the private key is regenerated each time, even if it already exists.
Similarly, append the following new task to the same play to generate the CSRs for each member, using the openssl_csr module:
...
- hosts: localhost
  gather_facts: False
  become: False
  tasks:
    ...
    - name: "Generate private key for each member"
      openssl_privatekey:
        ...
      with_items: "{{ groups['etcd'] }}"
    - name: "Generate CSR for each member"
      openssl_csr:
        path: ./artifacts/{{item}}.csr
        privatekey_path: ./artifacts/{{item}}.key
        common_name: "{{item}}"
        key_usage:
          - digitalSignature
        extended_key_usage:
          - serverAuth
        subject_alt_name:
          - IP:{{ hostvars[item]['privateIP']['stdout'] }}
          - IP:127.0.0.1
        force: True
      with_items: "{{ groups['etcd'] }}"
We are specifying that this certificate can be involved in a digital signature mechanism for the purpose of server authentication. This certificate is associated with the hostname (e.g., etcd1), but the verifier should also treat each node’s private IP address and the local loopback address as alternative names. Note that we are using the privateIP variable that we registered in the previous play.
Close and save the playbook.yaml file by pressing CTRL+X followed by Y. Then, run our playbook again:
- ansible-playbook -i hosts playbook.yaml
We will now find a new directory called artifacts within our project directory; use ls to list out its contents:
- ls artifacts
You will find the private keys and CSRs for each of the etcd members:
Output
etcd1.csr etcd1.key etcd2.csr etcd2.key etcd3.csr etcd3.key
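If you'd like to double-check what went into a CSR (for example, that a node's private IP address made it into the subject alternative names), you can inspect it locally with the openssl command-line tool:
- openssl req -in artifacts/etcd1.csr -noout -text
Look for the X509v3 Subject Alternative Name section in the output.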
In this step, we used several Ansible modules to generate private keys and public key certificates for each of the member nodes. In the next step, we will look at how to sign a certificate signing request (CSR).
Within an etcd cluster, member nodes encrypt messages using the receiver’s public key. To ensure the public key is genuine, the receiver packages the public key into a certificate signing request (CSR) and has a trusted entity (i.e., the CA) sign the CSR. Since we control all the member nodes and the CAs they trust, we don’t need to use an external CA and can act as our own CA. In this step, we are going to act as our own CA, which means we’ll need to generate a private key and a self-signed certificate to function as the CA.
First, open the playbook.yaml file with your editor:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Then, similar to the previous step, append a task to the localhost play to generate a private key for the CA:
- hosts: localhost
  ...
  tasks:
    ...
    - name: "Generate CSR for each member"
      ...
      with_items: "{{ groups['etcd'] }}"
    - name: "Generate private key for CA"
      openssl_privatekey:
        path: ./artifacts/ca.key
        type: RSA
        size: 4096
        state: present
        force: True
- hosts: etcd
  become: True
  tasks:
    - name: "Create directory for etcd binaries"
      ...
Next, use the openssl_csr module to generate a new CSR. This is similar to the previous step, but in this CSR, we add the basic constraint and key usage extensions to indicate that this certificate can be used as a CA certificate:
- hosts: localhost
  ...
  tasks:
    ...
    - name: "Generate private key for CA"
      openssl_privatekey:
        path: ./artifacts/ca.key
        type: RSA
        size: 4096
        state: present
        force: True
    - name: "Generate CSR for CA"
      openssl_csr:
        path: ./artifacts/ca.csr
        privatekey_path: ./artifacts/ca.key
        common_name: ca
        organization_name: "Etcd CA"
        basic_constraints:
          - CA:TRUE
          - pathlen:1
        basic_constraints_critical: True
        key_usage:
          - keyCertSign
          - digitalSignature
        force: True
- hosts: etcd
  become: True
  tasks:
    - name: "Create directory for etcd binaries"
      ...
Lastly, use the openssl_certificate module to self-sign the CSR:
- hosts: localhost
  ...
  tasks:
    ...
    - name: "Generate CSR for CA"
      openssl_csr:
        path: ./artifacts/ca.csr
        privatekey_path: ./artifacts/ca.key
        common_name: ca
        organization_name: "Etcd CA"
        basic_constraints:
          - CA:TRUE
          - pathlen:1
        basic_constraints_critical: True
        key_usage:
          - keyCertSign
          - digitalSignature
        force: True
    - name: "Generate self-signed CA certificate"
      openssl_certificate:
        path: ./artifacts/ca.crt
        privatekey_path: ./artifacts/ca.key
        csr_path: ./artifacts/ca.csr
        provider: selfsigned
        force: True
- hosts: etcd
  become: True
  tasks:
    - name: "Create directory for etcd binaries"
      ...
Close and save the playbook.yaml file by pressing CTRL+X followed by Y. Then, run our playbook again to apply the changes:
You can also run ls to check the contents of the artifacts/ directory:
- ls artifacts/
You will now find the freshly generated CA certificate (ca.crt):
Output
ca.crt ca.csr ca.key etcd1.csr etcd1.key etcd2.csr etcd2.key etcd3.csr etcd3.key
In this step, we generated a private key and a self-signed certificate for the CA. In the next step, we will use the CA certificate to sign each member’s CSR.
In this step, we are going to sign each member node’s CSR. This will be similar to how we used the openssl_certificate module to self-sign the CA certificate, but instead of using the selfsigned provider, we will use the ownca provider, which allows us to sign using our own CA certificate.
Open up your playbook:
- nano $HOME/playground/etcd-ansible/playbook.yaml
Append the following highlighted task after the "Generate self-signed CA certificate" task:
- hosts: localhost
  ...
  tasks:
    ...
    - name: "Generate self-signed CA certificate"
      openssl_certificate:
        path: ./artifacts/ca.crt
        privatekey_path: ./artifacts/ca.key
        csr_path: ./artifacts/ca.csr
        provider: selfsigned
        force: True
    - name: "Generate an `etcd` member certificate signed with our own CA certificate"
      openssl_certificate:
        path: ./artifacts/{{item}}.crt
        csr_path: ./artifacts/{{item}}.csr
        ownca_path: ./artifacts/ca.crt
        ownca_privatekey_path: ./artifacts/ca.key
        provider: ownca
        force: True
      with_items: "{{ groups['etcd'] }}"
- hosts: etcd
  become: True
  tasks:
    - name: "Create directory for etcd binaries"
      ...
Close and save the playbook.yaml file by pressing CTRL+X followed by Y. Then, run the playbook again to apply the changes:
Now, list out the contents of the artifacts/ directory:
You will find the private key, CSR, and certificate for every etcd member and the CA:
Output
ca.crt ca.csr ca.key etcd1.crt etcd1.csr etcd1.key etcd2.crt etcd2.csr etcd2.key etcd3.crt etcd3.csr etcd3.key
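As an optional sanity check, you can confirm that a member certificate was indeed signed by our CA using the openssl command-line tool:
- openssl verify -CAfile artifacts/ca.crt artifacts/etcd1.crt
If the chain is valid, openssl prints artifacts/etcd1.crt: OK.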
In this step, we signed each member node’s CSR using the CA’s key. In the next step, we are going to copy the relevant files onto each managed node, so that etcd has access to the relevant keys and certificates to set up TLS connections.
Every node needs to have a copy of the CA’s self-signed certificate (ca.crt). Each etcd member node also needs to have its own private key and certificate. In this step, we are going to upload these files and place them in a new /etc/etcd/ssl/ directory.
To start, open the playbook.yaml file with your editor:
To make these changes in our Ansible playbook, first update the path property of the "Create directory for etcd configuration" task to create the /etc/etcd/ssl/ directory:
- hosts: etcd
  ...
  tasks:
    ...
      with_items:
        - absent
        - directory
    - name: "Create directory for etcd configuration"
      file:
        path: "{{ item }}"
        state: directory
        owner: root
        group: root
        mode: 0755
      with_items:
        - /etc/etcd
        - /etc/etcd/ssl
    - name: "Create configuration file for etcd"
      template:
        ...
Then, following the modified task, add three more tasks to copy the files over:
- hosts: etcd
  ...
  tasks:
    ...
    - name: "Copy over the CA certificate"
      copy:
        src: ./artifacts/ca.crt
        remote_src: False
        dest: /etc/etcd/ssl/ca.crt
        owner: root
        group: root
        mode: 0644
    - name: "Copy over the `etcd` member certificate"
      copy:
        src: ./artifacts/{{inventory_hostname}}.crt
        remote_src: False
        dest: /etc/etcd/ssl/server.crt
        owner: root
        group: root
        mode: 0644
    - name: "Copy over the `etcd` member key"
      copy:
        src: ./artifacts/{{inventory_hostname}}.key
        remote_src: False
        dest: /etc/etcd/ssl/server.key
        owner: root
        group: root
        mode: 0600
    - name: "Create configuration file for etcd"
      template:
        ...
Close and save the playbook.yaml file by pressing CTRL+X followed by Y.
Run ansible-playbook again to make these changes:
- ansible-playbook -i hosts playbook.yaml
In this step, we have successfully uploaded the private keys and certificates to the managed nodes. Having copied the files over, we now need to update our etcd configuration file to make use of them.
In the last step of this tutorial, we are going to update some Ansible configurations to enable TLS in an etcd cluster.
First, open up the templates/etcd.conf.yaml.j2 template file using your editor:
Once inside, change all URLs to use https as the protocol instead of http. Additionally, add a section at the end of the template to specify the location of the CA certificate, server certificate, and server key:
data-dir: /var/lib/etcd/{{ inventory_hostname }}.etcd
name: {{ inventory_hostname }}
initial-advertise-peer-urls: https://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2380
listen-peer-urls: https://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2380,https://127.0.0.1:2380
advertise-client-urls: https://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2379
listen-client-urls: https://{{ hostvars[inventory_hostname]['ansible_facts']['eth1']['ipv4']['address'] }}:2379,https://127.0.0.1:2379
initial-cluster-state: new
initial-cluster: {% for host in groups['etcd'] %}{{ hostvars[host]['ansible_facts']['hostname'] }}=https://{{ hostvars[host]['ansible_facts']['eth1']['ipv4']['address'] }}:2380{% if not loop.last %},{% endif %}{% endfor %}
client-transport-security:
  cert-file: /etc/etcd/ssl/server.crt
  key-file: /etc/etcd/ssl/server.key
  trusted-ca-file: /etc/etcd/ssl/ca.crt
peer-transport-security:
  cert-file: /etc/etcd/ssl/server.crt
  key-file: /etc/etcd/ssl/server.key
  trusted-ca-file: /etc/etcd/ssl/ca.crt
Close and save the templates/etcd.conf.yaml.j2 file.
Next, run your Ansible playbook:
- ansible-playbook -i hosts playbook.yaml
Then, SSH into one of the managed nodes:
- ssh root@etcd1_public_ip
Once inside, run the etcdctl endpoint health command to check whether the endpoints are using HTTPS and whether all members are healthy:
- etcdctl --cacert /etc/etcd/ssl/ca.crt endpoint health --cluster
Because our CA certificate is not, by default, a trusted root CA certificate installed in the /etc/ssl/certs/ directory, we need to pass it to etcdctl using the --cacert flag.
This will give the following output:
Output
https://etcd3_private_ip:2379 is healthy: successfully committed proposal: took = 19.237262ms
https://etcd1_private_ip:2379 is healthy: successfully committed proposal: took = 4.769088ms
https://etcd2_private_ip:2379 is healthy: successfully committed proposal: took = 5.953599ms
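If you expect to run several etcdctl commands, note that etcdctl in API v3 mode also reads its flag values from ETCDCTL_-prefixed environment variables, so you can export the CA path once instead of repeating the flag each time:
- export ETCDCTL_CACERT=/etc/etcd/ssl/ca.crt
The remaining examples in this step spell out the --cacert flag for clarity.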
To confirm that the etcd cluster is actually working, we can, once again, create an entry on one member node and retrieve it from another member node:
- etcdctl --cacert /etc/etcd/ssl/ca.crt put foo "bar"
Use a new terminal to SSH into a different node:
- ssh root@etcd2_public_ip
Now retrieve the same entry using the key foo:
- etcdctl --cacert /etc/etcd/ssl/ca.crt get foo
This will return the entry, showing the output below:
Output
foo
bar
You can do the same on the third node to ensure all three members are operational.
You have now successfully provisioned a 3-node etcd cluster, secured it with TLS, and confirmed that it is working.
etcd is a tool originally created by CoreOS. To understand etcd’s usage in relation to CoreOS, you can read How To Use Etcdctl and Etcd, CoreOS’s Distributed Key-Value Store. The article also guides you through setting up a dynamic discovery model, something which was discussed but not demonstrated in this tutorial.
As mentioned at the beginning of this tutorial, etcd is an important part of the Kubernetes ecosystem. To learn more about Kubernetes and etcd’s role within it, you can read An Introduction to Kubernetes. If you are deploying etcd as part of a Kubernetes cluster, know that there are other tools available, such as kubespray and kubeadm. For more details on the latter, you can read How To Create a Kubernetes Cluster Using Kubeadm on Ubuntu 18.04.
Finally, this tutorial made use of many tools, but could not dive into each in too much detail. In the following you’ll find links that will provide a more detailed examination of each tool: