This tutorial is out of date and no longer maintained.
This article is no longer current. If you are interested in writing an update for this article, please see DigitalOcean wants to publish your tech tutorial!
Reason: On December 22, 2016, CoreOS announced that it no longer maintains fleet. CoreOS recommends using Kubernetes for all clustering needs.
See Instead: For guidance using Kubernetes on CoreOS without fleet, see the Kubernetes on CoreOS Documentation.
If you are planning on using CoreOS in your infrastructure, the first thing you will want to set up is a CoreOS cluster. In order for CoreOS machines to form a cluster, their etcd2 instances must be connected. In this tutorial, we will give step-by-step instructions to quickly create a 3-node CoreOS cluster on DigitalOcean.
If you are unfamiliar with the components that CoreOS is built on (docker, etcd2, and fleet), it is highly recommended that you read An Introduction to CoreOS System Components. Pay particular attention to the section that covers etcd2, since that component is essential to the cluster discovery process.
Every CoreOS server that you create will need to have at least one SSH public key installed during its creation process. The key(s) will be installed to the core user’s authorized keys file, and you will need the corresponding private key(s) to log in to your CoreOS server.
If you do not already have any SSH keys associated with your DigitalOcean account, add one now by following steps 1-3 of this tutorial: How To Use SSH Keys with DigitalOcean Droplets. Then add your private key to the SSH agent on your client machine by running the following command:
ssh-add
For more about this step, see this article.
If you are planning on using the DigitalOcean API to create your CoreOS machines, refer to this tutorial for information on how to generate and use a Personal Access Token with write permissions.
Now that you have the prerequisites out of the way, let’s start building our CoreOS cluster!
The first step to setting up a new CoreOS cluster is generating a new discovery URL, a unique address that stores peer CoreOS addresses and metadata. The easiest way to do this is to use https://discovery.etcd.io, a free discovery service. A new discovery URL can be generated by visiting https://discovery.etcd.io/new in a web browser or by running the following curl command, where size=3 tells the discovery service to expect a 3-member cluster:
curl -w "\n" "https://discovery.etcd.io/new?size=3"
Either method will return a fresh, unique discovery URL that looks something like the following (the final path segment is the unique token):
https://discovery.etcd.io/5c1574906b3502aa9d8dc43c1b185775
You will use the resulting discovery URL to create your new CoreOS cluster. The same discovery URL must be specified in the etcd2 section of the cloud-config of each server that you want to add to a particular CoreOS cluster.
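Because every machine must receive exactly the same URL, it can help to generate and sanity-check it once in a script rather than copying it by hand. The bash sketch below is not part of the original tutorial: `new_discovery_url` and `is_discovery_url` are hypothetical helper names, and the check for a 32-character lowercase hexadecimal token is an assumption based on the example URLs in this article, not a documented format.

```shell
# Hypothetical helper: request a discovery URL for a cluster of the given
# size (default 3) and only print it if it looks like a discovery URL.
new_discovery_url() {
  local size=${1:-3} url
  url=$(curl -fsS "https://discovery.etcd.io/new?size=${size}") || return 1
  if is_discovery_url "$url"; then
    printf '%s\n' "$url"
  else
    echo "unexpected response from discovery.etcd.io: $url" >&2
    return 1
  fi
}

# Assumption: tokens are 32 lowercase hex characters, as in this article's examples.
is_discovery_url() {
  local re='^https://discovery\.etcd\.io/[0-9a-f]{32}$'
  [[ $1 =~ $re ]]
}
```

For example, `DISCOVERY_URL=$(new_discovery_url 3)` stores a fresh URL in a shell variable that you can then paste into each cloud-config.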
Now that we have a discovery URL, let’s look at how to create a cloud-config file that uses it.
CoreOS uses a file called cloud-config, which allows you to declaratively customize network configuration, systemd units, and other OS-level items. This file is written in YAML format, which uses indentation to denote data hierarchy. The cloud-config file is processed when a machine boots, and it provides a way to configure your machines with etcd2 settings that will allow them to discover the cluster that they should join.
We will cover how to write a minimal cloud-config to get a working CoreOS cluster up and running. For a full list of items that can be configured with cloud-config, check out the official documentation. CoreOS also provides a helpful tool that checks your cloud-config file’s syntax, the Cloud-Config Validator.
As mentioned earlier, the peer addresses of each CoreOS machine in a cluster are stored with the discovery URL. Therefore, each machine in a cluster must use the same discovery URL and pass in its own IP address where its etcd2 service can be reached. These are specified in cloud-config under the etcd2 section, as shown in the code block below.
You will also need to specify a units section, which will start the etcd2 and fleet services that are necessary for a working CoreOS cluster.
Here is a basic cloud-config file that can be used with your CoreOS machines to make a new cluster (replace the value of discovery with the discovery URL that you generated earlier):
#cloud-config
coreos:
  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new:
    discovery: https://discovery.etcd.io/<discovery_token>
    # multi-region deployments, multi-cloud deployments, and Droplets without
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
Note: The #cloud-config line is required and must be the first line of the file. The $private_ipv4 and $public_ipv4 substitution variables are fully supported in cloud-config on DigitalOcean; they will be replaced with the actual private and public IP addresses of your new Droplet. Also, the fleet section is not required if you do not intend to use the fleetctl ssh command.
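If you create clusters often, generating the cloud-config from a template avoids hand-editing mistakes. The bash sketch below uses a hypothetical `render_cloud_config` function that prints the cloud-config shown above (with its comments omitted) for a given discovery URL. The heredoc is unquoted so that the discovery URL is expanded, while `\$private_ipv4` is escaped so that the shell leaves it literal for DigitalOcean to substitute at boot time.

```shell
# Hypothetical helper: print this tutorial's basic cloud-config with the given
# discovery URL filled in. \$private_ipv4 stays literal in the output.
render_cloud_config() {
  local discovery_url=$1
  cat <<EOF
#cloud-config
coreos:
  etcd2:
    discovery: ${discovery_url}
    advertise-client-urls: http://\$private_ipv4:2379,http://\$private_ipv4:4001
    initial-advertise-peer-urls: http://\$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://\$private_ipv4:2380
  fleet:
    public-ip: \$private_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
EOF
}
```

For example, `render_cloud_config "https://discovery.etcd.io/<discovery_token>" > cloud-config.yml` writes a file (hypothetical name) that you can paste into the User Data field or pass to the API.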
This cloud-config script can be used to set up a basic CoreOS cluster for testing purposes; however, it is not very secure. For a more serious setup, you should set up a secure CoreOS cluster by following this tutorial: How To Secure Your CoreOS Cluster with TLS/SSL and Firewall Rules.
Now that you know what the cloud-config file for each machine in your new CoreOS cluster will consist of, let’s create the cluster. Because Droplets can be created through either the DigitalOcean Control Panel or the API, we will show you how to create your CoreOS cluster using both methods.
First, visit the DigitalOcean Control Panel and click the Create Droplet button.
Next, select CoreOS as your Linux distribution, then select which channel you want to use (Stable, Beta, or Alpha).
Then select your desired Droplet size. A smaller size is fine if you’re doing basic testing.
Next, select your preferred datacenter region.
Under the Select additional options header, select Private Networking and User Data. Copy and paste your cloud-config script into the User Data text field.
Next, select at least one SSH key that you want to use to log in to your Droplets.
Under the Finalize and create section, create at least three Droplets and specify their hostnames. In our example, we’ll call them coreos-01, coreos-02, and coreos-03.
Lastly, click the Create button to create the Droplets that will form your CoreOS cluster.
To learn more about creating Droplets with the DigitalOcean Control Panel, refer to this guide.
If you use the DigitalOcean API to create your CoreOS Droplets, you can specify your cloud-config via the user_data parameter in your Droplet creation POST request; just paste the whole script in there.
Let us assume that we want to create three 1 GB Droplets named coreos-01, coreos-02, and coreos-03 with private networking, in the NYC3 datacenter, using the CoreOS Stable channel image and the cloud-config file shown earlier. Here is an example of the curl command you would run to create them using the DigitalOcean API:
curl -X POST "https://api.digitalocean.com/v2/droplets" \
-d '{"names":["coreos-01","coreos-02","coreos-03"],"region":"nyc3","size":"1gb","private_networking":true,"image":"coreos-stable","user_data":
"#cloud-config
coreos:
  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new:
    discovery: https://discovery.etcd.io/<discovery_token>
    # multi-region deployments, multi-cloud deployments, and Droplets without
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start",
"ssh_keys":[ <SSH Key ID(s)> ]}' \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json"
Note: This is just like a normal multi-Droplet create request, with the addition of the example cloud-config passed through the user_data parameter.
You must replace <SSH Key ID(s)> with your SSH key ID(s) or fingerprint(s) (fingerprints must be quoted as JSON strings), and make sure $TOKEN is set to one of your read/write DigitalOcean Personal Access Tokens.
After running this command with the appropriate substitutions, your 3-node CoreOS cluster will be created.
For more information about using the API, please refer to this tutorial.
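The request above embeds raw line breaks inside the user_data JSON string. Strict JSON (RFC 8259) does not allow unescaped control characters such as newlines inside strings, so if you prefer to send a strictly valid request body, you can escape the cloud-config file first. The bash sketch below is not from the original tutorial: `json_escape` and `droplet_request_body` are hypothetical helpers, the escaping handles only backslashes, double quotes, and newlines (enough for the cloud-config in this tutorial, which contains no tabs or other control characters), and SSH keys are assumed to be numeric IDs rather than fingerprints.

```shell
# Hypothetical helper: turn stdin into the body of a JSON string by escaping
# backslashes and double quotes, then joining lines with a literal \n.
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Hypothetical helper: build the same create request as the curl example above.
# $1 is a cloud-config file; the remaining arguments are numeric SSH key IDs.
droplet_request_body() {
  local cloud_config_file=$1 keys
  shift
  keys=$(IFS=,; printf '%s' "$*")   # join key IDs with commas
  printf '{"names":["coreos-01","coreos-02","coreos-03"],"region":"nyc3","size":"1gb","private_networking":true,"image":"coreos-stable","user_data":"%s","ssh_keys":[%s]}' \
    "$(json_escape < "$cloud_config_file")" "$keys"
}

# Example (not executed here; cloud-config.yml and 12345 are placeholders):
#   curl -X POST "https://api.digitalocean.com/v2/droplets" \
#     -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
#     -d "$(droplet_request_body cloud-config.yml 12345)"
```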
To verify that our 3-machine cluster has formed properly, we must SSH to one of the cluster members.
Log in to the coreos-01 machine as the core user via SSH, and use the -A option to forward your SSH agent. Remember to substitute the public IP address:
ssh -A core@coreos-01_public_IP
At the command prompt, enter this fleetctl command to show all the members of the cluster:
fleetctl list-machines
You should see a list of all of the online machines in the cluster, identified by the IP address each one advertises to fleet (the fleet public-ip setting, which is the private IP in this cloud-config). Here is an example of the output:
MACHINE         IP              METADATA
59b2fffd...     10.131.29.141   -
853b0df3...     10.131.63.121   -
cd64a2e3...     10.131.63.120   -
If you see all of the machines that you created, they are all aware of each other via etcd2, and your cluster has formed properly!
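If you want to script this check (for example, right after creating the Droplets through the API), you can count the rows that fleetctl prints. This bash sketch is an illustration, not part of the tutorial: `count_machines` and `check_cluster_size` are hypothetical names, and the parsing assumes the one-header-line table layout shown in the example output above.

```shell
# Hypothetical helper: count machines in `fleetctl list-machines` output,
# skipping the MACHINE/IP/METADATA header line and any blank lines.
count_machines() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Compare fleet's view with the number of Droplets you created (default 3).
# If fleetctl fails, the count is 0 and the check reports a mismatch.
check_cluster_size() {
  local expected=${1:-3} actual
  actual=$(fleetctl list-machines | count_machines)
  if [ "$actual" -eq "$expected" ]; then
    echo "cluster OK: $actual machines"
  else
    echo "expected $expected machines, fleet sees $actual" >&2
    return 1
  fi
}
```

Run `check_cluster_size 3` on any cluster member; a non-zero exit status means at least one machine has not joined yet.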
Warning: After the cluster is set up, be sure to use iptables to restrict access to etcd's client ports (2379 and 4001, which this cloud-config binds on all interfaces) and its peer port (2380) to machines within your CoreOS cluster. This will prevent external, unauthorized users from controlling your CoreOS machines. For production use, you should strongly consider following the steps in this guide to securing a CoreOS cluster with TLS/SSL certificates and firewall rules.
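As a starting point for such a firewall, the hypothetical bash helper below prints (rather than applies) iptables commands that accept traffic to etcd ports 2379, 2380, and 4001 only from loopback and from the cluster members' private IPs that you pass in, and drop it from everywhere else. It is a sketch under stated assumptions: it covers only the etcd ports, it appends to the INPUT chain (so existing earlier rules still take precedence), and rules added this way do not persist across reboots; the TLS/firewall guide mentioned above covers a complete, persistent setup.

```shell
# Hypothetical helper: print iptables commands restricting etcd's client
# ports (2379, 4001) and peer port (2380) to loopback and the given peer IPs.
etcd_firewall_rules() {
  local ports=2379,2380,4001 ip
  # Keep local clients (etcdctl, fleet) working over 127.0.0.1.
  echo "iptables -A INPUT -i lo -j ACCEPT"
  for ip in "$@"; do
    echo "iptables -A INPUT -p tcp -s $ip -m multiport --dports $ports -j ACCEPT"
  done
  # Anything else aimed at the etcd ports is dropped.
  echo "iptables -A INPUT -p tcp -m multiport --dports $ports -j DROP"
}
```

For example, `etcd_firewall_rules 10.131.29.141 10.131.63.121 10.131.63.120` prints the rules for the three machines in the sample output; after reviewing them, you could apply them on each member with `etcd_firewall_rules ... | sudo sh`.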
If you would like to add new machines to an existing CoreOS cluster, simply create a new Droplet using the same cloud-config (and discovery URL). Your new CoreOS machine will automatically join the existing cluster.
If you forget which discovery URL you used, you can look it up on one of the members of the cluster. Run the following grep command on one of your existing machines:
grep DISCOVERY /run/systemd/system/etcd2.service.d/20-cloudinit.conf
You will see a line that contains the original discovery URL, like the following:
Environment="ETCD_DISCOVERY=https://discovery.etcd.io/575302f03f4fb2db82e81ea2abca55e9"
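If you need the bare URL (for example, to paste into the cloud-config of a new Droplet), you can strip the surrounding Environment="ETCD_DISCOVERY=..." text. This bash sketch uses a hypothetical `discovery_url_from_dropin` helper and assumes the drop-in line has exactly the form shown above; the default path is the one from the grep command.

```shell
# Hypothetical helper: print only the discovery URL from the etcd2 drop-in
# that cloud-config generated (default path from the grep command above).
discovery_url_from_dropin() {
  local dropin=${1:-/run/systemd/system/etcd2.service.d/20-cloudinit.conf}
  sed -n 's/^Environment="ETCD_DISCOVERY=\(.*\)"$/\1/p' "$dropin"
}
```

Running `discovery_url_from_dropin` on a cluster member should print just the URL, such as https://discovery.etcd.io/575302f03f4fb2db82e81ea2abca55e9 for the example line above.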
Your basic CoreOS cluster is set up, and now you can move on to testing with it! If you are looking to set up a secure CoreOS cluster, follow this tutorial: How To Secure Your CoreOS Cluster with TLS/SSL and Firewall Rules.
The rest of the tutorials in this series will show you more about CoreOS, and how to use docker containers and service discovery with your CoreOS cluster.
CoreOS is a powerful Linux distribution built to make large, scalable deployments on varied infrastructure simple to manage. Based on a build of Chrome OS, CoreOS maintains a lightweight host system and uses Docker containers for all applications. In this series, we will introduce you to the basics of CoreOS, teach you how to set up a CoreOS cluster, and get you started with using docker containers with CoreOS.
Since DigitalOcean droplets with private networking enabled are on the same private network as other customers’ droplets, then if “$private_ipv4” is specified for “addr” and “peer-addr”, isn’t it critical that etcd be secured with TLS and client cert authentication?
See: CoreOS – Etcd: Reading and Writing over HTTPS
I realize that delving into that aspect of coreos/etcd configuration is beyond the scope of this introductory “how to” article, but I believe that some strong mention should be given to this security-related concern.
Worth noting that if users move on to the next part of the series and haven’t ssh’d to their coreOS box with a -A, their ssh agent will not be forwarded, and fleet won’t work as expected. Changing the ssh command in this post to a -A would fix the problems users may see.
Do $public_ipv6 and $private_ipv6 exist as well? Where can I find a list of all variables available to cloud-install on DigitalOcean?
Great Post!
I had the same problem as icoz. I was able to solve it by setting up a new cluster of machines. I had set up several clusters using the same discovery URL in the cloud-config user data, and I tried generating a fresh URL using the link:
https://discovery.etcd.io/new
Also, I had turned on IPv6 support on my first machines; it is possible that this was related as well. In any case, after those two changes I was successful, and fleetctl list-machines showed a working cluster.
I am trying to run CoreOS using your how-to, but at the fleetctl list-machines step I got:
E0907 13:56:51.771686 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
E0907 13:56:51.872851 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
E0907 13:56:52.073681 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
E0907 13:56:52.474515 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
E0907 13:56:53.275426 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:54.276284 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:54.771553 00847 fleetctl.go:152] error attempting to check latest fleet version in Registry: timeout reached
E0907 13:56:54.772189 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
E0907 13:56:54.873013 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
E0907 13:56:55.073946 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
E0907 13:56:55.474776 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
E0907 13:56:56.275632 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:57.276739 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
Error retrieving list of active machines: timeout reached
Trying to see something in etcd:
etcdctl ls
Error: Cannot sync with the cluster using peers 127.0.0.1:4001
journalctl | tail says:
Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.531 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.132.243.51:7001 failed: fail checking join version: Client Internal Error (Get http://10.132.243.51:7001/version: dial tcp 10.132.243.51:7001: i/o timeout)
Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.532 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.532 INFO | 0d704bc2bca944f3ae08dca165a8393b is unable to join the cluster using any of the peers [10.131.238.213:7001 10.131.238.213:7001 10.131.238.210:7001 10.131.238.53:7001 10.132.243.51:7001 10.131.238.210:7001] at 0th time. Retrying in 3.8 seconds
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.533 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.213:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.213:7001/version: dial tcp 10.131.238.213:7001: connection refused)
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.534 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.213:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.213:7001/version: dial tcp 10.131.238.213:7001: connection refused)
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.536 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:37 co1 etcd[914]: [etcd] Sep 7 13:59:37.887 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.53:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.53:7001/version: dial tcp 10.131.238.53:7001: i/o timeout)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.238 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.132.243.51:7001 failed: fail checking join version: Client Internal Error (Get http://10.132.243.51:7001/version: dial tcp 10.132.243.51:7001: i/o timeout)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.242 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.242 INFO | 0d704bc2bca944f3ae08dca165a8393b is unable to join the cluster using any of the peers [10.131.238.213:7001 10.131.238.213:7001 10.131.238.210:7001 10.131.238.53:7001 10.132.243.51:7001 10.131.238.210:7001] at 1th time. Retrying in 3.8 seconds
What can I do to start CoreOS correctly?
@manicas I don’t think that is very safe: anyone who knows the public IP can GET/PUT/DELETE keys in etcd.
This was a very useful article, thank you.
The only problem I encountered is that “fleetctl list-machines” shows the public IPs instead of the private ones. I used the following cloud config:
Is there maybe a fix for this?
Since there is a stern warning about setting up a firewall, there should maybe also be a link to or an example of how it is done in CoreOS. I found this blog post helpful, and all it comes down to is adding some bits into your cloud-config. These additions configure a persistent iptables firewall that lets SSH and HTTP[S] traffic through, plus already established connections and some ICMP messages:
I had the same problem described here in the comments. It was caused by the limited width of the user-data field in the web form: a long comment line got wrapped onto two lines, and cloudinit failed to parse my user-data.
Just remove the comment lines and it will work fine.