Corosync is an open source cluster engine used to implement high availability within applications. Commonly referred to as a messaging layer, Corosync provides a cluster membership and closed communication model for creating replicated state machines, on top of which cluster resource managers like Pacemaker can run. Corosync can be seen as the underlying system that connects the cluster nodes together, while Pacemaker monitors the cluster and takes action in the event of a failure.
This tutorial will demonstrate how to use Corosync and Pacemaker to create a high availability (HA) infrastructure on DigitalOcean with CentOS 7 servers and Reserved IPs. To facilitate the process of setting up and managing the cluster nodes, we are going to use PCS, a command line interface that interacts with both Corosync and Pacemaker.
##Prerequisites
In order to follow this guide, you will need:
When creating these Droplets, use descriptive hostnames to uniquely identify them. For this tutorial, we will refer to these Droplets as primary and secondary.
When you are ready to move on, make sure you are logged into both of your servers with your sudo user.
##Step 1 — Set Up Nginx
To speed things up, we are going to use a simple shell script that installs Nginx and sets up a basic web page containing information about that specific server. This way we can easily identify which server is currently active in our Reserved IP setup. The script uses DigitalOcean’s Metadata service to fetch the Droplet’s IP address and hostname.
In order to execute the script, run the following commands on both servers:
After the script is finished running, accessing either Droplet via its public IP address from a browser should give you a basic web page showing the Droplet’s hostname and IP address.
In order to reduce this tutorial’s complexity, we will be using simple web servers as cluster nodes. In a production environment, the nodes would typically be configured to act as redundant load balancers. For more information about load balancers, check out our Introduction to HAProxy and Load Balancing Concepts guide.
##Step 2 — Create and Assign Reserved IP
The first step is to create a Reserved IP and assign it to the primary server. In the DigitalOcean Control Panel, click Networking in the top menu, then Reserved IPs in the side menu.
You should see a page like this:
Select your primary server and click on the “Assign Reserved IP” button. After the Reserved IP has been assigned, check that you can reach the primary Droplet by accessing the Reserved IP address from your browser:
http://your_reserved_ip
You should see the index page of your primary Droplet.
##Step 3 — Create IP Reassignment Script
In this step, we’ll demonstrate how the DigitalOcean API can be used to reassign a Reserved IP to another Droplet. Later on, we will configure Pacemaker to execute this script when the cluster detects a failure in one of the nodes.
For our example, we are going to use a basic Python script that takes a Reserved IP address and a Droplet ID as arguments in order to assign the Reserved IP to the given Droplet. The Droplet’s ID can be fetched from within the Droplet itself using the Metadata service.
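The assign-ip script itself is not reproduced in this guide. As a rough illustration of what such a script has to do, here is a minimal Python sketch that builds the API call for assigning a Reserved IP to a Droplet. The function names are hypothetical, and the endpoint and payload are our assumption based on the public DigitalOcean API v2 (`/v2/reserved_ips/{ip}/actions` with an `assign` action); the real script may differ.

```python
import json
import urllib.request

# Assumed base URL of the DigitalOcean API v2.
API_BASE = "https://api.digitalocean.com/v2"


def build_assign_request(reserved_ip, droplet_id, token):
    """Build (but do not send) the request that assigns reserved_ip to droplet_id."""
    url = f"{API_BASE}/reserved_ips/{reserved_ip}/actions"
    body = json.dumps({"type": "assign", "droplet_id": int(droplet_id)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def assign_ip(reserved_ip, droplet_id, token):
    """Send the request and return the status of the resulting action."""
    request = build_assign_request(reserved_ip, droplet_id, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Assumption: the API answers with an "action" object that carries a "status" field.
    return payload["action"]["status"]
```

Calling `assign_ip("reserved_IP_address", droplet_id, token)` with a read/write API token would perform the reassignment; separating the request builder from the network call makes the sketch easy to inspect without touching a real account.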
Let’s start by downloading the assign-ip script and making it executable. Feel free to review the contents of the script before downloading it.
The following two commands should be executed on both servers (primary and secondary):
The assign-ip script requires the following information in order to be executed:
###Testing the IP Reassignment Script
To monitor the IP reassignment taking place, we can use a curl command to access the Reserved IP address in a loop, with an interval of 1 second between each request.
Open a new local terminal and run the following command, making sure to replace reserved_IP_address with your actual Reserved IP address:
This command will keep running in the active terminal until interrupted with CTRL+C. It simply fetches the web page hosted by the server that your Reserved IP is currently assigned to. The output should look like this:
Output
Droplet: primary, IP Address: primary_IP_address
Droplet: primary, IP Address: primary_IP_address
Droplet: primary, IP Address: primary_IP_address
...
Now, let’s run the assign-ip script to reassign the Reserved IP to the secondary Droplet. We will use DigitalOcean’s Metadata service to fetch the current Droplet ID and use it as an argument to the script. Fetching the Droplet’s ID from the Metadata service can be done with:
Where 169.254.169.254 is a static IP address used by the Metadata service, and therefore should not be modified. This information is only available from within the Droplet itself.
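For readers who prefer to script this lookup, the following Python sketch shows one way to query the Metadata service. The helper names are hypothetical, and the `/metadata/v1/id` path is our assumption about the Metadata API layout; like any Metadata request, it only works from inside a Droplet.

```python
import urllib.request

# Static address of the Metadata service, as described above.
METADATA_BASE = "http://169.254.169.254/metadata/v1"


def metadata_url(path):
    """Return the full Metadata URL for a relative path such as "id"."""
    return f"{METADATA_BASE}/{path.strip('/')}"


def fetch_metadata(path, opener=urllib.request.urlopen):
    """Fetch one metadata value as text; opener is injectable for testing."""
    with opener(metadata_url(path), timeout=2) as response:
        return response.read().decode("utf-8").strip()


def droplet_id(opener=urllib.request.urlopen):
    """Return the numeric ID of the Droplet this code runs on."""
    return int(fetch_metadata("id", opener))
```

The short timeout keeps the call from hanging when the code is run by mistake on a machine that is not a Droplet.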
Before we can execute the script, we need to set the DO_TOKEN environment variable containing the DigitalOcean API token. Run the following command from the secondary server, and don’t forget to replace your_api_token with your read/write Personal Access Token to the DigitalOcean API:
Still on the secondary server, run the assign-ip script, replacing reserved_IP_address with your Reserved IP address:
Output
Moving IP address: in-progress
By monitoring the output produced by the curl command on your local terminal, you will notice that the Reserved IP is reassigned and starts pointing to the secondary Droplet after a few seconds:
Output
Droplet: primary, IP Address: primary_IP_address
Droplet: primary, IP Address: primary_IP_address
Droplet: secondary, IP Address: secondary_IP_address
You can also access the Reserved IP address from your browser. You should get a page showing the secondary Droplet information. This means that the reassignment script worked as expected.
To reassign the Reserved IP back to the primary server, repeat the 2-step process but this time from the primary Droplet:
After a few seconds, the Reserved IP should be pointing to your primary Droplet again.
##Step 4 — Install Corosync, Pacemaker and PCS
The next step is to get Corosync, Pacemaker and PCS installed on your Droplets. Because Corosync is a dependency of Pacemaker, it’s usually a better idea to simply install Pacemaker and let the system decide which Corosync version should be installed.
Install the software packages on both servers:
The PCS utility creates a new system user during installation, named hacluster, with a disabled password. We need to define a password for this user on both servers. This will enable PCS to perform tasks such as synchronizing the Corosync configuration on multiple nodes, as well as starting and stopping the cluster.
On both servers, run:
You should use the same password on both servers. We are going to use this password to configure the cluster in the next step.
The user hacluster has no interactive shell or home directory associated with its account, which means it’s not possible to log into the server using its credentials.
##Step 5 — Set Up the Cluster
Now that we have Corosync, Pacemaker and PCS installed on both servers, we can set up the cluster.
###Enabling and Starting PCS
To enable and start the PCS daemon, run the following on both servers:
###Obtaining the Private Network IP Address for Each Node
For improved network performance and security, the nodes should be connected using the private network. The easiest way to obtain the Droplet’s private network IP address is via the Metadata service. On each server, run the following command:
This command will simply output the private network IP address of the Droplet you’re logged into. You can also find this information on your Droplet’s page at the DigitalOcean Control Panel (under the Settings tab).
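Since the cluster is meant to communicate over the private network, it can be worth checking the collected addresses before handing them to PCS. The following Python sketch is one hypothetical way to do that with the standard `ipaddress` module; the function names are ours and are not part of PCS or the Metadata service.

```python
import ipaddress


def is_private_ipv4(address):
    """Return True if address parses as an IPv4 address in a private range."""
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        return False
    return ip.version == 4 and ip.is_private


def cluster_node_addresses(primary, secondary):
    """Validate both node addresses and return them as a cleaned-up list."""
    nodes = [primary.strip(), secondary.strip()]
    for node in nodes:
        if not is_private_ipv4(node):
            raise ValueError(f"{node!r} is not a private IPv4 address")
    if nodes[0] == nodes[1]:
        raise ValueError("both nodes report the same address")
    return nodes
```

A check like this catches the common mistake of pasting a public IP address, or the same address twice, into the cluster setup commands.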
Collect the private network IP address from both Droplets for the next steps.
###Authenticating the Cluster Nodes
Authenticate the cluster nodes using the username hacluster and the same password you defined in Step 4. You’ll need to provide the private network IP address for each node. From the primary server, run:
You should get output like this:
Output
Username: hacluster
Password:
primary_private_IP_address: Authorized
secondary_private_IP_address: Authorized
###Generating the Corosync Configuration
Still on the primary server, generate the Corosync configuration file with the following command:
The output should look similar to this:
Output
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
primary_private_IP_address: Succeeded
secondary_private_IP_address: Succeeded
Synchronizing pcsd certificates on nodes primary_private_IP_address, secondary_private_IP_address...
primary_private_IP_address: Success
secondary_private_IP_address: Success
Restaring pcsd on the nodes in order to reload the certificates...
primary_private_IP_address: Success
secondary_private_IP_address: Success
This will generate a new configuration file located at /etc/corosync/corosync.conf based on the parameters provided to the pcs cluster setup command. We used webcluster as the cluster name in this example, but you can use the name of your choice.
###Starting the Cluster
To start the cluster you just set up, run the following command from the primary server:
Output
primary_private_IP_address: Starting Cluster...
secondary_private_IP_address: Starting Cluster...
You can now confirm that both nodes joined the cluster by running the following command on any of the servers:
Output
Membership information
----------------------
Nodeid Votes Name
2 1 secondary_private_IP_address
1 1 primary_private_IP_address (local)
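If you want to verify cluster membership from a script rather than by eye, the listing above is simple enough to parse. The following Python sketch assumes the column layout shown in this output (node ID, votes, name, optional "(local)" marker); the function names are hypothetical and the format may differ between versions.

```python
def parse_membership(output):
    """Parse the membership listing into a list of dicts, one per node."""
    members = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with two integers: the node ID and its vote count.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            members.append({
                "nodeid": int(fields[0]),
                "votes": int(fields[1]),
                "name": fields[2],
                "local": "(local)" in fields[3:],
            })
    return members


def all_joined(output, expected_names):
    """Return True if every expected node name appears in the listing."""
    joined = {member["name"] for member in parse_membership(output)}
    return set(expected_names) <= joined
```

Header and separator lines are skipped because they do not start with two numeric fields, so the same function works whether or not the banner is included.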
To get more information about the current status of the cluster, you can run:
The output should be similar to this:
Output
Cluster Status:
Last updated: Fri Dec 11 11:59:09 2015 Last change: Fri Dec 11 11:59:00 2015 by hacluster via crmd on secondary
Stack: corosync
Current DC: secondary (version 1.1.13-a14efad) - partition with quorum
2 nodes and 0 resources configured
Online: [ primary secondary ]
PCSD Status:
primary (primary_private_IP_address): Online
secondary (secondary_private_IP_address): Online
Now you can enable the corosync and pacemaker services to make sure they will start when the system boots. Run the following on both servers:
###Disabling STONITH
STONITH (Shoot The Other Node In The Head) is a fencing technique intended to prevent data corruption caused by faulty nodes in a cluster that are unresponsive but still accessing application data. Because its configuration depends on a number of factors that are out of scope for this guide, we are going to disable STONITH in our cluster setup.
To disable STONITH, run the following command on one of the Droplets, either primary or secondary:
##Step 6 — Create Reserved IP Reassignment Resource Agent
The only thing left to do is to configure the resource agent that will execute the IP reassignment script when a failure is detected in one of the cluster nodes. The resource agent is responsible for creating an interface between the cluster and the resource itself. In our case, the resource is the assign-ip script. The cluster relies on the resource agent to execute the right procedures when given a start, stop or monitor command. There are different types of resource agents, but the most common one is the OCF (Open Cluster Framework) standard.
We will create a new OCF resource agent to manage the assign-ip service on both servers.
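To make the start, stop and monitor contract more concrete, here is a toy Python model of how a resource agent maps those actions to OCF exit codes. It is only a sketch: the class name and the `assign_ip` callable are hypothetical, the in-memory `active` flag stands in for state that a real agent would have to persist between invocations, and the actual FloatIP agent used in this guide may be implemented quite differently.

```python
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class FloatIPAgentSketch:
    """Toy model of a resource agent wrapping an IP reassignment call."""

    def __init__(self, assign_ip):
        self.assign_ip = assign_ip  # hypothetical callable that reassigns the IP
        self.active = False         # whether this node currently "runs" the resource

    def start(self):
        # Starting an already running resource must succeed without side effects.
        if not self.active:
            self.assign_ip()
            self.active = True
        return OCF_SUCCESS

    def stop(self):
        # Stopping only clears local state; the IP moves when another node starts.
        self.active = False
        return OCF_SUCCESS

    def monitor(self):
        return OCF_SUCCESS if self.active else OCF_NOT_RUNNING

    def run(self, action):
        """Dispatch an action name to its handler and return an OCF exit code."""
        handlers = {"start": self.start, "stop": self.stop, "monitor": self.monitor}
        handler = handlers.get(action)
        if handler is None:
            return OCF_ERR_UNIMPLEMENTED
        try:
            return handler()
        except Exception:
            return OCF_ERR_GENERIC
```

The important property is that the cluster only ever sees exit codes: a failed reassignment surfaces as a generic error, and a node that does not hold the resource reports "not running" rather than an error.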
First, create the directory that will contain the resource agent. The directory name will be used by Pacemaker as an identifier for this custom agent. Run the following on both servers:
Next, download the FloatIP resource agent script and place it in the newly created directory, on both servers:
Now make the script executable with the following command on both servers:
We still need to register the resource agent within the cluster, using the PCS utility. The following command should be executed from one of the nodes (don’t forget to replace your_api_token with your DigitalOcean API token and reserved_IP_address with your actual Reserved IP address):
The resource should now be registered and active in the cluster. You can check the registered resources from any of the nodes with the pcs status command:
Output
...
2 nodes and 1 resource configured
Online: [ primary secondary ]
Full list of resources:
FloatIP (ocf::digitalocean:floatip): Started primary
...
##Step 7 — Test Failover
Your cluster should now be ready to handle a node failure. A simple way to test failover is to restart the server that is currently active in your Reserved IP setup. If you’ve followed all steps in this tutorial, this should be the primary server.
Again, let’s monitor the IP reassignment by using a curl command in a loop. From a local terminal, run:
From the primary server, run a reboot command:
After a few moments, the primary server should become unavailable. This will cause the secondary server to take over as the active node. You should see output similar to this in your local terminal running curl:
Output
...
Droplet: primary, IP Address: primary_IP_address
Droplet: primary, IP Address: primary_IP_address
curl: (7) Failed connect to reserved_IP_address; Connection refused
Droplet: secondary, IP Address: secondary_IP_address
Droplet: secondary, IP Address: secondary_IP_address
…
The “Connection refused” error happens when a request is made just before or while the IP reassignment is taking place. It may or may not show up in the output.
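The same observation can be automated. The following Python sketch polls a URL at a fixed interval, records connection errors instead of stopping, and then checks whether the responses switched from one server to the other. All function names are hypothetical, and the sketch assumes each page contains the serving Droplet’s hostname, as the test pages in this guide do.

```python
import time
import urllib.request


def fetch_page(url, timeout=2):
    """Return the body of url as stripped text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def poll(url, count, interval=1.0, fetch=fetch_page, sleep=time.sleep):
    """Request url count times, recording errors instead of raising them."""
    results = []
    for attempt in range(count):
        try:
            results.append(fetch(url))
        except OSError as error:  # covers refused connections, timeouts and URLError
            results.append(f"error: {error}")
        if attempt < count - 1:
            sleep(interval)
    return results


def failover_observed(results, old_marker, new_marker):
    """Return True if a new_marker response appears after an old_marker one."""
    seen_old = False
    for line in results:
        if old_marker in line:
            seen_old = True
        elif seen_old and new_marker in line:
            return True
    return False
```

For example, `failover_observed(poll("http://reserved_IP_address", 60), "primary", "secondary")` would report whether the takeover happened within roughly a minute of polling.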
If you want to point the Reserved IP back to the primary node while also testing failover on the secondary node, just repeat the process but this time from the secondary Droplet:
##Conclusion
In this guide, we saw how Reserved IPs can be used together with Corosync, Pacemaker and PCS to create a highly available web server environment on CentOS 7 servers. We used a rather simple infrastructure to demonstrate the usage of Reserved IPs, but this setup can be scaled to implement high availability at any level of your application stack.
The current working command is:
Thanks for this tutorial. My question is regarding the failover criteria. Either stopping the cluster or rebooting the service will trigger a failover but when stopping the etcd service on one of the nodes, a failover is not triggered. I’m assuming this is intended because of how the corosync/pacemaker clustering works: in such a manner that it will route requests to other nodes in the cluster even if the etcd service is down on the given node.
There is a typo in step 6 command example. You need to remove params from the command otherwise it will not work.
pcs cluster setup --name HaCluster 192.168.153.184 192.168.163.185 --force
Error: Unable to set corosync config: Unable to connect to 192.168.163.185 ([Errno 111] Connection refused)
I don’t know why I get this error. Regards, thanks.
Hi,
I had the error “Unable to communicate with IP_…” when I used the pcs cluster auth command.
I found a solution that may help some of you. When the firewall is enabled, you have to allow the following ports: TCP 2224, 3121, and 21064; UDP 5405.
Regards,