How to Create a High Availability Setup with Pacemaker, Corosync and Reserved IPs on CentOS 7

Published on December 22, 2015

Developer Advocate

Introduction

Corosync is an open source cluster engine used to implement high availability within applications. Commonly referred to as a messaging layer, Corosync provides a cluster membership and closed communication model for creating replicated state machines, on top of which cluster resource managers like Pacemaker can run. Corosync can be seen as the underlying system that connects the cluster nodes together, while Pacemaker monitors the cluster and takes action in the event of a failure.

This tutorial will demonstrate how to use Corosync and Pacemaker to create a high availability (HA) infrastructure on DigitalOcean with CentOS 7 servers and Reserved IPs. To facilitate the process of setting up and managing the cluster nodes, we are going to use PCS, a command line interface that interacts with both Corosync and Pacemaker.

##Prerequisites

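In order to follow this guide, you will need:

  • Two CentOS 7 Droplets with private networking enabled
  • A sudo user on each Droplet
  • A read/write DigitalOcean Personal Access Token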

When creating these Droplets, use descriptive hostnames to uniquely identify them. For this tutorial, we will refer to these Droplets as primary and secondary.

When you are ready to move on, make sure you are logged into both of your servers with your sudo user.

##Step 1 — Set Up Nginx

To speed things up, we are going to use a simple shell script that installs Nginx and sets up a basic web page containing information about that specific server. This way we can easily identify which server is currently active in our Reserved IP setup. The script uses DigitalOcean’s Metadata service to fetch the Droplet’s IP address and hostname.

In order to execute the script, run the following commands on both servers:

    sudo curl -L -o install.sh http://do.co/nginx-centos
    sudo chmod +x install.sh
    sudo ./install.sh

After the script is finished running, accessing either Droplet via its public IP address from a browser should give you a basic web page showing the Droplet’s hostname and IP address.

In order to reduce this tutorial’s complexity, we will be using simple web servers as cluster nodes. In a production environment, the nodes would typically be configured to act as redundant load balancers. For more information about load balancers, check out our Introduction to HAProxy and Load Balancing Concepts guide.

##Step 2 — Create and Assign Reserved IP

The first step is to create a Reserved IP and assign it to the primary server. In the DigitalOcean Control Panel, click Networking in the top menu, then Reserved IPs in the side menu.

You should see a page like this:

[Screenshot: Reserved IPs Control Panel]

Select your primary server and click on the “Assign Reserved IP” button. After the Reserved IP has been assigned, check that you can reach the primary Droplet by accessing the Reserved IP address from your browser:

http://your_reserved_ip

You should see the index page of your primary Droplet.

##Step 3 — Create IP Reassignment Script

In this step, we’ll demonstrate how the DigitalOcean API can be used to reassign a Reserved IP to another Droplet. Later on, we will configure Pacemaker to execute this script when the cluster detects a failure in one of the nodes.

For our example, we are going to use a basic Python script that takes a Reserved IP address and a Droplet ID as arguments in order to assign the Reserved IP to the given Droplet. The Droplet’s ID can be fetched from within the Droplet itself using the Metadata service.

Let’s start by downloading the assign-ip script and making it executable. Feel free to review the contents of the script before downloading it.

The following two commands should be executed on both servers (primary and secondary):

    sudo curl -L -o /usr/local/bin/assign-ip http://do.co/assign-ip
    sudo chmod +x /usr/local/bin/assign-ip

The assign-ip script requires the following information in order to be executed:

  • Reserved IP: The first argument to the script, the Reserved IP that is being assigned
  • Droplet ID: The second argument to the script, the Droplet ID that the Reserved IP should be assigned to
  • DigitalOcean API Token: Passed in as the environment variable DO_TOKEN, your read/write DigitalOcean Personal Access Token
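The three inputs listed above can be checked before the script is called. The following is a minimal sketch, not part of the assign-ip script: the function name check_assign_ip_inputs is hypothetical, and it only verifies that the first argument looks like an IPv4 address, that the second is numeric, and that DO_TOKEN is set.

```shell
# Hypothetical pre-flight check; it is not part of the assign-ip script.
# Usage: check_assign_ip_inputs RESERVED_IP DROPLET_ID
check_assign_ip_inputs() {
    # First argument: should at least look like a dotted IPv4 address.
    if ! printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "error: '$1' does not look like an IPv4 address" >&2
        return 1
    fi
    # Second argument: Droplet IDs are numeric.
    case "$2" in
        ''|*[!0-9]*)
            echo "error: Droplet ID '$2' is not numeric" >&2
            return 1 ;;
    esac
    # The API token is read from the environment, not from the arguments.
    if [ -z "${DO_TOKEN:-}" ]; then
        echo "error: DO_TOKEN is not set" >&2
        return 1
    fi
    return 0
}
```

A check like this could be chained in front of the real call, so that assign-ip only runs when all three inputs are present.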

###Testing the IP Reassignment Script

To monitor the IP reassignment taking place, we can use a curl command to access the Reserved IP address in a loop, with an interval of 1 second between each request.

Open a new local terminal and run the following command, making sure to replace reserved_IP_address with your actual Reserved IP address:

    while true; do curl reserved_IP_address; sleep 1; done

This command will keep running in the active terminal until interrupted with a CTRL+C. It simply fetches the web page hosted by the server that your Reserved IP is currently assigned to. The output should look like this:

Output
    Droplet: primary, IP Address: primary_IP_address
    Droplet: primary, IP Address: primary_IP_address
    Droplet: primary, IP Address: primary_IP_address
    ...

Now, let’s run the assign-ip script to reassign the Reserved IP to the secondary droplet. We will use DigitalOcean’s Metadata service to fetch the current Droplet ID and use it as an argument to the script. Fetching the Droplet’s ID from the Metadata service can be done with:

    curl -s http://169.254.169.254/metadata/v1/id

Here, 169.254.169.254 is a static IP address used by the Metadata service, so it should not be modified. This information is only available from within the Droplet itself.
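Because the Metadata service only answers from inside a Droplet, it can help to guard the lookup. The sketch below wraps the same request in a hypothetical function, get_droplet_id, which adds a timeout and rejects any response that is not purely numeric. The five-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: fetch this Droplet's ID from the Metadata service.
get_droplet_id() {
    # --max-time keeps the call from hanging when run outside a Droplet.
    _droplet_id=$(curl -s --max-time 5 http://169.254.169.254/metadata/v1/id) || return 1
    # Reject empty or non-numeric responses (for example an HTML error page).
    case "$_droplet_id" in
        ''|*[!0-9]*)
            echo "unexpected Metadata response: '$_droplet_id'" >&2
            return 1 ;;
    esac
    printf '%s\n' "$_droplet_id"
}
```

The function prints the ID on success and returns a non-zero status otherwise, so a caller can stop before passing a bad value to assign-ip.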

Before we can execute the script, we need to set the DO_TOKEN environment variable containing the DigitalOcean API token. Run the following command from the secondary server, and don’t forget to replace your_api_token with your read/write Personal Access Token to the DigitalOcean API:

    export DO_TOKEN=your_api_token

Still on the secondary server, run the assign-ip script replacing reserved_IP_address with your Reserved IP address:

    assign-ip reserved_IP_address `curl -s http://169.254.169.254/metadata/v1/id`
Output
Moving IP address: in-progress

By monitoring the output produced by the curl command on your local terminal, you will notice that the Reserved IP is reassigned and starts pointing to the secondary Droplet after a few seconds:

Output
    Droplet: primary, IP Address: primary_IP_address
    Droplet: primary, IP Address: primary_IP_address
    Droplet: secondary, IP Address: secondary_IP_address

You can also access the Reserved IP address from your browser. You should get a page showing the secondary Droplet information. This means that the reassignment script worked as expected.
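When the polling loop runs for a long time, the moment of the switch can be easy to miss. As an optional variation, the sketch below prints a timestamped line only when the response changes. The function name watch_active_node and its defaults (60 polls, 1 second apart) are assumptions made for illustration.

```shell
# Hypothetical helper: poll a URL and print a line only when the response changes.
# $1 = URL to poll, $2 = number of polls (default 60), $3 = seconds between polls (default 1)
watch_active_node() {
    url=$1; polls=${2:-60}; interval=${3:-1}
    last=""
    i=0
    while [ "$i" -lt "$polls" ]; do
        # -s silences progress output; --max-time bounds each request.
        current=$(curl -s --max-time 2 "$url") || current="(no response)"
        if [ "$current" != "$last" ]; then
            printf '%s  %s\n' "$(date '+%H:%M:%S')" "$current"
            last=$current
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling it with the Reserved IP address as the first argument should print one line for the primary Droplet and a second line once the secondary takes over.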

To reassign the Reserved IP back to the primary server, repeat the 2-step process but this time from the primary Droplet:

    export DO_TOKEN=your_api_token
    assign-ip reserved_IP_address `curl -s http://169.254.169.254/metadata/v1/id`

After a few seconds, the Reserved IP should be pointing to your primary Droplet again.

##Step 4 — Install Corosync, Pacemaker and PCS

The next step is to get Corosync, Pacemaker and PCS installed on your Droplets. Because Corosync is a dependency of Pacemaker, it’s usually a better idea to simply install Pacemaker and let the system decide which Corosync version should be installed.

Install the software packages on both servers:

    sudo yum install pacemaker pcs

The PCS utility creates a new system user during installation, named hacluster, with a disabled password. We need to define a password for this user on both servers. This will enable PCS to perform tasks such as synchronizing the Corosync configuration on multiple nodes, as well as starting and stopping the cluster.

On both servers, run:

    sudo passwd hacluster

You should use the same password on both servers. We are going to use this password to configure the cluster in the next step.

The user hacluster has no interactive shell or home directory associated with its account, which means it’s not possible to log into the server using its credentials.

##Step 5 — Set Up the Cluster

Now that we have Corosync, Pacemaker and PCS installed on both servers, we can set up the cluster.

###Enabling and Starting PCS

To enable and start the PCS daemon, run the following on both servers:

    sudo systemctl enable pcsd.service
    sudo systemctl start pcsd.service

###Obtaining the Private Network IP Address for Each Node

For improved network performance and security, the nodes should be connected using the private network. The easiest way to obtain the Droplet’s private network IP address is via the Metadata service. On each server, run the following command:

    curl http://169.254.169.254/metadata/v1/interfaces/private/0/ipv4/address && echo

This command outputs the private network IP address of the Droplet you’re logged into. You can also find this information on your Droplet’s page in the DigitalOcean Control Panel (under the Settings tab).

Collect the private network IP address from both Droplets for the next steps.

###Authenticating the Cluster Nodes

Authenticate the cluster nodes using the username hacluster and the password you defined in Step 4. You’ll need to provide the private network IP address for each node. From the primary server, run:

    sudo pcs cluster auth primary_private_IP_address secondary_private_IP_address

You should get output like this:

Output
    Username: hacluster
    Password:
    primary_private_IP_address: Authorized
    secondary_private_IP_address: Authorized

###Generating the Corosync Configuration

Still on the primary server, generate the Corosync configuration file with the following command:

    sudo pcs cluster setup --name webcluster \
        primary_private_IP_address secondary_private_IP_address

The output should look similar to this:

Output
    Shutting down pacemaker/corosync services...
    Redirecting to /bin/systemctl stop pacemaker.service
    Redirecting to /bin/systemctl stop corosync.service
    Killing any remaining services...
    Removing all cluster configuration files...
    primary_private_IP_address: Succeeded
    secondary_private_IP_address: Succeeded
    Synchronizing pcsd certificates on nodes primary_private_IP_address, secondary_private_IP_address...
    primary_private_IP_address: Success
    secondary_private_IP_address: Success
    Restaring pcsd on the nodes in order to reload the certificates...
    primary_private_IP_address: Success
    secondary_private_IP_address: Success

This will generate a new configuration file located at /etc/corosync/corosync.conf based on the parameters provided to the pcs cluster setup command. We used webcluster as the cluster name in this example, but you can use the name of your choice.

###Starting the Cluster

To start the cluster you just set up, run the following command from the primary server:

    sudo pcs cluster start --all
Output
    primary_private_IP_address: Starting Cluster...
    secondary_private_IP_address: Starting Cluster...

You can now confirm that both nodes joined the cluster by running the following command on any of the servers:

    sudo pcs status corosync
Output
    Membership information
    ----------------------
        Nodeid      Votes Name
             2          1 secondary_private_IP_address
             1          1 primary_private_IP_address (local)
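For a scripted check, the membership listing can be counted rather than read by eye. The sketch below assumes the column layout shown in this tutorial (a numeric node ID, a numeric vote count, then the node name); other pcs versions may format the output differently. The function name count_cluster_members is hypothetical.

```shell
# Hypothetical helper: count member rows in `pcs status corosync` output read from stdin.
count_cluster_members() {
    # Member rows start with a numeric node ID followed by a numeric vote count;
    # the header and separator lines do not match and are skipped.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}
```

Piping the output of sudo pcs status corosync into this function should print 2 once both nodes have joined.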

To get more information about the current status of the cluster, you can run:

    sudo pcs cluster status

The output should be similar to this:

Output
    Cluster Status:
     Last updated: Fri Dec 11 11:59:09 2015
     Last change: Fri Dec 11 11:59:00 2015 by hacluster via crmd on secondary
     Stack: corosync
     Current DC: secondary (version 1.1.13-a14efad) - partition with quorum
     2 nodes and 0 resources configured
     Online: [ primary secondary ]

    PCSD Status:
      primary (primary_private_IP_address): Online
      secondary (secondary_private_IP_address): Online

Now you can enable the corosync and pacemaker services to make sure they will start when the system boots. Run the following on both servers:

    sudo systemctl enable corosync.service
    sudo systemctl enable pacemaker.service

###Disabling STONITH

STONITH (Shoot The Other Node In The Head) is a fencing technique intended to prevent data corruption caused by faulty nodes in a cluster that are unresponsive but still accessing application data. Because its configuration depends on a number of factors that are out of scope for this guide, we are going to disable STONITH in our cluster setup.

To disable STONITH, run the following command on one of the Droplets, either primary or secondary:

    sudo pcs property set stonith-enabled=false

##Step 6 — Create Reserved IP Reassignment Resource Agent

The only thing left to do is to configure the resource agent that will execute the IP reassignment script when a failure is detected in one of the cluster nodes. The resource agent is responsible for creating an interface between the cluster and the resource itself. In our case, the resource is the assign-ip script. The cluster relies on the resource agent to execute the right procedures when given a start, stop or monitor command. There are different types of resource agents, but the most common one is the OCF (Open Cluster Framework) standard.
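To make the start, stop and monitor contract more concrete, here is a minimal sketch of how an OCF-style agent dispatches on the action it is given. It is not the FloatIP agent used in this tutorial: the demo_ function names and the state file are placeholders, and a complete agent would also implement actions such as meta-data and validate-all. The exit codes are the standard OCF values.

```shell
# Minimal OCF-style dispatch sketch (hypothetical; not the FloatIP agent itself).
# Standard OCF exit codes used below.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Placeholder state file standing in for "is the resource active on this node?".
STATE_FILE=${STATE_FILE:-/tmp/demo-resource.state}

demo_start()   { touch "$STATE_FILE"; return $OCF_SUCCESS; }
demo_stop()    { rm -f "$STATE_FILE"; return $OCF_SUCCESS; }
demo_monitor() {
    if [ -f "$STATE_FILE" ]; then return $OCF_SUCCESS; fi
    return $OCF_NOT_RUNNING
}

# The cluster invokes the agent with the action name as its first argument.
demo_agent() {
    case "$1" in
        start)   demo_start ;;
        stop)    demo_stop ;;
        monitor) demo_monitor ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

The cluster uses the monitor result to decide whether the resource is healthy on a node: a return value of 7 tells it the resource is not running there, which is what allows it to start the resource elsewhere.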

We will create a new OCF resource agent to manage the assign-ip service on both servers.

First, create the directory that will contain the resource agent. The directory name will be used by Pacemaker as an identifier for this custom agent. Run the following on both servers:

    sudo mkdir /usr/lib/ocf/resource.d/digitalocean

Next, download the FloatIP resource agent script and place it in the newly created directory, on both servers:

    sudo curl -L -o /usr/lib/ocf/resource.d/digitalocean/floatip http://do.co/ocf-floatip

Now make the script executable with the following command on both servers:

    sudo chmod +x /usr/lib/ocf/resource.d/digitalocean/floatip

We still need to register the resource agent within the cluster, using the PCS utility. The following command should be executed from one of the nodes (don’t forget to replace your_api_token with your DigitalOcean API token and reserved_IP_address with your actual Reserved IP address). Note that the options are passed directly, without the params keyword, which current pcs versions reject according to reader reports, and that the agent expects the address in its floating_ip parameter:

    sudo pcs resource create FloatIP ocf:digitalocean:floatip \
        do_token=your_api_token \
        floating_ip=reserved_IP_address

The resource should now be registered and active in the cluster. You can check the registered resources from any of the nodes with the pcs status command:

    sudo pcs status
Output
    ...
    2 nodes and 1 resource configured

    Online: [ primary secondary ]

    Full list of resources:

     FloatIP    (ocf::digitalocean:floatip):    Started primary

    ...

##Step 7 — Test Failover

Your cluster should now be ready to handle a node failure. A simple way to test failover is to restart the server that is currently active in your Reserved IP setup. If you’ve followed all steps in this tutorial, this should be the primary server.

Again, let’s monitor the IP reassignment by using a curl command in a loop. From a local terminal, run:

    while true; do curl reserved_IP_address; sleep 1; done

From the primary server, run a reboot command:

    sudo reboot

After a few moments, the primary server should become unavailable. This will cause the secondary server to take over as the active node. You should see output similar to this in your local terminal running curl:

Output
    ...
    Droplet: primary, IP Address: primary_IP_address
    Droplet: primary, IP Address: primary_IP_address
    curl: (7) Failed connect to reserved_IP_address; Connection refused
    Droplet: secondary, IP Address: secondary_IP_address
    Droplet: secondary, IP Address: secondary_IP_address

The “Connection refused” error happens when a request is made just before or while the IP reassignment is taking place. It may or may not show up in the output.

If you want to point the Reserved IP back to the primary node while also testing failover on the secondary node, just repeat the process but this time from the secondary Droplet:

    sudo reboot

##Conclusion

In this guide, we saw how Reserved IPs can be used together with Corosync, Pacemaker and PCS to create a highly available web server environment on CentOS 7 servers. We used a rather simple infrastructure to demonstrate the usage of Reserved IPs, but this setup can be scaled to implement high availability at any level of your application stack.


About the authors
Developer Advocate

Dev/Ops passionate about open source, PHP, and Linux.

9 Comments


The current working command is:

pcs resource create FloatIP ocf:digitalocean:floatip do_token=token floating_ip=164.9.2.91 --force

Thanks for this tutorial. My question is regarding the failover criteria. Either stopping the cluster or rebooting the service will trigger a failover but when stopping the etcd service on one of the nodes, a failover is not triggered. I’m assuming this is intended because of how the corosync/pacemaker clustering works: in such a manner that it will route requests to other nodes in the cluster even if the etcd service is down on the given node.


        There is a typo in step 6 command example. You need to remove params from the command otherwise it will not work.

        sudo pcs resource create FloatIP ocf:digitalocean:floatip do_token=your_api_token floating_ip=floating_IP_address 
        

        pcs cluster setup --name HaCluster 192.168.153.184 192.168.163.185 --force
        Error: Unable to set corosync config: Unable to connect to 192.168.163.185 ([Errno 111] Connection refused)

        I don’t know why I have error. With regards. Thx

        Hi,

        I had the error “Unable to communicate with IP_…” when I used the pcs cluster auth command.

        # pcs cluster auth IP1 IP2
        Error: Unable to communicate with IP2
        IP1: Authorized
        

        I found a solution, it may help some of you. When you have the firewall enabled you have to allow following ports: TCP: 2224, 3121, 21064 UDP: 5405

        Regards,
