An earlier version of this tutorial was written by Justin Ellingwood.
When storing any critical data, having a single point of failure is very risky. While many databases and other software allow you to spread data out in the context of a single application, other systems can operate on the filesystem level to ensure that data is copied to another location whenever it’s written to disk.
GlusterFS is a network-attached storage filesystem that allows you to pool storage resources of multiple machines. In turn, this lets you treat multiple storage devices that are distributed among many computers as a single, more powerful unit. GlusterFS also gives you the freedom to create different kinds of storage configurations, many of which are functionally similar to RAID levels. For instance, you can stripe data across different nodes in the cluster, or you can implement redundancy for better data availability.
In this guide, you will create a redundant clustered storage array, also known as a distributed file system or, as it’s referred to in the GlusterFS documentation, a Trusted Storage Pool. This will provide functionality similar to a mirrored RAID configuration over the network: each independent server will contain its own copy of the data, allowing your applications to access either copy, thereby helping distribute your read load.
This redundant GlusterFS cluster will consist of two Ubuntu 20.04 servers and will act similarly to a NAS server with mirrored RAID. You’ll then access the cluster from a third Ubuntu 20.04 server configured to function as a GlusterFS client.
When you add data to a GlusterFS volume, that data gets synced to every machine in the storage pool where the volume is hosted. This traffic between nodes isn’t encrypted by default, meaning there’s a risk it could be intercepted by malicious actors.
For this reason, if you’re going to use GlusterFS in production, it’s recommended that you run it on an isolated network. For example, you could set it up to run in a Virtual Private Cloud (VPC) or with a VPN running between each of the nodes.
If you plan to deploy GlusterFS on DigitalOcean, you can set it up in an isolated network by adding your server infrastructure to a DigitalOcean Virtual Private Cloud. For details on how to set this up, see our VPC product documentation.
To follow this tutorial, you will need three servers running Ubuntu 20.04. Each server should have a non-root user with administrative privileges, and a firewall configured with UFW. To set this up, follow our initial server setup guide for Ubuntu 20.04.
Note: As mentioned in the Goals section, this tutorial will walk you through configuring two of your Ubuntu servers to act as servers in your storage pool and the remaining one to act as a client which you’ll use to access these nodes.
For clarity, this tutorial will refer to these machines with the following hostnames:
Hostname | Role in Storage Pool |
---|---|
gluster0 | Server |
gluster1 | Server |
gluster2 | Client |
Each command in this tutorial is introduced with a note stating which machine or machines it should be run on: gluster0, gluster1, the client (gluster2), or more than one of them.
Setting up some kind of hostname resolution between each computer can help with managing your Gluster storage pool. This way, whenever you have to reference one of your machines in a gluster command later in this tutorial, you can do so with an easy-to-remember domain name or even a nickname instead of their respective IP addresses.
If you do not have a spare domain name, or if you just want to set up something quickly, you can instead edit the /etc/hosts file on each computer. This is a special file on Linux machines where you can statically configure the system to resolve any hostnames contained in the file to static IP addresses.
Note: If you’d like to configure your servers to authenticate with a domain that you own, you’ll first need to obtain a domain name from a domain registrar — like Namecheap or Enom — and configure the appropriate DNS records.
Once you’ve configured an A record for each server, you can jump ahead to Step 2. As you follow this guide, make sure that you replace glusterN.example.com and glusterN with the domain name that resolves to the respective server referenced in the example command.
If you obtained your infrastructure from DigitalOcean, you could add your domain name to DigitalOcean then set up a unique A record for each of your servers.
Using your preferred text editor, open this file with root privileges on each of your machines. Here, we’ll use nano:
- sudo nano /etc/hosts
By default, the file will look something like this with comments removed:
127.0.1.1 hostname hostname
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
On one of your Ubuntu servers, add the following below the localhost definition: each server’s IP address, followed by any names you wish to use to reference that server in commands.
In the following example, each server is given a long hostname that aligns with glusterN.example.com and a short one that aligns with glusterN. You can change the glusterN.example.com and glusterN portions of each line to whatever name (or names separated by single spaces) you would like to use to access each server. Note, though, that this tutorial will use these examples throughout:
Note: If your servers are part of a Virtual Private Cloud infrastructure pool, you should use each server’s private IP address in the /etc/hosts file rather than its public IP.
. . .
127.0.0.1 localhost
first_ip_address gluster0.example.com gluster0
second_ip_address gluster1.example.com gluster1
third_ip_address gluster2.example.com gluster2
. . .
When you are finished adding these new lines to the /etc/hosts file of one machine, copy and add them to the /etc/hosts files on your other machines. Each /etc/hosts file should contain the same lines, linking your servers’ IP addresses to the names you’ve selected.
Save and close each file when you are finished. If you used nano, do so by pressing CTRL + X, Y, and then ENTER.
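Because the same three lines go into every machine’s /etc/hosts file, it can help to generate them once and review them before pasting. The following is a minimal sketch, not part of the original procedure: the function name gluster_hosts_entries is hypothetical, and the 192.0.2.x addresses are documentation placeholders that you would replace with your servers’ real (ideally private) IP addresses.

```shell
# Hypothetical helper: print /etc/hosts entries for the tutorial machines.
# Arguments are the IP addresses of gluster0, gluster1, gluster2, in order.
gluster_hosts_entries() {
    n=0
    for ip in "$@"; do
        # Long name first, then the short nickname, as in the example above.
        printf '%s gluster%d.example.com gluster%d\n' "$ip" "$n" "$n"
        n=$((n + 1))
    done
}

# 192.0.2.x addresses are placeholders (an assumption); substitute your own.
gluster_hosts_entries 192.0.2.10 192.0.2.11 192.0.2.12
```

The sketch only prints the lines. Once they look right, you could add them to /etc/hosts by hand with nano as described above.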
Now that you’ve configured hostname resolution between each of your servers, it will be easier to run commands later on as you set up a storage pool and volume. Next, you’ll go through another step that must be completed on each of your servers. Namely, you’ll add the Gluster project’s official personal package archive (PPA) to each of your three Ubuntu servers to ensure that you can install the latest version of GlusterFS.
Although the default Ubuntu 20.04 APT repositories do contain GlusterFS packages, at the time of this writing they are not the most recent versions. One way to install the latest stable version of GlusterFS (version 7.6 as of this writing) is to add the Gluster project’s official PPA to each of your three Ubuntu servers.
Add the PPA for the GlusterFS packages by running the following command on each server:
- sudo add-apt-repository ppa:gluster/glusterfs-7
Press ENTER when prompted to confirm that you actually want to add the PPA.
After adding the PPA, refresh each server’s local package index. This will make each server aware of the new packages available:
- sudo apt update
After adding the Gluster project’s official PPA to each server and updating the local package index, you’re ready to install the necessary GlusterFS packages. However, because two of your three machines will act as Gluster servers and the other will act as a client, there are two separate installation and configuration procedures. First, you’ll install and set up the server components.
A storage pool is any amount of storage capacity aggregated from more than one storage resource. In this step, you will configure two of your servers — gluster0 and gluster1 — as the cluster components.
On both gluster0 and gluster1, install the GlusterFS server package by typing:
- sudo apt install glusterfs-server
When prompted, press Y and then ENTER to confirm the installation.
The installation process automatically configures GlusterFS to run as a systemd service. However, it doesn’t automatically start the service or enable it to run at boot.
To start glusterd, the GlusterFS service, run the following systemctl start command on both gluster0 and gluster1:
- sudo systemctl start glusterd.service
Then run the following command on both servers. This will enable the service to start whenever the server boots up:
- sudo systemctl enable glusterd.service
Following that, you can check the service’s status on either or both servers:
- sudo systemctl status glusterd.service
If the service is up and running, you’ll receive output like this:
Output
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-06-02 21:32:21 UTC; 32s ago
Docs: man:glusterd(8)
Main PID: 14742 (glusterd)
Tasks: 9 (limit: 2362)
CGroup: /system.slice/glusterd.service
└─14742 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
Assuming you followed the prerequisite initial server setup guide, you will have set up a firewall with UFW on each of your machines. Because of this, you’ll need to open up the firewall on each node before you can establish communications between them and create a storage pool.
The Gluster daemon uses port 24007, so you’ll need to allow each node access to that port through the firewall of each other node in your storage pool. To do so, run the following command on gluster0. Remember to change gluster1_ip_address to gluster1’s IP address:
- sudo ufw allow from gluster1_ip_address to any port 24007
And run the following command on gluster1. Again, be sure to change gluster0_ip_address to gluster0’s IP address:
- sudo ufw allow from gluster0_ip_address to any port 24007
You’ll also need to allow your client machine (gluster2) access to this port. Otherwise, you’ll run into issues later on when you try to mount the volume. Run the following command on both gluster0 and gluster1 to open up this port to your client machine:
- sudo ufw allow from gluster2_ip_address to any port 24007
Then, to ensure that no other machines are able to access Gluster’s port on either server, add the following blanket deny rule to both gluster0 and gluster1:
- sudo ufw deny 24007
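The order of these rules matters, because UFW checks rules in the order they were added and applies the first match. As a minimal sketch, the helper below (the name gluster_ufw_rules is hypothetical) prints the rules for one server without running them, so you can review the list first. The IP addresses are placeholders, not values from this tutorial.

```shell
# Hypothetical helper: print (not run) the UFW rules for one Gluster server.
# Usage: gluster_ufw_rules PORT ALLOWED_IP...
gluster_ufw_rules() {
    port=$1
    shift
    for ip in "$@"; do
        printf 'sudo ufw allow from %s to any port %s\n' "$ip" "$port"
    done
    # The blanket deny goes last so the allow rules above are matched first.
    printf 'sudo ufw deny %s\n' "$port"
}

# Rules for gluster0: allow gluster1 and the client (placeholder addresses).
gluster_ufw_rules 24007 192.0.2.11 192.0.2.12
```

Run on gluster0, the printed commands correspond to the ones shown above. For gluster1 you would pass gluster0’s address instead of gluster1’s.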
You’re now ready to establish communication between gluster0 and gluster1. To do so, you’ll need to run the gluster peer probe command on one of your nodes. It doesn’t matter which node you use, but the following example shows the command being run on gluster0:
- sudo gluster peer probe gluster1
Essentially, this command tells gluster0 to trust gluster1 and register it as part of its storage pool. If the probe is successful, it will return the following output:
Output
peer probe: success
You can check that the nodes are communicating at any time by running the gluster peer status command on either one. In this example, it’s run on gluster1:
- sudo gluster peer status
If you run this command from gluster1, it will show output like this:
Output
Number of Peers: 1
Hostname: gluster0.example.com
Uuid: a3fae496-c4eb-4b20-9ed2-7840230407be
State: Peer in Cluster (Connected)
At this point, your two servers are communicating and ready to create storage volumes with each other.
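If you want to check the peer state from a script rather than by eye, one option is to count the lines that report a connected peer. The sketch below is only an illustration: count_connected_peers is a hypothetical name, and the sample text is the output shown above rather than the result of a live command.

```shell
# Hypothetical helper: count peers reported as connected.
# Reads `gluster peer status` output on stdin.
count_connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Sample output copied from this tutorial. On a real node you could pipe
# `sudo gluster peer status` into the function instead.
sample_peer_status='Number of Peers: 1
Hostname: gluster0.example.com
Uuid: a3fae496-c4eb-4b20-9ed2-7840230407be
State: Peer in Cluster (Connected)'

printf '%s\n' "$sample_peer_status" | count_connected_peers
```

For the two-server pool in this tutorial, each server should report one connected peer.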
Recall that the primary goal of this tutorial is to create a redundant storage pool. To this end you’ll set up a volume with replica functionality, allowing you to keep multiple copies of your data and prevent your cluster from having a single point of failure.
To create a volume, you’ll use the gluster volume create command with this general syntax:
sudo gluster volume create volume_name replica number_of_servers domain1.com:/path/to/data/directory domain2.com:/path/to/data/directory force
Here’s what this gluster volume create command’s arguments and options mean:
volume_name: This is the name you’ll use to refer to the volume after it’s created. The following example command creates a volume named volume1.
replica number_of_servers: Following the volume name, you can define what type of volume you want to create. Recall that the goal of this tutorial is to create a redundant storage pool, so we’ll use the replica volume type. This requires an argument indicating how many servers the volume’s data will be replicated to (2 in the case of this tutorial).
domain1.com:/… and domain2.com:/…: These define the machines and directory location of the bricks that will make up volume1. A brick is GlusterFS’s term for its basic unit of storage, which includes any directories on any machines that serve as a part or a copy of a larger volume. The following example will create a directory named gluster-storage in the root directory of both servers.
force: This option will override any warnings or options that would otherwise come up and halt the volume’s creation.
Following the conventions established earlier in this tutorial, you can run this command to create a volume. Note that you can run it from either gluster0 or gluster1:
- sudo gluster volume create volume1 replica 2 gluster0.example.com:/gluster-storage gluster1.example.com:/gluster-storage force
If the volume was created successfully, you’ll receive the following output:
Output
volume create: volume1: success: please start the volume to access data
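To make the relationship between the syntax and the concrete command easier to see, here is a small sketch that assembles the command string from its parts. The function name gluster_replica_create_cmd is hypothetical, and it only covers the simple case used in this tutorial, where the replica count equals the number of bricks listed. It prints the command rather than running it.

```shell
# Hypothetical helper: print a `gluster volume create` command for a
# replicated volume. The replica count is the number of bricks passed in.
# Usage: gluster_replica_create_cmd VOLUME_NAME HOST:/BRICK_DIR...
gluster_replica_create_cmd() {
    volume=$1
    shift
    # "$#" is the number of remaining arguments (the bricks);
    # "$*" joins them with single spaces.
    printf 'sudo gluster volume create %s replica %d %s force\n' \
        "$volume" "$#" "$*"
}

gluster_replica_create_cmd volume1 \
    gluster0.example.com:/gluster-storage \
    gluster1.example.com:/gluster-storage
```

With the two bricks from this tutorial, the printed line matches the command shown above.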
At this point, your volume has been created, but it’s not yet active. You can start the volume and make it available for use by running the following command, again from either of your Gluster servers:
- sudo gluster volume start volume1
You’ll receive this output if the volume was started correctly:
Output
volume start: volume1: success
Next, check that the volume is online. Run the following command from either one of your nodes:
- sudo gluster volume status
This will return output similar to this:
Output
Status of volume: volume1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster0.example.com:/gluster-storage 49152 0 Y 18801
Brick gluster1.example.com:/gluster-storage 49152 0 Y 19028
Self-heal Daemon on localhost N/A N/A Y 19049
Self-heal Daemon on gluster0.example.com N/A N/A Y 18822
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
Based on this output, the bricks on both servers are online.
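The same check can be sketched in a few lines of shell by looking at the Online column of each brick row. This is an illustration under stated assumptions: offline_bricks is a hypothetical name, the sample rows are taken from the output above, and the parsing assumes each brick appears on a single line, as it does in that sample.

```shell
# Hypothetical helper: list bricks whose "Online" column is not Y.
# Reads `gluster volume status` output on stdin. Assumes one brick per line.
offline_bricks() {
    # Columns: Brick, host:/path, TCP port, RDMA port, Online, Pid
    awk '$1 == "Brick" && $(NF - 1) != "Y" { print $2 }'
}

# Sample rows copied from the output above.
sample_status='Brick gluster0.example.com:/gluster-storage 49152 0 Y 18801
Brick gluster1.example.com:/gluster-storage 49152 0 Y 19028
Self-heal Daemon on localhost N/A N/A Y 19049'

offline=$(printf '%s\n' "$sample_status" | offline_bricks)
if [ -z "$offline" ]; then
    echo "all bricks online"
else
    echo "offline: $offline"
fi
```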
As a final step to configuring your volume, you’ll need to open up the firewall on both servers so your client machine will be able to connect to and mount the volume. According to the previous command’s sample output, the brick for volume1 is listening on port 49152 on both machines. This is the first port GlusterFS assigns to a brick by default; each additional brick on a server, such as one belonging to the next volume you create, will use port 49153, then 49154, and so on.
Run the following command on both gluster0 and gluster1 to allow gluster2 access to this port through each one’s respective firewall:
- sudo ufw allow from gluster2_ip_address to any port 49152
Then, for an added layer of security, add another blanket deny rule for the volume’s port on both gluster0 and gluster1. This will ensure that no machines other than your client can access the volume on either server:
- sudo ufw deny 49152
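If you later add volumes, the brick ports to open will differ from 49152. One way to avoid guessing is to read them from the status output. The sketch below is a hypothetical helper (brick_port_rules) that prints, without running, one allow rule per distinct brick port. The client address is a placeholder and the input rows are the sample output from this tutorial.

```shell
# Hypothetical helper: print (not run) UFW rules that open every brick port
# found in `gluster volume status` output (stdin) to a single client IP.
brick_port_rules() {
    client_ip=$1
    # Keep only brick rows with a numeric TCP port, then de-duplicate.
    awk '$1 == "Brick" && $3 ~ /^[0-9]+$/ { print $3 }' | sort -un |
        while read -r port; do
            printf 'sudo ufw allow from %s to any port %s\n' \
                "$client_ip" "$port"
        done
}

printf '%s\n' \
    'Brick gluster0.example.com:/gluster-storage 49152 0 Y 18801' \
    'Brick gluster1.example.com:/gluster-storage 49152 0 Y 19028' |
    brick_port_rules 192.0.2.12
```

Both sample bricks use the same port, so the sketch prints a single rule.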
Now that your volume is up and running, you can set up your client machine and begin using it remotely.
Your volume is now configured and available for use by your client machine. Before you begin though, you need to install the glusterfs-client package from the PPA you set up in Step 1 on your client machine. This package’s dependencies include some of GlusterFS’s common libraries and translator modules and the FUSE tools required for it to work.
Run the following command on gluster2:
- sudo apt install glusterfs-client
You will mount your remote storage volume on your client computer shortly. Before you can do that, though, you need to create a mount point. Traditionally, this is in the /mnt directory, but anywhere convenient can be used.
For simplicity’s sake, create a directory named /storage-pool on your client machine to serve as the mount point. This directory name starts with a forward slash (/) which places it in the root directory, so you’ll need to create it with sudo privileges:
- sudo mkdir /storage-pool
Now you can mount the remote volume. Before that, though, take a look at the syntax of the mount command you’ll use to do so:
sudo mount -t glusterfs domain1.com:volume_name /path/to/mount/point
mount is a utility found in many Unix-like operating systems. It’s used to mount filesystems (anything from external storage devices, like SD cards or USB sticks, to network-attached storage as in the case of this tutorial) to directories on the machine’s existing filesystem. The mount command syntax you’ll use takes three pieces of information: the -t option followed by the type of filesystem to be mounted, the device where the filesystem to mount can be found, and the directory on the client where you’ll mount the volume.
Notice that in this example syntax, the device argument points to a hostname followed by a colon and then the volume’s name. GlusterFS abstracts the actual storage directories on each host, meaning that this command doesn’t look to mount the /gluster-storage directory, but instead the volume1 volume.
Also notice that you only have to specify one member of the storage cluster. This can be either node, since the GlusterFS service treats them as one machine.
Run the following command on your client machine (gluster2) to mount the volume to the /storage-pool directory you created:
- sudo mount -t glusterfs gluster0.example.com:/volume1 /storage-pool
Following that, run the df command. This will display the amount of available disk space for file systems to which the user invoking it has access:
- df
This command will show that the GlusterFS volume is mounted at the correct location:
Output
Filesystem 1K-blocks Used Available Use% Mounted on
. . .
gluster0.example.com:/volume1 50633164 1938032 48695132 4% /storage-pool
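If a script on the client depends on the volume being mounted, it can check the kernel’s mount table instead of reading df output. The sketch below assumes that volumes mounted with the GlusterFS native client are listed in /proc/mounts with the filesystem type fuse.glusterfs. The function name is_gluster_mount and its optional second argument are hypothetical.

```shell
# Hypothetical helper: succeed if MOUNT_POINT is a GlusterFS FUSE mount.
# Usage: is_gluster_mount MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; the override exists for testing.
is_gluster_mount() {
    # /proc/mounts columns: device, mount point, filesystem type, options...
    awk -v mp="$1" '$2 == mp && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}

if is_gluster_mount /storage-pool; then
    echo "/storage-pool is a GlusterFS mount"
else
    echo "/storage-pool is not a GlusterFS mount"
fi
```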
Now, you can move on to testing that any data you write to the volume on your client gets replicated to your server nodes as expected.
Now that you’ve set up your client to use your storage pool and volume, you can test its functionality.
On your client machine (gluster2), navigate to the mount point that you defined in the previous step:
- cd /storage-pool
Then create a few test files. The following command creates ten separate empty files in your storage pool:
- sudo touch file_{0..9}.test
If you examine the storage directories you defined earlier on each storage host, you’ll discover that all of these files are present on each system.
On gluster0:
- ls /gluster-storage
Output
file_0.test file_2.test file_4.test file_6.test file_8.test
file_1.test file_3.test file_5.test file_7.test file_9.test
Likewise, on gluster1:
- ls /gluster-storage
Output
file_0.test file_2.test file_4.test file_6.test file_8.test
file_1.test file_3.test file_5.test file_7.test file_9.test
As these outputs show, the test files that you added to the client were also written to both of your nodes.
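Comparing the two listings by eye works for ten files, but a scripted comparison scales better. The sketch below shows the idea using two temporary directories as stand-ins for the bricks, since on a real cluster the directories live on different servers and you would first need to collect each listing, for example over SSH. The function name same_listing is hypothetical.

```shell
# Hypothetical helper: succeed if two directories contain the same file names.
same_listing() {
    [ "$(ls -A "$1")" = "$(ls -A "$2")" ]
}

# Demonstration with two temporary directories standing in for the bricks.
brick_a=$(mktemp -d)
brick_b=$(mktemp -d)
for i in 0 1 2 3 4 5 6 7 8 9; do
    touch "$brick_a/file_$i.test" "$brick_b/file_$i.test"
done

if same_listing "$brick_a" "$brick_b"; then
    echo "listings match"
else
    echo "listings differ"
fi

# Clean up the stand-in directories.
rm -rf "$brick_a" "$brick_b"
```

This compares names only, not file contents.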
If there is ever a point when one of the nodes in your storage cluster is down, it could fall out of sync with the storage pool if any changes are made to the filesystem. Doing a read operation on the client mount point after the node comes back online will alert the node to get any missing files:
- ls /storage-pool
Now that you’ve verified that your storage volume is mounted correctly and can replicate data to both machines in the cluster, you can lock down access to the storage pool.
At this point, any computer can connect to your storage volume without any restrictions. You can change this by setting the auth.allow option, which defines the IP addresses of whatever clients should have access to the volume.
If you’re using the /etc/hosts configuration, the names you’ve set for each server will not route correctly. You must use a static IP address instead. On the other hand, if you’re using DNS records, the domain name you’ve configured will work here.
On either one of your storage nodes (gluster0 or gluster1), run the following command:
- sudo gluster volume set volume1 auth.allow gluster2_ip_address
If the command completes successfully, it will return this output:
Output
volume set: success
If you need to remove the restriction at any point, you can type:
- sudo gluster volume set volume1 auth.allow '*'
This will allow connections from any machine again. This is insecure, but can be useful for debugging issues.
If you have multiple clients, you can specify their IP addresses or domain names at the same time (depending on whether you are using /etc/hosts or DNS hostname resolution), separated by commas:
- sudo gluster volume set volume1 auth.allow gluster_client1_ip,gluster_client2_ip
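When the client list is kept in a script, the comma-separated value can be built from individual addresses. This is a minimal sketch: auth_allow_cmd is a hypothetical name, the addresses are placeholders, and the function prints the command instead of running it.

```shell
# Hypothetical helper: print a `gluster volume set ... auth.allow` command
# for one or more client addresses, joined with commas.
# Usage: auth_allow_cmd VOLUME_NAME CLIENT_IP...
auth_allow_cmd() {
    volume=$1
    shift
    # In the subshell, IFS=, makes "$*" join the addresses with commas.
    list=$(IFS=,; printf '%s' "$*")
    printf 'sudo gluster volume set %s auth.allow %s\n' "$volume" "$list"
}

# Placeholder client addresses (an assumption); substitute your own.
auth_allow_cmd volume1 192.0.2.12 192.0.2.13
```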
Your storage pool is now configured, secured, and ready for use. Next you’ll learn a few commands that will help you get information about the status of your storage pool.
When you begin changing some of the settings for your GlusterFS storage, you might get confused about what options you have available, which volumes are live, and which nodes are associated with each volume.
There are a number of different commands that are available on your nodes to retrieve this data and interact with your storage pool.
If you want information about each of your volumes, run the gluster volume info command:
- sudo gluster volume info
Output
Volume Name: volume1
Type: Replicate
Volume ID: a1e03075-a223-43ab-a0f6-612585940b0c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster0.example.com:/gluster-storage
Brick2: gluster1.example.com:/gluster-storage
Options Reconfigured:
auth.allow: gluster2_ip_address
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
Similarly, to get information about any peers that this node is connected to, you can type:
- sudo gluster peer status
Output
Number of Peers: 1
Hostname: gluster0.example.com
Uuid: cb00a2fc-2384-41ac-b2a8-e7a1793bb5a9
State: Peer in Cluster (Connected)
If you want detailed information about how each node is performing, you can profile a volume by typing:
- sudo gluster volume profile volume_name start
When this command is complete, you can obtain the information that was gathered by typing:
- sudo gluster volume profile volume_name info
Output
Brick: gluster0.example.com:/gluster-storage
--------------------------------------------
Cumulative Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 30 FORGET
0.00 0.00 us 0.00 us 0.00 us 36 RELEASE
0.00 0.00 us 0.00 us 0.00 us 38 RELEASEDIR
Duration: 5445 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Interval 0 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 30 FORGET
0.00 0.00 us 0.00 us 0.00 us 36 RELEASE
0.00 0.00 us 0.00 us 0.00 us 38 RELEASEDIR
Duration: 5445 seconds
Data Read: 0 bytes
Data Written: 0 bytes
. . .
As shown previously, for a list of all of the GlusterFS associated components running on each of your nodes, run the gluster volume status command:
- sudo gluster volume status
Output
Status of volume: volume1
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster0.example.com:/gluster-storage 49152 0 Y 19003
Brick gluster1.example.com:/gluster-storage 49152 0 Y 19040
Self-heal Daemon on localhost N/A N/A Y 19061
Self-heal Daemon on gluster0.example.com N/A N/A Y 19836
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
If you are going to be administering your GlusterFS storage volumes, it may be a good idea to drop into the GlusterFS console. This will allow you to interact with your GlusterFS environment without needing to type sudo gluster before everything:
- sudo gluster
This will give you a prompt where you can type your commands. help is a good one to get yourself oriented:
- help
Output
peer help - display help for peer commands
volume help - display help for volume commands
volume bitrot help - display help for volume bitrot commands
volume quota help - display help for volume quota commands
snapshot help - display help for snapshot commands
global help - list global commands
When you are finished, run exit to exit the Gluster console:
- exit
With that, you’re ready to begin integrating GlusterFS with your next application.
By completing this tutorial, you have a redundant storage system that will allow you to write to two separate servers simultaneously. This can be useful for a number of applications and can ensure that your data is available even when one server goes down.
Why not just provide instructions on how to implement TLS/X.509 Mutual Auth with GlusterFS instead of pushing the VPN solution? Seems a bit heavy for a feature that GlusterFS already provides. The private network exclusive to GlusterFS communication is still a good idea in spite of that, however.