
How To Run a Multi-Node Cluster Database with Cassandra on Ubuntu 14.04

Published on March 31, 2016
Introduction

Apache Cassandra is a highly scalable, open source database system that is designed to perform well on multi-node setups.

Previously, we went over how to run a single-node Cassandra cluster. In this tutorial, you’ll learn how to install and use Cassandra to run a multi-node cluster on Ubuntu 14.04.

Prerequisites

Because you’re about to build a multi-node Cassandra cluster, you must determine how many servers you’d like to have in your cluster and configure each of them. It is recommended, but not required, that they have the same or similar specifications.

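To complete this tutorial, you'll need the following on each server, as the later steps assume:

  • Cassandra installed and running as a single-node cluster, which is the state Step 1 starts from.

  • An iptables firewall whose IPv4 rules are stored in /etc/iptables/rules.v4, which is the file Step 3 edits.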

Step 1 — Deleting Default Data

Servers in a Cassandra cluster are known as nodes. What you have on each server right now is a single-node Cassandra cluster. In this step, we’ll stop Cassandra on each node and delete its default data so that the nodes can later function as a multi-node Cassandra cluster.

All the commands in this and subsequent steps must be repeated on each node in the cluster, so be sure to have as many terminals open as you have nodes in the cluster.

The first command you’ll run on each node will stop the Cassandra daemon.

  sudo service cassandra stop

When that’s completed, delete the default dataset.

  sudo rm -rf /var/lib/cassandra/data/system/*
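
If you'd rather guard against a mistyped path before deleting anything, the same cleanup can be wrapped in a small function. This is a minimal sketch, not part of the Cassandra tooling: clear_system_data is a hypothetical helper name, and the default path is the one used in this tutorial.

```shell
# Sketch only: clear_system_data is a hypothetical helper, not a Cassandra command.
# It empties the system keyspace directory under the given data directory.
clear_system_data() {
    data_dir="${1:-/var/lib/cassandra/data}"   # default path taken from this tutorial
    if [ ! -d "$data_dir/system" ]; then
        echo "No system directory found under $data_dir" >&2
        return 1
    fi
    # Same effect as the rm command above, but only after the check has passed.
    rm -rf "$data_dir"/system/*
}

# On a real node (assumption: a root shell, with the daemon already stopped):
#   service cassandra stop
#   clear_system_data
```

The function refuses to run if the system directory does not exist, which is a cheap check that the path is the one you meant.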

Step 2 — Configuring the Cluster

Cassandra’s configuration file is located in the /etc/cassandra directory. That configuration file, cassandra.yaml, contains many directives and is very well commented. In this step, we’ll modify that file to set up the cluster.

Only the following directives need to be modified to set up a multi-node Cassandra cluster:

  • cluster_name: This is the name of your cluster.

  • - seeds: This is a comma-delimited list of the IP addresses of the nodes in the cluster.

  • listen_address: This is the IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.

  • rpc_address: This is the IP address for remote procedure calls. It defaults to localhost. If the server’s hostname is properly configured, leave this as is. Otherwise, change it to the server’s IP address or the loopback address (127.0.0.1).

  • endpoint_snitch: Name of the snitch, which is what tells Cassandra about what its network looks like. This defaults to SimpleSnitch, which is used for networks in one datacenter. In our case, we’ll change it to GossipingPropertyFileSnitch, which is preferred for production setups.

  • auto_bootstrap: This directive is not in the configuration file, so it has to be added and set to false. When it is left at its default of true, a node joining a cluster automatically streams the data it is responsible for from the existing nodes. Setting it to false is optional if you’re adding nodes to an existing cluster, but required when you’re initializing a fresh cluster, that is, one with no data.
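
Before editing, it can help to see what these directives are currently set to. The following is a minimal sketch under the assumption that the directives sit at the start of a line, as they do in the stock file; show_cluster_directives is a hypothetical helper name.

```shell
# Sketch only: show_cluster_directives is a hypothetical helper.
# It prints the cluster-related directives with their line numbers.
show_cluster_directives() {
    conf="${1:-/etc/cassandra/cassandra.yaml}"   # default path taken from this tutorial
    # Commented-out lines (starting with #) are intentionally not matched.
    grep -nE '^(cluster_name|listen_address|rpc_address|endpoint_snitch|auto_bootstrap):|^ *- seeds:' "$conf"
}

# On a node:
#   show_cluster_directives
```

Because auto_bootstrap is not in the file by default, it will only show up in the output after you have added it.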

Open the configuration file for editing using nano or your favorite text editor.

  sudo nano /etc/cassandra/cassandra.yaml

Search the file for the following directives and modify them as below to match your cluster. Replace your_server_ip with the IP address of the server you’re currently working on. The - seeds: list should be the same on every server, and will contain each server’s IP address separated by commas.

/etc/cassandra/cassandra.yaml
. . .

cluster_name: 'CassandraDOCluster'

. . .

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "your_server_ip,your_server_ip_2,...your_server_ip_n"

. . .

listen_address: your_server_ip

. . .

rpc_address: your_server_ip

. . .

endpoint_snitch: GossipingPropertyFileSnitch

. . .

At the bottom of the file, add in the auto_bootstrap directive by pasting in this line:

/etc/cassandra/cassandra.yaml
auto_bootstrap: false

When you’re finished modifying the file, save and close it. Repeat this step for all the servers you want to include in the cluster.
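
Since the same edits are repeated on every server, they can also be scripted. The sketch below applies the changes from this step with sed; configure_node is a hypothetical helper name, and it assumes the directives appear uncommented at the start of a line and that the values contain no /, & or \ characters. Try it on a copy of the file first.

```shell
# Sketch only: configure_node is a hypothetical helper, not a Cassandra command.
# Usage: configure_node FILE CLUSTER_NAME NODE_IP SEED_LIST
configure_node() {
    conf="$1"; name="$2"; ip="$3"; seeds="$4"
    # Keep a backup, then rewrite the original file in place from the backup.
    cp -p "$conf" "$conf.bak" &&
    sed -e "s/^cluster_name:.*/cluster_name: '$name'/" \
        -e "s/^\( *- seeds:\).*/\1 \"$seeds\"/" \
        -e "s/^listen_address:.*/listen_address: $ip/" \
        -e "s/^rpc_address:.*/rpc_address: $ip/" \
        -e "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
        "$conf.bak" > "$conf"
    # Append auto_bootstrap only if the directive is not already present.
    grep -q '^auto_bootstrap:' "$conf" || echo 'auto_bootstrap: false' >> "$conf"
}

# On a node (assumption: a root shell; the addresses are placeholders):
#   configure_node /etc/cassandra/cassandra.yaml CassandraDOCluster \
#       your_server_ip "your_server_ip,your_server_ip_2"
```

Only the node's own IP address changes between servers; the cluster name and seed list stay the same everywhere.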

Step 3 — Configuring the Firewall

At this point, the cluster has been configured, but the nodes are not communicating. In this step, we’ll configure the firewall to allow Cassandra traffic.

First, start the Cassandra daemon again on each node.

  sudo service cassandra start

If you check the status of the cluster, you’ll find that only the local node is listed, because it’s not yet able to communicate with the other nodes.

  sudo nodetool status
Output
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  192.168.1.4  147.48 KB  256          ?       f50799ee-8589-4eb8-a0c8-241cd254e424  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

To allow communication, we’ll need to open the following network ports for each node:

  • 7000, which is the TCP port the nodes use to exchange commands and data with each other.

  • 9042, which is the TCP port for the native transport server. cqlsh, the Cassandra command line utility, will connect to the cluster through this port.

To modify the firewall rules, open the rules file for IPv4.

  sudo nano /etc/iptables/rules.v4

Copy and paste the following line within the INPUT chain, which will allow traffic on the aforementioned ports. If you’re using the rules.v4 file from the firewall tutorial, you can insert the following line just before the # Reject anything that's fallen through to this point comment.

The IP address specified by -s should be the IP address of another node in the cluster. If you have two nodes with IP addresses 111.111.111.111 and 222.222.222.222, the rule on the 111.111.111.111 machine should use the IP address 222.222.222.222.

New firewall rule
-A INPUT -p tcp -s your_other_server_ip -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
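
With more than two nodes, one straightforward approach is to add one such rule per peer. The following sketch prints the rule above once for each IP address it is given, so the output can be pasted into the rules file; print_cassandra_rules is a hypothetical helper name, and the addresses in the example are the placeholders used in this step.

```shell
# Sketch only: print_cassandra_rules is a hypothetical helper.
# It prints one INPUT rule per peer IP address passed as an argument.
print_cassandra_rules() {
    for peer in "$@"; do
        printf '%s\n' "-A INPUT -p tcp -s $peer -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT"
    done
}

# Example: rules for a third node whose two peers use the placeholder addresses.
print_cassandra_rules 111.111.111.111 222.222.222.222
```

Each node gets the rules for every node except itself, so the argument list differs from server to server.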

After adding the rule, save and close the file, then restart IPTables.

  sudo service iptables-persistent restart

Step 4 — Check the Cluster Status

We’ve now completed all the steps needed to make the nodes into a multi-node cluster. You can verify that they’re all communicating by checking their status.

  sudo nodetool status
Output
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  192.168.1.4  147.48 KB  256          ?       f50799ee-8589-4eb8-a0c8-241cd254e424  rack1
UN  192.168.1.6  139.04 KB  256          ?       54b16af1-ad0a-4288-b34e-cacab39caeec  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

If you can see all the nodes you configured, you’ve just successfully set up a multi-node Cassandra cluster.
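
On larger clusters, counting rows by eye gets tedious. As a rough sketch, the status output can be piped through awk to count the nodes reported as UN (Up and Normal); count_up_nodes is a hypothetical helper name, and it assumes the output layout shown above.

```shell
# Sketch only: count_up_nodes is a hypothetical helper.
# It reads `nodetool status` output on stdin and counts rows whose state is UN.
count_up_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# On a node:
#   sudo nodetool status | count_up_nodes
```

If the number printed is lower than the number of servers you configured, at least one node is down or has not joined the cluster.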

You can also check if you can connect to the cluster using cqlsh, the Cassandra command line client. Note that you can specify the IP address of any node in the cluster for this command.

  cqlsh your_server_ip 9042

You will see it connect:

Output
Connected to CassandraDOCluster at 192.168.1.6:9042.
[cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>

Then you can exit the CQL terminal.

  exit
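
The same connection check can be run non-interactively against every node by using cqlsh's -e option, which executes a single statement and exits. This is a minimal sketch: check_cql_nodes is a hypothetical helper name, and the CQLSH variable exists only so the client command can be swapped out.

```shell
# Sketch only: check_cql_nodes is a hypothetical helper.
# CQLSH can be overridden to point at a different client binary.
CQLSH="${CQLSH:-cqlsh}"

# Try to connect to each given node on port 9042 and report the result.
check_cql_nodes() {
    failed=0
    for node in "$@"; do
        if "$CQLSH" -e "DESCRIBE CLUSTER" "$node" 9042 >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: unreachable"
            failed=1
        fi
    done
    return "$failed"
}

# On any machine allowed through the firewall (addresses are placeholders):
#   check_cql_nodes your_server_ip your_server_ip_2
```

The function returns a non-zero status if any node could not be reached, so it can be used in scripts as well as at the prompt.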

Conclusion

Congratulations! You now have a multi-node Cassandra cluster running on Ubuntu 14.04. More information about Cassandra is available at the project’s website. If you need to troubleshoot the cluster, the first place to look for clues is the log files, which are located in the /var/log/cassandra directory.
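
As a starting point for troubleshooting, the sketch below pulls the most recent warnings and errors out of a log file. It assumes the main log is system.log in that directory and that each entry begins with its level (INFO, WARN, ERROR), so adjust the pattern if your log format differs; recent_log_problems is a hypothetical helper name.

```shell
# Sketch only: recent_log_problems is a hypothetical helper.
# Usage: recent_log_problems [LOGFILE] [COUNT]
recent_log_problems() {
    log="${1:-/var/log/cassandra/system.log}"   # assumed default log file
    # Keep only WARN and ERROR entries, then show the last COUNT (default 20).
    grep -E '^(ERROR|WARN)' "$log" | tail -n "${2:-20}"
}

# On a node:
#   recent_log_problems
```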

About the authors
finid

author



staff technical writer

hi! i write do.co/docs now, but i used to be the senior tech editor publishing tutorials here in the community.


5 Comments


Hi, When I try to run: sudo nodetool status I got this line: nodetool: Failed to connect to ‘127.0.0.1:7199’ - ConnectException: ‘Connection refused’.

What to do?

Hi m trying to run cassandra cluster with nodejs and I have this table with just 2k rows, with loadtest I found out that with concurrent 110-120 users it starts crashing nodejs so I tried adding connection pooling and stuff which increased users to 200 but I want it to be at least 10k concurrent users I must be missing something if you could please help me out, would really really appreciate it

Hello Team,

How to make cassandra accessible remotely (Installed in AWS Ec2 Ubuntu 16.04 LTS)? I’m using cassandra 3.10 CQL 5.10.

I tried installing cassandra in my local machine it works unerringly.

When it comes to remote machine ie aws - Ec2 Ubuntu 16.04 LTS works locally but i wanted to make the cassandra access to my applications by changing few parameters in .yaml file and .sh files.

Please look into the below parameters and procedure which i followed/changed are right or wrong.

  1. The list parameters changed in cassandra.yaml

listen_address : 54.32.XX.XX (Public IP address) seed : 54.32.XX.XX (Public IP address) rpc_start : true rpc_address : 54.32.XX.XX (Public IP address) endpoint_snitch : Ec2Snitch

  1. The parameter changed in the cassandra-env.sh

Djava.rmi… : <54.32.XX.XX>(Public IP address)

  1. Save and exit.

  2. restart the cassandra service using the terminal commands i.e sudo service cassandra stop and sudo service cassandra start

  3. Run the sudo nodetool status command

nodetool status results below message

displays the 127.0.0.1 instead of 54.32.XX.XX (Public IP address)

Please let me know how to resolve this and where m going wrong.

Thanks in advance

Regards Anith

Hi, I have some doubts about configuring cluster…

  1. If I have 8 nodes cluster. On what basis I should decide keeping 4 in rack1 other in rack2 in the same data center.

  2. If I have a cluster of some nodes with only one data center. After few year I need to add one more data center in same cluster. Will I need to reconfigure all the cluster? Is there any way to add it then?

About auto_bootstrap setting Do I need to set it true, only if I am adding new fresh node to cluster?

Hi guy’s I want to configure firewall in windows, do you have a resource for me???
