How To Configure a Multi-Node Cluster with Cassandra on a Ubuntu VPS

Published on September 11, 2013
By Henrique Pinheiro

Introduction

This tutorial will teach you how to configure a multi-node cluster with Cassandra on a VPS. Cassandra is a highly scalable open source database system that achieves great performance when set up with multiple nodes, even across different data centers.

Installing Cassandra on Each Node

Before you begin configuring each node, you need to have Cassandra installed on every one of them. We have an easy tutorial on how to do that on a VPS. After you've installed Cassandra on every node, make sure it isn't running. To check for a running Cassandra process, type:

sudo ps auwx | grep cassandra

If a process other than the grep command itself appears, copy its process ID (the second column of the output) and kill it, replacing PID with that number:

sudo kill -9 PID

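You'll also need to clear any existing data. This permanently deletes everything Cassandra has stored on the node, so only do it on a fresh install. Do so by running: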

sudo rm -rf /var/lib/cassandra/*

Configuring Cassandra

To configure Cassandra for multiple nodes, you'll need to know beforehand how many nodes you're going to use, and calculate a token number for each. We've developed a tool to do this. Enter the number of nodes you're dealing with and it gives you a token for each node. For example, with three nodes you'd get these numbers:

Node 0: 0
Node 1: 3074457345618258602
Node 2: 6148914691236517205
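
The token tool itself is not reproduced here, but the three example values above are consistent with spacing the tokens evenly over the range 0 to 2^63. The sketch below reproduces them under that assumption. The function name even_tokens is hypothetical, and the formula is inferred from the example values rather than taken from the tool, so check the result against what the partitioner in your Cassandra version expects.

```python
def even_tokens(node_count):
    """Return one token per node, spaced evenly over the range 0 to 2**63.

    Assumption: this spacing is inferred from the three example values in
    the tutorial; it is not taken from the original token tool.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps full precision for these 19-digit values.
    return [i * 2**63 // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(even_tokens(3)):
        print(f"Node {index}: {token}")
```

Floor division is used instead of floating-point division because a float cannot represent every 19-digit integer exactly, which would shift the tokens slightly.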

Now you'll need to edit your configuration file for each node. To do so, open the nano text editor by running:

nano ~/cassandra/conf/cassandra.yaml

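Some settings are the same on every node (cluster_name, seed_provider, rpc_address and endpoint_snitch), while others differ per node (initial_token and listen_address). Choose one node to be the seed: it is the node the others contact when they start up, which is why each node lists the seed's IP rather than the address of every other node. Then find the lines in the configuration file that set each of these attributes and modify them as needed. In the shipped cassandra.yaml the seeds entry is nested under the seed_provider's parameters key; only the seeds value itself needs to change: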

cluster_name: 'Name'
initial_token: Token
seed_provider:
    - seeds:  "Seed IP"
listen_address: Droplet's IP
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Replace “Name” with your cluster name, “Token” with the number you generated earlier for that node, “Seed IP” with your seed node’s IP, and “Droplet’s IP” with that droplet’s own IP address. Do this on each node. Here is the result for a 3-node setup:

Node 0
cluster_name: 'MyDigitalOceanCluster'
initial_token: 0
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 198.211.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 1
cluster_name: 'MyDigitalOceanCluster'
initial_token: 3074457345618258602
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 192.241.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 2
cluster_name: 'MyDigitalOceanCluster'
initial_token: 6148914691236517205
seed_provider:
    - seeds:  "198.211.xxx.0"
listen_address: 37.139.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
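
Filling in the same six settings by hand on every node is easy to get wrong, so it can help to generate them. The sketch below prints the lines to edit for each node, using the masked example addresses from above as placeholders. The function names are hypothetical, the token spacing is the same assumption inferred from the example values, and the output is only the abbreviated set of lines shown in this tutorial, not a complete cassandra.yaml.

```python
def even_tokens(node_count):
    """Evenly spaced tokens over 0 to 2**63 (assumed from the tutorial's example)."""
    return [i * 2**63 // node_count for i in range(node_count)]


def render_node_settings(cluster_name, seed_ip, node_ips):
    """Return one text block per node with the settings the tutorial edits.

    cluster_name, the seed IP, rpc_address and endpoint_snitch are shared;
    initial_token and listen_address differ per node.
    """
    tokens = even_tokens(len(node_ips))
    blocks = []
    for index, (ip, token) in enumerate(zip(node_ips, tokens)):
        lines = [
            f"Node {index}",
            f"cluster_name: '{cluster_name}'",
            f"initial_token: {token}",
            "seed_provider:",
            f'    - seeds:  "{seed_ip}"',
            f"listen_address: {ip}",
            "rpc_address: 0.0.0.0",
            "endpoint_snitch: RackInferringSnitch",
        ]
        blocks.append("\n".join(lines))
    return blocks


if __name__ == "__main__":
    # Placeholder addresses copied from the example; replace with real IPs.
    ips = ["198.211.xxx.0", "192.241.xxx.0", "37.139.xxx.0"]
    for block in render_node_settings("MyDigitalOceanCluster", ips[0], ips):
        print(block)
        print()
```

The first address in the list is passed as the seed here only because the example uses Node 0 as its seed; any node can play that role as long as every node lists the same one.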

To start Cassandra, run the following on the seed node first:

sudo sh ~/cassandra/bin/cassandra

When it has finished starting up, run the same command on each of the other nodes. If you don't see any errors, your multi-node Cassandra setup should be successfully deployed.
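
As a rough sanity check after starting the nodes, you can test whether each one accepts connections on the port Cassandra uses for node-to-node traffic. The sketch below assumes the default storage_port of 7000 has not been changed in cassandra.yaml; the function name port_open is hypothetical. An open port only shows that something is listening there, not that the node has joined the cluster.

```python
import socket

# Assumption: Cassandra's default storage_port; adjust if your
# cassandra.yaml sets a different value.
STORAGE_PORT = 7000


def port_open(host, port=STORAGE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

Call port_open once for each node's listen_address, for example from a machine that is allowed to reach the nodes on that port, and investigate any node for which it returns False.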
