Elasticsearch is a popular open source search server that is used for real-time distributed search and analysis of data. When used for anything other than development, Elasticsearch should be deployed across multiple servers as a cluster, for the best performance, stability, and scalability.
This tutorial will show you how to install and configure a production Elasticsearch cluster on Ubuntu 14.04, in a cloud server environment.
Although manually setting up an Elasticsearch cluster is useful for learning, use of a configuration management tool is highly recommended with any cluster setup. If you want to use Ansible to deploy an Elasticsearch cluster, follow this tutorial: How To Use Ansible to Set Up a Production Elasticsearch Cluster.
You must have at least three Ubuntu 14.04 servers to complete this tutorial because an Elasticsearch cluster should have a minimum of 3 master-eligible nodes. If you want to have dedicated master and data nodes, you will need at least 3 servers for your master nodes plus additional servers for your data nodes.
If you would prefer to use CentOS instead, check out this tutorial: How To Set Up a Production Elasticsearch Cluster on CentOS 7
This tutorial assumes that your servers are using a VPN like the one described here: How To Use Ansible and Tinc VPN to Secure Your Server Infrastructure. This will provide private network functionality regardless of the physical network that your servers are using.
If you are using a shared private network, you must use a VPN to protect Elasticsearch from unauthorized access. Each server must be on the same private network because Elasticsearch doesn’t have security built into its HTTP interface. The private network must not be shared with any computers you don’t trust.
We will refer to your servers’ VPN IP addresses as vpn_ip. We will also assume that they all have a VPN interface that is named “tun0”, as described in the tutorial linked above.
Elasticsearch requires Java, so we will install that now. We will install a recent version of Oracle Java 8 because that is what Elasticsearch recommends. It should, however, work fine with OpenJDK, if you decide to go that route.
Complete this step on all of your Elasticsearch servers.
Add the Oracle Java PPA to apt:
- sudo add-apt-repository -y ppa:webupd8team/java
Update your apt package database:
- sudo apt-get update
Install the latest stable version of Oracle Java 8 with this command (and accept the license agreement that pops up):
- sudo apt-get -y install oracle-java8-installer
Be sure to repeat this step on all of your Elasticsearch servers.
Now that Java 8 is installed, let’s install Elasticsearch.
Elasticsearch can be installed with a package manager by adding Elastic’s package source list. Complete this step on all of your Elasticsearch servers.
Run the following command to import the Elasticsearch public GPG key into apt:
- wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
If your prompt is just hanging there, it is probably waiting for your user’s password (to authorize the sudo command). If this is the case, enter your password.
Create the Elasticsearch source list:
- echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
Update your apt package database:
- sudo apt-get update
Install Elasticsearch with this command:
- sudo apt-get -y install elasticsearch
Be sure to repeat these steps on all of your Elasticsearch servers.
Elasticsearch is now installed but it needs to be configured before you can use it.
Now it’s time to edit the Elasticsearch configuration. Complete these steps on all of your Elasticsearch servers.
Open the Elasticsearch configuration file for editing:
- sudo vi /etc/elasticsearch/elasticsearch.yml
The subsequent sections will explain how the configuration must be modified.
You will want to restrict outside access to your Elasticsearch instance, so outsiders can’t access your data or shut down your Elasticsearch cluster through the HTTP API. In other words, you must configure Elasticsearch such that it only allows access to servers on your private network (VPN). To do this, we need to configure each node to bind to the VPN IP address, vpn_ip, or interface, “tun0”.
Find the line that specifies network.host, uncomment it, and replace its value with the respective server’s VPN IP address (e.g. 10.0.0.1 for node01) or interface name. Because our VPN interface is named “tun0” on all of our servers, we can configure all of our servers with the same line:
network.host: [_tun0_, _local_]
Note the addition of “_local_”, which configures Elasticsearch to also listen on all loopback devices. This will allow you to use the Elasticsearch HTTP API locally, from each server, by sending requests to localhost. If you do not include this, Elasticsearch will only respond to requests to the VPN IP address.
Warning: Because Elasticsearch doesn’t have any built-in security, it is very important that you do not set this to any IP address that is accessible to any servers that you do not control or trust. Do not bind Elasticsearch to a public or shared private network IP address!
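Once Elasticsearch is running, one rough way to double-check the binding is to look for listeners on a wildcard address. The sketch below filters the output of ss -ltn for port 9200 bound to every interface. The helper name wildcard_listeners is made up for this illustration, and it assumes the default HTTP port of 9200 used elsewhere in this tutorial; treat it as a sanity check, not a security audit.

```shell
# wildcard_listeners: read `ss -ltn` (or `netstat -ltn`) output on stdin and
# print any line where port 9200 is bound to a wildcard address
# (*, 0.0.0.0, :: or [::]).
# Hypothetical helper for illustration; it is not part of Elasticsearch.
wildcard_listeners() {
  grep -E '(^|[[:space:]])(\*|0\.0\.0\.0|::|\[::\]):9200([[:space:]]|$)'
}

# Example (run on an Elasticsearch server); no output means no wildcard bind:
#   ss -ltn | wildcard_listeners
```

If the function prints anything, the node is listening on all interfaces and network.host should be reviewed.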
Next, set the name of your cluster, which will allow your Elasticsearch nodes to join and form the cluster. You will want to use a descriptive name that is unique (within your network).
Find the line that specifies cluster.name, uncomment it, and replace its value with your desired cluster name. In this tutorial, we will name our cluster “production”:
cluster.name: production
Next, we will set the name of each node. This should be a descriptive name that is unique within the cluster.
Find the line that specifies node.name, uncomment it, and replace its value with your desired node name. In this tutorial, we will set each node name to the hostname of the server by using the ${HOSTNAME} environment variable:
node.name: ${HOSTNAME}
If you prefer, you may name your nodes manually, but make sure that you specify unique names. You may also leave node.name commented out, if you don’t mind having your nodes named randomly.
Next, you will need to configure an initial list of nodes that will be contacted to discover and form a cluster. This is necessary in a unicast network.
Find the line that specifies discovery.zen.ping.unicast.hosts and uncomment it. Replace its value with an array of strings of the VPN IP addresses or hostnames (that resolve to the VPN IP addresses) of all of the other nodes.
For example, if you have three servers node01, node02, and node03 with respective VPN IP addresses of 10.0.0.1, 10.0.0.2, and 10.0.0.3, you could use this line:
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
Alternatively, if all of your servers are configured with name-based resolution of their VPN IP addresses (via DNS or /etc/hosts), you could use this line:
discovery.zen.ping.unicast.hosts: ["node01", "node02", "node03"]
Note: The Ansible Playbook in the prerequisite VPN tutorial automatically creates /etc/hosts entries on each server that resolve each VPN server’s inventory hostname (specified in the Ansible hosts file) to its VPN IP address.
Your servers are now configured to form a basic Elasticsearch cluster. There are more settings that you will want to update, but we’ll get to those after we verify that the cluster is working.
Save and exit elasticsearch.yml.
Now start Elasticsearch:
- sudo service elasticsearch restart
Then run this command to start Elasticsearch on boot up:
- sudo update-rc.d elasticsearch defaults 95 10
Be sure to repeat these steps (Configure Elasticsearch Cluster) on all of your Elasticsearch servers.
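Elasticsearch can take a little while to start answering HTTP requests after the service is restarted. A simple retry loop, sketched below, can wait for it before you move on. The retry function is a generic helper written for this example (it is not an Elasticsearch tool), and the attempt count and delay in the usage comment are arbitrary.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for i in $(seq 1 "$attempts"); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example: wait up to about a minute for the local node to answer.
#   retry 30 2 curl -sf http://localhost:9200/ && echo "Elasticsearch is up"
```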
If everything was configured correctly, your Elasticsearch cluster should be up and running. Before moving on, let’s verify that it’s working properly. You can do so by querying Elasticsearch from any of the Elasticsearch nodes.
From any of your Elasticsearch servers, run this command to print the state of the cluster:
- curl -XGET 'http://localhost:9200/_cluster/state?pretty'
You should see output that indicates that a cluster named “production” is running. It should also indicate that all of the nodes you configured are members:
Cluster State:
{
"cluster_name" : "production",
"version" : 36,
"state_uuid" : "MIkS5sk7TQCl31beb45kfQ",
"master_node" : "k6k2UObVQ0S-IFoRLmDcvA",
"blocks" : { },
"nodes" : {
"Jx_YC2sTQY6ayACU43_i3Q" : {
"name" : "node02",
"transport_address" : "10.0.0.2:9300",
"attributes" : { }
},
"k6k2UObVQ0S-IFoRLmDcvA" : {
"name" : "node01",
"transport_address" : "10.0.0.1:9300",
"attributes" : { }
},
"kQgZZUXATkSpduZxNwHfYQ" : {
"name" : "node03",
"transport_address" : "10.0.0.3:9300",
"attributes" : { }
}
},
...
If you see output that is similar to this, your Elasticsearch cluster is running! If any of your nodes are missing, review the configuration for the node(s) in question before moving on.
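For a quicker check than reading the full cluster state, the cluster health endpoint reports the number of nodes that have joined. The sketch below extracts that number with grep; the helper name count_nodes is made up for this example, and the expected count of 3 in the usage comment assumes the three-node cluster from this tutorial.

```shell
# count_nodes: read the JSON returned by the _cluster/health endpoint on
# stdin and print the value of "number_of_nodes".
# Hypothetical helper; it relies on simple text matching, not a JSON parser.
count_nodes() {
  grep -o '"number_of_nodes" *: *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Example (run on any node of the cluster):
#   nodes=$(curl -s 'http://localhost:9200/_cluster/health' | count_nodes)
#   [ "$nodes" = "3" ] && echo "all nodes joined" || echo "only $nodes node(s) joined"
```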
Next, we’ll go over some configuration settings that you should consider for your Elasticsearch cluster.
Elastic recommends avoiding swapping of the Elasticsearch process at all costs, due to its negative effects on performance and stability. One way to avoid excessive swapping is to configure Elasticsearch to lock the memory that it needs.
Complete this step on all of your Elasticsearch servers.
Edit the Elasticsearch configuration:
- sudo vi /etc/elasticsearch/elasticsearch.yml
Find the line that specifies bootstrap.mlockall and uncomment it:
bootstrap.mlockall: true
Save and exit.
Next, open the /etc/default/elasticsearch file for editing:
- sudo vi /etc/default/elasticsearch
First, find ES_HEAP_SIZE, uncomment it, and set it to about 50% of your available memory. For example, if you have about 4 GB free, you should set this to 2 GB (2g):
ES_HEAP_SIZE=2g
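As a starting point for picking the value, the sketch below halves a memory figure given in kB and prints it in megabytes. It is only an approximation of the 50% guideline: it reads MemTotal from /proc/meminfo, which is the total installed memory and not the memory that happens to be free, and the function name suggest_heap is invented for this example.

```shell
# suggest_heap: given a memory size in kB, print half of it as a heap size
# in megabytes (for example 2048m).
# Hypothetical helper for illustration only.
suggest_heap() {
  local mem_kb=$1
  echo "$(( mem_kb / 1024 / 2 ))m"
}

# Print a suggested setting based on this server's total memory.
# Assumption: MemTotal is used as a stand-in for "available memory".
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "ES_HEAP_SIZE=$(suggest_heap "$mem_kb")"
fi
```

Review the suggested number before using it, especially if other services share the server.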
Next, find and uncomment MAX_LOCKED_MEMORY=unlimited. It should look like this when you’re done:
MAX_LOCKED_MEMORY=unlimited
Save and exit.
Now restart Elasticsearch to put the changes into place:
- sudo service elasticsearch restart
Be sure to repeat this step on all of your Elasticsearch servers.
To verify that mlockall is working on all of your Elasticsearch nodes, run this command from any node:
- curl http://localhost:9200/_nodes/process?pretty
Each node should have a line that says "mlockall" : true, which indicates that memory locking is enabled and working:
Nodes process output:
...
"nodes" : {
"kQgZZUXATkSpduZxNwHfYQ" : {
"name" : "es03",
"transport_address" : "10.0.0.3:9300",
"host" : "10.0.0.3",
"ip" : "10.0.0.3",
"version" : "2.2.0",
"build" : "8ff36d1",
"http_address" : "10.0.0.3:9200",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 1650,
"mlockall" : true
}
...
If mlockall is false for any of your nodes, review the node’s settings and restart Elasticsearch. A common reason for Elasticsearch failing to start is that ES_HEAP_SIZE is set too high.
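On a larger cluster it can be tedious to read the process output node by node. The sketch below counts how many nodes report mlockall as false, using plain text matching on the response. The helper name mlockall_disabled_count is hypothetical.

```shell
# mlockall_disabled_count: read the JSON returned by _nodes/process on stdin
# and print how many times "mlockall" : false appears.
# Hypothetical helper; it uses text matching, not a JSON parser.
mlockall_disabled_count() {
  grep -o '"mlockall" *: *false' | wc -l | tr -d ' '
}

# Example (run on any node); 0 means memory locking works everywhere:
#   curl -s 'http://localhost:9200/_nodes/process?pretty' | mlockall_disabled_count
```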
By default, your Elasticsearch node should have an “Open File Descriptor Limit” of 64k. This section will show you how to verify this and, if you want to, increase it.
First, find the process ID (PID) of your Elasticsearch process. An easy way to do this is to use the ps command to list all of the processes that belong to the elasticsearch user:
- ps -u elasticsearch
You should see output that looks like this. The number in the first column is the PID of your Elasticsearch (java) process:
Output:
  PID TTY          TIME CMD
11708 ?        00:00:10 java
Then run this command to show the open file limits for the Elasticsearch process (replace 11708 with your own PID from the previous step):
- cat /proc/11708/limits | grep 'Max open files'
Output:
Max open files            65535                65535                files
The numbers in the second and third columns indicate the soft and hard limits, respectively, as 64k (65535). This is OK for many setups, but you may want to increase this setting.
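The two steps above can be combined so that the PID does not need to be copied by hand. The sketch below pulls the soft and hard limits out of a limits file; the function name open_file_limits is made up for this example, and the usage comment assumes that exactly one java process runs as the elasticsearch user.

```shell
# open_file_limits: read the contents of a /proc/<PID>/limits file on stdin
# and print the soft and hard "Max open files" limits, separated by a space.
# Hypothetical helper for illustration only.
open_file_limits() {
  awk '/^Max open files/ {print $4, $5}'
}

# Example (assumes a single java process owned by the elasticsearch user):
#   open_file_limits < "/proc/$(pgrep -u elasticsearch java)/limits"
```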
To increase the maximum number of open file descriptors in Elasticsearch, you just need to change a single setting.
Open the /etc/default/elasticsearch file for editing:
- sudo vi /etc/default/elasticsearch
Find MAX_OPEN_FILES, uncomment it, and set it to the limit you desire. For example, if you want a limit of 128k descriptors, change it to 131070:
MAX_OPEN_FILES=131070
Save and exit.
Now restart Elasticsearch to put the changes into place:
- sudo service elasticsearch restart
Then follow the previous subsection to verify that the limits have been increased.
Be sure to repeat this step on any of your Elasticsearch servers that require higher file descriptor limits.
There are two common types of Elasticsearch nodes: master and data. Master nodes perform cluster-wide actions, such as managing indices and determining which data nodes should store particular data shards. Data nodes hold shards of your indexed documents, and handle CRUD, search, and aggregation operations. As a general rule, data nodes consume a significant amount of CPU, memory, and I/O.
By default, every Elasticsearch node is configured to be a “master-eligible” data node, which means they store data (and perform resource-intensive operations) and have the potential to be elected as a master node. For a small cluster, this is usually fine; a large Elasticsearch cluster, however, should be configured with dedicated master nodes so that the master node’s stability can’t be compromised by intensive data node work.
Before configuring dedicated master nodes, ensure that your cluster will have at least 3 master-eligible nodes. This is important to avoid a split-brain situation, which can cause inconsistencies in your data in the event of a network failure.
To configure a dedicated master node, edit the node’s Elasticsearch configuration:
- sudo vi /etc/elasticsearch/elasticsearch.yml
Add the two following lines:
node.master: true
node.data: false
The first line, node.master: true, specifies that the node is master-eligible and is actually the default setting. The second line, node.data: false, restricts the node from becoming a data node.
Save and exit.
Now restart the Elasticsearch node to put the change into effect:
- sudo service elasticsearch restart
Be sure to repeat this step on your other dedicated master nodes.
You can query the cluster to see which nodes are configured as dedicated master nodes with this command: curl -XGET 'http://localhost:9200/_cluster/state?pretty'. Any node with data: false and master: true is a dedicated master node.
To configure a dedicated data node—a data node that is not master-eligible—edit the node’s Elasticsearch configuration:
- sudo vi /etc/elasticsearch/elasticsearch.yml
Add the two following lines:
node.master: false
node.data: true
The first line, node.master: false, specifies that the node is not master-eligible. The second line, node.data: true, is the default setting which allows the node to be a data node.
Save and exit.
Now restart the Elasticsearch node to put the change into effect:
- sudo service elasticsearch restart
Be sure to repeat this step on your other dedicated data nodes.
You can query the cluster to see which nodes are configured as dedicated data nodes with this command: curl -XGET 'http://localhost:9200/_cluster/state?pretty'. Any node that lists master: false and does not list data: false is a dedicated data node.
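To keep the two role configurations straight when setting up several servers, the pairs of settings described above can be captured in a small helper that prints the lines for a given role. This is just a convenience sketch; the function name node_role_settings and its master and data arguments are invented for this example.

```shell
# node_role_settings ROLE: print the two elasticsearch.yml lines for a
# dedicated master node ("master") or a dedicated data node ("data").
# Hypothetical helper for illustration only.
node_role_settings() {
  case "$1" in
    master) printf 'node.master: true\nnode.data: false\n' ;;
    data)   printf 'node.master: false\nnode.data: true\n' ;;
    *)      echo "usage: node_role_settings master|data" >&2; return 1 ;;
  esac
}

# Example: show the lines for a dedicated data node.
node_role_settings data
```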
When running an Elasticsearch cluster, it is important to set the minimum number of master-eligible nodes that need to be running for the cluster to function normally, which is sometimes referred to as quorum. This is to ensure data consistency in the event that one or more nodes lose connectivity to the rest of the cluster, preventing what is known as a “split-brain” situation.
To calculate the number of minimum master nodes your cluster should have, calculate n / 2 + 1, where n is the total number of “master-eligible” nodes in your healthy cluster, then round the result down to the nearest integer. For example, for a 3-node cluster, the quorum is 2.
Note: Be sure to include all master-eligible nodes in your quorum calculation, including any data nodes that are master-eligible (default setting).
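The quorum formula maps directly onto shell integer arithmetic, because integer division already rounds down. The sketch below shows the calculation for a few cluster sizes; the function name quorum is made up for this example.

```shell
# quorum N: print the minimum number of master-eligible nodes (N / 2 + 1,
# rounded down) for a cluster with N master-eligible nodes.
# Shell arithmetic uses integer division, so N / 2 is already rounded down.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Print the quorum for a few cluster sizes.
for n in 3 4 5; do
  echo "$n master-eligible nodes -> quorum $(quorum "$n")"
done
```

For 3, 4, and 5 master-eligible nodes this gives a quorum of 2, 3, and 3, respectively.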
The minimum master nodes setting can be set dynamically, through the Elasticsearch HTTP API. To do so, run this command on any node (replace 2 with your quorum):
- curl -XPUT localhost:9200/_cluster/settings?pretty -d '{
- "persistent" : {
- "discovery.zen.minimum_master_nodes" : 2
- }
- }'
Output:
{
"acknowledged" : true,
"persistent" : {
"discovery" : {
"zen" : {
"minimum_master_nodes" : "2"
}
}
},
"transient" : { }
}
Note: This command applies a “persistent” setting, meaning the minimum master nodes setting will survive full cluster restarts and override the Elasticsearch configuration file. Alternatively, this setting can be specified as discovery.zen.minimum_master_nodes: 2 in /etc/elasticsearch/elasticsearch.yml if you have not already set it dynamically.
If you want to check this setting later, you can run this command:
- curl -XGET localhost:9200/_cluster/settings?pretty
You may access the Elasticsearch HTTP API by sending requests to the VPN IP address of any of the nodes or, as demonstrated in the tutorial, by sending requests to localhost from one of the nodes.
Your Elasticsearch cluster is accessible to client servers via the VPN IP address of any of the nodes, which means that the client servers must also be part of the VPN.
If you have other software that needs to connect to your cluster, such as Kibana or Logstash, you can typically configure the connection by providing your application with the VPN IP addresses of one or more of the Elasticsearch nodes.
Your Elasticsearch cluster should be running in a healthy state, and configured with some basic optimizations!
Elasticsearch has many other configuration options that weren’t covered here, such as index, shard, and replication settings. It is recommended that you revisit your configuration later, along with the official documentation, to ensure that your cluster is configured to meet your needs.
Best tutorial I’ve seen on this topic. Thanks!
Hi Mitchell, Thanks for the blog i think this is one of the very few pieces of good documentation about elasticsearch that deserves mention.
I tried following the steps here but there was an issue it would be great if you can help me around it. I set up the elasticsearch on 3 nodes in an instance group on google compute engine.
My configuration file with non-commented parameters was -
When I check about the cluster health on each of the three nodes, the result is as if each is independently acting as a cluster with one node.
Can you help in figuring out what can be causing this issue ? Or what should be the values on network.bind_host or network.publish_host or if there is something else that I’m unaware of.
Thanks in advance.
Missing steps:
To enable private networking, follow How To Enable DigitalOcean Private Networking on Existing Droplets. Then add the other nodes to your /etc/hosts like in Add an Entry to /etc/hosts.
After this you should be able to ping a node with the DNS name (e.g. ping pnv1 like in the example above). You should also see your 10.x.x.x address (your private network address) in the ifconfig output under the name eth1. The config for two nodes is like the following:
Then you need to edit the init script (vim /etc/init.d/elasticsearch):
Restart and make the checks like above.
That’s a great tutorial - as usual. I’m new to ES, however I have a question about the three droplets required, as mentioned on Elasticsearch website, it is recommended to have your server running 16GB or ram, in that case, i need the three droplets to be 16GB each? Thanks.
Hello, Thanks for nice tutorial!
I have one question: As you said at the end of tutorial, my application connect to cluster by connect to one of nodes of cluster. But could elasticsearch auto/dynamic switch other node in the same cluster when node which application is connecting down?
Service commands don’t seem to work, I keep getting:
Failed to restart elasticsearch.service: Unit elasticsearch.service not found
I did on this on 16.04 boxes, a few modifications to the settings and it worked fine. How is ES controlled?
EDIT: duh :) https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-elasticsearch-on-ubuntu-16-04
EDIT 2: still not working w/ systemctl
I have a 3 node basic cluster. Which node should I install kibana on to view the cluster statistics on gui? I happen to observe that the master node changes sometimes when I do a curl -XGET 'http://elasticsearch_node_IP:9200/_cluster/state?pretty'. It shows node1 and sometimes node2 and node3.
Hi, I want to configure a 3 node Elasticsearch cluster. I want to know what would be the best configuration of Elasticsearch.
Elastic01 172.24.32.209 (Elasticsearch, Kibana, Logstash installed)
Elastic02 172.24.32.217 (Elasticsearch, Kibana, Logstash installed)
Elastic03 172.24.32.218 (Elasticsearch, Kibana, Logstash installed)
Change settings on all Nodes like follows
cluster.name: Production
node.name: ${HOSTNAME}
path.data: /data/elkdata
path.logs: /data/elklogs
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["172.24.32.209", "172.24.32.217", "172.24.32.218"]
discovery.zen.minimum_master_nodes: 3
systemctl restart elasticsearch on all 3 nodes after that elasticsearch always fails to start .
appreciate your support to resolve this issue