
Friday 15 February 2019

A Clustering Guide to CrateDB and ElasticSearch

Hi friends,

Recently I ran CrateDB in a multi-node environment as a trial, and in this post I'll share my findings on the basic setup and configuration required for a high-availability CrateDB cluster. Since CrateDB is built on top of ElasticSearch, I am pretty sure that similar settings will be useful for ElasticSearch as well.

I used CrateDB version 3.2.2 and Java 8 for this POC.

CrateDB performs some bootstrap checks before the server starts successfully. According to the documentation, these checks are mandatory and are needed to get the cluster running smoothly in a multi-node environment.

The first problem I faced was related to increasing the ulimit in Ubuntu, which I have explained in this post - Increase Open Files/File Descriptors/Ulimit in Ubuntu | CrateDB ElasticSearch.

Once CrateDB was running successfully, I began to experiment with the basic settings and configuration. Remember, many of these things can be customised; towards the end of this post I'll share a few read-worthy links if you truly want to tune every setting in your cluster.

So here's the final configuration that worked for me:
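As a rough sketch, the relevant part of crate.yml on one of my master-eligible data nodes looked something like this (the cluster name, node name, and IP addresses below are placeholders; substitute your own):

cluster.name: crate-poc
node.name: node-1
node.master: true
node.data: true
network.host: 192.168.1.101
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts:
  - 192.168.1.101:4300
  - 192.168.1.102:4300
gateway.expected_nodes: 4
gateway.recover_after_nodes: 3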

Here's the explanation for the non-obvious ones:

node.master
Set this to true to indicate that this node is master-eligible. It does not guarantee that this node will actually be elected master.

node.data
Set this to true if this node will store data. A node can be master-eligible without storing data.

network.host
This will vary depending on whether you set up the cluster locally within your organisation or on Azure, GCP, or AWS. In my case, for POC purposes, I entered my machine's IP address here. More details here.

discovery.zen.minimum_master_nodes
This is the minimum number of master-eligible nodes that must be visible before a master can be elected and cluster operations can proceed; it exists to prevent split-brain. In my case I set it to 2 and had two master-eligible nodes, so if I take one master-eligible node offline, the cluster won't work. One of the two is elected as master. The recommended value is (number of master-eligible nodes / 2) + 1.

discovery.zen.ping.unicast.hosts
You can specify the IP addresses of your master-eligible nodes, along with their transport ports, here. When a new node starts up it pings these hosts, and if it reaches one of the existing nodes in the cluster it is automatically added to the cluster.

gateway.expected_nodes
This is the number of nodes that your cluster expects. The cluster will start to show warnings if the total number of data/master nodes doesn't match this count.

gateway.recover_after_nodes
This is the number of nodes that must be active before the recovery process begins. I set it to 3 so that I could take one data node offline and check whether the data was still highly available.


I faced some heap space issues as well, and I have documented all of the issues here - Increase Open Files/File Descriptors/Ulimit in Ubuntu | CrateDB ElasticSearch

As you may know, CrateDB by default creates (number of data nodes * 2) shards per table. I created one table with 3282 records; since each of my 4 nodes was a data node, the table was divided into 8 shards in total. Also, the default replication setting in CrateDB is [0-1], i.e. between zero and one replica per shard. After cluster formation, much of this information can be found in these tables:

sys.health
sys.nodes
sys.shards
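
For example, a couple of illustrative queries against these tables (the column selection is just a sampling):

select table_name, health, severity, underreplicated_shards from sys.health;

select name, hostname, heap['used'], heap['max'] from sys.nodes;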


So each document out of the 3282 total docs was replicated once into a replica shard, spread across all 4 nodes. Even if I removed one of the non-master data nodes, data could still be read from the replica shards, achieving 100% data availability, and the system would automatically rebalance the shards across the remaining nodes.

Here's a brief summary of my observations:

If you take one of the data nodes offline, the cluster recognises the change and starts showing some documents in the unreplicated docs section of the Admin UI, along with some percentage loss in data availability. The server then rebalances the now-active shards and replicas, recovering from the potential data loss by replicating data onto the repositioned shards and replicas. After a few seconds, the data availability section shows 100% again.
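
You can also watch this rebalancing happen from SQL instead of the Admin UI; an illustrative query along these lines lists any shards that are not yet fully started:

select table_name, id, state, routing_state, node['name'] from sys.shards where state != 'STARTED';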

It is possible to configure the number of replicas required for a table in the CREATE TABLE statement. More information can be found here.

Similarly, if you want to configure the number of shards, it can be set in the same statement via the CLUSTERED INTO clause.
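
As a sketch of both together (the table and column names are purely illustrative):

create table books (
    id integer primary key,
    title string
) clustered into 6 shards
with (number_of_replicas = '1-2');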

These were a few useful commands to check the cluster status:

select count(primary), num_docs, primary, node from sys.shards group by primary, node, num_docs limit 100;


select count(primary), num_docs, primary, node['name'] from sys.shards group by primary, node['name'], num_docs order by node['name'] limit 100;

If you face this issue when you take one of the data nodes offline:

Not enough active copies to meet write consistency of [ALL] (have 1, needed 2)


"Have 1 needed 2" means that the document should go to two shards, but only one is available. That tells us that the index you are indexing into has 1 replica, not 0, so for some reason setting replicas to 0 didn't work. Also, the default consistency is QUORUM, not ALL. Using consistency ALL in a single node cluster with an index with default settings would definitely cause this, as default settings are 5 shards and 1 replica, and replicas can not be unassigned on a single node cluster.

Some useful links:
CrateDB multi-node setup
Network Settings
System Information
Cluster Browser
Clustering
Local Gateway
Set Replicas
Sharding
Clustered Clause


Let me know via the comments below if you face any problems with any of the above steps. Adios till my next post!

