In the realm of distributed systems, efficient data distribution is crucial for optimizing performance and ensuring scalability. One fundamental approach to distributing data across multiple servers is through hashing. This blog post will explore a straightforward method for distributing string keys across a distributed cache using simple hashing, discuss the problems with simple hashing, and how to resolve these problems with Consistent Hashing.
Related Articles
Simple Hashing to distribute the data
Assume, we are going to build a distributed cache application, where we have keys of type String. When dealing with string keys, the hash function needs to convert the string into a numerical value. We then use the modulo operation to determine the server where the data should be stored. The basic formula is:
getServer(key)=hashFunction(key)%n
Here,
a. key is the string to be hashed, and
b. n is the number of servers.
getServer(key) method return the server to store this key.
public int getServer(String key) {
int hashValue = hash(key);
return Math.abs(hashValue) % noOfServers;
}
Following diagram demonstrate the same.
When a key needs to be added to a server, its server location is determined using the getServer(key) method.
The distribution of keys is as follows:
· KEY_0 is stored on Server_3
· KEY_1 is stored on Server_3
· KEY_2 is stored on Server_1
· KEY_3 is stored on Server_2
· KEY_4 is stored on Server_4
· KEY_5 is stored on Server_3
· KEY_6 is stored on Server_3
· KEY_7 is stored on Server_0
· KEY_8 is stored on Server_4
· KEY_9 is stored on Server_3
· KEY_10 is stored on Server_1
· KEY_11 is stored on Server_0
· KEY_12 is stored on Server_3
· KEY_13 is stored on Server_4
· KEY_14 is stored on Server_1
· KEY_15 is stored on Server_0
· KEY_16 is stored on Server_2
· KEY_17 is stored on Server_2
· KEY_18 is stored on Server_3
· KEY_19 is stored on Server_0
Distribution Statistics
· Server_0 contains 4 keys
· Server_1 contains 3 keys
· Server_2 contains 3 keys
· Server_3 contains 7 keys
· Server_4 contains 3 keys
Demo application for the same.
SimpleHashing.java
package com.sample.app.hashing;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class SimpleHashing {
private static MessageDigest MESSAGE_DIGEST = null;
static {
try {
MESSAGE_DIGEST = MessageDigest.getInstance("SHA-256");
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
}
private int noOfServers;
public SimpleHashing(int noOfServers) {
this.noOfServers = noOfServers;
}
private int hash(String key) {
byte[] hashBytes = MESSAGE_DIGEST.digest(key.getBytes(StandardCharsets.UTF_8));
BigInteger hashValue = new BigInteger(1, hashBytes);
return hashValue.intValue();
}
public int getServer(String key) {
int hashValue = hash(key);
return Math.abs(hashValue) % noOfServers;
}
public void printKeysDistrribution(List keys) {
Map stats = new HashMap();
for (String key : keys) {
int serverIndex = getServer(key);
stats.putIfAbsent(serverIndex, 0);
int count = stats.get(serverIndex);
stats.put(serverIndex, count + 1);
System.out.println(key + " is stored in Server_" + serverIndex);
}
System.out.println("\nDistribution statistics");
for (int i = 0; i "Server_" + i + " contain " + stats.get(i) + " keys");
}
}
}
SimpleHashingDemo.java
package com.sample.app.hashing;
import java.util.ArrayList;
import java.util.List;
public class SimpleHashingDemo {
public static void main(String args[]) {
List keys = new ArrayList();
String keyPrefix = "KEY_";
int noOfKeys = 20;
int noOfServers = 5;
for(int i=0; i new SimpleHashing(noOfServers);
simpleHashing.printKeysDistrribution(keys);
}
}
Output
KEY_0 is stored in Server_3 KEY_1 is stored in Server_3 KEY_2 is stored in Server_1 KEY_3 is stored in Server_2 KEY_4 is stored in Server_4 KEY_5 is stored in Server_3 KEY_6 is stored in Server_3 KEY_7 is stored in Server_0 KEY_8 is stored in Server_4 KEY_9 is stored in Server_3 KEY_10 is stored in Server_1 KEY_11 is stored in Server_0 KEY_12 is stored in Server_3 KEY_13 is stored in Server_4 KEY_14 is stored in Server_1 KEY_15 is stored in Server_0 KEY_16 is stored in Server_2 KEY_17 is stored in Server_2 KEY_18 is stored in Server_3 KEY_19 is stored in Server_0 Distribution statistics Server_0 contain 4 keys Server_1 contain 3 keys Server_2 contain 3 keys Server_3 contain 7 keys Server_4 contain 3 keys
Problems with Simple Hashing
While simple hashing is an effective and straightforward method for distributing data across multiple servers, it struggles with a notable drawback, particularly when handling node failures and when new nodes are added to the cluster.
Redistribution of Data on Node Failure
One significant drawback of simple hashing is the need to redistribute data when a node goes down. Here's how it works and why it becomes a problem:
Initial Distribution: Initially, data is distributed across the servers based on the hash function. For example, using hash(key) = hashFunction(key) % n, where n is the number of servers.
Node Failure: Suppose one of the servers fails. This reduces the total number of servers (n). For instance, if Server_4 goes down in a system with 5 servers, n becomes 4.
Recalculation: When a node fails, all keys need to be rehashed and redistributed according to the new number of servers. This means recalculating hash(key) % 4 for all keys.
Data Movement: As a result of recalculating the hash values, most of the data will need to be moved to different servers. This is because the hash values modulo 4 will likely differ from the original values modulo 5, causing keys to map to different servers.
The distribution of keys is as follows:
· KEY_0 is stored in Server_3
· KEY_1 is stored in Server_1
· KEY_2 is stored in Server_0
· KEY_3 is stored in Server_1
· KEY_4 is stored in Server_2
· KEY_5 is stored in Server_0
· KEY_6 is stored in Server_1
· KEY_7 is stored in Server_3
· KEY_8 is stored in Server_3
· KEY_9 is stored in Server_1
· KEY_10 is stored in Server_2
· KEY_11 is stored in Server_0
· KEY_12 is stored in Server_3
· KEY_13 is stored in Server_1
· KEY_14 is stored in Server_1
· KEY_15 is stored in Server_3
· KEY_16 is stored in Server_2
· KEY_17 is stored in Server_1
· KEY_18 is stored in Server_0
· KEY_19 is stored in Server_0
Distribution statistics
· Server_0 contain 5 keys
· Server_1 contain 7 keys
· Server_2 contain 3 keys
· Server_3 contain 5 keys
Following table summarizes the keys distribution, when the number of servers are 4 and 5.
Key |
Number of Servers 4 |
Number of Servers 5 |
KEY_0 |
Server_3 |
Server_3 |
KEY_1 |
Server_1 |
Server_3 |
KEY_2 |
Server_0 |
Server_1 |
KEY_3 |
Server_1 |
Server_2 |
KEY_4 |
Server_2 |
Server_4 |
KEY_5 |
Server_0 |
Server_3 |
KEY_6 |
This post first appeared on Java Tutorial : Blog To Learn Java Programming, please read the originial post: here