May 21st 2024

In the realm of distributed systems, efficient data distribution is crucial for optimizing performance and ensuring scalability. One fundamental approach to distributing data across multiple servers is through hashing. This blog post will explore a straightforward method for distributing string keys across a distributed cache using simple hashing, discuss the problems with simple hashing, and how to resolve these problems with Consistent Hashing.

Simple Hashing to distribute the data

Assume, we are going to build a distributed cache application, where we have keys of type String. When dealing with string keys, the hash function needs to convert the string into a numerical value. We then use the modulo operation to determine the server where the data should be stored. The basic formula is:

getServer(key)=hashFunction(key)%n

Here,

a. key is the string to be hashed, and

b. n is the number of servers.

getServer(key) method return the server to store this key.

public int getServer(String key) {
	int hashValue = hash(key);
	return Math.abs(hashValue) % noOfServers;
}

Following diagram demonstrate the same.

When a key needs to be added to a server, its server location is determined using the getServer(key) method.

The distribution of keys is as follows:

· KEY_0 is stored on Server_3

· KEY_1 is stored on Server_3

· KEY_2 is stored on Server_1

· KEY_3 is stored on Server_2

· KEY_4 is stored on Server_4

· KEY_5 is stored on Server_3

· KEY_6 is stored on Server_3

· KEY_7 is stored on Server_0

· KEY_8 is stored on Server_4

· KEY_9 is stored on Server_3

· KEY_10 is stored on Server_1

· KEY_11 is stored on Server_0

· KEY_12 is stored on Server_3

· KEY_13 is stored on Server_4

· KEY_14 is stored on Server_1

· KEY_15 is stored on Server_0

· KEY_16 is stored on Server_2

· KEY_17 is stored on Server_2

· KEY_18 is stored on Server_3

· KEY_19 is stored on Server_0

Distribution Statistics

· Server_0 contains 4 keys

· Server_1 contains 3 keys

· Server_2 contains 3 keys

· Server_3 contains 7 keys

· Server_4 contains 3 keys

Demo application for the same.

SimpleHashing.java

package com.sample.app.hashing;

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleHashing {
	private static MessageDigest MESSAGE_DIGEST = null;

	static {
		try {
			MESSAGE_DIGEST = MessageDigest.getInstance("SHA-256");
		} catch (NoSuchAlgorithmException e) {
			e.printStackTrace();
		}
	}

	private int noOfServers;

	public SimpleHashing(int noOfServers) {
		this.noOfServers = noOfServers;
	}

	private int hash(String key) {
		byte[] hashBytes = MESSAGE_DIGEST.digest(key.getBytes(StandardCharsets.UTF_8));
		BigInteger hashValue = new BigInteger(1, hashBytes);
		return hashValue.intValue();
	}

	public int getServer(String key) {
		int hashValue = hash(key);
		return Math.abs(hashValue) % noOfServers;
	}

	public void printKeysDistrribution(List keys) {
		Map stats = new HashMap();

		for (String key : keys) {
			int serverIndex = getServer(key);

			stats.putIfAbsent(serverIndex, 0);
			int count = stats.get(serverIndex);
			stats.put(serverIndex, count + 1);

			System.out.println(key + " is stored in Server_" + serverIndex);
		}

		System.out.println("\nDistribution statistics");
		for (int i = 0; i "Server_" + i + " contain " + stats.get(i) + " keys");
		}
	}

}

SimpleHashingDemo.java

package com.sample.app.hashing;

import java.util.ArrayList;
import java.util.List;

public class SimpleHashingDemo {
	
	public static void main(String args[]) {
		List keys = new ArrayList();
		String keyPrefix = "KEY_";
		int noOfKeys = 20;
		int noOfServers = 5;
		
		for(int i=0; i new SimpleHashing(noOfServers);
		simpleHashing.printKeysDistrribution(keys);
		
	}

}

Output

KEY_0 is stored in Server_3
KEY_1 is stored in Server_3
KEY_2 is stored in Server_1
KEY_3 is stored in Server_2
KEY_4 is stored in Server_4
KEY_5 is stored in Server_3
KEY_6 is stored in Server_3
KEY_7 is stored in Server_0
KEY_8 is stored in Server_4
KEY_9 is stored in Server_3
KEY_10 is stored in Server_1
KEY_11 is stored in Server_0
KEY_12 is stored in Server_3
KEY_13 is stored in Server_4
KEY_14 is stored in Server_1
KEY_15 is stored in Server_0
KEY_16 is stored in Server_2
KEY_17 is stored in Server_2
KEY_18 is stored in Server_3
KEY_19 is stored in Server_0

Distribution statistics
Server_0 contain 4 keys
Server_1 contain 3 keys
Server_2 contain 3 keys
Server_3 contain 7 keys
Server_4 contain 3 keys

Problems with Simple Hashing

While simple hashing is an effective and straightforward method for distributing data across multiple servers, it struggles with a notable drawback, particularly when handling node failures and when new nodes are added to the cluster.

Redistribution of Data on Node Failure

One significant drawback of simple hashing is the need to redistribute data when a node goes down. Here's how it works and why it becomes a problem:

Initial Distribution: Initially, data is distributed across the servers based on the hash function. For example, using hash(key) = hashFunction(key) % n, where n is the number of servers.

Node Failure: Suppose one of the servers fails. This reduces the total number of servers (n). For instance, if Server_4 goes down in a system with 5 servers, n becomes 4.

Recalculation: When a node fails, all keys need to be rehashed and redistributed according to the new number of servers. This means recalculating hash(key) % 4 for all keys.

Data Movement: As a result of recalculating the hash values, most of the data will need to be moved to different servers. This is because the hash values modulo 4 will likely differ from the original values modulo 5, causing keys to map to different servers.

The distribution of keys is as follows:

· KEY_0 is stored in Server_3

· KEY_1 is stored in Server_1

· KEY_2 is stored in Server_0

· KEY_3 is stored in Server_1

· KEY_4 is stored in Server_2

· KEY_5 is stored in Server_0

· KEY_6 is stored in Server_1

· KEY_7 is stored in Server_3

· KEY_8 is stored in Server_3

· KEY_9 is stored in Server_1

· KEY_10 is stored in Server_2

· KEY_11 is stored in Server_0

· KEY_12 is stored in Server_3

· KEY_13 is stored in Server_1

· KEY_14 is stored in Server_1

· KEY_15 is stored in Server_3

· KEY_16 is stored in Server_2

· KEY_17 is stored in Server_1

· KEY_18 is stored in Server_0

· KEY_19 is stored in Server_0

Distribution statistics

· Server_0 contain 5 keys

· Server_1 contain 7 keys

· Server_2 contain 3 keys

· Server_3 contain 5 keys

Following table summarizes the keys distribution, when the number of servers are 4 and 5.

Key	Number of Servers 4	Number of Servers 5
KEY_0	Server_3	Server_3
KEY_1	Server_1	Server_3
KEY_2	Server_0	Server_1
KEY_3	Server_1	Server_2
KEY_4	Server_2	Server_4
KEY_5	Server_0	Server_3
KEY_6

This post first appeared on Java Tutorial : Blog To Learn Java Programming, please read the originial post: here

People also like

Consistent Hashing: A Scalable Approach to Distributed Data Storage

Related Articles

Consistent Hashing: A Scalable Approach to Distributed Data Storage

Related Articles

Share the post

Subscribe to Java Tutorial : Blog To Learn Java Programming

Thank you for your subscription