Redis Partitioning | Range, Hash, Consistent hash, and Presharding

Ali Mohammad
13 min read · Apr 28, 2023

This article explains what “partitioning” means in the context of Redis, covers “horizontal” and “vertical” partitioning, and walks through range partitioning, hash partitioning, consistent hashing, and presharding.

This article also comes with some Ruby code, so enjoy ❤️

⚠️ Grab a cup of your favourite drink ☕️, it will be a long ride 🌪️

😎 Let’s look at what partitioning is, from the very start.

Partitioning is a general term used to describe the act of breaking up data and distributing it across different hosts. There are two types of partitioning: horizontal partitioning and vertical partitioning.
Partitioning is usually done across a cluster of hosts when better performance, maintainability, or availability is needed.

📜 Ready for a history lesson?

When Redis was first made, it wasn’t meant to be a distributed data store, so its data can’t be automatically shared between different instances. It was designed to work well on a single server. Redis Cluster was added later to solve these distribution problems.
Over time, Redis storage may grow to such an extent that a single server may not be enough to store all its data. The performance of reading and writing to a single server may also decline.

🫡 So, in the context of Redis, what is partitioning?

Redis partitioning is a way to split a Redis database into several smaller databases, which are called “partitions.” This is done to improve the speed and scalability of the database and to reduce the memory and processing load that comes with managing a lot of keys. Redis partitioning makes it possible for a Redis deployment to hold far more data than a single server could, and for multiple clients to access that data at the same time, which can improve the overall performance of the system.

  • Horizontal Partitioning:
    Horizontal partitioning, often known as sharding, is a Redis strategy for dividing a huge dataset into smaller, more manageable parts that are stored on distinct Redis servers. This permits data to be dispersed over numerous servers, which improves system performance and scalability. Horizontal partitioning allows a client to send a request to a single Redis server in order to access the data stored on that server, reducing the amount of time and processing resources required to obtain the data. Furthermore, horizontal partitioning allows for parallel access to data, which can increase overall system performance.
  • Vertical Partitioning:
    Vertical partitioning is a Redis approach for dividing a dataset into smaller partitions, each containing a subset of the total data. Each partition is stored on its own Redis server, and the partitions are organized according to the different kinds or categories of data present in the dataset. If a Redis database has both numeric and string data, for example, the numeric data might be kept in one partition and the string data in another. Each instance can then be sized and tuned for the kind of data it holds, which allows for more efficient data storage and retrieval. Furthermore, vertical partitioning allows for parallel access to data, which can increase overall system performance.

Key values are spread across different Redis instances with vertical partitioning, while keys are spread across different Redis instances with horizontal partitioning.

TL;DR In horizontal partitioning, a database table is split into smaller tables based on the values of a certain attribute or set of attributes. This is in contrast to vertical partitioning, where a database table is divided into smaller tables based on the different columns or fields in the table.
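To make the difference concrete, here is a minimal sketch that uses plain Ruby hashes in place of Redis instances. The `users` data, the instance names, and the splitting rules are made up for illustration only.

```ruby
# Hypothetical dataset: user id => fields.
users = {
  1 => { name: 'Ann', email: 'ann@example.com' },
  2 => { name: 'Bob', email: 'bob@example.com' },
  3 => { name: 'Cy',  email: 'cy@example.com' },
  4 => { name: 'Dee', email: 'dee@example.com' }
}

# Horizontal partitioning: whole records are split by key (here: odd vs even ids).
horizontal = users.group_by { |id, _| id.odd? ? :instance_a : :instance_b }
                  .transform_values(&:to_h)

# Vertical partitioning: every record is split by field (names vs emails).
vertical = {
  instance_a: users.transform_values { |fields| fields.slice(:name) },
  instance_b: users.transform_values { |fields| fields.slice(:email) }
}

puts horizontal[:instance_a].keys.inspect # ids stored on instance_a
puts vertical[:instance_b][1].inspect     # the email part of user 1
```

In the horizontal case each instance holds complete records for some of the users; in the vertical case each instance holds part of every record.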

Note: a single Redis instance does not partition data on its own; partitioning has to be done by the client, by a proxy, or by Redis Cluster.

🤔 So how can we partition our data ?

Range partitioning

Range partitioning is a Redis approach for dividing a dataset into smaller partitions depending on the range of values included within it. A Redis database containing a large number of numeric IDs, for example, may be partitioned by range, with one partition holding all values from 1 to 100, another holding all values from 101 to 200, and so on. This keeps lookups simple, because the partition that owns a value can be worked out from the value alone. Furthermore, range partitioning enables parallel access to data, which can increase overall system performance.

Range partitioning is a way to divide up data based on a range of keys.
The whole idea of range partitioning is to create ranges of keys and distribute them to different Redis instances. You can be creative in how you create the range selection.
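As a rough illustration, the lookup can be sketched as below. The ranges, the ports, and the `user:<id>` key format are assumptions chosen for this example, and `port_for` is a hypothetical helper name.

```ruby
# Hypothetical ranges of numeric user ids, each mapped to a Redis port.
RANGES = {
  (1..100)   => 8000,
  (101..200) => 8001,
  (201..300) => 8002
}.freeze

# Return the port of the partition that owns a key such as "user:150".
def port_for(key)
  id = Integer(key.split(':').last)
  _, port = RANGES.find { |range, _| range.cover?(id) }
  port or raise ArgumentError, "no partition covers #{key}"
end

puts port_for('user:42')  # falls in 1..100
puts port_for('user:150') # falls in 101..200
```

A key outside every range raises an error here; a real setup would need a rule for such keys, for example a catch-all partition.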

Exciting, right 🍔? Let’s see some examples:

💻 Partitioning Existing Data Using the CLI

To range partition existing data in Redis, you can use the redis-cli command-line interface to create a new Redis instance for each partition and then use the MIGRATE command to move the existing data from the old Redis instance to the new ones.

Here is an example of how you can do this:

  1. Start by creating a new Redis instance for each partition using the redis-server command, specifying a different port number for each instance:
redis-server --port 8000
redis-server --port 8001
redis-server --port 8002

  2. Next, use the redis-cli command to connect to the old Redis instance and migrate the data to the new partitions using the MIGRATE command. The MIGRATE command takes the following arguments:

MIGRATE host port key destination-db timeout [COPY] [REPLACE] [KEYS key1 key2 ...]

For example, to migrate the data with key burger from the old Redis instance to the new partition running on port 8000, you could use the following command:

redis-cli MIGRATE 127.0.0.1 8000 burger 0 1000 COPY

Because the command includes COPY, the key burger is copied to the new partition running on port 8000 and also stays on the old Redis instance; leave out COPY if you want the key to be moved instead. Repeat this step for each partition and for each key that you want to move.

🎨 Partitioning Existing Data Using Ruby

To range partition existing data in Redis using Ruby, you can use the redis-rb gem, which provides a Ruby interface to the Redis key-value store.

  1. Start by installing the redis-rb gem using the following command:
gem install redis

  2. Next, use the Redis.new method to create a new Redis instance for each partition, specifying a different port number for each instance:

require 'redis'
# Create a new Redis instance for a partition 
partition1 = Redis.new(:port => 8000)
partition2 = Redis.new(:port => 8001)
partition3 = Redis.new(:port => 8002)

  3. Use the migrate method to migrate the data from the old Redis instance to the new partitions. In redis-rb, the migrate method takes the key followed by a hash of options (:host, :port, :db, :timeout, :copy, :replace):

migrate(key, options)

For example, to migrate the data with a key burger from the old Redis instance to the new partition running on port 8000, you could use the following code:

# Connect to the old Redis instance
old_redis = Redis.new
# Copy the data with key "burger" from the old Redis instance
# to the new partition running on port 8000
old_redis.migrate("burger", :host => "127.0.0.1", :port => 8000, :db => 0, :timeout => 1000, :copy => true)

Because of :copy => true, the key burger is copied to the new partition running on port 8000 and also stays on the old Redis instance; drop that option if you want the key to be moved instead. Repeat this step for each partition and for each key that you want to move.
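Repeating the call by hand gets tedious, so the loop can be sketched as a small helper. This is only a sketch under assumptions: `migrate_by_range` and `PARTITION_PORTS` are hypothetical names, keys are assumed to look like `user:<id>`, and `source` is assumed to respond to `migrate(key, options)` in the way described for redis-rb above.

```ruby
# Hypothetical id ranges and the ports of the partitions that own them.
PARTITION_PORTS = { (1..100) => 8000, (101..200) => 8001, (201..300) => 8002 }.freeze

# Copy every key to the partition that owns its numeric suffix.
# `source` is assumed to behave like a redis-rb client, i.e. to respond to
# migrate(key, options) with :host, :port, :db, :timeout and :copy options.
def migrate_by_range(source, keys, host: '127.0.0.1')
  keys.each do |key|
    id = Integer(key.split(':').last)
    _, port = PARTITION_PORTS.find { |range, _| range.cover?(id) }
    next if port.nil? # skip keys that no range covers

    source.migrate(key, host: host, port: port, db: 0, timeout: 1000, copy: true)
  end
end
```

With a running setup this might be called as `migrate_by_range(Redis.new, %w[user:42 user:150])`. Keys that fall outside every range are skipped silently here, which a real migration script would probably want to log.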

📒 Hash partitioning

Hash partitioning is a straightforward concept to grasp. It entails applying a hash function to the Redis key, dividing the resulting hash value by the number of Redis instances accessible, and utilizing the remainder of that division as the instance index.

In Redis, hash partitioning divides the keyspace into several partitions based on the hash of the keys.
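Standalone Redis has no built-in command for this: the client (or a proxy) computes the hash to discover which partition a given key belongs to, and the data is subsequently saved in that partition. (Redis Cluster exposes its own key-to-slot mapping through the CLUSTER KEYSLOT command.)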
This enables Redis to distribute keys across several instances, potentially improving performance and scalability.

A Ruby snippet may exemplify this idea better than English:

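require 'zlib'
# String#hash is seeded differently in every Ruby process, so use a stable hash such as CRC32
partition_index = Zlib.crc32(key) % @partitions.length
host = @partitions[partition_index]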

The effectiveness of this method depends on the hash function used. If your hash function suits your dataset, keys will be distributed evenly across the partitions. People frequently use MD5 and SHA1 as hash algorithms.
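One way to check how even the split is for a given hash function is simply to count. The sketch below uses MD5 from Ruby's standard library; the `user:<n>` keys, the partition count, and the helper name `partition_index` are assumptions for illustration.

```ruby
require 'digest/md5'

# Map a key to a partition index with a hash that is stable across processes.
def partition_index(key, partition_count)
  Digest::MD5.hexdigest(key).to_i(16) % partition_count
end

# Count how many of 10,000 made-up keys land on each of 3 partitions.
counts = Hash.new(0)
10_000.times { |i| counts[partition_index("user:#{i}", 3)] += 1 }
counts.sort.each { |index, count| puts "partition #{index}: #{count} keys" }
```

If one partition ends up with far more keys than the others for your real key names, that is a sign the hash function or the key naming scheme needs another look.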

To reduce collisions, it is advised that the total number of Redis instances be a prime number when using this partitioning strategy. Collisions are more frequent if the total number of Redis instances is not a prime number.

🛞 Automatically Partitioning New Data Using Ruby

To route new data to the correct instance automatically, you can use the redis-rb gem and write a custom partitioning function that decides, based on the key, which Redis instance a given piece of data should be stored in.

Here is an example of how you can do this:

  1. Start by installing the redis-rb gem using the following command:
gem install redis

2. Next, create a new Ruby class that represents a Redis partition. This class should contain methods for connecting to the Redis instance for the partition and for storing and retrieving data from the partition. Here is an example of what this class might look like:

require 'redis'

class RedisPartition
  attr_reader :redis

  def initialize(port)
    @redis = Redis.new(:port => port)
  end

  def set(key, value)
    @redis.set(key, value)
  end

  def get(key)
    @redis.get(key)
  end
end

3. Create a new Ruby class that represents a set of Redis partitions. This class should contain methods for creating new partitions and for mapping keys to the correct partition. Here is an example of what this class might look like:

require 'zlib'

class RedisPartitionSet
  attr_reader :partitions

  def initialize
    @partitions = []
  end

  def create_partition(port)
    partition = RedisPartition.new(port)
    @partitions << partition
    partition
  end

  def get_partition(key)
    # Determine which partition the given key belongs to.
    # CRC32 is used because String#hash is seeded differently in every Ruby process.
    partition_index = Zlib.crc32(key) % @partitions.length
    @partitions[partition_index]
  end
end

  4. Use the RedisPartitionSet class to create a set of Redis partitions and store and retrieve data from the correct partition based on the key. Here is an example of how you might do this:

# Create a new set of Redis partitions
partitions = RedisPartitionSet.new
# Create a new partition running on port 7000
partition1 = partitions.create_partition(7000)
# Create a new partition running on port 7001
partition2 = partitions.create_partition(7001)
# Create a new partition running on port 7002
partition3 = partitions.create_partition(7002)
# Store the value "burger" with key "food" in the correct partition
partitions.get_partition("food").set("food", "burger")
# Store the value "zenAts" with key "awesome" in the correct partition
partitions.get_partition("awesome").set("awesome", "zenAts")
# Retrieve the value stored with key "awesome" from the correct partition
value = partitions.get_partition("awesome").get("awesome")

This example creates a set of three Redis partitions and uses a custom partitioning function to determine which partition each piece of data should be stored in based on its key.

🏋🏻 Presharding

Presharding is the technique of pre-allocating and distributing data across numerous Redis servers or instances to share the workload and enhance performance. This is often accomplished by dividing the data into fixed-size chunks known as “shards” and allocating each shard to a unique Redis instance. Presharding allows for better resource use and scalability in a Redis cluster.

Presharding the data is one method of coping with the challenge of adding or replacing nodes over time with hash partitioning. This entails heavily pre-partitioning the data so that the host list size never changes. The plan is to run extra Redis instances on different ports while reusing the existing servers. This works well because Redis is single threaded and does not consume all of the machine’s resources, allowing you to launch multiple Redis instances per server and still be fine.

Then, with this new list of Redis instances, you would apply the same hash algorithm that we presented before, but now with far more elements in the Redis client array. This method works because if you need to add more capacity to the cluster, you can replace some Redis instances with more powerful ones, and the client array size never changes.

Preshard and partition all existing Redis instances in the CLI

  1. Use the cluster info command to get information about the current Redis Cluster.
  2. Use the cluster nodes command to get the list of nodes in the cluster.
  3. Use redis-cli --cluster reshard host:port to move hash slots (and the keys in them) between nodes, or redis-cli --cluster rebalance host:port to even out the distribution among the existing nodes. Resharding is driven by redis-cli; the server has no cluster reshard command.
  4. Use the cluster replicate command to make a node a replica of an existing master for increased reliability and availability.
  5. Use the cluster forget command to remove a node from the cluster.
  6. Repeat steps 3–5 as needed to preshard and partition the existing Redis instances for improved performance and scalability.
  7. Verify that the data has been evenly distributed and the cluster has been partitioned using the cluster info and cluster nodes commands.

Now, what if one of our servers is not capable enough to handle the load?

Let’s first define our servers.

redis_hosts = {
  'loush_server1' => [6379, 6380, 6381, 6382, 6383],
  'loush_server2' => [6379, 6380, 6381, 6382, 6383],
  'loush_server3' => [6379, 6380, 6381, 6382, 6383]
}

We chose to have only five instances per server as an example, but some people have over 200 instances per server.
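To tie this back to hash partitioning, the host list can be flattened into one fixed-size list of instances and indexed with the same modulo rule as before. This is a minimal sketch, assuming the three servers are named loush_server1 to loush_server3 and that CRC32 is the hash in use; `pick` is a hypothetical helper.

```ruby
require 'zlib'

# Assumed presharded layout: three servers with five instances each.
redis_hosts = {
  'loush_server1' => [6379, 6380, 6381, 6382, 6383],
  'loush_server2' => [6379, 6380, 6381, 6382, 6383],
  'loush_server3' => [6379, 6380, 6381, 6382, 6383]
}

# Flatten the layout into a fixed list of "host:port" addresses.
instances = redis_hosts.flat_map do |host, ports|
  ports.map { |port| "#{host}:#{port}" }
end

# Pick the instance for a key with the same modulo rule used for hash partitioning.
pick = ->(key) { instances[Zlib.crc32(key) % instances.length] }

puts instances.length
puts pick.call('food')
```

The important property is that the length of `instances` is decided once, up front, and is never supposed to change afterwards.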

Suppose loush_server3 can no longer keep up and needs to be replaced by another server with more capacity, such as loush_server4.

The steps would be as follows:

  1. Launch loush_server4 with the same number of Redis instances as loush_server3. Make each new instance a replica of one of the instances of loush_server3 to avoid losing data.
  2. After the synchronization is done in all the new instances, replace all loush_server3 instances with loush_server4 instances in the Redis host list.
  3. Stop all processes that connect to loush_server3 instances. Promote loush_server4 instances to master instances (the SLAVEOF NO ONE command).
  4. Restart all processes that were previously stopped.
  5. Shut down loush_server3.
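The reason this swap is safe for clients can be sketched in a few lines: the key-to-index mapping depends only on the length of the host list, and step 2 does not change it. The list below is an assumed example (three servers, five ports each), and `index_for` is a hypothetical helper using CRC32.

```ruby
require 'zlib'

# Assumed host list before the swap: three servers with five instances each.
instances = %w[loush_server1 loush_server2 loush_server3].flat_map do |host|
  (6379..6383).map { |port| "#{host}:#{port}" }
end

# Step 2: point every loush_server3 entry at loush_server4, keeping its position.
swapped = instances.map { |address| address.sub(/\Aloush_server3:/, 'loush_server4:') }

# The key-to-index mapping depends only on the list length, which is unchanged.
index_for = ->(key, list) { Zlib.crc32(key) % list.length }

%w[food awesome burger].each do |key|
  index = index_for.call(key, instances)
  puts "#{key}: #{instances[index]} -> #{swapped[index]}"
end
```

Keys that lived on loush_server1 or loush_server2 keep their address, and keys that lived on a loush_server3 instance now resolve to the loush_server4 instance that replicated it.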

The SLAVEOF NO ONE command in Redis turns a replica (slave) into a master. When you run this command, the Redis instance stops replicating data from its former master, keeps the data it has already synchronized, and starts accepting writes on its own. That is exactly what is needed here, because the loush_server4 instances have to take over from loush_server3. Since Redis 5, the same command is also available as REPLICAOF NO ONE.

If you cannot afford to stop all processes at once, set the slaves to be writable (CONFIG SET slave-read-only no), move all clients to the new instances, and then promote a slave to master.

In disaster scenarios, the presharding strategy does not work well. If a set of servers is having problems and needs to be replaced, the only way to keep the cluster balanced is to replace the damaged servers with other servers, because the size of the cluster cannot change by definition. Until that happens, clients will keep trying to connect to the failed servers, and if fresh servers are not brought up immediately, the affected projects may suffer significantly. This is not an elastic method, and as everyone moves to the cloud, elasticity is always desirable.

Another disadvantage of this strategy is that you have a lot more instances to manage and monitor, and there aren’t many effective tools for doing so. Consistent hashing is often used as an alternative to this approach.

🤓 Consistent hashing

Consistent hashing is a technique used to distribute keys across a cluster of Redis servers. It maps both keys and servers onto the same range of hash values and assigns each key to the next server along that range. This allows keys to be spread evenly among the servers, reducing the risk of hot spots and improving performance. When a server is added to or removed from the cluster, only a small number of keys need to be redistributed (on average K/n keys, where K is the number of keys and n is the number of servers), minimizing the impact on the overall system. Consistent hashing is usually implemented on the client side to enable horizontal scaling; Redis Cluster itself uses a related scheme based on 16,384 fixed hash slots rather than consistent hashing.

We already discussed how hash partitioning works. The biggest disadvantage is that adding or removing servers from the list may have an adverse effect on key distribution and creation. When using Redis as a cache system with hash partitioning, scaling becomes extremely difficult because the size of the list of Redis servers cannot change (otherwise, a lot of cache misses will happen).
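The size of the problem is easy to measure. With modulo hashing, a key keeps its partition when going from 3 to 4 servers only if its hash gives the same remainder for both divisors, which happens for 3 out of every 12 hash values, so roughly three quarters of the keys are expected to move. The sketch below counts this for made-up `cache:<n>` keys, using MD5 as an assumed hash function.

```ruby
require 'digest/md5'

# Plain modulo hash partitioning with a hash that is stable across processes.
def partition_index(key, partition_count)
  Digest::MD5.hexdigest(key).to_i(16) % partition_count
end

keys = (1..10_000).map { |i| "cache:#{i}" }

# Count the keys whose partition changes when a fourth server is added.
moved = keys.count { |key| partition_index(key, 3) != partition_index(key, 4) }

puts format('%.1f%% of keys changed partition', 100.0 * moved / keys.length)
```

For a cache, every moved key is a cache miss right after the change, which is why resizing a modulo-partitioned cache is so painful.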

🚘 To put an example in glorious Ruby

require 'zlib'

# Define a ConsistentHash class that maps keys to nodes through slot ranges.
class ConsistentHash
  TOTAL_SLOTS = 16_384 # same slot count as Redis Cluster

  def initialize
    @nodes = []
  end

  # Add a node and its corresponding slots (a Range of integers) to the hash.
  def add_node(node, slots)
    @nodes << { node: node, slots: slots }
  end

  # Remove a node from the hash.
  def remove_node(node)
    @nodes.delete_if { |n| n[:node] == node }
  end

  # Get the node and slot for a given key.
  def get_node_and_slot(key)
    # Use a hash function to map the key to a value in the range of slots.
    # CRC32 is a stand-in here; Redis Cluster itself uses CRC16.
    slot = Zlib.crc32(key) % TOTAL_SLOTS
    # Return the node and slot that correspond to this value.
    owner = @nodes.find { |n| n[:slots].cover?(slot) }
    [owner && owner[:node], slot]
  end
end

# Create a consistent hash object.
hash = ConsistentHash.new

# In a real cluster this list would come from the cluster nodes command;
# the addresses and slot ranges below are made-up examples.
hash.add_node('127.0.0.1:7000', 0..5460)
hash.add_node('127.0.0.1:7001', 5461..10_922)
hash.add_node('127.0.0.1:7002', 10_923..16_383)

# Look up the node and slot for a key.
node, slot = hash.get_node_and_slot('burger')
puts "burger -> slot #{slot} on #{node}"

# Removing a node (cluster forget in a real cluster) leaves its slots unowned
# until they are reassigned. Resharding is done with redis-cli --cluster reshard,
# not with a cluster subcommand.
hash.remove_node('127.0.0.1:7002')
# Update the slots for the destination node in the consistent hash.
hash.add_node('127.0.0.1:7001', 10_923..16_383)
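For comparison, a classic consistent-hash ring can be sketched in plain Ruby without talking to Redis at all. This is a minimal sketch, not a production implementation: the `HashRing` class, the node names, the choice of MD5, and the 100 virtual points per node are all assumptions made for illustration.

```ruby
require 'digest/md5'

# A minimal hash ring. Each node is placed on the ring at several points
# ("virtual nodes") so that keys spread more evenly.
class HashRing
  def initialize(nodes = [], replicas: 100)
    @replicas = replicas
    @ring = {}   # ring position => node name
    @sorted = [] # sorted ring positions
    nodes.each { |node| add(node) }
  end

  def add(node)
    @replicas.times { |i| @ring[position("#{node}-#{i}")] = node }
    @sorted = @ring.keys.sort
  end

  def remove(node)
    @ring.delete_if { |_, owner| owner == node }
    @sorted = @ring.keys.sort
  end

  # Walk clockwise from the key's position to the first node point,
  # wrapping around to the start of the ring if necessary.
  def node_for(key)
    return nil if @sorted.empty?

    pos = position(key)
    index = @sorted.bsearch_index { |point| point >= pos } || 0
    @ring[@sorted[index]]
  end

  private

  def position(value)
    Digest::MD5.hexdigest(value).to_i(16)
  end
end

keys = (1..10_000).map { |i| "cache:#{i}" }
ring = HashRing.new(%w[redis-a redis-b redis-c])
before = keys.to_h { |key| [key, ring.node_for(key)] }

# Add a fourth node and count how many keys now map somewhere else.
ring.add('redis-d')
moved = keys.count { |key| ring.node_for(key) != before[key] }
puts "#{moved} of #{keys.length} keys moved"
```

Every key that moves ends up on the new node, and all other keys stay where they were, which is the property that makes this approach friendlier to caches than plain modulo hashing.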

✅ Advantages of Partitioning.

  1. It enables significantly larger databases by combining the memory of numerous computers. Without partitioning, the amount of memory that a single computer can support is restricted.
  2. It enables the scaling of processing capability to numerous cores and computers, as well as network bandwidth to multiple computers and network adapters.

❌ Disadvantages of Partitioning

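  1. Multi-key operations are generally not supported across instances. For example, you cannot directly intersect two sets if they are stored in keys that are assigned to different Redis instances.
  2. You cannot perform a Redis transaction that involves keys on different instances.
  3. Persistence becomes more complex: every instance has its own RDB/AOF files, so a backup means collecting the persistence files from all instances and hosts.
  4. Adding and removing capacity is difficult, because rebalancing means moving keys between instances.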
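The first two points can be checked on the client before a command is sent. The sketch below assumes plain modulo hash partitioning with CRC32 over three instances; `PARTITION_COUNT`, `partition_of`, and `same_partition?` are hypothetical names.

```ruby
require 'zlib'

PARTITION_COUNT = 3 # assumed number of Redis instances

# Index of the instance that owns a key under modulo hash partitioning.
def partition_of(key)
  Zlib.crc32(key) % PARTITION_COUNT
end

# True only when every given key lives on the same instance, which is the
# precondition for running a multi-key command or a transaction on them.
def same_partition?(*keys)
  keys.map { |key| partition_of(key) }.uniq.length == 1
end

puts same_partition?('set:a', 'set:b')
```

When the check fails, a set intersection has to be done in the client instead, for example by reading both sets with SMEMBERS and intersecting them in Ruby.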

Hope you found this article interesting and fun ❤️
