I’m delighted to announce that today we’ve made our clustering technology even more useful with the public availability of RegEx Sharding. This feature allows you to define exactly how Redis Cloud distributes data between a database’s shards, thereby enabling your application to continue performing multi-key operations at top performance on huge datasets. Our standard and new RegEx sharding policies are immediately available to all our Redis Cloud Pay-as-You-Go subscribers.
Before diving into the details of this announcement, I’d first like to go over the “why” and “what” of a Redis Cluster. Redis, the fastest data store available today, is an open source, in-memory NoSQL database. Redis’ architecture is such that a single Redis server is bound by the hardware of the host that it is running on — specifically that server’s CPU, RAM and network. Being a (mostly) single-threaded process, Redis utilizes only one of the server’s CPU cores. And because it is an in-memory database, all data that a Redis process manages has to fit into the RAM of the server it’s running on. Lastly, the network interface of the server running Redis may also become a bottleneck once saturated with traffic generated by Redis and the application. While a single Redis server can process tens and hundreds of thousands of operations per second, there are cases in which applications need more.
Scaling up a Redis server (vertically) is feasible, to an extent. Sure, you can add more RAM to a server, replace the CPU with a faster model and even use a broader and faster network, but at the end of the day you’ll hit the upper limit of any single server’s hardware. That’s where clustering and sharding can help by allowing you to use multiple CPU cores on each server and beyond.
A Redis cluster is made up of one or more servers, with each server running one or more Redis processes. Each process manages a shared-nothing database instance that’s called a shard. The keys in the clustered database are mapped to hash slots, which in turn are mapped to shards, so that each shard manages a mutually-exclusive subset of the database’s namespace. In a sense, shards are the physical databases and hash slots are an additional layer that facilitates administrative operations such as resharding. By running the shards on multiple servers, a cluster essentially allows you to use more CPU cores, more RAM and more network resources for managing your database.
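To make the slot-to-shard indirection concrete, here is a minimal sketch. It assumes 16384 hash slots (the count used by open source Redis Cluster) and an even, contiguous split of slots across shards. The `build_slot_map` helper is hypothetical and does not reflect how Redis Cloud actually assigns slots.

```python
HASH_SLOTS = 16384  # assumption: slot count used by open source Redis Cluster


def build_slot_map(num_shards: int) -> list[int]:
    """Assign every hash slot to a shard in contiguous, near-equal ranges."""
    return [slot * num_shards // HASH_SLOTS for slot in range(HASH_SLOTS)]


# A key's slot never changes; only the slot-to-shard table does.
four_shards = build_slot_map(4)
eight_shards = build_slot_map(8)

slot = 12000  # an arbitrary hash slot
print(four_shards[slot], eight_shards[slot])
```

Because keys are tied to slots rather than to shards, resharding in this sketch only means rewriting the table and moving the affected slots. The key-to-slot mapping stays untouched.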
Clustering is an effective approach for horizontally scaling your Redis database, as it lets you use a distributed setup. It offers an efficient way to split the memory requirements, processing load and bandwidth that your database requires between multiple servers. There is one catch, however: because Redis clusters are implemented with shared-nothing shards, you can’t execute multi-key operations that span more than one hash slot (e.g. ZUNIONSTORE on several sorted sets). Doing so will trigger an error. In order to execute atomic operations on multiple keys (i.e. single commands that operate on multiple keys, MULTI/EXEC blocks and Lua scripts), you have to ensure that all relevant keys are mapped to the same hash slot (more on that below).
Open source Redis v3 will be all about native clustering support. A few weeks ago, Salvatore Sanfilippo released the first v3 Release Candidate, and it is expected to be production-ready within a few months. Once the open source cluster has stabilized, it will provide all the tools needed for anyone to set up and operate a Redis cluster, and effectively address scalability challenges.
There are, however, other Redis clusters besides the open source implementation. Over the better part of the last two years, we at Redis have been operating our own independently-developed version of a Redis cluster to provide Redis Cloud’s scalability features. Redis’ production-proven clustering technology lets you dynamically scale your Redis databases well beyond the limits of any single server. Some of our customers use clustering to manage TB-scale datasets, whereas others rely on it to sustain massive throughput with sub-millisecond latencies. Like everything in our service, our clustering is dead simple to use and doesn’t require any special effort. Once employed, it is transparent to the application and does not require any code changes or specialized client libraries – you just continue working via a single database endpoint that masks the architecture’s underlying complexities.
Redis Cloud clusters offer a choice between two sharding policies: Standard and RegEx. The standard policy is designed to behave just like the open source Redis Cluster. By using hash tags in your key name (i.e. the ‘{’ and ‘}’ characters), you can precisely specify the part of the key’s name that will be used for hashing. This standard sharding policy will use the substring of the key’s name that’s surrounded by curly brackets to map that key to a hash slot.
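As a rough illustration of hash tags, the sketch below follows the open source Redis Cluster rule: hash only the text between the first ‘{’ and the next ‘}’ if it is non-empty, then take CRC16 modulo 16384. That Redis Cloud's standard policy uses exactly this computation is an assumption based on the "behaves just like open source" description above. The function names and sample keys are made up for the example.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # assumption: same slot count as open source Redis Cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that gets hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: bytes) -> int:
    """Map a key to a hash slot using CRC16 (XMODEM variant)."""
    return crc_hqx(hash_tag(key), 0) % HASH_SLOTS


# Both keys share the tag "user:1000", so they land in the same slot
# and can take part in one multi-key operation.
same_slot = key_slot(b"{user:1000}:followers") == key_slot(b"{user:1000}:following")
print(same_slot)
```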
While this standard sharding policy is a powerful tool, we’ve taken it to the next level by introducing our RegEx sharding policy. This type of sharding allows you to configure a set of regular expression rules that are used to extract the hash tag from the names of the keys. Because you can use multiple rules, and because each rule is a fully-fledged regular expression, this custom sharding policy offers a lot of flexibility, effectively allowing you to implement any pattern matching logic on your key names for extracting hash tags. You can embed a part of your business logic directly into the sharding policy to ensure a perfect fit for your application’s requirements.
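The idea of rule-based tag extraction can be sketched as follows. The rules, the `tag` group name, the first-match-wins ordering and the key naming scheme are all assumptions made for illustration. Python's `(?P<tag>...)` syntax is used here, and the actual rule syntax accepted by Redis Cloud may differ, so check the documentation before relying on any of it.

```python
import re

# Hypothetical rules, tried in order; the first one that matches wins.
RULES = [
    re.compile(r"^(?P<tag>user:\d+):.*$"),  # existing keys such as user:1000:followers
    re.compile(r"^.*\{(?P<tag>.*)\}.*$"),   # keys that already carry a {hash tag}
    re.compile(r"^(?P<tag>.*)$"),           # catch-all: hash the whole key
]


def extract_tag(key: str) -> str:
    """Return the hash tag that the first matching rule extracts from the key."""
    for rule in RULES:
        match = rule.match(key)
        if match:
            return match.group("tag")
    return key  # no rule matched (e.g. the key contains a newline)


for key in ("user:1000:followers", "user:1000:following", "cart:{42}:items"):
    print(key, "->", extract_tag(key))
```

In this sketch, the first rule groups every `user:<id>:*` key under a single tag without renaming any key, which is the kind of fit to an existing naming scheme that a custom policy is meant to provide.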
Custom RegEx sharding policies are especially useful when using clustering with an existing application and dataset, because you can skip both the data migration and the code changes for your new implicit database schema. With custom RegEx sharding, you can “teach” Redis Cloud clusters how your dataset’s key names are constructed and ensure that keys which need to be in the same hash slot (for multi-key operations) are identified correctly. You can read more about our sharding policies and how to use them on our documentation page.
Clustering Redis is easy and fun with Redis Cloud. Our databases can be clustered and scaled to accommodate growth in data volume, throughput and traffic with a click of a button and without any changes to your existing application. Questions? Feedback? Email or tweet me – I’m highly available 🙂