As a provider of the Redis Cloud, we're always testing out different configurations and options for Redis and AWS to help the Redis community achieve the best possible performance. This week, we decided to find out how various AWS platforms and Redis dataset sizes affect fork times.

Redis uses the Linux fork and COW (Copy On Write) mechanisms to generate point-in-time snapshots or rewrite Append Only Files (AOFs) using a background save process. Fork is an expensive operation on most Unix-like systems, since it involves allocating and copying a large number of memory objects (see more details here). Furthermore, fork operations on the Xen platform seem to be much more time consuming than on other virtualization platforms (as discussed here). Since a fork operation runs on the main Redis thread (and the Redis architecture is single-threaded), the longer the fork operation takes, the longer other Redis operations are delayed. If we take into account that Redis can process anywhere between 50K and 100K ops/sec even on a modest machine, a few seconds of delay could mean slowing down hundreds of thousands of operations, which might cause severe stability issues for your application.

This problem is a real-life limitation for Redis users on AWS, because most AWS instances are based on Xen. Our test scenario, in a nutshell, was as follows: we ran Redis (version 2.4.17) on each of the instance types listed in the table below.
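Fork latency can be observed from Redis itself: the INFO command exposes a latest_fork_usec field (in Redis versions that report it) showing how long the last fork blocked the main thread. Here is a minimal sketch, assuming a redis-py client and a reachable instance; the connection details, polling loop, and field-name fallback are illustrative, not part of our benchmark harness:

```python
import time
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="localhost", port=6379)  # placeholder connection details

r.bgsave()  # ask Redis to fork a child process and dump the dataset to an RDB file

# Poll INFO until the background save finishes (the field name differs across Redis versions).
while True:
    info = r.info()
    busy = info.get("rdb_bgsave_in_progress", info.get("bgsave_in_progress", 0))
    if not busy:
        break
    time.sleep(0.1)

# latest_fork_usec reports, in microseconds, how long the last fork() blocked the main thread.
print("last fork took %.3f seconds" % (info["latest_fork_usec"] / 1000000.0))
```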
A more detailed description of the test setup can be found at the end of this post. Here's what we found out about the fork times:
| Instance type | Memory limit | Memory used | Memory usage (%) | Fork time |
|---------------|--------------|-------------|------------------|-----------|
| m1.small | 1.7 GB | 1.22 GB | 71.76% | 0.76 sec |
| m1.large | 7.5 GB | 5.75 GB | 76.67% | 1.98 sec |
| m1.xlarge | 15 GB | 11.46 GB | 76.40% | 3.46 sec |
| m2.xlarge | 34 GB | 24.8 GB | 72.94% | 5.67 sec |
| cc1.4xlarge | 23 GB | 18.4 GB | 80.00% | 0.22 sec |
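To connect these numbers back to the 50K-100K ops/sec figure mentioned above, a quick back-of-the-envelope calculation shows how many requests could be held up by a single fork. The throughput range is an assumption carried over from that earlier estimate, not something we measured per instance:

```python
# Fork times from the table above (seconds); the throughput range is the rough
# 50K-100K ops/sec estimate from earlier, not a measured per-instance figure.
fork_times = {
    "m1.small": 0.76,
    "m1.large": 1.98,
    "m1.xlarge": 3.46,
    "m2.xlarge": 5.67,
    "cc1.4xlarge": 0.22,
}

for instance, fork_sec in sorted(fork_times.items()):
    low, high = int(50000 * fork_sec), int(100000 * fork_sec)
    print("%-12s ~%d-%d requests delayed per fork" % (instance, low, high))
```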
Fork time per GB of memory (derived from the table above): m1.small ≈ 0.62 sec/GB, m1.large ≈ 0.34 sec/GB, m1.xlarge ≈ 0.30 sec/GB, m2.xlarge ≈ 0.23 sec/GB, cc1.4xlarge ≈ 0.012 sec/GB.
As you can see, there is a strong correlation between instance processing power and the execution time of the fork operation. Moreover, the Xen HVM instance (cc1.4xlarge) achieved significantly lower fork latency than the regular Xen hypervisor instances. So, should Redis users on AWS migrate their datasets to Cluster Compute instances? Not necessarily. Here's why:
Fork time and the Redis Cloud

We tested fork times in similar scenarios on the Redis Cloud and compared the results to those of corresponding instances in a do-it-yourself (DIY) approach. This is what we found out:
Fork time per GB of memory:
As you can see here, fork times are significantly lower with the Redis Cloud than with regular Xen hypervisor platforms. Furthermore, the fork time per GB drops as the dataset size grows. Fork times on the Redis Cloud are somewhat higher than those of Xen HVM (the cc1.4xlarge instance): for small datasets of about 1 GB, it is 0.03 seconds versus 0.008 seconds. However, since there is probably a correlation between the size of your dataset and the rate at which your app accesses Redis, the number of delayed requests when using the Redis Cloud should be small. For larger datasets, the Redis Cloud fork time is 0.03 seconds per GB compared to 0.01 seconds per GB for Xen HVM. Nevertheless, we still prefer to use m2.2xlarge and m2.4xlarge instances over cc1.4xlarge and cc2.8xlarge instances in the Redis Cloud, for the following reasons:
How does the Redis Cloud minimize fork time?

The Redis Cloud applies several mechanisms and technologies to maintain high performance:
Conclusions

In this test we validated that the fork process on AWS standard Xen hypervisor instances adds significant latency to Redis operations (and therefore to your application's performance). Fork time per GB improves somewhat on stronger instances, but it is still unacceptable in our opinion. While fork time is practically eliminated when using AWS Cluster Compute instances, these instances are very expensive and are not optimized for Redis operations. The Redis Cloud provides minimal fork times at affordable infrastructure prices, and our fork times drop even further as dataset sizes grow.

Benchmark test setup

For those who want to know more about our benchmark, here are some details on the resources we used:
This was our setup for generating load:
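The exact dataset-population and load-generation commands used in the benchmark aren't reproduced here. As a rough illustration of the general approach, the sketch below fills a Redis instance to an approximate target size and then issues traffic while a background save (and its fork) runs; the endpoint, key count, and value size are placeholders, not our actual settings:

```python
import os
import redis  # assumes the redis-py client is installed

HOST, PORT = "my-redis-host", 6379   # placeholder endpoint
NUM_KEYS = 100000                    # e.g. 100K keys x 10 KB values ~= 1 GB of data
VALUE_SIZE = 10 * 1024

r = redis.Redis(host=HOST, port=PORT)

# Populate the dataset, batching writes in pipelines to keep network round-trips low.
pipe = r.pipeline(transaction=False)
for i in range(NUM_KEYS):
    pipe.set("key:%d" % i, os.urandom(VALUE_SIZE))
    if i % 1000 == 999:
        pipe.execute()
pipe.execute()

# Kick off a background save, then keep the server busy so the fork's impact is visible.
r.bgsave()
for i in range(100000):
    r.get("key:%d" % (i % NUM_KEYS))
```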