A little over 18 months ago we took our first baby steps in what has since proved to be a wild and exciting ride. To the uninitiated outsider, a cloud service provider’s existence might seem dull and uneventful. However, anyone who has tirelessly searched for the one platform tweak to rule them all, who has tracked the elusive flow of the river of Through(put), or who has courageously battled the Hydra Serpent of Many Heads (also known as the Ssssp-lit) will tell a different tale.
Ours is a wonderful and adventurous world in which nothing is certain and there is only one truth: using the private network interface, a.k.a. private DNS, is always preferable. Or so we believed.
We’ve since discovered otherwise, but first let’s take a walk down memory lane… In the far reaches of time, before the Dawn of Virtualization and the Age of the Cloud, servers were physical objects, machines made of parts and pieces working in unison for computation’s sake. As networks sprang to life and grew, a distinct separation was made between private and public. Private networks were characterized by the fact that they used their own address space, were accessible only to servers directly connected to them and were generally localized. The public network, as the name suggests, was global and accessible to anyone.
This basic difference between networks was reflected in the role each played. The public network was used to provide a system’s service to its consumers, whereas private networks were used internally to connect the system’s component servers. Furthermore, given their nature and location, private networks were more often than not considered to be safer, faster, cheaper, more reliable and to have more bandwidth than their public counterpart. And then servers ascended and thus became The Cloud.
In their new-found ethereal existence, servers shed their physical properties only to substitute them with virtualized replacements. No more servers, now there were instances in their place. Gone were actual CPUs and cores, in came compute units and vCPUs. Hard drives disappeared and gave way to ephemeral storage and “network attached persistent storage.” Even network interfaces were stripped of their corporeal essence and replaced by “IP addresses [that] are associated with the instance”. Through it all, we still believed that private IP addresses were preferable for in-system communications and just plain awesome. Over time, experience has taught us differently.
During the development and testing of our service, we spent countless hours trying to define and reproduce a particularly nasty phantom. We were finally able to capture it, and we learned the following: AWS’s private addresses are, at times, worse than public addresses!
Once identified, we were able to easily and repeatedly demonstrate this behavior. The above graph is a live recorded representation of the time it takes a simple application to connect to one of our Redis Cloud instances. The application uses two threads, one connecting to the public address and the other to the private one. Once both connections are made, the application writes the time it took each connection to complete into that instance and is restarted.
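For readers who want to try something similar, here is a minimal sketch of such a client in Python. It is not the application we used, only an approximation under a few assumptions: the function names `time_connect` and `measure_endpoints` are made up for illustration, "connecting" is taken to mean opening a plain TCP connection, and the step that writes the timings back into the Redis instance is left out.

```python
import socket
import threading
import time


def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and performs the TCP handshake,
        # so the measurement includes DNS lookup time as well.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


def measure_endpoints(endpoints, timeout=5.0):
    """Time one connection per endpoint, each in its own thread.

    `endpoints` maps a label (for example "public") to a (host, port) pair.
    Returns a dict mapping each label to its connect time in seconds, or None.
    """
    results = {}

    def worker(label, host, port):
        results[label] = time_connect(host, port, timeout)

    threads = [
        threading.Thread(target=worker, args=(label, host, port))
        for label, (host, port) in endpoints.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `measure_endpoints({"public": (public_host, 6379), "private": (private_host, 6379)})`, where `public_host` and `private_host` are placeholders for the two DNS names of your own instance, returns one timing per address. Restarting the script between runs mimics the restart of our test client.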
Note how, after each restart of the client, connecting to the private address in most cases takes orders of magnitude longer than connecting to the public address, and only rarely is it as fast. While these symptoms are easily demonstrated, their causes are harder to explain. We suspect this unwelcome conduct is at least partially caused by the extra protection that the private interface is given. Since AWS uses software-based firewalls to protect private instances, those firewalls can quickly become burdened by having too many rules to enforce. That load isn’t necessarily generated by a specific EC2 instance (in our tests we used fewer than 10 Security Groups), but rather by the many EC2 instances that share the same firewall in the AWS network.
Once overloaded, the firewalls may introduce delays and even block connection attempts. This phenomenon, as shown above, is reliably reproduced when the application runs on a PaaS. Interestingly, it is much rarer with independent EC2 users.
Therefore, our conclusion and standing recommendation to PaaS users is to avoid the private interface in favor of the public one. The latter may add up to a millisecond of latency, but at least it performs more consistently than its counterpart. So there you have it: another day, another myth busted.