In the effort to run, run, run, you don’t want to make the $460 million mistake that Knight Capital made back in 2012. This single-day computer system failure at a leading financial market-maker offers several lessons for the broader IT community, including the critical importance of how your system components are designed, implemented and operated. In this two-part blog, I’ll share some ideas to help development teams keep their continuous integration and continuous deployment (CI/CD) processes foolproof. In particular, I’ll show how you can manage continuous updates by using feature toggles and feature context to dictate code routing, store log data for easy access and create an error database with fast lookups, all with the help of Redis.
Imagine you are a director of engineering managing a team of several developers responsible for the front end of a web app with thousands of concurrent users. Your app is deployed on AWS and you push weekly updates. The business cannot afford any disruption to the web app, so if an error occurs, your team has to roll back its latest update instantly.
You have to identify the culprit code quickly, have the appropriate developer fix it and make the change part of a subsequent release. Meanwhile, the product team is always asking for new features to be made generally available as soon as possible. So how can you react to errors swiftly and deploy feature requests safely at the speed the business demands?
At the 2019 Game Developers Conference (GDC), I attended a session that described a well-thought-out process for performing weekly software releases reliably. The session was titled “Debugging in the Large: Cross-Platform Stability at 70M+ Monthly Active Users” and it was co-presented by Chris Swiedler from Roblox, a Redis customer. Chris shared an interesting insight into how his team at Roblox modifies application behavior without changing code when they run into production issues. They use feature flags, which are very similar to Martin Fowler’s “feature toggle” approach.
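To make the idea concrete, here is a minimal sketch of a feature flag check in Python. It is not the Roblox implementation: the `feature:` key prefix, the `on` value, the `new_checkout` flag and the `checkout` function are all hypothetical names chosen for illustration. An in-memory stand-in replaces the Redis client so the sketch runs without a server; a redis-py `Redis` client exposes `get` and `set` with the same call shape.

```python
class InMemoryStore:
    """Stand-in for a Redis client so this sketch runs without a server."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        # Like Redis GET, return None when the key does not exist.
        return self._data.get(name)

    def set(self, name, value):
        self._data[name] = value
        return True


def is_enabled(store, flag, default=False):
    """Return True when the flag is stored as "on" (hypothetical convention)."""
    raw = store.get(f"feature:{flag}")
    if raw is None:
        return default
    if isinstance(raw, bytes):
        # A real Redis client returns bytes unless configured to decode replies.
        raw = raw.decode()
    return raw == "on"


def checkout(store):
    """Route between two code paths based on the flag, with no redeploy."""
    if is_enabled(store, "new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


store = InMemoryStore()
print(checkout(store))                      # flag absent: legacy path
store.set("feature:new_checkout", "on")
print(checkout(store))                      # flag on: new path
store.set("feature:new_checkout", "off")
print(checkout(store))                      # flag off again: legacy path
```

Because the flag lives in the data store rather than in the code, turning a misbehaving feature off is a single write instead of a new release.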
Let’s break down Figure 2, which outlines an approach that could be part of your CI/CD and triage process.
This strategy can be helpful for:
But this approach can be taken one step further to help distributed development teams release new features safely and roll them back when required with minimal impact.
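One way to picture this, as a sketch under my own assumptions rather than a definitive design: store each toggle together with a little context, such as the owning developer and the release it shipped in, so that a rollback both disables the feature and tells you whom to ask for a fix. The field names (`enabled`, `owner`, `release`), the key layout and the helper functions below are hypothetical. The in-memory class mirrors the call shapes of redis-py’s `hset(name, mapping=...)` and `hgetall(name)`.

```python
class InMemoryHashStore:
    """Stand-in for a Redis client; mirrors redis-py's hset/hgetall call shapes."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, mapping):
        # Create the hash if needed, then set or overwrite the given fields.
        self._hashes.setdefault(name, {}).update(mapping)

    def hgetall(self, name):
        # Like Redis HGETALL, return an empty dict for a missing key.
        return dict(self._hashes.get(name, {}))


def register_feature(store, feature, owner, release):
    """Record a feature as enabled, along with its context (hypothetical fields)."""
    store.hset(
        f"feature:{feature}",
        mapping={"enabled": "1", "owner": owner, "release": release},
    )


def is_enabled(store, feature):
    # With a real client this assumes string replies (decode_responses=True).
    return store.hgetall(f"feature:{feature}").get("enabled") == "1"


def roll_back(store, feature):
    """Disable the feature and return its owner, or None if none is recorded."""
    key = f"feature:{feature}"
    store.hset(key, mapping={"enabled": "0"})
    return store.hgetall(key).get("owner")


store = InMemoryHashStore()
register_feature(store, "new_search", owner="dev-alice", release="week-14")
print(is_enabled(store, "new_search"))
print(roll_back(store, "new_search"))
print(is_enabled(store, "new_search"))
```

Keeping the context next to the toggle means the rollback touches one feature only, leaving the rest of the weekly release in place.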
Redis Enterprise fits the bill when you need a fast, persistent database. Its capabilities include:
In the next installment of this series, I’ll offer more details and code snippets to show specifically how feature toggles, feature context, error databases and log databases built with Redis can make your CI/CD triage process more effective and efficient.