Service Reliability Math that Every Engineer Should Know

Aug 8, 2021
Uptime       Max downtime per year
99.00000%    3d 15h 39m
99.90000%    8h 45m 56s
99.99000%    52m 35s
99.99900%    5m 15s
99.99990%    31s
99.99999%    3s

For a service to be up 99.99999% of the time, it can be down for at most about 3 seconds per year. Achieving that is an arduous task, even for the most experienced site reliability engineering teams.
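
The figures in the table come from simple arithmetic: multiply the fraction of time a service may be down (1 - uptime) by the number of seconds in a year. This is a minimal sketch, assuming a 365.2425-day year and truncation to whole seconds, which reproduces the table's values; `yearly_downtime` and `format_duration` are illustrative names, not part of any library.

```python
# Sketch: reproduce the yearly downtime table.
# Assumption: a year is 365.2425 days (Gregorian average) and results are
# truncated (not rounded) to whole seconds.

SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # 31,556,952 seconds


def yearly_downtime(uptime_percent: float) -> float:
    """Maximum downtime in seconds per year at the given uptime percentage."""
    return (1 - uptime_percent / 100) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Format seconds as 'Xd Yh Zm Ws', dropping leading zero units."""
    total = int(seconds)  # truncate toward zero
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    # Skip leading zero units so 31 seconds prints as "31s".
    while len(parts) > 1 and parts[0][0] == 0:
        parts.pop(0)
    return " ".join(f"{value}{unit}" for value, unit in parts)


if __name__ == "__main__":
    for uptime in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        print(f"{uptime:.5f}%  {format_duration(yearly_downtime(uptime))}")
```

Every row matches the table; the 99% row additionally shows the 29 leftover seconds (3d 15h 39m 29s) that the table omits.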

Visualizing service uptime is essential for every kind of engineer. Know what your service can realistically deliver, and know what your customers actually require. Each extra "9" is just one more digit, yet it divides the allowed downtime by ten, and the cost of achieving it grows exponentially.
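
The number of 9s is the negative base-10 logarithm of the fraction of time a service is down, so each additional 9 shrinks the downtime budget tenfold. A minimal sketch, again assuming a 365.2425-day year; `nines` and `downtime_budget` are hypothetical helper names used only for illustration.

```python
import math

# Assumption: a year is 365.2425 days (Gregorian average).
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def nines(uptime_percent: float) -> float:
    """Count the 9s in an uptime figure: -log10 of the fraction of time down.

    Undefined for exactly 100% uptime (log10(0) raises ValueError).
    """
    return -math.log10(1 - uptime_percent / 100)


def downtime_budget(n_nines: int) -> float:
    """Yearly downtime allowed, in seconds, at exactly n nines (3 -> 99.9%)."""
    return 10.0 ** -n_nines * SECONDS_PER_YEAR


if __name__ == "__main__":
    # The table above runs from two nines (99%) to seven nines (99.99999%).
    for n in range(2, 8):
        budget = downtime_budget(n)
        ratio = downtime_budget(n - 1) / budget  # each step is a factor of 10
        print(f"{n} nines: {budget:>12.1f} s/year ({ratio:.0f}x less than {n - 1} nines)")
```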

For the last 90 days, Stripe's API has had 99.999% uptime, or five 9s, which is a gold standard for many companies. Service-level agreements more often measure downtime per quarter or over a rolling window than per year. That gives a bit more leeway in how downtime is counted, but the orders of magnitude stay the same: five 9s over 90 days still allows only about 78 seconds of downtime. Some SLAs even exclude planned maintenance from the downtime calculation.
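
The effect of the measurement window and of excluding planned maintenance can be sketched in a few lines. This sketch assumes a 90-day window and simply drops planned outages from the downtime total when asked to; real SLAs define exclusions in their own terms (some also shrink the measured window), and the `Outage` record, the helper names, and the example outages are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Outage:
    """Hypothetical outage record: how long it lasted and whether it was planned."""
    duration: timedelta
    planned: bool = False


def window_budget(uptime_percent: float, window: timedelta) -> timedelta:
    """Maximum downtime allowed within the window at the given uptime target."""
    return window * (1 - uptime_percent / 100)


def measured_uptime(outages, window: timedelta, exclude_planned: bool = False) -> float:
    """Uptime percentage over the window.

    With exclude_planned=True, planned outages are not counted as downtime
    (one common convention; the window itself is left unchanged here).
    """
    down = sum(
        (o.duration for o in outages if not (exclude_planned and o.planned)),
        timedelta(),
    )
    return 100 * (1 - down / window)


window = timedelta(days=90)
# Hypothetical quarter: one minute of unplanned downtime plus a 2-hour maintenance window.
outages = [Outage(timedelta(seconds=60)), Outage(timedelta(hours=2), planned=True)]

if __name__ == "__main__":
    print(window_budget(99.999, window))  # roughly 78 seconds
    print(f"{measured_uptime(outages, window):.4f}%")                        # all downtime counted
    print(f"{measured_uptime(outages, window, exclude_planned=True):.4f}%")  # maintenance excluded
```

With these made-up numbers, the same quarter reads as roughly three 9s when maintenance counts and better than five 9s when it is excluded, which is why the exclusion rules in an SLA matter as much as the headline number.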

I originally posted this on Twitter, and the response was overwhelming. Follow me there for more engineering snippets like this.