Availability, or Uptime, is one of the most important metrics for web performance monitoring. However, it is often misunderstood and defined incorrectly.
Availability is simply the percentage of time your site, application, or service works successfully. The difficulty lies in deciding what counts as “success” and what counts as “failure”. In the case of an image request from a webserver, availability is clear: if the server responds with the image it is a “success”, and anything else is a failure. For a webpage, on the other hand, success can have a more complicated meaning.
Imagine you build a simple webpage that uses the jQuery library for a slideshow. You host the webpage at a web hosting company and you decide to host the jQuery library on a CDN provider. Your web hosting company promises 99.7% uptime (these promises are made in a ‘Service Level Agreement’ or SLA), and the CDN also promises 99.7% uptime, so it seems that using the CDN will have no impact on reliability.
Not so fast! Let’s take a close look at what happens when either vendor fails. If your web hosting company is down, the webpage is unavailable, an obvious failure. If the CDN is down or it fails to deliver the jQuery library, your webpage is still reachable, but the slideshow will not work, and the webpage might be slow to render (browsers can wait several seconds or longer for a response from a server before giving up on the request). In the eyes of an end user the webpage failed, therefore it is not available!
The webpage is truly available only when both providers are available. Assuming the CDN and the web host fail independently of each other, the True Availability of the webpage is 99.7% * 99.7% ≈ 99.4%, which means that about 0.6% of the time users will not be able to use your web page!
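To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The function name `combined_availability` and the constant `HOURS_PER_YEAR` are my own, used only for illustration, and the sketch assumes that the components fail independently and that the page needs all of them to be up.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def combined_availability(availabilities):
    """Availability of a chain of independent components that must ALL be up.

    Each value is a fraction between 0 and 1 (0.997 means 99.7%).
    """
    return prod(availabilities)


# Web host and CDN, each promising 99.7% uptime in their SLA.
page = combined_availability([0.997, 0.997])
downtime_hours = (1 - page) * HOURS_PER_YEAR

print(f"True availability: {page:.2%}")
print(f"Expected downtime: {downtime_hours:.1f} hours per year")
```

Passing a longer list to the same function shows how the figure keeps shrinking as more vendors are added to the chain.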
You can clearly see that there is a tradeoff between complexity and availability. In fact, each time you introduce a new link into the chain of events to serve a web page, the availability will go down.
So far it all seems nice and precise, and most likely you are thinking of determining the availability of your website by taking all the SLAs and multiplying their individual availabilities. It is not that simple!
Let’s move for a moment from the world of mathematics into the world of human usability, perceived quality and customer satisfaction. Most webpages do not require every request they reference to load properly. As a matter of fact, for quite a few of these requests the user might not even notice whether they loaded or not. For example, if Google Analytics tags are at the end of the page and they fail to load or to post the data collected, they will not impact the usability of the page. Therefore, which hosts impact your availability will change from page to page and company to company, depending on what each considers “unusable”.
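One way to picture this is to mark each dependency as critical or not and multiply only the critical ones. The sketch below is illustrative rather than a recommended method: the `dependencies` list, the `critical` flags and the 99.5% figure for the analytics host are assumptions I made up for the example, and it again assumes the hosts fail independently.

```python
from math import prod

# Hypothetical dependencies of a page. The availability figures and the
# "critical" flags are illustrative assumptions, not measured values.
dependencies = [
    {"host": "web host", "availability": 0.997, "critical": True},
    {"host": "CDN (jQuery)", "availability": 0.997, "critical": True},
    {"host": "analytics", "availability": 0.995, "critical": False},
]


def strict_availability(deps):
    """Every host counts: any failed request makes the page 'unavailable'."""
    return prod(d["availability"] for d in deps)


def perceived_availability(deps):
    """Only hosts whose failure makes the page unusable are counted."""
    return prod(d["availability"] for d in deps if d["critical"])


print(f"Strict:    {strict_availability(dependencies):.2%}")
print(f"Perceived: {perceived_availability(dependencies):.2%}")
```

Which hosts get flagged as critical is a judgment call for each page, which is exactly why two companies can report different availability for similar setups.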
You probably want the benefits of having a faster website by using more hosts or a CDN, and you may want to have lower operational costs by outsourcing certain tasks to other vendors who can leverage economies of scale. You might also be getting revenue from vendors such as an ad network or content partner. So how can you deal with problems caused by third party vendors?
We suggest the following:
Unfortunately, you cannot move all content to the bottom of the page, so you need to weigh the value of the service provided by each third party and work out whether it is worth the risk!
I hope this post sheds some light on the meaning of true availability and how to deal with third party content.