Everything is a Distributed System: Client-Side Consistency During Website Deployments

July 2023


Website deployments run concurrently with visitors' browsers.

For some time now I have been interested in the following problem:

[Sequence diagram, in order of events]
  Visitor   -> Server: GET index.html
  Developer -> Server: Deploy index.html, style.css
  Visitor   -> Server: GET style.css

In the scenario above, a visitor gets an inconsistent view of the website. They load an old version of index.html from before the deployment and a new version of style.css from after the deployment. Depending on how the website layout was changed, this could result in a totally broken experience.

I call this the client-side consistency problem. It is a textbook example of a distributed systems problem.

The problem is even worse if the website relies on scripts. New scripts might contain code that makes wrong assumptions about the old version of index.html. If there is more than one script, they might be downloaded in any order, with a mix of scripts coming from before and after the deployment.

Atomic deployments don't solve the problem. In the scenario above, the deployment was atomic! Both files got deployed at exactly the same instant.
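
To make that concrete, here is a minimal sketch of the first scenario. The Server class and the file contents are hypothetical stand-ins, not a real web server; the only point is that the deployment is a single atomic swap and the visitor still ends up with a mixed view.

```python
class Server:
    """Toy server whose whole file set is replaced in one step."""

    def __init__(self, files):
        self.files = dict(files)

    def deploy(self, files):
        # Atomic: one reference swap replaces every file at once.
        self.files = dict(files)

    def get(self, path):
        return self.files[path]


V1 = {"index.html": "html-v1", "style.css": "css-v1"}
V2 = {"index.html": "html-v2", "style.css": "css-v2"}

server = Server(V1)
html = server.get("index.html")  # visitor starts loading the page
server.deploy(V2)                # developer deploys, atomically
css = server.get("style.css")    # visitor fetches the stylesheet

print(html, css)  # prints: html-v1 css-v2
```

The server is never in a half-deployed state, yet the visitor holds a combination of files that was never deployed together. Atomicity on the server says nothing about the sequence of requests made by one client.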

Caching Makes the Problem Worse

It's easy to think that the problem is unimportant because it is low-probability. After all, a visitor has to get unlucky enough to load a page during a deployment for the problem to affect them.

However, that's the wrong way to think about it. A better way to think about it is to conservatively assume that every visitor who comes to your site during a deployment will get a broken copy of the site. Framed in those terms, deployments suddenly seem very dangerous!

To make matters worse, visitors don't have to visit your site during a deployment to be affected. Depending on how their browsers and your server are configured, this sequence of events could be possible:

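[Sequence diagram, in order of events]
  Visitor   -> Server: GET index.html
  Visitor   -> Server: GET style.css
  Developer -> Server: Deploy index.html, style.css
  Visitor   -> Server: GET index.html
  Visitor: use cached style.css (no request)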

In this second scenario, a visitor gets a broken version of your site because their browser cached the style.css file from an old version of your site. After loading the new index.html from after the deployment, they again have a broken experience even though their page load didn't overlap with a deployment.
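
The second scenario can be sketched the same way. The Browser class below is a hypothetical toy that assumes the worst-case configuration: index.html is always fetched fresh, while style.css is cached by URL and never revalidated. Real browsers and servers have more nuanced caching behavior.

```python
class Browser:
    """Toy browser: caches the stylesheet by URL and never revalidates it."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def load_page(self):
        html = self.server["index.html"]  # assumed to be fetched fresh every time
        if "style.css" not in self.cache:
            self.cache["style.css"] = self.server["style.css"]
        return html, self.cache["style.css"]


# The server is modeled as a plain dict from path to content.
server = {"index.html": "html-v1", "style.css": "css-v1"}
browser = Browser(server)

first = browser.load_page()
server.update({"index.html": "html-v2", "style.css": "css-v2"})  # deployment
second = browser.load_page()

print(first)   # prints: ('html-v1', 'css-v1')
print(second)  # prints: ('html-v2', 'css-v1')
```

The second page load happens entirely after the deployment, and it is still inconsistent, because the cache key (the URL style.css) did not change when the content did.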

A Technique for Deployments that Ensure Client-Side Consistency

I have some reason to believe that a lot of big companies have independently identified and worked around this problem.

Specific technologies vary, but the general technique works like this:

  1. Assign versioned URLs to assets; style.css becomes style__v1.css, style__v2.css, etc.
  2. Retain old versions of assets for "a while".

That rules out the first scenario because the client's old version of index.html will still point to the correct, old stylesheet (style__v1.css):

[Sequence diagram, in order of events]
  Visitor   -> Server: GET index.html
  Developer -> Server: Deploy index.html, style__v2.css
  Visitor   -> Server: GET style__v1.css

And it rules out the second scenario because the new version of index.html will point to style__v2.css, which the visitor's browser has not yet cached:

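[Sequence diagram, in order of events]
  Visitor   -> Server: GET index.html
  Visitor   -> Server: GET style__v1.css
  Developer -> Server: Deploy index.html, style__v2.css
  Visitor   -> Server: GET index.html
  Visitor   -> Server: GET style__v2.css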
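
The two steps of the technique can be sketched as a small deploy function. This is an illustration under stated assumptions, not a description of any particular company's tooling: versioned_name and deploy are hypothetical helpers, asset names are flat file names, and references in index.html are rewritten with a naive string replacement where a real build tool would parse the HTML.

```python
import tempfile
from pathlib import Path


def versioned_name(name: str, version: int) -> str:
    """Map a flat asset name to its versioned form: style.css -> style__v2.css."""
    p = Path(name)
    return f"{p.stem}__v{version}{p.suffix}"


def deploy(root: Path, version: int, html: str, assets: dict[str, str]) -> None:
    """Publish one version without touching assets from earlier versions."""
    # Step 1: write every asset under a versioned name.
    # Step 2 is implicit: nothing here deletes or overwrites older versions.
    for name, content in assets.items():
        new_name = versioned_name(name, version)
        (root / new_name).write_text(content)
        html = html.replace(name, new_name)  # naive rewrite (assumption)
    # Write index.html last, after the assets it references exist.
    (root / "index.html").write_text(html)


TEMPLATE = '<link rel="stylesheet" href="style.css">'

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    deploy(root, 1, TEMPLATE, {"style.css": "body { color: black }"})
    deploy(root, 2, TEMPLATE, {"style.css": "body { color: navy }"})
    files = sorted(p.name for p in root.iterdir())
    index = (root / "index.html").read_text()

print(files)  # prints: ['index.html', 'style__v1.css', 'style__v2.css']
print(index)  # prints: <link rel="stylesheet" href="style__v2.css">
```

After the second deploy, style__v1.css is still on disk for any visitor holding the old index.html, and the new index.html names a URL that no browser can have cached yet.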

Eventually, old assets like style__v1.css will have to be removed from the server. This is akin to garbage collection, and it is notoriously difficult in distributed systems. Ideally we would somehow "fence out" old clients so their browsers would never use a version of index.html that references style__v1.css... but there is no practical way to implement such a fence.

Instead, we have to settle for a carefully chosen retention time. The retention time has to be long enough that we can be virtually certain no client is still using an old version of index.html that references style__v1.css. The correct retention time depends on a lot of details about how the website is configured, but since disk space is cheap, it can probably be very long, even up to a year or several years.
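
A retention policy along these lines could be sketched as follows. Everything here is an assumption made for illustration: the expired_assets helper, the one-year RETENTION value, and in particular the choice to start the clock when an asset is superseded (when the current index.html stops referencing it) rather than when it was first deployed.

```python
DAY = 24 * 60 * 60
RETENTION = 365 * DAY  # assumed retention time: one year, in seconds


def expired_assets(retired_at: dict[str, float], retention: float, now: float) -> list[str]:
    """Return assets whose retention time has fully elapsed since they were superseded.

    retired_at maps an asset name to the time (in seconds) at which the
    current index.html stopped referencing it. Assets that are still
    referenced never appear in the mapping, so they can never expire.
    """
    return sorted(name for name, t in retired_at.items() if now - t >= retention)


retired_at = {
    "style__v1.css": 0 * DAY,    # superseded on day 0
    "style__v2.css": 300 * DAY,  # superseded on day 300
}

print(expired_assets(retired_at, RETENTION, now=400 * DAY))  # prints: ['style__v1.css']
```

Measuring from the moment of supersession matters: an asset that was live for two years and replaced yesterday may still be referenced by a copy of index.html served yesterday, so its age since first deployment says nothing about whether it is safe to delete.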
