Static site hosting hurdles

by Ciprian Dorin Craciun (https://volution.ro/ciprian) on 

When it comes to static sites, there are a myriad of solutions for authoring and compiling, but talk about hosting these static sites, and we are still in the early 2000s. I discuss the challenges one faces when hosting, and even make a proposal to solve some of these.

Context

It seems that static sites are (for some time actually) back from the dead and here to stay for good (or until the next lap on the fashion treadmill).

For what a "static site" is I would usually point to Wikipedia, but this time, on both topics of static sites and generators, the content is so out-of-date that it's actually funny; see the static web page and static site generators articles. Thus, I'll have to resort to linking some marketing blogs for a more elaborate explanation: What is a static site generator? (from Cloudflare), and that's it! I would have loved to point to something from Netlify (but that blog is full of listicles) or from GitHub Pages (but that documentation already assumes one knows what one is doing)... Thus, all I can do is point to a search for what is a static website on DuckDuckGo... (If anyone has a good neutral introduction on the topic please send me a link.)

Anyway, in technical terms a static site is just a bunch of files on the file-system that are served with a simple HTTP server which when loaded via a browser present the user with something one can read (or listen to if accessibility is taken into consideration).
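To make "a bunch of files served with a simple HTTP server" concrete, here is a minimal sketch that uses only Python's standard library. The helper names `make_server` and `serve_in_background` are made up for this illustration, and this is a toy for local experiments, not a suggestion for production hosting.

```python
import functools
import http.server
import threading


def make_server(root, host="127.0.0.1", port=0):
    """Create an HTTP server for the files under `root` (port 0 picks a free port)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_in_background(server):
    """Run `server` in a daemon thread and return its base URL."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, port = server.server_address[:2]
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Serve the current directory until interrupted with Ctrl-C.
    server = make_server(".", port=8080)
    print("serving on http://127.0.0.1:8080")
    server.serve_forever()
```

Everything beyond this (TLS, caching headers, compression, redirects, security headers) is left to whoever deploys it, which is exactly the topic of the rest of this article.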

Closely related to static sites is the concept of Jamstack which, although it has the same low-level technical underpinnings, serves a completely different purpose. While "static sites" usually mean sites meant for consumption as reading material (thus they consist mostly of HTML for content, CSS for styling, and a pinch of JavaScript for eye-candy), "jamstack sites" usually mean web applications meant for consumption through interaction (thus they consist mostly of JavaScript for logic, CSS for styling, and a pinch of HTML for bootstrapping everything). It must also be said that some sites meant for reading are actually served as web applications, if for no other reason than that there is never enough blinking-flying-dancing eye-candy in a site; oh, and ads!

As for examples, the very site you are reading is such a static site! (There are countless out there, just search on the internet.)

A small overlooked detail

In this article I don't want to focus on what a static site (or generator) is, but on an ancillary topic that is most often overlooked, but which is essential to the static site.

So far, if we ignore the topic of static site generators, all seems so simple it feels like we are back in the early 2000s. We have our HTML, CSS, and JS files somewhere on the file-system (hopefully in a version control system like Git) and all we need to do is serve them to our users.

Which leads me to that detail I say is often overlooked:

How to serve a static site?

The short but surprising answer is that although we imagine and operate as if we actually were in the early 2000s, that's far from the real requirements in 2022...

A few hosting solutions

Depending on what era one has started playing with computers the answer might range from:

Looking at the above list one surely starts to see the problem: sure, static sites seem simple enough, but when it comes to serving (and as said I'll ignore for the moment the generation) things quickly become complicated.

As an example, here are a few recent articles on how to serve a static site:

So, we (as an industry) have managed to streamline the authoring and compilation of static sites -- usually by employing a workflow that relies on Markdown and generators -- but we have yet to streamline a way to serve these sites, especially in the self-hosted scenario. Although in the cloud-hosted scenario things certainly look more streamlined -- because most of the magic happens inside their own infrastructure -- there sure are inconsistencies and issues; for example as discovered in this article about trailing slashes on URLs.

The deployment checklist

Thus, after authoring the content and generating the HTML, CSS, JS and other asset files, one needs to actually serve these to the users. And based on the previous hosting options, one has to think (at least some) of the following tasks:

That's simple enough, right? :)

When simple doesn't hold

The point I'm trying to make is that although static sites seem a simple enough concept (leaving aside that generating one, i.e. compiling it from the actual sources, isn't), when it comes to hosting in 2022 it's as much of a hurdle as writing and running a proper web application server.

Moreover, many of these options can't be configured in one place -- some are configured in the HTTP server, some in the CDN, etc. -- and certainly can't be diff-ed, tracked or even easily tested.

As discovered in the trailing slash article, even moving from one cloud-hosted provider to another can break one's site if one expects URLs to look a certain way.

Let's set all this aside for the moment and think about the workflow up to the moment of deployment. One can easily track (and diff) and snapshot the content source (given that most likely it's a bunch of Markdown files); one can easily do the same for the assets and generator configuration; one can even do the same for the resulting generated files (although diff-ing doesn't work so well due to minification). But once the site is ready to deploy, one doesn't actually know what one will get without testing whether the responses one gets are the responses one expects (let alone being sure they'll be the same two years from now).
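To illustrate what "testing whether the responses one gets are the responses one expects" could look like, here is a small sketch using only Python's standard library. The expectation format (a dictionary with `status` and `headers`) and the function names `fetch` and `mismatches` are assumptions made for this example, not an existing tool.

```python
import http.client
from urllib.parse import urlsplit


def fetch(url):
    """GET `url` without following redirects; return (status, lower-cased headers)."""
    parts = urlsplit(url)
    https = parts.scheme == "https"
    connection_class = http.client.HTTPSConnection if https else http.client.HTTPConnection
    connection = connection_class(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        connection.request("GET", target)
        response = connection.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers
    finally:
        connection.close()


def mismatches(expected, status, headers):
    """Compare one response with an expectation; return a list of problems."""
    problems = []
    if status != expected["status"]:
        problems.append(f"status: expected {expected['status']}, got {status}")
    for name, value in expected.get("headers", {}).items():
        actual = headers.get(name.lower())
        if actual != value:
            problems.append(f"{name}: expected {value!r}, got {actual!r}")
    return problems
```

One would call `mismatches(expectation, *fetch(url))` for every URL that matters, after every deployment and after every change of provider. The catch is that this only works against a live server, so the check happens after the fact.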

A better alternative

However, what has been described above doesn't have to be so unreliable. What if we add yet another step to the static site generation:

Generate the complete HTTP resource set!

No, I'm not mad, what I'm proposing is to:

"Wasteful this is!" I hear you shout: we are wasting storage space for the same repetitive headers, wasting storage for multiple variants differing only in compression, wasting CPU to compress resources that might never be accessed.

Fortunately that's not the case. With a smart enough storage format all of these could be solved:

Thus, what we are left with is a compiled HTTP resource set that we should be able to track, diff (with the right tool), snapshot and test offline.
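As a rough illustration of such a compiled resource set, the sketch below walks a directory, records the status and headers for every URL, pre-compresses each body, and stores header sets and bodies only once, keyed by their SHA-256 digest. The in-memory layout and the names `compile_site` and `diff_sets` are assumptions made for this example; this is not Kawipiko's actual archive format.

```python
import gzip
import hashlib
import json
import mimetypes
from pathlib import Path


def _store(table, payload):
    """Keep `payload` (bytes) once, keyed by its SHA-256 digest; return the digest."""
    key = hashlib.sha256(payload).hexdigest()
    table.setdefault(key, payload)
    return key


def compile_site(root):
    """Turn the files under `root` into a self-describing HTTP resource set."""
    root = Path(root)
    headers, bodies, resources = {}, {}, {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        url = "/" + path.relative_to(root).as_posix()
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        raw = path.read_bytes()
        variants = {"identity": raw}
        packed = gzip.compress(raw, mtime=0)  # mtime=0 keeps the bytes reproducible
        if len(packed) < len(raw):  # keep a compressed variant only when it helps
            variants["gzip"] = packed
        resources[url] = {}
        for encoding, body in variants.items():
            # Content-Length is left out on purpose: it can be derived from the
            # body, and omitting it lets many resources share one header set.
            fields = {"Content-Type": content_type}
            if encoding != "identity":
                fields["Content-Encoding"] = encoding
            header_key = _store(headers, json.dumps(fields, sort_keys=True).encode())
            body_key = _store(bodies, body)
            resources[url][encoding] = {"status": 200, "headers": header_key, "body": body_key}
    return {"resources": resources, "headers": headers, "bodies": bodies}


def diff_sets(old, new):
    """List the URLs added, removed, or changed between two compiled sets."""
    before, after = old["resources"], new["resources"]
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(u for u in before.keys() & after.keys() if before[u] != after[u]),
    }
```

Because every resource entry only points at digests, two files with identical content cost the storage of one, thousands of pages can share a handful of header sets, and comparing two snapshots is a matter of comparing small dictionaries rather than minified files.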

Once one is ready for deployment, the hosting check-list is reduced only to these few items:

In fact this isn't even an original idea as there already exists Kiwix, although it has a different use-case, that of bundling reading material for offline or disconnected use (including a snapshot of Wikipedia).

Putting it in practice

Remember when I said that I have my own way of serving static sites?

Well, I've just implemented some of the above in my own static site "archiver" and "server": Kawipiko. (That link also has the documentation about installation and usage, plus some implementation details and benchmarks.)

Want to see it in action?

Not only is it secure (setting aside security issues that are found in the few dependencies it relies upon) but it's also extremely fast, at least on par with tuned Nginx and even better for some use-cases. How fast? ~70K to ~100K requests per second on my old laptop.

With regard to storage, the demo site is composed of ~350K resources, totaling ~4.5 GiB, which when compressed results in a single ~1.8 GiB archive. Besides the actual files, that archive also contains ~350K redirects (with/without slash). Regarding headers, there are ~30 unique header sets for all the files, plus one for each redirect.
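As for the "with/without slash" redirects, one plausible way to decide them at build time (an assumption made for illustration, not necessarily what Kawipiko does) is to serve every directory index at its slash-terminated URL and to redirect the slash-less form to it. The function name `plan_routes` and the `index_name` parameter are hypothetical.

```python
def plan_routes(file_urls, index_name="index.html"):
    """Return (served, redirects).

    `served` maps each canonical URL to the file behind it, and `redirects`
    maps each alias URL to the canonical URL it should redirect to.
    """
    served, redirects = {}, {}
    for url in sorted(file_urls):
        if url.endswith("/" + index_name):
            canonical = url[: -len(index_name)]  # "/blog/index.html" -> "/blog/"
            served[canonical] = url
            if canonical != "/":
                redirects[canonical.rstrip("/")] = canonical  # "/blog" -> "/blog/"
        else:
            served[url] = url
    return served, redirects


served, redirects = plan_routes(["/index.html", "/blog/index.html", "/style.css"])
print(served)
print(redirects)
```

With the policy written down like this, the trailing slash behaviour becomes part of the generated output instead of a property of whichever server or provider happens to host the files.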

Concluding remarks

I'm not saying that my own Kawipiko implementation is the correct or best approach.

However, I think we should try (or at least think about how) to standardize the hosting side of static sites, because in the end this is not the early 2000s. In that era having index.html present in the URL was acceptable, and besides redirects a "webmaster" didn't need to know more about HTTP, or TLS, or security, or CDNs, or mobile experience, or Google's damn scoring system...

Also, what I'm proposing isn't even new: as mentioned earlier, Kiwix already employs a similar solution, and there is even redbean, which takes things to the extreme by embedding a zip file inside an executable that works on different operating systems (talk about portability).

(Followup)

A week after writing this article, I wrote a small followup detailing how one can leverage Linux's seccomp to strengthen the security of an HTTP server such as Kawipiko.