Static Sites Aren't Simple Anymore

There is an iceberg of complexity under modern static sites. That complexity makes it harder than ever to build a statically generated site like this blog.

Yes, it’s possible (and even desirable in many cases) to publish raw HTML or Markdown. Sometimes a simple file server (or GitHub Pages) is enough. We used to drop files on a server over FTP, or run a small PHP script that served content. If you were at a university, you could log in and drop a file in your home directory that would be served (here’s my decade-old homepage on columbia.edu/~msr2174).

However, expectations for statically generated sites have risen drastically over the years. Readers want (rich) content served fast. Writers want dead-simple (but expressive) writing and publishing, and they want control over how their writing looks.

I’ve written 904 posts on this blog(!), so I’m no stranger to publishing static content. My blog is fairly simple, but there are still many optimizations to make for a modern web experience. And I’ll be the first person to admit that I’ve over-engineered most of it.

But here are some of the things that modern web content publishers and consumers have come to expect.

Fast page loads. Things must load fast. Today, that means aggressive caching at the edge. Content needs to be served from a CDN; you don’t want to manage servers for static content anymore. Something like nginx seems nice until you realize that your readers are hopping cross-country just to be served a few kB. For pages that share most of their markup, how do you make sure as much of it as possible is reused? How do you serve static layouts first and hydrate them with actual content (so readers see something rather than a blank page)?

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
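To make the caching point concrete, here is a minimal TypeScript sketch of choosing a `Cache-Control` header per kind of response. It assumes the build emits content-hashed filenames (so a file never changes at a given URL) and a hypothetical URL layout; `cacheControlFor` and `kindForPath` are illustrative names, and the lifetimes are placeholders rather than recommendations. `s-maxage` applies to shared caches such as a CDN, and support for `stale-while-revalidate` varies by CDN.

```typescript
// Hypothetical categories of responses served by a statically generated blog.
type AssetKind = "hashed-asset" | "html" | "feed";

// Choose a Cache-Control header for each kind of response.
// Assumption: the bundler emits content-hashed filenames (e.g. app.3f9a1c.js),
// so any change to a file produces a new URL.
function cacheControlFor(kind: AssetKind): string {
  switch (kind) {
    case "hashed-asset":
      // Cache for a year and tell browsers not to revalidate.
      return "public, max-age=31536000, immutable";
    case "html":
      // Browsers treat the page as stale immediately and revalidate; the CDN
      // may cache it for 5 minutes, then serve a stale copy for up to a day
      // while it refetches in the background.
      return "public, max-age=0, s-maxage=300, stale-while-revalidate=86400";
    case "feed":
      return "public, max-age=600, s-maxage=600";
  }
}

// Classify a request path (hypothetical URL layout).
function kindForPath(path: string): AssetKind {
  if (/\.[0-9a-f]{6,}\.(js|css|woff2|png|jpg|svg)$/.test(path)) return "hashed-asset";
  if (path.endsWith(".xml") || path.endsWith("/rss")) return "feed";
  return "html";
}
```

Hashed assets can safely live in caches for a year because a change yields a new filename. HTML stays short-lived at the edge, so a fixed typo reaches readers shortly afterward even without an explicit purge.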

Easy to write. My content is simple. I’ll occasionally include a diagram or image, but it’s mostly text. I write every day, often on the go, so I prefer to write in Apple Notes. I could write in Markdown or HTML, but that would just slow me down. I want to be able to publish and schedule content from anywhere, not just when I’m in front of a terminal.

Static sites are often dynamic sites in disguise. What happens when a post changes? As much as I love build systems, I don’t want to push a commit or kick off a CI pipeline every time I fix a typo or edit a sentence. Plus, a full rebuild might bust the cache for everything. This gets more complex when different routes show overlapping information. When I change the title of a post, I should invalidate the cache for the list of all posts, the post page, and maybe even the homepage or RSS feed if the post is recent. Doing the minimum amount of work is often the hardest part.
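Here is one way to compute that minimal set, as a TypeScript sketch. The `Post` shape, the routes (`/`, `/posts`, `/posts/<slug>`, `/rss.xml`), and the homepage and feed sizes are all hypothetical; the result would then be handed to whatever purge API your CDN provides.

```typescript
// Hypothetical shape of a post in the content store.
interface Post {
  slug: string;
  title: string;
  publishedAt: Date;
}

// Assumed number of recent posts shown on the homepage and in the RSS feed.
const HOMEPAGE_COUNT = 10;
const FEED_COUNT = 20;

// Given all posts and the post whose title changed, return only the routes
// whose rendered output includes that title (hypothetical route layout).
function routesToInvalidate(allPosts: Post[], changed: Post): string[] {
  // Sort newest first without mutating the caller's array.
  const newestFirst = [...allPosts].sort(
    (a, b) => b.publishedAt.getTime() - a.publishedAt.getTime(),
  );
  const rank = newestFirst.findIndex((p) => p.slug === changed.slug);

  // The post page and the full list of posts always show the title.
  const routes = [`/posts/${changed.slug}`, "/posts"];
  // Only recent posts appear on the homepage and in the feed.
  if (rank >= 0 && rank < HOMEPAGE_COUNT) routes.push("/");
  if (rank >= 0 && rank < FEED_COUNT) routes.push("/rss.xml");
  return routes;
}
```

Purging only these paths leaves every other post page cached, instead of busting everything with a full rebuild.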

Interactive. Why does a static site need JavaScript? Well, it really doesn’t. But there are so many things that require just a little bit of JavaScript. What if you want to do some validation on a signup form? Add a few more posts as users scroll the page? Dropdowns? Basic analytics? Syntax highlighting for code snippets?
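As an example of “just a little bit of JavaScript,” here is a TypeScript sketch of client-side validation for a hypothetical newsletter signup form. `validateSignupEmail` is an illustrative name, and the pattern is intentionally loose: it only catches obvious typos, so anything that matters still has to be checked on the server or confirmed by email.

```typescript
// Result of validating the (hypothetical) signup form.
type ValidationResult = { ok: true } | { ok: false; error: string };

// Loose check: a non-empty local part, one "@", and a dot in the domain.
// This only catches typos early; it is not a full email address parser.
function validateSignupEmail(raw: string): ValidationResult {
  const email = raw.trim();
  if (email.length === 0) {
    return { ok: false, error: "Please enter an email address." };
  }
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return { ok: false, error: "That doesn't look like an email address." };
  }
  return { ok: true };
}
```

In the page, this would run in the form’s `submit` handler, calling `event.preventDefault()` and showing `error` only when `ok` is false.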

Once you add JavaScript, you take on a lot of baggage: bundling, code splitting, tree-shaking, and everything else involved in making the JavaScript you serve as small as possible.

Easy to design. While not strictly necessary, I’d like a simple way to design my blog. Static site generator frameworks are complicated, but custom theming frameworks are even worse: they quickly turn into jumbled templates (see the Heptagon of Configuration). There are many possible solutions here, but I enjoy the declarative style of React. It’s just code, and its methods of encapsulation and reuse make sense to me.

No infrastructure to manage. Well, there’s always some infrastructure to manage, even if it’s just a codebase. But I’d prefer everything to be serverless. There’s still a server somewhere, but I don’t have to worry about log rotation, storage, kernel updates, or deployments.

Oh, and you probably want to serve your content over HTTPS. Why? Because browsers may flag your site as not secure otherwise. HTTPS might not have the same benefits for static content as it does for dynamic content, but it still adds privacy for the reader and some assurance that the content they’re reading comes from the site they expect. Managing certificates is another piece of necessary infrastructure.
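Certificates are usually the part a host or CDN automates, but serving over HTTPS typically also means redirecting plain-HTTP requests and telling browsers to stay on HTTPS. Here is a minimal, platform-independent TypeScript sketch of that logic; `SimpleResponse`, `redirectToHttps`, and `withHsts` are hypothetical names, and the one-year `Strict-Transport-Security` max-age is a common choice, not a requirement.

```typescript
// Minimal response shape (hypothetical; adapt to your platform's Response type).
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
}

// Tells browsers to use HTTPS for this host (and its subdomains) for one year.
const HSTS_HEADER = "max-age=31536000; includeSubDomains";

// Returns a permanent redirect for plain-HTTP requests, or null if the
// request already arrived over HTTPS and can be served normally.
function redirectToHttps(requestUrl: string): SimpleResponse | null {
  const url = new URL(requestUrl);
  if (url.protocol !== "http:") return null;
  url.protocol = "https:"; // host, path, and query string are kept intact
  return { status: 301, headers: { Location: url.toString() } };
}

// Adds the HSTS header to a response served over HTTPS
// (browsers ignore this header when it arrives over plain HTTP).
function withHsts(res: SimpleResponse): SimpleResponse {
  return { ...res, headers: { ...res.headers, "Strict-Transport-Security": HSTS_HEADER } };
}
```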

Simplicity is the goal (stop overengineering!), but the requirements for a performant modern website have changed, even for statically generated ones.