Don't Use Kubernetes, Yet

Jun 14, 2022

Early-stage startups shouldn't run on Kubernetes yet.

But eventually, growth-stage and large companies should be running on Kubernetes in some form. Kubernetes Maximalism doesn't mean one-size-fits-all.

Infrastructure should progressively grow with your workloads and team. How can you choose the right technology now so that you can maximize growth and minimize pain later when you inevitably outgrow it?

This is a deeper dive into one area of the infrastructure stack: container abstractions. There are tons of ways to run containers in the cloud, so it's especially tough to pick the right abstraction at the right time. I'd roughly classify them into four categories:

  • Code-to-Container-to-Deploy (AWS App Runner, Google App Engine)
  • Serverless Container Runtime (Fargate on ECS, Google Cloud Run)
  • Managed Kubernetes ({A,E,G}-KS)
  • Self-Hosted Kubernetes

What follows is a guide to choosing the right container abstraction, broken down by engineering team size: 1e0, 1e1, 1e2, and 1e3+ engineers.

1e0 ≤ team_size ≤ 1e1

Let's take the example of a small team. The developers might have some DevOps experience, but there's no dedicated operations function: everyone's essentially an SRE. There might be a simple CI/CD pipeline but limited focus on reproducibility or air-gapped environments. You can get far with serverless functions and event-driven architectures, but you'll probably need a long-running daemon at some point.

I'd be careful with all-in-one options like AWS App Runner or any service that promises code-to-container-to-deployment. If you're building anything other than a simple web service, you'll hit a wall with those services quickly.

Be wary of simplicity that is hyper-opinionated optimization in disguise – Optimization is Fragile.

My advice for this team: start with serverless container runtimes. On AWS, that's Fargate on ECS; on Google Cloud, it's Cloud Run.

  • Deployments look like a simplified version of what you'd deploy on Kubernetes.
  • Turning on basic autoscaling is easy enough when you reach a little more scale.
  • You won't have to manage servers, network overlays, logging, or other necessary middleware.

The main downside is that you'll have to build and push container images yourself. While many higher-level services will pack up your code and turn it into a container, I don't suggest using them. Once you hit the configurability cliff (e.g., needing to change something the builder abstracts away), you take on all of the complexity you thought you'd avoided, all at once.

In my experience, these services can be difficult to work with if you use the UI. I'd suggest provisioning them in code with something like Pulumi or AWS CDK.
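
To make this concrete, here's a rough sketch of what provisioning a Fargate service in code might look like with Pulumi's awsx library. Treat it as a shape, not a drop-in config: the names, sizes, and image path are placeholders, and the exact awsx API surface varies by version.

```typescript
import * as awsx from "@pulumi/awsx";

// Build ./app/Dockerfile and push it to ECR as part of `pulumi up`.
const image = awsx.ecs.Image.fromPath("app-img", "./app");

// A load balancer listening on port 80 in front of the service.
const listener = new awsx.lb.NetworkListener("app", { port: 80 });

const service = new awsx.ecs.FargateService("app", {
  taskDefinitionArgs: {
    containers: {
      app: {
        image,
        memory: 512,
        portMappings: [listener],
      },
    },
  },
  desiredCount: 2, // bump this (or attach autoscaling) as traffic grows
});

export const url = listener.endpoint.hostname;
```

Note how close this is to a stripped-down Kubernetes Deployment plus Service: when you eventually migrate, the concepts carry over.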

You don't need a fully baked CI/CD pipeline. It's OK to build and deploy containers locally or with a simple script on GitHub Actions. In the Spectrum of Reproducibility, you only need weak guarantees: Docker images aren't reproducible and come with plenty of foot guns, but they're good enough for small teams.
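
At this stage, the "simple script" really can be simple. Here's a minimal sketch in TypeScript, assuming a :latest tag, the Docker and AWS CLIs on your PATH, and placeholder cluster, service, and repository names:

```typescript
// deploy.ts — build, push, and roll the service. Crude, but fine at this scale.
import { execSync } from "node:child_process";

const image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest"; // placeholder
const run = (cmd: string) => execSync(cmd, { stdio: "inherit" });

run(`docker build -t ${image} .`);
run(`docker push ${image}`);
// Forces ECS to re-pull :latest and replace the running tasks.
run("aws ecs update-service --cluster app --service app --force-new-deployment");
```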

1e1 ≤ team_size ≤ 1e2

I'd suggest that teams adopting Kubernetes (even the managed versions) have an SRE team or, at a minimum, a dedicated SRE.

Reasons you might outgrow a serverless container runtime

  • Non-standard resource requirements. Storage, networking, and machine configuration options are limited on serverless runtimes. If you have particularly lopsided requirements (high RAM, low CPU) or need high-IOPS storage, you might consider a managed Kubernetes offering.
  • Stateful workloads that need operators. Stateful workloads are difficult to run on serverless runtimes because the storage options are limited, and you might need additional abstractions over the network (like service discovery or peering) that are tougher to get there.
  • Managing an order of magnitude more services. With a few services on Fargate on ECS or Cloud Run, you can handle automatable-but-infrequent events with a script or manual intervention. With hundreds of ephemeral services that need TLS certificates and external DNS, GKE or EKS is probably a better basis for automation (see the sketch after this list).
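
As a hypothetical sketch of what that automation buys you: with cert-manager and external-dns installed in the cluster (both are add-ons you'd install yourself; I'm assuming them here), each ephemeral service just declares an Ingress, and TLS certificates and DNS records follow automatically. Staying in code with Pulumi's Kubernetes provider:

```typescript
import * as k8s from "@pulumi/kubernetes";

// One Ingress per ephemeral service. cert-manager and external-dns
// (assumed installed in the cluster) watch these annotations and take
// care of certificate issuance and DNS records.
const ingress = new k8s.networking.v1.Ingress("svc-123", {
  metadata: {
    annotations: {
      "cert-manager.io/cluster-issuer": "letsencrypt-prod", // placeholder issuer
      "external-dns.alpha.kubernetes.io/hostname": "svc-123.example.com",
    },
  },
  spec: {
    tls: [{ hosts: ["svc-123.example.com"], secretName: "svc-123-tls" }],
    rules: [{
      host: "svc-123.example.com",
      http: {
        paths: [{
          path: "/",
          pathType: "Prefix",
          backend: { service: { name: "svc-123", port: { number: 80 } } },
        }],
      },
    }],
  },
});
```

Multiply this by hundreds of services and the marginal cost per service is one templated resource, which is exactly the kind of leverage that's hard to get from a serverless runtime.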

The thing about Kubernetes tooling: (1) there are a lot of APIs to build upon, (2) that results in a Cambrian explosion of tools, and (3) not all of them will be useful.

1e2 ≤ team_size ≤ ??

Large engineering teams may want to run Kubernetes themselves, on bare metal or in the cloud.

You'll probably need a dedicated 1e2 DevOps team if you're going down this route. Or, you might be a company exposing Kubernetes in some way to your customers (e.g., a platform service or IaaS-like provider).

Some reasons why you might want to run Kubernetes yourself:

  • Cost: utilizing existing on-prem or legacy hardware, specialized hardware for specific applications (e.g., GPU-intensive)
  • Performance: applications where bare-metal performance is critical (e.g., FPGA, GPU, etc.)
  • Non-cloud environment: running Kubernetes at the edge, like in retail stores (e.g., Chick-fil-A)

My advice: be careful with the internal platforms and abstractions you build on Kubernetes. Even the best snowflake infrastructure eventually suffers from diseconomies of scale (see Diseconomies of Scale at Google). You shouldn't be wasting engineering cycles competing with or recreating products already offered by cloud hyperscalers.