As a follow-up to my post on SaaS isolation patterns, I'm looking at different application-level isolation patterns – containers. There's a whole spectrum of choices, and they each come with different strengths and weaknesses.
Virtualize the Hardware – Virtual Machines. The first and oldest class of containers is the virtual machine. A program called a hypervisor emulates physical hardware – everything from CPUs to floppy drives.
There are two main classes of hypervisors – Type 1 (bare-metal) hypervisors, which run directly on the host machine's hardware, and Type 2 (hosted) hypervisors, which run as a privileged process on the host's operating system. Microsoft's Hyper-V is a Type 1 hypervisor, while VirtualBox is Type 2.
Minimize the operating system – Unikernels. A unikernel is a specially built image in which the application and the kernel components it needs are compiled together and share a single address space. Imagine building a specialized Linux distribution for each program, one that contains only the exact requirements for that program to run.
Optimize and minimize the Virtual Machine – Firecracker. Firecracker is the virtualization technology that powers AWS's Lambda Function-as-a-Service platform. It runs in userspace on top of Linux's KVM and spins up tiny, fast-booting virtual machines called microVMs (think thousands per host).
Intercept Kernel Calls – gVisor. gVisor virtualizes system calls instead of spinning up a virtual machine. Applications make system calls, which gVisor intercepts and either handles itself or, where necessary, passes on to the host kernel. You can think of gVisor as a userspace operating system – one that comes with all the difficulties of trying to build a networking stack in userspace.
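To make the interception idea concrete, here is a toy sketch in Python. It is not how gVisor is implemented: real interception happens at the system call boundary, not through method calls, and the `ToySentry` class, its allow list, and the virtual PID are all hypothetical names invented for illustration. The point is only the shape of the decision: answer inside the sandbox, forward to the host, or refuse.

```python
import os


class ToySentry:
    """Toy stand-in for a userspace kernel: it sees every 'syscall' first."""

    def __init__(self):
        self.virtual_pid = 1  # the sandboxed app is told it is PID 1
        self.log = []         # every call the application attempted
        # Hypothetical allow list of calls that get forwarded to the host.
        self.forwarded = {"getcwd": os.getcwd}

    def syscall(self, name, *args):
        self.log.append(name)
        if name == "getpid":
            # Answered entirely inside the sandbox; the host is never asked.
            return self.virtual_pid
        if name in self.forwarded:
            # Routed to the real host kernel via the standard library.
            return self.forwarded[name](*args)
        # Everything else is simply not available to the application.
        raise PermissionError(f"syscall {name!r} is not provided by the sandbox")


if __name__ == "__main__":
    sentry = ToySentry()
    print(sentry.syscall("getpid"))   # the virtual PID, not the host PID
    print(sentry.syscall("getcwd"))   # forwarded to the host
    try:
        sentry.syscall("reboot")
    except PermissionError as exc:
        print(exc)
```

Every call the toy answers by itself is one fewer call that reaches the host kernel, which is also why a sandbox like this has to reimplement so much operating system behavior.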
Isolate the processes – Docker. Docker containers use a combination of cgroups and namespaces to do OS-level isolation. Containers get their own view of process IDs, networking, and file systems. Compared with virtual machines, containers are usually more lightweight because they share the host's kernel instead of booting their own.
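You can see the namespaces and cgroups a process belongs to without Docker at all. The following is a small sketch that assumes a Linux host with `/proc` mounted; on other systems the helper functions (whose names are my own) just return empty results. Running it on the host and again inside a container should show different namespace identifiers.

```python
import os


def current_namespaces(ns_dir="/proc/self/ns"):
    """Return {namespace name: kernel identifier} for this process.

    Each entry in /proc/self/ns is a symlink whose target looks like
    'pid:[4026531836]'. Two processes that share a namespace see the
    same identifier. Returns {} where the directory does not exist.
    """
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}
    namespaces = {}
    for name in names:
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable; skip it
    return namespaces


def current_cgroups(path="/proc/self/cgroup"):
    """Return the cgroup membership lines for this process, or []."""
    try:
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]
    except OSError:
        return []


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:20} {ident}")
    for line in current_cgroups():
        print(line)
```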
Runtime containers – Java Virtual Machine. Java runs its programs in an application-level virtual machine that executes bytecode, in contrast to the hypervisor-type virtual machines earlier in this post, which emulate hardware.
Chromium Sandbox – Chrome ships with its own container mechanism that keeps users safe from malicious sites. At a high level, there is a privileged broker process that communicates over IPC with a less privileged target that is executing in a sandbox. Since it has to be cross-platform, the exact security boundaries differ a bit between Windows, macOS, and Linux. Unlike the Java Virtual Machine, code isn't executed in a virtual machine, so you get native speeds for C/C++ programs.
Link to the design.
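The broker/target split can be sketched with two processes talking over pipes. This is only an illustration of the pattern, not Chromium's code: the resource names, the policy, and the JSON message format are all made up, and the target below is an ordinary child process with no privileges actually removed. What it does show is that the target never touches a resource directly; it has to ask, and the broker decides.

```python
import json
import subprocess
import sys

# Hypothetical data that only the privileged broker can reach.
BROKER_RESOURCES = {"settings": "theme=dark", "cookies": "session=secret"}
# Hypothetical policy: what the target is allowed to read.
ALLOWED = {"settings"}

# Code run by the target process. It sends one JSON request per line on
# stdout and reads one JSON reply per line on stdin.
TARGET_CODE = """
import json, sys
for resource in ("settings", "cookies"):
    print(json.dumps({"op": "read", "resource": resource}), flush=True)
    reply = json.loads(sys.stdin.readline())
"""


def handle_request(request):
    """Broker-side policy check for a single request from the target."""
    resource = request.get("resource")
    if request.get("op") == "read" and resource in ALLOWED:
        return {"ok": True, "data": BROKER_RESOURCES[resource]}
    return {"ok": False, "error": "denied by policy"}


def run_broker():
    """Start the target and service its requests until it exits."""
    target = subprocess.Popen(
        [sys.executable, "-c", TARGET_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    decisions = []
    with target:  # closes the pipes and waits for the child on exit
        for line in target.stdout:
            request = json.loads(line)
            reply = handle_request(request)
            decisions.append((request["resource"], reply["ok"]))
            target.stdin.write(json.dumps(reply) + "\n")
            target.stdin.flush()
    return decisions


if __name__ == "__main__":
    for resource, allowed in run_broker():
        print(resource, "allowed" if allowed else "denied")
```

In a real sandbox the interesting work is what is missing here: stripping the target's privileges so that asking the broker is the only option it has left.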
WebAssembly Sandbox – WebAssembly (WASM) binaries execute in a sandboxed environment that's separated from the host runtime. This includes memory safety and conditional access to system calls.
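Those two properties can be modeled in a few lines. The sketch below is a toy, not a WebAssembly runtime: `ToySandbox`, `Trap`, and the `log` import are hypothetical names. It mimics two ideas only: the guest can address nothing outside its own bounds-checked block of memory, and it can call only the host functions that the embedder explicitly handed over.

```python
class Trap(Exception):
    """Raised when the guest does something the sandbox forbids."""


class ToySandbox:
    def __init__(self, memory_size, imports):
        self.memory = bytearray(memory_size)  # the guest's only addressable memory
        self.imports = dict(imports)          # host functions chosen by the embedder

    def store(self, address, data):
        """Write bytes into guest memory, trapping on any out-of-bounds access."""
        if address < 0 or address + len(data) > len(self.memory):
            raise Trap("out-of-bounds store")
        self.memory[address:address + len(data)] = data

    def load(self, address, length):
        """Read bytes from guest memory, trapping on any out-of-bounds access."""
        if address < 0 or length < 0 or address + length > len(self.memory):
            raise Trap("out-of-bounds load")
        return bytes(self.memory[address:address + length])

    def call_host(self, name, *args):
        """Call a host function, but only if it was explicitly imported."""
        if name not in self.imports:
            raise Trap(f"host function {name!r} was not imported")
        return self.imports[name](*args)


if __name__ == "__main__":
    messages = []
    sandbox = ToySandbox(16, {"log": messages.append})
    sandbox.store(0, b"hello")
    sandbox.call_host("log", sandbox.load(0, 5))
    print(messages)
    for attempt in (lambda: sandbox.store(14, b"overflow"),
                    lambda: sandbox.call_host("open_file", "/etc/passwd")):
        try:
            attempt()
        except Trap as exc:
            print("trapped:", exc)
```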
Of course, there are other containers worth mentioning: OpenVZ, rkt, LXC, and more. Maybe a follow-up post one day – a discussion of the different (and moving) security boundaries that each method provides.