Dec 1, 2022

In Star Wars, two characters have shown up in almost every film (ten out of eleven theatrical releases) – the humanoid droid, C-3PO, and the astromech droid, R2-D2.

Both of these droids always seem to find themselves at the center of the plot – R2-D2 stores the Death Star plans, recruits Luke and Obi-Wan to the hero's journey, overrides security systems at just the right time, and seems to guide the main characters to the next step.

A MacGuffin is an object of plot importance and desire - usually, it carries a message, a power, a secret, or something of great importance. MacGuffins have been a mainstay in both movies and storytelling for centuries — from the Holy Grail in the Legend of King Arthur to the briefcase in Pulp Fiction to R2-D2 in most of the Star Wars films.

The term was made famous by Alfred Hitchcock, who described it as,

It might be a Scottish name, taken from a story about two men on a train. One man says, 'What's that package up there in the baggage rack?' And the other answers, 'Oh, that's a MacGuffin'. The first one asks, 'What's a MacGuffin?' 'Well,' the other man says, 'it's an apparatus for trapping lions in the Scottish Highlands.' The first man says, 'But there are no lions in the Scottish Highlands,' and the other one answers, 'Well then, that's no MacGuffin!' So you see that a MacGuffin is actually nothing at all. — Alfred Hitchcock

George Lucas said this about his version of MacGuffins,

that the MacGuffin should be powerful and that the audience should care about it almost as much as the dueling heroes and villains on-screen. – George Lucas

MacGuffins are interesting to think about in the context of narratives — what else can logically drive the plot forward, motivate the protagonists (or antagonists), and provide a source of tension and conflict without making too much of a statement on its own?

Subscribe for daily posts on startups & engineering.

Do Cold Starts Matter?

Nov 30, 2022

Since serverless runtimes came to be, developers have agonized over "cold starts", the delay between when a request is received and when the runtime is ready to process it.

It’s hard to benchmark exactly how long cold starts last across runtimes, as they are very sensitive to workloads. Using the Node.js Lambda runtime on AWS, you might see cold starts anywhere from 400ms to 1 second, depending on your function, memory, and network configuration.

But how much do cold starts matter? For the heaviest use cases, there are probably optimizations that you can make directly in the serverless runtime (see AWS’s newly announced Lambda SnapStart for Java, which reduces startup time for Spring apps from 6 seconds down to 200ms).

But for the majority of cases, there are really easy alternatives if you’re willing to step a little outside the serverless-scale-to-zero-everything paradigm.

  • Provisioned concurrency in a serverless runtime. The cost to keep a handler “warm” is fairly minimal (about $10/mo for a 1GB Lambda). Most serverless runtimes have this built-in already (AWS Lambda, Fargate, Cloud Run).
  • Keep functions warm by invoking them every few minutes or warming the cache on machines.
  • Use autoscaled containers or VMs for critical paths.
  • Edge runtimes for small functions that can be run in v8 isolates.

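The second bullet, keeping functions warm with periodic invocations, is simple enough to sketch. Below is a minimal, hypothetical Lambda-style handler in Python: a scheduled rule (e.g., a cron-style trigger every few minutes) invokes the function with a sentinel payload, and the handler short-circuits on it. The `"warmer"` key is a convention invented here, not a Lambda feature.

```python
import json

def handler(event, context=None):
    # A scheduled trigger can invoke the function with a sentinel payload
    # (e.g., {"warmer": true}) every few minutes to keep the runtime warm.
    if isinstance(event, dict) and event.get("warmer"):
        # Return immediately -- the point was just to keep the instance alive.
        return {"statusCode": 200, "body": "warm"}

    # Normal request path.
    return {
        "statusCode": 200,
        "body": json.dumps({"echo": event.get("path", "/")}),
    }
```

The warming ping costs almost nothing per invocation; the trade-off is that it only keeps a single instance warm, so bursts still hit cold starts.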
In the majority of workloads, your Lambda cold start time is probably the least of your worries, even though it is one of the most obvious (performance comes last, unfortunately). Small architectural changes can solve cold start latencies. Maybe there's a new class of workloads that's enabled by being able to run large tasks with a low cold start time and zero provisioned infrastructure. But for now, the cost differential isn't that large with just running a small set of services all the time.

Sharing a Notebook

Nov 29, 2022

The state-of-the-art in generative AI is advancing fast. But, unlike previous AI waves marked by big launches and research papers, generative AI is spreading in a much more grassroots (and unlikely) medium: through Google Colab notebooks.

Google Colab notebooks are free Jupyter notebooks that run in the cloud and are easy to share. Many people use them to tinker with models, experiment with code, and share ideas. Interestingly, it was launched by Google Research during the time I worked on Google Cloud AI (we shipped a similar but unbranded Jupyter workflow).

So why are Colab notebooks the medium of exchange?

First, the base infrastructure and models are already open-sourced and developed. During the last wave, TensorFlow and PyTorch were still being incubated as solutions to the problems of deep learning. The biggest models were either closed-source or too complex for the average developer to contribute to.

This time, there’s a lot of “plumbing” work that’s being done in forked GitHub repositories that don’t require deep knowledge of machine learning or diffusion models. Those changes could be modifying Stable Diffusion code to run them on consumer M1 GPUs or creating Web UIs or user interfaces to run text2img or img2img and tune parameters. Or maybe it’s modifying the model to run in a different framework or with even fewer resources.

Second, LLMs are more consumer-friendly. Normal users and developers can make sense of the model. Inputs (prompts) and outputs (images) are more accessible to the average user than bounding boxes, vector embeddings, or NumPy arrays. Models are smaller and can be run on commodity hardware. Datasets are relatively small, or trained weights are published.

Third, diffusion models are Goldilocks models for Colab — too large to fine-tune or run inference on the average laptop but small enough to run on spot instances that are given away for free.

There are some interesting implications of Colab as a medium that ML applications go viral on:

  • Security — most of these models download and run code from GitHub. They might ask for permission to access your Google Drive. It isn’t easy to know exactly what’s going on in a notebook, and there are few guarantees that it’s doing what you think it is.
  • Presentation and code — A cardinal programming rule is separating presentation and code. But sometimes it’s helpful to combine the two. I wrote about this in Presentation Next to Code and In Defense of the Jupyter Notebook.
  • Monetization – Colab is unlikely to drive real infrastructure spend for cloud. While some consumers might pay for Colab Pro+ ($50/mo), it doesn't seem like a real business model (is it Enterprise SaaS? Does it belong in the same category as Google Workspace Docs/Sheets/Mail?). Google can subsidize Colab through other products, but in the long run, it should be self-sustainable. Maybe it follows a Hugging Face-like playbook (although it's unclear exactly what the end result looks like even in that case).

Fuzzy Databases

Nov 28, 2022

Different trade-offs already give rise to significantly different types of databases – from OLTP to OLAP, relational to non-relational, key-value, graph, document, and object databases (to name a few).

What if you relaxed some key properties that we've come to expect?

What if databases returned extrapolated results?

If you squint, LLMs resemble something like a vector search database. Items are stored as embeddings, and queries return deterministic yet fuzzy results. What you lose in data loading time (i.e., model training), you make up for in compression (model size) and query time (inference). In the best case, models denoise and clean data automatically. The schema is learned rather than declared.
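The vector-search analogy is concrete enough to sketch. Below is a toy nearest-neighbor lookup in pure Python: items are stored as embedding vectors and queries are ranked by cosine similarity. The store keys and vectors are made up for illustration — in practice the embeddings would come from a model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "database": items stored as (made-up) embedding vectors.
store = {
    "cold starts": [0.9, 0.1, 0.0],
    "colab":       [0.1, 0.8, 0.2],
    "macguffins":  [0.0, 0.2, 0.9],
}

def query(vec, k=1):
    # Nearest-neighbor lookup by similarity. Deterministic, but "fuzzy" in
    # the sense that any query vector returns the closest match, exact or not.
    ranked = sorted(store, key=lambda key: cosine(store[key], vec), reverse=True)
    return ranked[:k]
```

Nothing here is learned, of course — the point is just the query model: you trade exact-match semantics for similarity ranking over a compressed representation.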

What if anyone could write to the database?

Blockchains are databases as well. They provide a verifiable ledger and peer-to-peer write access in exchange for significant trade-offs in privacy, throughput, latency, and storage costs. Keys are hashes (similar to a distributed hash table). Authorization is done through public-key infrastructure, and a generalized computing model can be built on top of the distributed ledger (e.g., the EVM).
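The "verifiable ledger" property boils down to each block committing to the hash of its predecessor. Here's a minimal sketch (no consensus, no signatures — just the hash chain) that shows why tampering with any earlier entry is detectable:

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's canonical JSON encoding.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def append(chain, data):
    # Each new block stores the hash of the previous block.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "data": data})
    return chain

def verify(chain):
    # Recompute every link: any edited block breaks the chain after it.
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True
```

A real blockchain layers consensus, proof-of-work or proof-of-stake, and public-key authorization on top of this structure — but the easy-to-verify property starts here.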

What if the database could be embedded anywhere?

SQLite and DuckDB answer this question. While neither supports concurrent writes from different processes and both are limited in terms of horizontal scaling, they can be easier to use and fit into more workflows (e.g., serverless, edge, browser). In many cases, they are operationally much simpler to manage than a traditional database.
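"Embedded anywhere" is easy to see in practice: SQLite ships in Python's standard library, and a database is just a library call away — no server to provision or manage.

```python
import sqlite3

# An embedded database lives entirely inside the host process.
conn = sqlite3.connect(":memory:")  # or a file path for durable storage
conn.execute("CREATE TABLE posts (day TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("Nov 28", "Fuzzy Databases"), ("Nov 30", "Do Cold Starts Matter?")],
)
rows = conn.execute("SELECT title FROM posts ORDER BY day").fetchall()
```

The same few lines work in a serverless function, a CLI tool, or (via WebAssembly builds of SQLite) a browser — which is exactly the workflow flexibility the trade-off buys.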

You could also look at these databases through the lens of hard-to-compute, easy-to-verify.


Human-in-the-Loop and Other AI Mistakes

Nov 27, 2022

The 2016 influx of deep learning startups was marked by human-in-the-loop AI. Chatbots and magic assistants were powered by a human on the other side; driverless cars had a remote driver handling most interactions.

The general playbook in 2016 went something like this:

The performance of deep neural networks scales with data and compute. Extrapolating 2016 results over the next few years shows that we'll have ubiquitous human-level conversational AI and other sophisticated agents.

To prepare for this, we'll be first-to-market by selling the same services today except with humans that are augmented by the models. While this will initially be a loss-leader, it will be extremely powerful once the models are good enough to power the interaction on their own (without humans). By then, we'll have the best distribution.

Of course, we still don't have the level of conversational AI that can power magic assistants or chatbots without a human-in-the-loop. Most of these startups ran out of money. The most successful ones were the ones that sold to strategic buyers (Cruise/GM, Otto/Uber, Zoox/Amazon) or ones that sold picks and shovels (Scale AI).

Extrapolating performance for ML models is challenging. More data, more compute, or different architectures don't always mean better performance (look at some of the initial results from Stable Diffusion 2.0).

We don't seem to be making the same mistakes as 2016 in the era of generative AI. Some companies are solving for distribution using someone else's proprietary model (e.g., Jasper AI/GPT-3), but these products deliver real value to customers today – with no human in the loop. If LLM performance plateaued, these companies would likely still have some intrinsic value.

Technical Posts Overview and Roundup

Nov 26, 2022

While I write about various things I find interesting, specific technical topics tend to be recurring themes. So, as a way to organize the posts for a future long-form synthesis and for the influx of new subscribers, here's an overview and roundup of the posts I've written over the last two years.

On Docker. One of the areas where I've gone deepest technically. Docker is interesting because it solves three different problems – (1) a runtime to execute workloads in a distributed system, (2) a packaging format for production artifacts, and (3) a developer tool.

On cloud strategy. This is only the tip of the iceberg of cloud penetration. There are significant workloads to be lifted from on-premise data centers and net-new use cases. However, all types of dynamics are at play – cost advantages, distribution channels, and developer experience.

On version control, package management, and other software workflows. Shipping code has never been more of a coordination problem. How do you share and reuse code quickly and efficiently? So much happens between a developer writing code and that code making it to production.  

On WebAssembly. WebAssembly can be useful on the client – as a runtime that opens up the web to other LLVM-based languages. It can also be useful on the server – as a granular runtime that is more lightweight than a container.

On infrastructure-as-code. As cloud standardizes APIs, we can start to treat infrastructure as code. Of course, this opens up entirely new workflows – embedding infrastructure into CI, easy replication of entire stacks for preview environments, staging environments, reproducible infrastructure, and more. Still, a lot to figure out.

On software configuration. Configuring software is changing due to clouds, runtimes (Docker and WebAssembly), and infrastructure-as-code.

On Kubernetes. A complex yet essential part of the software stack. I worked on Kubernetes open-source at Google, so I'm biased. But I like to think that I provide a nuanced view of it.