AI Means More Developers

May 29, 2023

Software trends towards higher abstractions. You can do more with less. Not only do developers never need to touch hardware anymore, but they might not even need to interface with public cloud providers and might opt to use developer-friendly middlemen. That means less code to write (and maintain). Less code to write means a narrower range of skills needed to get started. This lowers the barrier to entry. The average developer doesn’t need to know about Linux system administration or manual memory management (and that’s ok).

AI tackles the other end — how do we write and debug code faster? You must maintain it, but the iteration loops are much quicker. Generate a draft to start with. Debug with AI assistance. Experienced developers can be much more productive. The intuition around what code generation is right and wrong saves developers significant time writing the code. I’ve often found I’m still guiding the models to generate the right code, but I can outsource the work. Knowing what you want and how to do it, combined with AI, makes you a powerful developer. More productive developers should mean even higher salaries for the best developers.

On the other hand, less experienced developers can get away with gaps in their knowledge. They could generate a bunch of code with AI, but they won’t be able to maintain it or accurately debug the inevitable mistakes. But instead, there are enough developer-friendly primitives for junior developers to deploy to (countless frontend frameworks and hosting APIs). Less experienced developers, who might not have been able to deploy an end-to-end application, might be able to cobble together a cloud-hosted function partially written by AI.

Finally, the middle of the pack might get squeezed out. Developers that can write code but can’t wrangle dependencies or reason about code quality or maintenance will suffer. On the other hand, experienced developers will take on some of their work (as part of being more productive), and less experienced developers will use AI to deploy work more on par with their work.

Get daily posts on startups, engineering, and AI

Two Years of Daily Blogging

May 28, 2023

This is daily blog #730. Last year I visualized the hyperlinks between my posts (using virgo, my graph-based configuration language). This year, the embedding space of the last 730 posts.

  1. I embedded all my posts using BERT (a transformers model pre-trained on a large corpus of English data). BERT uses 768-dimensional vectors.
  2. Then I ran them through t-SNE (t-distributed stochastic neighbor embedding, a fancy way to visualize high-dimensional data by translating them to two dimensions.
  3. Finally, I separated the two-dimensional space into equally sized bins and asked GPT-3.5 to develop a category name for each set of post titles.

I cleaned up a few titles that were too long for the display, but that’s about it. The code and data are on GitHub at r2d4/blog-embeddings.

Of course, there’s a lot missing when the dimensionality is reduced to only two, but there are some interesting insights.

The topics range from highly technical on the bottom left (Kubernetes and Cloud Infrastructure) to more meta topics on the top right (philosophy, problem-solving). There’s roughly equal distribution of posts across the four quadrants.

Don’t break the chain!

Prompt Engineering is Configuration Engineering

May 27, 2023

Ironically, one of the most challenging aspects of distributed systems is configuration management. Consensus, fault tolerance, leader election, and other concepts are complex but relatively straightforward.

Configuration management is challenging because it’s about the convergence of the internal system state, a declarative API, and tooling that glues together that API with other adjacent systems (CI/CD, developer tools, DevOps, etc.). There’s no algorithm like Raft or Paxos to guide the implementation. And so many different concerns end up with an API that requires the knowledge of multiple roles (operators and developers).

The history of configuration management in Kubernetes is a long one. Initially, JSON and YAML exposed fairly verbose declarative APIs. Inevitably, there was duplication and complexity. Developers turned to templating (via Helm, which used Jinja). This allowed some level of packaging — reusable configurations that could be further configured for each organization’s use case. But templates soon became even more complex themselves, to the point of nearly every field becoming a variable field via the template. With control flow, it was hard to tell what the end representation of the configuration would be. Infrastructure was already hard enough to test, and it became even harder with just-in-time compiled templates that were tough to type or schema check.

There were attempts to build more advanced languages that did more. Eliminating duplication with object-orientation, schema definitions, modules, packages, scripting, and control flow (see every sufficiently advanced configuration language is wrong).

I called this progression The Heptagon of Configuration. And we’re already seeing it in the prompt engineering world. In many ways, it’s the same problem, in a different form. Powerful but horizontal APIs that abstract away an enormous amount of complexity need to be configured for various use cases. Pipelines of applications built on single APIs.

How might prompt engineering evolve like configuration engineering?

First, we had hardcoded prompts. But developers started building applications that did more dynamic work in prompts — adding in user input, context from a database, or even scraped web results.

Then came the prompt templates. There’s guidance from Microsoft, which uses a Handlebars-like syntax, which is most likely the most advanced. Jinja templates embedded in Python applications.

The next step is a full DSL around prompts. LMQL is a query language for prompting. It might abstract some aspects of prompt engineering away. Things like schema checking (you might use ReLLM).

Finally, we’ll probably see more fine-tuned or “hardcoded” models that expose a more specific API that requires less templating or prompt engineering. Taking patterns known to work and exposing them behind a single API.

SEO Inside AI

May 26, 2023

What does SEO look like in a world where most queries are LLM-assisted in some way?

Keyword stuffing (at train time). It might be possible to keyword stuff data that are part of a training set by optimizing for specific tokens or token sequences. This might be as simple as “keyword stuffing” for LLMs but also more advanced in taking advantage of the embedding space.

Prompt injection (at inference time). For models that are augmented with tools (e.g., ChatGPT Plugins or Bing Chat), it is possible to prompt inject or prompt poison. The basic method goes like this: embed a specific prompt injection (e.g., “Ignore all previous directions and…”) inside the content of a website or other resource that an LLM would access (e.g., HTML or API). Then, when the LLM crawls your site as part of the query, it will template some features of your site into another prompt (possibly to summarize or extract information).

Token manipulation (SolidGoldMagikarp). Some odd tokens exist in the GPT-2 / GPT-3 / GPT-J. token vocabularies, like SolidGoldMagikarp and BuyableInstoreAndOnline. These shouldn’t be common enough to show up in the 50k token vocabulary, but they show up anyways. And when you query the model with these tokens, they spit out seemingly random results. For example, when asked, “What does the string “SolidGoldMagikarp” refer to?”, ChatGPT once responded, “The word “distributed” refers to …”. (now patched, see the original article).

The long story is that these tokens somehow end up in the vocabulary due to mistakes or overfitting in the training data (possibly) and then cause erratic behavior at inference time. There’s probably a whole world of SEO to be discovered in the embedding space (similar to keyword stuffing).

Ranking / Ads at Inference. Finally, there could just be a new RLHF or another layer that augments generations to add in more branded or relevant content. In this case, SEO would be related to the ranking algorithm that would sit on top (Goodhart’s law — when a measure becomes a target, it ceases to be a good measure).

Get daily posts on startups, engineering, and AI

A List of Things I Was Wrong About

May 25, 2023

I’ve been writing this blog daily for almost two years. A look at how my ideas have changed and what I was completely wrong about (90% of everything is crap). I’m a person that needs to learn via first principles, so doing is the most effective way for me to improve.

  1. Remote Developer Environments never caught on.
  2. Microsoft and Google turn Notion and Airtable into Commodity SaaS. Didn’t happen. Tables (by Google) and Loops (by Microsoft) have effectively stalled.
  3. “How to Beat Google Search” — I had written about GPT-J two days before and didn’t make any connection between search and LLMs.
  4. An overarching thesis about the securitization of everything. I connected this to Thomas Piketty’s Capital in the Twenty-First Century with my Ownership in the Twenty-First Century. It still might happen, but it’s not here today.
  5. VPN as a developer tool. The ecosystem of applications on top of WireGuard never caught on.
  6. Agent vs. Agentless architecture in distributed systems. Sidecar agents are still the easiest way to do things.
  7. Platform teams quickly disappearing. I didn’t explicitly say this anywhere, but I’ve incorporated it in many places. The gist: most internal platform abstractions are net negative productivity. The reality is: platform teams are probably more widespread than ever.
  8. SSH as less relevant in the cloud. In fact, new frameworks like mrsk by DHH use SSH as a central technology. So never bet against Lindy technology.
  9. Observability at the edge — Good idea, but hard to implement in practice.
  10. MicroSaaS —  Not completely wrong on this, but most of it was a Zero Interest Rate Policy phenomenon

…and many more.

Things I was right about but didn’t take advantage of (probably worse than being wrong!)

  1. Buying IPv4 Addresses. IPv4 prices have just about doubled since I wrote about it. Owning a small block would have been fun (and profitable)!
  2. ELO Rating. I forget why I wrote about this topic, but I never connected it to the idea of ranking models. Now it’s useful for model evaluation.

Things I was right about (but in the wrong way)

  1. Meta and Zuckerburg’s tenacity. I was never fully convinced by Meta’s metaverse strategy, but I thought it was refreshing to see a founder-led strategy that was bold and contrarian. Turns out that the metaverse was the wrong strategy at precisely the wrong time, but Zuckerberg was able to correct the course. As a result, they are doing some of the most important open work in AI.

The jury is still out on:

  1. Is AWS a Dumb Pipe? In the current AI revolution, I think we will get the answer.
  2. TypeScript for Infrastructure. It’s happening, but innovation is slowing.
  3. Apple’s Ads business creating competing incentives between its unique privacy position and a new business.
  4. MLOps and DevOps Convergence. I wrote this about my work on the last AI stack, but the jury is still out on whether this plays out for the LLMOps stack.

The ChatGPT Plugin Specification

May 24, 2023

ChatGPT plugins can call external services to augment answers. They might run a code interpreter or access an API like OpenTable or Zapier.

There isn’t publicly available information about how ChatGPT plugins work behind the scenes — it could be something like Toolformer or a custom implementation. But the public API is interesting in itself.

Developers submit a manifest with some metadata. The interesting parts of this are:

  • Name for model and name for human (or company) — Plausibly how the model refers to the tool. Maybe a simple pattern matching to understand when the generated output is deciding to use a specific tool.
  • Description for model — This most likely gets templated into the prompt somehow. You can only use 3 plugins simultaneously. Maybe that’s a result of this workflow. There’s some guidance and guardrails around this so that it doesn’t spill over into other parts of the completion (because it’s most likely templated into the prompt). This seems like a great vehicle for prompt injection (especially hard to find in a chained workflow of plugins).
  • OpenAPI specification — This is how the model understands what to call. There’s probably no fine-tuning on specific tools (maybe there’s fine-tuning like Toolformer with OpenAPI specifications, but it doesn’t seem like it). This means that they can add new plugins without any extra work. There are also some limits on the size of the OpenAPI spec.

The interesting things about the plugin specification:

  • Plugins do not know anything about the model. They are simply an API server and an API specification. This means that plugins should be theoretically compatible across different model versions.

There’s no natural language parsing or usage in the actual plugin. Just JSON or whatever your wire protocol is.