Sandbox Your Prompts

Apr 19, 2023

Prompt injection is a real security issue that (should) prevent LLMs from going into the enterprise (today). It’s not an issue if you’re just returning generated text to the user without any infrastructure in between. However, if you’re trying to perform any sort of action outside the LLM — call an external service, query a database, take an action, execute a plugin — you’re vulnerable to prompt injection. See Simon Willison’s “Prompt Injection: What’s the Worse That Can Happen?” (spoiler: it’s worse than you think).

It’s hard to just filter out malicious input at the very last stage — LLMs are great encoders and decoders by nature. “Rewrite the text in the style of Shakespeare” to bypass any word filters. Or exfiltrate data through markdown images (see proof-of-concept). Currently, there’s no framework that addresses these security holes.

A sandboxed prompt context is the answer. A virtual environment for the LLM to “execute” in which we can easily constrain every part of the environment —

  • What files does the LLM have access to?
  • What libraries or tools are installed?
  • What credentials are mounted?
  • Which parts of the network are firewalled? Which parts aren’t?

The LLM itself isn’t sandboxed, but all of the adjacent infrastructure (running chain-of-thought workflows, dispatching plugins or extensions, or taking action otherwise) should be sandboxed. There’s a tight coupling between the LLM calls and this infrastructure.

The simplest example is a code interpreter. LLMs shouldn’t leak state by reusing REPLs. A more practical example is a database connection string. It should only be exposed to certain parts of the workflow. It should never touch the prompt itself. In the markdown vulnerability, a tool to render a markdown image shouldn’t have internet access to exfiltrate data. Authorization and prompt context are two sides of the same sword.

Luckily, there are DevOps primitives like containers and WebAssembly runtimes that provide a level of isolation like this. See my list of different types of software containers for more. The tougher part: bridging the gap between these systems and the emerging LLM stack.