Fuzzy Databases

Nov 28, 2022

Different trade-offs already give rise to significantly different types of databases – from OLTP to OLAP, relational to non-relational, key-value, graph, document, and object database (to name a few).

What if you relaxed some key properties that we've come to expect?

What if databases returned extrapolated results?

If you squint, LLMs resemble something like a vector search database. Items are stored as embeddings, and queries return deterministic yet fuzzy results. What you lose in data loading time (i.e., model training), you make up for in compression (model size) and query time (inference). In the best case, models denoise and clean data automatically. The schema is learned rather than declared.

What if anyone could write to the database?

Blockchains are databases as well. They provide a verifiable ledger and peer-to-peer write access in exchange for significant trade-offs in privacy, throughput, latency, and storage costs. Keys are hashes (similar to a distributed hash table). Authorization is done through public-key infrastructure, and a generalized computing model can be built on top of the distributed ledger (e.g., the EVM).

What if the database could be embedded anywhere?

SQLite/DuckDB answer this question. While neither can support concurrent writes from different processes and are limited in other terms of horizontal scaling, they can be easier to use and can fit in more workflows (e.g., serverless, edge, browser). In many cases, they are operationally much easier to manage than a traditional database.

You could also look at these databases through the lens of hard-to-compute, easy-to-verify.