Working on LLMs.

Previously, I was a software engineer working on open-source Kubernetes at Google, building and maintaining Kubernetes developer tools such as minikube and skaffold. I also worked on machine learning pipelines as a maintainer of the Kubeflow project. Before Google, I worked at The Blackstone Group in NYC.

I received a BA in Mathematics from Columbia University. I have an MBA from Stanford Graduate School of Business, where I was an Arjay Miller Scholar.

Software

AI

  • @react-llm/headless - Easy-to-use headless React Hooks to run LLMs in the browser with WebGPU. All inference happens client-side in the browser. See chat.matt-rickard.com for a live example.
  • LLaMaTab - A Chrome Extension that uses @react-llm/headless to run an LLM entirely in the browser. Load the model once and run it on any site.
  • ReLLM - Get exact structure out of any LLM by constraining pre-generation logits with a regex. ParserLLM is an extension to ReLLM that constrains LLMs to generate only output that conforms to a specific context-free grammar (e.g., JSON).
  • openlm - A drop-in OpenAI-compatible Python library that can call LLMs from any other hosted inference API. OpenLM takes in the same parameters as the openai.Completion class and outputs a similarly structured result.
  • llm.ts - A TypeScript equivalent of OpenLM that works in any JS environment (browser, Node, Deno).
  • Kubeflow - Machine Learning Toolkit for Kubernetes
  • ScapeNet and osrs-ocr - Fine-tuned object recognition and OCR models for the MMORPG Old School RuneScape
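
The regex-constrained decoding idea behind ReLLM can be illustrated with a toy sketch. This is not ReLLM's actual code: the vocabulary, the `toy_logits` scoring function, and the hand-written `PREFIX` pattern are all hypothetical stand-ins, and a real implementation would take logits from a model and would likely rely on a regex engine with partial-match support instead of a manually derived prefix pattern.

```python
import math
import re

# Hypothetical toy vocabulary; a real tokenizer would supply this.
VOCAB = ["1", "2", "55", "-", "abc", " ", "0199"]


def toy_logits(prefix):
    """Stand-in for a model forward pass (ignores the prefix).

    The scores deliberately favor tokens that break the target format.
    """
    return [0.1, 0.2, 0.5, 0.3, 2.0, 1.5, 0.4]


# Target format: three digits, a dash, four digits.
FULL = re.compile(r"\d{3}-\d{4}")
# Assumption of this sketch: a hand-written pattern that accepts
# every prefix of a string matching FULL.
PREFIX = re.compile(r"\d{0,3}|\d{3}-\d{0,4}")


def constrained_generate(max_steps=16):
    text = ""
    for _ in range(max_steps):
        if FULL.fullmatch(text):
            break  # the output already satisfies the pattern
        logits = toy_logits(text)
        # Mask every token that would make the text impossible to complete.
        masked = [
            score if PREFIX.fullmatch(text + tok) else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            break  # no valid continuation exists in this vocabulary
        text += VOCAB[best]  # greedy pick among the allowed tokens
    return text


print(constrained_generate())
```

Even though the toy scores prefer "abc" and " ", the mask forces the output into the digits-dash-digits shape. Greedy selection like this can still dead-end with an unlucky vocabulary, which the sketch handles only by stopping early.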

Distributed Systems

Interviews

Conference Talks