If you want to coerce a typed JSON response out of an LLM, you have a few options:
Control the token distributions via state machines driven by a regex or context-free grammar. The benefit of this method is correctness: you are guaranteed a valid response on the first generation. It comes at the cost of compute (the token distributions are computed and masked at every decoding step) and development time (it's hard to define a valid regex or context-free grammar for every request/response you want, especially in a prototyping phase).
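The core of this approach can be sketched in a few lines. This is a toy illustration, not any particular library's implementation: at each decoding step, a candidate token survives only if appending it keeps the generated prefix on a path toward some string the grammar accepts. Here the "grammar" is just a small set of valid literals.

```typescript
// Toy sketch of grammar-constrained decoding. The "grammar" here is a
// fixed set of valid JSON literals; a real system would use a regex or
// CFG state machine over the tokenizer's vocabulary.
const VALID_COMPLETIONS = ["true", "false", "null"];

// A token is allowed if prefix + token is still a prefix of some valid string.
function allowedTokens(prefix: string, vocab: string[]): string[] {
  return vocab.filter((tok) =>
    VALID_COMPLETIONS.some((s) => s.startsWith(prefix + tok))
  );
}

// With the prefix "t", only tokens that continue "true" survive the mask.
console.log(allowedTokens("t", ["r", "ru", "a", "x", "rue"])); // ["r", "ru", "rue"]
```

The masked distribution is then renormalized over the surviving tokens, which is where the per-step compute cost comes from.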
Annotate types (usually via a data-validation library like Pydantic) and use a JSON parser to enforce structure and repair failures. This method relies on prompt engineering and type hints to send a templated prompt. The problem is that you have to deal with a third-party type system, and the method is tightly coupled to the programming language. In addition, when the model inevitably generates invalid JSON, the tooling can't provide helpful hints to repair the generation.
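The parse-and-validate loop behind this approach looks roughly like the following sketch (the `Expected` shape and function names are hypothetical, shown in TypeScript rather than Pydantic for consistency with the rest of the post): parse the model's output, check it against the expected shape, and surface an error message that can be fed back into a repair prompt.

```typescript
// Hypothetical validate-and-repair helper. On failure it returns an
// error string that could be appended to a follow-up "please fix" prompt.
type Expected = { name: string; age: number };

function validate(raw: string): { ok: true; value: Expected } | { ok: false; error: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "invalid JSON: " + (e as Error).message };
  }
  const obj = parsed as Record<string, unknown>;
  if (typeof obj?.name !== "string") return { ok: false, error: "`name` must be a string" };
  if (typeof obj?.age !== "number") return { ok: false, error: "`age` must be a number" };
  return { ok: true, value: parsed as Expected };
}

// The model quoted the number, so validation fails with a usable hint.
console.log(validate('{"name": "Ada", "age": "36"}'));
```

The weakness the post points out shows up here: the error messages come from hand-written checks (or a validation library), not from a compiler that understands the target type.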
I'm proposing a third option: use the TypeScript type system to specify the output types for the desired response. With some intermediate layers of logic, you can achieve a high success rate at generating type-safe JSON responses, even for nested and complex types.
I’m releasing another endpoint to my structured LLM API series on Thiggle:
/v1/completion/typed. The idea is a standard completion request extended to output valid JSON that conforms to a specific response type. The type is specified as TypeScript types, and multiple types can be passed in alongside the prompt. Check out the documentation for more usage examples. Play around with it via the UI playground at thiggle.com/playground or the hosted API at api.thiggle.com.
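A request might look like the following sketch. The field names (`prompt`, `types`) are illustrative assumptions, not the API's confirmed schema; consult the Thiggle documentation for the real request shape.

```typescript
// Hypothetical payload for /v1/completion/typed. Field names are
// illustrative; see the Thiggle docs for the actual schema.
interface TypedCompletionRequest {
  prompt: string;
  types: string[]; // TypeScript type declarations, passed as source text
}

function buildTypedRequest(prompt: string, types: string[]): TypedCompletionRequest {
  return { prompt, types };
}

const body = buildTypedRequest(
  "Extract the person mentioned in: 'Grace Hopper was born in 1906.'",
  ["interface Person { name: string; birthYear: number }"]
);

// The payload would then be POSTed to the hosted API, e.g.:
// fetch("https://api.thiggle.com/v1/completion/typed", {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: `Bearer ${API_KEY}` },
//   body: JSON.stringify(body),
// });
```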
How does it differ from OpenAI function calling? First, it's not tied to GPT-4 or OpenAI models. It can run on any base model, such as Llama 2. Second, it can handle more complex types that would be hard to describe in something more verbose, like JSON Schema. However, the caveat of not using a full logit-masking approach is that the output is not guaranteed: in some cases the model can't be coerced into outputting the right response.
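As an example of the kind of type that is concise in TypeScript but verbose to express as JSON Schema, consider a discriminated union (the `Shape` type below is my own illustration, not from the API's docs):

```typescript
// A discriminated union: one line per variant in TypeScript, but a
// oneOf/const construction spanning many lines in JSON Schema.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

// The `kind` field lets both the type checker and runtime code narrow
// the union to a single variant.
function area(s: Shape): number {
  return s.kind === "circle" ? Math.PI * s.radius ** 2 : s.width * s.height;
}

console.log(area({ kind: "rect", width: 3, height: 4 })); // 12
```

A response typed as `Shape` forces the model to commit to one variant and supply exactly that variant's fields.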
How is it different from some of the open-source libraries that do similar things? Your mileage may vary with different approaches. Some are integrated deeply into the client language via annotations or a separate specification. TypeScript types might not be the best schema for every response, but it's easy enough to translate basic types from your chosen programming language, and the type system is expressive enough to cover most use cases from other languages.