Source Code Generation

Jan 26, 2022

AI is coming for source code generation, but it will start with the boring stuff.

I'm not talking about machine-readable or intermediate code generated by compilers (although AI is coming for that as well), but about human-readable source code. These models will provide the glue between layers to seal up leaky abstractions, and the leakiest abstractions will go first.

Take, for example, generic REST or gRPC API clients and servers. It would be a pain to plumb through each request/response pair by hand in every language when it can be inferred from the definition. Since these transport layers are language-agnostic, client and server stubs are autogenerated for each language. For REST, there's Swagger Codegen, built on Swagger/OpenAPI JSON definitions. For gRPC, there are Protobuf definitions and a variety of generators.
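To make the idea concrete, here is a minimal sketch of what a stub generator does: it reads a definition and emits readable source code for each operation. The definition format, the `generate_client_source` function, and the `transport` callable are all hypothetical and far simpler than real OpenAPI or Protobuf tooling.

```python
# Hypothetical, heavily simplified API definition (not real OpenAPI syntax).
API_DEFINITION = {
    "get_user": {"method": "GET", "path": "/users/{user_id}", "params": ["user_id"]},
    "delete_user": {"method": "DELETE", "path": "/users/{user_id}", "params": ["user_id"]},
}


def generate_client_source(definition, class_name="ApiClient"):
    """Emit human-readable Python source for a client stub class."""
    lines = [
        f"class {class_name}:",
        "    def __init__(self, transport):",
        "        # transport is any callable taking (method, path)",
        "        self.transport = transport",
    ]
    for name, op in definition.items():
        args = ", ".join(["self"] + op["params"])
        lines += [
            "",
            f"    def {name}({args}):",
            # The generated method builds its path with an f-string at call time.
            f"        path = f\"{op['path']}\"",
            f"        return self.transport(\"{op['method']}\", path)",
        ]
    return "\n".join(lines) + "\n"


# Generate the stub, load it, and call it with a fake transport that
# simply echoes what it was asked to send.
source = generate_client_source(API_DEFINITION)
namespace = {}
exec(source, namespace)
client = namespace["ApiClient"](lambda method, path: (method, path))
result = client.get_user(42)
```

The output of the generator is ordinary source text, which is the point: someone can open the file and extend one method without touching the generator.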

Why generate source code instead of sticking it behind a library? Generated code does not cover all use cases – hence, "stub". The modifications and extensions that users need are too varied to be served by a meaningful library API. Least-common-denominator design doesn't do much.

Another example of code generation is ORMs – Object-Relational Mapping libraries. These provide a layer that autogenerates SQL queries from language objects. The problem is that there is no clear one-to-one mapping between objects and relations. Inheritance, polymorphism, and encapsulation have context-specific mappings to relational concepts, or no mapping at all.
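A toy version of that layer shows where the mismatch creeps in. The sketch below derives SQL from dataclass fields; the `User` and `Admin` classes and both helper functions are made up for illustration and do not reflect any real ORM's API. Note that flattening `Admin` into a single table with its inherited columns is only one of several possible ways to map inheritance, and the code has to pick one blindly.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class User:
    id: int
    name: str


@dataclass
class Admin(User):  # inheritance: one table, or two joined tables?
    level: int


SQL_TYPES = {int: "INTEGER", str: "TEXT"}


def create_table_sql(cls):
    """Map each dataclass field to a column, flattening inherited fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"


def insert_sql(obj):
    """Build a parameterized INSERT for one object."""
    names = [f.name for f in fields(obj)]
    placeholders = ", ".join("?" for _ in names)
    table = type(obj).__name__.lower()
    return f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})"


# Run the generated SQL against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
admin = Admin(id=1, name="ada", level=3)
conn.execute(create_table_sql(Admin))
conn.execute(insert_sql(admin), astuple(admin))
rows = conn.execute("SELECT id, name, level FROM admin").fetchall()
```

The flattened `admin` table has no relationship to a `user` table at all, so a query for "all users" would silently miss admins. That is the kind of context-specific decision a generic mapping cannot make on its own.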

AI code-generation models like Copilot can perform just-in-time, context-aware mappings that ORMs and client/server stubs can't. It's a wholly better model than the current state of the art: no generator step in the build process (historically a source of many bugs) and no dead generated code. So it will be the boring code that gets generated first. But a write-optimized codebase brings its own problems. See my first thoughts a month into using Copilot.