Virgo: a Graph-based Configuration Language

Over the last few years, I've worked on open source distributed systems in Go at Google. I've thought a lot about dependency management, systems configuration, programming languages, and compilers.

Again and again, I saw the same fundamental data structure underpinning these technologies: the directed acyclic graph. The most frustrating part was modeling graph-based configuration in languages optimized for hierarchical data structures. That's why I created Virgo.

Virgo is a graph-based configuration language. It has two main features: edge definitions and vertex definitions. A Virgo configuration file parses into an adjacency list. You could achieve similar results by adding conventions and restrictions to YAML or JSON, but much as YAML optimized for human readability, Virgo optimizes for natural graph readability, editability, and representation.

// config.vgo

a -> b, c, d -> e <- f, g
A graphical representation of the Virgo graph

Virgo is open to proposals and language changes. Please open an issue at https://github.com/r2d4/virgo to start a discussion.

Graphs are everywhere in configuration management. One graph that engineers may be familiar with is the Makefile target graph. The make tool topologically sorts the targets it resolves, which lets it build files in dependency order. Virgo's CLI and Go library let developers replicate this behavior easily.

clean -> parser, lexer -> "src files" -> test

parser = `goyacc parser.y`
lexer  = `golex lex.l`
clean  = `rm lex.yy.go parser.go || true`
test   = `go test -v`
"src files"  = `go build ./...`
A simple example to build the Virgo CLI tool with the language itself.

There are two entry points for parsing a Virgo file. You can use the Go library in the same repository to parse the file into a native Go struct, or use the published CLI binary, which exposes the parsing function to other environments.

package main

import (
	"fmt"
	"log"
	"os"
	"strings"

	"matt-rickard.com/virgo/pkg/virgo"
)

func main() {
	// log.Fatal prints the error and exits with status 1.
	if err := run("config.vgo"); err != nil {
		log.Fatal(err)
	}
}

// run parses a Virgo file, topologically sorts its graph, and prints the
// vertex definitions in dependency order.
func run(fname string) error {
	f, err := os.ReadFile(fname)
	if err != nil {
		return fmt.Errorf("reading file: %w", err)
	}
	g, err := virgo.Parse(f)
	if err != nil {
		return fmt.Errorf("parsing virgo file: %w", err)
	}

	// Order the vertices so that each one comes after its dependencies.
	nodes, err := virgo.TopSort(g)
	if err != nil {
		return fmt.Errorf("topological sort: %w", err)
	}

	// Collect the definition lines of each vertex in sorted order.
	out := []string{}
	for _, n := range nodes {
		out = append(out, g.Vertices[n]...)
	}
	fmt.Println(strings.Join(out, "\n"))
	return nil
}
Code snippet to read a Virgo file, topologically sort the graph and print out the vertex definitions for each node in order.
$ virgo run build.vgo
Or build with the CLI tool

One operation we frequently want to perform on graphs is a topological sort. A topological sorting is a linear ordering of vertices such that for every directed edge u -> v, vertex u comes before v in the ordering.

The CLI tool topologically sorts the graph, and can even start from a particular vertex (analogous to a Make target).

$ virgo run build.vgo:parser

Build systems are not the only type of configuration schema that can benefit from a graph representation. Some other examples include:

  • deployment of microservices
  • docker build instructions
  • continuous integration pipelines
  • package dependencies
  • git commits

For full documentation on the language and features of Virgo, visit the GitHub page https://github.com/r2d4/virgo.

The Heptagon of Configuration

The Heptagon of Configuration is a term I'm coining to describe a pattern I've observed in software configuration: configuration evolves through specific, increasing levels of flexibility and complexity before returning to a restrictive and simple implementation.

How does the Cycle Work?

Hardcoded values are the simplest configuration, but they provide very little flexibility. As the program's surface grows, so does its configuration: first environment variables*, then flags, and, when those become cumbersome, a configuration file that encodes them.

When multiple environments require similar configuration files, a templating language is used to eliminate repetition and promote reuse of templates.

The templates grow in complexity until nearly every option in the configuration file is templated, which defeats the purpose of reuse.

A domain-specific language (DSL) is invented to promote reuse of logical blocks in place of inflexible, static templates.

Since the DSL incorporates domain-specific knowledge by definition, every new function increases the complexity for the end user. The code eventually becomes unreadable and unmaintainable, and the remaining programs are rewritten in Bash. Bash provides ultimate flexibility under the guise of reusability.

However, the correctness of Bash scripts is difficult to prove, and they rely on fragile text manipulation to generate a configuration file or template.

The most used Bash scripts evolve into new CLIs with hard-coded behavior. The cycle continues.

Coincidence?

*Fun fact: Environment variables celebrated their 40th birthday this year.