Florents Tselai Think, Code, Read, Sleep, Repeat

setup.py and Makefile: A just good enough build tool

15 Apr 2024

Much of the discussion around the xz backdoor vulnerability has focused on the complexity of tools like autoconf. For context, see this discussion on pgsql-hackers: Security lessons from liblzma.

I don’t have a specific proposal to save the world; instead, I’ll try to sketch my idea of a just-good-enough build tool and point to three of my favorite examples.

Semantics are indeed important, but I won’t drown in them. I define a build tool as a piece of software that automates turning a reasonably complex codebase into a single artifact suitable for distribution. This artifact can be a binary executable, a shareable tarball, or a zip file ready to be uploaded to a package manager.

What about the “good enough” part? Here’s where things get more complex. From my perspective, a good enough build tool has the following properties:

It should be readily available on most reasonable platforms. If I have to search for one name to brew install it on macOS and a different package to apt install in my CI pipelines, that’s already too complex.

It should be readable and easy to navigate: nothing beats a Makefile on this. The next best thing is the ci.yml file. That's the first thing I reverse-engineer to figure out how to build a package locally.

It should play nicely with CLI arguments and environment variables, so that they can act both as flags and as parameterizable defaults.

It should make it easy to call subprocesses. Subprocesses are useful for git operations (to extract the proper version, for instance) or for calling other build tools required by a submodule. In many of the projects I’m involved in, for example, submodules are written in a language different from the main project: Python packages with dependencies written in C, which should be built with autoconf. Bash scripts are the ideal option here, but they come with their usual shortcomings and accidental complexity. Subprocesses can also call scripts that seed the database with data for local development and testing.

It should be (almost) equally usable for 3 things: setting up a local development environment, running the CI pipeline, and producing the build artifacts. In previous decades, one could prioritize and pick, but today, especially in open-source projects, the lines between the three are blurred.

Its language should be powerful yet expressive. Vanilla shell-fu is not an option. I’m sure Unix die-hards can do everything in a single sh script, and such scripts are indeed ideal for bootstrapping a project, but soon enough they become too complex and hard to decipher. They’re notoriously hard to debug, too. Take the yb_build.sh script that builds Yugabyte: I’ve used it and can confirm it is battle-tested, but I don’t think I could break it down into pieces in under 5 minutes.

It can act as a documentation repository of platform-specific tweaks and build knowledge.
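Two of the properties above, environment variables as parameterizable defaults and easy subprocess calls, can be sketched in a few lines of Python. This is only an illustration, not code from any of the projects mentioned here: the helper names and the BUILD_JOBS and PREFIX variables are made up for the example, and the fallback version string is an arbitrary placeholder.

```python
import os
import subprocess


def env_default(name: str, default: str) -> str:
    """Return the environment variable `name`, or `default` if unset or empty."""
    return os.environ.get(name) or default


def git_version(repo_dir: str = ".", fallback: str = "0.0.0.dev0") -> str:
    """Ask git for a version string; use `fallback` if git or the repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or this is not a repo.
        return fallback
    return result.stdout.strip() or fallback


if __name__ == "__main__":
    # BUILD_JOBS and PREFIX are hypothetical variable names for this sketch.
    jobs = int(env_default("BUILD_JOBS", str(os.cpu_count() or 1)))
    prefix = env_default("PREFIX", "/usr/local")
    print(f"version={git_version()} jobs={jobs} prefix={prefix}")
```

Running it as `BUILD_JOBS=2 python bootstrap.py` (the file name is also just an example) would override the default job count without touching the script, which is the same trick a Makefile gets for free.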

If you think I’ve over-fitted these properties to match the title of the post, maybe you're right. Still, I can't help but notice that whenever I find myself bootstrapping a complex project, I start by building, step by step, either a Makefile or a bootstrap.sh/py that looks a lot like a setup.py.

Below are some pieces of software that showcase what I'm looking for.

Redis Makefile

Antirez (the creator of Redis) recently noted on Twitter:

Repeat with me: Unix is no longer the complex jungle it was 30 years ago, and I no longer need a build system other than a Makefile with a few ifdefs for the majority of system software projects. I thereby promise to try very hard to keep things simple UNTIL POSSIBLE.

The Redis Makefile itself is a testament to that: It's simple; it gets the job done and has done so for many years.

Llamafile Makefile

This is another popular Makefile-based project, but I especially like how jart uses a generic top-level Makefile that includes more specific Makefiles stored in the build directory, leaning heavily on the fact that Makefiles are built on text substitution and macros: a deps.mk to handle dependencies, a config.mk for configuration (which can be overridden by environment variables), and a powerful rules.mk for the juicy stuff.

Also, note the download-cosmocc.sh script instead of requiring apt install. Bandwidth nowadays is cheap, CPU time as well. Don't make me apt install something. Download it from the source in tmp/ and install it yourself. And finally, do you need a checksum library? Just ship a sha256sum.c with it and compile it.
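The "download it and verify it yourself" idea boils down to comparing a file's SHA-256 digest against a pinned value. llamafile does this with a shell script and a bundled C program; the following is only a rough Python sketch of the same check, and the function names are hypothetical rather than taken from that project.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the pinned checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

The pinned digest would live in the repository next to the download script, so a tampered or truncated download fails the build instead of silently being used.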

EdgeDB setup.py

The setup.py script that EdgeDB uses is my favorite build tool. It supercharges what is usually a Python-only approach to build something much more complex.

Its logic is clear and written in Python. Subprocesses are called to extract git info and even to build Postgres from source (which itself relies on autoconf)! It helps a lot that Python has plenty of syntactic sugar for dealing with paths. Also, notice how clear it is how both the Cython-ized and the Rust extensions are built. And, of course, it taps into an existing package ecosystem like setuptools. So there you have it: a simple setup.py script that builds a database (!) and builds components of it in at least 4 different languages. Isn't this beautiful and powerful?
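To make the "call autoconf from Python" part concrete, here is a minimal sketch of how a setup.py-style script might drive a configure/make build of a vendored C dependency. It is not EdgeDB's actual code: the function names and the idea of a single vendored source directory are assumptions for illustration, and it only relies on the conventional `./configure --prefix=...`, `make -jN`, `make install` sequence.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional


def autoconf_commands(prefix: Path, jobs: int) -> List[List[str]]:
    """Return the classic configure/make/install sequence as argv lists."""
    return [
        ["./configure", f"--prefix={prefix}"],
        ["make", f"-j{jobs}"],
        ["make", "install"],
    ]


def build_vendored(src_dir: Path, prefix: Path, jobs: Optional[int] = None) -> None:
    """Run the build inside `src_dir`, stopping at the first failing step."""
    jobs = jobs or os.cpu_count() or 1
    for argv in autoconf_commands(prefix, jobs):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, cwd=src_dir, check=True)
```

In a real setup.py, a function like this would typically be called from a custom setuptools command, so that the C dependency is built before the Python, Cython, and Rust pieces that link against it.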