reproducible builds - a month ahead of schedule

← hearing wagner data preparations | blog | still working on reproducible builds →

I think this might be my last blog entry on the subject of building SBCL for a while.

One of the premises behind SBCL as a separate entity from CMUCL, its parent, was to make the result of its build be independent of the compiler used to build it. To a world where separate compilation is the norm, the very idea that building some software should persistently modify the state of the compiler probably seems bizarre, but the Lisp world evolved in that way and Lisp environments (at least those written in themselves) developed build recipes where the steps to construct a new Lisp system from an old one and the source code would depend critically on internal details of both the old and the new one: substantial amounts of introspection on the build host were used to bootstrap the target, so if the details revealed by introspection were no longer valid for the new system, there would need to be some patching in the middle of the build process. (How would you know whether that was necessary? Typically, because the build would fail with a more-or-less – usually more – cryptic error.)

Enter SBCL, whose strategy is essentially to use the source files first to build an SBCL!Compiler running in a host Common Lisp implementation, and then to use that SBCL!Compiler to compile the source files again to produce the target system. This requires some contortions in the source files: we must write enough of the system in portable Common Lisp so that an arbitrary host can execute SBCL!Compiler to compile SBCL-flavoured sources (including the standard headache-inducing (defun car (list) (car list)) and similar, which works because SBCL!Compiler knows how to compile calls to car).

How much is “enough” of the system? Well, one answer might be when the build output actually works, at least to the point of running and executing some Lisp code. We got there about twelve years ago, when OpenMCL (as it was then called) compiled SBCL. And yet... how do we know there aren't odd differences that depend on the host compiler lurking, which will not obviously affect normal operation but will cause hard-to-debug trouble later? (In fact there were plenty of those, popping up at inopportune moments).

I’ve been working intermittently on dealing with this, by attempting to make the Common Lisp code that SBCL!Compiler is written in sufficiently portable that executing it on different implementations generates bitwise-identical output. Because then, and only then, can we be confident that we are not depending in some unforseen way on a particular implementation-specific detail; if output files are different, it might be a harmless divergence, for example a difference in ordering of steps where neither depends on the other, or it might in fact indicate a leak from the host environment into the target. Before this latest attack at the problem, I last worked on it seriously in 2009, getting most of the way there but with some problems remaining, as measured by the number of output files (out of some 330 or so) whose contents differed depending on which host Common Lisp implementation SBCL!Compiler was running on.

Over the last month, then, I have been slowly solving these problems, one by one. This has involved refining what is probably my second most useless skill, working out what SBCL fasl files are doing by looking at their contents in a text editor, and from that intuiting the differences in the implementations that give rise to the differences in the output files. The final pieces of the puzzle fell into place earlier this week, and the triumphant commit announces that as of Wednesday all 335 target source files get compiled identically by SBCL!Compiler, whether that is running under Clozure Common Lisp (32- or 64-bit versions), CLISP, or a different version of SBCL itself.

Oh but wait. There is another component to the build: as well as SBCL!Compiler, we have SBCL!Loader, which is responsible for taking those 335 output files and constructing from them a Lisp image file which the platform executable can use to start a Lisp session. (SBCL!Loader is maybe better known as “genesis”; but it is to load what SBCL!Compiler is to compile-file). And it was slightly disheartening to find that despite having 335 identical output files, the resulting cold-sbcl.core file differed between builds on different host compilers, even after I had remembered to discount the build fingerprint constructed to be different for every build.

Fortunately, the actual problem that needed fixing was relatively small: a call to maphash, which (understandably) makes no guarantees about ordering, was used to affect the Lisp image data directly. I then spent a certain amount of time being thoroughly confused, having managed to construct for myself a Lisp image where the following forms executed with ... odd results:

(loop for x being the external-symbols of "CL" count 1)
; => 1032
(length (delete-duplicates (loop for x being the external-symbols of "CL" collect x)))
; => 978

(shades of times gone by). Eventually I realised that

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD" :test #'string=))
  ...)

was not the same as

(unless (member (package-name package) '("COMMON-LISP" "KEYWORD") :test #'string=)
  ...)

and all was well again, and as of this commit the cold-sbcl.core output file is identical no matter the build host.

It might be interesting to survey the various implementation-specific behaviours that we have run into during the process of making this build completely repeatable. The following is a probably non-exhaustive list – it has been twelve years, after all – but maybe is some food for thought, or (if you have a particularly demonic turn of mind) an ingredients list for a maximally-irritating CL implementation...

Perhaps most obviously, various constants are implementation-defined. The ones which caused the most trouble were undoubtably most-positive-fixnum and most-negative-fixnum – particularly since they could end up being used in ways where their presence wasn’t obvious. For example, (deftype fd () `(integer 0 ,most-positive-fixnum)) has, in the SBCL build process, a subtly different meaning from (deftype fd () (and fixnum unsigned-byte)) – in the second case, the fd type will have the intended meaning in the target system, using the target’s fixnum range, while in the first case we have no way of intercepting or translating the host’s value of most-positive-fixnum. Special mentions go to array-dimension-limit, which caused Bill Newman to be cross on the Internet, and to internal-time-units-per-second; I ended up tracking down one difference in output machine code from a leak of the host’s value of that constant into target code.
Similarly, random and sxhash quite justifiably differ between implementations. The practical upshot of that is that these functions can’t be used to implement a cache in SBCL!Compiler, because the access patterns and hence the patterns of cache hits and misses will be different depending on the host implementation.
As I’ve already mentioned, maphash does not iterate over hash-table contents in a specified order, and in fact that order need not be deterministic; similarly, with-package-iterator can generate symbols in arbitrary orders, and set operations (intersection, set-difference and friends) will return the set as a list whose elements are in an arbitrary order. Incautious use of these functions tended to give rise to harmless but sometimes hard-to-diagnose differences in output; the solution was typically to sort the iteration output before operating on any of it, to introduce determinism...
... but it was possible to get that wrong in a harder-to-detect way, because sort isnot specified to be stable. In some implementations, it actually is a stable sort in some conditions, but for cases where it’s important to preserve an already-existing partial order, stable-sort is the tool for the job.
The language specification explicitly says that the initial contents of uninitialized arrays are undefined. In most implementations, at most times, executing (make-array 8 :element-type (unsigned-byte 8)) will give a zero-filled array, but there are circumstances in some implementations where the returned array will have arbitrary data.
Not only are some constants implementation-defined, but so also are the effects of normal operation on some variables. *gensym-counter* is affected by macroexpansion if the macro function calls gensym, and implementations are permitted to macroexpand macros an arbitrary number of times. That means that our use of gensym needs to be immune to whatever the host implementation’s macroexpansion and evaluation strategy is.
The object returned by byte to represent a bitfield with size and position is implementation-defined. Implementations (variously) return bitmasks, conses, structures, vectors; host return values of byte must not be used during the execution of SBCL!Compiler. More subtly, the various boole-related constants (boole-and and friends) also need special treatment; at one point, their host values were used when SBCL!Compiler compiled the boole function itself, and it so happens that CLISP and SBCL both represent the constants as integers between 0 and 15... but with a different mapping between operation and integer.
my last blog entry talked about constant coalescing, and about printing of (quote foo). In fact printing in general has been a pain, and there are still significant differences in interpretation or at least in implementation of pretty-printing: to the extent that at one point we had to minimize printing at all in order for the build to complete under some implementations.
there are a number of things which are implementation-defined but have caused a certain amount of difficulty. Floating point in general is a problem, not completely solved (SBCL will not build correctly if its host doesn’t have distinct single- and double-float types that are at least approximately IEEE754-compliant). Some implementations lack denormalized numbers; some do not expose signed zeros to the user; and some implementations compute (log 2d0 10d0) more accurately than others, including SBCL itself, do. The behaviour of the host implementation on legal but dubious code is also potentially tricky: SBCL’s build treats full warnings as worthy of stopping, but some hosts emit full warnings for constructs that are tricky to write in other ways: for example to write portable code to handle multiple kinds of string, one might write (typecase string (simple-base-string ...) ((simple-array character (*)) ...)) (string ...)) but some implementations emit full warnings if a clause in a typecase is completely shadowed by other clauses, and if base-char and character are identical in that implementation the typecase above will signal.

There were probably other, more minor differences between implementations, but the above list gives a flavour of the things that needed doing in order to get to this point, where we have some assurance that our code is behaving as intended. And all of this is a month ahead of my self-imposed deadline of SBCL’s 15th birthday: SBCL was announced to the world on December 14th, 1999. (I’m hoping to be able to put on an sbcl15 workshop in conjunction with the European Lisp Symposium around April 20th/21st/22nd – if that sounds interesting, please pencil the dates in the diary and let me know...)