It’s been nearly fifteen years, and SBCL still can’t be reliably built by other Lisp compilers.
Of course, other peoples’ definition of “reliably” might differ. We
did achieve successful building under unrelated Lisp compilers
twelve years ago;
there were a couple of nasty bugs along the way, found both before and
after that triumphant announcement, but at least with a set of
compilers whose interpretation of the standard was sufficiently
similar to SBCL’s own, and with certain non-mandated but expected
features (such as the type (array (unsigned-byte 8) (*))
being
distinct from simple-vector
, and single-float
being distinct from
double-float
), SBCL achieved its aim of being buildable on a system
without an SBCL binary installed (indeed, using CLISP or XCL as a
build host, SBCL could in theory be bootstrapped starting with only
gcc).
For true “reliability”, though, we should not be depending on any
particular implementation-defined features other than ones we actually
require – or if we are, then the presence or absence of them should
not cause a visible difference in the resulting SBCL. The most common
kind of leak from the host lisp to the SBCL binary was the host’s
value of
most-positive-fixnum
influencing the target, causing problems from documentation errors all
the way up to type errors in the assembler. Those leaks were mostly
plugged a while ago, though they do recur every so often; there are
other problems, and over the last week I spent some time tracking down
three of them.
The first: if you’ve ever done (apropos "PRINT")
or something
similar at the SBCL prompt, you might wonder at the existence of
functions named something like
SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX (QUOTE (102))) (OP1 (QUOTE (58))) (OP2 (QUOTE (32))) (IMM NIL TYPE (QUOTE IMM-BYTE))) (QUOTE (NAME TAB REG , REG/MEM ...)))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|
.
What is going on there? Well, these functions are a part of the
disassembler machinery; they are responsible for taking a certain
amount of the machine code stream and generating a printed
representation of the corresponding assembly: in this case, for the
PINSRB
instruction. Ah, but (in most instruction sets) related instructions
share a fair amount of structure, and decoding and printing a PINSRD
instruction is basically the same as for PINSRB
, with just one
#x20
changed to a #x22
– in both cases we want the name of the
instruction, then a tab, then the destination register, a comma, the
source, another comma, and the offset in the destination register. So
SBCL arranges to reuse the PINSRB
instruction printer for PINSRD
;
it maintains a cache of printer functions, looked up by printer
specification, and reuses them when appropriate. So far, so normal;
the ugly name above is the generated name for such a function,
constructed by interning a printed, string representation of some
useful information.
Hm, but wait. See those (QUOTE (58))
fragments inside the name?
They result from printing the list (quote (58))
. Is there a
consensus on how to print that list? Note that
*print-pretty*
is bound to
nil
for this printing; prior experience has shown that there are strong
divergences between implementations, as well as long-standing
individual bugs, in pretty-printer support. So, what happens if I do
(write-to-string '(quote foo) :pretty nil)
?
- SBCL:
"(QUOTE FOO)"
, unconditionally - CCL:
"'FOO"
by default;"(QUOTE FOO)"
ifccl:*print-abbreviate-quote*
is set tonil
- CLISP:
"'FOO"
, unconditionally (I read the.d
code with comments in half-German to establish this)
So, if SBCL was compiled using CLISP, the name of the same function in
the final image would be
SB-VM::|CACHED-FUN--PINSRB[(EXT-2BYTE-XMM-REG/MEM ((PREFIX '(102)) (OP1 '(58)) (OP2 '(32)) (IMM NIL TYPE 'IMM-BYTE)) '(NAME TAB REG , REG/MEM ...))]-EXT-2BYTE-XMM-REG/MEM-PRINTER|
.
Which is shorter, and maybe marginally easier to read, but importantly
for my purposes is not bitwise-identical.
Thus, here we have a difference between host Common Lisp compilers
which leaks over into the final image, and it must be eliminated.
Fortunately, this was fairly straightforward to eliminate; those names
are never in fact used to find the function object, so generating a
unique name for functions based on a counter makes the generated
object file bitwise identical, no matter how the implementation prints
two-element lists beginning with quote
.
The second host leak is also related to quote
, and to our old friend
backquote
– though not related in any way to the
new implementation. Consider this
apparently innocuous fragment, which is a simplified version of some
code to implement the :type
option to
defstruct
:
(macrolet ((def (name type n)
`(progn
(declaim (inline ,name (setf ,name)))
(defun ,name (thing)
(declare (type simple-vector thing))
(the ,type (elt thing ,n)))
(defun (setf ,name) (value thing)
(declare (type simple-vector thing))
(declare (type ,type value))
(setf (elt thing ,n) value)))))
(def foo fixnum 0)
(def bar string 1))
What’s the problem here? Well, the functions are declaimed to be
inline
,
so SBCL records their source code. Their source code is generated by
a macroexpander, and so is made up of conses that are generated
programmatically (as opposed to freshly consed by the reader). That
source code is then stored as a literal object in an object file,
which means in practice that instructions for reconstructing a similar
object are dumped, to be executed when the object file is processed by
load
.
Backquote
is a reader macro that expands into code that, when evaluated,
generates list structure with appropriate evaluation and splicing of
unquoted fragments. What does this mean in practice? Well, one
reasonable implementation of reading `(type ,type value)
might
be:
(cons 'type (cons type '(value)))
and indeed you might (no guarantees) see something like that if you do
(macroexpand '`(type ,type value))
in the implementation of your choice. Similarly, reading `(setf
(elt thing ,n) value)
will eventually generate code like
(cons 'setf (cons (cons 'elt (list 'thing n)) '(value)))
Now, what is “similar”? In this context, it has a technical definition: it relates two objects in possibly-unrelated Lisp images, such that they can be considered to be equivalent despite the fact that they can’t be compared:
similar adj. (of two objects) defined to be equivalent under the similarity relationship.
similarity n. a two-place conceptual equivalence predicate, which is independent of the Lisp image so that two objects in different Lisp images can be understood to be equivalent under this predicate. See Section 3.2.4 (Literal Objects in Compiled Files).
Following that link, we discover that similarity for
cons
es
is defined in the obvious way:
Two conses, S and C, are similar if the car of S is similar to the car of C, and the cdr of S is similar to the cdr of C.
and also that implementations have some obligations:
Objects containing circular references can be externalizable objects. The file compiler is required to preserve eqlness of substructures within a file.
and some freedom:
With the exception of symbols and packages, any two literal objects in code being processed by the file compiler may be coalesced if and only if they are similar [...]
Put this all together, and what do we have? That def
macro above
generates code with similar literal objects: there are two instances
of '(value)
in it. A host compiler may, or may not, choose to
coalesce those two literal '(value)
s into a single literal object;
if it does, the inline expansion of foo
(and bar
) will have a
circular reference, which must be preserved, showing up as a
difference in the object files produced during the SBCL build. The
fix? It’s ugly, but portable: since we can’t stop an aggressive
compiler from coalescing constants which are similar but not
identical, we must make sure that any similar substructure is in fact
identical:
(macrolet ((def (name type n)
(let ((value '(value)))
`(progn
(declaim (inline ,name (setf ,name)))
(defun ,name (thing)
(declare (type simple-vector thing))
(the ,type (elt thing ,n)))
(defun (setf ,name) (value thing)
(declare (type simple-vector thing))
(declare (type ,type . ,value))
(setf (elt thing ,n) . ,value)))))
(def foo fixnum 0)
(def bar string 1))
Having dealt with a problem with
quote
,
and a problem with
backquote
,
what might the Universe serve up for my third problem? Naturally, it
would be a problem with a code walker. This code walker is
somewhat naïve, assuming as it does
that its body is made up of forms or tags; it is the assemble
macro,
which is used implicitly in the definition of VOPs (reusable assembly
units); for example, like
(assemble ()
(move ptr object)
(zeroize count)
(inst cmp ptr nil-value)
(inst jmp :e DONE)
LOOP
(loadw ptr ptr cons-cdr-slot list-pointer-lowtag)
(inst add count (fixnumize 1))
(inst cmp ptr nil-value)
(inst jmp :e DONE)
(%test-lowtag ptr LOOP nil list-pointer-lowtag)
(error-call vop 'object-not-list-error ptr)
DONE))
which generates code to compute the length of a list. The expander
for assemble
scans its body for any atoms, and generates binding
forms for those atoms to labels:
(let ((new-labels (append labels
(set-difference visible-labels inherited-labels))))
...
`(let (,@(mapcar (lambda (name) `(,name (gen-label))) new-labels))
...))
The problem with this, from a reproducibility point of view, is that
set-difference
(and the other set-related functions:
union
,
intersection
,
set-exclusive-or
and their n-destructive variants) do not return the sets with a
specified order – which is fine when the objects are truly treated as
sets, but in this case the LOOP
and DONE
label objects ended up in
different stack locations depending on the order of their binding.
Consequently the machine code for the function emitting code for
computing a list’s length – though not the machine code emitted by
that function – would vary depending on the host’s implementation of
set-difference
.
The fix here was to sort the result of the set operations, knowing
that all the labels would be symbols and that they could be treated as
string designators.
And after all this is? We’re still not quite there: there are three to four files (out of 330 or so) which are not bitwise-identical for differing host compilers. I hope to be able to rectify this situation in time for SBCL’s 15th birthday...