Christophe Weblog Wiki Code Publications Music

There was a bit of a kerfuffle following the 1.2.2 release of SBCL, regarding the incompatible change in the internals of the backquote reader macro.

Formally, implementations can choose how to implement the backquote reader macro (and its comma-based helpers): the semantics of backquote are defined only after evaluation:

An implementation is free to interpret a backquoted form F1 as any form F2 that, when evaluated, will produce a result that is the same under equal as the result implied by the above definition, provided that the side-effect behavior of the substitute form F2 is also consistent with the description given above.

(CLHS 2.4.6; emphasis mine)

There are also two advisory notes about the representation:

Often an implementation will choose a representation that facilitates pretty printing of the expression, so that (pprint '`(a ,b)) will display `(a ,b) and not, for example, (list 'a b). However, this is not a requirement.

(CLHS; added quote in example mine), and:

Implementors who have no particular reason to make one choice or another might wish to refer to IEEE Standard for the Scheme Programming Language, which identifies a popular choice of representation for such expressions that might provide useful to be useful compatibility for some user communities.

(CLHS; the Scheme representation reads `(foo ,bar) as (quasiquote (foo (unquote bar))))

The problem the new implementation of backquote is attempting to address is the first one: pretty printing. To understand what the problem is, an example might help: imagine that we as Common Lisp programmers (i.e. not implementors, and aiming for portability) have written a macro bind which is exactly equivalent to let:

(defmacro bind (bindings &body body)
  `(let ,bindings ,@body))

and we want to implement a pretty printer for it, so that (pprint '(progn (bind ((x 2) (z 3)) (if *print-pretty* (1+ x) (1- y))))) produces

  (bind ((x 2)
         (z 3))
    (if *print-pretty*
        (1+ x)
        (1- y))))

What does that look like? Writing pretty-printers is a little bit of a black art; a first stab is something like:

(defun pprint-bind (stream object)
  (pprint-logical-block (stream object :prefix "(" :suffix ")")
    (write (pprint-pop) :stream stream)
    (write-char #\Space stream)
    (pprint-logical-block (stream (pprint-pop) :prefix "(" :suffix ")")
        (write (pprint-pop) :stream stream)
        (pprint-newline :mandatory stream)))
    (pprint-indent :block 1 stream)
    (pprint-newline :mandatory stream)
      (write (pprint-pop) :stream stream)
      (pprint-newline :mandatory stream))))
(set-pprint-dispatch '(cons (eql bind)) 'pprint-bind)

The loop noise is necessary because we're using :mandatory newlines; a different newline style, such as :linear, might have let us use a standard utility function such as pprint-linear. But otherwise, this is straightforward pretty-printing code, doing roughly the equivalent of SBCL's internal pprint-let implementation, which is:

(formatter "~:<~^~W~^ ~@_~:<~@{~:<~^~W~@{ ~_~W~}~:>~^ ~_~}~:>~1I~:@_~@{~W~^ ~_~}~:>")

A few tests at the repl should show that this works with nasty, malformed inputs (“malformed” in the sense of not respecting the semantics of bind) as well as expected ones:

(pprint '(bind))
(pprint '(bind x))
(pprint '(bind x y))
(pprint '(bind (x y) z))
(pprint '(bind ((x 1) (y 2)) z))
(pprint '(bind ((x 1) (y 2)) z w))
(pprint '(bind . 3))
(pprint '(bind x . 4))
(pprint '(bind (x . y) z))
(pprint '(bind ((x . 0) (y . 1)) z))
(pprint '(bind ((x) (y)) . z))
(pprint '(bind ((x) y) z . w))

Meanwhile, imagine a world where the backquote reader macro simply wraps (quasiquote ...) around its argument, and comma likewise wraps (unquote ...):

(set-macro-character #\` (defun read-backquote (stream char)
                           (list 'quasiquote (read stream t nil t))))
(set-macro-character #\, (defun read-comma (stream char)
                           (list 'unquote (read stream t nil t))))

Writing pretty-printer support for that is easy, right?

(defun pprint-quasiquote (stream object)
  (write-char #\` stream)
  (write (cadr object) :stream stream))
(defun pprint-unquote (stream object)
  (write-char #\, stream)
  (write (cadr object) :stream stream))
(set-pprint-dispatch '(cons (eql quasiquote) (cons t null)) 'pprint-quasiquote)
(set-pprint-dispatch '(cons (eql unquote) (cons t null)) 'pprint-unquote)

(ignoring for the moment what happens if the printed representation of object happens to start with a @ or .)

(pprint '(quasiquote (x (unquote y))))

The problem arises when we try to combine these two things. In particular, what happens when we attempt to print backquoted forms:

(pprint '`(bind ,y z))

What we would hope to see is something like

`(bind ,y

but what we actually get is

`(bind (unquote

because each of the bindings in bind is printed individually, rather than the bindings being printed as a whole. And, lest there be hopes that this can be dealt with by a slightly different way of handling the pretty printing in pprint-bind, note that it's important that (pprint '(bind (function y) z)) print as

(bind (function

and not as

(bind #'y

so the only way to handle this is to know the magical symbols involved in backquote and comma reader macros – but that is not portably possible. So, we've come to the point where the conclusion is inevitable: it is not possible for an implementation to support list-structured quasiquote and unquote reader macros and general pretty printing for user-defined operators. (This isn’t the only failure mode for the combination of unquote-as-list-structure and pretty-printing; it’s surprisingly easy to write pretty-printing functions that fail to print accurately, not just cosmetically as above but catastrophically, producing output that cannot be read back in, or reads as a structurally unequal object to the original.)

The new implementation, by Douglas Katzman, preserves the implementation of the backquote reader macro as a simple list, but comma (and related reader macros) read as internal, literal structures. Since these internal structures are atoms, not lists, they are handled specially by pprint-logical-block and friends, and so their own particular pretty-printing routines always fire. The internal quasiquote macro ends up extracting and arranging for appropriate evaluation and splicing of unquoted material, and everything ends up working.

Everything? Well, not quite: one or two programmer libraries out there implemented some utility functionality – typically variable renaming, automatic lambda generation, or similar – without performing a full macroexpansion and proper codewalk. That code was in general already broken, but it is true that in the past generating an example to demonstrate the breakage would have to violate the general expectation of what “normal” Lisp code would look like, whereas as a result of the new implementation of backquote in SBCL the symptoms of breakage were much easier to generate. Several of these places were fixed before the new implementation was activated, such as iterate's #l macro; among the things to be dealt with after the new implementation was released was the utility code from let-over-lambda (a workaround has been installed in the version distributed from github), and there is still a little bit of fallout being dealt with (e.g. a regression in the accuracy of source-location tracking). But overall, I think the new implementation of backquote has substantial advantages in maintainability and correctness, and while it’s always possible to convince maintainers that they’ve made a mistake, I hope this post explains some of why the change was made.

Meanwhile, I've released SBCL version 1.2.3 – hopefully a much less “exciting” release...