naive vs proper code-walking

← pgcert essays | blog | backquote and pretty printing →

I said in my discussion about backquote representations that some utilities had defects made manifest by SBCL 1.2.2’s new internal representation for backquote and related operators, and that those defects could have been avoided by using a code-walker. I’m going to look at let-over-lambda code here, to try to demonstrate what I meant by that, and show how a proper code-walker can quite straightforwardly be used for the code transformations that have been implemented using a naïve walker (typically walking over a tree of conses), removing whole classes of defects in the process.

The let-over-lambda code I’m discussing is from https://github.com/thephoeron/let-over-lambda, specifically this version. This isn’t intended to be a hatchet job on the utility – clearly, it is of use to its users – but to show up potential problems and offer solutions for how to fix them. I should also state up front that I haven’t read the Let over Lambda book, but it’s entirely possible that discussing and using a full code-walker would have been out of scope (as it explicitly was for On Lisp).

Firstly, let’s deal with how the maintainer of the let-over-lambda code is dealing with the change in backquote representations, since it’s still topical:

;; package definition here just in case someone decides to paste
;; things into a Lisp session, and for private namespacing
(defpackage "LOL" (:use "CL"))
(in-package "LOL")
;; actual excerpts from let-over-lambda code from
;; <https://github.com/thephoeron/let-over-lambda/blob/a202167629cb421cbc2139cfce1db22a84278f9f/let-over-lambda.lisp>
;; begins here:
#+sbcl
(if (string-lessp (lisp-implementation-version) "1.2.2")
    (pushnew :safe-sbcl *features*)
    (setq *features* (remove :safe-sbcl *features*)))
(defun flatten (x)
  (labels ((rec (x acc)
             (cond ((null x) acc)
                   #+(and sbcl (not safe-sbcl))
                   ((typep x 'sb-impl::comma) (rec (sb-impl::comma-expr x) acc))
                   ((atom x) (cons x acc))
                   (t (rec (car x) (rec (cdr x) acc))))))
    (rec x nil)))

The issues around the (*features*) handling here have been reported at github; for the purpose of this blog entry, I will just say that I wrote about them in Maintaining Portable Lisp Programs, a long time ago, and that a better version might look a bit like this:

#+sbcl
(eval-when (:compile-toplevel :execute)
  (defun comma-implementation ()
    (typecase '`,x
      (symbol 'old)
      ((cons symbol (cons structure-object)) 'new)))
  (if (eql (comma-implementation) 'old)
      (pushnew 'cons-walkable-backquote *features*)
      (setq *features* (remove 'cons-walkable-backquote *features*))))
(defun flatten (x)
  (labels ((rec (x acc)
             (cond ((null x) acc)
                   #+lol::cons-walkable-backquote
                   ((typep x 'sb-impl::comma) (rec (sb-impl::comma-expr x) acc))
                   ((atom x) (cons x acc))
                   (t (rec (car x) (rec (cdr x) acc))))))
    (rec x nil)))

With these changes, the code is (relatively) robustly testing for the particular feature it needs to know about at the time that it needs to know, and recording it in a way that doesn’t risk confusion or contention with any other body of code. What is the let-over-lambda library using flatten for?

(defun g!-symbol-p (thing)
  (and (symbolp thing)
       (eql (mismatch (symbol-name thing) "G!") 2)))
(defmacro defmacro/g! (name args &rest body)
  (let ((syms (remove-duplicates
               (remove-if-not #'g!-symbol-p (flatten body)))))
    `(defmacro ,name ,args
       (let ,(mapcar
              (lambda (s)
                `(,s (gensym ,(subseq (symbol-name s) 2))))
              syms)
         ,@body))))

The intent behind this macro-defining macro, defmacro/g!, appears to be automatic gensym generation: being able to write

(defmacro/g! with-foo ((foo) &body body)
  `(let ((,g!foo (activate-foo ,foo)))
     (unwind-protect
         (progn ,@body)
       (deactivate-foo ,g!foo))))

without any explicit calls to gensym but retaining the protection that gensyms give against name capture:

(macroexpand-1 '(with-foo (3) 4))
; => (let ((#1=#:FOO1 (activate-foo 3)))
;      (unwind-protect
;          (progn 4)
;        (deactivate-foo #1#)))

That's fine; it’s reasonable to want something like this. Are there any issues with this, apart from the one exposed by SBCL’s new backquote implementation? In its conventional use, probably not – essentially, all uses of g! symbols are unquoted (i.e. behind commas) – but there are a couple of more theoretical points. One issue is that flatten as it currently stands will look for all symbols beginning with g! in the macroexpander function source, whether or not they are actually variable evaluations:

(defmacro/g! with-bar ((bar) &body body)
  `(block g!block
     (let ((,g!bar ,bar)) ,@body)))
; unused variable G!BLOCK
(macroexpand-1 '(with-bar (3) 4))
; => (block g!block (let ((#:BAR1 3)) 4))

In this example, that’s fair enough: it’s probably user error to have those g! symbols not be unquoted; this probably only becomes a real problem if there are macro-defining macros, with both the definer and the definition using g! symbols. It's not totally straightforward to demonstrate other problems with this simple approach to Lisp code transformation using just this macro; the transformation is sufficiently minimal, and the symptoms of problems relatively innocuous, that existing programming conventions are strong enough to prevent anything seriously untoward going wrong.

Before getting on to another example where the problems with this approach become more apparent, how could this transformation be done properly? By “properly” here I mean that the defmacro/g! should arrange to bind gensyms only for those g! symbols which are to be evaluated by the macroexpander, and not for those which are used for any other purpose. This is a task for a code-walker: a piece of code which exploits the fact that Lisp code is made up of Lisp data structures, all of which are introspectable, and the semantics of which in terms of effect on environment and execution are known. It is tedious, though possible, to write a mostly-portable code-walker (there needs to be some hook into the implementation’s representation of environments); I’m not going to do that here, but instead will use SBCL’s built-in code-walker.

The sb-walker:walk-form function takes three arguments: a form to walk, an initial environment to walk it in, and a walker function to perform whatever action is necessary on the walk. That walker function itself takes three arguments, a form, context and environment, and the walker arranges for it to be called on every macroexpanded or evaluated subform in the original form. The walker function should return a replacement form for the subform it is given (or the subform itself if it doesn’t want to take any action), and a secondary value of t if no further walking of that form should take place.

To do g! symbol detection and binding is fairly straightforward. If a symbol is in a context for evaluation, we collect it, and here we can take the first benefit from a proper code walk: we only collect g! symbols if the code-walker deems that they will be evaluated and there isn't an already-existing lexical binding for it:

(defmacro defmacro/g!-walked (name args &body body)
  (let* (g!symbols)
    (flet ((g!-walker (subform context env)
             (declare (ignore context))
             (typecase subform
               (symbol
                (when (and (g!-symbol-p subform)
                           (not (sb-walker:var-lexical-p subform env)))
                  (pushnew subform g!symbols))
                subform)
               (t subform))))
      (sb-walker:walk-form `(progn ,@body) nil #'g!-walker)
      `(defmacro ,name ,args
         (let ,(mapcar (lambda (s) (list s `(gensym ,(subseq (symbol-name s) 2))))
                       g!symbols)
           ,@body)))))

The fact that we only collect symbols which will be evaluated deals with the problem exhibited by with-bar, above:

(defmacro/g!-walked with-bar/walked ((bar) &body body)
  `(block g!block
     (let ((,g!bar ,bar)) ,@body)))
(macroexpand-1 '(with-bar/walked (3) 4))
; => (block g!block (let ((#:BAR1 3)) 4))

Only gathering symbols which don’t have lexical bindings (testing sb-walker:var-lexical-p) deals with another minor problem:

(defmacro/g!-walked with-baz ((baz) &body body)
  (let ((g!sym 'sym))
    `(let ((,g!sym ,baz)) ,@body)))
(macroexpand-1 '(with-baz (3) 4))
; => (let ((sym 3)) 4)

(the cons-walker – flatten – would not be able to detect that there is already a binding for g!sym, and would introduce another one, again leading to an unused variable warning.)

OK, time to recap. So far, we’ve corrected the code that tests for particular backquote implementations, which was used in flatten, which itself was used to perform a code-walk; we’ve also seen some low-impact or theoretical problems with that simple code-walking technique, and have used a proper code-walker instead of flatten to deal with those problems. If the odd extra unused variable binding were the worst thing that could happen, there wouldn’t be much benefit from using a code-walker (other than the assurance that the walker is dealing with forms for execution); however, let us now turn our attention to the other macro in let-over-lambda’s code which does significant codewalking:

(defun dollar-symbol-p (thing)
  (and (symbolp thing)
       (char= (char (symbol-name thing) 0) #\$)
       (ignore-errors (parse-integer (subseq (symbol-name thing) 1)))))
(defun prune-if-match-bodies-from-sub-lexical-scope (tree)
  (if (consp tree)
      (if (or (eq (car tree) 'if-match)
              (eq (car tree) 'when-match))
          (cddr tree)
          (cons (prune-if-match-bodies-from-sub-lexical-scope (car tree))
                (prune-if-match-bodies-from-sub-lexical-scope (cdr tree))))
      tree))
;; WARNING: Not %100 correct. Removes forms like (... if-match ...) from the
;; sub-lexical scope even though this isn't an invocation of the macro.
#+cl-ppcre
(defmacro! if-match ((test str) conseq &optional altern)
  (let ((dollars (remove-duplicates
                  (remove-if-not #'dollar-symbol-p
                                 (flatten (prune-if-match-bodies-from-sub-lexical-scope conseq))))))
    (let ((top (or (car (sort (mapcar #'dollar-symbol-p dollars) #'>)) 0)))
      `(let ((,g!str ,str))
         (multiple-value-bind (,g!s ,g!e ,g!ms ,g!me) (,test ,g!str)
           (declare (ignorable ,g!e ,g!me))
           (if ,g!s
               (if (< (length ,g!ms) ,top)
                   (error "ifmatch: too few matches")
                   ;; lightly edited here to remove irrelevant use of #`
                   (let ,(mapcar (lambda (a1) `(,(symb "$" a1)
                                                (subseq ,g!str (aref ,g!ms ,(1- a1))
                                                               (aref ,g!me ,(1- a1)))))
                                 (loop for i from 1 to top collect i))
                     ,conseq))
               ,altern))))))
(defmacro when-match ((test str) conseq &rest more-conseq)
  `(if-match (,test ,str)
     (progn ,conseq ,@more-conseq)))

What’s going on here? We have a prune-if-match-bodies-from-sub-lexical-scope function which, again, performs some kind of cons-based tree walk, removing some conses whose car is if-match or when-match. We have a trivial macro when-match which transforms into an if-match; the if-match macro is more involved. Any symbols named as a $ sign followed by an integer (in base 10) are treated specially; the intent is that they will be bound to capture groups of the cl-ppcre match. So it would be used in something like something like

(defun key-value (line)
  (if-match ((lambda (s) (scan "^\\(.*\\): \\(.*\\)$" s)) line)
      (list $1 $2)
      (error "not actually a key-value line: ~S" line)))

and that would macroexpand to, roughly,

(defun key-value (line)
  (multiple-value-bind (s e ms me)
      ((lambda (s) (scan "^\\(.*\\): \\(.*\\)$" s)) line)
    (if s
        (if (< (length ms) 2)
            (error "if-match: not enough matches)
            (let (($1 (subseq line (aref ms 0) (aref me 0)))
                  ($2 (subseq line (aref ms 1) (aref me 1))))
              (list $1 $2)))
        (error "not actually a key-value line: ~S" line))))

(there's additional reader macrology in let-over-lambda to make that lambda form unnecessary, but we can ignore that for our purposes).

Now, if-match has a similar problem that defmacro/g! had: since the tree walker doesn’t make a distinction between symbols present for evaluation and symbols for any other purpose, it is possible to confuse the walker. For example:

(if-match (scanner string)
    (if (> (length $1) 6)
        '|$1000000|
        'less-than-$1000000))

This form, if macroexpanded, will attempt to bind one million variables to matched groups; even if the compiler doesn’t choke on that, evaluation will go wrong, as the matcher is unlikely to match one million groups (so the “not enough matches” error branch will be taken) – whereas of course the quoted one million dollar symbol is not intended for evaluation.

But the nesting problems are more obvious in this case than for defmacro/g!. Firstly, take the simple case:

(if-match (scanner string)
    (list $1
          (if-match (scanner2 string)
              $2
              nil))
    nil)

Here, the $2 is in the scope of the inner if-match, and so mustn’t be included for the macroexpansion of the outer if-match. This case is handled in let-over-lambda’s implementation by the prune-if-match-bodies-from-sub-lexical-scope: the consequent of the inner if-match is pruned from the dollar-symbol accumulator. However, there are several issues with this; the first is that the test is pruned:

(if-match (scanner string)
    (if-match (scanner2 $2)
        $1
        nil)
    nil)

In this example, the $2 is ‘invisible’ to the outer if-match, and so won’t get a binding. That’s straightforwardly fixable, along with the mishandling of when-let’s syntax (the entire body of when-let should be pruned, not just the first form), and what I think is an error in the pruning of if-match (it should recurse on the cdddr, not the cddr; github issue).

Not fixable at all while still using naïve code-walking are two other problems, one of which is noted in the comment present in the let-over-lambda code: the pruner doesn’t distinguish between if-match forms for evaluation and other conses whose car is if-match. Triggering this problem does involve some contortions – in order for it to matter, we need an if-match not for evaluation followed by a dollar symbol which is to be evaluated; but, for example:

(defmacro list$/q (&rest args)
  `(list ,@(mapcar (lambda (x) (if (dollar-symbol-p x) x `',x)) args)))
(if-match (scanner string)
    (list$/q foo if-match $2)
    nil)

Here, although the $2 is in a position for evaluation (after macroexpansion), it will have no binding because it will have been pruned when naïvely walking the outer if-match macro. The if-match symbol argument to `list$/q ends up quoted, and should not be treated as a macro call.

Also, the pruner function must have special knowledge not just about the semantics of if-match, but also of any macro which can expand to if-match – see the attempt to handle when-match in the pruner. If a user were to have the temerity to define case-match

(defmacro case-match (string &rest clauses)
  (if (null clauses)
      nil
      `(if-match (,(caar clauses) ,string)
           (progn ,@(cdar clauses))
           (case-match string ,@(cdr clauses)))))

any attempt to nest a case-match inside an outer if-match is liable to fail, as the pruner has no knowledge of how to handle the case-match form.

All of these problems are solvable by using a proper code-walker. The code-walker should collect up all dollar symbols to be evaluated in the consequent of an if-match form, so that bindings for them can be generated, except for those with already existing lexical bindings within the if-match (not those from outside, otherwise nesting won’t work). For testing purposes, we’ll also signal a diagnostic condition within the macroexpander to indicate which dollar symbols we’ve found.

(define-condition if-match/walked-diagnostic (condition)
  ((symbols :initarg :symbols :reader if-match-symbols)))
(defmacro if-match/walked ((test string) consequent &optional alternative)
  (let* (dollar-symbols)
    (flet ((dollar-walker (subform context env)
             (declare (ignore context))
             (typecase subform
               (symbol
                (when (and (dollar-symbol-p subform)
                           (not (sb-walker:var-lexical-p subform env)))
                  (pushnew subform dollar-symbols))
                subform)
               (t subform))))
      (handler-bind ((if-match/walked-diagnostic #'continue))
        (sb-walker:walk-form consequent nil #'dollar-walker))
      (let* ((dollar-symbols (sort dollar-symbols #'> :key #'dollar-symbol-p))
             (top (dollar-symbol-p (car dollar-symbols))))
        (with-simple-restart (continue "Ignore diagnostic condition")
          (signal 'if-match/walked-diagnostic :symbols dollar-symbols))
        (sb-int:with-unique-names (start end match-start match-end)
          (sb-int:once-only ((string string))
            `(multiple-value-bind (,start ,end ,match-start ,match-end)
                 (,test ,string)
               (declare (ignore ,end) (ignorable ,match-end))
               (if ,start
                   (if (< (length ,match-start) ,top)
                       (error "~S: too few matches: needed ~D, got ~D." 'if-match
                              ,top (length ,match-start))
                       (let ,(mapcar (lambda (s)
                                       (let ((i (1- (dollar-symbol-p s))))
                                         `(,s (subseq ,string (aref ,match-start ,i) (aref ,match-end ,i)))))
                                     (reverse dollar-symbols))
                         ,consequent))
                   ,alternative))))))))

(I'm using sb-int:once-only and sb-int:with-unique-names to avoid having to include their definitions in this post, which is getting a bit lengthy). Testing this looks like

(defmacro test-if-match (form expected-symbols)
  `(handler-case (macroexpand-1 ',form)
     (if-match/walked-diagnostic (c)
       (assert (equal (if-match-symbols c) ',expected-symbols)))
     (:no-error (&rest values) (declare (ignore values)) (error "no diagnostic"))))
(test-if-match (if-match/walked (test string) (list $1 $2) 'foo) ($2 $1))
(test-if-match (if-match/walked (test string) (if (> (length $1) 6) '$10 '$8) nil) ($1))
(test-if-match (if-match/walked (scanner string)
                   (list $1
                         (if-match/walked (scanner2 string)
                             $2
                             nil))
                   nil)
               ($1))
(test-if-match (if-match/walked (scanner string) (list$/q foo if-match/walked $3) nil) ($3))
(defmacro case-match/walked (string &rest clauses)
  (if (null clauses)
      nil
      `(if-match/walked (,(caar clauses) ,string)
           (progn ,@(cdar clauses))
           (case-match/walked string ,@(cdr clauses)))))
(test-if-match (if-match/walked (scanner string)
                   (case-match/walked $1
                     (foo $2)
                     (bar $3)))
               ($1))

To summarize: I’ve shown here how to make use of a full code-walker to make a couple of code transforming macros more robust. Full code-walkers can do more than just what I've shown here: the sb-walker:walk-form interface can also inhibit macroexpansion, transform function calls into calls to other functions, while respecting the semantics of the Lisp operators in the code that is being walked and allowing some introspection of the lexical environment. Here, we have called sb-walker:walk-form for side effects from the walker function we’ve provided; it is also possible to use its value (that’s how sb-cltl2:macroexpand-all is implemented, for example). I hope that this can help users affected by the change in internal representation of backquote, as well as others who want to write advanced code-transforming macros. If the thought of using an SBCL-internal code-walker makes you a bit queasy (as well it might), you could instead start by looking at one or two other more explicitly-portable code-walkers out there, for example John Fremlin’s macroexpand-dammit, the walker in Alex Plotnick's CLWEB literate programming system (github link), or the code walker in iterate.