I said in my
discussion about backquote representations
that some utilities had defects made manifest by
SBCL 1.2.2’s new internal
representation for backquote and related operators, and that those
defects could have been avoided by using a code-walker. I’m going to
look at let-over-lambda
code here, to try to demonstrate what I
meant by that, and show how a proper code-walker can quite
straightforwardly be used for the code transformations that have been
implemented using a naïve walker (typically walking over a tree of
conses), removing whole classes of defects in the process.
The let-over-lambda
code I’m discussing is from
https://github.com/thephoeron/let-over-lambda,
specifically
this version.
This isn’t intended to be a hatchet job on the utility – clearly, it
is of use to its users – but to show up potential problems and offer
solutions for how to fix them. I should also state up front that I
haven’t read the Let over Lambda book,
but it’s entirely possible that discussing and using a full
code-walker would have been out of scope (as it explicitly was for
On Lisp).
Firstly, let’s deal with how the maintainer of the let-over-lambda
code is dealing with the change in backquote representations, since
it’s still topical:
;; package definition here just in case someone decides to paste
;; things into a Lisp session, and for private namespacing
(defpackage "LOL" (:use "CL"))
(in-package "LOL")
;; actual excerpts from let-over-lambda code from
;; <https://github.com/thephoeron/let-over-lambda/blob/a202167629cb421cbc2139cfce1db22a84278f9f/let-over-lambda.lisp>
;; begins here:
#+sbcl
(if (string-lessp (lisp-implementation-version) "1.2.2")
(pushnew :safe-sbcl *features*)
(setq *features* (remove :safe-sbcl *features*)))
(defun flatten (x)
(labels ((rec (x acc)
(cond ((null x) acc)
#+(and sbcl (not safe-sbcl))
((typep x 'sb-impl::comma) (rec (sb-impl::comma-expr x) acc))
((atom x) (cons x acc))
(t (rec (car x) (rec (cdr x) acc))))))
(rec x nil)))
The issues around the
(*features*
)
handling here have been reported at github; for
the purpose of this blog entry, I will just say that I wrote about
them in
Maintaining Portable Lisp Programs,
a long time ago, and that a better version might look a bit like this:
#+sbcl
(eval-when (:compile-toplevel :execute)
(defun comma-implementation ()
(typecase '`,x
(symbol 'old)
((cons symbol (cons structure-object)) 'new)))
(if (eql (comma-implementation) 'old)
(pushnew 'cons-walkable-backquote *features*)
(setq *features* (remove 'cons-walkable-backquote *features*))))
(defun flatten (x)
(labels ((rec (x acc)
(cond ((null x) acc)
#+lol::cons-walkable-backquote
((typep x 'sb-impl::comma) (rec (sb-impl::comma-expr x) acc))
((atom x) (cons x acc))
(t (rec (car x) (rec (cdr x) acc))))))
(rec x nil)))
With these changes, the code is (relatively) robustly testing for the
particular feature it needs to know about at the time that it needs to
know, and recording it in a way that doesn’t risk confusion or
contention with any other body of code. What is the let-over-lambda
library using flatten
for?
(defun g!-symbol-p (thing)
(and (symbolp thing)
(eql (mismatch (symbol-name thing) "G!") 2)))
(defmacro defmacro/g! (name args &rest body)
(let ((syms (remove-duplicates
(remove-if-not #'g!-symbol-p (flatten body)))))
`(defmacro ,name ,args
(let ,(mapcar
(lambda (s)
`(,s (gensym ,(subseq (symbol-name s) 2))))
syms)
,@body))))
The intent behind this macro-defining macro, defmacro/g!
, appears to
be automatic
gensym
generation: being able to write
(defmacro/g! with-foo ((foo) &body body)
`(let ((,g!foo (activate-foo ,foo)))
(unwind-protect
(progn ,@body)
(deactivate-foo ,g!foo))))
without any explicit calls to
gensym
but retaining the protection that gensyms give against name capture:
(macroexpand-1 '(with-foo (3) 4))
; => (let ((#1=#:FOO1 (activate-foo 3)))
; (unwind-protect
; (progn 4)
; (deactivate-foo #1#)))
That's fine; it’s reasonable to want something like this. Are there
any issues with this, apart from the one exposed by SBCL’s new
backquote implementation? In its conventional use, probably not –
essentially, all uses of g!
symbols are unquoted (i.e. behind
commas) – but there are a couple of more theoretical points. One
issue is that flatten
as it currently stands will look for all
symbols beginning with g!
in the macroexpander function source,
whether or not they are actually variable evaluations:
(defmacro/g! with-bar ((bar) &body body)
`(block g!block
(let ((,g!bar ,bar)) ,@body)))
; unused variable G!BLOCK
(macroexpand-1 '(with-bar (3) 4))
; => (block g!block (let ((#:BAR1 3)) 4))
In this example, that’s fair enough: it’s probably user error to have
those g!
symbols not be unquoted; this probably only becomes a real
problem if there are macro-defining macros, with both the definer and
the definition using g!
symbols. It's not totally straightforward
to demonstrate other problems with this simple approach to Lisp code
transformation using just this macro; the transformation is
sufficiently minimal, and the symptoms of problems relatively
innocuous, that existing programming conventions are strong enough to
prevent anything seriously untoward going wrong.
Before getting on to another example where the problems with this
approach become more apparent, how could this transformation be done
properly? By “properly” here I mean that the defmacro/g!
should
arrange to bind gensyms only for those g!
symbols which are to be
evaluated by the macroexpander, and not for those which are used for
any other purpose. This is a task for a code-walker: a piece of code
which exploits the fact that Lisp code is made up of Lisp data
structures, all of which are introspectable, and the semantics of
which in terms of effect on environment and execution are known. It
is tedious, though possible, to write a mostly-portable code-walker
(there needs to be some hook into the implementation’s representation
of environments); I’m not going to do that here, but instead will use
SBCL’s built-in code-walker.
The sb-walker:walk-form
function takes three arguments: a form to
walk, an initial environment to walk it in, and a walker function to
perform whatever action is necessary on the walk. That walker
function itself takes three arguments, a form, context and
environment, and the walker arranges for it to be called on every
macroexpanded or evaluated subform in the original form. The walker
function should return a replacement form for the subform it is given
(or the subform itself if it doesn’t want to take any action), and a
secondary value of
t
if no
further walking of that form should take place.
To do g!
symbol detection and binding is fairly straightforward. If
a symbol is in a context for evaluation, we collect it, and here we
can take the first benefit from a proper code walk: we only collect
g!
symbols if the code-walker deems that they will be evaluated
and there isn't an already-existing lexical binding for it:
(defmacro defmacro/g!-walked (name args &body body)
(let* (g!symbols)
(flet ((g!-walker (subform context env)
(declare (ignore context))
(typecase subform
(symbol
(when (and (g!-symbol-p subform)
(not (sb-walker:var-lexical-p subform env)))
(pushnew subform g!symbols))
subform)
(t subform))))
(sb-walker:walk-form `(progn ,@body) nil #'g!-walker)
`(defmacro ,name ,args
(let ,(mapcar (lambda (s) (list s `(gensym ,(subseq (symbol-name s) 2))))
g!symbols)
,@body)))))
The fact that we only collect symbols which will be evaluated deals
with the problem exhibited by with-bar
, above:
(defmacro/g!-walked with-bar/walked ((bar) &body body)
`(block g!block
(let ((,g!bar ,bar)) ,@body)))
(macroexpand-1 '(with-bar/walked (3) 4))
; => (block g!block (let ((#:BAR1 3)) 4))
Only gathering symbols which don’t have lexical bindings (testing
sb-walker:var-lexical-p
) deals with another minor problem:
(defmacro/g!-walked with-baz ((baz) &body body)
(let ((g!sym 'sym))
`(let ((,g!sym ,baz)) ,@body)))
(macroexpand-1 '(with-baz (3) 4))
; => (let ((sym 3)) 4)
(the cons-walker – flatten
– would not be able to detect that there
is already a binding for g!sym
, and would introduce another one,
again leading to an unused variable warning.)
OK, time to recap. So far, we’ve corrected the code that tests for
particular backquote implementations, which was used in flatten
,
which itself was used to perform a code-walk; we’ve also seen some
low-impact or theoretical problems with that simple code-walking
technique, and have used a proper code-walker instead of flatten
to
deal with those problems. If the odd extra unused variable binding
were the worst thing that could happen, there wouldn’t be much benefit
from using a code-walker (other than the assurance that the walker is
dealing with forms for execution); however, let us now turn our
attention to the other macro in let-over-lambda
’s code which does
significant codewalking:
(defun dollar-symbol-p (thing)
(and (symbolp thing)
(char= (char (symbol-name thing) 0) #\$)
(ignore-errors (parse-integer (subseq (symbol-name thing) 1)))))
(defun prune-if-match-bodies-from-sub-lexical-scope (tree)
(if (consp tree)
(if (or (eq (car tree) 'if-match)
(eq (car tree) 'when-match))
(cddr tree)
(cons (prune-if-match-bodies-from-sub-lexical-scope (car tree))
(prune-if-match-bodies-from-sub-lexical-scope (cdr tree))))
tree))
;; WARNING: Not %100 correct. Removes forms like (... if-match ...) from the
;; sub-lexical scope even though this isn't an invocation of the macro.
#+cl-ppcre
(defmacro! if-match ((test str) conseq &optional altern)
(let ((dollars (remove-duplicates
(remove-if-not #'dollar-symbol-p
(flatten (prune-if-match-bodies-from-sub-lexical-scope conseq))))))
(let ((top (or (car (sort (mapcar #'dollar-symbol-p dollars) #'>)) 0)))
`(let ((,g!str ,str))
(multiple-value-bind (,g!s ,g!e ,g!ms ,g!me) (,test ,g!str)
(declare (ignorable ,g!e ,g!me))
(if ,g!s
(if (< (length ,g!ms) ,top)
(error "ifmatch: too few matches")
;; lightly edited here to remove irrelevant use of #`
(let ,(mapcar (lambda (a1) `(,(symb "$" a1)
(subseq ,g!str (aref ,g!ms ,(1- a1))
(aref ,g!me ,(1- a1)))))
(loop for i from 1 to top collect i))
,conseq))
,altern))))))
(defmacro when-match ((test str) conseq &rest more-conseq)
`(if-match (,test ,str)
(progn ,conseq ,@more-conseq)))
What’s going on here? We have a
prune-if-match-bodies-from-sub-lexical-scope
function which, again,
performs some kind of cons-based tree walk, removing some conses whose
car is if-match
or when-match
. We have a trivial macro
when-match
which transforms into an if-match
; the if-match
macro
is more involved. Any symbols named as a $
sign followed by an
integer (in base 10) are treated specially; the intent is that they
will be bound to capture groups of the cl-ppcre match. So it would be
used in something like something like
(defun key-value (line)
(if-match ((lambda (s) (scan "^\\(.*\\): \\(.*\\)$" s)) line)
(list $1 $2)
(error "not actually a key-value line: ~S" line)))
and that would macroexpand to, roughly,
(defun key-value (line)
(multiple-value-bind (s e ms me)
((lambda (s) (scan "^\\(.*\\): \\(.*\\)$" s)) line)
(if s
(if (< (length ms) 2)
(error "if-match: not enough matches)
(let (($1 (subseq line (aref ms 0) (aref me 0)))
($2 (subseq line (aref ms 1) (aref me 1))))
(list $1 $2)))
(error "not actually a key-value line: ~S" line))))
(there's additional reader macrology in let-over-lambda
to make that
lambda
form unnecessary, but we can ignore that for our purposes).
Now, if-match
has a similar problem that defmacro/g!
had: since
the tree walker doesn’t make a distinction between symbols present for
evaluation and symbols for any other purpose, it is possible to
confuse the walker. For example:
(if-match (scanner string)
(if (> (length $1) 6)
'|$1000000|
'less-than-$1000000))
This form, if macroexpanded, will attempt to bind one million variables to matched groups; even if the compiler doesn’t choke on that, evaluation will go wrong, as the matcher is unlikely to match one million groups (so the “not enough matches” error branch will be taken) – whereas of course the quoted one million dollar symbol is not intended for evaluation.
But the nesting problems are more obvious in this case than for
defmacro/g!
. Firstly, take the simple case:
(if-match (scanner string)
(list $1
(if-match (scanner2 string)
$2
nil))
nil)
Here, the $2
is in the scope of the inner if-match
, and so mustn’t
be included for the macroexpansion of the outer if-match
. This case
is handled in let-over-lambda
’s implementation by the
prune-if-match-bodies-from-sub-lexical-scope
: the consequent of the
inner if-match
is pruned from the dollar-symbol accumulator.
However, there are several issues with this; the first is that the
test is pruned:
(if-match (scanner string)
(if-match (scanner2 $2)
$1
nil)
nil)
In this example, the $2
is ‘invisible’ to the outer if-match
, and
so won’t get a binding. That’s straightforwardly fixable, along with
the mishandling of when-let
’s syntax (the entire body of when-let
should be pruned, not just the first form), and what I think is an
error in the pruning of if-match
(it should recurse on the cdddr
,
not the cddr
;
github issue).
Not fixable at all while still using naïve code-walking are two other
problems, one of which is noted in the comment present in the
let-over-lambda
code: the pruner doesn’t distinguish between
if-match
forms for evaluation and other conses whose car is
if-match
. Triggering this problem does involve some contortions –
in order for it to matter, we need an if-match
not for evaluation
followed by a dollar symbol which is to be evaluated; but, for
example:
(defmacro list$/q (&rest args)
`(list ,@(mapcar (lambda (x) (if (dollar-symbol-p x) x `',x)) args)))
(if-match (scanner string)
(list$/q foo if-match $2)
nil)
Here, although the $2
is in a position for evaluation (after
macroexpansion), it will have no binding because it will have been
pruned when naïvely walking the outer if-match
macro. The
if-match
symbol argument to `list$/q ends up quoted, and should not
be treated as a macro call.
Also, the pruner function must have special knowledge not just about
the semantics of if-match
, but also of any macro which can expand
to if-match
– see the attempt to handle when-match
in the pruner.
If a user were to have the temerity to define case-match
(defmacro case-match (string &rest clauses)
(if (null clauses)
nil
`(if-match (,(caar clauses) ,string)
(progn ,@(cdar clauses))
(case-match string ,@(cdr clauses)))))
any attempt to nest a case-match
inside an outer if-match
is
liable to fail, as the pruner has no knowledge of how to handle the
case-match
form.
All of these problems are solvable by using a proper code-walker. The
code-walker should collect up all dollar symbols to be evaluated in
the consequent of an if-match
form, so that bindings for them can be
generated, except for those with already existing lexical bindings
within the if-match
(not those from outside, otherwise nesting
won’t work). For testing purposes, we’ll also signal a diagnostic
condition within the macroexpander to indicate which dollar symbols
we’ve found.
(define-condition if-match/walked-diagnostic (condition)
((symbols :initarg :symbols :reader if-match-symbols)))
(defmacro if-match/walked ((test string) consequent &optional alternative)
(let* (dollar-symbols)
(flet ((dollar-walker (subform context env)
(declare (ignore context))
(typecase subform
(symbol
(when (and (dollar-symbol-p subform)
(not (sb-walker:var-lexical-p subform env)))
(pushnew subform dollar-symbols))
subform)
(t subform))))
(handler-bind ((if-match/walked-diagnostic #'continue))
(sb-walker:walk-form consequent nil #'dollar-walker))
(let* ((dollar-symbols (sort dollar-symbols #'> :key #'dollar-symbol-p))
(top (dollar-symbol-p (car dollar-symbols))))
(with-simple-restart (continue "Ignore diagnostic condition")
(signal 'if-match/walked-diagnostic :symbols dollar-symbols))
(sb-int:with-unique-names (start end match-start match-end)
(sb-int:once-only ((string string))
`(multiple-value-bind (,start ,end ,match-start ,match-end)
(,test ,string)
(declare (ignore ,end) (ignorable ,match-end))
(if ,start
(if (< (length ,match-start) ,top)
(error "~S: too few matches: needed ~D, got ~D." 'if-match
,top (length ,match-start))
(let ,(mapcar (lambda (s)
(let ((i (1- (dollar-symbol-p s))))
`(,s (subseq ,string (aref ,match-start ,i) (aref ,match-end ,i)))))
(reverse dollar-symbols))
,consequent))
,alternative))))))))
(I'm using sb-int:once-only
and sb-int:with-unique-names
to avoid
having to include their definitions in this post, which is getting a
bit lengthy). Testing this looks like
(defmacro test-if-match (form expected-symbols)
`(handler-case (macroexpand-1 ',form)
(if-match/walked-diagnostic (c)
(assert (equal (if-match-symbols c) ',expected-symbols)))
(:no-error (&rest values) (declare (ignore values)) (error "no diagnostic"))))
(test-if-match (if-match/walked (test string) (list $1 $2) 'foo) ($2 $1))
(test-if-match (if-match/walked (test string) (if (> (length $1) 6) '$10 '$8) nil) ($1))
(test-if-match (if-match/walked (scanner string)
(list $1
(if-match/walked (scanner2 string)
$2
nil))
nil)
($1))
(test-if-match (if-match/walked (scanner string) (list$/q foo if-match/walked $3) nil) ($3))
(defmacro case-match/walked (string &rest clauses)
(if (null clauses)
nil
`(if-match/walked (,(caar clauses) ,string)
(progn ,@(cdar clauses))
(case-match/walked string ,@(cdr clauses)))))
(test-if-match (if-match/walked (scanner string)
(case-match/walked $1
(foo $2)
(bar $3)))
($1))
To summarize: I’ve shown here how to make use of a full code-walker to
make a couple of code transforming macros more robust. Full
code-walkers can do more than just what I've shown here: the
sb-walker:walk-form
interface can also inhibit macroexpansion,
transform function calls into calls to other functions, while
respecting the semantics of the Lisp operators in the code that is
being walked and allowing some introspection of the lexical
environment. Here, we have called sb-walker:walk-form
for side
effects from the walker function we’ve provided; it is also possible
to use its value (that’s how sb-cltl2:macroexpand-all
is
implemented, for example). I hope that this can help users affected
by the change in internal representation of backquote, as well as
others who want to write advanced code-transforming macros. If the
thought of using an SBCL-internal code-walker makes you a bit queasy
(as well it might), you could instead start by looking at one or two
other more explicitly-portable code-walkers out there, for example
John Fremlin’s
macroexpand-dammit, the
walker in Alex Plotnick's
CLWEB literate
programming system
(github link), or the code
walker in iterate.