Since it seems still topical to talk about Lisp and
code-transformation macros, here’s another worked example – this time
inspired by the enthusiasm for the R
magrittr
package.
The basic idea behind the magrittr
package is, as
Hadley said at EARL2014, to
convert from a form of code where arguments to the same function are
far apart to one where they’re essentially close together; the example
he presented was converting
arrange(
summarise
group_by(
filter(babynames, name == "Hadley"),
year),
total = sum(n)
desc(year))
to
b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))
only without the danger of mistyping one of the variable names along the way and failing to perform the computation that was intended.
R, as I have said before, is a Lisp-1 with weird syntax and wacky evaluation semantics. One of the things that ordinary user code can do is inspect the syntactic form of its arguments, before evaluating them. This means that when looking at a fragment of code such as
foo(bar(2,3), 4)
where a call-by-value language would first evaluate bar(2,3)
, then
call foo
with two arguments (the value resulting from the
evaluation, and 4), R instead uses a form of call-by-need evaluation,
and also provides operators for inspecting the promise directly.
This means R users can do such horrible things as
foo <- function(x) {
tmp <- substitute(x)
sgn <- 1
while(class(tmp) == "(") {
tmp <- tmp[[2]]
sgn <- sgn * -1
}
sgn * eval.parent(tmp)
}
foo(3) # 3
foo((3)) # -3
foo(((3))) # 3
foo((((3)))) # -3 (isn’t this awesome? I did say “wacky”)
In the case of magrittr
, the package authors have taken advantage of
this to invent some new syntax; the pipe operator %>%
is charged
with inserting its first argument (its left-hand side, in normal
operation) as the first argument to the call of its second argument
(right-hand side). Hadley’s example is
babynames %>%
filter(name == "Hadley") %>%
group_by(year) %>%
summarise(total = sum(n)) %>%
arrange(desc(year))
and this is effective because the data flow in this case really is a
pipeline: there's a dataset, which needs filtering, then grouping,
then summarization, then sorting, and each operation works on the
result of the previous. This already needs to inspect the syntactic
form of the argument; an additional feature is recognizing the
presence of .
s in the call, and placing the left-hand side value in
that argument position instead of as the first argument if it is
present.
In Common Lisp, there are some piping or chaining operators out there
(e.g. one
two
three (search for
ablock
)
four
and probably many others), and they do well enough. However! They
mostly suffer from similar problems that we’ve seen before: doing code
transformations with not quite enough understanding of the semantics
of the code that they’re transforming; again, that’s fine for normal
use, but for didactic purposes let’s pretend that we really care
about this.
The ->
macro from http://stackoverflow.com/a/11080068 is basically
the same as the magrittr
%>%
operator: it converts symbols in the
pipeline to function calls, and places the result of the previous
evaluation as the first argument of the current operator, except if a
$
is present in the arguments, in which case it replaces that.
(This version doesn’t support more than one $
in the argument list;
it would be a little bit of a pain to support that, needing a
temporary name, but it’s straightforward in principle).
Since the ->
macro does its job, a code-walker implementation isn’t
strictly necessary: pure syntactic manipulation is good enough, and if
it’s used with just the code it expects, it will do it well. It is of
course possible to express what it does using a code-walker; we’ll fix
the multiple-$ ‘bug’ along the way, by explicitly introducing bindings
rather than replacements of symbols:
(defmacro -> (form &body body)
(labels ((find-$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((eql f '$) (return-from find-$ t))
((eql f form) f)
(t (values f t)))))
nil)
(walker (form context env)
(cond
((symbolp form) (list form))
((atom form) form)
(t (if (find-$ form env)
(values `(setq $ ,form) t)
(values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
`(let (($ ,form))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))
How to understand this implementation? Well, clearly, we need to
understand what sb-walker:walk
does. Broadly, it calls the walker
function (its third argument) on successive evaluated subforms of the
original form (and on variable names set by
setq
);
the primary return value is used as the interim result of the walk,
subject to further walking (macroexpansion and walking of its
subforms) except if the second return value from the walker function
is t.
Now, let’s start with the find-$
local function: its job is to walk
a form, and returns t
if it finds a $
variable to be evaluated at
toplevel and nil
otherwise. It does that by returning t
if the
form it’s given is $
; otherwise, if the form it’s given is the
original form, we need to walk its subforms, so return f
; otherwise,
return its form argument f
with a secondary value of t
to inhibit
further walking. This operation is slightly at odds with the use of a
code walker: we are explicitly not taking advantage of the fact that
it understands the semantics of the code it’s walking. This might
explain why the find-$
function itself looks a bit weird.
The walker
local function is responsible for most of the code
transformation. It binds $
to the value of the first form, then
repeatedly sets $
to the value of successive forms, rewritten to
interpolate a $
in the first argument position if there isn’t one in
the form already (as reported by find-$
). If any of the forms is a
symbol, it gets listified and subsequently re-walked. Thus
(macroexpand-1 '(-> "THREE" string-downcase (char 0)))
; => (LET (($ "THREE"))
; (SETQ $ (STRING-DOWNCASE $))
; (SETQ $ (CHAR $ 0))),
; T
So far, so good. Now, what could we do with a code-walker that we
can’t without? Well, the above implementation of ->
supports
chaining simple function calls, so one answer is “chaining things that
aren’t just function calls”. Another refinement is to support eliding
the insertion of $
when there are any uses of $
in the form, not
just as a bare argument. Looking at the second one first, since it’s
less controversial:
(defmacro -> (form &body body)
(labels ((find-$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((and (eql f '$) (eql c :eval))
(return-from find-$ t))
(t f))))
nil)
(walker (form context env)
(cond
((symbolp form) (list form))
((atom form) form)
(t (if (find-$ form env)
(values `(setq $ ,form) t)
(values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
`(let (($ ,form))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))
The only thing that’s changed here is the definition of find-$
, and
in fact it’s a little simpler: the task is now to walk the entire form
and find uses of $
in an evaluated position, no matter how deep in
the evaluation. Because this is a code-walker, this will correctly
handle macros, backquotes, quoted symbols, and so on, and this allows
code of the form
(macroexpand-1 '(-> "THREE" string-downcase (char 0) char-code (complex (1+ $) (1- $))))
; => (LET (($ "THREE"))
; (SETQ $ (STRING-DOWNCASE $))
; (SETQ $ (CHAR-CODE $))
; (SETQ $ (COMPLEX (1+ $) (1- $)))),
; T
which, as far as I can tell, is not supported in magrittr
: doing 3
%>% complex(.+1,.-1)
is met with the error that “object '.' not
found”. Supporting this might, of course, not be a good idea, but at
least the code walker shows that it’s possible.
What if we wanted to augment ->
to handle binding forms, or special
forms in general? This is probably beyond the call of duty, but let’s
just briefly imagine that we wanted to be able to support binding
special variables around the individual calls in the chain; for
example, we want
(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean)
to expand to
(let (($ 3))
(setq $ (let ((*random-state* (make-random-state))) (rnorm $)))
(setq $ (mean $)))
and let us also say, to make it interesting, that uses of $
in the
bindings clauses of the let
should not count against inhibiting
the insertion of $
in the first argument position of the first form
in the body of the let
, so
(-> 3 (let ((y (1+ $))) (atan y)))
should expand to
(let (($ 3)) (setq $ (let ((y (1+ $))) (atan $ y))))
So our code walker needs to walk the bindings of the let
, merely
collecting information into the walker’s lexical environment, then
walk the body performing the same rewrite as before. CHALLENGE
ACCEPTED:
(defmacro -> (&body forms)
(let ((rewrite t))
(declare (special rewrite))
(labels ((find-$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((and (eql f '$) (eql c :eval))
(return-from find-$ t))
(t f))))
nil)
(walker (form context env)
(declare (ignore context))
(typecase form
(symbol (if rewrite (list form) form))
(atom form)
((cons (member with-rewriting without-rewriting))
(let ((rewrite (eql (car form) 'with-rewriting)))
(declare (special rewrite))
(values (sb-walker:walk-form (cadr form) env #'walker) t)))
((cons (member let let*))
(unless rewrite
(return-from walker form))
(let* ((body (member 'declare (cddr form)
:key (lambda (x) (when (consp x) (car x))) :test-not #'eql))
(declares (ldiff (cddr form) body))
(rewritten (sb-walker:walk-form
`(without-rewriting
(,(car form) ,(cadr form)
,@declares
(with-rewriting
,@body)))
env #'walker)))
(values rewritten t)))
(t
(unless rewrite
(return-from walker form))
(if (find-$ form env)
(values `(setq $ ,form) t)
(values `(setq $ ,(list* (car form) '$ (cdr form))) t))))))
`(let (($ ,(car forms)))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) (cdr forms))))))
Here, find-$
is unchanged from the previous version; all the new
functionality is in walker
. How does it work? The default branch
of the walker
function is also unchanged; what has changed is
handling of
let
and
let*
forms. The main trick is to communicate information between
successive calls to the walker function, and turn the rewriting on and
off appropriately: we wrap parts of the form in new pseudo-special
operators with-rewriting
and without-rewriting
, which is basically
a tacky and restricted implementation of compiler-let
– if we needed
to, we could do a proper one with
macrolet
.
Within the scope of a without-rewriting
, walker
doesn’t do
anything special, but merely return the form it was given, except if
the form it’s given is a with-rewriting
form. This is a nice
illustration, incidentally, of the idea that lexical scope in the code
translates nicely to dynamic scope in the compiler; I can’t remember
where I read that first (but it’s certainly not a new idea).
And now
(macroexpand '(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean))
; => (LET (($ 3))
; (LET ((*RANDOM-STATE* (MAKE-RANDOM-STATE)))
; (SETQ $ (RNORM $)))
; (SETQ $ (MEAN $))),
; T
(macroexpand '(-> 3 (let ((y (1+ $))) (atan y))))
; => (LET (($ 3))
; (LET ((Y (1+ $)))
; (SETQ $ (ATAN $ Y)))),
; T
Just to be clear: this post isn’t advocating a smarter pipe operator;
I don’t have a clear enough view, but I doubt that the benefits of the
smartness outweigh the complexity. It is demonstrating what can be
done, in a reasonably controlled way, using a code-walker: ascribing
semantics to fragments of Common Lisp code, and combining those
fragments in a particular way, and of course it’s another example of
sb-walker:walk
in use.
Finally, if something like this does in fact get used, people
sometimes get tripped up by the package system: the special bits of
syntax are symbols, and importing or package-qualifying ->
without
doing the corresponding thing to $
would lead to cryptic errors,
wrong results and/or confusion. One possibility to handle that is to
invent a bit more reader syntax:
(set-macro-character #\¦
(defun pipe-reader (stream char)
(let ((*readtable* (copy-readtable)))
(set-macro-character #\·
(lambda (stream char)
(declare (ignore stream char))
'$) t)
(cons '-> (read-delimited-list char stream t)))) nil)
¦"THREE" string-downcase (find-if #'alpha-char-p ·) char-code¦
If this is the exported syntax, it has the advantage that the
interface can only be misused intentionally: the actual macro and its
anaphoric symbol are both hidden from the programmer; and the syntax
is reasonably easy to type – on my keyboard ¦
is AltGr
+|
and ·
is AltGr
+.
– and moderately mnemonic from shell pipes and function
notation respectively. It also has all the usual disadvantages of
reader-based interfaces, such as composability, somewhat mitigated if
pipe-reader
is part of the macro’s exported interface.