Since it seems still topical to talk about Lisp and code-transformation macros, here’s another worked example – this time inspired by the enthusiasm for the R `magrittr` package.

The basic idea behind the `magrittr` package is, as Hadley said at EARL2014, to convert from a form of code where arguments to the same function are far apart to one where they’re essentially close together; the example he presented was converting

``````arrange(
summarise
group_by(
filter(babynames, name == "Hadley"),
year),
total = sum(n)
desc(year))
``````

to

``````b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))
``````

only without the danger of mistyping one of the variable names along the way and failing to perform the computation that was intended.

R, as I have said before, is a Lisp-1 with weird syntax and wacky evaluation semantics. One of the things that ordinary user code can do is inspect the syntactic form of its arguments, before evaluating them. This means that when looking at a fragment of code such as

``````foo(bar(2,3), 4)
``````

where a call-by-value language would first evaluate `bar(2,3)`, then call `foo` with two arguments (the value resulting from the evaluation, and 4), R instead uses a form of call-by-need evaluation, and also provides operators for inspecting the promise directly. This means R users can do such horrible things as

``````foo <- function(x) {
tmp <- substitute(x)
sgn <- 1
while(class(tmp) == "(") {
tmp <- tmp[[2]]
sgn <- sgn * -1
}
sgn * eval.parent(tmp)
}
foo(3) # 3
foo((3)) # -3
foo(((3))) # 3
foo((((3)))) # -3 (isn’t this awesome?  I did say “wacky”)
``````

In the case of `magrittr`, the package authors have taken advantage of this to invent some new syntax; the pipe operator `%>%` is charged with inserting its first argument (its left-hand side, in normal operation) as the first argument to the call of its second argument (right-hand side). Hadley’s example is

``````babynames %>%
filter(name == "Hadley") %>%
group_by(year) %>%
summarise(total = sum(n)) %>%
arrange(desc(year))
``````

and this is effective because the data flow in this case really is a pipeline: there's a dataset, which needs filtering, then grouping, then summarization, then sorting, and each operation works on the result of the previous. This already needs to inspect the syntactic form of the argument; an additional feature is recognizing the presence of `.`s in the call, and placing the left-hand side value in that argument position instead of as the first argument if it is present.

In Common Lisp, there are some piping or chaining operators out there (e.g. one two three (search for `ablock`) four and probably many others), and they do well enough. However! They mostly suffer from similar problems that we’ve seen before: doing code transformations with not quite enough understanding of the semantics of the code that they’re transforming; again, that’s fine for normal use, but for didactic purposes let’s pretend that we really care about this.

The `->` macro from http://stackoverflow.com/a/11080068 is basically the same as the `magrittr` `%>%` operator: it converts symbols in the pipeline to function calls, and places the result of the previous evaluation as the first argument of the current operator, except if a `\$` is present in the arguments, in which case it replaces that. (This version doesn’t support more than one `\$` in the argument list; it would be a little bit of a pain to support that, needing a temporary name, but it’s straightforward in principle).

Since the `->` macro does its job, a code-walker implementation isn’t strictly necessary: pure syntactic manipulation is good enough, and if it’s used with just the code it expects, it will do it well. It is of course possible to express what it does using a code-walker; we’ll fix the multiple-\$ ‘bug’ along the way, by explicitly introducing bindings rather than replacements of symbols:

``````(defmacro -> (form &body body)
(labels ((find-\$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((eql f '\$) (return-from find-\$ t))
((eql f form) f)
(t (values f t)))))
nil)
(walker (form context env)
(cond
((symbolp form) (list form))
((atom form) form)
(t (if (find-\$ form env)
(values `(setq \$ ,form) t)
(values `(setq \$ ,(list* (car form) '\$ (cdr form))) t))))))
`(let ((\$ ,form))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))
``````

How to understand this implementation? Well, clearly, we need to understand what `sb-walker:walk` does. Broadly, it calls the walker function (its third argument) on successive evaluated subforms of the original form (and on variable names set by `setq`); the primary return value is used as the interim result of the walk, subject to further walking (macroexpansion and walking of its subforms) except if the second return value from the walker function is t.

Now, let’s start with the `find-\$` local function: its job is to walk a form, and returns `t` if it finds a `\$` variable to be evaluated at toplevel and `nil` otherwise. It does that by returning `t` if the form it’s given is `\$`; otherwise, if the form it’s given is the original form, we need to walk its subforms, so return `f`; otherwise, return its form argument `f` with a secondary value of `t` to inhibit further walking. This operation is slightly at odds with the use of a code walker: we are explicitly not taking advantage of the fact that it understands the semantics of the code it’s walking. This might explain why the `find-\$` function itself looks a bit weird.

The `walker` local function is responsible for most of the code transformation. It binds `\$` to the value of the first form, then repeatedly sets `\$` to the value of successive forms, rewritten to interpolate a `\$` in the first argument position if there isn’t one in the form already (as reported by `find-\$`). If any of the forms is a symbol, it gets listified and subsequently re-walked. Thus

``````(macroexpand-1 '(-> "THREE" string-downcase (char 0)))
; => (LET ((\$ "THREE"))
;      (SETQ \$ (STRING-DOWNCASE \$))
;      (SETQ \$ (CHAR \$ 0))),
;    T
``````

So far, so good. Now, what could we do with a code-walker that we can’t without? Well, the above implementation of `->` supports chaining simple function calls, so one answer is “chaining things that aren’t just function calls”. Another refinement is to support eliding the insertion of `\$` when there are any uses of `\$` in the form, not just as a bare argument. Looking at the second one first, since it’s less controversial:

``````(defmacro -> (form &body body)
(labels ((find-\$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((and (eql f '\$) (eql c :eval))
(return-from find-\$ t))
(t f))))
nil)
(walker (form context env)
(cond
((symbolp form) (list form))
((atom form) form)
(t (if (find-\$ form env)
(values `(setq \$ ,form) t)
(values `(setq \$ ,(list* (car form) '\$ (cdr form))) t))))))
`(let ((\$ ,form))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) body))))
``````

The only thing that’s changed here is the definition of `find-\$`, and in fact it’s a little simpler: the task is now to walk the entire form and find uses of `\$` in an evaluated position, no matter how deep in the evaluation. Because this is a code-walker, this will correctly handle macros, backquotes, quoted symbols, and so on, and this allows code of the form

``````(macroexpand-1 '(-> "THREE" string-downcase (char 0) char-code (complex (1+ \$) (1- \$))))
; => (LET ((\$ "THREE"))
;      (SETQ \$ (STRING-DOWNCASE \$))
;      (SETQ \$ (CHAR-CODE \$))
;      (SETQ \$ (COMPLEX (1+ \$) (1- \$)))),
;    T
``````

which, as far as I can tell, is not supported in `magrittr`: doing ```3 %>% complex(.+1,.-1)``` is met with the error that “object '.' not found”. Supporting this might, of course, not be a good idea, but at least the code walker shows that it’s possible.

What if we wanted to augment `->` to handle binding forms, or special forms in general? This is probably beyond the call of duty, but let’s just briefly imagine that we wanted to be able to support binding special variables around the individual calls in the chain; for example, we want

``````(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean)
``````

to expand to

``````(let ((\$ 3))
(setq \$ (let ((*random-state* (make-random-state))) (rnorm \$)))
(setq \$ (mean \$)))
``````

and let us also say, to make it interesting, that uses of `\$` in the bindings clauses of the `let` should not count against inhibiting the insertion of `\$` in the first argument position of the first form in the body of the `let`, so

``````(-> 3 (let ((y (1+ \$))) (atan y)))
``````

should expand to

``````(let ((\$ 3)) (setq \$ (let ((y (1+ \$))) (atan \$ y))))
``````

So our code walker needs to walk the bindings of the `let`, merely collecting information into the walker’s lexical environment, then walk the body performing the same rewrite as before. CHALLENGE ACCEPTED:

``````(defmacro -> (&body forms)
(let ((rewrite t))
(declare (special rewrite))
(labels ((find-\$ (form env)
(sb-walker:walk-form form env
(lambda (f c e)
(cond
((and (eql f '\$) (eql c :eval))
(return-from find-\$ t))
(t f))))
nil)
(walker (form context env)
(declare (ignore context))
(typecase form
(symbol (if rewrite (list form) form))
(atom form)
((cons (member with-rewriting without-rewriting))
(let ((rewrite (eql (car form) 'with-rewriting)))
(declare (special rewrite))
(values (sb-walker:walk-form (cadr form) env #'walker) t)))
((cons (member let let*))
(unless rewrite
(return-from walker form))
(let* ((body (member 'declare (cddr form)
:key (lambda (x) (when (consp x) (car x))) :test-not #'eql))
(declares (ldiff (cddr form) body))
(rewritten (sb-walker:walk-form
`(without-rewriting
(,(car form) ,(cadr form)
,@declares
(with-rewriting
,@body)))
env #'walker)))
(values rewritten t)))
(t
(unless rewrite
(return-from walker form))
(if (find-\$ form env)
(values `(setq \$ ,form) t)
(values `(setq \$ ,(list* (car form) '\$ (cdr form))) t))))))
`(let ((\$ ,(car forms)))
,@(mapcar (lambda (f) (sb-walker:walk-form f nil #'walker)) (cdr forms))))))
``````

Here, `find-\$` is unchanged from the previous version; all the new functionality is in `walker`. How does it work? The default branch of the `walker` function is also unchanged; what has changed is handling of `let` and `let*` forms. The main trick is to communicate information between successive calls to the walker function, and turn the rewriting on and off appropriately: we wrap parts of the form in new pseudo-special operators `with-rewriting` and `without-rewriting`, which is basically a tacky and restricted implementation of `compiler-let` – if we needed to, we could do a proper one with `macrolet`. Within the scope of a `without-rewriting`, `walker` doesn’t do anything special, but merely return the form it was given, except if the form it’s given is a `with-rewriting` form. This is a nice illustration, incidentally, of the idea that lexical scope in the code translates nicely to dynamic scope in the compiler; I can’t remember where I read that first (but it’s certainly not a new idea).

And now

``````(macroexpand '(-> 3 (let ((*random-state* (make-random-state))) rnorm) mean))
; => (LET ((\$ 3))
;      (LET ((*RANDOM-STATE* (MAKE-RANDOM-STATE)))
;        (SETQ \$ (RNORM \$)))
;      (SETQ \$ (MEAN \$))),
;    T
(macroexpand '(-> 3 (let ((y (1+ \$))) (atan y))))
; => (LET ((\$ 3))
;      (LET ((Y (1+ \$)))
;        (SETQ \$ (ATAN \$ Y)))),
;    T
``````

Just to be clear: this post isn’t advocating a smarter pipe operator; I don’t have a clear enough view, but I doubt that the benefits of the smartness outweigh the complexity. It is demonstrating what can be done, in a reasonably controlled way, using a code-walker: ascribing semantics to fragments of Common Lisp code, and combining those fragments in a particular way, and of course it’s another example of `sb-walker:walk` in use.

Finally, if something like this does in fact get used, people sometimes get tripped up by the package system: the special bits of syntax are symbols, and importing or package-qualifying `->` without doing the corresponding thing to `\$` would lead to cryptic errors, wrong results and/or confusion. One possibility to handle that is to invent a bit more reader syntax:

``````(set-macro-character #\¦
(defun pipe-reader (stream char)
(let ((*readtable* (copy-readtable)))
(set-macro-character #\·
(lambda (stream char)
(declare (ignore stream char))
'\$) t)
(cons '-> (read-delimited-list char stream t)))) nil)
¦"THREE" string-downcase (find-if #'alpha-char-p ·) char-code¦
``````

If this is the exported syntax, it has the advantage that the interface can only be misused intentionally: the actual macro and its anaphoric symbol are both hidden from the programmer; and the syntax is reasonably easy to type – on my keyboard `¦` is `AltGr`+`|` and `·` is `AltGr`+`.` – and moderately mnemonic from shell pipes and function notation respectively. It also has all the usual disadvantages of reader-based interfaces, such as composability, somewhat mitigated if `pipe-reader` is part of the macro’s exported interface.