Christophe Weblog Wiki Code Publications Music
backquote and pretty printing

There was a bit of a kerfuffle following the 1.2.2 release of SBCL, regarding the incompatible change in the internals of the backquote reader macro.

Formally, implementations can choose how to implement the backquote reader macro (and its comma-based helpers): the semantics of backquote are defined only after evaluation:

An implementation is free to interpret a backquoted form F1 as any form F2 that, when evaluated, will produce a result that is the same under equal as the result implied by the above definition, provided that the side-effect behavior of the substitute form F2 is also consistent with the description given above.

(CLHS 2.4.6; emphasis mine)

There are also two advisory notes about the representation:

Often an implementation will choose a representation that facilitates pretty printing of the expression, so that (pprint '`(a ,b)) will display `(a ,b) and not, for example, (list 'a b). However, this is not a requirement.

(CLHS 2.4.6.1; added quote in example mine), and:

Implementors who have no particular reason to make one choice or another might wish to refer to IEEE Standard for the Scheme Programming Language, which identifies a popular choice of representation for such expressions that might provide useful to be useful compatibility for some user communities.

(CLHS 2.4.6.1; the Scheme representation reads `(foo ,bar) as (quasiquote (foo (unquote bar))))

The problem the new implementation of backquote is attempting to address is the first one: pretty printing. To understand what the problem is, an example might help: imagine that we as Common Lisp programmers (i.e. not implementors, and aiming for portability) have written a macro bind which is exactly equivalent to let:

(defmacro bind (bindings &body body)
  `(let ,bindings ,@body))

and we want to implement a pretty printer for it, so that (pprint '(progn (bind ((x 2) (z 3)) (if *print-pretty* (1+ x) (1- y))))) produces

(progn
  (bind ((x 2)
         (z 3))
    (if *print-pretty*
        (1+ x)
        (1- y))))

What does that look like? Writing pretty-printers is a little bit of a black art; a first stab is something like:

(defun pprint-bind (stream object)
  (pprint-logical-block (stream object :prefix "(" :suffix ")")
    (pprint-exit-if-list-exhausted)
    (write (pprint-pop) :stream stream)
    (pprint-exit-if-list-exhausted)
    (write-char #\Space stream)
    (pprint-logical-block (stream (pprint-pop) :prefix "(" :suffix ")")
      (pprint-exit-if-list-exhausted)
      (loop
        (write (pprint-pop) :stream stream)
        (pprint-exit-if-list-exhausted)
        (pprint-newline :mandatory stream)))
    (pprint-exit-if-list-exhausted)
    (pprint-indent :block 1 stream)
    (pprint-newline :mandatory stream)
    (loop
      (write (pprint-pop) :stream stream)
      (pprint-exit-if-list-exhausted)
      (pprint-newline :mandatory stream))))
(set-pprint-dispatch '(cons (eql bind)) 'pprint-bind)

The loop noise is necessary because we're using :mandatory newlines; a different newline style, such as :linear, might have let us use a standard utility function such as pprint-linear. But otherwise, this is straightforward pretty-printing code, doing roughly the equivalent of SBCL's internal pprint-let implementation, which is:

(formatter "~:<~^~W~^ ~@_~:<~@{~:<~^~W~@{ ~_~W~}~:>~^ ~_~}~:>~1I~:@_~@{~W~^ ~_~}~:>")

A few tests at the repl should show that this works with nasty, malformed inputs (“malformed” in the sense of not respecting the semantics of bind) as well as expected ones:

(pprint '(bind))
(pprint '(bind x))
(pprint '(bind x y))
(pprint '(bind (x y) z))
(pprint '(bind ((x 1) (y 2)) z))
(pprint '(bind ((x 1) (y 2)) z w))
(pprint '(bind . 3))
(pprint '(bind x . 4))
(pprint '(bind (x . y) z))
(pprint '(bind ((x . 0) (y . 1)) z))
(pprint '(bind ((x) (y)) . z))
(pprint '(bind ((x) y) z . w))

Meanwhile, imagine a world where the backquote reader macro simply wraps (quasiquote ...) around its argument, and comma likewise wraps (unquote ...):

(set-macro-character #\` (defun read-backquote (stream char)
                           (list 'quasiquote (read stream t nil t))))
(set-macro-character #\, (defun read-comma (stream char)
                           (list 'unquote (read stream t nil t))))

Writing pretty-printer support for that is easy, right?

(defun pprint-quasiquote (stream object)
  (write-char #\` stream)
  (write (cadr object) :stream stream))
(defun pprint-unquote (stream object)
  (write-char #\, stream)
  (write (cadr object) :stream stream))
(set-pprint-dispatch '(cons (eql quasiquote) (cons t null)) 'pprint-quasiquote)
(set-pprint-dispatch '(cons (eql unquote) (cons t null)) 'pprint-unquote)

(ignoring for the moment what happens if the printed representation of object happens to start with a @ or .)

(pprint '(quasiquote (x (unquote y))))

The problem arises when we try to combine these two things. In particular, what happens when we attempt to print backquoted forms:

(pprint '`(bind ,y z))

What we would hope to see is something like

`(bind ,y
   z)

but what we actually get is

`(bind (unquote
        y)
   z)

because each of the bindings in bind is printed individually, rather than the bindings being printed as a whole. And, lest there be hopes that this can be dealt with by a slightly different way of handling the pretty printing in pprint-bind, note that it's important that (pprint '(bind (function y) z)) print as

(bind (function
       y)
  z)

and not as

(bind #'y
  z)

so the only way to handle this is to know the magical symbols involved in backquote and comma reader macros – but that is not portably possible. So, we've come to the point where the conclusion is inevitable: it is not possible for an implementation to support list-structured quasiquote and unquote reader macros and general pretty printing for user-defined operators. (This isn’t the only failure mode for the combination of unquote-as-list-structure and pretty-printing; it’s surprisingly easy to write pretty-printing functions that fail to print accurately, not just cosmetically as above but catastrophically, producing output that cannot be read back in, or reads as a structurally unequal object to the original.)

The new implementation, by Douglas Katzman, preserves the implementation of the backquote reader macro as a simple list, but comma (and related reader macros) read as internal, literal structures. Since these internal structures are atoms, not lists, they are handled specially by pprint-logical-block and friends, and so their own particular pretty-printing routines always fire. The internal quasiquote macro ends up extracting and arranging for appropriate evaluation and splicing of unquoted material, and everything ends up working.

Everything? Well, not quite: one or two programmer libraries out there implemented some utility functionality – typically variable renaming, automatic lambda generation, or similar – without performing a full macroexpansion and proper codewalk. That code was in general already broken, but it is true that in the past generating an example to demonstrate the breakage would have to violate the general expectation of what “normal” Lisp code would look like, whereas as a result of the new implementation of backquote in SBCL the symptoms of breakage were much easier to generate. Several of these places were fixed before the new implementation was activated, such as iterate's #l macro; among the things to be dealt with after the new implementation was released was the utility code from let-over-lambda (a workaround has been installed in the version distributed from github), and there is still a little bit of fallout being dealt with (e.g. a regression in the accuracy of source-location tracking). But overall, I think the new implementation of backquote has substantial advantages in maintainability and correctness, and while it’s always possible to convince maintainers that they’ve made a mistake, I hope this post explains some of why the change was made.

Meanwhile, I've released SBCL version 1.2.3 – hopefully a much less “exciting” release...

languages for teaching programming

There’s seems to be something about rainy weekends in May that stimulates academics in Computing departments to have e-mail discussions about programming languages and teaching. The key ingredients probably include houseboundness, the lull between the end of formal teaching and the start of exams, the beginning of contemplation of next year’s workload, and enthusiasm; of course, the academic administrative timescale is such that any changes that we contemplate now, in May 2014, can only be put in place for the intake in September 2015... if you’ve ever wondered why University programmes in fashionable subjects seem to lag about two years behind the fashion (e.g. the high growth of Masters programmes in “Financial Engineering” or similar around 2007 – I suspect that demand for paying surprisingly high tuition fees for a degree in Synthetic Derivative Construction weakened shortly after those programmes came on stream, but I don’t actually have the data to be certain).

Meanwhile, I was at the European Lisp Symposium in Paris last week, where there was a presentation of a very apposite nature: the Computing Department at Middlesex university has implemented an integrated first-year of undergraduate teaching, covering a broad range of the computing curriculum (possibly not as broad as at Goldsmiths, though) through an Arduino and Raspberry Pi robot with a Racket-based programmer interface. Students’ progress is evaluated not through formal tests, courseworks or exams, but through around 100 binary judgments in the natural context of “student observable behaviours” at three levels (‘threshold’, which students must exhibit to progress to the second year; ‘typical’, and ‘excellent’).

This approach has a number of advantages, I think, over a more traditional division of the year into four thirty-credit modules (e.g. Maths, Java, Systems, and Profession): for one, it pretty much guarantees a coherent approach to the year, where in the divided modules case it is surprisingly easy for one module to be updated in syllabus or course content without adjustments to the others, leaving for example some programming material insufficiently supported by the maths (and some maths taught without motivation). The assessment method is in principle transparent to the students, who know what they have to do to progress (and to get better marks); I'm not convinced that this is always a good thing, but for an introductory and core course I think the benefits substantially outweigh the disadvantages. The use of Racket as the teaching language has an equalising effect – it’s unlikely that students will have prior experience with it, so everyone starts off at the same point at least with respect to the language – and the use of a robot provides visceral feedback and a sense of achievement when it is made to do something in a way that text and even pixels on a screen might not. (This feedback and tangible sense of achievement is perhaps why the third-year option of Physical Computing at Goldsmiths is so popular: often oversubscribed by a huge margin).

With these thoughts bubbling around in my head, then, when the annual discussion kicked off at the weekend I decided to try to articulate my thoughts in a less ephemeral way than in the middle of a hydra-like discussion: so I wrote a wiki page, and circulated that. One of the points of having a personal wiki is that the content on it can evolve, but before I eradicate the evidence of what was there before, and since it got at least one response (beyond “why don’t you allow comments on your wiki?”) it's worth trying to continue the dialogue.

Firstly, Chris Cannam pulls me up on not including an Understand goal, or one like it: teaching students to understand and act on their understanding of computing artifacts, hardware and software. I could make the argument that this lives at the intersection of my Think and Experiment goals, but I think that would be retrospective justification and that there is a distinct aim there. I’m not sure why I left it out; possibly, I am slightly hamstrung in this discussion about pedagogy by a total absence of formal computing education; one course in fundamentals of computing as a 17-year-old, and one short course on Fortran and numerical methods as an undergraduate, and that’s it. It's in some ways ironic that I left out Understand, given that in my use of computers as a hobby it’s largely what I do: Lisp software maintenance is often a cross between debugger-oriented programming and software archaeology. But maybe that irony is not as strong as it might seem; I learnt software and computing as a craft, apprenticed to a master; I learnt the shibboleths (“OAOO! OAOO! OAOO!”) and read the training manuals, but I learnt by doing, and I’m sure I’m far from alone, even in academic Computing let alone among programmers or computing professionals more generally.

Maybe the Middlesex approach gets closer, in the University setting, to the apprenticeship (or pre-apprenticeship) period; certainly, there are clear echoes in their approach of the switch by MIT from the SICP- and Scheme-based 6.001 to the Robotics- and Python-based introduction to programming – and Sussman’s argument from the time of the switch points to a qualitative difference in the role of programmers, which happens to dovetail with current research on active learning. In my conversation with the lecturers involved after their presentation, they said that the students take a more usual set of languages in their second-years (Java, C++); obviously, since this is the first year of their approach, they don’t yet know how the transition will go.

And then there was the slightly cynical Job vs Career distinction that I drew, being the difference between a graduate-level job six months after graduation as distinct from a fulfilling career. One can of course lead to the other, but it’s by no means guaranteed, and I would guess that if asked most people would idealistically say we as teachers in Universities should be attempting to optimize for the latter. Unfortunately, we are measured in our performance in the former; one of the “Key Information Sets” collected by higher-education agencies and presented to prospective undergraduates is the set of student ‘destinations’. Aiming to optimize this statistic is somewhat akin to schools optimizing their GCSE results with respect to the proportion of pupils gaining at least 5 passes: ideally, the measurement should be a reflection of the pedagogical practice, but the foreknowledge of the measurement has the potential to distort the allocation of effort and resources. In the case of the school statistic, there’s evidence that extra effort is concentrated on borderline pupils, at the expense of both the less able and potential high-fliers; the distortion isn’t so stark in the case of programming languages, because the students have significant agency between teaching and measurement, but there is certainly pressure to ensure that we teach the most common language used in assessment centres.

In Chris’ case, C++ might have had a positive effect on both; the programming language snob in me, though, wants to believe that there are hordes of dissatisfied programmers out there, having got a job using their competence in some “industry-standard” language, desperately wanting to know about a better way of doing things. I might be overprojecting the relationship of a professional programmer with their tools, of course: for many a career in programming and development is just that, rather than a love-hate relationship with their compiler and linker. (Also, there’s the danger of showing the students that there is a better way, but that the market doesn’t let them use it...)

Is it possible to conclude anything from all of this? Well, as I said in my initial thoughts, this is an underspecified problem; I think that a sensible decision can only be taken in practice once the priorities for teaching programming at all have been established, in combination with the resources available for delivering teaching. I’m also not enough of a polyglot in practice to offer a mature judgment on many fashionable languages; I know enough of plenty of languages to modify and maintain code, but am comfortable writing from scratch in far fewer.

So with all those caveats, an ideal programming curriculum (reflecting my personal preferences and priorities) might include in its first two years: something in the Lisp family, for Think and Experiment; ARM Assembler, for Think and Understand; C++, for Job and Career (and all three together for Society). Study should probably be handled in individual third-year electives, and I would probably also include a course hybrid between programming, database and system administration to cover a Web programming stack for more Job purposes. Flame on (but not here, because system administration is hard).

european lisp symposium 2014

My train ride to Paris was uneventful, and I arrived at my accommodation only several hours after bedtime. I did manage to write my talk, and it was good to discover the number of obsessive planet.lisp readers when I showed up to register – “good to see you again. How’s the talk?” was the median greeting. For the record, I had not only written my talk on the train but also had a chance to relax a little. Go trains.

The conference was fun; gatherings of like-minded people usually are, but of course it takes substantial effort from determined people to create the conditions for that to happen. Kent’s work as programme chair, both before and during the conference, came with a feeling of apparent serenity while never for a moment looking out of control, and the groundwork that the local organizing team (Didier Verna, Gérard Assayag and Sylvie Benoit) had done meant that even the problem of the registrations exceeding the maximum capacity of the room – nice problem to have! – could be dealt with.

I liked the variety in the keynotes. It was very interesting to hear Richard Gabriel’s talk on his research in mood in Natural Language processing and generation; in many ways, those research directions are similar to those in the Transforming Musicology world. The distinction he drew between programming for a purpose and programming for exploration was very clear, too: he made it clear that he considered them two completely different things, and with my hat as a researcher on I have to agree: usually when I'm programming for research I don’t know what I'm doing, and the domain of investigation is so obviously more unknown than the program structure that I had better have a malleable environment, so that I can minimize the cost of going down the wrong path. Pascal Costanza gave a clear and detailed view of the problems in parallel programming, drawing a distinction between parallelism and concurrency, and happened to use examples from several of my previous lives (Smooth-Particle Hydrodynamics, Sequence Alignment) to illustrate his points. Gábor Melis talked about his own learning and practice in the context of machine learning, with a particular focus on his enviable competition record; his call to aim for the right-hand side of the curves (representing clear understanding and maximum use-case coverage) was accompanied by announcements of two libraries, mgl-pax and mgl-mat.

My own presentation was, I suppose, competent enough (slides). Afterwards, I had a good talk with my previous co-author in the generalizes research line, Jim Newton, about the details, and Gábor told me he’d like to try it “on by default”. But the perils of trying to get across a highly-technical topic struck, and I got a number of comments of the form that the talk had been enjoyable but I’d “lost them at compute-applicable-methods-using-classes”. I suppose I could still win if the talk was enjoyable enough for them to read and work through the paper; maybe next time I might risk the demo effect rather more than I did and actually do some programming live on stage, to help ground the ideas in people’s minds. I did get a specific request: to write a blog post about eval-when in the context of metaobject programming, and hopefully I'll find time for the in the next few train journeys...

Meanwhile, highlights (for me) among the contributed papers: Nick Levine driving Lispworks’ CAPI graphical user interface library from SBCL using his Common Lisp AUdience Expansion toolkit (preaching to the choir, though: his real target is Python developers); Faré Rideau’s description of a decade-long exploration of defsystem design space; François-Xavier Bois’ demonstration of web-mode.el, an Emacs mode capable of handling CSS, Javascript and PHP simultaneously; and two talks motivated by pedagogy: Pedro Ramos’ discussion of the design tradeoffs involved in an implementation of Python in Racket, and the team presentation of the approach taken for a new robotics- and Scheme-oriented undergraduate first-year at Middlesex University, on which more in a subsequent post.

Lightning talks of particular note to me: Martin Simmons talking about Lispworks for mobile; Didier Verna and Marco Antoniotti talking about their respective documentation generation systems (my response); Mikhail Raskin’s argument about the opportunity to push Julia in a lispy direction; and probably others which will come back to mind later.

I was also pleased to be able to contribute to the last full session of the symposium, a workshop/panel about Lisp in the area of music applications: an area which is serendipitously close to the day job. I worked on getting a version of audioDB, our feature-vector search engine for similarity matching, built and working on my laptop, and finding a sufficiently interesting search among my Gombert/Josquin collection to demo – and I also had the chance to talk about Raymond Whorley’s work on using multiple viewpoint systems for hymn harmonizations, and what that teaches us about how people do it (slides, for what they're worth). Other systems of interest in the session included OpenMusic (of course, given where we were), PWGL, OMax, modalys, and overtone; there was an interesting conversation about whether the choice of implementation language was restricting the userbase, particularly for tools such as OpenMusic where the normal interface is a graphical one but a large fraction of users end up wanting to customize behaviour or implement their own custom patches.

And then it was all over bar the dinner! On a boat, sadly immobile, but with good conversation and good company. The walk home in company was fun, though in retrospect it was probably a mistake to stop off at a bar for a nightcap... the train journey back to the UK the following morning was definitely less productive than it could have been; closing eyes and letting the world go past was much more attractive.

But now I'm on another train, going off to record the complete works of Bernadino de Ribera. Productivity yay.

on my way to els2014

After a packed weekend of music-making with De Profundis (go buy tickets!), I’m now at St Pancras, waiting for a train to Paris to go to the European Lisp Symposium, where I’ll be presenting my work (with Jan Moringen and David Lichteblau) on generalizer metaobjects. I’m looking forward to the event: it’s being held at IRCAM, which as an institution neatly combines two of my interests (computing and music) – there’s going to be a special session on Tuesday on Lisp and composition, though I might try to participate and broaden it a bit to talk about some of our current problems and approaches in Transforming Musicology.

I’m not (yet) looking forward to my talk. Over the next few hours, I’m going to be putting to the test my current belief that I am most productive on trains, because I haven’t yet written my talk, and it turns out that I am in the first session tomorow morning. (I know, I know, this is the sound of the world’s tiniest violin). 2h20 should be enough fo anyone.

The flip side of the stress of being first is the ability to enjoy the rest of the symposium without the stress of an upcoming presentation hanging over me – as well as the ability to ask nasty difficult and probing questions without too much fear of instant reprisal. (I suppose that by putting this out there before the talk I open myself up to pre-reprisals; all I can say is that I welcome a constructive and robust exchange of views: there is definitely more work to be done in the world of extended dispatch, and suggestions will be welcome.)

And of course it will be good to meet up with a community of like-minded people; I may not be highly active in the Lisp world these days (though I have been making an effort to be more engaged with at least the SBCL team for more than the one day a month of release activity, and it’s been fun to watch the ARM port progress on #sbcl) but I retain a good helping of interest in what’s going on, and it will be great to meet up with old friends and maybe even to make new ones. (Registration got to the room capacity very quickly, and hopefully the overflow room will have a good atmosphere too). Hopefully the conversations I have and ideas generated will mean that I have a productive train ride back, too!

just like old times

Sometimes, it feels like old times, rolling back the years, and so on. One of the things I did while avoiding work on my PhD (on observational tests of exotic cosmologies inspired by then-fashionable high-energy physics theories, since you ask) was to assist in resurrecting old and new CMUCL backends for SBCL, and to do some additional backend hacking. I learnt a lot! (To be fair, I learnt a lot about General Relativity too). And I even wrote about some of the funny bits when porting, such as getting to a working REPL with a completely broken integer multiply on SPARC, or highly broken bignums when working on a proper 64-bit backend for alpha.

On the #sbcl IRC channel over the last few days, the ARM porting crew (not me! I've just been kibitzing) has made substantial progress, to the point that it's now possible to boot a mostly-working REPL. Some of the highlights of the non-workingness:

(truncate (expt 2 32) 10)
21074691228794313799883120286105
6

(fib 50)
6277101738309684039858724727853387073209631205623600462197

(expt 2.0 3)
2.0

(sb-ext:run-program "uname" '() :search t :output *standard-output*)
Linux
Memory fault at 3c

Meanwhile, given that most ARM cores don't have an integer division instruction, it's a nice bit of synchronicity that one of this year's Summer of Code projects for SBCL is to improve division by constant integers; I'm very much looking forward to getting involved with that, and hopefully also bringing some of the next generation of SBCL hackers into the community. Maybe my writing a while back about a self-sustaining system wasn't totally specious.

http-content-negotiation-and-generalized-specializers

I promised a non-trivial example of a use for generalized specializers a while ago. Here it is: automatic handling of HTTP (RFC 2616) Content-Type negotiation in computed responses.

In RESTful services and things of that ilk, a client indicates that it wants to apply a verb (GET, for example) to a particular resource (named by a URN, or possibly identified by a URI). This resource is a conceptual object; it will have zero or more concrete manifestations, and content negotiation provides a way for the client to indicate which of those manifestations it would prefer to receive.

That's all a bit abstract. To take a concrete example, consider the woefully incomplete list of books in my living room at openlibrary. A user operating a graphical web browser to access that resource is presumed to want to retrieve HTML and associated resources, in order to view a shiny representation of the information associated with that resource (a “web page”, if you will). But the human-oriented representation of the information is not the only possible one, and it is common practice in some circles to provide machine-readable representations as well as human-oriented ones, at the same URL; for example, try:

curl -H 'Accept: application/json' https://openlibrary.org/people/csrhodes/lists

and observe the difference between that and visiting the same URL in a graphical browser.

How does the web server know which representation to send? Well, the example has given away the punchline (if the links above to RFC sections haven't already). The graphical web browser will send an Accept header indicating that it prefers to receive objects with presentational content types – text/html, image/* and so on; the browser I have to hand sends text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 as its Accept header, meaning “please give me text/html or application/xhtml+xml, or failing that application/xml, or failing that anything else”. If the server has more than one representation for a given resource, it can use this client-supplied information to provide the best possible content; if the client has particular requirements – for example, attempting to find machine-readable content for further processing – it can declare this by specifying particular acceptable content-types in its Accept header.

For a resource for which more than one representation exists, then, the server must dispatch between them based on the client Accept header. And this is exactly a non-standard dispatch of the kind I've been discussing. Consider a resource http://foo.example/ which is implemented by sending the return value of the generic function foo back to the client:

(defgeneric foo (request)
  (:generic-function-class accept-generic-function))

The default behaviour is somewhat a matter of taste, but one reasonable choice is that if no content-type matches we should use the defined HTTP status code to indicate that the responses we could generate are not acceptable to the client:

(defmethod foo ((request t))
  (http:406 request))

Maybe we have a couple of presentational representations for the resource:

(defmethod foo ((request (accept "text/plain")))
  "Foo")

(defmethod foo ((request (accept "text/html")))
  "<!DOCTYPE html>
<html>
 <head><title>Foo</title></head>
 <body><p>Foo</p></body>
</html>")

And we might have some machine-readable representations:

(defmethod foo ((request (accept "text/turtle")))
  "@prefix foo: <http://example.org/ns#> .
@prefix : <http://other.example.org/ns#> .
foo:bar foo: : .")

(defmethod foo ((request (accept "application/rdf+xml")))
  "<?xml version=\"1.0\"?>
<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"
         xmlns:foo=\"http://example.org/ns#\">
  <rdf:Description about=\"http://example.org/ns#bar\">
    <foo:>
      <rdf:Description about=\"http://other.example.org/ns#\"/>
    </foo:>
  </rdf:Description>
</rdf:RDF>")

(I apologize to any fans of XML/RDF if I have mangled that).

Now a graphical web browser sending an accept header of text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 as above will cause the server to send the HTML version, as that is the most specific applicable method to that accept string. Given this, it is perfectly possible to construct specialized clients with alternative preferences expressed in the accept header. A terminal-based client might prioritize text/plain over text/html (though in fact neither w3m nor lynx does that, at least in the versions I have installed). A client for the Semantic Web might instead accept data in serialized RDF, preferring more modern serializations, by sending an accept string of text/turtle,application/rdf+xml;q=0.9. All these clients could each be served the resource in their preferred format.

The case of serving one of a set of alternative files hosted on the filesystem in response to a request with an arbitrary accept string is different; in this case, it doesn't make sense to do the dispatch through methods. If we were to try to do so, it would look something like

(defmethod filesystem-handler (url (content-type (accept "text/html")))
  (let ((prospect (pathname-for-url url "html")))
    (if (probe-file prospect)
        (serve-file prospect "text/html")
        (call-next-method))))

but we would need to define one such method per possible mime-type we might want to serve: doable, but unnecessary compared with the alternative strategy of finding all content-types servable for a given url, then choosing the one with the highest score:

(defun filesystem-handler (url accept)
  (do* ((prospects (files-for url) (cdr prospects))
        (mime-types (mapcar #'mime-type prospects) (cdr mime-types))
        result mime-type (q 0))
       ((null prospects) (serve-file result mime-type))
     (when (> (q (car mime-types) accept) q)
       (setf result (car prospects) 
             mime-type (car mime-types) 
             q (q (car mime-types))))))

(the set of files on the filesystem effectively already define a set of methods for a given url; it doesn't make sense to try to mirror that as a set of reified methods on a generic function. Also, I've written this out using do* largely to keep the do*-is-not-that-bad society alive.)

Anyway. There's an interesting detail I've elided so far; not only do response-generating functions have to generate the content they wish to send in the response; they also have to indicate what content-type they are actually sending. Our accept-generic-function already handles dispatching on content-type; can it also take responsibility for setting the content-type of the response?

Why yes! The way to do this is using a method combination; it might look something like this:

(defvar *actual-content-type*)

(defgeneric handle-content-type (request))

(define-method-combination content-negotiation/or ()
  ((around (:around))
   (primary () :required t))
  (:arguments request)
  (labels ((transform/1 (method)
             `(make-method
               (progn
                 (let ((s (car (sb-mop:method-specializers ,method))))
                   (when (typep s 'accept-specializer)
                     (setf *actual-content-type* (media-type s))))
                 (call-method ,method))))
           (transform (primaries)
             (mapcar #'(lambda (x) `(call-method ,(transform/1 x)))
                     primaries))
           (wrap (form)
             `(let ((*actual-content-type*))
                (multiple-value-prog1
                    ,form
                  (handle-content-type ,request)))))
    (let ((form (if (rest primary)
                    `(or ,@(transform primary))
                    `(call-method ,(transform/1 (car primary))))))
      (if around
          (wrap `(call-method ,(first around)
                              (,@(rest around) (make-method ,form))))
          (wrap form)))))

This behaves just like the or built-in method-combination, except that when calling a primary method whose specializer for the first argument is one of our accept-specializers, the content-type of the specializer is stored in a special variable; the last thing the effective method does is to call the new handle-content-type generic function, passing it the original generic function's first argument.

Now let's redefine our foo generic function to have the new method combination, and a method on handle-content-type:

(defgeneric foo (request)
  (:generic-function-class accept-generic-function)
  (:method-combination content-negotiation/or))

(defmethod handle-content-type ((x string))
  (format t "Content-Type: ~A~%" *actual-content-type*))

and now, finally, we can try it all out:

SPECIALIZABLE> (foo "text/plain,text/html;q=0.9,*/*;q=0.8")
Content-Type: text/plain
"Foo"

SPECIALIZABLE> (foo "text/turtle,application/rdf+xml;q=0.9")
Content-Type: text/turtle
"@prefix foo: <http://example.org/ns#> .
@prefix : <http://other.example.org/ns#> .
foo:bar foo: : ."

SPECIALIZABLE> (foo "audio/mp3")
406

OK, but by what magic do these accept-specializer objects exist and function? I wrote a paper about that, with Jan Moringen and David Lichteblau: as part of my ongoing open access experimentation, the version we submitted to the European Lisp Symposium is viewable at Goldsmiths' e-prints repository or on arXiv. The ELS Chairs have just announced a deadline extension, so there's still time (until March 23) for anyone to submit technical papers or abstracts for tutorials and demonstration sessions: please do, and I hope to see many of my readers there.

sbcl release management in the air

Just because I'm attending mobile world congress doesn't mean that everything else stops. It's the end of the month, so it must be time to release sbcl. This month's is a little unsatisfying, because we've only achieved one of the two things I'd hoped for: we've been cautious and conservative after last month's landing of the new register allocator, but we haven't sorted out what's going on to cause the less active architectures to fail to build. There are some workarounds that have been mooted; for one reason and another no-one has had the time to check whether they actually work, and while there's partial progress on identifying the root cause of the build failure on sparc it is only partial.

Nevertheless, minor architectures have been broken before, and no-one particularly benefits from not releasing, so 1.1.16 it is. Actually making the release is a little more challenging than usual: I aim to release by the end of the month, and in practice that means it must be done today, 28th February. However, this is the day that I am returning from Barcelona, so I am less in control of laptop power and connectivity than usual for a release day. And to add to the challenge, I am trying this time to address the valid complaints that the binaries built on my laptop don't actually run on released versions of Linux, thanks to the change in the semantics of memcpy (glibc changed the implementation in its version 2.14 to exploit the freedom given to return undefined results if the source and destination overlap) and the behaviour of the linker and versioned symbols.

So over breakfast I dusted off my squeeze chroot (that doesn't sound dodgy at all), and tried to work out how to get a runnable version of SBCL in there at all (answer: build using clisp and link to the chroot's libc). At lunchtime, I used the café's wireless to check for any last-minute changes, and in the airport I found a power socket, so I started the release build. Of course it didn't finish before the gate opened, and in any case I wasn't going to chance uploading sbcl binaries over the free airport wifi (15 minutes, woo!)

I've performed some release stunts before. SBCL 0.9 was released by William Harold Newman and simultaneously announced by me at the first European Common Lisp Meeting in Amsterdam in 2005. Last year, I released SBCL 1.1.8 “live” as part of a lightning talk at the European Lisp Symposium in Madrid. But I think this is the first time that an SBCL release was even partially effected from the air. Next stop, space?

sbcl summer of code 2014 participation

I received notification yesterday that SBCL was accepted as a “mentoring organization” to Google's Summer of Code 2014 programme. What this means is that eligible students can think about applying to be mentored by members of the SBCL team to work on a development project – and to be paid a stipend for that work.

The next steps for students are:

  • get involved with the community: join #lisp and #sbcl on the freenode IRC network; read the sbcl-devel mailing list (e.g. at gmane.lisp.steel-bank.devel; have a go at some of the bugs marked as ‘easy’
  • look at the existing list of project ideas, and see if any of those ideas would form a good basis for a project that would sustain your interest for three months or so.
  • read Paul Khuong's essay on getting started with SBCL from last year: it is still absolutely applicable to this year's programme, or indeed getting into SBCL hacking independently.

The period for student applications to be mentored as part of SBCL's participation is from 10th March until 21st March 2014; for best results, get engaged with the community and potential mentors well before the cutoff point.

Last year two projects were funded, and both completed successfully, even if merging them into the mainline is/was slower than one might like. I'm looking forward to seeing what this year brings!

generic function precompilation

Once upon a time, I wrote two blog entries about precompiling the dispatch for generic functions in order to improve the responsiveness at application startup, and obviously planned a third but didn't get around to writing it. Some time later, I asked the lazyweb about it, and unfortunately this time didn't get anything much back. On further reflection, I think I was going to discuss a couple of additional technical points.

First of all, there's the question about having this kind of precompilation on all the time; wouldn't it be nice if generic functions were always at their most efficient, ready to be called? Well, yes, but what we actually want is for functions to be at their most efficient when we want to call them, and we don't particularly care at other times. Why draw that distinction? Well, when we write code that uses generic functions, it is usually of the form:

(defgeneric foo (x y))
(defmethod foo ((x t) (y t)) ...)
(defmethod foo ((x integer) (y integer)) ...)
(defmethod foo ((x rational) (y float)) ...)

a sequence of method definitions; sometimes contiguous, as above; often distributed among various files. loading such code will execute the defmethod forms in series. And each of those executions will call add-method, changing the generic function's methods, so any aggressive dispatch precompilation scheme will kick in at every defmethod. That would be bad enough, being O(N) in the number of methods, but it's actually worse than that: the amount of work involved in precompilation will tend to be O(N) in the number of methods, or in the number of concrete classes applicable to those methods, so there's O(N2) or worse behaviour from having precompilation active all the time. SBCL is routinely mocked for having slow-loading FASL files (originally, FASL probably stood for FASt Load; we don't advertise that much); expensive and poorly-scaling computations at load-time probably won't win us any extra friends. (Though having written that, I now wonder if recording changed generic functions using the dependent-update protocol, and hooking load or asdf:load-system to precompile after loading a whole file or system, might be a tasteful compromise).

Secondly, there's the detail of exactly how to pre-fill the cache for a given generic function (given that its specializers are mostly or entirely classes). The simplest strategy, of filling the cache with exactly those classes used as specializers, fails as soon as abstract or protocol classes are used to organize the implementation of the generic function. The next simplest, which is to fill the cache with all possible subclasses of specializers at all argument positions, well, just reading that should set alarm bells ringing – it might not be too problematic for generic functions with one argument, but the cross-product effect of multiple arguments will probably cause the implementation to collapse under its own weight into a virtual black hole. It might be that the best approach is to allow the user to specify an exact set of classes: and to allow the user to introspect a running system to find out which class combinations have actually been seen and hence cached in a given session.

In fact, SBCL itself depends on a particular, specific version of this. The generic function print-object is used for printing almost everything; even without user specialization, it is called whenever a structure-object or standard-object needs to be output to any stream. Users can write methods on it, though, and that then invalidates any previous precomputed dispatch information. But there are times when it's really important that print-object just work, without any kind of extra computation to go on: in particular, when the debugger needs to inform the user that the Lisp is running out of storage space (heap or stack), it's pretty important that that reporting can happen without using more heap or stack than necessary. So for print-object, we never reset the cache state to empty; it is always prefilled with entries for control-stack-exhausted, binding-stack-exhausted, alien-stack-exhausted, heap-exhausted-error and restart for its first argument. (This strategy only works if there are no specializations of the second argument to print-object anywhere in the system; there's no good reason for a user to try to specialize the second argument in any case, so we emit a warning if they do that and hope that they read and learn from it.)

extended specializer responses

I got a few responses to my call for use cases for generalized specializers; I'll try to summarize them in this post. Before I do, it's perhaps worth noting that the work on generalized specializers that Jim Newton and I did was written up at about the same time as Charlotte Herzeel, Jorge Vallejos, Theo D'Hondt and Pascal Costanza's work on filtered dispatch, and so our Custom Specializers in Object-Oriented Lisp paper doesn't refer to it. I'll talk more about filtered dispatch in a later post, when I tie all these threads together, but for those who aren't aware of it it's worth looking at, along with ContextL for dispatch that varies based on dynamic program state.

Meanwhile, some other ideas for extended specializers: James Anderson suggested duck typing, and I think working along similar lines Patrick Stein thought of specializers that could dispatch on the dimensions of array arguments (or length of sequence arguments). That feels to me as though there should be a fairly compelling use case, but the toy example of

(defmethod cross-product ((x (length 3)) (y (length 3)))
  ...)

doesn't generalize. I suppose there might be an example of selecting particular numerical algorithms based on overall matrix dimensions or properties, something like

(defmethod invert ((x (total-size> 1000000000)))
  ... invert big matrix ...)
(defmethod invert ((x (total-size<= 1000000000)))
  ... invert small matrix ...)

but it doesn't feel concrete yet.

On the other hand, just about everyone's first response to the question (Jan Moringen, and the crowded wisdom of #lisp IRC) was pattern specializers, usually through regular expressions or optima in particular; one example concrete use case given was to dispatch among handlers for urls. This is a very interesting case; I first considered this in the early days of extended specializers, and indeed mop-27.impure.lisp from the SBCL test suite captures one other use case, in a (toy) simplifier for arithmetic expressions, redolent of implementations of symbolic differentiation from a bygone age. At the time, I took that example no further, both because I didn't have a real use case, and also because I didn't fancy writing a full-strength pattern matching engine; now that optima exists, it's probably worth revisiting the example.

In particular, it's worth considering how to handle capturing of pattern variables. To continue with the toy simplifier, the existing dispatch and specializer implementation would allow one to write

(defmethod simplify1 ((p (pattern (+ x 0))))
  (cadr p))

but what would be really neat and convenient would be to be able to write

(defmethod simplify1 ((p (pattern (+ x 0))))
  x)

and in order to do that, we need to intercede in the generation of the actual method function in order to add bindings for pattern variables (which, helpfully, optima can tell us about).

The awkwardness in terms of the protocol – beyond the problems with metaprogramming at compile-time in the first place – is that make-method-lambda operates in ignorance of the specializers that will be applied to the method. In fact, this was the root cause of a recently-reported SBCL bug: SBCL itself needs to communicate the name and method lambda list from defmethod to make-method-lambda, and has no other way of doing that than special variables, which weren't being cleared to defend against nested calls to make-method-lambda through code-walking and macroexpansion.

But make-method-lambda is the function which generates the method body; it is exactly the function that we need to extend or override in order to do the desired automatic matching. So how can we expose this to the metaprogrammer? We can't change the signature of make-method-lambda; it might seem obvious to extend it by adding &optional arguments, but that changes the signature of the generic function and would break backwards compatibility, as methods must accept the same number of optional arguments as their generic function does; similarly, adding keyword arguments to the generic function doesn't work.

We could export and document the special variables that we ourselves use to propagate the information; in some ways that's cleanest, and has the virtue of at least having been tested in the wild. On the other hand, it feels unusual, at least in the context of the metaobject protocol; usually, everything of interest is an argument to a protocol function. So, we could define a new protocol function, and have make-method-lambda default to trampolining to it (say, make-method-lambda-with-specializer-specifiers; the basic idea, though not the horrible name, is due to Jan Moringen). But we'd need to be careful in doing that; there is some implementation-specific logic deep in PCL that performs extra optimizations to methods (the fast-method calling convention) if the system detects that only the standardized methods on make-method-lambda are applicable to a particular call; if we add any extra metaobject protocol functions in this area, we'll need to make sure that we correctly update any internal logic based on detecting the absence of metaprogrammer overrides or extensions...

more prototypes with multiple dispatch

I received a couple of points in feedback regarding protoypes with multiple dispatch. Firstly, from Pascal Costanza, the observation that there's already a prototype-based dispatch system present in the quasi-standard CL MOP; the key is to view all the prototype objects as singletons, and to use introspection to manage delegation and the class-prototype as the actual object. Thus (any mistakes are mine, not Pascal's):

(defun clone (o)
  (let* ((scs (sb-mop:class-direct-superclasses (class-of o)))
         (new (make-instance 'standard-class :direct-superclasses scs)))
    (sb-mop:finalize-inheritance new)
    (sb-mop:class-prototype new)))

(defun add-delegation (o d)
  (let ((scs (sb-mop:class-direct-superclasses (class-of o))))
    (reinitialize-instance (class-of o) 
                           :direct-superclasses (list* (class-of d) scs))))

I don't think that this works in quite the same way as a traditional prototype-based system, mind you: here, a cloned object inherits all the behaviours provided by delegation, but not the behaviours from the cloned object itself. It's pretty close, though, and might help those to whom all this is new (but who are intimately familiar with the CLOS MOP – hm, actually, the intersection of those two sets might be a little small) get a bit of a handle on this.

Secondly, Lee Salzman was kind enough to get in touch, and we had a conversation about his thesis and his reflections on that work. I'll paraphrase, so again any misrepresentation is mine, but regarding the question at the end of my previous post about the semantics of redefinition, Lee was quite clear that the expectation was that previously-cloned objects should retain their previous behaviour, even if the original object is later updated (in prototypes and multiple dispatch terms, even if the original method is redefined). In other words, the ??? is indeed expected by prototype-oriented programmers to be (FOO[1] /B/).

Lee also said that, in his subsequent experience, he has come to believe that class-based systems are a necessity – particularly in image-based systems (where there is a persistent world). He's happy with prototypes as a mechanism in a world with static programme descriptions, where you can just edit the source code, then compile and run the programme, but believes that directly modifying running programmes makes classes or some equivalent abstraction necessary. I'm not sure I completely understand this point (though I'm fairly convinced by now that I don't much like the semantics of method redefinition in a world with long-lived objects), but I'm still waiting for a good example to highlight the advantages of prototype-oriented programmes.

Meanwhile, what does this imply for the implementation of prototypes with multiple dispatch within the extended CLOS metaobject protocol? Defining a new method specialized on an must override any existing behaviour for that object, but must leave unchanged the previous behaviour for any clones; that therefore implies that the generic function cannot discard the old method, as it must remain applicable to any clones. The consequences of this include: two prototype specializers, specialized on the same object, must not be considered to be the same specializers (otherwise the standard semantics of method definition would replace the old method with the new); and that the specializers must be annotated with the method that they belong to (so that it is possible to compute for a given specializer and argument whether that argument is acceptable to the specializer, remembering that the acceptability of an object to a prototype specializer in this world is primarily a function of the object itself). One piece of good news: the CLOS MOP already provides a mechanism for associating specializers with methods, through the specializer-direct-methods generic function, so that's at least some of the battle.

All of this merits a fuller description, a reference implementation, and some example code using it. I'm working on the first two, aiming to top up my all-important publications list by presenting at some conferences (and also helping to justifying my attendance); I'd love to hear about ideas for the third...

prototypes with multiple dispatch

On New Year's Day (did I have nothing better to do?) I asked the wider world real life uses of non-standard method selection, and I hinted that I had an real-life example of my own. This post does not discuss that example.

Instead, I'm going to discuss my limited understanding of another non-standard (at least, non-CLOS-standard) method selection system, and demonstrate what its use looks like in my current development world. I'm talking about a prototype-based object system: specifically, a mostly-faithful reimplementation of the Prototypes with Multiple Dispatch [1,2] system found in Slate and described by Lee Salzman and Jonathan Aldrich. I'll try to introduce the semantics as I understand them to an audience used to class-based multiple dispatch, but I'm left with some questions that I don't know how to answer, not being used to programming in this way myself.

So, first, what's going on? Well, the proper answer might be to read the linked papers, which have a relatively formal specification of the semantics, but the high-level idea could perhaps be summarised as finding what happens when you try to support prototype-based object systems (where the normal way to instantiate an object is to copy another one; where objects implement some of their functionality by delegating to other objects; and where single-dispatch methods are stored in the object itself) and multiple dispatch systems (where methods do not have a privileged receiver but dispatch on all of their arguments) simultaneously. The authors found that they could implement some reasonable semantics, and perform dispatch reasonably efficiently, by storing some dispatch metainformation within the objects themselves. The example code, involving the interactions between fish, healthy sharks and dying sharks, can be translated into my extended-specializer CL as:

(defpvar /root/ (make-instance 'prototype-object :delegations nil))
(defpvar /animal/ (clone /root/))
(defpvar /fish/ (clone /root/))
(defpvar /shark/ (clone /root/))
(defpvar /healthy-shark/ (clone /root/))
(defpvar /dying-shark/ (clone /root/))
(add-delegation /fish/ /animal/)
(add-delegation /shark/ /animal/)
(add-delegation /shark/ /healthy-shark/)
(defgeneric encounter (x y)
  (:generic-function-class prototype-generic-function))
(defmethod encounter ((x /fish/) (y /healthy-shark/))
  (format t "~&~A swims away~%" x))
(defmethod encounter ((x /fish/) (y /animal/))
  x)
(defgeneric fight (x y)
  (:generic-function-class prototype-generic-function))
(defmethod fight ((x /healthy-shark/) (y /shark/))
  (remove-delegation x)
  (add-delegation x /dying-shark/)
  x)
(defmethod encounter ((x /healthy-shark/) (y /fish/))
  (format t "~&~A swallows ~A~%" x y))
(defmethod encounter ((x /healthy-shark/) (y /shark/))
  (format t "~&~A fights ~A~%" x y)
  (fight x y))

(compare figures 4 and 7 of [1]; defpvar is secretly just defvar with some extra debugging information so I don't go crazy trying to understand what a particular #<PROTOTYPE-OBJECT ...> actually is.)

Running some of the above code with

(encounter (clone /shark/) (clone /shark/))

prints

#<PROTOTYPE-OBJECT [/HEALTHY-SHARK/, /ANIMAL/] {10079A8713}> fights
#<PROTOTYPE-OBJECT [/HEALTHY-SHARK/, /ANIMAL/] {10079A8903}>

and returns

#<PROTOTYPE-OBJECT [/DYING-SHARK/, /ANIMAL/] {10079A8713}>

(and I'm confident that that's what is meant to happen, though I don't really understand why in this model sharks aren't fish).

The first question I have, then, is another lazyweb question: are there larger programs written in this style that demonstrate the advantages of prototypes with multiple dispatch (specifically over classes with multiple dispatch; i.e. over regular CLOS). I know of Sheeple, another lisp implementation of prototype dispatch, probably different in subtle or not-so-subtle ways from this one; what I'm after, though, probably doesn't exist: if there's no easy way of using prototype dispatch in Lisp, it won't be used to solve problems, and some other way will be used instead (let's call that the computer programmer's weak version of the Sapir-Whorf hypothesis). What's the canonical example of a problem where prototype-based object systems shine?

The second question I have is more technical, and more directly related to the expected semantics. In particular, I don't know what would be expected in the presence of method redefinition, or even if method redefinition is a concept that can make sense in this world. Consider

(defpvar /a/ (clone /root/))
(defgeneric foo (x)
  (:generic-function-class prototype-generic-function))
(defmethod foo ((x /a/)) `(foo[1] ,x))
(defpvar /b/ (clone /a/))
(foo /a/) ; => (FOO[1] /A/)
(foo /b/) ; => (FOO[1] /B/)
(defmethod foo ((x /a/)) `(foo[2] ,x))
(foo /a/) ; => (FOO[2] /A/)
(foo /b/) ; => ???

What should that last form return? Arguments from the prototype-oriented world would, I suspect, lead to the desired return value being (FOO[1] /B/), as the redefinition of the method on foo specialized to /a/ is completely irrelevant to /b/. This is my reading of the semantics described in [1], for what it's worth. Arguments from the world of generic functions and expected behaviour would probably argue for (FOO[2] /B/), because redefining a method is an action on a generic function, not an action on a set of application objects. And the argument of my implementation in practice (at present, subject to change) is to signal no-applicable-method, because method redefinition is the successive removal of the old method and addition of the new one, and removal of the old method of the generic function affects dispatch on all the objects, whereas adding the new one affects dispatch on just /a/.

slime has moved to github

This is probably not news to anyone reading this, but: SLIME has moved its primary source repository to github.

One of my projects, swankr (essentially, a SLIME for R, reusing most of the existing communication protocol and implementing a SWANK backend in R, hence the name), obviously depends on SLIME: if nothing else, the existing instructions for getting swankr up and running included a suggestion to get the SLIME sources from common-lisp.net CVS, which as Zach says is a problem that it's nice no longer to have. (Incidentally, Zach is running a surveydirect link – to assess possible effects of changes on the SLIME userbase.)

So, time to update the instructions, and in the process also

  • hoover up any obvious patches from my inbox;
  • improve the startup configuration so that the user need do no explicit configuration beyond a suitable entry in slime-lisp-implementations;
  • update the BUGS.org and TODO.org files for the current reality.

That's all safely checked in and pushed to the many various locations which people might think of as the primary swankr repository. But those org-mode files are exported to HTML to produce the minimal swankr web presence, and the script that does that was last run with org-mode version 7.x, while now, in the future, org-mode is up to version 8 (and this time the change in major version number is definitely indicative of non-backwards-compatible API changes).

This has already bitten me. The principles of reproducible research are good, and org-mode offers many of the features that help: easily-edited source format, integration with a polyglossia of programming languages, high-quality PDF output (and adequate-quality HTML output, from the same sources) – I've written some technical reports in this style, and been generally happy with it. But, with the changes from org-mode 7 to org-mode 8, in order to reproduce my existing documents I needed to make not only (minor) source document changes, but also (quite major) changes to the bits of elisp to generate exported documents in house style. Reproducible research is difficult; I suppose it's obvious that exact reproduction depends on exact software versions, platform configurations and so on – see this paper, for example – but how many authors of supposedly reproducible research are actually listing all the programs, up to exact versions of dynamically-linked libraries they link to, used in the preparation and treatment of the data and document?

In any case, the changes needed for swankr are pretty minor, though I was briefly confused when I tried org-html-export-as-html first (and no output was produced, because that's the new name for exporting to an HTML buffer rather than an HTML file. The swankr website should now reflect the current state of affairs.

the dangers of writing less

A year or so ago, I wrote a couple of blog entries on precompiling discriminating functions for generic functions, both given mostly class-based dispatch and also when most of the methods were specialized on individual objects (i.e. methods with specializers of class eql-specializer).

I signed off the second entry with

Next up, unless I've made further oversights which need correction: automating this process, and some incidental thoughts.

and of course never got back to this topic. Now it's over a year later, and I can no longer remember either my incidental thoughts, which I'm sure were fascinating, nor my strategy for automating this (because this was an attempt to address an actual user need, in – if I remember correctly – a CL implementation of protobuf). Suggestions for what I might have been thinking at the time gratefully received.

Write more: you know it makes sense.

more efficient hyperlinked blogging

How meta. To maintain my progress on my new year's resolution, I have written some code (*gasp!*) – yes, that counts as writing. And what does that code do? Why, it makes me more efficient at highly hyperlinked blogging: it is a certain amount of fairly trivial and mildly tedious elisp, which allows the easy insertion of markdown markup to create links to various authoritative sources of information about Lisp. Well, OK, to the Hyperspec and the MOP dictionary, but as an implementor that's all I really need, right? So, now I can talk about compute-effective-method or make-method-lambda and my eager readers can be taken to the relevant documentation at the speed of thought.

Questions that arose during the process:

  • why are all the fake packages in hyperspec.el created with (make-vector 67 0)?
  • has anyone in the 23 years since AMOP was published ever been glad that the MOP standardizes the extract-lambda-list and extract-specializer-names functions? (Fun fact: SBCL also has extract-parameters and extract-required-parameters functions, unexported and unused.)
bugs all the way down

There are times when being in control of the whole software stack is a mixed blessing.

While doing investigations related to my previous post, I found myself wondering what the arguments and return values of make-method-lambda were in practice, in SBCL. So I did what any self-respecting Lisp programmer would do, and instead of following that link and decoding the description, I simply ran (trace sb-mop:make-method-lambda), and then ran my defmethod as normal. I was half-expecting it to break instantly, because the implementation of trace encapsulates named functions in a way that changes the class of the function object (essentially, it wraps the existing function in a new anonymous function; fine for ordinary functions, not so good for generic-function objects), and I was half-right: an odd error occurred, but after trace printed the information I wanted.

What was the odd error? Well, after successfully calling and returning from make-method-lambda, I got a no-applicable-method error while trying to compute the applicable methods for... make-method-lambda. Wait, what?

SBCL's CLOS has various optimizations in it; some of them have been documented in the SBCL Internals Manual, such as the clever things done to make slot-value fast, and specialized discriminating functions. There are plenty more that are more opaque to the modern user, one of which is the “fast method call” optimization. In that optimization, the normal calling convention for methods within method combination, which involves calling the method's method-function with two arguments – a list of the arguments passed to the generic function, and a list of next methods – is bypassed, with the fast-method-function instead being supplied with a permutation vector (for fast slot access) and next method call (for fast call-next-method) as the first two arguments and the generic function's original arguments as the remainder, unrolled.

In order for this optimization to be valid, the call-method calling convention must be the standard one – if the user is extending or overriding the method invocation protocol, all the optimizations based on assuming that the method invocation protocol might be invalid. We have to be conservative, so we need to turn this optimization off if we can't prove that it's valid – and the only case where we can prove that it's valid is if only the system-provided method on make-method-lambda has been called. But we can't communicate that after the fact; although make-method-lambda returns initargs as well as the lambda, an extending method could arbitrarily mess with the lambda while returning the initargs the system-provided method returns. So in order to find out whether the optimization is safe, we have to check whether exactly our system-provided method on make-method-lambda was the applicable one, so there's an explicit call to compute-applicable-methods of make-method-lambda after the method object has been created. And make-method-lambda being traced and hence not a generic-function any more, it's normal that there's an error. Hooray! Now we understand what is going on.

As for how to fix it, well, how about adding an encapsulations slot to generic-function objects, and handling the encapsulations in sb-mop:compute-discriminating-function? The encapsulation implementation as it currently stands is fairly horrible, abusing as it does special variables and chains of closures; there's a fair chance that encapsulating generic functions in this way will turn out a bit less horrible. So, modify sb-debug::encapsulate, C-c C-c, and package locks strike. In theory we are meant to be able to unlock and continue; in practice, that seems to be true for some package locks but not others. Specifically, the package lock from setting the fdefinition from a non-approved package gives a continuable error, but the ones from compiling special declarations of locked symbols have already taken effect and converted themselves to run-time errors. Curses. So, (mapcar #'unlock-package (list-all-packages)) and try again; then, it all goes well until adding the slot to the generic-function class (and I note in passing that many of the attributes that CL specifies are generic-function SBCL only gives to standard-generic-function objects), at which point my SLIME repl tells me that something has gone wrong, but not what, because no generic function works any more, including print-object. (This happens depressingly often while working on CLOS).

That means it's time for an SBCL rebuild, which is fine because it gives me time to write up this blog entry up to this point. Great, that finishes, and now we go onwards: implementing the functionality we need in compute-discriminating-function is a bit horrible, but this is only a proof-of-concept so we wrap it all up in a labels and stop worrying about 80-column conventions. Then we hit C-c C-c and belatedly remember that redefining methods involves removing them from their generic function and adding them again, and doing that to compute-discriminating-function is likely to have bad consequences. Sure enough:

There is no applicable method for the generic function 
  #<STANDARD-GENERIC-FUNCTION COMPUTE-DISCRIMINATING-FUNCTION (1)>
when called with arguments
  (#<STANDARD-GENERIC-FUNCTION NO-APPLICABLE-METHOD (1)>).

Yes, well. One (shorter) rebuild of just CLOS later, and then a few more edit/build/test cycles, and we can trace generic functions without changing the identity of the fdefinition. Hooray! Wait, what was I intending to do with my evening?

seeking real life uses for generalized specializers

Some time ago (call it half a decade or so), Jim Newton of Cadence and I did some work on extensible specializers: essentially coming up with a proof-of-concept protocol to allow users to define their own specializers with their own applicability and ordering semantics. That's a little bit vague; the concrete example we used in the writeup was a code walker which could warn about the use of unbound variables (and the non-use of bindings), and which implemented its handling of special forms with code of the form:

(defmethod walk ((expr (cons (eql 'quote))) env call-stack)
  nil)
(defmethod walk ((var symbol) env call-stack)
  (let ((binding (find-binding env var)))
    (if binding
        (setf (used binding) t)
        (format t "~&unbound: ~A: ~A~%" var call-stack))))
(defmethod walk ((form (cons (eql 'lambda))) env call-stack)
  (destructuring-bind (lambda lambda-list &rest body) form
    (let* ((bindings (derive-bindings-from-ll lambda-list))
           (env* (make-env bindings env)))
      (dolist (form body)
        (walk form env* (cons form call-stack)))
      (dolist (binding bindings)
        (unless (used (cdr binding))
          (format t "~&unused: ~A: ~A~%" (car binding) call-stack))))))

The idea here is that it's possible to implement support in the walker for extra special forms in a modular way; while this doesn't matter very much in Common Lisp (which, famously, is not dead, just smells funny), in other languages which have made other tradeoffs in the volatility/extensibility space. And when I say “very much” I mean it: even SBCL allows extra special forms to be loaded at runtime; the sb-cltl2 module includes an implementation of compiler-let, which requires its own special handling in the codewalker which is used in the implementation of CLOS.

So modularity and extensibility is required in a code walker, even in Common Lisp implementations; in Cadence Skill++ it might even be generally useful (I don't know). In SBCL, the extensibility is provided using an explicit definer form; sb-cltl2 does

(defun walk-compiler-let (form context env)
  #1=#<implementation elided>)
(sb-walker::define-walker-template compiler-let walk-compiler-let)

and that's not substantially different from

(defmethod sb-walker:walk ((form (cons (eql 'compiler-let))) context env)
  #1#)

So far, so unexpected, for Lisp language extensions at least: of course the obvious test for a language extension is how many lines of code it can save when implementing another language extension. Where this might become interesting (and this, dear lazyweb, is where you come in) is if this kind of extension is relevant in application domains. Ideally, then, I'm looking for real-life examples of patterns of selecting ‘methods’ (they don't have to be expressed as Lisp methods, just distinct functions) based on attributes of objects, not just the objects' classes. The walker above satisfies these criteria: the objects under consideration are all of type symbol or cons, but the dispatch partly happens based on the car of the cons – but are there examples with less of the meta nature about them?

(I do have at least one example, which I will keep to myself for a little while: I will return to this in a future post, but for now I am interested in whether there's a variety of such things, and whether the generalization of specializer metaobjects is capable of handling cases I haven't thought of yet. Bonus points if the application requires multiple dispatch and/or non-standard method combination.)

some sbcl wiki entries

One of the potential advantages of this new wiki/blog setup is that I should be able to refer easily to my notes in my blog. It's true that the wiki-like content will change, whereas a mostly-static blog entry will naturally refer to the content of the wiki as it is at the point of blogging; in theory I should be able to find the content at the time that the link was made through a sufficiently advanced use of the version control system backing the ikiwiki installation, and in practice the readers of the blog (if any) are likely to follow (or not) any links within a short time-window of the initial publication of any given entry. So that shouldn't prove too confusing.

In order to test this, and to provide some slightly-relevant content for anyone still reading this on planet lisp: I've transferred my non-actionable notes from my sbcl org-mode file to an sbcl wiki section. This new setup gets heaps of bonus points if someone out there implements the development projects listed there as a result of reading this post, but even if not, it's a useful decluttering...