#+TITLE: Generalizers: New Metaobjects for Generalized Dispatch
-#+AUTHOR: Christophe Rhodes
+#+AUTHOR: Christophe Rhodes, Jan Moringen, David Lichteblau
+#+OPTIONS: toc:nil
+#+LaTeX_HEADER: \usepackage[margin=1in]{geometry}
-#+BEGIN_ABSTRACT
-Foo
-#+END_ABSTRACT
+#+begin_abstract
+1. This paper introduces a new metaobject, the generalizer, which
+ complements the existing specializer metaobject.
+2. With the help of examples, we show that this metaobject allows for
+ the efficient implementation of complex non-class-based dispatch
+ within the framework of existing metaobject protocols
+3. We present the generalizer protocol, implemented within the SBCL
+ implementation of Common Lisp
+4. In combination with previous work, this produces a fully-functional
+ extension of the existing mechanism for method selection and
+ effective method computation, including support for standard and
+ user-defined method combination independent from method selection.
+#+end_abstract
+
+* Introduction
+ The revisions to the original Common Lisp language \cite{CLtL1}
+ included the detailed specification of an object system, known as
+ the Common Lisp Object System (CLOS), which was eventually
+ standardized as part of the ANSI Common Lisp standard \cite{CLtS}.
+ The object system as presented to the standardization committee was
+ formed of three parts, the first two of which covered XXX [what?]
+ and were incorporated into the final standard, and the third,
+ covering a Metaobject Protocol (MOP) for CLOS, was not.
+
+ Nevertheless, the CLOS MOP has proven to be a robust design, and
+ while many implementations have derived their implementations of
+ CLOS from either the Closette illustrative implementation in
+ \cite{AMOP}, or the Portable Common Loops implementation of CLOS
+ from Xerox Parc, there have been from-scratch reimplementations of
+ CLOS (in at least CLISP; check for others -- ABCL? Lisp500?!)
+ incorporating the majority of the Metaobject Protocol as described.
+
+ Although it has stood the test of time, the MOP is neither without
+ issues (e.g. M-M-L considered harmful; slot-definition initargs
+ issue) nor a complete framework for the metaprogrammer to implement
+ all conceivable variations of object-oriented behaviour; indeed,
+ while metaprogramming offers some possibilities for customization of
+ the object system behaviour, those possibilities cannot extend
+ arbitrarily in all directions. There is still an expectation that
+ functionality is implemented with methods on generic functions,
+ acting on objects with slots. [XXX find Paepke picture here? Not
+ Paepke; AMOP?]. XXX include typical examples of MOP: object
+ persistence; maybe ref. Kizcales "MOPs: why we want them and what
+ else they can do"? (Fig. 2 in that is good) ORMs; sparse slots.
+ jmoringe:
+ + introspection, e.g. documentation generation
+ + programmatic construction of classes and generic functions
+ e.g. for IDL compilers, model transformations
+
+ One area of functionality where there is scope for customization by
+ the metaprogrammer is in the mechanics and semantics of method
+ applicability and dispatch. While in principle AMOP allows
+ customization of dispatch in various different ways (the
+ metaprogrammer can define methods on protocol functions such as
+ =compute-applicable-methods=,
+ =compute-applicable-methods-using-classes=), for example, in
+ practice implementation support for this was weak until relatively
+ recently (ref. closer, also check how ContextL and filtered dispatch
+ are implemented).
+ jmoringe: filtered dispatch uses a custom method combination, i
+ think
+
+ Another potential mechanism for customizing dispatch is implicit in
+ the class structure defined by AMOP: standard specializer objects
+ (instances of =class= and =eql-specializer=) are generalized
+ instances of the =specializer= protocol class, and in principle
+ there are no restrictions on the metaprogrammer constructing
+ additional subclasses. Previous work [Newton/Rhodes] has explored
+ the potential for customizing generic function dispatch using
+ extended specializers, but as of that work the metaprogrammer must
+ override the entirety of the generic function invocation protocol
+ (from =compute-discriminating-function= on down), leading to toy
+ implementations and duplicated effort.
+
+ This paper introduces a protocol for efficient and controlled
+ handling of arbitrary subclasses of =specializer=. In particular,
+ it introduces the =generalizer= protocol class, which generalizes
+ (ahem) the return value of =class-of=, and allows the metaprogrammer
+ to hook into cacheing schemes to avoid needless recomputation of
+ effective methods for sufficiently similar generic function
+ arguments (See Figure\nbsp\ref{fig:dispatch}).
+
+ #+CAPTION: Dispatch Comparison
+ #+LABEL: fig:dispatch
+ #+ATTR_LATEX: width=0.9\linewidth float
+ [[file:figures/dispatch-comparison.pdf]]
+
+ The remaining sections in this paper can be read in any order. We
+ give some motivating examples in section XX, including
+ reimplementations of examples from previous work, as well as
+ examples which are poorly supported by previous protocols. We
+ describe the protocol itself in section YY, describing each protocol
+ function in detail and, where applicable, relating it to existing
+ protocol functions within the CLOS MOP. We survey related work in
+ more detail in section ZZ, touching on work on customized dispatch
+ schemes in other environments. Finally, we draw our conclusions
+ from this work, and indicate directions for further development, in
+ section WW; reading that section before the others indicates
+ substantial trust in the authors' work.
+* Examples
+ In this section, we present a number of examples of dispatch
+ implemented using our protocol, which we describe in section YY.
+ For reasons of space, the metaprogram code examples in this section
+ do not include some of the necessary support code to run; complete
+ implementations of each of these cases are included in an appendix /
+ in the accompanying repository snapshot / at this location.
+
+ A note on terminology: we will attempt to distinguish between the
+ user of an individual case of generalized dispatch (the
+ “programmer”), the implementor of a particular case of generalized
+ dispatch (the “metaprogrammer”), and the authors as the designers
+ and implementors of our generalized dispatch protocol (the
+ “metametaprogammer”, or more likely ”we”).
+
+ - [ ] =cons-specializer= (can be done using filtered dispatch)
+ - [ ] factorial (like filtered dispatch)
+ - [ ] HTTP Accept header
+ - [ ] xpattern
+ - [ ] prototype/multimethod
+** car-of-cons
+ We start by presenting our original use case, performing
+ dispatching on the first element of lists. Semantically, we allow
+ the programmer to specialize any argument of methods with a new
+ kind of specializer, =cons-specializer=, which is applicable if and
+ only if the corresponding object is a =cons= whose =car= is =eql=
+ to the symbol associated with the =cons-specializer=; these
+ specializers are more specific than the =cons= class, but less
+ specific than an =eql-specializer= on any given =cons=.
+
+ One motivation for the use of this specializer is in an extensible
+ code walker: a new special form can be handled simply by writing an
+ additional method on the walking generic function, seamlessly
+ interoperating with all existing methods.
+
+ The programmer code using these specializers is unchanged from
+ \cite{Newton.Rhodes.2008}; the benefits of the protocol described
+ here are centered on performance: in an application such as walking
+ source code, we would expect to encounter special forms
+ (distinguished by particular atoms in the =car= position) multiple
+ times, and hence to dispatch to the same effective method
+ repeatedly.
+#+begin_src lisp
+(defclass cons-specializer (specializer)
+ ((%car :reader %car :initarg :car)))
+(defclass cons-generalizer (generalizer)
+ ((%car :reader %car :initarg :car)))
+(defmethod generalizer-of-using-class ((gf cons-generic-function) arg)
+ (typecase arg
+ ((cons symbol) (make-instance 'cons-generalizer :car (car arg)))
+ (t (call-next-method))))
+(defmethod generalizer-equal-hash-key ((gf cons-generic-function)
+ (g cons-generalizer))
+ (%car g))
+(defmethod specializer-accepts-generalizer-p ((gf cons-generic-function)
+ (s cons-specializer)
+ (g cons-generalizer))
+ (if (eql (%car s) (%car g))
+ (values t t)
+ (values nil t)))
+(defmethod specializer-accepts-p ((s cons-specializer) o)
+ (and (consp o) (eql (car o) (%car s))))
+
+#| less interesting methods elided: jmoringe: (un)parsing, specializer<?, more? |#
+#+end_src
+#+begin_src
+(defgeneric walk (form env vars)
+ (:generic-function-class cons-generic-function))
+(defmethod walk ((expr (cons lambda)) env call-stack)
+ (let ((lambda-list (cadr expr))
+ (body (cddr expr)))
+ (with-checked-bindings ((bindings-from-ll lambda-list) env call-stack)
+ (dolist (form body)
+ (walk form env (cons form call-stack))))))
+(defmethod walk ((expr (cons let)) env call-stack)
+ (with-checked-bindings ((mapcar (lambda (x) (walk (cadr x) env (cons (cadr x) call-stack)) (cons (car x) (make-instance 'binding))) (cadr expr)) env call-stack)
+ (dolist (form (cddr expr))
+ (walk form env (cons form call-stack)))))
+#+end_src
+
+ | implementation | time (ms / 100k calls) | overhead |
+ |-----------------------+------------------------+----------|
+ | cons-gf/no-cache | 9000 | +2700% |
+ | cons-gf | 1500 | +370% |
+ | cons-gf/one-arg-cache | 740 | +130% |
+ | gf/methods | 360 | +14% |
+ | function | 317 | |
+
+ Note that in this example there is no strict need for
+ =cons-specializer= and =cons-generalizer= to be distinct classes –
+ just as in the normal protocol involving
+ =compute-applicable-methods= and
+ =compute-applicable-methods-using-classes=, the specializer object
+ for mediating dispatch contains the same information as the object
+ representing the equivalence class of objects to which that
+ specializer is applicable: here it is the =car= of the =cons=
+ object; in the standard dispatch it is the =class= of the object.
+ This feature also characterizes those use cases where the
+ metaprogrammer could straightforwardly use filtered dispatch
+ \cite{Costanza.etal:2008} to implement their dispatch semantics.
+ We will see in section XX.x an example of a case where filtered
+ dispatch is incapable of efficiently implementing the dispatch, but
+ first we present our implementation of the motivating case from
+ \cite{Costanza.etal:2008}.
+** signum
+ Our second example of the implementation and use of generalized
+ specializers is a reimplementation of one of the examples in
+ \cite{Costanza.etal:2008}: specifically, the factorial function.
+ Here, we will perform dispatch based on the =signum= of the
+ argument, and again, at most one method with a =signum= specializer
+ will be appliable to any given argument, which makes the structure
+ of the specializer implementation very similar to the =cons=
+ specializers in the previous section.
+
+ We have chosen to compare signum values using \texttt{=}, which
+ means that a method with specializer =(signum 1)= will be
+ applicable to positive floating-point arguments (see the first
+ method on =specializer-accepts-generalizer-p= and the method on
+ =specializer=accepts-p= below). This leads to one subtle
+ difference in behaviour compared to that of the =cons=
+ specializers: in the case of =signum= specializers, the /next/
+ method after any =signum= specializer can be different, depending
+ on the class of the argument. This aspect of the dispatch is
+ handled by the second method on =specializer-accepts-generalizer-p=
+ below.
+#+begin_src lisp
+(defclass signum-specializer (specializer)
+ ((%signum :reader %signum :initarg :signum)))
+(defclass signum-generalizer (generalizer)
+ ((%signum :reader %signum :initarg :signum)))
+(defmethod generalizer-of-using-class ((gf signum-generic-function) arg)
+ (typecase arg
+ (real (make-instance 'signum-generalizer :signum (signum arg)))
+ (t (call-next-method))))
+(defmethod generalizer-equal-hash-key ((gf signum-generic-function)
+ (g signum-specializer))
+ (%signum g)) ; this will create multiple entries for the same emf, but that's OK
+(defmethod specializer-accepts-generalizer-p ((gf signum-generic-function)
+ (s signum-specializer)
+ (g signum-generalizer))
+ (if (= (%signum s) (%signum g)) ; or EQL?
+ (values t t)
+ (values nil t)))
+
+;; this method is perhaps interesting enough to talk about?
+(defmethod specializer-accepts-generalizer-p ((gf signum-generic-function) (specializer sb-mop:specializer) (thing signum-specializer))
+ (specializer-accepts-generalizer-p gf specializer (class-of (%signum thing))))
+
+
+(defmethod specializer-accepts-p ((s signum-specializer) o)
+ (and (realp o) (= (%signum s) (signum o))))
+
+#| again elide more boring methods |#
+#+end_src
+
+ Given these definitions, and some more straightforward ones elided
+ for reasons of space, we can implement the factorial function as
+ follows:
+
+#+begin_src lisp
+(defgeneric fact (n)
+ (:generic-function-class signum-generic-function))
+(defmethod fact ((n (signum 0))) 1)
+(defmethod fact ((n (signum 1))) (* n (fact (1- n))))
+#+end_src
+
+ We do not need to include a method on =(signum -1)=, as the
+ standard =no-applicable-method= protocol will automatically apply to
+ negative real or non-real arguments.
+
+ Benchmarketing: we chose to benchmark 20! because that is the
+ largest factorial whose answer fits in SBCL's 63-bit fixnums, so as
+ to attempt to measure the maximum effect of dispatch (unobscured by
+ allocation / gc issues)
+
+#+begin_src lisp
+(progn (gc :full t) (time (dotimes (i 10000) (%fact 20))))
+#+end_src
+
+ | implementation | time (ms/10k calls) | overhead |
+ |-------------------------+---------------------+----------|
+ | signum-gf/no-cache | 2400 | +41000% |
+ | signum-gf | 230 | +3800% |
+ | signum-gf/one-arg-cache | 75 | +1100% |
+ | gf/fixnum | 12 | +100% |
+ | function | 6.0 | |
+
+ We could allow the metaprogrammer to improve on the one-argument
+ performance by constructing a specialized cache: for =signum=
+ arguments of =rational= arguments, the logical cache structure is
+ to index a three-element vector with =(1+ signum)=. The current
+ protocol does not provide a way of eliding the two generic function
+ calls for the generic cache; we discuss possible approaches in
+ section WW.
+** HTTP Accept header
+ In this section, we implement a non-trivial form of dispatch. The
+ application in question is a web server, and specifically to allow
+ the programmer to support RFC 2616 \cite{rfc2616} content
+ negotiation, of particular interest to publishers and consumers of
+ REST-style Web APIs.
+
+ The basic mechanism in content negotiation is as follows: the web
+ client sends an HTTP request with an =Accept:= header, which is a
+ string describing the media types it is willing to receive as a
+ response to the request, along with numerical preferences. The web
+ server compares these stated client preferences with the resources
+ it has available to satisfy this request, and sends the best
+ matching resource in its response.
+
+ In the case where there are static files on the filesystem, and the
+ web server must merely select between them, there is not much more
+ to say. However, it is not unusual for a web service to be backed
+ by some other form of data, and responses computed and sent on the
+ fly, and in these circumstances the web server must compute which
+ of its known output formats it can use to satisfy the request
+ before actually generating the best matching response.
+
+ The =accept-specializer= below implements the dispatch. It depends
+ on a lazily-computed =tree= slot to represent the information in
+ the accept header (generated by =parse-accept-string=), and a
+ function =q= to compute the (defaulted) preference level for a
+ given content-type and =tree=; then, method selection and ordering
+ involves finding the =q= for each =accept-specializer='s content
+ type given the =tree=, and sorting them according to the preference
+ level.
+
+#+begin_src lisp
+(defclass accept-specializer (extended-specializer)
+ ((media-type :initarg :media-type :reader media-type)))
+(defclass accept-generalizer ()
+ ((header :initarg :header :reader header)
+ (tree)
+ (next :initarg :next :reader next)))
+(defmethod generalizer-equal-hash-key
+ ((gf accept-generic-function) (g accept-generalizer))
+ `(accept-generalizer ,(header g)))
+(defmethod specializer-accepts-generalizer-p ((gf accept-generic-function) (s acc
+ept-specializer) (generalizer accept-generalizer))
+ (values (q (media-type s) (tree generalizer)) t))
+(defmethod specializer-accepts-generalizer-p ((gf accept-generic-function) (s sb-
+mop:specializer) (generalizer accept-generalizer))
+ (specializer-accepts-generalizer-p gf s (next generalizer)))
+
+(defmethod specializer< ((gf accept-generic-function) (s1 accept-specializer) (s2
+ accept-specializer) generalizer)
+ (cond
+ ((string= (media-type s1) (media-type s2)) '=)
+ (t (let ((q1 (q (media-type s1) (tree generalizer)))
+ (q2 (q (media-type s2) (tree generalizer))))
+ (cond
+ ((= q1 q2) '=)
+ ((< q1 q2) '>)
+ (t '<))))))
+#+end_src
+
+#+begin_src
+(defmethod generalizer-of-using-class ((gf accept-generic-function) (arg tbnl:request))
+ (make-instance 'accept-generalizer
+ :header (tbnl:header-in :accept arg)
+ :next (class-of arg)))
+(defmethod specializer-accepts-p ((specializer accept-specializer) (obj tbnl:requ
+est))
+ (q (media-type specializer) (parse-accept-string (tbnl:header-in :accept obj)))
+)
+#+end_src
+
+ This dispatch can't be done with filtered dispatch, except by
+ generating anonymous classes with all the right mime-types as
+ direct superclasses in dispatch order; the filter would generate
+#+begin_src lisp
+(ensure-class nil :direct-superclasses '(text/html image/webp ...))
+#+end_src
+ and dispatch the operates using those anonymous classes.
+
+ While this is possible to do, it is awkward to express content-type
+ negotiation in this way, as it means that the dispatcher must know
+ about the universe of mime-types that clients might declare that
+ they accept, rather than merely the set of mime-types that a
+ particular generic function is capable of serving; handling
+ wildcards in accept strings is particularly awkward in the
+ filtering paradigm.
+
+ Note that in this example, the method on =specializer<= involves a
+ nontrivial ordering of methods based on the =q= values specified in
+ the accept header (whereas in sections XX.1 and XX.2 only a single
+ extended specializer could be applicable to any given argument).
+
+ Also note that the accept specializer protocol is straightforwardly
+ extensible to other suitable objects; for example, one simple
+ debugging aid is to define that an =accept-specializer= should be
+ applicable to =string= objects. This can be done in a modular
+ fashion (see source example NN), and generalizes to dealing with
+ multiple web server libraries, so that content-negotiation methods
+ are applicable to each web server's request objects.
+
+#+begin_src lisp
+(defmethod generalizer-of-using-class ((gf accept-generic-function) (s string))
+ (make-instance 'accept-generalizer
+ :header s
+ :next (class-of s)))
+(defmethod specializer-accepts-p ((s accept-specializer) (string string))
+ (q (media-type s) (parse-accept-string string)))
+#+end_src
+ jmoringe: the name =accept-specializer=, while sensible, may
+ confusing in this context because "accept" occurs as part of the
+ protocol with a different semantic.
+** Pattern / xpattern / regex / optima
+ Here's the /really/ interesting bit, but on the other hand we're
+ probably going to run out of space, and the full description of
+ these is going to take us into =make-method-lambda= territory.
+ A second paper? Future work?
+* Protocol
+** Generalizer
+ - [ ] =generalizer-of-using-class= (NB class of gf not class of object)
+ - [ ] =compute-applicable-methods-using-generalizers=
+ - [ ] =generalizer-equal-hash-key=
+ - [ ] =specializer-accepts-generalizer-p=
+ - [ ] =specializer-accepts-p=
+ - [ ] =specializer<=
+ jmoringe: If I remember correctly, closette has
+ =method-more-specific-p= should we aim for parity with that and
+ use =specializer-more-specific-p=? The downside would be that
+ =-p= indicates a Boolean return value which is not the case here.
+** Full protocol
+ Description and specification left for reasons of space (we'll see?)
+ - [ ] =same-specializer-p=
+ - [ ] =parse/unparse-specializer-using-class=
+ - [ ] =make-method-specializers-form=
+ - [ ] jmoringe: In an email, I suggested
+ =make-specializer-form-using-class=:
+
+ #+begin_quote
+ Could we change =make-method-specializers-form='s default
+ behaviour to call a new generic function
+ #+begin_src
+ make-specializer-form-using-class gf method name env
+ #+end_src
+ with builtin methods on =sb-mop:specializer=, =symbol=, =cons= (for
+ eql-specializers)? This would make it unnecessary to repeat
+ boilerplate along the lines of
+ #+begin_src lisp
+ (flet ((make-parse-form (name)
+ (if <name-is-interesting>
+ <handle-interesting-specializer>
+ <repeat-handling-of-standard-specializers>)))
+ `(list ,@(mapcar #'make-parse-form specializer-names)))
+ #+end_src
+ for each generic function class.
+ #+end_quote
+ - [ ] =make-method-lambda= revision (use environment arg?)
+
+ jmoringe: would only be relevant for pattern dispatch, right? I
+ think, we didn't finish the discussion regarding special
+ variables vs. environment vs. new protocol function
+* Related Work
+ - [ ] Newton/Rhodes, obv
+ - [ ] filtered dispatch -- the point is that our work continues to
+ be useful in cases where there are unbounded numbers of
+ equivalence classes but each given invokation involves a small
+ number of methods.
+ - [ ] ContextL / context-oriented programming -- dispatch occurs on
+ hidden layer argument being an instance of an anonymous class with
+ suitably arranged superclasses -- OK because set of layers is
+ bounded and under programmer control
+ - [ ] http://soft.vub.ac.be/Publications/2010/vub-tr-soft-10-04.pdf
+ - [ ] http://soft.vub.ac.be/lambic/files/lambic-ilc09.pdf
+ - [ ] http://soft.vub.ac.be/Publications/2011/vub-soft-phd-11-03.pdf
+ - [ ] Prototypes with Multiple Dispatch
+ http://sauerbraten.org/lee/ecoop.pdf -- extension of Self-style
+ object system to handle multiple equally-privileged "receivers".
+ A good test case for our protocol; handled adequately with
+ generalizer being the tuple of (roles,delegations), with some
+ thought needed for method redefinitions but otherwise working
+ fine.
+ - [ ] Sheeple
+* Conclusions
+ - protocol for straightforward definition of custom dispatch
+ + interoperates seamlessly with rest of CLOS: method combination,
+ etc.
+ + tolerably efficient: two extra standard gf invokations and one
+ hash table lookup per call on the fast path (but more to be
+ done)
+ + expressive: handles foms of dispatch not handled elsewhere; all
+ the usual niceties of redefinition, modularity, introspection
+** Future Work
+ - compute-cache-handling-functions (and general speed issues)
+ - automatic pattern-specializer generalizer computation
+ - prototype-oriented progamming a la Slate.
+** Acknowledgments
+ We thank Lee Salzman, Pascal Costanza, Mikel Evins for their
+ helpful discussions