els-specializers.org

   1 #+TITLE: Generalizers: New Metaobjects for Generalized Dispatch
   2 #+AUTHOR: Christophe Rhodes, Jan Moringen, David Lichteblau
   3 #+OPTIONS: toc:nil
   4
   5 #+LaTeX_CLASS: acm_proc_article-sp
   6 #+LaTeX_HEADER: \DeclareTextFontCommand{\texttt}{\ttfamily\hyphenchar\font=45\relax}
   7
   8 #+begin_src elisp :exports none
   9 ;;; use C-x C-e on this if org refuses to export
  10 (add-to-list 'org-latex-classes
  11              '("acm_proc_article-sp" "\\documentclass{acm_proc_article-sp}"
  12                ("\\section{%s}" . "\\section*{%s}")
  13                ("\\subsection{%s}" . "\\subsection*{%s}")
  14                ("\\subsubsection{%s}" . "\\subsubsection*{%s}")
  15                ("\\paragraph{%s}" . "\\paragraph*{%s}")
  16                ("\\subparagraph{%s}" . "\\subparagraph*{%s}")))
  17 (set (make-local-variable 'org-latex-title-command)
  18      "\\numberofauthors{3}
  19 \\author{
  20 \\alignauthor Christophe Rhodes\\\\
  21   \\affaddr{Department of Computing}\\\\
  22   \\affaddr{Goldsmiths, University of London}\\\\
  23   \\affaddr{London SE14 6NW}\\\\
  24   \\email{c.rhodes@gold.ac.uk}
  25 \\alignauthor Jan Moringen\\\\
  26   \\affaddr{Universität Bielefeld}\\\\
  27   \\affaddr{Technische Fakultät}\\\\
  28   \\affaddr{33594 Bielefeld}\\\\
  29   \\email{jmoringe@techfak.uni-bielefeld.de}
  30 \\alignauthor David Lichteblau\\\\
  31   \\affaddr{ZenRobotics Ltd}\\\\
  32   \\affaddr{Vilhonkatu 5 A}\\\\
  33   \\affaddr{FI-00100 Helsinki}\\\\
  34   \\email{david@lichteblau.com}
  35 }
  36 \\maketitle")
  37 #+end_src
  38
  39 #+begin_abstract
  40 This paper introduces a new metaobject, the generalizer, which
  41 complements the existing specializer metaobject.  With the help of
  42 examples, we show that this metaobject allows for the efficient
  43 implementation of complex non-class-based dispatch within the
  44 framework of existing metaobject protocols.  We present our
  45 modifications to the generic function invocation protocol from the
  46 /Art of the Metaobject Protocol/; in combination with previous work,
  47 this produces a fully-functional extension of the existing mechanism
  48 for method selection and combination, including support for method
  49 combination completely independent from method selection.  We discuss
  50 our implementation, within the SBCL implementation of Common Lisp, and
  51 in that context compare the performance of the new protocol with the
  52 standard one, demonstrating that the new protocol can be tolerably
  53 efficient.
  54 #+end_abstract
  55
  56 #+begin_LaTeX
  57 \begin{flushleft}
  58 Report-No.:~\url{http://eprints.gold.ac.uk/id/eprint/9924}
  59 \end{flushleft}
  60 \category{D.1}{Software}{Programming Techniques}[Object-oriented Programming]
  61 \category{D.3.3}{Programming Languages}{Language Constructs and Features}
  62 \terms{Languages, Design}
  63 \keywords{generic functions, specialization-oriented programming, method selection, method combination}
  64 #+end_LaTeX
  65
  66 * Introduction
  67   The revisions to the original Common Lisp language \cite{CLtL}
  68   included the detailed specification of an object system, known as
  69   the Common Lisp Object System (CLOS), which was eventually
  70   standardized as part of the ANSI Common Lisp standard \cite{CLtS}.
  71   The object system as presented to the standardization committee was
  72   formed of three chapters.  The first two chapters covered programmer
  73   interface concepts and the functions in the programmer interface
  74   \cite[Chapter 28]{CLtL2} and were largely incorporated into the
  75   final standard; the third chapter, covering a Metaobject Protocol
  76   (MOP) for CLOS, was not.
  77
  78   Nevertheless, the CLOS MOP continued to be developed, and the
  79   version documented in \cite{AMOP} has proven to be a reasonably
  80   robust design.  While many implementations have derived their
  81   implementations of CLOS from either the Closette illustrative
  82   implementation in \cite{AMOP}, or the Portable Common Loops
  83   implementation of CLOS from Xerox Parc, there have been largely
  84   from-scratch reimplementations of CLOS (in CLISP[fn:1] and
  85   CCL[fn:2], at least) incorporating substantial fractions of the
  86   Metaobject Protocol as described.
  87
  88   #+CAPTION:    MOP Design Space
  89   #+LABEL:      fig:mopdesign
  90   #+ATTR_LATEX: width=\linewidth float
  91   [[file:figures/mop-design-space.pdf]]
  92
  93   Although it has stood the test of time, the CLOS MOP is neither
  94   without issues (e.g. semantic problems with =make-method-lambda=
  95   \cite{Costanza.Herzeel:2008}; useful functions such as
  96   =compute-effective-slot-definition-initargs= being missing from the
  97   standard) nor is it a complete framework for the metaprogrammer to
  98   implement all conceivable variations of object-oriented behaviour.
  99   While metaprogramming offers some possibilities for customization of
 100   the object system behaviour, those possibilities cannot extend
 101   arbitrarily in all directions (conceptually, if a given object
 102   system is a point in design space, then a MOP for that object system
 103   allows exploration of a region of design space around that point;
 104   see figure \ref{fig:mopdesign}).  In the case of the CLOS MOP, there is
 105   still an expectation that functionality is implemented with methods
 106   on generic functions, acting on objects with slots; it is not
 107   possible, for example, to transparently implement support for
 108   “message not understood” as in the message-passing paradigm, because
 109   the analogue of messages (generic functions) need to be defined
 110   before they are used.
 111
 112   Nevertheless, the MOP is flexible, and is used for a number of
 113   things, including: documentation generation (where introspection in
 114   the MOP is used to extract information from a running system[fn:3]);
 115   object-relational mapping[fn:4] and other approaches to object
 116   persistence \cite{Paepke:1988}; alternative backing stores for slots
 117   (hash-tables or symbols); and programmatic construction of
 118   metaobjects, for example for IDL compilers and model
 119   transformations.
 120
 121   One area of functionality where there is scope for customization by
 122   the metaprogrammer is in the mechanics and semantics of method
 123   applicability and dispatch.  While in principle AMOP allows
 124   customization of dispatch in various different ways (the
 125   metaprogrammer can define methods on protocol functions such as
 126   =compute-applicable-methods=,
 127   =compute-applicable-methods-using-classes=), for example, in
 128   practice implementation support for this was weak until relatively
 129   recently[fn:5].
 130
 131   Another potential mechanism for customizing dispatch is implicit in
 132   the class structure defined by AMOP: standard specializer objects
 133   (instances of =class= and =eql-specializer=) are generalized
 134   instances of the =specializer= protocol class, and in principle
 135   there are no restrictions on the metaprogrammer constructing
 136   additional subclasses.  Previous work \cite{Newton.Rhodes:2008} has
 137   explored the potential for customizing generic function dispatch
 138   using extended specializers, but there the metaprogrammer must
 139   override the entirety of the generic function invocation protocol
 140   (from =compute-discriminating-function= on down), leading to toy
 141   implementations and duplicated effort.
 142
 143   This paper introduces a protocol for efficient and controlled
 144   handling of new subclasses of =specializer=.  In particular, it
 145   introduces the =generalizer= protocol class, which generalizes the
 146   return value of =class-of= in method applicability computation, and
 147   allows the metaprogrammer to hook into cacheing schemes to avoid
 148   needless recomputation of effective methods for sufficiently similar
 149   generic function arguments (See Figure\nbsp\ref{fig:dispatch}).
 150
 151   #+CAPTION:    Dispatch Comparison
 152   #+LABEL:      fig:dispatch
 153   #+ATTR_LATEX: width=\linewidth float
 154   [[file:figures/dispatch-relationships.pdf]]
 155
 156   The remaining sections in this paper can be read in any order.  We
 157   give some motivating examples in section [[#Examples]], including
 158   reimplementations of examples from previous work, as well as
 159   examples which are poorly supported by previous protocols.  We
 160   describe the protocol itself in section [[#Protocol]], describing each
 161   protocol function in detail and, where applicable, relating it to
 162   existing protocol functions within the CLOS MOP.  We survey related
 163   work in more detail in section [[#Related Work]], touching on work on
 164   customized dispatch schemes in other environments.  Finally, we draw
 165   our conclusions from this work, and indicate directions for further
 166   development, in section [[#Conclusions]]; reading that section before the
 167   others indicates substantial trust in the authors' work.
 168 * Examples
 169   :PROPERTIES:
 170   :CUSTOM_ID: Examples
 171   :END:
 172   In this section, we present a number of examples of dispatch
 173   implemented using our protocol, which we describe in section
 174   [[#Protocol]].  For reasons of space, the metaprogram code examples in
 175   this section do not include some of the necessary support code to
 176   run; complete implementations of each of these cases, along with the
 177   integration of this protocol into the SBCL implementation
 178   \cite{Rhodes:2008} of Common Lisp, are included in the authors'
 179   repository[fn:6].
 180
 181   A note on terminology: we will attempt to distinguish between the
 182   user of an individual case of generalized dispatch (the
 183   “programmer”), the implementor of a particular case of generalized
 184   dispatch (the “metaprogrammer”), and the authors as the designers
 185   and implementors of our generalized dispatch protocol (the
 186   “metametaprogrammer”, or more likely “we”).
 187 ** CONS specializers
 188    :PROPERTIES:
 189    :CUSTOM_ID: Cons
 190    :END:
 191    One motivation for the use of generalized dispatch is in an
 192    extensible code walker: a new special form can be handled simply by
 193    writing an additional method on the walking generic function,
 194    seamlessly interoperating with all existing methods. In this
 195    use-case, dispatch is performed on the first element of lists.
 196    Semantically, we allow the programmer to specialize any argument of
 197    methods with a new kind of specializer, =cons-specializer=, which
 198    is applicable if and only if the corresponding object is a =cons=
 199    whose =car= is =eql= to the symbol associated with the
 200    =cons-specializer=; these specializers are more specific than the
 201    =cons= class, but less specific than an =eql-specializer= on any
 202    given =cons=.
 203
 204    The programmer code using these specializers is unchanged from
 205    \cite{Newton.Rhodes:2008}; the benefits of the protocol described
 206    here are: that the separation of concerns is complete – method
 207    selection is independent of method combination – and that the
 208    protocol allows for efficient implementation where possible, even
 209    when method selection is customized.  In an application such as
 210    walking source code, we would expect to encounter special forms
 211    (distinguished by particular atoms in the =car= position) multiple
 212    times, and hence to dispatch to the same effective method
 213    repeatedly.  We discuss the efficiency aspects of the protocol in
 214    more detail in section [[#Memoization]]; we present the metaprogrammer
 215    code to implement the =cons-specializer= below.
 216
 217 #+begin_src lisp
 218 (defclass cons-specializer (specializer)
 219   ((%car :reader %car :initarg :car)))
 220 (defclass cons-generalizer (generalizer)
 221   ((%car :reader %car :initarg :car)))
 222 (defmethod generalizer-of-using-class
 223     ((gf cons-generic-function) arg)
 224   (typecase arg
 225     ((cons symbol)
 226      (make-instance 'cons-generalizer
 227                     :car (car arg)))
 228     (t (call-next-method))))
 229 (defmethod generalizer-equal-hash-key
 230     ((gf cons-generic-function)
 231      (g cons-generalizer))
 232   (%car g))
 233 (defmethod specializer-accepts-generalizer-p
 234     ((gf cons-generic-function)
 235      (s cons-specializer)
 236      (g cons-generalizer))
 237   (if (eql (%car s) (%car g))
 238       (values t t)
 239       (values nil t)))
 240 (defmethod specializer-accepts-p
 241     ((s cons-specializer) o)
 242   (and (consp o) (eql (car o) (%car s))))
 243 #+end_src
 244
 245 The code above shows a minimal use of our protocol.  We have elided
 246 some support code for parsing and unparsing specializers, and for
 247 handling introspective functions such as finding generic functions for
 248 a given specializer.  We have also elided methods on the protocol
 249 functions =specializer<= and =same-specializer-p=; for
 250 =cons-specializer= objects, specializer ordering is trivial, as only
 251 one =cons-specializer= (up to equality) can ever be applicable to any
 252 given argument.  See section [[#Accept]] for a case where specializer
 253 ordering is non-trivial.
 254
 255 As in \cite{Newton.Rhodes:2008}, the programmer can use these
 256 specializers to implement a modular code walker, where they define one
 257 method per special operator.  We show two of those methods below, in
 258 the context of a walker which checks for unused bindings and uses of
 259 unbound variables.
 260
 261 #+begin_src
 262 (defgeneric walk (form env stack)
 263   (:generic-function-class cons-generic-function))
 264 (defmethod walk
 265     ((expr (cons lambda)) env call-stack)
 266   (let ((lambda-list (cadr expr))
 267         (body (cddr expr)))
 268     (with-checked-bindings
 269         ((bindings-from-ll lambda-list)
 270          env call-stack)
 271       (dolist (form body)
 272         (walk form env (cons form call-stack))))))
 273 (defmethod walk
 274     ((expr (cons let)) env call-stack)
 275   (flet ((let-binding (x)
 276            (walk (cadr x) env
 277                  (cons (cadr x) call-stack))
 278            (cons (car x)
 279                  (make-instance 'binding))))
 280     (with-checked-bindings
 281         ((mapcar #'let-binding (cadr expr))
 282           env call-stack)
 283       (dolist (form (cddr expr))
 284         (walk form env (cons form call-stack))))))
 285 #+end_src
 286
 287    Note that in this example there is no strict need for
 288    =cons-specializer= and =cons-generalizer= to be distinct classes.
 289    In standard generic function dispatch, the =class= functions both
 290    as the specializer for methods and as the generalizer for generic
 291    function arguments; we can think of the dispatch implemented by
 292    =cons-specializer= objects as providing for subclasses of the
 293    =cons= class distinguished by the =car= of the =cons=.  This
 294    analogy also characterizes those use cases where the metaprogrammer
 295    could straightforwardly use filtered dispatch
 296    \cite{Costanza.etal:2008} to implement their dispatch semantics.
 297    We will see in section [[#Accept]] an example of a case where filtered
 298    dispatch is incapable of straightforwardly expressing the dispatch,
 299    but first we present our implementation of the motivating case from
 300    \cite{Costanza.etal:2008}.
 301 ** SIGNUM specializers
 302    :PROPERTIES:
 303    :CUSTOM_ID: Signum
 304    :END:
 305    Our second example of the implementation and use of generalized
 306    specializers is a reimplementation of one of the examples in
 307    \cite{Costanza.etal:2008}: specifically, the factorial function.
 308    Here, dispatch will be performed based on the =signum= of the
 309    argument, and again, at most one method with a =signum= specializer
 310    will be applicable to any given argument, which makes the structure
 311    of the specializer implementation very similar to the =cons=
 312    specializers in the previous section.
 313
 314    The metaprogrammer has chosen in the example below to compare
 315    signum values using \texttt{=}, which means that a method with
 316    specializer =(signum 1)= will be applicable to positive
 317    floating-point arguments (see the first method on
 318    =specializer-accepts-generalizer-p= and the method on
 319    =specializer-accepts-p= below).  This leads to one subtle
 320    difference in behaviour compared to that of the =cons=
 321    specializers: in the case of =signum= specializers, the /next/
 322    method after any =signum= specializer can be different, depending
 323    on the class of the argument.  This aspect of the dispatch is
 324    handled by the second method on =specializer-accepts-generalizer-p=
 325    below.
 326
 327 #+begin_src lisp
 328 (defclass signum-specializer (specializer)
 329   ((%signum :reader %signum :initarg :signum)))
 330 (defclass signum-generalizer (generalizer)
 331   ((%signum :reader %signum :initarg :signum)))
 332 (defmethod generalizer-of-using-class
 333     ((gf signum-generic-function) (arg real))
 334   (make-instance 'signum-generalizer
 335                  :signum (signum arg)))
 336 (defmethod generalizer-equal-hash-key
 337     ((gf signum-generic-function)
 338      (g signum-generalizer))
 339   (%signum g))
 340 (defmethod specializer-accepts-generalizer-p
 341     ((gf signum-generic-function)
 342      (s signum-specializer)
 343      (g signum-generalizer))
 344   (if (= (%signum s) (%signum g))
 345       (values t t)
 346       (values nil t)))
 347
 348 (defmethod specializer-accepts-generalizer-p
 349     ((gf signum-generic-function)
 350      (s specializer)
 351      (g signum-generalizer))
 352   (specializer-accepts-generalizer-p
 353    gf s (class-of (%signum g))))
 354
 355 (defmethod specializer-accepts-p
 356     ((s signum-specializer) o)
 357   (and (realp o) (= (%signum s) (signum o))))
 358 #+end_src
 359
 360    Given these definitions, and once again some more straightforward
 361    ones elided for reasons of space, the programmer can implement the
 362    factorial function as follows:
 363
 364 #+begin_src lisp
 365 (defgeneric fact (n)
 366   (:generic-function-class signum-generic-function))
 367 (defmethod fact ((n (signum 0))) 1)
 368 (defmethod fact ((n (signum 1))) (* n (fact (1- n))))
 369 #+end_src
 370
 371    The programmer does not need to include a method on =(signum -1)=,
 372    as the standard =no-applicable-method= protocol will automatically
 373    apply to negative real or non-real arguments.
 374 ** Accept HTTP header specializers
 375    :PROPERTIES:
 376    :CUSTOM_ID: Accept
 377    :END:
 378    In this section, we implement a non-trivial form of dispatch.  The
 379    application in question is a web server, and specifically to allow
 380    the programmer to support RFC 2616 \cite{rfc2616} content
 381    negotiation, of particular interest to publishers and consumers of
 382    REST-style Web APIs.
 383
 384    The basic mechanism in content negotiation is as follows: the web
 385    client sends an HTTP request with an =Accept= header, which is a
 386    string describing the media types it is willing to receive as a
 387    response to the request, along with numerical preferences.  The web
 388    server compares these stated client preferences with the resources
 389    it has available to satisfy this request, and sends the best
 390    matching resource in its response.
 391
 392    For example, a graphical web browser might send an =Accept= header
 393    of =text/html,application/xml;q=0.9,*/*;q=0.8= for a request of a
 394    resource typed in to the URL bar.  This should be interpreted as
 395    meaning that: if the server can provide content of type =text/html=
 396    (i.e. HTML) for that resource, then it should do so.  Otherwise, if
 397    it can provide =application/xml= content (i.e. XML of any schema),
 398    then that should be provided; failing that, any other content type
 399    is acceptable.
 400
 401    In the case where there are static files on the filesystem, and the
 402    web server must merely select between them, there is not much more
 403    to say.  However, it is not unusual for a web service to be backed
 404    by some other form of data, and responses computed and sent on the
 405    fly, and in these circumstances the web server must compute which
 406    of its known output formats it can use to satisfy the request
 407    before actually generating the best matching response.  This can be
 408    modelled as one generic function responsible for generating the
 409    response, with methods corresponding to content-types -- and the
 410    generic function must then perform method selection against the
 411    request's =Accept= header to compute the appropriate response.
 412
 413    The =accept-specializer= below implements this dispatch.  It depends
 414    on a lazily-computed =tree= slot to represent the information in
 415    the accept header (generated by =parse-accept-string=), and a
 416    function =q= to compute the (defaulted) preference level for a
 417    given content-type and =tree=; then, method selection and ordering
 418    involves finding the =q= for each =accept-specializer='s content
 419    type given the =tree=, and sorting them according to the preference
 420    level.
 421
 422 #+begin_src lisp
 423 (defclass accept-specializer (specializer)
 424   ((media-type :initarg :media-type :reader media-type)))
 425 (defclass accept-generalizer (generalizer)
 426   ((header :initarg :header :reader header)
 427    (tree)
 428    (next :initarg :next :reader next)))
 429 (defmethod generalizer-equal-hash-key
 430     ((gf accept-generic-function)
 431      (g accept-generalizer))
 432   `(accept-generalizer ,(header g)))
 433 (defmethod specializer-accepts-generalizer-p
 434     ((gf accept-generic-function)
 435      (s accept-specializer)
 436      (g accept-generalizer))
 437   (values (q (media-type s) (tree g)) t))
 438 (defmethod specializer-accepts-generalizer-p
 439     ((gf accept-generic-function)
 440      (s specializer)
 441      (g accept-generalizer))
 442   (specializer-accepts-generalizer-p
 443    gf s (next g)))
 444
 445 (defmethod specializer<
 446     ((gf accept-generic-function)
 447      (s1 accept-specializer)
 448      (s2 accept-specializer)
 449      (g accept-generalizer))
 450   (let ((m1 (media-type s1))
 451         (m2 (media-type s2))
 452         (tree (tree g)))
 453     (cond
 454       ((string= m1 m2) '=)
 455       (t (let ((q1 (q m1 tree)))
 456                (q2 (q m2 tree))))
 457            (cond
 458              ((= q1 q2) '=)
 459              ((< q1 q2) '>)
 460              (t '<))))))
 461 #+end_src
 462
 463    The metaprogrammer can then add support for objects representing
 464    client requests, such as instances of the =request= class in the
 465    Hunchentoot[fn:7] web server, by translating these into
 466    =accept-generalizer= instances.  The code below implements this, by
 467    defining the computation of a =generalizer= object for a given
 468    request, and specifying how to compute whether the specializer
 469    accepts the given request object (=q= returns a number between 0
 470    and 1 if any pattern in the =tree= matches the media type, and
 471    =nil= if the media type cannot be matched at all).
 472
 473 #+begin_src
 474 (defmethod generalizer-of-using-class
 475     ((gf accept-generic-function)
 476      (arg tbnl:request))
 477   (make-instance 'accept-generalizer
 478                  :header (tbnl:header-in :accept arg)
 479                  :next (call-next-method)))
 480 (defmethod specializer-accepts-p
 481     ((s accept-specializer)
 482      (o tbnl:request))
 483   (let* ((accept (tbnl:header-in :accept o))
 484          (tree (parse-accept-string accept))
 485          (q (q (media-type s) tree)))
 486     (and q (> q 0))))
 487 #+end_src
 488
 489    This dispatch cannot be implemented using filtered dispatch, except
 490    by generating anonymous classes with all the right mime-types as
 491    direct superclasses in dispatch order; the filter would generate
 492 #+begin_src lisp
 493 (ensure-class nil :direct-superclasses
 494  '(text/html image/webp ...))
 495 #+end_src
 496    and dispatch would operate using those anonymous classes.  While
 497    this is possible to do, it is awkward to express content-type
 498    negotiation in this way, as it means that the dispatcher must know
 499    about the universe of mime-types that clients might declare that
 500    they accept, rather than merely the set of mime-types that a
 501    particular generic function is capable of serving; handling
 502    wildcards in accept strings is particularly awkward in the
 503    filtering paradigm.
 504
 505    Note that in this example, the method on =specializer<= involves a
 506    non-trivial ordering of methods based on the =q= values specified
 507    in the accept header (whereas in sections [[#Cons]] and [[#Signum]] only a
 508    single extended specializer could be applicable to any given
 509    argument).
 510
 511    Also note that the accept specializer protocol is straightforwardly
 512    extensible to other suitable objects; for example, one simple
 513    debugging aid is to define that an =accept-specializer= should be
 514    applicable to =string= objects.  This can be done in a modular
 515    fashion (see the code below, which can be completely disconnected
 516    from the code for Hunchentoot request objects), and generalizes to
 517    dealing with multiple web server libraries, so that
 518    content-negotiation methods are applicable to each web server's
 519    request objects.
 520
 521 #+begin_src lisp
 522 (defmethod generalizer-of-using-class
 523     ((gf accept-generic-function)
 524      (s string))
 525   (make-instance 'accept-generalizer
 526                  :header s
 527                  :next (call-next-method)))
 528 (defmethod specializer-accepts-p
 529     ((s accept-specializer) (o string))
 530   (let* ((tree (parse-accept-string o))
 531          (q (q (media-type s) tree)))
 532     (and q (> q 0))))
 533 #+end_src
 534
 535    The =next= slot in the =accept-generalizer= is used to deal with
 536    the case of methods specialized on the classes of objects as well
 537    as on the acceptable media types; there is a method on
 538    =specializer-accepts-generalizer-p= for specializers that are not
 539    of type =accept-specializer= which calls the generic function again
 540    with the next generalizer, so that methods specialized on the
 541    classes =tbnl:request= and =string= are treated as applicable to
 542    corresponding objects, though less specific than methods with
 543    =accept-specializer= specializations.
 544
 545 ** COMMENT Pattern / xpattern / regex / optima
 546    Here's the /really/ interesting bit, but on the other hand we're
 547    probably going to run out of space, and the full description of
 548    these is going to take us into =make-method-lambda= territory.
 549    A second paper?  Future work?
 550 * Protocol
 551   :PROPERTIES:
 552   :CUSTOM_ID: Protocol
 553   :END:
 554
 555   In section [[#Examples]], we have seen a number of code fragments as
 556   partial implementations of particular non-standard method dispatch
 557   strategies, using =generalizer= metaobjects to mediate between the
 558   methods of the generic function and the actual arguments passed to
 559   it.  In section [[#Generalizer metaobjects]], we go into more detail
 560   regarding these =generalizer= metaobjects, describing the generic
 561   function invocation protocol in full, and showing how this protocol
 562   allows a similar form of effective method cacheing as the standard
 563   one does.  In section [[#Generalizer performance]], we show the results
 564   of some simple performance measurements on our implementation of
 565   this protocol in the SBCL implementation \cite{Rhodes:2008} of
 566   Common Lisp to highlight the improvement that this protocol can
 567   bring over a naïve implementation of generalized dispatch, as well
 568   as to make the potential for further improvement clear.
 569
 570 ** Generalizer metaobjects
 571    :PROPERTIES:
 572    :CUSTOM_ID: Generalizer metaobjects
 573    :END:
 574
 575 *** Generic function invocation
 576     As in the standard generic function invocation protocol, the
 577     generic function's actual functionality is provided by a
 578     discriminating function.  The functionality described in this
 579     protocol is implemented by having a distinct subclass of
 580     =standard-generic-function=, and a method on
 581     =compute-discriminating-function= which produces a custom
 582     discriminating function.  The basic outline of the discriminating
 583     function is the same as the standard one: it must first compute the
 584     set of applicable methods given particular arguments; from that, it
 585     must compute the effective method by combining the methods
 586     appropriately according to the generic function's method
 587     combination; finally, it must call the effective method with the
 588     arguments.
 589
 590     Computing the set of applicable methods is done using a pair of
 591     functions: =compute-applicable-methods=, the standard metaobject
 592     function, and a new function
 593     =compute-applicable-methods-using-generalizers=.  We define a
 594     custom method on =compute-applicable-methods= which tests the
 595     applicability of a particular specializer against a given argument
 596     using =specializer-accepts-p=, a new protocol function with
 597     default implementations on =class= and =eql-specializer= to
 598     implement the expected behaviour.  To order the methods, as
 599     required by the protocol, we define a pairwise comparison operator
 600     =specializer<= which defines an ordering between specializers for
 601     a given generalizer argument (remembering that even in standard
 602     CLOS the ordering between =class= specializers can change
 603     depending on the actual class of the argument).
 604
 605     The new =compute-applicable-methods-using-generalizers= is the
 606     analogue of the MOP's =compute-applicable-methods-using-classes=.
 607     Instead of calling it with the =class-of= each argument, we
 608     compute the generalizers of each argument using the new function
 609     =generalizer-of-using-class= (where the =-using-class= refers to
 610     the class of the generic function rather than the class of the
 611     object), and call =compute-applicable-methods-using-generalizers=
 612     with the generic function and list of generalizers.  As with
 613     =compute-applicable-methods-using-classes=, a secondary return
 614     value indicates whether the result of the function is definitive
 615     for that list of generalizers.
 616
 617     Thus, in generic function invocation, we first compute the
 618     generalizers of the arguments; we compute the ordered set of
 619     applicable methods, either from the generalizers or (if that is
 620     not definitive) from the arguments themselves; then the normal
 621     effective method computation and call can occur.  Unfortunately,
 622     the nature of an effective method function is not specified, so we
 623     have to reach into implementation internals a little in order to
 624     call it, but otherwise the remainder of the generic function
 625     invocation protocol is unchanged from the standard one.  In
 626     particular, method combination is completely unchanged;
 627     programmers can choose arbitrary method combinations, including
 628     user-defined long form combinations, for their generic functions
 629     involving generalized dispatch.
 630
 631 *** Effective method memoization
 632     :PROPERTIES:
 633     :CUSTOM_ID: Memoization
 634     :END:
 635     The potential efficiency benefit to having =generalizer=
 636     metaobjects lies in the use of
 637     =compute-applicable-methods-using-generalizers=.  If a particular
 638     generalized specializer accepts a variety of objects (such as the
 639     =signum= specializer accepting all reals with a given sign, or the
 640     =accept= specializer accepting all HTTP requests with a particular
 641     =Accept= header), then there is the possibility of cacheing and
 642     reusing the results of the applicable and effective method
 643     computation.  If the computation of the applicable method from
 644     =compute-applicable-methods-using-generalizers= is definitive,
 645     then the ordered set of applicable methods and the effective
 646     method can be cached.
 647
 648     One issue is what to use as the key for that cache.  We cannot use
 649     the generalizers themselves, as two generalizers that should be
 650     considered equal for cache lookup will not compare as =equal= –
 651     and indeed even the standard generalizer, the =class=, cannot
 652     easily be used as we must be able to invalidate cache entries upon
 653     class redefinition.  The issue of =class= generalizers we can
 654     solve as in \cite{Kiczales.Rodriguez:1990} by using the =wrapper=
 655     of a class, which is distinct for each distinct (re)definition of
 656     a class; for arbitrary generalizers, however, there is /a priori/
 657     no good way of computing a suitable hash key automatically, so we
 658     allow the metaprogrammer to specify one by defining a method on
 659     =generalizer-equal-hash-key=, and combining the hash keys for all
 660     required arguments in a list to use as a key in an =equal=
 661     hash-table.
 662
 663 #+begin_comment
 664     [could we actually compute a suitable hash key using the
 665     generalizer's class name and initargs?]
 666 #+end_comment
 667
 668 *** COMMENT
 669     - [X] =generalizer-of-using-class= (NB class of gf not class of object)
 670     - [X] =compute-applicable-methods-using-generalizers=
 671     - [X] =generalizer-equal-hash-key=
 672     - [X] =specializer-accepts-generalizer-p=
 673     - [X] =specializer-accepts-p=
 674     - [X] =specializer<=
 675 ** Performance
 676    :PROPERTIES:
 677    :CUSTOM_ID: Generalizer performance
 678    :END:
 679    We have argued that the protocol presented here allows for
 680    expressive control of method dispatch while preserving the
 681    possibility of efficiency.  In this section, we quantify the
 682    efficiency that the memoization protocol described in section
 683    [[#Memoization]] achieves, by comparing it both to the same protocol
 684    with no memoization, as well as with equivalent dispatch
 685    implementations in the context of methods with regular specializers
 686    (in an implementation similar to that in
 687    \cite{Kiczales.Rodriguez:1990}), and with implementation in
 688    straightforward functions.  We performed our benchmarks on a
 689    quad-core X-series ThinkPad with 8GB of RAM running Debian
 690    GNU/Linux, and took the mean of the 10 central samples of 20 runs,
 691    with the number of iterations per run chosen so as to take
 692    substantially over the clock resolution for the fastest case.
 693    Despite these precautions, we advise against reading too much into
 694    these numbers, which are best used as an order-of-magnitude
 695    estimate.
 696
 697    In the case of the =cons-specializer=, we benchmark the walker
 698    acting on a small but non-trivial form.  The implementation
 699    strategies in the table below refer to: an implementation in a
 700    single function with a large =typecase= to dispatch between all the
 701    cases; the natural implementation in terms of a standard generic
 702    function with multiple methods (the method on =cons= having a
 703    slightly reduced =typecase= to dispatch on the first element, and
 704    other methods handling =symbol= and other atoms); and three
 705    separate cases using =cons-specializer= objects.  As well as
 706    measuring the effect of memoization against the full invocation
 707    protocol, we can also introduce a special case: when only one
 708    argument participates in method selection (all the other required
 709    arguments only being specialized on =t=), we can avoid the
 710    construction of a list of hash keys and simply use the key
 711    from the single active generalizer directly.
 712
 713    | implementation        | time (µs/call) | overhead |
 714    |-----------------------+----------------+----------|
 715    | function              |           3.17 |          |
 716    | standard-gf/methods   |            3.6 |     +14% |
 717    | cons-gf/one-arg-cache |            7.4 |    +130% |
 718    | cons-gf               |             15 |    +370% |
 719    | cons-gf/no-cache      |             90 |   +2700% |
 720
 721    The benchmarking results from this exercise are promising: in
 722    particular, the introduction of the effective method cache speeds
 723    up the use of generic specializers in this case by a factor of 6,
 724    and the one-argument special case by another factor of 2.  For this
 725    workload, even the one-argument special case only gets to within a
 726    factor of 2-3 of the function and standard generic function
 727    implementations, but the overall picture is that the memoizability
 728    in the protocol does indeed drastically reduce the overhead
 729    compared with the full invocation.
 730
 731    For the =signum-specializer= case, we choose to benchmark the
 732    computation of 20!, because that is the largest factorial whose
 733    answer fits in SBCL's 63-bit fixnums – in an attempt to measure the
 734    worst case for generic dispatch, where the work done within the
 735    methods is as small as possible without being meaningless, and in
 736    particular does not cause heap allocation or garbage collection to
 737    obscure the picture.
 738
 739 #+begin_src lisp :exports none
 740 (progn (gc :full t) (time (dotimes (i 10000) (%fact 20))))
 741 #+end_src
 742
 743    | implementation          | time (µs/call) | overhead |
 744    |-------------------------+----------------+----------|
 745    | function                |            0.6 |          |
 746    | standard-gf/fixnum      |            1.2 |    +100% |
 747    | signum-gf/one-arg-cache |            7.5 |   +1100% |
 748    | signum-gf               |             23 |   +3800% |
 749    | signum-gf/no-cache      |            240 |  +41000% |
 750
 751    The relative picture is similar to the =cons-specializer= case;
 752    including a cache saves a factor of 10 in this case, and another
 753    factor of 3 for the one-argument cache special case.  The cost of
 754    the genericity of the protocol here is starker; even the
 755    one-argument cache is a factor of 6 slower than the standard
 756    generic-function implementation, and a further factor of 2 away
 757    from the implementation of factorial as a function.  We discuss
 758    ways in which we expect to be able to improve performance in
 759    section [[#Future Work]].
 760
 761    We could allow the metaprogrammer to improve on the one-argument
 762    performance by constructing a specialized cache: for =signum=
 763    arguments of =rational= arguments, the logical cache structure is
 764    to index a three-element vector with =(1+ signum)=.  The current
 765    protocol does not provide a way of eliding the two generic function
 766    calls for the generic cache; we discuss possible approaches in
 767    section [[#Conclusions]].
 768 ** Full protocol
 769    The protocol described in this paper is only part of a complete
 770    protocol for =specializer= and =generalizer= metaobjects.  Our
 771    development of this protocol is as yet incomplete; the work
 772    described here augments that in \cite{Newton.Rhodes:2008}, but is
 773    yet relatively untested – and additionally our recent experience of
 774    working with that earlier protocol suggests that there might be
 775    useful additions to the handling of =specializer= metaobjects,
 776    independent of the =generalizer= idea presented here.
 777
 778 *** COMMENT ideas
 779     Description and specification left for reasons of space (we'll see?)
 780     - [ ] =same-specializer-p=
 781     - [ ] =parse/unparse-specializer-using-class=
 782     - [ ] =make-method-specializers-form=
 783     - [ ] jmoringe: In an email, I suggested
 784       =make-specializer-form-using-class=:
 785
 786       #+begin_quote
 787       Could we change =make-method-specializers-form='s default
 788       behaviour to call a new generic function
 789       #+begin_src
 790         make-specializer-form-using-class gf method name env
 791       #+end_src
 792       with builtin methods on =sb-mop:specializer=, =symbol=, =cons= (for
 793       eql-specializers)? This would make it unnecessary to repeat
 794       boilerplate along the lines of
 795       #+begin_src lisp
 796       (flet ((make-parse-form (name)
 797                (if <name-is-interesting>
 798                  <handle-interesting-specializer>
 799                  <repeat-handling-of-standard-specializers>)))
 800         `(list ,@(mapcar #'make-parse-form specializer-names)))
 801       #+end_src
 802       for each generic function class.
 803       #+end_quote
 804     - [ ] =make-method-lambda= revision (use environment arg?)
 805
 806       jmoringe: would only be relevant for pattern dispatch, right? I
 807       think, we didn't finish the discussion regarding special
 808       variables vs. environment vs. new protocol function
 809
 810 * Related Work
 811   :PROPERTIES:
 812   :CUSTOM_ID: Related Work
 813   :END:
 814
 815   The work presented here builds on specializer-oriented programming
 816   described in \cite{Newton.Rhodes:2008}.  Approximately
 817   contemporaneously, filtered dispatch \cite{Costanza.etal:2008} was
 818   introduced to address some of the same use cases: filtered dispatch
 819   works by having a custom discriminating function which wraps the
 820   usual one, where the wrapping function augments the set of
 821   applicable methods with applicable methods from other (hidden)
 822   generic functions, one per filter group; this step is not memoized,
 823   and using =eql= methods to capture behaviours of equivalence classes
 824   means that it is hard to see how it could be.  The methods are then
 825   combined using a custom method combination to mimic the standard
 826   one; in principle implementors of other method combinations could
 827   cater for filtered dispatch, but they would have to explicitly
 828   modify their method combinations.  The Clojure programming language
 829   supports multimethods[fn:8] with a variant of filtered dispatch as
 830   well as hierarchical and identity-based method selectors.
 831
 832   In context-oriented programming
 833   \cite{Hirschfeld.etal:2008,Vallejos.etal:2010}, context dispatch
 834   occurs by maintaining the context state as an anonymous class with
 835   the superclasses representing all the currently active layers; this
 836   is then passed as a hidden argument to context-aware functions.  The
 837   set of layers is known and under programmer control, as layers must
 838   be defined beforehand.
 839
 840   In some sense, all dispatch schemes are specializations of predicate
 841   dispatch \cite{Ernst.etal:1998}.  The main problem with predicate
 842   dispatch is its expressiveness: with arbitrary predicates able to
 843   control dispatch, it is essentially impossible to perform any
 844   substantial precomputation, or even to automatically determine an
 845   ordering of methods given a set of arguments.  Even Clojure's
 846   restricted dispatch scheme provides an explicit operator for stating
 847   a preference order among methods, where here we provide an operator
 848   to order specializers; in filtered dispatch the programmer
 849   implicitly gives the system an order of precedence, through the
 850   lexical ordering of filter specification in a filtered function
 851   definition.
 852
 853   The Slate programming environment combines prototype-oriented
 854   programming with multiple dispatch \cite{Salzman.Aldrich:2005}; in
 855   that context, the analogue of an argument's class (in Common Lisp)
 856   as a representation of the equivalence class of objects with the
 857   same behaviour is the tuple of roles and delegations: objects with
 858   the same roles and delegations tuple behave the same, much as
 859   objects with the same generalizer have the same behaviour in the
 860   protocol described in this paper.
 861
 862   The idea of generalization is of course not new, and arises in other
 863   contexts.  Perhaps of particular interest is generalization in the
 864   context of partial evaluation; for example, \cite{Ruf:1993}
 865   considers generalization in online partial evaluation, where sets of
 866   possible values are represented by a type system construct
 867   representing an upper bound.  Exploring the relationship between
 868   generalizer metaobjects and approximation in type systems might
 869   yield strategies for automatically computing suitable generalizers
 870   and cache functions for a variety of forms of generalized dispatch.
 871 * Conclusions
 872   :PROPERTIES:
 873   :CUSTOM_ID: Conclusions
 874   :END:
 875   In this paper, we have presented a new generalizer metaobject
 876   protocol allowing the metaprogrammer to implement in a
 877   straightforward manner metaobjects to implement custom method
 878   selection, rather than the standard method selection as standardized
 879   in Common Lisp.  This protocol seamlessly interoperates with the
 880   rest of CLOS and Common Lisp in general; the programmer (the user of
 881   the custom specializer metaobjects) may without constraints use
 882   arbitrary method combination, intercede in effective method
 883   combination, or write custom method function implementations.  The
 884   protocol is expressive, in that it handles forms of dispatch not
 885   possible in more restricted dispatch systems, while not suffering
 886   from the indeterminism present in predicate dispatch through the use
 887   of explicit ordering predicates.
 888
 889   The protocol is also reasonably efficient; the metaprogrammer can
 890   indicate that a particular effective method computation can be
 891   memoized, and under those circumstances much of the overhead is
 892   amortized (though there remains a substantial overhead compared with
 893   standard generic-function or regular function calls).  We discuss
 894   how the efficiency could be improved below.
 895 ** Future work
 896    :PROPERTIES:
 897    :CUSTOM_ID: Future Work
 898    :END:
 899    Although the protocol described in this paper allows for a more
 900    efficient implementation, as described in section [[#Memoization]],
 901    than computing the applicable and effective methods at each generic
 902    function call, the efficiency is still some way away from a
 903    baseline of the standard generic-function, let alone a standard
 904    function.  Most of the invocation protocol is memoized, but there
 905    are still two full standard generic-function calls –
 906    =generalizer-of-using-class= and =generalizer-equal-hash-key= – per
 907    argument per call to a generic function with extended specializers,
 908    not to mention a hash table lookup.
 909
 910    For many applications, the additional flexibility afforded by
 911    generalized specializers might be worth the cost in efficiency, but
 912    it would still be worth investigating how much the overhead from
 913    generalized specializers can be reduced; one possible avenue for
 914    investigation is giving greater control over the cacheing strategy
 915    to the metaprogrammer.
 916
 917    As an example, consider the =signum-specializer=.  The natural
 918    cache structure for a single argument generic function specializing
 919    on =signum= is probably a four-element vector, where the first
 920    three elements hold the effective methods for =signum= values of
 921    -1, 0, and 1, and the fourth holds the cached effective methods for
 922    everything else.  This would make the invocation of such functions
 923    very fast for the (presumed) common case where the argument is in
 924    fact a real number.  We hope to develop and show the effectiveness
 925    of an appropriate protocol to allow the metaprogrammer to construct
 926    and exploit such cacheing strategies, and (more speculatively) to
 927    implement the lookup of an effective method function in other ways.
 928
 929    We also aim to demonstrate support within this protocol for some
 930    particular cases of generalized specializers which seem to have
 931    widespread demand (in as much as any language extension can be said
 932    to be in “demand”).  In particular, we have preliminary work
 933    towards supporting efficient dispatch over pattern specializers
 934    such as implemented in the \textsf{Optima} library[fn:9], and over
 935    a prototype object system similar to that in Slate
 936    \cite{Salzman.Aldrich:2005}.  Our current source code for the work
 937    described in this paper can be seen in the git source code
 938    repository at [[http://christophe.rhodes.io/git/specializable.git]],
 939    which will be updated with future developments.
 940
 941    Finally, after further experimentation (and, ideally, non-trivial
 942    use in production) if this protocol stands up to use as we hope, we
 943    aim to produce a standards-quality document so that other
 944    implementors of Common Lisp can, if they choose, independently
 945    reimplement the protocol, and so that users can use the protocol
 946    with confidence that the semantics will not change in a
 947    backwards-incompatible fashion.
 948 ** Acknowledgments
 949    We thank Lee Salzman, Pascal Costanza and Mikel Evins for helpful
 950    and informative discussions, and all the respondents to the first
 951    author's request for imaginative uses for generalized specializers.
 952
 953 \bibliographystyle{plain}
 954 \bibliography{crhodes,specializers}
 955
 956 * Footnotes
 957
 958 [fn:1] GNU CLISP, at http://www.clisp.org/
 959
 960 [fn:2] Clozure Common Lisp, at http://ccl.clozure.com/
 961
 962 [fn:3] as in many of the systems surveyed at
 963 https://sites.google.com/site/sabraonthehill/lisp-document-generation-apps
 964
 965 [fn:4] e.g. CLSQL, at http://clsql.b9.com/
 966
 967 [fn:5] the \textsf{Closer to MOP} project, at
 968    http://common-lisp.net/project/closer/, attempts to harmonize the
 969    different implementations of the metaobject protocol in Common
 970    Lisp.
 971
 972 [fn:6] the tag =els2014-submission= in
 973 http://christophe.rhodes.io/git/specializable.git corresponds to the
 974 code repository at the point of submitting this paper.
 975
 976 [fn:7] Hunchentoot is a web server written in Common Lisp, allowing
 977 the user to write handler functions to compute responses to requests;
 978 http://weitz.de/hunchentoot/
 979
 980 [fn:8] http://clojure.org/multimethods
 981
 982 [fn:9] https://github.com/m2ym/optima
 983
 984