This week, I went to the Effective Applications of the R Language conference. I’d been alerted to its existence by my visit to a londonR meeting in June. Again, I went for at least two reasons: one as an R enthusiast, though admittedly one (as usual) more interested in tooling than in applications; and one as postgraduate coordinator in Goldsmiths Computing, where one of our modules in the new Data Science programme (starting yesterday! Hooray for “Welcome Week”!) involves exposing students to live data science briefs from academia and industry, with the aim of fostering a relevant and interesting final project.

A third reason? Ben Goldacre as invited speaker. A fantastic choice of keynote, even if he did call us ‘R dorks’ a lot and confess that he was a Stata user. The material, as one might expect, was derived from his books and related experience, but the delivery was excellent, and the drive clear to see. There were some lovely quotes that it’s tempting to include out of context; in the light of my as-yet unposed ‘research question’ for the final module of the PG Certificate in Higher Education – that I am still engaged on – it is tempting to bait the “infestation of qualitative researchers in the educational research establishment”, and attempt a Randomised Controlled Trial, or failing that a statistical analysis of assessments to try to uncover suitable hypotheses for future testing.

Ben’s unintended warm-up act – they were the other way around in the programme, but clearly travelling across London is complicated – was Hadley Wickham, of ggplot2 fame. His talk, about RStudio, his current view on the data analysis workflow, and new packages to support it, was a nice counterpoint: mostly tools, not applications, but clearly focussed on helping to make sense of complicated (and initially untidy) datasets. I liked the shiny-based in-browser living documents in R-markdown, which is not a technology that I’ve investigated yet; at this rate I will have more options for reproducible research than reports written. He, and others at the conference, were advocating a pipe-based code sequencing structure – the R implementation of this is called magrittr (ha, ha) and has properties that are made available to user code by R’s nature as a Lisp-1 with crazy evaluation semantics, on which more in another post.

The rest of the event was made up of shorter, usually more domain-specific talks: around 20 minutes for each speaker. I think it suffered a little bit from many of the participants not being able to speak freely – a natural consequence of a mostly-industrial event, but frustrating. I think it was also probably a mistake to give over the first of the event’s regular (parallel) sessions to a reflective slot of three presentations comparing R with other languages (Python, Julia, and a more positive one about R’s niche): there hadn’t really been time for a positive tone to be established, and it just felt like a bit of a downer. (Judging by the room, most of the delegates – perhaps wisely – had opted for the other track, on “Business Applications of R”.)

Highlights of the shorter talks, for me:

  • YPlan’s John Sandall talking about “agile” data analytics, leading to agile business practices relentlessly focussed on one KPI. At the time, I wondered whether the focus on the first time a user gives them money would act against building something of lasting value – analytics give the power to make decisions, but the underlying strategy still has to be thought about. On the other hand, I’m all too familiar with the notion that startups must survive first and that building something “awesome” is a side-effect, so focussing on money coming in is pretty sensible.
  • Richard Pugh’s (from Mango Solutions) talk about modelling and simulating the behaviour of a sales team did suffer from the confidentiality problem (“I can’t talk about the project this comes from, or the data”) but was at least entertaining: the behaviours he talked about (optimistic opportunity value, interactions of CRM closing dates with quarter boundaries) were highly plausible, and the question of whether he was applying the method to his own sales team quite pointed. (no)
  • the team from Simpson Carpenter Ltd, as well as saying that “London has almost as many market research agencies as pubs” (which rings true), had what I think is a fair insight: R is perhaps less of a black box than certain commercial tools; there’s a certain retrocomputing feel to starting R, being at the prompt, and thinking “now what?” That implies that to actually do something with R, you need to know a bit more about what you’re doing. (That didn’t stop a few egregiously bad graphs being used in other presentations, including my personal favourite: a graph of workflow expressed as business value against time, with the inevitable backwards arrows.)
  • some other R-related tools to look into:

And then there was of course the break room track; the Tower Hotel catered admirably for us, with free-flowing coffee, nibbles and lunch. I had some good conversations with a number of people, and am optimistic that students with the right attitude could both benefit and gain hugely from data science internships. I’m sure I was among the most tool-oriented of the attendees (most of the delegates were actually using R), but I did get to have a conversation with Hadley about “Advanced R”, and we discussed object systems, and conditions and restarts. More free-form notes about the event on my wiki.

Meanwhile, in related news, parts of the swank backend implementation of SLIME changed, mostly moving symbols to new packages. I’ve updated swankr to take account of the changes, and (I believe, though this is untested) preserved compatibility with older (pre-2014-09-13) SLIMEs.