londonr june

I went to the londonR meetup, at Balls Brothers near Fenchurch Street. Partly this was to get out and about at geeky meetups, and partly this was to do a little bit of background activity for the Goldsmiths Data Science MSc. My first observation is that, compared to my more usual Lisp or Emacs meetups, this was pretty well-populated: there were maybe 100 (one hundred!) people attending the 3 (three!) presentations. Maybe some of that was the free bar effect, but since it wasn’t advertised as such, maybe not. The event in general was organized by Mango Solutions, but the particular instantiation was sponsored by Tibco. So, thanks!

Perhaps poetic justice for my slightly impure motives for attending was being pushed, fairly gently, towards the EARL event in September, which in fact looks interesting: I feel like I should make an effort to attend, even given the fairly steep registration fee. I’m less interested in attending the R in Insurance conference, because it’s a bit too specific (though also substantially less expensive...)

I didn’t make the ggplot workshop – too much to do during the day – so I arrived just after the doors opened for the main event. There was some time for chatting before the presentations started; I got to talk to some of the Mango Solutions representatives, including a project manager hoping to apply data analytics to his own job. Who knows how that will go. But soon enough it was time to sit down (to make sure that I had a seat, the latest newspaper scare story notwithstanding) and try to follow and take notes on the presentations.

The first presentation, by Chris Campbell of Mango Solutions, talked about the work they were doing on validating R. The point here was to be able to satisfy regulatory requirements on software, in order to be able to use R in industries with heavy regulation (pharmaceuticals being perhaps the most obvious). The presentation didn’t go into detail (30 minutes is not long, which is perhaps encouraging for my forthcoming 20-minute presentation in Bremen) but gave an overview of the basic problem, and flagged up some packages scheduled to be released at EARL in September: functionMap, for visualizing call graphs; testCoverage, for code coverage reports (like sb-cover for SBCL fans); and visualTest, for testing for image similarity, given that the same R code on different operating systems produces not-quite-identical output. There wasn’t really time to go into the detail of the image similarity algorithm, but the slide saying “Fourier Transform” made me worry a bit; it was picked up in the question section, and Chris justified their current approach as being a sufficiently-sized hammer for their purposes: they’d experimented with wavelets to localize differences, but found them unnecessary. Still, I wonder whether a SIFT-based approach would be useful, and it’ll be interesting to see whether the similarity threshold in arbitrary units actually works...
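To make the worry a bit more concrete, here is a minimal sketch of what a Fourier-based similarity test with a threshold in arbitrary units might look like. This is emphatically not visualTest's algorithm, which the talk did not describe; it is in Python rather than R, and the function names (dft2_magnitude, spectral_distance, looks_same) and the 0.05 threshold are all made up for illustration.

```python
import cmath
import math


def dft2_magnitude(image):
    """Magnitude of the 2D discrete Fourier transform of a small
    grayscale image, given as a list of equal-length rows of floats.
    Naive O(N^4) loop: fine for toy images, not for real plots."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out_row.append(abs(total))
        spectrum.append(out_row)
    return spectrum


def spectral_distance(img_a, img_b):
    """Relative L2 distance between the two magnitude spectra.
    0.0 means identical spectra; the scale is otherwise arbitrary."""
    mag_a = dft2_magnitude(img_a)
    mag_b = dft2_magnitude(img_b)
    diff = sum((a - b) ** 2
               for row_a, row_b in zip(mag_a, mag_b)
               for a, b in zip(row_a, row_b))
    norm = sum(a ** 2 for row_a in mag_a for a in row_a)
    return math.sqrt(diff / norm) if norm else math.sqrt(diff)


def looks_same(img_a, img_b, threshold=0.05):
    """Hypothetical pass/fail check; the threshold is an assumption."""
    return spectral_distance(img_a, img_b) <= threshold


# Toy 8x8 "plots": a diagonal line, a slightly fainter rendering of it
# (standing in for platform-dependent anti-aliasing), and a horizontal line.
N = 8
reference = [[1.0 if x == y else 0.0 for x in range(N)] for y in range(N)]
faded = [[0.98 * pixel for pixel in row] for row in reference]
other = [[1.0 if y == 3 else 0.0 for x in range(N)] for y in range(N)]
```

With these toy images, looks_same(reference, faded) passes and looks_same(reference, other) fails. The catch is that the magnitude spectrum is unchanged by a circular shift of the image, so a line moved over by one pixel would also pass; a comparison of this kind says nothing about where two images differ, which is presumably what the wavelet experiments were after.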

The second presentation was given by Ana Costa e Silva of Tibco. She introduced the Spotfire suite of products, addressing all elements of the “tower of data” (from KPIs to real-time analytics), and demonstrated one in particular, the “Tibco Enterprise Runtime for R” (TERR), a reimplementation of R with a different (might one say “saner”?) runtime engine; they offer a “community edition”, but it’s behind a sign-up form so I have not looked at it. The demonstrations focussed on the ability of developers to build user interfaces so that users (business decision-makers) could build their own models to answer their own questions; this makes data scientists happy. The question session, short but focussed, probed the real level of compatibility between TERR and R, and the company representatives did own up to the imperfections of package support, which mostly stem from Rcpp’s use of undocumented internals. I suppose that’s reasonable; while R’s semantics do in theory allow for the redefinition of the { function (yes, really), I doubt any published package of importance actually uses that.

The third presentation was probably the most entertaining; Simon Hailstone from the Royal Free NHS Foundation talked about ggplot and map-making, using as the case study building interesting maps relating the locations of hospital A&E departments, deprivation indexes, and various categories of ambulance service incidents. For this audience, the point was not the inferences that one could draw, but rather the tips for making good-quality graphical output with a reasonable amount of time and effort; Simon also covered a variety of data sources, how to manage shape files at different scales, and geocoding.

And then there was the networking; I did manage to speak to a few people, offering up in advance the enthusiastic but untrained labour of project students; we'll see what that brings. I had an enjoyable time, and I’ll try to make the next event in November.