I went to the londonR meetup, at Balls Brothers near Fenchurch Street. Partly this was to get out and about at geeky meetups, and partly this was to do a little bit of background activity for the Goldsmiths Data Science MSc. My first observation is that, compared to my more usual Lisp or Emacs meetups, this was pretty well-populated: there were maybe 100 (one hundred!) people attending the 3 (three!) presentations. Maybe some of that was the free bar effect, but since it wasn’t advertised as such, maybe not. The event in general was organized by Mango Solutions, but the particular instantiation was sponsored by Tibco. So, thanks!
Perhaps poetic justice for my slightly impure motives for attending was having the EARL event in September pushed at me, fairly gently; it does in fact look interesting, and I feel I should make an effort to attend, even given the fairly steep registration fee. I’m less interested in attending the R in Insurance conference, because it’s a bit too specific (though it is also substantially less expensive...)
I didn’t make the ggplot workshop – too much to do during the day – so I arrived just after the doors opened for the main event. There was some time for chatting before the presentations started; I got to talk to some of the Mango Solutions representatives, including a project manager hoping to apply data analytics to his own job. Who knows how that will go. But soon enough it was time to sit down (to make sure that I had a seat, the latest newspaper scare story notwithstanding) and try to follow and take notes on the presentations.
The first presentation, by Chris Campbell of Mango Solutions, talked about the work they were doing on validating R. The point here was to satisfy regulatory requirements on software, so that R can be used in heavily regulated industries (pharmaceuticals being perhaps the most obvious). The presentation didn’t go into detail – 30 minutes is not long, which is perhaps encouraging for my forthcoming 20-minute presentation in Bremen – but gave an overview of the basic problem, and flagged up some packages scheduled to be released at EARL in September: functionMap, for visualizing call graphs; testCoverage, for code coverage reports (like sb-cover for SBCL fans); and visualTest, for testing image similarity, given that the same R code produces not-quite-identical output on different operating systems. There wasn’t really time to go into the details of the image similarity algorithm, but the slide saying “Fourier Transform” made me worry a bit; it was picked up in the question session, and Chris motivated their current approach as being a sufficiently-sized hammer for their purposes – they’d experimented with wavelets to localize differences, but found that unnecessary. Still, I wonder whether a SIFT-based approach might be useful, and it’ll be interesting to see whether the similarity threshold in arbitrary units actually works...
The second presentation was given by Ana Costa e Silva of Tibco. She introduced the Spotfire suite of products, addressing all elements of the “tower of data” (from KPIs to real-time analytics), and demonstrated one in particular, the “Tibco Enterprise Runtime for R” (TERR): a reimplementation of R with a different (might one say “saner”?) runtime engine. They offer a “community edition”, but it’s behind a sign-up form, so I have not looked at it. The demonstrations focussed on the ability of developers to build user interfaces so that users (business decision-makers) could build their own models to answer their own questions; this makes data scientists happy. The question session, short but focussed, probed the real level of compatibility between TERR and R, and the company representatives did own up to imperfections in package support, mostly stemming from Rcpp’s use of undocumented internals. I suppose that’s reasonable; while R’s semantics do in theory allow for the redefinition of the { function (yes, really), I doubt any published package of importance actually uses that.
The third presentation was probably the most entertaining; Simon Hailstone from the Royal Free NHS Foundation talked about ggplot and map-making, using as a case study the construction of maps relating the locations of hospital A&E departments, deprivation indexes, and various categories of ambulance service incidents. For this audience, the point was not the inferences one could draw but the tips for producing good-quality output with a reasonable amount of time and effort; Simon also covered a variety of data sources, how to manage shape files at different scales, tips for better-quality graphical output, and geocoding.
And then there was the networking; I did manage to speak to a few people, offering up in advance the enthusiastic but untrained labour of project students; we’ll see what that brings. I had an enjoyable time, and I’ll try to make the next event in November.