Yeah, technical sessions starting at 08:30 in the morning... even if the majority of the delegates are staying onsite, that’s a little optimistic. It certainly was in my case; recovering from Wednesday’s early start was somewhat necessary. So, yes, I missed Thursday’s first plenary session on incremental optimization of performance measures, in favour of chocolate-festooned muesli (not convinced that I got the best of the deal, though) and joined the conference sessions at Alfred Inselberg’s presentation of parallel coordinates for exploration and visualization. As far as I can tell from the presentation, the principal benefit of parallel coordinates is providing a two-dimensional visualisation allowing for fairly straightforward interactive exploration of a dataset; although one particular parallel coordinate system imposes an ordering on the data dimensions, it doesn’t take too many permutations of the ordering (about half the overall dimensionality, by my count) to be able to visualise correlations between all pairs of dimensions. I’m not sure that Alfred’s exhortation to trust our eyes for pattern-finding is such a good one; maybe to an audience of this sophistication it doesn’t matter, but the balance between ease of finding patterns and ease of finding spurious patterns isn’t obvious. That said, the demo of the parallAX software was mildly nifty; not quite my cup of tea, but fine.
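The "about half the overall dimensionality" count can be checked with a small sketch. It assumes a zigzag ordering (0, 1, n-1, 2, n-2, ...) shifted cyclically, which is one standard way of building such axis orderings and not something taken from the talk; the function names are made up for illustration.

```python
from itertools import combinations


def zigzag_orderings(n):
    """Return ceil(n/2) axis orderings for n dimensions (hypothetical helper).

    Assumption: a zigzag base ordering 0, 1, n-1, 2, n-2, ... shifted
    cyclically by k = 0, 1, ... puts every pair of axes side by side
    in at least one ordering.
    """
    if n < 1:
        raise ValueError("need at least one dimension")
    base = [0]
    for i in range(1, n // 2 + 1):
        base.append(i % n)
        if len(base) < n:
            base.append((-i) % n)
    count = (n + 1) // 2  # about half the dimensionality
    return [[(axis + k) % n for axis in base] for k in range(count)]


def adjacent_pairs(orderings):
    """Collect every unordered pair of axes that appear next to each other."""
    pairs = set()
    for ordering in orderings:
        for left, right in zip(ordering, ordering[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def covers_all_pairs(n):
    """True if the zigzag orderings make every pair of axes adjacent somewhere."""
    wanted = {frozenset(pair) for pair in combinations(range(n), 2)}
    return adjacent_pairs(zigzag_orderings(n)) == wanted


if __name__ == "__main__":
    for n in (4, 5, 8, 11):
        print(n, len(zigzag_orderings(n)), covers_all_pairs(n))
```

So for, say, eight dimensions, four parallel-coordinate plots with different axis orderings are enough to put every pair of dimensions next to each other at least once.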

I then went to the session on data analysis in social sciences, largely because of the title of the first talk: effects of parenthood on well-being. The summary of the results of an investigation by Evgenia Samoilova with Colin Vance, trying to decouple various effects, might be: women show a rise on average in well-being when their adult children leave home; men, by contrast, show a fall at the same point (unless they had a particularly low well-being score in the first place); also on average, the strongest positive effect of children on well-being is exhibited by those who were already reporting a high well-being before becoming parents. The results should be taken with the usual pinches of salt, of course... The other talks in that session covered interventions to increase takeup of physical activity; a questionnaire evaluation of Pilates treatment for urinary incontinence; student life-style; and, as a bonus talk, an introduction to using Latent Dirichlet Allocation for topic modelling, with an extension to include Markov modelling between words (i.e. somewhat removing the “bag-of-words” document model).

I didn’t get very much out of Andreas Geyer-Schulz’s presentation on decision-makers without preferences; whether his approach really explains why the independence of irrelevant alternatives axiom is typically violated by human behaviour, I don’t know. (Really, this is a strike against the ECDA requiring and distributing only abstracts; the topic of the talk was interesting enough that I’d like to read the conference paper... but I can’t, because it doesn’t exist.) I next went to the “big data” session, and for me the best of the talks was definitely a small data one: Eyke Hüllermeier presenting Ammar Shaker’s work (minus his data, because of a cycling accident) on the distinction between epistemic and aleatoric uncertainty, and distinguishing between them in uncertainty sampling. I think my inner Bayesian would have represented this distinction using beta distributions rather than plausibilities (epistemic uncertainty being Beta(0,0) and aleatoric uncertainty for a coin flip being Beta(50,100) or so) but it was a nice presentation. Claus Weihs gave a talk on the relationships between big data analytics and “classical” data science, with some observations about the curse of dimensionality, and Dieter Joenssen talked about the dangers of naïve missing data imputation in the big data context.
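The Beta-distribution framing of epistemic versus aleatoric uncertainty mentioned above can be sketched roughly as follows. This is only an illustration of the Bayesian reading, not the method from the talk: it assumes a Bernoulli outcome with a Beta(a, b) belief over its success probability, and splits the predictive variance using the law of total variance. The function name is hypothetical, and since Beta(0,0) is improper the example stands in a flat Beta(1,1) for "know nothing".

```python
def beta_uncertainty(a, b):
    """Split predictive variance for a Bernoulli outcome with a Beta(a, b) belief.

    Returns (mean, epistemic, aleatoric), where
      epistemic = Var[p]       (uncertainty about the coin's bias itself)
      aleatoric = E[p(1 - p)]  (irreducible noise of the flip)
    and epistemic + aleatoric == mean * (1 - mean).
    Assumption: a, b > 0, so the improper Beta(0, 0) is excluded.
    """
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    n = a + b
    mean = a / n
    epistemic = (a * b) / (n * n * (n + 1))
    aleatoric = (a * b) / (n * (n + 1))
    return mean, epistemic, aleatoric


if __name__ == "__main__":
    # Beta(1, 1): flat belief, used here as a stand-in for near-total ignorance.
    # Beta(50, 100): the well-observed coin from the text above.
    for a, b in ((1, 1), (50, 100)):
        mean, epistemic, aleatoric = beta_uncertainty(a, b)
        print(f"Beta({a},{b}): mean={mean:.3f} "
              f"epistemic={epistemic:.4f} aleatoric={aleatoric:.4f}")
```

With many observations the epistemic term shrinks towards zero while the aleatoric term stays put, which is the distinction that matters for uncertainty sampling: only the first kind is reduced by asking for more labels.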

And then it was time for the social activity: a brief tour of Bremen old town, then a pleasant boat ride back to Bremen-Vegesack (topics of discussion on the boat: academic careers and publication metrics; the future of academic publishing; wind, solar and coal power; and other cheerful stuff). Over dinner we talked about Hong Kong and about image analysis in marketing and branding, and I also talked a bit about my work, going on the basis that there might be only a few people in the session where I was due to present formally on Friday, and so I wouldn’t be spoiling things for anyone.

And so it eventually proved. The data analysis in musicology session was not the most popular; who knows what everyone was doing instead? (Not tweeting.) But I got a nice laugh from superposing SIFT features on a picture of a cute kitten, and the surprise payoffs in the image/music similarity analysis (spoiler: parody motif reuse, movable type reuse, title pages) got a reaction. The questions, justifiably, were about scaling; I don’t know if the audience bought the idea that it was mostly OK to do this book-by-book (and so not pay too many penalties for super-linear algorithms); honestly, I don’t know if I buy that myself, though it’s an easy thing to try...

The other talks in the session covered a comparison of metadata and audio features for genre classification (from Igor Vatolkin), where the result was, as usual, that even dodgy metadata from playlist co-occurrence appears to beat anything that we can do with audio. Tillman Weyde presented the state of the Digital Music Lab project, at this point describing the aims and infrastructure design, rather than any results (they hope to have some results as well as infrastructure before the end of the project). The last talk was about characterising musical scales; apparently there are thousands of scales in the Scala (no relation to the programming language) scale dataset, few of which have any metadata beyond a name. They tried looking at the relationship between the scale notes (expressed as a bitvector of 1200 entries) and the name, finding pretty much nothing. I suggested that they model the scale as a (multi)set of intervals between notes in the scale rather than a set of intervals between notes and the root; I don’t expect that there will be any improved association between name and content, but maybe the cluster profiles will be richer in the end.
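To make the two scale representations concrete, here is a minimal sketch of one possible reading of each. It assumes scales are given as cents above the root, rounded to whole cents and folded into a single 1200-cent octave (real Scala files need not be octave-repeating, so this is a simplification), and the function names are invented for the example.

```python
from collections import Counter
from itertools import permutations

OCTAVE_CENTS = 1200  # assumption: octave-repeating scales, one bit per cent


def root_bitvector(scale_cents):
    """1200-entry bitvector with a 1 at each note's offset from the root."""
    bits = [0] * OCTAVE_CENTS
    for cents in scale_cents:
        bits[round(cents) % OCTAVE_CENTS] = 1
    return bits


def interval_multiset(scale_cents):
    """Multiset of intervals (in cents, mod one octave) between every
    ordered pair of notes in the scale, not just between note and root."""
    notes = [round(cents) % OCTAVE_CENTS for cents in scale_cents]
    return Counter((b - a) % OCTAVE_CENTS for a, b in permutations(notes, 2))


if __name__ == "__main__":
    # Equal-tempered major scale and its second mode (Dorian): the same
    # notes, read from a different root.
    major = [0, 200, 400, 500, 700, 900, 1100]
    dorian = [0, 200, 300, 500, 700, 900, 1000]
    print(root_bitvector(major) == root_bitvector(dorian))
    print(interval_multiset(major) == interval_multiset(dorian))
```

Under these assumptions the root-relative bitvectors of two modes of the same scale differ, while their interval multisets coincide, which is the sort of structure a clustering might pick up on.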

For the final semi-plenary (an odd choice, to divide the audience in two at the last) I went to listen to Joaquin Vanschoren talk about his not-quite-launched OpenML project, which I think is an attempt to be myexperiment for data analysts (but, why not use myexperiment?). Certainly, the aims are good – to get people out of their silos; to encourage code and data reuse, and experiment reproducibility; to increase academic credit given for collecting data and writing programs – but will it catch on? Who knows. It didn’t seem totally compelling to me, and I think the audience had reservations for other reasons – there were worries expressed about loss of control, and about the Open Source requirement for software. I think the question at the end (how many users does it have? 45? And not all of those are actually active?) was perhaps the most problematic, and just reflects that bootstrapping is hard.

Then it was really all over (my full, telegraphic notes here), and after a sandwich lunch I went back to Bremen city to see multiple instantiations of the town musicians, and to hear the glockenspiel. Then I did my usual thing of walking to the airport (not a particularly exciting or revealing walk, this time, though I did go through quite a nice park), and it’ll soon be time to submit to the multiple indignities involved in getting home. Will I go to ECDA next year? Well, on the plus side it’s fairly local (in Colchester), so it could be more practical; on the minus side, it does seem that music informatics is a minority sport, bolted on because someone on the committee is interested in it as a sideline. On the plus side, I found quite a lot of the vanilla data analysis pretty interesting anyway, even if I’m not actively working in the direct area, so I think it’s on the “possibly” list.