ecda days 2 and 3 (2014-07-06), http://christophe.rhodes.io/notes/blog/posts/2014/ecda_days_2_and_3/
<p>Yeah, technical sessions starting at 08:30 in the morning... even if
the majority of the delegates are staying onsite, that’s a little
optimistic. It certainly was in my case; recovering from
<a href="http://christophe.rhodes.io/notes/blog/posts/2014/ecda_day_1/">Wednesday’s early start</a> was somewhat necessary. So,
yes, I missed Thursday’s first plenary session on incremental
optimization of performance measures, in favour of chocolate-festooned
muesli (not convinced that I got the best of the deal, though) and
joined the conference sessions at Alfred Inselberg’s presentation of
<a href="http://en.wikipedia.org/wiki/Parallel_coordinates">parallel coordinates</a>
for exploration and visualization. As far as I can tell from the
presentation, the principal benefit of parallel coordinates is
providing a two-dimensional visualisation allowing for fairly
straightforward interactive exploration of a dataset; although one
particular parallel coordinate system imposes an ordering on the data
dimensions, it doesn’t take too many permutations of the ordering
(about half the overall dimensionality, by my count) to be able to
visualise correlations between all pairs of dimensions. I’m not sure
that Alfred’s exhortation to trust our eyes for pattern-finding is
such a good one; maybe to an audience of this sophistication it
doesn’t matter, but the balance between ease of finding patterns and
ease of finding spurious patterns isn’t obvious. That said, the demo
of the parallAX software was mildly nifty; not quite my cup of tea,
but fine.</p>
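<p>The "about half the overall dimensionality" count can be checked with a small sketch. It uses a standard zigzag construction, which is my choice of illustration and not something taken from the talk; the function names <code>axis_orderings</code> and <code>adjacent_pairs</code> are made up for the purpose. For an odd number of dimensions it pads to an even count and drops the extra axis afterwards.</p>

```python
from itertools import combinations


def axis_orderings(n_dims):
    """Return ceil(n_dims / 2) axis orderings in which every pair of
    dimensions sits on neighbouring axes at least once."""
    m = n_dims + (n_dims % 2)  # pad to an even count; the extra axis is dropped below
    # Zigzag base ordering: 0, 1, m-1, 2, m-2, ..., m/2
    base = [0]
    for step in range(1, m // 2 + 1):
        base.append(step)
        if step != m // 2:
            base.append(m - step)
    orderings = []
    for shift in range(m // 2):
        # Rotate the zigzag by `shift`, then remove the padding axis if present
        ordering = [(axis + shift) % m for axis in base]
        orderings.append([axis for axis in ordering if axis < n_dims])
    return orderings


def adjacent_pairs(orderings):
    """Collect every unordered pair of axes that are neighbours in some ordering."""
    pairs = set()
    for ordering in orderings:
        for left, right in zip(ordering, ordering[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


if __name__ == "__main__":
    for ordering in axis_orderings(6):
        print(ordering)
```

<p>For six dimensions this yields three orderings, and between them all fifteen pairs of dimensions appear side by side.</p>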
<p>I then went to the session on data analysis in social sciences,
largely because of the title of the first talk: effects of parenthood
on well-being. The summary of the results of an investigation by
Evgenia Samoilova with Colin Vance, trying to decouple various effects,
might be: women show a rise on average in well-being when their adult
children leave home; men, by contrast, show a fall at the same point
(unless they had a particularly low well-being score in the first
place); also on average, the strongest positive effect of children on
well-being is exhibited by those who were already reporting a high
well-being before becoming parents. The results should be taken with
the usual pinches of salt, of course. The other talks in that
session covered interventions to increase takeup of physical activity;
a questionnaire evaluation of pilates treatment for urinary
incontinence; student life-style; and, as a bonus talk, an
introduction to using
<a href="http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation">Latent Dirichlet Allocation</a>
for topic modelling, with an extension to include Markov modelling
between words (i.e. somewhat removing the “bag-of-words” document
model).</p>
<p>I didn’t get very much out of Andreas Geyer-Schulz’s presentation on
decision-makers without preferences; whether his approach really
explains why the
<a href="http://en.wikipedia.org/wiki/Independence_of_irrelevant_alternatives">independence of irrelevant alternatives</a>
axiom is typically violated by human behaviour, I don’t know.
(Really, this is a strike against the ECDA requiring and distributing
only abstracts; the topic of the talk was interesting enough that I’d
like to read the conference paper... but I can’t, because it doesn’t
exist.) I next went to the “big data” session, and for me the best of
the talks was definitely a small data one: Eyke Hüllermeier presenting
Ammar Shaker’s work (minus his data, because of a cycling accident) on
the distinction between epistemic and aleatoric uncertainty, and
distinguishing between them in uncertainty sampling. I think my inner
Bayesian would have represented this distinction using beta
distributions rather than plausibilities (epistemic uncertainty being
<em>Beta</em>(0,0) and aleatoric uncertainty for a coin flip being
<em>Beta</em>(50,100) or so) but it was a nice presentation. Claus Weihs
gave a talk on the relationships between big data analytics and
“classical” data science, with some observations about the curse of
dimensionality, and Dieter Joenssen talked about the dangers of
naïve missing data imputation in the big data context.</p>
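<p>To make the beta-distribution reading of that distinction concrete, here is a minimal sketch. It is my own illustration, not anything from the talk: it splits the variance of a single coin-flip outcome into a part due to not knowing the bias (epistemic) and a part due to the flip itself (aleatoric), using the law of total variance. Since <em>Beta</em>(0,0) is improper, the sketch substitutes <em>Beta</em>(0.01,0.01) as a stand-in; the function name <code>beta_uncertainty</code> is hypothetical.</p>

```python
def beta_uncertainty(a, b):
    """Split the variance of one Bernoulli outcome whose bias p ~ Beta(a, b).

    epistemic = Var[p]         (uncertainty about the bias itself)
    aleatoric = E[p * (1 - p)] (randomness left even if the bias were known)
    The two sum to mean * (1 - mean), the total variance of the outcome.
    """
    total = a + b
    mean = a / total
    epistemic = (a * b) / (total ** 2 * (total + 1))
    aleatoric = (a * b) / (total * (total + 1))
    return {"mean": mean, "epistemic": epistemic, "aleatoric": aleatoric}


if __name__ == "__main__":
    # Beta(0.01, 0.01) is an assumed proper stand-in for the improper Beta(0, 0)
    print("near-ignorance:", beta_uncertainty(0.01, 0.01))
    print("well-observed coin:", beta_uncertainty(50, 100))
```

<p>With almost no pseudo-observations nearly all of the variance is epistemic; with 150 of them nearly all of it is aleatoric, which is the behaviour an uncertainty sampler would want to tell apart.</p>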
<p>And then it was time for the social activity: a brief tour of Bremen
old town, and a pleasant boat ride back to Bremen-Vegesack (topics of
discussion on the boat: academic careers and publication metrics; the
future of academic publishing; wind, solar and coal power; and other
cheerful stuff). Over dinner we talked about Hong Kong, image analysis
in marketing and branding, and I also talked a bit about my work,
going on the basis that there might be only a few people in the session
where I was due to present formally on Friday, and so I wouldn’t be
spoiling things for anyone.</p>
<p>And so it eventually proved. The data analysis in musicology session
was not the most popular; who knows what everyone was doing instead?
(<a href="https://twitter.com/ascii19/status/484801739041038336">Not tweeting</a>.)
But I got a nice laugh from superposing SIFT features on a picture of
a cute kitten, and the surprise payoffs in the
<a href="http://christophe.rhodes.io/notes/wiki/musical_image_deduplication/">image/music similarity analysis</a>
(spoiler: parody motif reuse, movable type reuse, title pages) got a
reaction. The questions, justifiably, were about scaling; I don’t
know if the audience bought the idea that it was mostly OK to do this
book-by-book (and so not pay too many penalties for super-linear
algorithms); honestly, I don’t know if I buy that myself, though it’s
an easy thing to try...</p>
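<p>The book-by-book argument is easy to put rough numbers on. The sketch below assumes the expensive step is an all-pairs comparison of page images, which is quadratic; the talk only claimed "super-linear", so treat that as my assumption, and the page counts as invented for illustration.</p>

```python
def pairwise_comparisons(n_items):
    """Number of unordered pairs among n_items."""
    return n_items * (n_items - 1) // 2


def compare_costs(pages_per_book):
    """Return (whole-collection pairs, book-by-book pairs) for the given books."""
    whole = pairwise_comparisons(sum(pages_per_book))
    per_book = sum(pairwise_comparisons(n) for n in pages_per_book)
    return whole, per_book


if __name__ == "__main__":
    # Hypothetical collection: 100 books of 200 pages each
    whole, per_book = compare_costs([200] * 100)
    print(whole, per_book, whole / per_book)
```

<p>Under those assumptions the book-by-book approach does about a hundredth of the work, at the price of never seeing reuse that crosses book boundaries.</p>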
<p>The other talks in the session covered a comparison of metadata and
audio features for genre classification (from Igor Vatolkin), where
the result was, as usual, that even dodgy metadata from playlist
co-occurrence appears to beat anything that we can do with audio.
Tillman Weyde presented the state of the Digital Music Lab project, at
this point describing the aims and infrastructure design, rather than
any results (they hope to have some results as well as infrastructure
before the end of the project). The last talk was about
characterising musical scales; apparently there are thousands of
scales in the <a href="http://www.huygens-fokker.org/scala/"><em>Scala</em></a> (no
relation to the <a href="http://www.scala-lang.org/">programming language</a>)
<a href="http://www.huygens-fokker.org/scala/downloads.html#scales">scale dataset</a>,
few of which have any metadata beyond a name. They tried looking at
the relationship between the scale notes (expressed as a bitvector of
<a href="http://en.wikipedia.org/wiki/Cent_%28music%29">1200 entries</a>) and the
name, finding pretty much nothing; I suggested that they model the
scale as a (multi)set of intervals between notes in the scale rather
than a set of intervals between notes and the root; I don’t expect
that there will be any improved association between name and content,
but maybe the cluster profiles will be more rich in the end.</p>
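<p>A minimal sketch of the two representations, as I understand them. The assumptions are mine: pitches are rounded to whole cents, the octave is taken as 1200 cents, and "intervals between notes" is read as all ordered pairs of scale notes, modulo the octave. The function names are hypothetical and nothing here is taken from the presenters' code.</p>

```python
from collections import Counter

OCTAVE_CENTS = 1200  # assumes octave-repeating scales


def root_bitvector(scale_cents):
    """1200-entry bitvector: entry c is 1 if some note lies c cents above the root."""
    bits = [0] * OCTAVE_CENTS
    for note in scale_cents:
        bits[round(note) % OCTAVE_CENTS] = 1
    return bits


def interval_multiset(scale_cents):
    """Multiset of intervals (in cents, modulo the octave) between all ordered
    pairs of distinct scale notes."""
    notes = sorted({round(note) % OCTAVE_CENTS for note in scale_cents})
    return Counter((b - a) % OCTAVE_CENTS for a in notes for b in notes if a != b)


if __name__ == "__main__":
    major = [0, 200, 400, 500, 700, 900, 1100]
    dorian = [0, 200, 300, 500, 700, 900, 1000]  # the same notes, rooted a tone higher
    print(root_bitvector(major) == root_bitvector(dorian))
    print(interval_multiset(major) == interval_multiset(dorian))
```

<p>The point of the interval view is that it ignores the choice of root: two modes of the same scale get different bitvectors but identical interval multisets, so clusters should group scales by internal structure.</p>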
<p>For the final semi-plenary (an odd choice, to divide the audience in
two at the last) I went to listen to Joaquin Vanschoren talk about his
not-quite-launched <a href="http://openml.org/">OpenML</a> project, which I think
is an attempt to be <a href="http://www.myexperiment.org/">myexperiment</a> for
data analysts (but why not use myexperiment?). Certainly, the aims
are good – to get people out of their silos; to encourage code and
data reuse, and experiment reproducibility; to increase academic
credit given for collecting data and writing programs – but will it
catch on? Who knows. It didn’t seem totally compelling to me, and I
think the audience had reservations for other reasons – there were
worries expressed about loss of control, and about the Open Source
requirement for software. I think the question at the end (how many
users does it have? 45? And not all of those are actually active?)
was perhaps the most problematic, and just reflects that bootstrapping
is <em>hard</em>.</p>
<p>Then it was really all over (my full, telegraphic notes
<a href="http://christophe.rhodes.io/notes/wiki/ecda_2014/">here</a>), and after a sandwich lunch I went back to
Bremen city to see multiple instantiations of the
<a href="http://www.pitt.edu/~dash/grimm027.html">town musicians</a>, and to hear
the <a href="http://en.wikipedia.org/wiki/Glockenspiel_House">glockenspiel</a>.
Then I did my usual thing of walking to the airport (not a
particularly exciting or revealing walk, this time, though I did go
through quite a nice park), and it’ll soon be time to submit to the
multiple indignities involved in getting home. Will I go to ECDA next
year? Well, on the plus side it’s fairly local (in Colchester), so it
could be more practical; on the minus side, it does seem that music
informatics is a minority sport, bolted on because someone on the
committee is interested in it as a sideline. On the plus side, I
found quite a lot of the vanilla data analysis pretty interesting
anyway, even if I’m not actively working in the direct area, so I
think it’s on the “possibly” list.</p>