I'm on a train!

Specifically, I'm on a train, still writing my talk on Tools for Music Informatics (previously). It's to be delivered to a mixed audience, including technologists already embedded in the Music Informatics area, and musicologists who have no idea what it is but have heard that there might be some money attached if they can spin an idea the right way. This mixed audience has led to the talk having something of an identity crisis, anthropomorphically speaking: should it be focussed on tools that have been used to perform musicological tasks? Should it be focussed on workflows that are simple to relate to? In the end, constrained by the fact that my slot is 25 minutes long at most, I have gone for surveying tools that are commonly used in the Music Informatics world, and biasing my survey towards those which provide visuals that can be used in a talk.

It turns out that the constraint of having example pictures is quite a strong one. I appreciate that academic researchers are not necessarily oriented towards powerpoint-style marketing; for many, and generally I include myself in this category, the ideas and results stand on their own merits. But in this area of software as output, where there's the possibility of scaling one's impact beyond the set of people one can talk to, to anyone who can discover and use the software, why is the user-facing documentation so poor?

Well, one reason is that creating good documentation is hard, and not necessarily a skill that goes hand in hand with academic research or software development; in the lone-wolf model of academia, the chances of finding someone who can do all three are low. But the fact that there are several software packages in Music Informatics, all doing approximately the same thing and all lamentably documented, suggests that there might be other reasons, and one of the benefits of being on a train and needing to procrastinate away from finishing the talk is that I get to discuss those reasons with my captive colleagues.

So, here's a cynical view. There are two sets of competing incentives that, together, might explain why there are a fair number of underdocumented PhD-level software frameworks, and why substantial communities don't form around the further development of such software. The first relates to the principal developer of the software, assuming that they are still in academia (if they're not, then there's probably no incentive to do anything more with the software at all). In the academic case, the career ladder and prestige indicators are constructed such that the credit for new things is vastly greater than the credit for improving already-existing things. In particular, once a permanent academic position has come along, a potentially demanding userbase is actively detrimental, taking time away from the real business of academics, which is to write funding applications in order to hire researchers to do research and write papers on which one gains author credit. At that point, the software might still help to generate research results, but the most valuable users are likely to be in close physical proximity to the lead developer, and so can learn straight from the horse's mouth.

The second incentive applies to the principal source of uncommitted labour in the system: the graduate student. A graduate student, in theory, could be doing anything with their time; in practice, they are guided in their activities by a supervisor, who suggests ideas and projects and whose goal is the successful construction of a dissertation showing substantial work at PhD level. As in the previous case, the student gains more credit for new things than for improving existing things, and the student's contribution to independent work will be clearer to the supervisor and the doctoral examiners than their contribution to a collaboration, particularly a dynamic collaboration such as software development and maintenance. So an otherwise uncommitted supervisor will likely recommend that a student interested in software development for music informatics develop their own software rather than invest time and effort in one of the existing packages: the outcome for the student's dissertation is likely to be better, as an independent implementation can make up a chapter or more, while participating in a development community will most likely not be usable in a dissertation at all.

After all the cynicism, what's the solution? Changing the academic prestige system is clearly a long-term prospect, and likely to run into game-theory buffers; while it's nice to imagine a world where community contribution is as valued as independent “discovery”, I wouldn't hold my breath. But this, it seems to me, is where agitating for more reproducible research is important: not only does it lower the barrier to participating in the scientific process, it also provides built-in tutorial material, in a real research context, for any and all software that is used in the research itself. Initiatives such as conferences awarding prizes for reproducible research, or journals mandating that reproducibility materials be deposited alongside manuscripts, have the potential to improve the reuse of software as well as the verifiability of the research results themselves.