Many people have their own ways of managing their digital music collection. This is my way.

I am a relatively late convert to medium-less music. I think I might have been a relatively early adopter of digital music – or at least I was actively making purchasing decisions at a time when CDs were generally available and not too much more expensive than cassette tapes. But I've written before about my reluctance to embrace the shiny; I've been resisting digital audio files for longer than mobile phones have been able to play music (not counting Annoying Things).

And yet, there are obvious and clear advantages to having a digital and easily portable collection of music – brought home to me when I was regularly commuting 90 minutes on trains in each direction for work, and my “mp3 player” was a 1TB external hard drive weighing upwards of 1kg. Over a few years I used grip, and then sound-juicer, to convert my physical music collection of CDs to audio files; at the moment, because of something (I don't know what) in the sound-juicer chain causing the generation of broken FLACs, I use gstreamer directly:

# cd-discid prints the disc ID followed by the track count (second field)
for track in $(seq 1 "$(cd-discid | cut -d' ' -f2)")
do
  gst-launch-0.10 cdparanoiasrc track="$track" paranoia-mode=255 ! \
    flacenc ! filesink location="track$track.flac"
done
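
Since broken FLACs were what pushed me away from sound-juicer, it seems worth checking the output of the loop before trusting it. What follows is only a sketch: it assumes the reference `flac` command-line tool is installed, and `check_flacs` is a hypothetical helper name, not part of any of the tools above.

```shell
# Print the name of every FLAC file in a directory that fails to decode.
# Assumption: the reference `flac` tool is installed. `flac --test`
# decodes a file without writing output; `--silent` suppresses progress.
check_flacs() {
  dir=${1:-.}
  rc=0
  for f in "$dir"/*.flac
  do
    [ -e "$f" ] || continue        # the directory contained no FLAC files
    if ! flac --test --silent "$f"
    then
      echo "broken: $f"
      rc=1
    fi
  done
  return $rc
}
```

Calling `check_flacs .` in the directory holding the freshly ripped tracks should print nothing and return success if every file decodes cleanly.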

One of the nice things that sound-juicer did automatically was to import curated metadata from MusicBrainz; gstreamer doesn't do that itself, so I have had to learn how to use the MusicBrainz tagger picard to add decent-quality metadata to the audio files automatically – at least when it's available at MusicBrainz; perhaps more commonly, given my long-tail predilections, I use picard with the CD Table of Contents as an initial seed for contributing that metadata to the community. (There's a reason why most of my CD recordings of opera are not yet format-shifted: the metadata standards for opera on MusicBrainz are fiddly, tedious and error-prone.)

Now I have losslessly-compressed CD-quality audio files with high-quality metadata, and all is well. But if there's one thing that working with computers teaches us, it's that if the data is not backed up, it's already lost: you just don't know it yet. Not only that, but it would be good to have access to the music collection wherever I am, ideally without having to carry a “portable” hard drive along with everything else. The good news is that there's a tool for this: git-annex. (I suspect it is not coincidental that I use multiple tools by the same author: git-annex is written by Joey Hess, who is also responsible for ikiwiki.) I have git-annex repositories on computers and external hard drives both at work and at home, and any additions – or modifications, for example from retagging – can be synchronised across the checkouts. As long as I remember to get the content as well as git-annex's symbolic links, distribution and offsite backup requirements are automatically satisfied, and git-annex even stores old copies in case of human error, which is definitely liberating: I have a safety net, so I'm free to try to fly. (Maybe that's over the top for a digital audio tagging workflow...)
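
In command terms, the add-then-synchronise cycle might look something like the following. This is a sketch rather than my exact incantation: `annex_publish` and `annex_fetch` are hypothetical helper names, and the code assumes it is run inside an existing git-annex repository whose remotes are already configured.

```shell
# Hypothetical helpers; assume the current directory is inside an
# existing git-annex repository with its remotes already set up.

# In the checkout where new or retagged files live: add them to the
# annex, then sync (which commits the staged changes and exchanges
# git metadata with the remotes).
annex_publish() {
  git annex add "$@" &&
  git annex sync
}

# In every other checkout: pick up the new symbolic links, then fetch
# the file content they point to.
annex_fetch() {
  git annex sync &&
  git annex get .
}
```

The `git annex get .` step is the one that is easy to forget: without it, a checkout holds only the symbolic links and not the music itself.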

Once the audio files are updated everywhere and checked out, the only thing remaining for digital audio utopia is to ask the various media servers to reindex the content. On my GNOME-based desktop, I'm a little bit out of luck at the moment; the bright new hope for music playing is gnome-music, which uses tracker for its indexing, and tracker doesn't currently follow symlinks – and the workaround of checking out an annex in direct mode is like cutting holes in my safety net. For my home music system, I use Logitech Media Server, and there it's as simple as M-x squeeze RET resc TAB RET (and I'd like it to be simpler! Hacking welcome).
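
For anyone without the Emacs interface to hand, the rescan can probably also be triggered from a shell. The sketch below assumes that the server's text command-line interface is enabled on its default TCP port 9090 and that `nc` is installed; `lms_rescan` is a hypothetical helper and `musicbox.local` is a placeholder hostname, not my actual setup.

```shell
# Ask a Logitech Media Server to rescan its library via its text CLI.
# Assumptions: the CLI is listening on its default TCP port 9090, `nc`
# is installed, and "musicbox.local" is a placeholder hostname.
lms_rescan() {
  host=${1:-musicbox.local}
  port=${2:-9090}
  # Send the rescan command, then close the session.
  printf 'rescan\nexit\n' | nc "$host" "$port"
}
```

Something like `lms_rescan myserver` would then stand in for the `M-x squeeze` dance, if the assumptions above hold for your installation.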

To summarize:

  • Workflow:
    1. format-shift from CD audio to FLAC using gstreamer
    2. retag using picard
    3. import into git-annex
    4. sync git-annex repositories
    5. perform git-annex get on
      • the external hard drive (the same one!) which acts as the music source for my Slim Squeezebox Logitech Media Server;
      • the checkout in ~/Music/ on my workstation in the office.
    6. cause various servers to rescan or reindex their Music databases.
  • Current bugs:

All this was brought to mind because I recently made my first purchase of recordings in digital audio file form: I was listening idly to Radio 3, and heard this after the end of the evening concert. My ears tuned in gradually, and then my brain started sending contradictory messages: “it's by Pergolesi” “it's not by Pergolesi” “it's by Stravinsky” “wait what?”. I missed the announcer's statement of what the piece was, and spent a frustrating evening trying to google things like “piano arrangement "se tu m'ami"” and totally failing to find it.

The next day, I belatedly realised why they had played what they did: the following evening's concert was Marc-André Hamelin playing Khachaturian, and again I was listening idly and not really paying attention. But I paid attention to the encore, announced as: “the Minute Waltz... played in seconds!” and again, I thought “wait, what?”. Have a listen....

And, convinced that I wanted to have easy access to more of Hamelin's recorded material, I was pleased to discover that he has recorded for hyperion – which company offers digital downloads in losslessly-encoded DRM-free CD-quality FLAC format. Hooray! I feel like I have joined the 21st century.