Fixing some mistakes...

I was in discussion with some people on a Classical music forum recently. Topic of discussion: yet again, the issue of how you go about tagging your music collection so that it works efficiently and in a scalable manner to achieve good music discovery and access. Of course, I long ago decided I had the correct approach to that!

Anyway, the discussion did what it usually does: when push comes to shove, two of the people declaring my proposals unworkable turn out not to bother with tagging music at all (too much hard work!). So they simply rely on their operating system's search engine (which, in Windows' case, keeps changing as new OS versions are released, but let's not worry about that just for now!), or on a physical storage hierarchy that they can traverse easily in Windows' file manager. For the record, if you've structured your music appropriately physically, it's trivially easy to use that physical structure to back-port into metadata tags in your music files, and thus most of the hard work of structuring it logically has already been done. But let's not worry about that just for now, either!

The other person insisting that my tagging guidelines wouldn't, couldn't possibly work turns out to be running on Apple hardware and using what used to be called iTunes, but is now called Apple Music. He was adamant that storing a lot of information in the COMMENT tag, as my tagging guidelines propose, can't possibly work on Apple, because iTunes/Apple Music cannot search the COMMENT tag. So I installed iTunes on Windows and proved him wrong: iTunes on Windows, at least, very definitely does search the COMMENT tag, but not by default and not as easily as most other music players do.

Anyway: during the discussion, I thought I should try to firm up precisely why I propose you should tag your classical music in a composer/genre/extended-album-name manner. As a former database administrator, it's obvious to me that those things constitute the "primary key" of recorded classical music, but I get why it might not be so obvious to people who don't even really know what a "primary key" is. So I wrote up a new article addressing precisely the question of what are the minimum, unique natural identifiers that distinguish this recording from that one: the result is a punningly-entitled article, Primary Keys to Music. I think it explains the principles behind data retrieval pretty well in this particular context, and I recommend you have a read.

Funny thing about writing up things from the point of view of theoretical principle, however: you suddenly realise you've been breaking your own rules all the time! Specifically, the new article makes it clear that the year a recording was made should be included in the extended album name. That is, "Symphony No. 5" is not unique; but neither is "Symphony No. 5 (Karajan)", because Karajan recorded Beethoven's symphony multiple times in the course of his career. Therefore, the only truly unique album name would be "Symphony No. 5 (Karajan - 1966)". I can't think of any examples of the same conductor recording the same work in the same year more than once, so I don't think a concluding number is really needed, but, if you thought it was anyway, then "Symphony No. 5 (Karajan - 1966 #1)" would probably do. I also remarked in passing when writing that new article that whilst you might not need the recording date in your album names, because you've only got one copy of Beethoven's 5th Symphony recorded by Karajan, you ought to include the recording date anyway, because you never know when you might just acquire a different recording by Karajan in the future. If it's ever possible, even theoretically, for the conductor's name to be insufficient to distinguish between recordings, the recording date should be considered compulsory.
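To make the naming pattern concrete, here is a minimal sketch in Python of how such an extended album name could be assembled. The function name `extended_album_name` and its parameters are invented for this illustration, and the optional `take` argument stands in for the "concluding number" mentioned above.

```python
from typing import Optional


def extended_album_name(work: str, artist: str, year: int,
                        take: Optional[int] = None) -> str:
    """Build an ALBUM value of the form 'Work (Artist - Year)'.

    `take` is the optional concluding number, for the unlikely case of two
    recordings of the same work by the same artist in the same year.
    (Hypothetical helper: the name and signature are assumptions.)
    """
    suffix = f" #{take}" if take is not None else ""
    return f"{work} ({artist} - {year}{suffix})"


print(extended_album_name("Symphony No. 5", "Karajan", 1966))
# Symphony No. 5 (Karajan - 1966)
print(extended_album_name("Symphony No. 5", "Karajan", 1966, take=1))
# Symphony No. 5 (Karajan - 1966 #1)
```

The point of making `year` a required argument is the same as the point of the paragraph: if a piece of data is part of the key, you shouldn't be able to leave it out.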

I found out the hard way about this principle years and years ago: I never used to include even the conductor's name in my ALBUM tags, because... well, as an impecunious youngster just starting out in collecting classical music, I didn't have duplicates of works. I didn't need to say "Tosca - Callas", because I only had one recording of Tosca! And then I went and bought Sutherland's recording of it, and lo! I now had two "Toscas" and needed to be able to tell the two recordings apart. So, I long ago made inclusion of the 'distinguishing artist' in the ALBUM tag mandatory: even if you don't functionally need it now, you will one day!

Well, I had forgotten this lesson! I have always allowed the year of recording to go into the ALBUM tag when necessary. It's right there in my original 'Axioms of Classical Tagging' article, Section 2.4: "Exceptionally, therefore, when the distinguishing artist name is not distinguishing enough, add the year of the recording". The trouble with that sentence is its first word: since we're talking about the data items which make up the primary key of recorded music, you must not make any one of them usable only 'exceptionally'. Primary key data is mandatory, always! Otherwise, it's not a primary key 🙂

Which is, of course, all highly theoretical stuff and you could probably just ignore it and get away with it 99 times out of 100! Only, kismet being what it is, I happened shortly after writing that new article about Primary Keys to Music to go and buy a new recording of Aaron Copland's Symphony No. 3. It's a fine piece, and I wasn't familiar with it, and I wanted to add it to my collection, especially as Aaron Copland himself was conducting on this particular recording. So I bought it, ripped it, catalogued it, and went to add it into my existing music library when... 'File to be copied already exists' said my file manager. Turns out, I already had a copy of Copland's 3rd Symphony... and he was already conducting that version, too! Talk about chickens coming home to roost: the timing couldn't have been more apposite! There I was noticing that, theoretically, recording date ought to be included in the ALBUM tag; here I was just a week or so later finding out the hard way what happens when it isn't. If you leave it out because you don't currently have a clash on composition name + conductor, you will encounter problems when at some point in the future you buy a new recording and suddenly find out that you do now!

The fix, of course, was to name my new recording "Symphony No. 3 (Copland - 1958)", so that the recording year was present and thus distinguished the new purchase from "Symphony No. 3 (Copland - 1976)", which was my earlier version of the work.

Anyway, I re-learnt the lesson: if a piece of data is required for uniqueness theoretically, then it must be present, even if your situation today means that, practically, it needn't be. Because your situation today won't be the situation you are in tomorrow or the next day!

All of which is a long-winded way of saying that I realise I've been cheating on my tagging all this time. Whilst I've added recording years to the YEAR tag, I haven't been adding them into the ALBUM tag, because I was obeying the "do it only exceptionally" principle, which is simply wrong.

To fix up my large music collection would take some time, since there are over 64,000 files involved, representing about 7 months of continuously-playing music! But fixed up it needed to be. The fix needed to come in three parts:

  1. Go through my collection and make sure every recording has a YEAR tag
  2. Go through my collection and check that, wherever a year is included in the ALBUM tag, it matches the year stored in the YEAR tag
  3. Go through every recording and, if a year is not present in the ALBUM tag, put the value from the YEAR tag into the ALBUM tag and rename the folder to match, too

Most of my recordings do have YEAR tags -but quite a few that I ripped way back in the early 2000s didn't, because I personally am not terribly interested in when a recording was made and so didn't think I needed to record it! So some of those early rip howlers needed to be rectified.

Once every rip has a YEAR, I need to check that it matches the year found in the ALBUM tag, if indeed the ALBUM tag already mentions a year. For example, my fat fingers have been known to rip "Symphony No. 3 (Copland - 1958)" and then to tag the YEAR with "19958". So, I need to find out where "1958" doesn't match "19958" and manually correct where necessary.
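As a rough illustration of that comparison, here is a minimal Python sketch. It is not one of the shell scripts mentioned in this post; the names `album_year` and `year_mismatch` are made up for the example, and it assumes ALBUM values end in "(Artist - YYYY)", optionally followed by a "#n" inside the brackets. How the ALBUM and YEAR values get read out of the files is left aside: the sketch just takes them as strings.

```python
import re
from typing import Optional

# Matches a trailing "(Artist - 1958)" or "(Artist - 1958 #1)" and
# captures the four-digit year. (Assumed ALBUM layout, see lead-in.)
ALBUM_YEAR = re.compile(r"\(.+ - (\d{4})(?: #\d+)?\)\s*$")


def album_year(album: str) -> Optional[str]:
    """Return the year embedded in an ALBUM value, or None if there isn't one."""
    match = ALBUM_YEAR.search(album)
    return match.group(1) if match else None


def year_mismatch(album: str, year_tag: str) -> bool:
    """True when ALBUM mentions a year and it differs from the YEAR tag."""
    found = album_year(album)
    return found is not None and found != year_tag.strip()


print(year_mismatch("Symphony No. 3 (Copland - 1958)", "19958"))  # True
print(year_mismatch("Symphony No. 3 (Copland - 1958)", "1958"))   # False
print(year_mismatch("Symphony No. 3 (Copland)", "1958"))          # False
```

Note that an ALBUM with no year in it at all is not reported as a mismatch: those recordings are a separate job, not an error to be corrected by hand.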

Once I know everything has a correct YEAR tag, I can go ahead and alter the ALBUM tag to include the YEAR value wherever (as is true for most of my ALBUM tags) it doesn't include a recording year at all. And, since I'm now changing the ALBUM tag for lots of files, and my physical folder structure depends on the ALBUM tag for the name of one of the directories involved for every recording... if I've updated the ALBUM tag to include a YEAR value, I also need to rename that folder to include the same YEAR value, so that the folder name continues to match the ALBUM tag.
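Here is a minimal Python sketch of that last step, offered only as an illustration and not as one of the shell scripts mentioned in this post. The names `add_year` and `rename_album_folder` are invented for the example. It assumes the ALBUM value ends in a bracketed "(Artist)", that the folder is named exactly after the ALBUM tag, and that the new name contains nothing your filesystem would reject. Writing the new ALBUM value back into the files themselves is left out.

```python
import re
from pathlib import Path

# An ALBUM value that already ends in "(... - 1958)" or "(... - 1958 #1)".
HAS_YEAR = re.compile(r" - \d{4}(?: #\d+)?\)\s*$")
# A trailing "(Artist)" group with no nested brackets.
TRAILING_ARTIST = re.compile(r"\(([^()]+)\)\s*$")


def add_year(album: str, year: str) -> str:
    """Turn 'Work (Artist)' into 'Work (Artist - Year)'; leave dated names alone."""
    if HAS_YEAR.search(album):
        return album
    match = TRAILING_ARTIST.search(album)
    if match is None:
        # No distinguishing artist to attach the year to: needs a human.
        raise ValueError(f"no '(Artist)' suffix in ALBUM: {album!r}")
    return f"{album[:match.start()]}({match.group(1)} - {year})"


def rename_album_folder(folder: Path, new_album: str) -> Path:
    """Rename the album folder to match the new ALBUM value, refusing to overwrite."""
    target = folder.with_name(new_album)
    if target != folder:
        if target.exists():
            raise FileExistsError(f"folder already exists: {target}")
        folder.rename(target)
    return target
```

Something like `rename_album_folder(folder, add_year(folder.name, "1958"))` would then turn a folder called "Symphony No. 3 (Copland)" into "Symphony No. 3 (Copland - 1958)". Refusing to overwrite an existing folder matters here: a clash at this point means two recordings would otherwise end up with the same name, which is exactly the problem being fixed.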

Naturally, given the size of my music collection, every one of those three steps needs to be automated, via shell script. My next three posts here will show you how I went about completing each of these three tasks using individual shell scripts.

Meantime, have a read of the Primary Keys to Music article and see if you agree with my deductions about how and what we use to best distinguish one recording of a composition from another. I also want to thank the other participants in the forum discussion on how to tag up music files, because if they hadn't challenged me, I wouldn't have re-thought things from first principles and I would therefore have breezed over the Aaron Copland symphony clash with a merely ad hoc correction, rather than a full-on, thorough re-cataloguing of my entire collection to always include a YEAR. It's a heck of a thing to have to go back and correct, but it's a good thing to know you need to correct it in the first place!