So, this is the next in a mini-series of posts explaining how I went about fixing my music library after discovering that I’d tagged my music files incorrectly all these years, despite knowing better!
The short version is that I always knew the recording date was an important factor in distinguishing between recordings of the same work by the same artist, but since I didn’t often have duplicates, I assumed I’d get away without including it in the ALBUM tag for a composition. And then I realised that though I might well get away with it today, a new acquisition here or there could well mean that I wouldn’t get away with it for ever: if the information is theoretically necessary to distinguish recordings, then it ought to be present, always.
Thus, I needed to go back to my music library and make sure that the recording date was present in the YEAR tag, which was designed for it. How I did that was the subject of my last post.
But a recording date stored in the YEAR tag isn’t actually functionally useful for distinguishing between two recordings of the same work. That’s because most music players will order and group recordings by what they call “Artist” and “Album Name”, which are the ARTIST and ALBUM tags in a FLAC file. If the recording date is stored in the YEAR tag, that’s fine, but it won’t usually be used from there to sort and group music properly. In other words, having made sure that I actually had a YEAR tag for every recording, my next task was to bring that YEAR data up into the ALBUM tag, where it could actually be useful.
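For anyone who wants to see what those tags currently hold for a given file, here’s a minimal sketch. It assumes the metaflac tool from the FLAC project is installed; the function name read_tag and the file path in the usage line are made up for illustration and aren’t part of my actual script.

```shell
#!/bin/bash
# read_tag FILE TAGNAME   (hypothetical helper, for illustration only)
# Prints the value of the first matching tag in a FLAC file.
# "metaflac --show-tag=NAME" prints lines of the form NAME=value,
# so everything up to and including the first "=" is stripped.
read_tag() {
    local file="$1" tag="$2" line
    line=$(metaflac --show-tag="$tag" "$file" | head -n 1)
    printf '%s\n' "${line#*=}"
}
```

Something like `read_tag "/path/to/track.flac" ALBUM` would then print just the album name, and swapping ALBUM for YEAR or ARTIST gets the other values. If the tag is missing, it prints an empty line.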
I’ll just pause here to say that I’m really doing something which I’d much rather not have to do: namely, repeating information already stored in one place in a second place. There’s a reason for not liking to do this: the two pieces of information are physically independent of each other and there’s no intrinsic mechanism in the audio player world to make sure they stay identical. That is, I might have an opera called Peter Grimes which was recorded in 1958. I could set YEAR=1958, and ALBUM=Peter Grimes – 1993 …and nothing can stop me from now having two completely different recording dates associated with the same recording. You are then in the position of not really knowing whether one is right and one is wrong, nor which one is right or wrong, or whether both are as bad as each other! In strict database practice and theory we avoid duplicating information like this precisely because it makes data maintenance so tricky for the future.
On this occasion, however, I don’t have a lot of choice. The fundamental piece of data every music player sorts and groups by is ALBUM. If the recording year is not present in that, then it functionally cannot help to distinguish between different recordings of the same work by the same performer. So practicality trumps strict theory on this occasion, but it’s still a good rule to bear in mind in general, which is why it’s such a good idea to not include the ALBUM data in your track TITLE tag, for example.
Anyway: that’s the purpose of this post. Now that I know all my recordings have a YEAR tag, I want to make sure that I haven’t ever included the recording year in the ALBUM tag… and used a different date when doing so. If YEAR=1958 and ALBUM says “Peter Grimes 1969”, I want to know that 1958 doesn’t equal 1969. I’ll have to make a manual decision about what to do about any discrepancies found, and how to fix them, but the job at hand is to find any discrepancies that do exist.
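Stripped to its bones, the comparison is something like the following sketch. It’s only an illustration of the idea, not the full script: the function name check_album_year is invented for the purpose, and it assumes that YEAR holds a plain four-digit year and that any standalone four-digit number in ALBUM is meant to be a recording year.

```shell
#!/bin/bash
# check_album_year YEAR ALBUM   (hypothetical helper, for illustration only)
# Prints one word describing how ALBUM relates to YEAR:
#   good    - ALBUM contains YEAR as a standalone four-digit number
#   bad     - ALBUM contains some other standalone four-digit number
#   none    - ALBUM contains no standalone four-digit number
#   invalid - YEAR itself is not exactly four digits
check_album_year() {
    local year="$1" album="$2"

    # Assumption: YEAR holds a plain four-digit year and nothing else.
    case "$year" in
        [0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid"; return ;;
    esac

    # "Standalone" means not glued to other digits, so 19588 is ignored.
    if printf '%s\n' "$album" | grep -Eq "(^|[^0-9])${year}([^0-9]|\$)"; then
        echo "good"
    elif printf '%s\n' "$album" | grep -Eq '(^|[^0-9])[0-9]{4}([^0-9]|$)'; then
        echo "bad"
    else
        echo "none"
    fi
}

check_album_year 1958 "Peter Grimes 1969"    # prints "bad"
check_album_year 1958 "Peter Grimes - 1958"  # prints "good"
check_album_year 1958 "Peter Grimes"         # prints "none"
```

One consequence of that second assumption: an album whose title legitimately contains a four-digit number (an “1812” overture, say) but no recording year would be reported as “bad”, so anything flagged this way still needs a human eye before it gets changed.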
As always, I’m after a script that will check my entire music library in either one huge go, or in whatever smaller chunks I feel like running from time to time. And here’s the script I came up with to do the job:
# Clear up previous runs
rm -f /home/hjr/Desktop/freshdatacheck.txt 2> /dev/null
# Initialise some counters:
# i=count of records processed
# b=count of records where YEAR is present in ALBUM, but it's the wrong one (i.e. "bad records")
# g=count of records where YEAR is present in ALBUM, and it's the right one (i.e. "good records")