I had been subconsciously aware of a problem for quite a while, but had paid it no real attention: whenever I looked, a lot of my music files seemed to lack proper performer details! The tracks played fine and everything else about them seemed OK, but there was just no 'conducted by, orchestra playing is, soprano singing is...' stuff. I'd noticed it, in a casual sort of way, from time to time... but because you can live without having that particular information at your fingertips, I mostly just did.
I had also vaguely registered (erroneously, as it turns out!) that the problem was with a lot of music I remember buying in the 1990s and ripping in the early 2000s, so I put it down to sloppy tagging habits, waaay back before I knew any better. But it wasn't and isn't. It is a tale of software shenanigans and garbage programming that really kind of ticks me off, to be frank. So, first: let me show you the problem...
Here is the metadata associated with some Darius Milhaud music files:
You may be able to work out that there are 10 tags (the 'canonical ten') and one of them is 'COMMENT=' and that this particular tag has no value assigned to it. There is nothing showing on the right-hand side of that particular tag's equals sign. This music therefore has no performer information associated with it!
Well, assigning data to canonical tags is what my very own CCDT software was designed to do, so let me use that to fix this lack-of-data problem:
So that's me specifying the conductor and orchestra. If I save that data and re-check the metadata tags associated with these files:
So now, we have a COMMENT tag containing the relevant information. It's currently showing as tag [10], but if I save this in CCDT and then re-load it into CCDT and re-display:
...you can see that CCDT tidies tags up as it saves them, so that the COMMENT tag is saved into 'canonical position' [7].
So, now that the data is safely in the COMMENT tag, let's say that for some reason or other I opened up these music files in a graphical tag editor such as Easytag, a Linux-based tag editor I've been using for years to do quick-and-simple tag inspections and the occasional tag 'touch-up' when needed.
Something I've always noticed about Easytag at this point, and which you can see happening in that screenshot, is that whenever it opens any of my tagged music files, it has always emboldened their file names in the left-hand panel you see here. Now, it usually uses bold in that panel to indicate that 'there are unsaved changes that have been made to these files'... but since I've literally just opened the files in the program and haven't changed a thing, I've always assumed this to be a 'quirk' and nothing more. How wrong I was! As we shall see!!
Anyway, I've opened the files in Easytag, just to inspect the data and you can see that it's all present as CCDT said it should be: the 'Comment' tag is there, nicely labelled 'Comment', too.
Now, I've seen what I needed to see and haven't typed a thing, so I go to close Easytag down... and this happens:
That is, a pop-up appears warning you that there are unsaved changes and they'll be lost if you don't say to 'Save' them. Just press [Enter] at this point to make the pop-up go away (as I think a lot of users would tend to do!) and guess what: you've just taken the 'Save' option!
So let's say you've hit [Enter] without thinking about it much. Let me now immediately re-inspect the files in my own CCDT program:
....and what do we notice? In 'canonical position' [7], we now see a tag called 'DESCRIPTION', not 'COMMENT'. Without really telling us, or explaining why, Easytag has renamed the COMMENT tag so that it is now called DESCRIPTION.
Never mind that its own user interface calls this tag 'Comment' in its own right-hand pane. Under-the-hood, Easytag thinks it's OK to change tag names -and the fact that it had changed them behind the scenes is why it displays the files in bold when they are first opened and declares that there are 'unsaved changes' when you go to close the program without having manually changed a thing.
Is the fact that a tag got silently renamed the end of the world? Well, not in-and-of itself, no. It's not. The music file will play fine; it will display the composer, album and track names properly. Even the embedded album art remains safe. Indeed, even if you inspect the tags in another program, you won't see any indication of a problem at all:
That's the Clementine music player's tag editor, being used to investigate the tags associated with the same music files as before. Note that it still displays the file as having a 'Comment', so it's not particularly fazed by having a differently-named tag supplying the information it wants to display there. It's reading something now called "DESCRIPTION" and displaying it in something called "Comment", totally transparently and automatically.
This tells you that a lot of music players will quite happily play these files just fine; they'll also (usually) allow you to search what is now your DESCRIPTION tag as effectively as they'd previously allow you to search your COMMENT tag. So, functionally, not a lot changes because of Easytag's sleight-of-hand.
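Since a player's 'Comment' box can be fed from a differently-named tag, the only reliable way to see what is actually stored in a file is to dump the raw tag names. Here is a minimal sketch of that check, assuming FLAC files and an installed metaflac. The helper name tag_names is purely illustrative, and it assumes every tag value sits on a single line:

```shell
# Print the tag NAMES (upper-cased, one per line) found in a
# NAME=value tag dump supplied on standard input.
# Assumption: each tag occupies one line; a multi-line value whose
# continuation lines contain '=' would be miscounted.
tag_names() {
    grep -E '^[^=]+=' | cut -d= -f1 | tr '[:lower:]' '[:upper:]'
}

# Usage against a real file (requires metaflac; the file name is hypothetical):
#   metaflac --export-tags-to=- "track.flac" | tag_names
```

If DESCRIPTION appears in that list where you expected COMMENT, the file has been through the rename described above.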
But I wrote a tool a while ago now, called the Dizwell Tag Cleaner (DTC to its friends). If run against a set of previously-tagged music files, it 'cleans' their tags up, removing 'non-canonical' tags and leaving only the canonical ones behind.
Here's a little code snippet to explain what it does:
# Fetch existing metadata into variables
COMPOSER=$(metaflac --show-tag=Artist "$f" | sed s/.*=//g)
ARTIST=$(metaflac --show-tag=Artist "$f" | sed s/.*=//g)
ALBUM=$(metaflac --show-tag=Album "$f" | sed s/.*=//g)
TRACKNUMBER=$(metaflac --show-tag=TRACKNUMBER "$f" | sed s/.*=//g)
TITLE=$(metaflac --show-tag=Title "$f" | sed s/.*=//g)
GENRE=$(metaflac --show-tag=Genre "$f" | sed s/.*=//g)
COMMENT=$(metaflac --show-tag=Comment "$f" | sed s/.*=//g)
YEAR=$(metaflac --show-tag=Date "$f" | sed s/.*=//g)
ENCODED=$(metaflac --show-tag=ENCODED-BY "$f" | sed s/.*=//g)
MD5HASH=$(ffmpeg -i "$f" -map 0:a -f md5 - 2>/dev/null | sed s/.*=//g)
That is, out of all the tags that might be present in an audio file, it reads in the values of 10 of them: the 'canonical ten'. The COMMENT=$(...) assignment is specifically where it looks for a tag called COMMENT and reads its value into a variable called $COMMENT.
Now it has the ten canonical tag values safely rescued from a music file, it writes them out to a text file and then does this:
# Now wipe all existing tags (except Album Art), then re-load from the text file
metaflac --remove-all-tags "$f"
metaflac --import-tags-from="$TAGFILE" "$f"
rm "$TAGFILE"
That is, it wipes out all tags in the files without exception -and then reads back the 'canonical ten' from the text file previously created.
See the problem? If my performer details are now stored in a tag called DESCRIPTION (because Easytag renamed the tag without telling me); and if I now run DTC against those files... then the value of the DESCRIPTION tag is not read and saved into a variable (because it's got the wrong name!), but it is then wiped by the "remove-all-tags" command. And the 'import the canonical ten from the text file' routine then fails to replace the data back into the correctly-named COMMENT tag.
Net result: performer details are completely lost.
That's the worst-case scenario, anyway. The best case scenario is that the performer details are not lost, but are merely left sitting in a tag called DESCRIPTION, because you saved the files in Easytag and haven't yet run DTC. So long as you never run DTC against them, the data will still be there, just in a tag with a name different to what you thought they'd be in!
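One way a cleaner of this kind could defend itself is to fall back to DESCRIPTION whenever COMMENT comes back empty, before anything gets wiped. This is only a sketch of the idea, not what DTC actually does. The function name rescue_comment is hypothetical, and it assumes single-line tag values:

```shell
# Print the value that should end up in COMMENT:
# the existing COMMENT value if it is non-empty, otherwise DESCRIPTION.
#   $1 = current COMMENT value (may be empty)
#   $2 = current DESCRIPTION value (may be empty)
rescue_comment() {
    printf '%s\n' "${1:-$2}"
}

# Hypothetical use in a DTC-style script (requires metaflac):
#   COMMENT=$(rescue_comment \
#       "$(metaflac --show-tag=COMMENT "$f" | sed 's/^[^=]*=//')" \
#       "$(metaflac --show-tag=DESCRIPTION "$f" | sed 's/^[^=]*=//')")
```

With that in place, the later wipe-and-reimport would write the rescued text back under the COMMENT name instead of discarding it.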
The moral of this story is that the entire tagging software industry plays fast and loose with your data as the mood takes them. Renaming tags without telling you is not OK, and shouldn't be happening -but the fact that Clementine displays data from a DESCRIPTION tag in a big block of text labelled 'Comments' tells you that pretty much all music programs "translate" and "interpret" tag data, on-the-fly, without warning.
So be warned: use Easytag together with my own tagging software at your peril. Mix them together, and data loss awaits. Personally, I think if a tag is called COMMENT and displayed in a box called 'Comments', it should stay called 'COMMENT'! So my software is always consistent about it and would never damage the data stored in the 'canonical ten' tags. It's then up to you to make sure you don't use software which does unexpected things to those tags.
If you must use a graphical tagger on Linux, I would suggest instead that you use Kid3. It is pretty long in the tooth these days, and has dependencies on KDE libraries, but it at least doesn't screw up tag names without warning!
For example: here is Kid3's main editing interface, applied to the music files I've been working with thus far:
You may be able to see that I've emptied out the tag called 'Description', so that it contains no text at all (and since all 6 music files are highlighted on the left, this change will apply to all those files). I've also used the big [Add...] button on the right to add in a completely new tag, called... Comment. You can see it about 4th down in the main panel of tags. I've then added a value to that new tag -a long line of rather silly text. If I save these two changes and go inspect the files in CCDT once more, what will we see?
Well, we see a canonical [10] tag called 'COMMENT', with a value set to be the same silly text I typed in Kid3. So Kid3 has saved the 'Comment' tag with a name of 'COMMENT' (the kind of sanity you might think was common, but clearly the Easytag developers didn't get that particular memo!)
Notice too that the DESCRIPTION tag has disappeared -even though I left it present in the main panel in Kid3, just with an empty value. Clearly, Kid3 is smart enough to work out that if a tag is set to a null value, there's no point in having the tag stored in the music file at all, so has removed it.
In fact, Kid3 can do what Easytag does -that is, treat DESCRIPTION as effectively the same thing as COMMENT- except that it is configurable and (importantly) not the default behaviour! Here, for example, is the very first screen you get to when you click Settings -> Configure Kid3:
Notice the "Ogg/Vorbis" panel in the lower-middle of that screenshot: it has an option to set the 'Comment field name' to something. By default (and in-line with commonsense!), the comment field name is set to be a tag called COMMENT. But you could select the dropdown box shown there and select DESCRIPTION instead. Do that, and Kid3 will behave as Easytag does... but you'll have, at least, chosen to make it work that way! I'm afraid Easytag borrows the Gnome developer methodology: don't give users a choice, because they're all too dumb to know what to do with it anyway. Short version: if you must use a graphical tagger in Linux, use Kid3.
The story in Windows is a bit happier: I suspect (but cannot prove!) that the two most-common graphical tagger programs on that platform are MP3Tag and dbPowerAmp. Both are excellent, and neither renames the COMMENT tag when saving a file. If it goes into those programs as a tag called 'COMMENT', it comes out called that too!
So it does seem at this point to be just a problem with Easytag. Which is unfortunate for me, because I've been using it for years. I therefore now have thousands of music files that don't have performer details at all. I have another several thousand that have performer details, but stored in a tag called DESCRIPTION that I didn't ask for, and which is at risk of being obliterated if I ever run DTC again!
The first thing I need to do, therefore, is uninstall Easytag from my PCs and laptops with all due haste!!
Secondly, I need a script to (a) scan all music files; (b) if there's a COMMENT tag with a value, leave it alone; (c) if there's no COMMENT tag, but a DESCRIPTION tag, then create a COMMENT tag with the DESCRIPTION tag's value... and then delete the DESCRIPTION tag; and (d) if there's no COMMENT tag and no DESCRIPTION tag, then declare that the file lacks performer details and can't be auto-fixed by the procedure used at step (c).
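The decision logic in steps (b) to (d) can be sketched as follows. This is an illustration only and not the downloadable script itself. The function names classify_tags and migrate_description are made up for the example, it assumes FLAC files and metaflac, it treats an empty COMMENT tag as if it were absent (my assumption), and it assumes tag values fit on one line:

```shell
# Decide what to do with one file, given its NAME=value tag dump on stdin.
# Prints one of: keep | migrate | missing
classify_tags() {
    dump=$(cat)
    # Tag names are matched case-insensitively; a tag only counts
    # if it has a non-empty value.
    if printf '%s\n' "$dump" | grep -qiE '^COMMENT=.+'; then
        echo keep        # step (b): COMMENT has a value, leave it alone
    elif printf '%s\n' "$dump" | grep -qiE '^DESCRIPTION=.+'; then
        echo migrate     # step (c): copy DESCRIPTION into COMMENT
    else
        echo missing     # step (d): nothing to auto-fix
    fi
}

# Step (c) for a single file: requires metaflac, so try it on a copy first.
migrate_description() {
    value=$(metaflac --show-tag=DESCRIPTION "$1" | sed 's/^[^=]*=//')
    metaflac --remove-tag=COMMENT "$1"          # drop any empty COMMENT
    metaflac --set-tag="COMMENT=$value" "$1"    # write the rescued text
    metaflac --remove-tag=DESCRIPTION "$1"      # then delete DESCRIPTION
}

# Step (a), scanning a tree, might then look like:
#   find . -type f -name '*.flac' | while IFS= read -r f; do
#       case $(metaflac --export-tags-to=- "$f" | classify_tags) in
#           migrate) migrate_description "$f" ;;
#           missing) echo "No performer details: $f" ;;
#       esac
#   done
```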
I have written such a script and it's downloadable here. Save it somewhere (call it your $HOME/Downloads directory for now). Make it executable:
chmod +x $HOME/Downloads/act.sh
Then CD to a folder that contains music files (or is the parent of folders that contain music files) and invoke the script:
$HOME/Downloads/act.sh
If you're lucky, you won't see anything at all. If you're unlucky, you'll see this sort of output:
...which is an indication that there's a couple of CDs I have to go back and check the booklets for and then re-type in the performer details.
That replaced data will then be safe for all time in the future... so long as I don't use Easytag again! For Easytag will make garbage of your tag data (and other software that comes along to clean the mess up will do so by causing data loss)!
The broader lesson to be learned, I think, is: don't mix-and-match your tagging software lightly. Always check and inspect what a particular program will do to your data (preferably on some test copies of a few music files, rather than your real, entire library!), and how it interacts with your other music management programs before committing to its long-term use. Consider me once-bitten and twice very, very shy as far as Easytag is concerned!
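One way to do that sort of inspection is to dump the tags of a test copy before and after a program has touched it, then compare the two dumps. A rough sketch, assuming FLAC files and metaflac; the function name compare_tag_dumps and the file names in the usage comments are hypothetical:

```shell
# Compare two tag dumps (one NAME=value per line), ignoring tag order.
# Prints the differing lines and returns non-zero if the dumps differ.
compare_tag_dumps() {
    _a=$(mktemp) && _b=$(mktemp) || return 2
    sort "$1" > "$_a"
    sort "$2" > "$_b"
    diff "$_a" "$_b"
    _status=$?
    rm -f "$_a" "$_b"
    return "$_status"
}

# Typical use on a throwaway COPY of a music file:
#   metaflac --export-tags-to=before.txt copy.flac
#   ...open, then save, copy.flac in the tagger being tested...
#   metaflac --export-tags-to=after.txt copy.flac
#   compare_tag_dumps before.txt after.txt && echo "tags untouched"
```

A renamed tag shows up as one line removed (COMMENT=...) and one added (DESCRIPTION=...), whereas mere re-ordering of tags is ignored because both dumps are sorted first.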