I bought a digital download of the Naxos CD you see at the left the other week.
As with most downloads I purchase these days, the music files I bought came pre-tagged with all sorts of metadata. Which is nice of the record company, I guess (in this case, Naxos).
But it turns out that the record companies make just as bad a job of tagging their music as all those music ignoramuses who flooded the CDDB database with inaccurate and plain stupid data back in the day. But whereas the ignoramuses from the early days of the Internet and CD ripping had an excuse (namely, their ignorance), I can't make out what Naxos' excuse could possibly be. It's a record company, after all, selling classical music by the box-load, day in, day out. They have no excuse.
So see if you can spot the problem. Here's the 'raw' tag data, as provided directly from Naxos (via the Presto Classical website, to be fair: it could be Presto's fault I guess). Anyway, here's the vendor-supplied metadata, unaltered by me in any way:
Bear in mind that this CD consists entirely of music written by Isotaro Sugata... but you will be hard-pressed to find his name anywhere! Likewise, the conductor for the whole CD is Kazuhiko Komatsu... but his name doesn't exactly leap off the page, either! In the ARTIST tag, for example, we see neither of those names; instead, a couple of other names join the orchestra's, and what relation either of them has to this recording is unclear. From the ALBUM tag, we learn that the four different compositions on this disk are the 'Peaceful Dance of Two Drags' and "etc". It is commonly accepted that Sugata's "etc suite" is an utter masterpiece, of course... Not! So, from that tag data you could not possibly know that 'Dancing Girl in the Orient' is on this disk. It has all just been subsumed into "etc".
The GENRE is composite (which is not good data cleanliness practice: it's never good to shove multiple levels of hierarchical data into one field); it's also pretty meaningless: it's "Classical Music" in the first instance, which covers about a thousand years of history!
The COMPOSER and CONDUCTOR tags do reveal relevant and accurate information, but since there's hardly a piece of music-playing software out there that correctly and unambiguously displays those tags, let alone allows you to search, filter or sort by them... the utility of placing that data solely in those tags must be questioned. There's no DATE, but there is a YEAR (which is another non-standard tag). The COMMENT tag, which is about the only one that a lot of music-playing software displays as long-form, free-form text, is just left empty: not a terribly efficient use of the available tags, I think!
You won't be able to read it properly from that screenshot, either, but the ALBUM_ARTIST tag (which is meaningless in the classical music sphere, as composers didn't compose albums) reads, in full, and at extreme length: "Kazuhiko Komatsu, Kanagawa Philharmonic Orchestra, Hisaaki Matsushita, Takeshi Muramatsu"... which is precisely the same information as already provided by the ARTIST tag, plus the CONDUCTOR one. So, information is being duplicated all over the place. If you change it in one place, do you think the other place is automatically updated to keep in sync? Of course not.
Oh, and just to wrap this up, the ARTIST tag contains forward-slash characters. A number of tagging software utilities use tag data to automatically generate a physical folder hierarchy in which to store the digital music files. Try to do that on Unix or Linux with that tag data, and you're going to end up with a three-sub-folder nested hierarchy of folders, and the music will end up stored in the last one, 'Takeshi Muramatsu', whose relationship with the composer, conductor, or anything else associated with this recording is tenuous at best.
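To make the slash problem concrete, here is a minimal Python sketch of what a tag-to-folder tool does, assuming it simply joins tag values into a path. The ARTIST value below is an assumed reconstruction (the orchestra plus the two other names, separated by forward slashes), and `naive_folder`, `safe_component` and `safe_folder` are hypothetical helpers, not any real tagger's API.

```python
from pathlib import PurePosixPath

# Assumed ARTIST value: three names joined with forward slashes.
artist = "Kanagawa Philharmonic Orchestra/Hisaaki Matsushita/Takeshi Muramatsu"

def naive_folder(artist: str) -> PurePosixPath:
    # The tag value goes straight into the path, so every "/" in it
    # becomes a directory separator.
    return PurePosixPath("Music") / artist

def safe_component(value: str, replacement: str = ", ") -> str:
    # Replace forward and backward slashes so the value can only ever
    # produce a single folder name.
    return value.replace("/", replacement).replace("\\", replacement).strip()

def safe_folder(artist: str) -> PurePosixPath:
    return PurePosixPath("Music") / safe_component(artist)

print(naive_folder(artist).parts)
# ('Music', 'Kanagawa Philharmonic Orchestra', 'Hisaaki Matsushita', 'Takeshi Muramatsu')
print(safe_folder(artist).parts)
# ('Music', 'Kanagawa Philharmonic Orchestra, Hisaaki Matsushita, Takeshi Muramatsu')
```

Swapping the slashes for a comma is just one possible workaround; the better fix is not to put slashes in the tag in the first place.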
Finally, notice how the track number is just '1'. Great, provided your CD (or composition) contains fewer than 10 tracks. Not so great for sorting and ordering if the CD contains more than 9 tracks, however: a lot of software will not sort 1, 2, 3, 4 ... 8, 9, 10, 11 ... as you might expect. Because the numbers are compared as text, character by character, you'll generally see them sorted as 1, 10, 11, 12 ... 19, 2, 20, 21 ... and so on. Not using a consistent number of digits is just dumb data management, which, given the record company is selling you digital data, is a mind-bogglingly stupid dereliction of duty on their part.
It is, in short, an utter shambles as far as logical, hierarchical, clear, unambiguous and cross-platform metadata tagging goes. But maybe it was an aberration.
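A quick Python illustration of the sorting problem, using hypothetical track numbers 1 to 21 stored as plain text tags: a default string sort produces exactly the 1, 10, 11 ... 19, 2, 20 ordering, while zero-padding to a fixed width (or sorting on the integer value) restores the order you'd expect.

```python
# Hypothetical track numbers 1..21, stored as unpadded text tags.
tracks = [str(n) for n in range(1, 22)]

# Default string sort compares character by character, so "10" < "2".
lexical = sorted(tracks)

# Pad every number to the width of the largest one: "1" -> "01".
width = len(max(tracks, key=int))
padded = sorted(t.zfill(width) for t in tracks)

# Sorting on the integer value also works, but only if the player does it.
numeric = sorted(tracks, key=int)

print(lexical[:14])
# ['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '20', '21']
print(padded[:3])   # ['01', '02', '03']
print(numeric[:3])  # ['1', '2', '3']
```

Sorting on the integer value only helps if the player knows to do it; zero-padding works everywhere, because text order and numeric order become the same thing.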
Well, let's see. Here's a second CD I bought as a digital download at the beginning of May:
So, it's one disk (of 3) containing orchestral music from several of Janáček's operas, arranged as suites by Peter Breiner, who then conducts them with the New Zealand Symphony Orchestra.
Here's the raw tag data as provided by Naxos/Presto for this disk:
Again, blank COMMENT and DATE tags. Janáček's name is spelled incorrectly in the ALBUM tag. It's spelled incorrectly (and differently incorrectly!) in the COMPOSER tag. Neither tag bothers to mention that we're dealing with Leoš Janáček, unless you count misspelling it in the COMPOSER tag as a 'mention'. Mr Peter Breiner, however, who is merely the arranger and conductor, gets three mentions, including as an ALBUM ARTIST, but not as an ARTIST: the logic of him being one but not the other in this context entirely escapes me.
From the tag information itself (rather than the file names visible on the left of the screen), can you tell we're dealing with a CD containing music from The Excursions of Mr Brouček? Of course not. You can't actually tell we're listening to music from Jenůfa, either, unless you're prepared to accept that 'Jenufa' (without pesky "foreign" accents) is the same thing, which it isn't.
Again, we get a meaningless and unnecessarily hierarchical GENRE tag. Again, we get a multi-part ARTIST tag that contains forward slashes, which means any tool that builds folders from it on Linux or Unix systems will split it into a nested hierarchy, just as with the first CD.
It's easy to criticise, I guess: but how about I show you a better way of tagging these recordings up, to make the point that it doesn't actually have to be this appallingly bad?
Well, the first thing I did with the first CD was recognise that it contained 4 different compositions:
Notice, in passing, that the physical storage layout adopted here also immediately tells you not that we're dealing with 'Classical Music' or even 'Orchestral Music', but that sometimes we're going to be listening to ballet music, and other times, it will be purely orchestral music (without a narrative arc).
Looking at the tags for any one of those four compositions, you'll find this:
The composer is now front-and-centre: listed as the ARTIST (and secondarily as COMPOSER, where it's there as a piece of politeness more than functionality, since so few players expose or use the COMPOSER tag). The composition name is clear. The conductor and orchestra are clear and distinct, with no muddle about whether they rank similarly to the composer, or should be mentioned as an ARTIST or as an ALBUM ARTIST or maybe both ...and so on. Oh, and track numbers are consistently two digits long, and there are no compound hierarchies of data in one field, nor any forward (or backward) slashes to complicate things.
My tagging of the Janáček recording is similarly efficient:
First, the various suites are split into separate physical folders: the fact that they were supplied on 3 CDs is irrelevant. They count as separate compositions, and it's one folder per composition. Secondly, the folder names are spelled correctly: it's Jenůfa, not Jenufa, for example. Using separate folders also makes it obvious whether we're dealing with Jenůfa or Mr Brouček.
The tag data is equally clear:
The composer is now correctly identified and has both names spelled correctly. The conductor and orchestra interpreting his work are clearly identified. Two-digit track numbers are in use. There are no unnecessary data hierarchies, and no spurious slashes to chop things up ...and cause grief on the wrong operating systems.
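The rules applied to both recordings are mechanical enough to check automatically. Here is a minimal Python sketch of such a check, assuming one track's tags are held in a plain dictionary; the function name `lint_tags`, the `ORCHESTRA` tag name and the example ALBUM value are illustrative assumptions, not what any particular tagger uses.

```python
import re

def lint_tags(tags: dict[str, str]) -> list[str]:
    """Check one track's tags against the rules described above (a sketch)."""
    problems = []

    # Track numbers should always be exactly two digits: "01", not "1".
    track = tags.get("TRACKNUMBER", "")
    if not re.fullmatch(r"\d{2}", track):
        problems.append(f"TRACKNUMBER {track!r} is not two digits")

    # No tag value should contain a forward or backward slash.
    for name, value in tags.items():
        if "/" in value or "\\" in value:
            problems.append(f"{name} contains a slash")

    # The composer should be front-and-centre as the ARTIST.
    if tags.get("ARTIST") != tags.get("COMPOSER"):
        problems.append("ARTIST is not the composer")

    return problems


# Hypothetical tags in the style described above.
good = {
    "TRACKNUMBER": "01",
    "ARTIST": "Leoš Janáček",
    "COMPOSER": "Leoš Janáček",
    "ALBUM": "Jenůfa Suite",                        # illustrative title
    "CONDUCTOR": "Peter Breiner",
    "ORCHESTRA": "New Zealand Symphony Orchestra",  # assumed tag name
}
print(lint_tags(good))  # []
```

Run against a single-digit track number, a slash-separated ARTIST and a composer relegated to COMPOSER only, the same function reports three problems.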
I could go on, but I won't labour the point more than I already have done.
The two original examples of bad tagging came directly from Naxos, via Prestomusic.com. I don't know if it's Naxos that's at fault, or whether Presto do their own bit of tagging on the side... but I suspect it's Naxos. And in that case, they should know and do much better. Their original tag data is utter shite, putting it as politely as I know how. It lacks any understanding of what distinguishes this recording of a composition from that one. It shows no respect for composers. It shows zero understanding of the importance of the 'composition'. Indeed, they don't seem to appreciate what a composition is at all! It sticks two giant fingers up to anyone not using the American Latin alphabet. It demonstrates no understanding of how operating systems work. It pays no regard to the most basic principles of data management.
Now: I was able to correct their abhorrent tagging, and it didn't take me too long to do it. But I shouldn't have had to do it at all!
Lift your game, record companies. If you want to continue selling music to fans of classical music, try showing the bare minimum of respect to that specific audience and the way they think of and organise their music! You don't deserve anyone's money unless you do.