Houston, we have a problem... Part 2...

A brief reminder, if any were needed, that my music collection is, in some ways, in a bit of a pickle. Pickles which I hadn't realised it was in, that is, until some damn fool or other (that would be me!) released Niente Version 3! With that program's new-found ability to analyse for physical corruption and logical failings (or, if you prefer, failures to live up to the strictures and precepts of the Holy Text of the Axioms of Classical Tagging), it's now easier than ever to discover you've been merely mucking about with your music cataloguing all these years, even though you thought you were being rather good at it at the time!

A blog post or two ago, I pointed out that a lot of my album art was undersized, oversized or ok-sized-but-not-square. A bout of bulk-fixing via a fixart.sh script, plus a spot of intensive manual acquiring of good album art, plus a lot of manual re-tagging, means all those problems are now behind me. Sadly, however, that wasn't the only tagging issue Niente showed me I had!

This time, I'm looking at statistic 7 on the Niente Quick Aggregate Statistics (QAS) report:

Niente is of the opinion that I have 2,165 recordings that have PERFORMER/ALBUM tagging inconsistencies. As a brief reminder, Axiom 5 states that the ALBUM tag should be filled with data in extended_composition_name (distinguishing_artist's_surname - recording_year) format. That means the ALBUM tag might end up set to something like "Symphony No. 5 (Karajan - 1966)", for example.

Axiom 3 states that the COMMENT tag will contain the names of all the performers of the recording, in conductor, orchestra, choir, soloist order (leaving out any that don't apply: so if it's not a choral piece, there won't be mention of a choir). Your COMMENT tag might thus end up set to "Herbert von Karajan, Berlin Philharmonic Orchestra".

Finally Axiom 6 states that the PERFORMER tag will be the full name of the distinguishing artist -i.e., the performing artist that makes this recording of a work uniquely distinguishable from another recording of the same work. In the example case I'm suggesting here, PERFORMER would thus end up set to "Herbert von Karajan".

In order, therefore, Karajan in the ALBUM -> Herbert von Karajan in the COMMENT -> Herbert von Karajan in the PERFORMER tag. Everything is consistent with everything else and Niente is happy.

Now consider the case of, say, a clarinet concerto. It may be the case that the 'distinguishing artist' for such a recording is still the conductor of the orchestra: if the virtuoso clarinet part is being played by John Smith or Jíří Žéłůßñìčk, it is quite likely that you looked at it when ripping and tagging and decided that knowing the recording was being conducted by Colin Davis was rather more useful! If that's the case, you won't have a tagging inconsistency either: the conductor's surname will be in ALBUM; the whole name will be in COMMENT and PERFORMER: everything remains consistent.

Finally, consider the example of a clarinet concerto where Michael Collins is playing -a fine clarinetist and one whose name you recognise and would use to choose to play this recording over that one if you had the chance. In this case, you would be regarding Michael Collins as the distinguishing artist, not the conductor of the orchestra. You'd therefore tag ALBUM as "Clarinet Concerto No. 1 (Collins - 2014)", COMMENT as "Martyn Brabbins, BBC Philharmonic Orchestra, Michael Collins (clarinet)", as per Axiom 3 -and, at this point, if you're using my Semplice program to tag your music files, you've got a problem! Semplice does not let you explicitly set the PERFORMER tag at all: it merely takes the first name from the COMMENT tag and says to itself, "if it's first in the list of performing artists, that must be the distinguishing artist, and therefore that's the PERFORMER tag sorted!" In this case, therefore, Martyn Brabbins would be selected by Semplice as the PERFORMER tag, meaning that whilst the ALBUM tag mentions "Collins", the PERFORMER tag mentions "Martyn Brabbins" -and that's what statistic number 7 on the QAS Report shown above is telling you has happened to my music collection over two thousand times!
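To make that mismatch concrete, here is a minimal sketch of the 'first name in COMMENT wins' rule described above. It is not Semplice's actual code: the function name first_comment_name is made up for illustration, and the tag values are simply the ones from the Michael Collins example.

```shell
#!/bin/sh
# Hypothetical helper imitating the "first name in COMMENT becomes PERFORMER" rule.
first_comment_name() {
    # Drop everything from the first comma onwards, then trim stray spaces.
    printf '%s\n' "$1" | sed -e 's/,.*$//' -e 's/^ *//' -e 's/ *$//'
}

comment='Martyn Brabbins, BBC Philharmonic Orchestra, Michael Collins (clarinet)'

performer=$(first_comment_name "$comment")
echo "PERFORMER would be: $performer"

# ALBUM names "Collins" as the distinguishing artist; does PERFORMER agree?
case "$performer" in
    *Collins*) echo "consistent with ALBUM" ;;
    *)         echo "inconsistent with ALBUM" ;;
esac
```

Run as-is, this picks the conductor and reports the pair as inconsistent, which is exactly the situation statistic 7 is counting.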

Sometimes, the inconsistency between ALBUM and PERFORMER tags will be because I'm an idiot and can't type properly. If ALBUM mentions the name "Philips" (one 'L') and I then type the COMMENT as "Peter Phillips" (two Ls), then PERFORMER will contain the 2-L version of the name, ALBUM has the 1-L version, and there's again a logical inconsistency between ALBUM and PERFORMER.

Generally, however, that's not the problem with my collection. I'd say, at a rough guess, around 90%+ of my Statistic 7 errors are because of the fact that my concertos are COMMENT tagged with the conductor as the first name in the list of performing artists, and Semplice has therefore picked the wrong name to write into the PERFORMER tag. If I could bulk-fix those, most of my Statistic 7 problems would vanish!

So, that's the reason for the problem -and I have 2000 or so problems in that regard! I don't propose to manually edit the tags of every single affected FLAC, however, so I have instead cooked up another bulk fix-up script, which I've made available as an addendum to the Niente User Manual, called fixperformers.sh.

  • The script first determines the 'distinguishing artist name' (which I'll call the 'DAN' henceforth) from the ALBUM tag
  • It then checks the PERFORMER tag to see if the DAN is found within it. If so: everything's fine and we can move on and check the next file
  • If the DAN is not present in the PERFORMER tag, the COMMENT tag is read
  • For each comma-separated name in the COMMENT tag, the script asks if the DAN is present in the COMMENT-derived name.
  • If one of the COMMENT-derived names contains the DAN, then that's our PERFORMER.

In essence, the script finds someone in COMMENT that matches a partial name found in ALBUM and sets PERFORMER to that. This works to fix up most 'my distinguishing artist wasn't listed first in the COMMENT tag' problems -but it cannot fix up mere typos. The issue for such 'unable to fix' responses is that the script always assumes that 'ALBUM is Right' -that the DAN contained within the ALBUM tag ought to match up with a name in the COMMENT and therefore that name should appear in PERFORMER. If the ALBUM tag itself contains a typo, however, then the script cannot proceed further and will declare the error to be one that can only be fixed manually.
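The steps above can be sketched in a few lines of shell. This is an illustration of the logic only, not the contents of fixperformers.sh: the function names are invented, the tag values are passed in as plain strings rather than read from a FLAC, and stripping a trailing "(instrument)" from the matched name is my own assumption about what a tidy PERFORMER value would look like.

```shell
#!/bin/sh
# Pull the DAN out of the final "(Surname - Year)" group of an ALBUM tag.
dan_from_album() {
    printf '%s\n' "$1" | sed -n 's/^.*(\(.*\) - [0-9][0-9]*)[[:space:]]*$/\1/p'
}

# Print the first COMMENT name that contains the DAN; fail if none does.
performer_from_comment() {
    p_dan=$1
    # One name per line, spaces trimmed, trailing "(instrument)" removed (assumption).
    printf '%s\n' "$2" | tr ',' '\n' |
    sed -e 's/^ *//' -e 's/ *$//' -e 's/ *([^)]*)$//' |
    while IFS= read -r name; do
        case "$name" in
            *"$p_dan"*) printf '%s\n' "$name"; break ;;
        esac
    done | grep .    # grep's exit status reports whether anything matched
}

# $1 = ALBUM, $2 = PERFORMER, $3 = COMMENT
suggest_performer() {
    s_dan=$(dan_from_album "$1")
    [ -n "$s_dan" ] || return 1
    case "$2" in
        *"$s_dan"*) printf '%s\n' "$2"; return 0 ;;   # already consistent
    esac
    performer_from_comment "$s_dan" "$3"
}

suggest_performer 'Clarinet Concerto No. 1 (Collins - 2014)' \
    'Martyn Brabbins' \
    'Martyn Brabbins, BBC Philharmonic Orchestra, Michael Collins (clarinet)'
```

For the Collins example this prints "Michael Collins". If ALBUM holds a misspelt surname that appears nowhere in COMMENT, suggest_performer prints nothing and returns a non-zero status, which corresponds to the 'fix manually' case: the match is a plain substring test, so close enough really isn't enough.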

Here's a real example of that sort of thing happening:

The TITLE and ALBUM tags in this example declare the DAN to be someone called "Terwinkel". The PERFORMER is currently set to "Sebastian Tewinkel" -and you'll note that the PERFORMER version of the name lacks an 'R', but the ALBUM version has one. Thus ALBUM is inconsistent with PERFORMER. The fix-up script therefore starts looking through the COMMENT tag for something that matches part of the ALBUM tag... and it won't find it, because the R-less "Tewinkel" is found in the COMMENT tag, so the script never finds anything in COMMENT which matches "Terwinkel". Close enough isn't actually enough for the script! As a result of parsing all the way through COMMENT and not finding a match for "Terwinkel", the script declares the problem is unfixable except by manual means.

Here's me implementing the fix:

I first had to Google Sebastian Tewinkel to find out what was the correct spelling of his name. Having determined that "Tewinkel", without the R, was the correct spelling, I could then edit the tags accordingly. I could have implemented the fix in Semplice, but instead decided here to fix it in the graphical Kid3 tag editor. I manually adjusted the ALBUM tag and the TITLE tag: if you alter data in one place, make sure you don't thereby introduce new inconsistencies by forgetting to update the same data everywhere it appears! Don't forget, too, that if your FLAC contains an embedded cuesheet, that needs to be manually edited to match the new ALBUM entry too. After the manual change of "Terwinkel" to "Tewinkel", the ALBUM tag now already matches part of the PERFORMER tag, so nothing else need be done: the next time you perform a differential integrity check in Niente, the line representing this recording will automatically drop off the Option 7 report. If you were to re-run the fix-up script, however, the output for this recording would be a simple full-stop/period, indicating "This recording no longer needs fixing".

A slightly odd variation on this theme is that you may occasionally come across recordings which the Niente Option 7 report declares to be a problem but which the fix-up script resolutely declares to not be a problem at all! This will happen if you include terms in the ALBUM tag within brackets that are not in the (DAN - Recording Year) component. For example: if ALBUM=Adagio (from Symphony No. 4) (Bernstein - 1986) and PERFORMER=Leonard Bernstein, that will earn you a line on the Niente report as an inconsistent ALBUM/PERFORMER tag pair. Why so, given they are obviously entirely consistent with each other? Well, it's a limitation of the technology used to produce the Niente report, I'm afraid: that extra set of bracketed terms in the ALBUM tag is confusing Niente. It's looking inside the "(from Symphony No. 4)" brackets to find a performer name and notes that it doesn't match "Leonard Bernstein" in the PERFORMER tag. Hence it's on the report. Of course, though, Niente is looking inside the wrong pair of brackets! If it instead compared "(Bernstein - 1986)" with "Leonard Bernstein", it would spot that the two are consistent with each other after all.

The fix-up script is able to use a different technology to pick the right set of brackets to inspect, though -and that means the fix-up script will not see this recording as a problem at all. So, the recording will forever be listed on the Niente report unless you manually remove the extra set of round brackets: maybe replace them with square ones, or leave them out altogether. Thus "Adagio, from Symphony No. 4 (Bernstein - 1986)" will satisfy the Niente report, and so would "Adagio [from Symphony No. 4] (Bernstein - 1986)".
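The difference between the two behaviours can be shown with a pair of sed one-liners. Neither is taken from Niente or from the fix-up script; first_bracket and last_bracket are hypothetical names, and the sketch only imitates 'look in the first pair of round brackets' versus 'look in the last pair'.

```shell
#!/bin/sh
# Contents of the FIRST pair of round brackets (the behaviour that trips up the report).
first_bracket() {
    printf '%s\n' "$1" | sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p'
}

# Contents of the LAST pair of round brackets (where "DAN - Year" lives).
last_bracket() {
    printf '%s\n' "$1" | sed -n 's/^.*(\([^)]*\))[^)]*$/\1/p'
}

album='Adagio (from Symphony No. 4) (Bernstein - 1986)'
echo "first: $(first_bracket "$album")"
echo "last:  $(last_bracket "$album")"

# With square brackets for the extra term, both approaches agree.
fixed='Adagio [from Symphony No. 4] (Bernstein - 1986)'
echo "fixed: $(first_bracket "$fixed")"
```

The first call yields "from Symphony No. 4" and the second "Bernstein - 1986"; once the extra term is in square brackets, even the first-bracket approach lands on "Bernstein - 1986".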

The general procedure to run this script is, therefore, as follows:

  1. Set the parameters at the top of the script before doing anything at all
  2. Perform a new, full integrity check of your music files, using Niente Main Menu option 3
  3. Review the scale of errors by taking Niente Reports Menu option 7
  4. If there are many ALBUM/PERFORMER issues reported, run the script
  5. After the script has run, perform a new, differential integrity check of your music files, using Niente Main Menu option 4
  6. Check the Reports Menu option 7 report again: anything still listed requires manual fix-up
  7. Keep on running Differential Checkups, performing manual fixes, re-running the fixup script and yet more Differential Checkups until the Reports Menu option 7 tells you that all records are fine. Phew!

Starting at point 1: the script has a section at its top which sets a number of parameters or variables which you should check apply to your situation. These are:

  • DBNAME - make sure you set this to the name of your Niente database that contains details of the tracks whose tags you want fixed up
  • CONFDIR - should point to where your Niente configuration file is stored. It's set to the standard, default location by default and shouldn't need altering
  • SEDPROG and GREPPROG - leave these set to 'sed' and 'grep' if running on anything other than MacOS. Set them to 'gsed' and 'ggrep' respectively if you are running on MacOS.
  • FINDPROG - This is the name of the operating system's 'find files' utility and is set to 'fd' by default. On Ubuntu and its derivatives, you'll probably need to set it to "find".

For most operating systems, you'll only need to set DBNAME and CONFDIR. For some, usually Ubuntu and MacOS, you'll maybe need to adjust at least one of the other three.
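If you'd rather not remember which values apply where, something like the following could work them out for you. It is only a sketch under my own assumptions: pick_tools is an invented helper, the script itself expects you to set the variables by hand, and all the sketch does is check the operating system name reported by uname -s and whether an fd command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: choose tool names from an OS name as printed by `uname -s`.
pick_tools() {
    case "$1" in
        Darwin) SEDPROG=gsed; GREPPROG=ggrep ;;   # MacOS: use the GNU versions
        *)      SEDPROG=sed;  GREPPROG=grep  ;;   # everything else
    esac
    # Use fd where a command of that name exists, otherwise fall back to find.
    if command -v fd >/dev/null 2>&1; then
        FINDPROG=fd
    else
        FINDPROG=find
    fi
}

pick_tools "$(uname -s)"
echo "SEDPROG=$SEDPROG GREPPROG=$GREPPROG FINDPROG=$FINDPROG"
```

DBNAME and CONFDIR are deliberately left out: they depend on your own Niente setup and can't be guessed.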

I suppose there ought to be a 0th item on that list, too: back up your music collection before running the script against it! The script will make alterations to tags in lots of your music files. Those alterations will probably be correct to make -but whether they're correct or not, there's no going back on them! So, I'd make sure you have a backup of your files before bulk-mangling them! This is why the script will always start with this prompt:

A "dry run" is where the script checks all the files it knows from Niente are 'ALBUM/PERFORMER wrong', and produces a text file output of what it would change and what it cannot fix, but it doesn't actually change anything at all. Use this feature (by tapping 'd' at the prompt) to provide some reassurance that the fixes the script would make are reasonable and correct. When you are confident in the script's abilities, you can re-run it and tap 'r' to do a real run, where the proposed changes are actually made. Real runs also output a text file (to your desktop) -but only list files which the script knows it cannot automatically fix. That's the list you use to perform step 6 in the above process: the manual fixing up of files.
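The dry-run idea itself is simple enough to sketch: one function either reports what it would do or actually does it, depending on the mode chosen. The function name apply_fix and the file path are made up for illustration, and using metaflac to rewrite the tag is my assumption, not something confirmed about how the script writes its changes.

```shell
#!/bin/sh
# $1 = mode ('d' for dry run, 'r' for real run)
# $2 = path to a FLAC file, $3 = the new PERFORMER value
apply_fix() {
    if [ "$1" = r ]; then
        # Real run: replace any existing PERFORMER tag (assumes metaflac is installed).
        metaflac --remove-tag=PERFORMER --set-tag="PERFORMER=$3" "$2"
    else
        # Dry run: report the proposed change and touch nothing.
        printf 'WOULD SET PERFORMER=%s in %s\n' "$3" "$2"
    fi
}

# Dry-run demonstration with a hypothetical file name: nothing is modified.
apply_fix d 'clarinet-concerto.flac' 'Michael Collins'
```

Keeping the 'would do' and 'does' branches side by side like this means the dry-run output describes the same change a real run would make.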

Here's an example of the script doing its thing:

As the script steps through the recordings one at a time, it will output an 'X' if it cannot automatically fix the problem and an 'F' if it can (and has). If it finds a file that Niente thinks wrong but which it finds no need to fix, it will output a '.' by way of a placeholder. You can see from that last screenshot that the four Xs (no fixes) that start the run relate to the first four rows of the report being displayed behind the script-running terminal. I've already explained above that a typo in the ALBUM is unfixable (because the script assumes ALBUM is always right), so you can see my ALBUM tags for those first four recordings contain typos: Terwinkel instead of Tewinkel, Trondheim instead of Tronheim. Then I get two Fs, indicating two fixes: so the Perlman/Zimmermann recordings can be fixed (and, indeed, have been): looking at those names, I'm fairly certain that the ALBUM tag has mentioned the name of the soloists (Itzhak Perlman's is the big clue here!), whereas the PERFORMER has been set to the conductor's name (Seiji Ozawa's name being an obvious example of that sort of thing).

So that's visual proof that the script can fix up the right names in the wrong order; but it cannot fix up issues where there are typos in either the ALBUM or COMMENT tags themselves.

So how did I get on? Well, it took me six days to achieve (two of which were spent writing and refining the fix-up script), but here are my current statistics:

Compared to the first screenshot shown in this blog piece, you can see all 2,100+ PERFORMER/ALBUM tag inconsistencies have been resolved, by a combination of running the fix-up script and by quite a bit of follow-up manual tag adjustment. On the whole, I'd say that the script automatically fixed up somewhere between 75% and 90% of all reported errors, leaving me around 10-25% of fixes to apply manually. That still meant manually tweaking over 200 recordings' tags, but that was a manageable proposition, where doing a couple of thousand of them wouldn't have been!

I found that a lot of the inconsistencies arose with files I recall (or could tell!) I'd ripped over a decade ago, before I'd really nailed down what the format and contents of the ALBUM and COMMENT tags should be. These days, I'd always put conductor, orchestra, soloists in my COMMENT tag: back then, I was more slapdash and appear to have been quite satisfied just listing whatever the 'big name' was on the CD cover! It's been really useful to be directed back to these early rips and see just how bad my early tagging efforts were 🙂 On more than one occasion, the simpler thing to do was dig the CD out again and rip it afresh, applying my newer, more rigorous tagging standards to the result. I also discovered quite a few of those early rips were of music snippets I'd never consider a 'worthy' standalone recording these days -so quite a bit of deleting and/or combining recordings also took place.

Fixing up tagging issues, in other words, gives you a good opportunity to review your entire music collecting approach and appreciate your collecting development over the years. It's been a refreshing experience!

Unfortunately, it's not all sweetness and light:

As you can see, there are a couple of other statistics which have rather a lot of issues to deal with: some further fix-up scripts (and additional parts to this series of posts) therefore seem to be in order. Watch this space!

To conclude, however:

Download the fix-up script from here and save it to somewhere convenient (I tend to stick things on my Desktop). To then run it, you open a terminal session and type something like:

cd $HOME/Desktop
chmod +x fixperformers.sh
./fixperformers.sh

The first command makes sure you're sitting in the folder where the script is stored; the second makes the script executable; the third actually runs it. Good luck!