Houston, we have a problem... Part 3...

Following on from my two earlier escapades in fixing up my music collection's tagging, it is time to turn my attention to my penultimate big issue. That 'Folders with multiple tracks' statistic (number 12 in the list) is something that is not officially a problem, but it annoys me nonetheless! It is counting the number of times a folder contains more than one FLAC. Now, if you rip a symphony off most classical music CDs, you probably expect to end up with four separate 'tracks', each track representing one movement of the symphony, so the presence of more than one FLAC in a folder might not seem to be surprising or particularly 'wrong'.

You would be entirely correct in thinking that, I hasten to add: there is absolutely nothing wrong in having parts of a composition represented by a separate 'track' and there is nothing in my Axioms of Classical Tagging article to say otherwise. In fact, it's entirely silent on the subject.

However, I have long been an advocate for combining multiple per-track FLACs into a single composition-all-at-once "SuperFLAC". This doesn't affect the musical content of your files, but merely means that instead of a symphony consisting of four separate tracks, it is all now contained within a single file. Semplice will do this 'SuperFLAC creation' process for you, on a work-by-work basis. When it does so, it will also work out how long each of the individual tracks lasts and construct something called a "cuesheet" which it will then embed within the SuperFLAC. The cuesheet then allows players that want to display per-track data to do so, even though no physical tracks actually exist. For example, here is my SuperFLAC of a recording of something by Aaron Copland:

You can see that, apart from the CD booklet sitting there in PDF format, the recording itself consists of a single FLAC file. Yet, when I play this within an appropriate player (such as DeaDBeeF), I see this:


So the ten 'tracks' that make up this recording are still there and still individually selectable and playable, if I want them to be: nothing is lost in terms of playability or flexibility in converting per-track rips to per-composition ones, in other words. Meanwhile, the benefits are that my storage server has to store fewer large files, rather than lots of smaller ones: file systems and hard drives tend to work more efficiently and with better performance when doing the former rather than the latter. The other good thing about single-file recordings is that if your music player cannot do proper 'gapless' playback, whereby one movement (track) moves immediately onto the next, without pause or interruption, then feeding it a single FLAC rather than 10 separate ones eliminates that as an issue: there is no gap when only a single file is being played, so your player's limitations make no difference. If there is an attacca transition between the end of the first and the beginning of the second movement, even the gappiest of players will be able to cope!

In short: there are few drawbacks to creating SuperFLACs and quite a few benefits from doing so. Thus, seeing that I have nearly 600 recordings in my collection that clearly haven't been turned into SuperFLACs is annoying me. Naturally, however, I am not going to visit each of nearly 600 folders in turn and invoke Semplice individually, by hand! Instead, I've written another of my fixup scripts (fixtracks.sh) to automate the job, which I've then made available on Niente's User Manual 'Fix Up Scripts' page.

The script retrieves a list of all the folders containing more than one FLAC from Niente's database (so, you need to do a Full or Incremental integrity check within Niente before you run it, so that it's working on the most up-to-date data).
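If you'd like a rough cross-check of that list without going near the database, here is a minimal sketch that simply walks the file system and prints every folder that directly contains more than one FLAC. It is not what fixtracks.sh does (that reads Niente's database). The function name is my own invention, and it assumes file names contain no newline characters.

```shell
# Sketch only: list folders that directly contain more than one .flac file.
# This walks the disk; it does not consult Niente's database.
multi_flac_folders() {
    root="$1"
    # Print the parent folder of every FLAC found, then count repeats.
    find "$root" -type f -name '*.flac' \
        | sed 's|/[^/]*$||' \
        | sort \
        | uniq -c \
        | awk '$1 > 1 { sub(/^ *[0-9]+ /, ""); print }'
}
```

You would call it as multi_flac_folders "/path/to/your/music", where the path is whatever the root of your own collection happens to be.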

When first run, the script will ask if you want to do a 'real' run or a 'dry run': you tap 'r' or 'd' to choose. I'd strongly recommend doing a dry run first, as this outputs a script on your Desktop which contains commands that, with a little bit of editing, allow you to automatically back up files which a subsequent 'real' run would irrevocably alter. Here, for example, is the output of my first dry run:

You can see the file is called 'fixtracks_folders.txt' (and, as I mentioned, it is written to your $HOME/Desktop folder). It contains 'placeholder text': the first line visible here, for example is:

mkdir "/somebackupdir/1" && cp -rp "<name of the folder containing more than one FLAC>" "/somebackupdir/1"

In other words, if executed as a shell script, the file will create a bunch of sub-folders within a backup folder and then copy the contents of one of the affected folders to that new sub-folder, thereby backing it up. Obviously, you will need to use a text editor's search-and-replace functionality to substitute the name of a real backup folder for "somebackupdir", which I did here:

You can see I've substituted '/sourcedata/music/temporary_backup' for the 'somebackupdir' placeholder, for every line of the script. All you need then do is make the text file executable before, finally, running it:

chmod +x "$HOME/Desktop/fixtracks_folders.txt"
"$HOME/Desktop/fixtracks_folders.txt"

You then sit back and wait: in my case, the 596 folders of music consumed around 56GB of disk space and took around half an hour to copy to their temporary backup location. Once the backup is in place, you're ready to run the actual fixup script in 'real' mode.
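If you prefer the terminal to a text editor, the placeholder substitution described earlier can be sketched with sed. This is only an illustration: the function name is hypothetical, and it writes to a new file rather than editing the dry-run output in place, so the original survives if something goes wrong. It assumes your backup path contains neither '|' nor '&', both of which mean something special to sed here.

```shell
# Sketch: replace the "/somebackupdir" placeholder with a real backup folder.
# Arguments: input file, backup folder, output file.
set_backup_dir() {
    infile="$1"
    backupdir="$2"
    outfile="$3"
    # '|' is the sed delimiter because the paths themselves contain '/'.
    sed "s|/somebackupdir|${backupdir}|g" "$infile" > "$outfile"
}
```

For example: set_backup_dir "$HOME/Desktop/fixtracks_folders.txt" "/sourcedata/music/temporary_backup" "$HOME/Desktop/backup_commands.txt" (the output file name is made up). Note that the generated commands use plain mkdir, which will fail if the top-level backup folder itself does not already exist, so create that first.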

When running 'for real', the fix-up script visits each of the music folders previously listed in the backup script in turn. It counts the FLACs in that folder. If there's more than one of them (as should always be true, but it pays to check these things, just in case!), it first extracts all the metadata from the first FLAC found, and then computes the play-length of every individual FLAC found, in order. It uses these per-track play lengths to construct a cuesheet for the playback of the complete folder of music files. It then concatenates all the audio streams from the individual FLACs into a single SuperFLAC, before writing back to that SuperFLAC all the metadata tags previously extracted from the first FLAC. It finally embeds the cuesheet it previously constructed into the SuperFLAC, before deleting the non-SuperFLAC files. Your folder is thus left with a single SuperFLAC that contains all the audio and metadata previously stored within the multiple, per-track FLACs.
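To make the cuesheet step a little more concrete, here is a minimal sketch of how per-track play lengths can be turned into cumulative track start times. It is not the code from fixtracks.sh, whose internals aren't shown here. The function name is hypothetical, lengths are assumed to be given in samples (which metaflac --show-total-samples can report, with metaflac --show-sample-rate supplying the rate), and start times are written in the usual cuesheet MM:SS:FF form, where FF counts frames of 1/75th of a second.

```shell
# Sketch: build a simple cuesheet from per-track lengths.
# Usage: cuesheet_from_lengths FILE SAMPLE_RATE SAMPLES_TRACK1 SAMPLES_TRACK2 ...
cuesheet_from_lengths() {
    file="$1"
    rate="$2"
    shift 2
    offset=0   # running start position of the current track, in samples
    track=1
    printf 'FILE "%s" WAVE\n' "$file"
    for samples in "$@"; do
        # Convert the start offset to 1/75 s frames (rounded down).
        frames=$(( offset * 75 / rate ))
        printf '  TRACK %02d AUDIO\n' "$track"
        printf '    INDEX 01 %02d:%02d:%02d\n' \
            $(( frames / 4500 )) $(( frames / 75 % 60 )) $(( frames % 75 ))
        # The next track starts where this one ends.
        offset=$(( offset + samples ))
        track=$(( track + 1 ))
    done
}
```

Each track's start time is just the sum of the lengths of all the tracks before it, which is why the script has to measure every FLAC, in order, before it can write the cuesheet.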

You will note that the fixup script deletes the original per-track files, at least by default. See the AUTODELETE parameter described below if you want to change this behaviour.

The default behaviour of deleting the source files is, however, the reason why running the fixup script in 'dry run' mode and then using the output 'fixtracks_folders.txt' file as a backup is so important. There is otherwise no way back from the changes the fixup script will inflict on your music collection! Be warned.

Once you've run the fixup script, you need to perform a new Differential or Full integrity check from within Niente itself. This forces Niente's database to be refreshed to take account of the new file situation on-disk. Without a fresh integrity check, the Niente statistics and reports will not pick up the changes made by the fixup script.

Note that at the top of the script are a set of seven parameters which you may need or want to adjust, as follows:

  • DBNAME - make sure you set this to the name of your Niente database that contains details of the tracks whose tags you want fixed up
  • CONFDIR - should point to where your Niente configuration file is stored. It's set to the standard, default location by default and shouldn't need altering
  • SEDPROG and GREPPROG - leave these set to 'sed' and 'grep' if running on anything other than macOS. Set them to gsed and ggrep respectively if you are running on macOS
  • FINDPROG - This is the name of the operating system's 'find files' utility and is set to 'fd' by default. On Ubuntu and its derivatives, you'll probably need to set it to "find"
  • AUTODELETE - It's set to '1' by default, which means the script will automatically delete the per-track files after it has consolidated them into a SuperFLAC. If you are feeling nervous, you can set this to some other value (such as 0) to preserve both the originals and the SuperFLAC, leaving you to delete the per-track files yourself, manually, afterwards.
  • MAXSUPER - It's set to 9999 by default, which is meant to be an arbitrarily very high number that means 'fix up everything in one go'. When you're starting, however, you might want to set it to (say) '3': the script will then create three SuperFLACs in a row and then quit. In this way, the parameter lets you do several small runs, to get comfortable with its work, before finally letting it go big-time!
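To illustrate how the last two parameters interact, here is a harmless sketch of the sort of loop they imply. It only prints what it would do, it is not lifted from fixtracks.sh, and the function name and messages are my own. The parameter names and their meanings are as described in the list above.

```shell
# Sketch only: how a MAXSUPER limit and an AUTODELETE switch might gate a run.
AUTODELETE=1   # 1 = delete per-track files after consolidation; anything else = keep them
MAXSUPER=3     # stop after this many SuperFLACs

process_folders() {
    done_count=0
    for folder in "$@"; do
        # Quit once the requested number of SuperFLACs has been created.
        if [ "$done_count" -ge "$MAXSUPER" ]; then
            break
        fi
        echo "would build SuperFLAC in: $folder"
        if [ "$AUTODELETE" = "1" ]; then
            echo "would delete per-track files in: $folder"
        else
            echo "keeping per-track files in: $folder"
        fi
        done_count=$(( done_count + 1 ))
    done
    echo "processed $done_count folder(s)"
}
```

With MAXSUPER=3 and AUTODELETE=0, a first run touches three folders and leaves every original in place, which is about as cautious a start as you can make.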

Here's what happened with me after I'd run the fixup script:

You can see, I hope, that I got my original 596 errors for statistic 12 down to a mere 88 persistent errors. You'll also notice that I acquired over 500 new statistic 11 errors as a result of this 'fixup'! That's because the multiple-flacs-in-one-folder fixup creates SuperFLACs that do not contain bit-depth or sample-rate indicators in their file names. In other words, you'll end up with a SuperFLAC called something like "Orchestral Variations (Wilson - 2016).flac", rather than "Orchestral Variations (Wilson - 2016)-16-44100.flac". If the lack of those bit-depth and sample-rate indicators in file names is an issue for you, as it is for me, another fix-up script can be re-run to sort these issues out (see part 4 in this series!).
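For the curious, the naming convention itself is easy to sketch. The following is not the fix-up script referred to above, just a hypothetical helper showing how the suffix is put together. In practice the two numbers could be read from the file with metaflac --show-bps and metaflac --show-sample-rate.

```shell
# Sketch: add "-<bit depth>-<sample rate>" before the .flac extension.
# Usage: superflac_name "Name.flac" BITS RATE
superflac_name() {
    file="$1"
    bps="$2"
    rate="$3"
    base="${file%.flac}"   # strip the extension
    printf '%s-%s-%s.flac\n' "$base" "$bps" "$rate"
}
```

Calling superflac_name "Orchestral Variations (Wilson - 2016).flac" 16 44100 gives the second of the two names quoted above.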

The more serious issue, to be honest, is why 88 folders are still listed as containing more than one FLAC, considering that the logic of the fixup script simply shouldn't permit that to happen!

Well, the fundamental issue here turned out to be bad tagging on my part, rather than a reluctance on the part of me or the fixup script to create SuperFLACs. The fixup script only combines multiple FLACs into one SuperFLAC when they are contained within a single folder. If you've managed to tag the same composition in two separate folders, the fixup script won't see them as candidates for consolidation. All you really can do at this point, therefore, is work your way through Niente's Report Menu Option 12, line by line, and try and work out what's going on.

Here's an example that happened to me:

Here's a folder-full of Schubert symphonies, conducted by assorted people, each recording/composition stored within its own sub-folder. You can see that, from the folder names, I have a Symphony No. 8 and a Symphony No. 9 conducted by Blomstedt in 2021. When you examine the actual FLAC contents of those folders, however, you see that each one only contains a single SuperFLAC already. So how were these folders listed on my Niente report of 'folders containing more than one FLAC'? Well: look at the file names within each of the two folders. Though the folder names are different, the files within each folder are both named 'Symphony No. 8...'. The Niente report groups by whatever is in the ALBUM tag, and that reflects the physical name of the files, so here we have two separate symphonies named (and tagged) identically. The two ALBUM tags are identical, so Niente reports them as a multi-FLAC issue... but the fixup script sees that they are in separate folders and thus decides (correctly, as it happens) that they should not be concatenated together.

The issue here was not that easy to fix: I had to listen to each FLAC in turn to establish that, despite their identical names, they really were separate recordings of two distinct symphonies. Once I'd determined that they were, I had to listen to other recordings of Symphonies 8 and 9 by Schubert to determine which was which: it turned out that, in this case, it was Symphony No. 8 that was correctly tagged and named; but that Symphony No. 9 had been wrongly tagged and named by merely re-applying all the tags that had first been applied to Symphony No. 8 -including the ALBUM tag. So, Niente is stuck there seeing that for the same ALBUM, there are two FLACs (albeit in different folders) and thus reports them as a multi-track problem. It isn't actually that at all, of course: it's the fact that I somehow used exactly the same metadata for one symphony that I'd already applied to a previous one. The result is that the fix here was to manually re-tag the 9th Symphony from scratch, using the track/performer details in the CD booklet, finishing off with a manual re-name of the relevant FLAC.

Here's another example of my music collecting going a bit wrong because I was being a bit of an idiot!

In the background, you see a snippet from Niente's 'folders with more than one FLAC' report: it's listing the various components of Wagner's Ring Cycle, as conducted by Georg Solti, as being problematic. In the foreground, you see me visiting the relevant folder in my file manager... and the underlying issue becomes obvious to spot, almost at once! I have two folders here for the two versions of the Solti Ring that I own: one is just a set of ordinary CDs I bought way back in the 1980s (they were amongst the first of my CDs, if I recall correctly). Those are stored in the 'Solti Original CDs' folder. But not too many years ago, I also extravagantly purchased the SACD re-master of those same recordings, and I ripped those into the 'Solti Hi-Res' folder you see here. For some reason that now escapes me, I then retained both versions: I think it was early days for me and SACD and I wasn't convinced the huge amount of extra disk space an SACD occupied was actually worth it, so I kept the originals around, just in case I decided not to waste all that disk space after all.

Anyway: this sort of thing means I have something tagged as 'Das Rheingold' in both the 'original' and the 'Hi-Res' folder. Since they are separate physical folders, the fix-up script won't concatenate them into a single file (which is just as well: the Ring cycle is quite long enough without being doubled up like that!), but the fact that two files have exactly the same ALBUM tag means, once again, that the Niente report still lists them as a single ALBUM comprised of more than one FLAC. The fix here is simply to make up my mind and pick just one of the versions to keep (and I ended up keeping the Hi-Res versions, if you're at all interested!)

You get the point, I hope: anything that is still listed as a problem after the fixup script has been run in 'real' mode is actually a problem with your tagging or collecting/folder organisation. They will represent multiple physical files that happen to use the exact same combination of ARTIST (i.e., the composer), GENRE and ALBUM tags, no matter that they are stored in separate physical locations on your hard disk. Note that recordings of Shostakovich's Symphony No. 5 (Bernstein - 1984) and Beethoven's Symphony No. 5 (Bernstein - 1984) will NOT be a problem in this regard, because although their ALBUM tags might be identical, their ARTIST tags will be different (given that ARTIST is meant, per the Axioms of Classical Tagging, to store the composer's name, which is different in each case of this example). Similarly, the two recordings Bernstein might have made of Shostakovich's fifth symphony won't be a problem, because the different recording years will mean the ALBUM tag will be different for each, even if only by one or two numbers (e.g., Symphony No. 5 (Bernstein - 1978) and Symphony No. 5 (Bernstein - 1980) and so on).
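The grouping rule just described can be sketched in a few lines of shell and awk. This is emphatically not Niente's own query, which runs against its database. It is a stand-alone illustration that assumes you have somehow exported one line per file, with ARTIST, GENRE, ALBUM and the file path separated by TAB characters. The function name is hypothetical.

```shell
# Sketch: report every ARTIST/GENRE/ALBUM combination shared by 2+ files.
# Input on stdin: TAB-separated lines of ARTIST, GENRE, ALBUM, file path.
duplicate_tag_groups() {
    awk -F '\t' '
        {
            key = $1 FS $2 FS $3          # the three tags form the group key
            count[key]++
            paths[key] = paths[key] "\n    " $4
        }
        END {
            for (k in count) if (count[k] > 1) {
                label = k
                gsub(/\t/, " / ", label)  # make the key readable
                print label paths[k]
            }
        }
    '
}
```

Feed it two files in different folders that both carry Schubert as ARTIST and 'Symphony No. 8 (Blomstedt - 2021)' as ALBUM and it will list the pair together. The two Bernstein fifths, having different ARTIST tags, form two separate groups of one and so are never reported.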

Broadly speaking, in other words, anything still being reported as a problem by Niente after the fixup script has been run really is a cataloguing problem that needs resolving manually. In my case, I discovered quite an alarming number of cases of me cataloguing something as exactly the same composition as the one I'd just catalogued something else as. Having correctly tagged up Symphony No. 8, I just went right ahead and catalogued Symphony No. 9 in exactly the same way. The fixup script cannot save you from such ineptitude, I'm afraid!

I can't say it was trivially easy to work out what horrendous cataloguing stuff-ups I've made over the years, but using the residual Niente report as my guide through my music collection, I eventually got myself into this position:

...which would seem to indicate I managed to fix up everything, 500+ by the automatic means of the fixup script, followed by about 90 manual tweaks and corrections.

To conclude, therefore:

Download the fix-up script from here and save it to somewhere convenient (I tend to stick things on my Desktop). To then run it, you open a terminal session and type something like:

cd "$HOME/Desktop"
chmod +x fixtracks.sh
./fixtracks.sh

The first command makes sure you're sitting in the folder where the script is stored; the second makes the script executable; the third actually runs it.

Remember to do a dry run first, edit the $HOME/Desktop/fixtracks_folders.txt file to point to a good backup location, and then chmod +x the fixtracks_folders.txt file, before running it. You really must take a backup of your original music folders before letting the fixup script change them in bulk.

Only when your backup is in-place should you re-run the fixtracks.sh script, this time in real mode.

Good luck!