Houston, we have a problem... Part 4...

This is the last in my series about how the new Niente Version 3.0 has revealed past cataloguing 'issues' with my music collection. In previous episodes, I've dealt with:

In this one, I'm turning my attention to the problem identified in the Niente Quick Aggregate Statistics report as statistic number 11, "Missing sample rates in filenames". As you can see from this screenshot, I'm missing this data in over 14,600 files (and I've only got 17,656 in the collection, so that's most of them!):

What that statistic is referring to is the ability of my tagging program, Semplice, to add the bit depth and sample rate to the physical filename of any FLAC. Here's an example of it being done correctly:

The file has the numbers and text "-16-44100" at the end of its name. That tells me the FLAC has been produced as a rip from a standard audio CD, since the "16" means "16-bit depth" and the "44100" means "sampled at 44,100 Hertz". Those two numbers are the very definition of 'standard CD audio'. But here's another file in my collection:

Here, merely by visual inspection in my standard file manager, I can tell immediately that this copy of Peter Grimes is a 24-bit depth recording, sampled at 96kHz: that sounds like a rip from an SACD to me.

The point is that I could get this information by inspecting the recordings in specialist tools, such as Kid3: here's the Peter Grimes example open in that tool:

On the top line of the main part of the display, we read 'FLAC 2304 kbps 96000Hz', which certainly confirms it's not standard CD audio; but it doesn't tell me it's a 24-bit depth recording, and in any case, it's a pain to have to open a file merely to reveal some technical details about it. That's why Semplice makes it possible to store these details in the file name itself, so that, using nothing more than your everyday, ordinary file manager, you know the audio quality of a particular recording.
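To make the naming convention concrete, here is a minimal sketch of reading the two numbers back out of a file name. It is not how Semplice or Niente do it: the function name parse_name_bits is made up for illustration, and it assumes the identifiers always sit immediately before the ".flac" extension, as in the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Semplice or Niente).
# Prints "<bit depth> <sample rate>" when the file name ends in
# -<digits>-<digits>.flac, and prints nothing otherwise.
parse_name_bits() {
  # ${1##*/} drops any leading directory so only the file name is examined
  printf '%s\n' "${1##*/}" |
    sed -n 's/^.*-\([0-9][0-9]*\)-\([0-9][0-9]*\)\.flac$/\1 \2/p'
}

parse_name_bits "01 - Allegro-16-44100.flac"   # prints: 16 44100
parse_name_bits "24 Preludes.flac"             # prints nothing
```

The sed expression is plain POSIX syntax, so it should behave the same with GNU sed and with the sed that ships on other systems.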

I stress that this is merely an 'artefact' in the file name: adding these numbers to the file name doesn't in any way alter the quality of the audio contained within the file, nor does it alter the contents of the FLAC or its metadata in any way. So, having them is a 'nice to have' feature (for me, anyway) and putting them there is non-destructive and completely harmless. I'll confess, too, that having this data in the file name is not a requirement of the Axioms of Classical Tagging, so it's a completely optional thing to do. In Semplice, when tagging your FLACs, you can turn the feature of adding the data to the final file name on or off as the mood takes you by setting the NAMEBITS persistent configuration parameter. It's entirely fine to have it switched off, but the default (and my personal preference) is to have it always switched on. The feature arrived quite late to Semplice, however, so (as is apparent from Niente's statistics) most of my recordings were tagged before its introduction... and thus lack the 'name bits identifiers'!

How then to put them back into the 14,000+ recordings that currently lack them?

As you can probably guess, there's no way I'm physically renaming 14,000 files in my file manager by hand! I've therefore written another of my fixup scripts, this time one called fixnames.sh, which I've made available from the Niente User Manual 'fix up scripts' page.

The script first queries the contents of the Niente database (so you ought to do a new, complete integrity check with Niente beforehand, so that the database contents are as up-to-date as possible). It finds all files whose physical names do not contain a bit-depth identifier surrounded by hyphens on both sides. That means a filename containing -16- or -24- would be skipped: such files already have their bit-depth identifiers.

Note that a recording called "24 Preludes.flac" or "16 Songs for Tenor.flac" would not be skipped. Though those file names contain the 'magic numbers', they are not hyphen-surrounded, so do not get regarded as bit-depth indicator numbers.
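The hyphen rule can be sketched in a few lines of shell. This is only an illustration of the rule as described: the real script works from the Niente database, not from a pattern match like this one, and the function name has_bit_depth_id is hypothetical. It checks only for the two depths mentioned above, 16 and 24.

```shell
#!/bin/sh
# Hypothetical check (the real fixnames.sh queries the Niente database).
# Succeeds if the file name already contains -16- or -24-.
has_bit_depth_id() {
  case "${1##*/}" in          # look at the file name only, not its folder
    *-16-*|*-24-*) return 0 ;;
    *)             return 1 ;;
  esac
}

for f in "01 - Allegro-16-44100.flac" "24 Preludes.flac" "16 Songs for Tenor.flac"; do
  if has_bit_depth_id "$f"; then
    echo "skip:   $f"
  else
    echo "rename: $f"
  fi
done
```

Run as written, the first name is skipped and the other two are reported as needing a rename, because their numbers are not surrounded by hyphens.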

For all qualifying files found (i.e., all files listed in Niente's database as lacking the bit-depth identifiers), the script visits each file in turn and renames it to include them: nice and simple! The bit-depth identifiers go into the new file name at the end, right next to the ".flac" indicator. Thus "01 - Allegro.flac" becomes "01 - Allegro-16-44100.flac" or "01 - Allegro-24-88200.flac" or even "01 - Allegro-24-192000.flac" if you have files ripped at absurdly high sample rates. The numbers added to the file names are sourced from Niente's analysis of the audio signal for each file, performed during a full integrity check: the fix-up script doesn't do any signal analysis of its own to work out what they should be.
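Building the new name is simple string work. In this sketch the bit depth and sample rate are passed in as arguments; in the real script they come from the Niente database. The function name new_flac_name is an invention for the example.

```shell
#!/bin/sh
# Hypothetical helper: new_flac_name <file> <bit depth> <sample rate>
# Prints the name with "-<bits>-<rate>" placed just before ".flac".
new_flac_name() {
  # ${1%.flac} removes the trailing ".flac" so the numbers can be slotted in
  printf '%s-%s-%s.flac\n' "${1%.flac}" "$2" "$3"
}

new_flac_name "01 - Allegro.flac" 16 44100   # prints: 01 - Allegro-16-44100.flac
new_flac_name "01 - Allegro.flac" 24 88200   # prints: 01 - Allegro-24-88200.flac
```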

There are no great subtleties here, just a bulk rename of things that appear to be otherwise lacking a couple of numbers in their file names.

Download the fixnames.sh script from the Niente user manual page, store it somewhere on your system (I usually stick it on my Desktop), and make it executable with a variation on the command:

chmod +x /home/hjr/Desktop/fixnames.sh

Before actually running the script, open it in a text editor of your choice and review the five parameters that appear at the top of the script (after the comments, but before the start of functional code lines):

  • DBNAME - make sure you set this to the name of your Niente database that contains details of the tracks whose file names you want fixed up
  • CONFDIR - should point to where your Niente configuration file is stored. It's set to the standard, default location by default and shouldn't need altering
  • SEDPROG and GREPPROG - leave these set to 'sed' and 'grep' if running on anything other than macOS. Set them to 'gsed' and 'ggrep' respectively if you are running on macOS
  • FINDPROG - This is the name of the operating system's 'find files' utility and is set to 'fd' by default. On Ubuntu and its derivatives, you'll probably need to set it to "find"
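If you're unsure which values suit your machine, something like the following could work them out for you. It is a sketch only: fixnames.sh expects you to edit the parameters by hand, and the pick_tools function and the fd-or-find fallback are my assumptions, not anything the script does itself.

```shell
#!/bin/sh
# Hypothetical helper: suggest values for SEDPROG, GREPPROG and FINDPROG.
# Takes the operating system name as reported by "uname -s".
pick_tools() {
  case "$1" in
    Darwin) SEDPROG=gsed; GREPPROG=ggrep ;;   # macOS reports "Darwin"
    *)      SEDPROG=sed;  GREPPROG=grep  ;;
  esac
  # Assumption: use fd when it is on the PATH, otherwise fall back to find
  if command -v fd >/dev/null 2>&1; then
    FINDPROG=fd
  else
    FINDPROG=find
  fi
}

pick_tools "$(uname -s)"
echo "SEDPROG=$SEDPROG GREPPROG=$GREPPROG FINDPROG=$FINDPROG"
```

Whatever it prints, you would still type the values into the top of fixnames.sh yourself.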

Once you've made any appropriate changes to those parameters, save the file and then run it in a terminal session with commands such as:

cd $HOME/Desktop
./fixnames.sh

The script will immediately run off to the file system and start changing file names! For each file the script successfully renames, it will output an 'F'. For any file it cannot rename, when it knows it should, it will output a full stop (period). Since this was once me and my collection, I will remind you of the necessity for backups. These file name changes cannot be bulk-reversed, so be sure to have a fresh backup of the old file names before you begin!!
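One cheap extra precaution, alongside a proper backup and not instead of one, is to save a plain list of the existing file names before the script touches anything. This is a sketch under assumptions: the function name save_name_list and the paths in the usage comment are examples, not anything fixnames.sh provides.

```shell
#!/bin/sh
# Hypothetical helper: save_name_list <music folder> <list file>
# Writes the full path of every .flac under the folder, sorted, to the list file.
save_name_list() {
  find "$1" -type f -name '*.flac' | sort > "$2"
}

# Example use (both paths are assumptions; adjust to your own layout):
#   save_name_list "$HOME/Music" "$HOME/flac-names-before.txt"
```

The list only records what the names used to be. It doesn't protect the files themselves, which is what the real backup is for.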

The only time you'll see a full stop output, it will be because Niente has declared the FLAC lacks the bit-depth identifiers, but a file that contains them is already present in the relevant folder (and, of course, you cannot rename a file to have exactly the same name as a file already in the same location). It will be a pretty rare occurrence, therefore, unless your Niente database isn't fully up-to-date before you start. If you were to re-run the fixup script after it had already completed successfully once, without having done a fresh full integrity check afterwards, then the script would output nothing but full stops: the Niente database would be declaring that lots of files need renaming, so the script would try to rename them... and then discover that, time after time, a FLAC with all the right bit-depth identifiers already exists.
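The 'F' or full stop behaviour can be sketched like this. It is my reconstruction from the description above, not the script's actual code, and the function name rename_one is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename_one <old name> <new name>
# Prints F after a successful rename, or a full stop if the rename
# cannot be done (for instance, because the new name is already taken).
rename_one() {
  if [ -e "$2" ]; then
    printf '.'                       # target exists: leave both files alone
  elif mv -- "$1" "$2" 2>/dev/null; then
    printf 'F'
  else
    printf '.'                       # mv itself failed
  fi
}

# Example use (file names are illustrative):
#   rename_one "01 - Allegro.flac" "01 - Allegro-16-44100.flac"
```

Checking for the target first means an existing file is never overwritten, which matches the "cannot rename a file to a name that's already taken" rule described above.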

How did I get on, after the script had been running for around half an hour? Pretty well:

I'll point out that the number of files mentioned in statistic number 0 at the top of that screenshot has gone down significantly from before: that's simply because the fix I discussed in the third part of this series of articles combined a lot of multi-file recordings into single-file SuperFLACs. So the drop in numbers there is nothing to do with this fix. For that, pay attention to the statistic number 11 again: missing sample rates in filenames... zero!

You'll also notice that my music collection now no longer suffers from any of the logical or physical anomalies which Niente detects, meaning that after four episodes of this series and four different fix-up scripts, my music collection is now 'clean' and free from the tagging stuff-ups that afflicted it. It is now, at least to Niente's satisfaction, entirely free of logical or physical defects and conforms exactly to all the requirements of my Axioms of Classical Music Tagging article. This makes me very happy, of course... but does rather mean that I've reached the end of this particular sequence of blog posts! Thanks for coming along for the ride...

Updated to add: I now report my Niente statistics, computed nightly, on my Current Aggregate Statistics page, so if I ever make mistakes when cataloguing and tagging new additions to the collection in the future, you'll be able to see me doing it in near-real-time (well, day-by-day, at least)! Hopefully, however, that page will continue to report lots of zeroes for all the various 'error conditions' statistics.