Niente Version 4: Integrity Checks

1.0 Introduction

When you populate a Niente database, you fill it with data about which FLACs exist -and nothing much else. Here is a screenshot of the Niente database, viewed in a third-party database browser, after scanning for FLACs (Database menu, Option 2) but before performing any integrity checks:

You'll note that each row records one FLAC file that the scan found: that's the data which a Database menu Option 2 scan generates. You'll also note, however, that each row in this TRACKS table is supposed to have 13 columns of data -from MD5 hash values to extracted PERFORMER, COMMENT and ALBUM tag data (to mention just a few). Immediately after a file scan, all these columns are NULL for every row: Niente knows nothing at all about the FLAC files, except that they exist.

A Niente integrity check is the process by which these columns are filled in with data, by visiting each listed FLAC in turn and reading all the metadata tags and copying the data found there into the relevant columns. Without an integrity check, therefore, Niente can report nothing meaningful about your music collection, other than the mere number of FLAC files that exist.

Generally speaking, populating these missing columns of data doesn't take a huge amount of time: one only needs to visit a FLAC file once to read all its tags in one fell swoop, so it's not particularly demanding on your CPU or hard disk. Tag data is, in other words, logical data about the FLAC; and logical data is easily and swiftly read by merely opening the file in question.
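Reading tags really is a single pass over a small region at the start of the file. As a minimal sketch (Python, assuming the standard FLAC metadata layout described in the FLAC format specification; this is not Niente's own code, and the function names are hypothetical), the tags live in a VORBIS_COMMENT metadata block that can be extracted without touching the audio frames at all:

```python
import struct


def read_vorbis_comments(data: bytes) -> dict[str, list[str]]:
    """Return {FIELD: [values]} from the VORBIS_COMMENT block of raw FLAC bytes.

    Only the metadata blocks at the start of the stream are examined; the
    audio frames that follow them are never parsed.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while pos + 4 <= len(data):
        header = data[pos]
        is_last = header & 0x80          # top bit: is this the last metadata block?
        block_type = header & 0x7F       # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit length
        if block_type == 4:              # 4 = VORBIS_COMMENT
            return parse_vorbis_comment(data[pos + 4:pos + 4 + length])
        if is_last:
            break
        pos += 4 + length                # skip header + body to reach the next block
    return {}


def parse_vorbis_comment(body: bytes) -> dict[str, list[str]]:
    """Parse a VORBIS_COMMENT body (its length fields are little-endian)."""
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    offset = 4 + vendor_len              # skip the encoder's vendor string
    (count,) = struct.unpack_from("<I", body, offset)
    offset += 4
    tags: dict[str, list[str]] = {}
    for _ in range(count):
        (n,) = struct.unpack_from("<I", body, offset)
        offset += 4
        entry = body[offset:offset + n].decode("utf-8")
        offset += n
        name, _, value = entry.partition("=")
        # Field names are case-insensitive, so normalise them to upper case.
        tags.setdefault(name.upper(), []).append(value)
    return tags
```

A field such as PERFORMER may legitimately appear more than once, which is why each name maps to a list of values.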

However, one particular pair of columns requires quite a bit more work to populate: HASH_ORIG and HASH_NEW are the columns in which the MD5 'digital fingerprint' of a FLAC's audio signal is recorded. HASH_ORIG is easy to fill, because every FLAC is 'born' with a built-in MD5 hash value the moment it is created ...and so all we have to do is read it, like any other logical tag. The HASH_NEW column, however, is where Niente stores the result of re-computing the MD5 fingerprint from scratch -and re-computation involves reading the entire audio stream (so, for a Wagner opera, that's going to be a lot of music to read!) and performing a fairly CPU-intensive computation on the data thereby retrieved. Filling the HASH_NEW column is therefore a major slog... and it is what Niente relies on to determine the physical integrity of a FLAC.
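The asymmetry between the two columns is easy to see in code. This Python sketch (an illustration based on the FLAC format specification, not Niente's implementation; the function names are hypothetical) reads the stored MD5 from the STREAMINFO block at the very start of the file, which is all HASH_ORIG needs. Producing HASH_NEW would require decoding every audio sample and hashing the result, and that expensive step is deliberately left out:

```python
STREAMINFO_TYPE = 0      # STREAMINFO is metadata block type 0
STREAMINFO_LENGTH = 34   # its body is always 34 bytes long


def streaminfo_md5(head: bytes) -> str:
    """Return the MD5 stored in a FLAC's STREAMINFO block (a HASH_ORIG-style value).

    `head` must hold at least the first 42 bytes of the file: the 4-byte
    'fLaC' marker, a 4-byte block header and the 34-byte STREAMINFO body,
    whose last 16 bytes are the MD5 of the unencoded audio samples.
    """
    if len(head) < 42 or head[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = head[4] & 0x7F
    length = int.from_bytes(head[5:8], "big")
    if block_type != STREAMINFO_TYPE or length != STREAMINFO_LENGTH:
        raise ValueError("first metadata block is not STREAMINFO")
    return head[26:42].hex()   # body occupies bytes 8..41; the MD5 is its tail


def read_hash_orig(path: str) -> str:
    # Cheap: only 42 bytes are read, however long the recording is.
    with open(path, "rb") as f:
        return streaminfo_md5(f.read(42))


def physically_intact(hash_orig: str, hash_new: str) -> bool:
    # hash_new would come from decoding the whole audio stream and hashing
    # it afresh; that decode is the slow part and is not shown here.
    return hash_orig.lower() == hash_new.lower()
```

From a shell, `metaflac --show-md5sum file.flac` prints the same stored value, while `flac -t file.flac` performs the expensive decode and checks it against that stored MD5.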

In short, Niente integrity checks can be logical or logical+physical (Niente doesn't have an option to only do physical checks). Logical ones tend to be relatively fast, and populate around 12 of the columns in the TRACKS table. Physical ones tend to be quite slow and computationally expensive and end up populating the HASH_NEW column in the TRACKS table.

The Integrity Checks menu, Option 1: Perform a full integrity check is the option used to perform logical+physical integrity checks, for all records in the TRACKS table, ab initio. The Integrity Checks menu, Option 2: Perform a differential integrity check also performs logical+physical integrity checks, but it only does so for those records which fail the various logical and physical integrity tests that Niente performs. If you add 3 new recordings to the database, for example, they will start off without any of the 13 columns of associated data and Niente will regard that as meaning that they fail its logical and physical tests: only those 3 new recordings would therefore have their logical and physical characteristics read, computed and loaded into the database.

The Integrity Checks menu, Option 3: Perform a fast integrity check only performs a logical integrity check. It ignores hash value mismatches when deciding which TRACKS rows to visit, and if it does visit a FLAC for any reason, it won't re-compute the MD5 'fingerprint' for that file. Like the differential integrity check, however, the fast integrity check only visits those FLACs which are recorded in the database as failing one or more of the logical tests that Niente applies. It will not, therefore, visit every row in the TRACKS table (unless all rows lack data for the metadata columns, of course).

By way of summary:

  • Full: Logical and physical tests of all known FLACs; previous results are wiped
  • Differential: Logical and physical tests of FLACs that are known to fail logical or physical tests; previous results are preserved for all known 'good' FLACs
  • Fast: Logical tests only, for FLACs that are known to fail logical tests; previous results are preserved for all known 'good' FLACs
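The three row-selection rules can be modelled in a few lines. In this Python sketch, the Track type and its tags_ok flag are hypothetical simplifications (Niente applies many separate logical tests, collapsed here into a single flag); only the two hash columns correspond directly to TRACKS columns named above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    """Hypothetical, heavily simplified stand-in for one TRACKS row."""
    path: str
    tags_ok: Optional[bool] = None   # None = never checked; False = failed
    hash_orig: Optional[str] = None
    hash_new: Optional[str] = None


def fails_logical(t: Track) -> bool:
    return t.tags_ok is not True


def fails_physical(t: Track) -> bool:
    # Missing hashes count as a failure, as does a mismatch between them.
    return t.hash_orig is None or t.hash_new is None or t.hash_orig != t.hash_new


def rows_to_visit(tracks: list[Track], mode: str) -> list[Track]:
    if mode == "full":
        return list(tracks)          # every row; results rebuilt from scratch
    if mode == "differential":
        return [t for t in tracks if fails_logical(t) or fails_physical(t)]
    if mode == "fast":
        return [t for t in tracks if fails_logical(t)]   # hashes ignored
    raise ValueError(f"unknown check mode: {mode}")


def recomputes_hash(mode: str) -> bool:
    # Only full and differential checks re-compute HASH_NEW.
    return mode in ("full", "differential")
```

A freshly scanned row has every column empty, so it fails the logical test and is picked up by all three modes.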

When you have just created and populated a database with Niente, therefore, your first integrity check should be a full one. If that reveals any logical tagging errors that you need to fix (such as the recording year not appearing in the ALBUM tag, for example), then you re-tag the affected FLACs and get Niente to notice your corrections by running a new fast integrity check: since the fixes were all in tag metadata, there's no need to re-compute the physical hash value for the audio signal in any of those FLACs. If Niente ever reports physical file corruption, however, then you'd re-rip that CD or restore the files from a known good backup, and this time get Niente to notice the fix by performing a new differential integrity check, since that will re-compute the MD5 hash values for the affected FLACs.

2.0 Performing an Integrity Check

All integrity checks use the contents of the TRACKS table in the database to determine what FLACs should be visited and inspected. That is, they don't go off and do a re-visit of the file system to search for FLACs: that's what loading the database is for.

All integrity checks are performed in the same way: select the Integrity Checks menu, then select option 1, 2 or 3 (as explained in Section 1 above, each type of check is used in different circumstances). As soon as one of these options is selected, Niente starts visiting the appropriate FLACs listed in its database and reading their contents from disk:

The program display will show you which file is currently being read and will list some of the data it has found within the file. Since this is a screenshot of a full integrity check, you'll notice that it displays the MD5 Hash which Niente has re-computed from the FLAC's audio signal (remember, full and differential checks both concern themselves with a file's physical integrity, and MD5 hash computations are how they do that). Had I chosen a fast integrity check instead, the screen would have looked like this:

It's the same sort of logical data as shown before, but this time there's no 'MD5 Hash' item to display. Fast checks don't read or re-compute MD5 hashes at all: they don't concern themselves with the physical integrity aspect of a FLAC.

Note that you can interrupt an integrity check at any time by pressing Ctrl+C, which exits Niente completely. A new integrity check will then simply resume from where the interrupted one left off (unless the new one is a full integrity check: those always start from row 1, from scratch). However, because Ctrl+C makes Niente terminate abruptly, when you re-launch Niente and attempt to re-run an integrity check, you'll see this message:

When an integrity check starts, various temporary files are created. If we were to allow a second integrity check to be launched simultaneously, its temporary files would over-write those of the first check, and chaos and mayhem would result. So, every integrity check takes a temporary lock on things, so that second and subsequent checks know to display this message rather than attempt to do anything that would damage the currently-running check. The trouble is, of course, that if your PC were to crash in mid-check, or if you interrupt a check with the Ctrl+C key combination, Niente never gets a chance to remove the program lock it sets as an integrity check begins. When you next try to run an integrity check, therefore, Niente sees the last session's lock and assumes it's still valid and thus fails to run anything at all.

The fix for this is to tap the L key: that will forcibly remove the program lock and allow a second integrity check to pick up from where the original one got to.

Be warned: if you remove a program lock when it's not safe to do so, you can corrupt your database. That's not fatal, of course: you'd simply wipe it, re-load it and start a fresh integrity check. For a large music collection, however, that would be a lot of work to re-perform, so it's not recommended! Remove the program lock if you have to; don't do it routinely or unnecessarily.
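The behaviour described above is the classic lock-file pattern. The Python sketch below illustrates that pattern only: the lock path and function names are hypothetical, and nothing here is taken from Niente's source. It shows why an interrupted run leaves a stale lock behind, and why force-removing the lock lets the next check proceed:

```python
import os
import tempfile

# Hypothetical lock location, chosen for illustration only.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "niente-check.lock")


def acquire_lock(path: str = LOCK_PATH) -> bool:
    """Atomically create the lock file; return False if a lock already exists."""
    try:
        # O_EXCL makes creation fail if the file already exists, so two checks
        # starting at the same moment cannot both succeed.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))    # record which process holds the lock
    return True


def release_lock(path: str = LOCK_PATH) -> None:
    """Remove the lock; calling this by hand is the analogue of tapping 'L'."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_check(work, path: str = LOCK_PATH) -> bool:
    if not acquire_lock(path):
        return False                 # the "check already running" case
    work()
    # If work() is interrupted (Ctrl+C raises KeyboardInterrupt here) or the
    # machine crashes, this line never runs and the lock file survives:
    # that is the stale lock which has to be removed by hand.
    release_lock(path)
    return True
```

The danger the warning describes follows directly: release_lock cannot tell a stale lock from one held by a check that is still running.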

At the end of an integrity check, if you were able to see inside the Niente database, you'd see this sort of thing in the TRACKS table:

Compare that to the first screenshot shown in the introduction to this page and you'll see that every row now has meaningful data in its columns, not just the word 'NULL'. This means we now have data about each FLAC that we can read, interpret... and flag when it appears logically inconsistent (which is what Niente's various reporting options will do for you).

3.0 Album Art and Volume Boost Checks

Integrity checks gather metadata tags from FLACs and (at least for full and differential checks) re-compute the MD5 hash or 'fingerprint' of each FLAC's audio stream. This allows Niente to report on most sources of logical inconsistency or physical corruption. There are two special exceptions, however: checking whether album art embedded within a FLAC is of a suitable size and shape, and checking whether the volume levels of the FLACs within a folder are as loud as they could be. These album art and volume level checks are not performed as part of a 'standard' integrity check, simply because not everyone wants to volume-boost their FLACs or to embed album art within them, whereas everyone wants to know if their FLACs are physically corrupt!

Accordingly, the Integrity Checks menu, Option 4: Check album art for all files and Option 5: Check album art for new files both perform album art checks, either against every single FLAC in your collection (option 4), or only for those files known to Niente but not previously checked for album art status (option 5).

Similarly, Option 6: Check all files for volume boosts and Option 7: Check new files for volume boosts are provided to perform an analysis of the music stream of a FLAC file and determine its peak loudness. Option 6 performs that analysis afresh for every FLAC file known to Niente, whilst option 7 performs the same sort of analysis, but only for those FLAC files known to Niente but which have not previously been volume-level assessed at all.

In both cases, the 'all files' option literally wipes the existing album art or volume data from the Niente database and starts collecting it once more, from scratch, for every FLAC in your music collection: that makes these potentially lengthy options to run. When you've just added a couple of new CD rips to your collection and want their album art and volume levels collected, however, that 'start over' approach would be overkill: the 'new files' version of each type of check therefore only collects such data for files which are known to Niente (thanks to their having been added via the Database menu, Option 3) but which lack any prior volume or album art data.
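The difference between the 'all files' and 'new files' variants comes down to whether existing results are cleared first. A minimal Python sketch, where `store` stands in for the ALBUMART or MAXVOLUMES table and `measure` for the per-file work (both are hypothetical placeholders, not Niente's code):

```python
def check_all_files(paths, store, measure):
    """'All files' variant: wipe previous results, then measure everything."""
    store.clear()
    for p in paths:
        store[p] = measure(p)
    return store


def check_new_files(paths, store, measure):
    """'New files' variant: only measure files that have no stored result yet."""
    for p in paths:
        if p not in store:
            store[p] = measure(p)
    return store
```

Because the 'new files' variant skips anything already stored, the same logic also lets it pick up where an interrupted 'all files' run stopped, provided results are saved as each file is finished.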

Both the Album Art and Volume Boost checks work the same way a 'regular' integrity check does: you tap the relevant menu option and Niente immediately runs off to read the FLAC files in its TRACKS table and work out the appropriate data to store back in its database.

An album art check-up displays minimal information as it works:

Such checks finish fairly swiftly, even on huge music collections, because it's trivially easy to read a piece of artwork's pixel dimensions. For this reason, album art checks cannot be resumed after an interruption: you're free to press Ctrl+C any time you like, but if you do so in the middle of an album art check, all work performed up to that point is lost entirely, and a renewed art check has to start over from scratch.

The results of these album art scans are stored in their own table (called ALBUMART), as follows:

As you can see, only two bits of data are collected about any piece of embedded album art: its height and width, in pixels. Niente does not (cannot!) concern itself with whether your album art is in focus, too old, too new or of questionable artistic merit! This data is, however, sufficient to tell us (via the appropriate reports, of course) that such-and-such a file has non-square album art (which is a problem for some people!) or that this-or-that FLAC has album art which is generally deemed too small or stupidly huge.
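Height and width are enough to drive such reports. A Python sketch of the kind of rule a report might apply; the size limits are made-up examples, since this page doesn't state the thresholds Niente's reports actually use:

```python
def art_problems(width: int, height: int,
                 min_side: int = 500, max_side: int = 1500) -> list[str]:
    """Flag embedded album art using its pixel dimensions alone.

    min_side and max_side are illustrative thresholds, not Niente's values.
    """
    problems = []
    if width != height:
        problems.append("non-square")
    if min(width, height) < min_side:
        problems.append("too small")
    if max(width, height) > max_side:
        problems.append("too large")
    return problems
```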

The volume boost check display contains a little more information as it proceeds:

The filename of the FLAC being analysed is displayed, along with the loudest volume level detected in that file. If the loudness is below a configurable threshold (by default, it's -2dB, but the Administration menu, Option 1 lets you alter that), then the display shows 'Volume boost possible' in the main part of the screen and increments a counter of possible volume boosts shown in the bottom right-hand corner of the display. It bears repeating that Niente isn't actually applying a volume boost: it's not its job and Niente never modifies the FLACs it knows about anyway. It simply collects the data that lets you determine whether to apply a volume boost or not.
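Peak level is conventionally expressed in dB relative to digital full scale (dBFS). A minimal Python sketch, assuming integer PCM samples have already been decoded from the FLAC and that 'loudest volume level' means the highest absolute sample value (this page doesn't say exactly how Niente measures it); the -2.0 default mirrors the threshold mentioned above:

```python
import math


def peak_dbfs(samples, bits_per_sample: int = 16) -> float:
    """Peak sample level in dB relative to digital full scale (0 dBFS)."""
    full_scale = 2 ** (bits_per_sample - 1)      # 32768 for 16-bit audio
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")                      # digital silence
    return 20 * math.log10(peak / full_scale)


def boost_possible(peak_db: float, threshold_db: float = -2.0) -> bool:
    """True when the loudest sample sits below the configurable threshold."""
    return peak_db < threshold_db
```

A file whose loudest sample is half of full scale peaks at about -6.02dB, so a boost would be reported as possible.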

The volume boost scan populates its own table in the database, called MAXVOLUMES. If you could see inside the Niente database, you'd see it looks like this:

You'll note that the table does not list individual FLACs, but folders. That's because you cannot meaningfully volume boost individual FLACs regardless of the volumes of the other files in a folder.

For example: take a two-movement symphony. Say the first movement is determined to have a potential volume boost of 0dB (i.e., no volume boost at all, because it's already as loud as it can be without distorting), but the second is determined to have a potential boost of 4.5dB. You could then boost the second file's volume by 4.5dB, so it's now as loud as the first: you've just destroyed the relative volume levels between movements. Or you could retain the relative volume between the two movements and boost both files by 4.5dB: you've just made the first file sound horribly distorted and clipped.

What of course you have to do -and what Niente actually does- is to say, the folder as a whole can only be boosted by whatever it takes to get the loudest file to a non-distorting maximum level... and in my example, that would mean that since file 1 is already at 0dB, the folder as a whole cannot be boosted at all. Hence, Niente's MAXVOLUMES table only collects folder-based aggregate data, not per-FLAC data.
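The symphony example reduces to one rule: a folder can only be boosted by the headroom of its loudest file, which is the same as the smallest per-file boost in that folder. A Python sketch of that aggregation (the input format is a hypothetical simplification of what a volume scan would gather per file before writing folder-level rows to MAXVOLUMES):

```python
from collections import defaultdict
from pathlib import PurePosixPath


def folder_boosts(peaks_by_file: dict[str, float]) -> dict[str, float]:
    """Map each folder to the largest boost (dB) that clips none of its files.

    peaks_by_file maps a FLAC path to its peak level in dBFS (0.0 = full scale).
    """
    loudest: dict[str, float] = defaultdict(lambda: float("-inf"))
    for path, peak in peaks_by_file.items():
        folder = str(PurePosixPath(path).parent)
        loudest[folder] = max(loudest[folder], peak)
    # Headroom above the loudest file; never report a negative boost.
    return {folder: max(0.0, -peak) for folder, peak in loudest.items()}
```

For the two-movement symphony (peaks of 0dB and -4.5dB) this yields a folder boost of 0dB, exactly as argued above.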

Performing a volume boost check is CPU-intensive and can take a long time, since the audio stream data in every FLAC has to be read to determine its peak loudness. Imagine having to wade through Wagner's entire Ring Cycle to work out what the loudest bits sound like! It's going to take a while ...and therefore volume boost checks can be interrupted (with Ctrl+C). You'd resume from where you left off by taking Integrity Checks menu, Option 7, which picks up from where things had got to without re-calculating everything from scratch (which is what Option 6 does). You'd have to remove the program lock (press 'L') to resume an interrupted boost scan, of course.

4.0 Scheduling Unattended Integrity Checks

Some integrity checks take a long time to complete (especially full integrity checks and volume boost checks). You generally don't want to be hanging around watching your screen for multiple hours as they plough through their work! Accordingly, Niente lets you schedule unattended integrity checks by launching it with various run-time parameters. This means that instead of launching Niente with the bare command niente, you add 'switches' to the command to make Niente do something immediately without further user intervention.

For example, if I launch Niente by opening a terminal and typing the command niente --check-full, Niente will immediately start to perform a new full integrity check. Or I could type the command niente --check-volume and a new complete volume boost check will begin. When Niente is launched with one of these run-time parameters, it provides absolutely no feedback at all to the user: nothing appears on the screen, which will simply go black and sit there as if nothing is happening at all (even though the program is running like crazy in the background!) The program provides no feedback in these situations, of course, because you're not meant to be sitting there watching it: they're for unattended operation of the program!

The complete list of run-time parameters that you can launch Niente with is as follows:

Parameter | Purpose/Function | Menu Equivalent
--scan-full | Wipes the existing tracks from the database and then re-scans the default music folder to re-populate it from scratch | Database -> 2
--scan-new | Scans the default music folder for new or modified recordings and adds them to the existing database | Database -> 3
--check-full | Performs a full integrity check (i.e., physical and logical checks for all recordings in the database) | Integrity Checks -> 1
--check-differential | Performs a differential integrity check (i.e., physical and logical checks for recordings with known physical or logical corruption issues) | Integrity Checks -> 2
--check-fast | Performs a fast integrity check (i.e., logical check only of recordings already known to the database) | Integrity Checks -> 3
--check-art | Performs a new album art check for all recordings known to the database | Integrity Checks -> 4
--check-volume | Performs a complete check of possible volume boosts for all recordings in the database | Integrity Checks -> 6
--aggstats | Generates a quick aggregate statistics report and writes it to /tmp/nientestats.csv | Reporting -> General -> 1
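Conceptually, each switch simply selects one action to run without prompting. The Python sketch below models that dispatch with argparse; it's an illustration only (the function and the action strings are hypothetical, not Niente's code), and it makes the switches mutually exclusive on the assumption that one unattended job is requested per launch, as in the crontab that follows:

```python
import argparse
from typing import Optional

# Switches taken from the table above; the action descriptions are paraphrases.
ACTIONS = {
    "--scan-full": "wipe and re-scan the default music folder",
    "--scan-new": "add new or modified recordings",
    "--check-full": "full integrity check",
    "--check-differential": "differential integrity check",
    "--check-fast": "fast integrity check",
    "--check-art": "album art check, all files",
    "--check-volume": "volume boost check, all files",
    "--aggstats": "aggregate statistics report",
}


def parse_switch(argv: list[str]) -> Optional[str]:
    """Return the unattended action requested, or None for the interactive menu."""
    parser = argparse.ArgumentParser(prog="niente")
    group = parser.add_mutually_exclusive_group()
    for switch in ACTIONS:
        group.add_argument(switch, dest="action", action="store_const", const=switch)
    chosen = parser.parse_args(argv).action
    return ACTIONS[chosen] if chosen else None
```

Passing two switches at once makes argparse print an error and exit, rather than guessing which job was meant.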

The real idea of these parameters is, of course, that you'll use your operating system's scheduler to make these activities happen during the dead of night. In most cases, this means adding a suitable entry to your system's crontab. Here's the crontab I use:

59 23 * * * /usr/bin/niente.sh --scan-new
00 1 * * SUN /usr/bin/niente.sh --check-full
00 1 * * MON-SAT /usr/bin/niente.sh --check-differential
00 3 2 * * /usr/bin/niente.sh --check-art
00 3 3 * * /usr/bin/niente.sh --check-volume

...which means, as follows:

  • Run a scan for FLACs which aren't already in the database and add them, at 11:59PM every night
  • Run a full integrity check at 1AM every Sunday
  • Run a differential integrity check at 1AM every night that isn't a Sunday
  • On the 2nd day of every month, at 3AM, run a fresh, all-files album art check
  • On the 3rd day of every month, at 3AM, run a fresh, all-files volume check

There is no run-time parameter, you'll note, to trigger an album art or volume check only for those files that don't have the relevant information collected already: it's the full 'all files' version of those checks or nothing. There's also a potential issue if the 2nd or 3rd day of the month happens to be a Sunday: my full integrity check starts at 1AM on Sunday, and by 3AM it has only been running for a couple of hours, which means it won't have finished. The art or volume check that is meant to run that morning at 3AM is thus likely to spot the existence of a program lock and therefore not be able to run. I consider this a rare enough occurrence not to care about a missed art or volume check, though, so I live with it. If that ever changes, I'll have to launch those checks later in the day, maybe in the afternoon.
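How often the clash happens is easy to check. This Python sketch lists, for a given year, the dates on which the 2nd (album art check) or 3rd (volume check) of a month falls on a Sunday, i.e. the mornings when the crontab above would start a monthly check while the weekly full check probably still holds the lock:

```python
import datetime


def sunday_collisions(year: int, days_of_month=(2, 3)) -> list[datetime.date]:
    """Dates in `year` on which a monthly job day falls on a Sunday."""
    hits = []
    for month in range(1, 13):
        for day in days_of_month:
            d = datetime.date(year, month, day)
            if d.weekday() == 6:        # Monday is 0, Sunday is 6
                hits.append(d)
    return hits
```

For 2024, for example, the list includes 3 March (volume check) and 2 June (album art check).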

Anyway: you get the idea. You can use the run-time parameters to schedule integrity checks to be performed in the dead of night, when you're not having to sit around and press menu options to make them happen!

Just be aware that there's no runtime parameter to specify the database to use or the music folder to scan for recordings: those have to come from a configured default database and music folder (so visit the Administration menu, Option 1 to set them).

5.0 Conclusion

Integrity checks of all kinds end with a whimper, not a bang! That is, no alarm bells go off, nor flashing lights annoy, whenever an integrity check reveals data corruption, logical inconsistencies or poorly-sized album art. The job of an integrity check is simply to collect the data that indicates those things exist: it's up to you to run the various reports that will tell you, precisely, which files are affected by such things.

Having run an integrity check, therefore, it's important to run one or more reports to find out what the integrity check discovered as it worked.

Remember that most integrity checks can be interrupted by pressing Ctrl+C, but doing so leaves behind a program lock that will prevent any future integrity check from starting. Tapping the 'L' key removes that lock, permitting new checks to start; but if you remove the lock while another check is genuinely still running, your new check will completely screw up the work being performed by the already-running one.

