Ripping Yarns

Down the rabbit hole we go! I wanted to rip a new CD I purchased recently, shown at the left, of the reconstructed Elgar 3rd Symphony. Prestoclassical had the physical CD listed at £5.75, whereas the FLAC download was £8.04 (which, by the by, is a very oddly specific number!), so I went for the physical product rather than the digital download because I'm a cheapskate! Physical product is rather unusual for me these days, though. So I then had to rip the purchased CD to FLAC files myself, and that's where the fun started! Bear with me as I set the scene...

My desktop PC doesn't have an optical drive, so I use USB-connected ones when I need to. I have two: a DVD-ROM (i.e., reader only) that identifies as a TEAC; and a DVD-RW (reader/burner) that identifies as an HP device. So I ripped the Elgar with each drive in turn, using my own somewhat unloved CCDR program. Out of interest, I then checked the MD5 sums per ripped file (essentially, the digital fingerprint of the audio component of each file), using the command metaflac --show-md5sum "filename" and... they were different, depending on which optical drive they'd been ripped with!
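For anyone wanting to repeat that check across a whole album rather than one file at a time, here is a minimal sketch. It assumes each drive's rip sits in its own directory with identically named .flac files; the function name compare_rips and the directory names are made up for illustration and are not part of CCDR.

```shell
# Compare the audio MD5 stored in same-named FLAC files from two rip directories.
# compare_rips is a hypothetical helper; it only relies on metaflac --show-md5sum.
compare_rips() {
    local dir_a="$1" dir_b="$2" file name sum_a sum_b status=0
    for file in "$dir_a"/*.flac; do
        [ -e "$file" ] || continue          # no FLAC files in the first directory
        name=${file##*/}
        if [ ! -e "$dir_b/$name" ]; then
            echo "MISSING $name"
            status=1
            continue
        fi
        sum_a=$(metaflac --show-md5sum "$file")
        sum_b=$(metaflac --show-md5sum "$dir_b/$name")
        if [ "$sum_a" = "$sum_b" ]; then
            echo "MATCH   $name"
        else
            echo "DIFFER  $name"
            status=1
        fi
    done
    return "$status"                        # 0 only if every track matched
}
```

Something like compare_rips rips/hp rips/teac would then print one line per track and return non-zero if any fingerprint differs.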

That's not supposed to happen. CCDR is supposed to know about the 'read offsets' applicable to each drive and to adjust its reads accordingly. Read offsets arise from the fact that your PC says 'go read block 1 from disc' and the drive, for all sorts of reasons, says 'OK, here's the data from block 7'. To deal with that sort of manufacturing imprecision, you tell your ripper to read with an offset of (in that example) -6. So your PC now says 'go read block -5' and the drive responds with 'here's the data from block 1', which is what you really wanted to read all along. I'm simplifying like mad, but that's essentially what read offsets are and why they're needed when ripping. The point is that if you've got two different drives by different manufacturers (as I do!), then reading without offset corrections means that very, very slightly different bits of the audio stream will be read by each... and a missed byte here or there, although not audible, will result in different MD5 'fingerprints' for the audio signal from each rip.
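The arithmetic in that simplified example can be written out as a toy model. This is purely illustrative: real drives and rippers don't expose anything called drive_returns or corrected_request, and both names are invented here.

```shell
# Toy model of the simplified description above (not how a real ripper is written).
# The drive hands back data from (requested block + its built-in error).
drive_returns() {        # $1 = block requested, $2 = the drive's error
    echo $(( $1 + $2 ))
}
# The ripper compensates by adding the read offset to the block it wants.
corrected_request() {    # $1 = block wanted, $2 = read offset
    echo $(( $1 + $2 ))
}

drive_error=6            # asked for block 1, got block 7
read_offset=-6           # so the ripper is told to use an offset of -6
request=$(corrected_request 1 "$read_offset")
echo "ask for block $request, receive block $(drive_returns "$request" "$drive_error")"
# prints: ask for block -5, receive block 1
```

With read_offset left at 0, the same model asks for block 1 and gets block 7, which is the uncorrected situation described above.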

Now, I wrote the CCDR code quite a while ago, but I was very particular to ensure that it took account of drive offsets, so it should have been extracting the exact same digital data off the CD, regardless of what optical drive I was using to do it... but it very clearly wasn't on this occasion.

Perplexed, I fired up a Windows laptop and used the 'gold standard' Windows CD ripper, Exact Audio Copy (or EAC to old-timers like me!). I used it to rip the same CD using each of the two USB optical drives in turn -and its results precisely matched what CCDR had achieved on Linux. That is, the Windows HP fingerprints matched the Linux HP fingerprints, and the Windows TEAC fingerprints matched the Linux TEAC fingerprints... but neither matched each other. So not even EAC could extract the same audio data from different optical devices properly! This was getting serious!

Desperate, I installed my preferred Windows CD ripper, dbPowerAMP, and again ripped the same CD on the two different optical drives with it. Once again, the HP drive produced fingerprints identical to those produced by the HP drive on Linux and with EAC on Windows; and the TEAC likewise produced the same fingerprints as the TEAC had produced on Linux and with EAC on Windows... but the two sets of fingerprints still didn't match each other, which meant even dbPowerAMP wasn't extracting the same audio data from both drives!!

Now, I have a confession to make: my original rips had not actually been done using CCDR. They'd been using the code taken from CCDR, but a bit modified with a view to a bit of a bug-fix and code modernisation. Now thoroughly confused by the strange results I was getting from two drives on two operating systems and in three different programs -all completely consistent amongst themselves, but with different results from different optical drives, which isn't meant to happen- I took a casual glance through my code and saw this (sort of: I'm showing pseudo-code, not the actual stuff):

# Compute a drive offset
DEVICE_OFFSET=$(clever code to work out what read offset should be used when ripping from this drive)

# Now do the rip
cdparanoia --use-offset=$DRIVE_OFFSET

Can you spot the dumb mistake I'd made? I first defined a variable called DEVICE_OFFSET to store the read offset required for each drive: for the HP, that was 102 and for the TEAC it was 6. Then I ran the cdparanoia ripper software, supplying as an offset number the variable called DRIVE_OFFSET. The variable called DEVICE-something is not the same thing as a variable called DRIVE-something!! In fact, since the variable DRIVE_OFFSET wasn't actually set to anything at all, anywhere in my code, it meant that cdparanoia was functionally ripping, no matter what optical device was in use, with an offset of zero. Oops. That would do it! Since the two drives were manufactured with different offsets, applying a single (and incorrect) offset to both of them meant they would inevitably read very slightly different parts of the CD's audio stream, and that would result in digital fingerprints that differed between the devices.
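The reason the typo went unnoticed is that a shell, by default, quietly expands an unset variable to nothing. One general guard against this class of mistake is the shell's nounset option (set -u), sketched below. It is offered as a suggestion: nothing here says CCDR or the reworked script used it, and the variable guard_result is invented for the demonstration.

```shell
DEVICE_OFFSET=102        # the HP drive's offset quoted above
unset DRIVE_OFFSET       # make sure the misspelt name really is unset

# Default behaviour: the misspelt name silently expands to an empty string.
( set +u; echo "without guard: offset=[$DRIVE_OFFSET]" )

# Same expansion in a subshell with nounset switched on: the subshell aborts
# with a non-zero status instead of carrying on with an empty value.
if ( set -u; : "$DRIVE_OFFSET" ) 2>/dev/null; then
    guard_result="typo went unnoticed"
else
    guard_result="typo caught"
fi
echo "with set -u: $guard_result"
```

Putting set -u near the top of a script means a misspelt variable name stops the run at the first use, rather than feeding an empty value to whatever command comes next.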

But hang on... my stupid typo might explain why my own code produced different fingerprints. But why were EAC and dbPowerAMP agreeing with those erroneous fingerprints? They're meant to be proper, professional rippers that do this offset malarkey for breakfast. My original problem was inadvertently supplying offset 0 when ripping using my own code -but how could EAC and dbPowerAMP also be using an offset of zero, which they clearly were, given that their fingerprints matched my original ones precisely?

Well, unbeknownst to me, you have to actively configure EAC to use read offsets. I didn't know that needed to be done, and therefore hadn't done it! Net result: EAC really was using an offset of 0! What of dbPowerAMP then? The problem there is that dbPowerAMP also uses a zero read offset out of the box until you have supplied one or more CDs that it recognises in its online database of 'accurate rips'... and this CD of Elgar's 3rd Symphony was not a disc it recognised. So it, too, was running without any offset correction. So that's why both programs had agreed with my homebrew, bug-filled runs on Linux: all three programs were indeed using an offset of zero, one by coding error and two by user error 🙁

Once I'd configured EAC and dbPowerAMP correctly, they started producing rips of the Elgar CD whose fingerprints completely agreed with each other, regardless of what optical drive was being used to do the rips: two different programs, two different optical drives, one consistent fingerprint. Hurrah! That's precisely what's supposed to happen!! Once I'd corrected my own code on the Linux PC, so that the DEVICE_OFFSET variable name was being used consistently and the DRIVE_OFFSET one was consigned to the dustbin, it too began producing FLACs whose fingerprints matched those of the now-correctly-configured EAC and dbPowerAMP! In short, all three programs, on two different operating systems and using two different USB optical drives were now producing digitally-identical audio rips, whose fingerprints were identical and consistent throughout.

Problem solved, the rabbit hole could now be exited at last ...though the evening and the morning were now the second day!

The moral of this story is that I still make stupid mistakes when coding and I should check my code before I run down other possible rabbit holes! The other moral of this story is: even the best tools you're sort-of familiar with need configuring correctly if they are to produce the right answers. I haven't personally used dbPowerAMP since version 16, which I think dates back around 7 or so years; similarly, I've only ever dabbled with EAC and that a long time ago, too. So: I simply wasn't using those tools correctly. The good news, however, is that once my coding mishaps were fixed and the Windows programs were configured correctly, I was able to verify that my code produced results identical to those produced by those two Windows programs (one of which currently charges more than US$60 for its charms).

Incidentally, the real and original CCDR was always applying offsets correctly: it was only my re-worked extract of the code which contained the big boo-boo. The other moral of the story is, therefore: don't re-invent the wheel if the existing one is already nicely round and functional!

Oh, and one final lesson learned: Elgar's Symphony No. 3 ('realised' by Anthony Payne, as Elgar died before he could really get his teeth into it) is a splendid bit of music that's very much worth listening to. Providing it's accurately ripped... 🥴