I mentioned last time that when ripping an audio CD to digital files, it's important to know that this optical drive reads the first audio sample from (say) sample 103, whereas that optical drive reads the same first audio sample from sample 6. The inability to precisely and accurately read the first audio sample from, er... the first audio sample is, unfortunately, inherent in the design of the audio CD standard in the first place (which has no absolute positioning information encoded in the data stream) and in the vagaries of hardware manufacturing, where tolerances vary between manufacturers, designs and even batches of the same design by the same manufacturer!
On the whole, however, a given optical device product will be consistent about its failings. If one specific ASUS DRW-20B1 device reads its first sample from actual sample -6, then you can be fairly sure that almost all ASUS DRW-20B1 drives will do the same thing. You can therefore build up a database of known optical device models with a record of what their read positioning errors are -and this is exactly what the AccurateRip database of CD drive offsets is. Knowing these 'offset corrections', you can then tell your device to read (say) sample 103 knowing that this will actually make it read sample 0 (computers usually start counting at zero!), which means you now know you're actually reading the correct 'start of audio'. Thus, once you know the read offset that applies to each make of optical drive, the same audio CD can be read from the same absolute starting position in the audio signal no matter which make of drive is doing the reading. AccurateRip therefore lets you produce consistent rips with different optical drives, because applying the read offsets always ensures each drive can read the start of the audio data on the CD correctly. The story of me doing precisely this (once my code was correct and tools like EAC and dbPowerAMP were configured correctly!) is what my last blog post was all about, after all.
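To make the mechanism concrete, here is a toy simulation of two drives with different read offsets. It is only a sketch: the function names, the fake 'disc' and the sign convention (asking a drive with correction 103 for sample 103 yields true sample 0, as in the example above) are all assumptions made for illustration, not how any real ripper talks to hardware.

```python
def make_drive(disc, offset_correction):
    """Return a read function for a simulated drive.

    Assumption (following the example in the text): asking a drive whose
    offset correction is 103 for sample 103 actually returns true sample 0.
    """
    def read(position):
        actual = position - offset_correction
        # Positions before the start or past the end read as digital silence.
        return disc[actual] if 0 <= actual < len(disc) else 0
    return read


def corrected_read(read, offset_correction, start, count):
    """Read `count` samples from true position `start` by shifting the request."""
    return [read(start + offset_correction + i) for i in range(count)]


disc = list(range(1000, 1500))     # stand-in for the audio samples on a CD
drive_a = make_drive(disc, 103)    # hypothetical drive needing correction 103
drive_b = make_drive(disc, 6)      # hypothetical drive needing correction 6

# Without correction, the same request lands on different audio.
naive_a = [drive_a(200 + i) for i in range(5)]
naive_b = [drive_b(200 + i) for i in range(5)]

# With each drive's own correction applied, both return the true start.
fixed_a = corrected_read(drive_a, 103, 0, 5)
fixed_b = corrected_read(drive_b, 6, 0, 5)

print(naive_a == naive_b)               # False
print(fixed_a == fixed_b == disc[:5])   # True
```

The point of the sketch is simply that the correction is applied to the request, not to the data: each drive is asked for a shifted position so that what comes back lines up.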
But does it produce actually accurate rips? Meaning, if I apply the correct offset for my particular make of drive, am I then guaranteed to start ripping at precisely the start of the audio data contained on a CD?
It sounds a daft question, given the name of the project... but actually the answer to it is a definite 'no'. AccurateRip is not "accurate"!
The problem is that read offsets are relative numbers. Relative to what? To the first CD drive that was used to invent the read offset mechanism in the first place. That first drive produced a digital fingerprint of X for input a; the second optical drive produced a digital fingerprint of Y for input a... so the question was, what read offset can we apply to the second drive so that it would adjust its read position and start producing digital fingerprint X for input a, too? Whatever offset that is, add it to the database for that model. Every other optical device listed in the AccurateRip database is similarly listed with an offset that also makes that new device output a digital fingerprint of X for input a, too. Once all devices are consistently producing X for input a, they'll all produce fingerprint Z for input b, too, and so on. Everything is calibrated, in other words, so that given the same audio CD input, they will all output the same digital fingerprint.
The trouble is, that first device was out by 30 samples, so its 'X' was actually an incorrect digital fingerprint to start with (possibly!). Using AccurateRip therefore ensures that all optical drives are consistently able to output the wrong digital fingerprint for any given source audio CD!
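A small sketch of why a relative calibration carries the reference drive's error along with it. The 'true' offsets below are invented values, chosen only so that the listed results come out as 6 and 103 like the examples used elsewhere in this post; the sign of the reference drive's 30-sample error is likewise an assumption.

```python
def listed_offsets(true_offsets, reference_model):
    """Express every drive's offset relative to the reference drive."""
    ref = true_offsets[reference_model]
    return {model: true - ref for model, true in true_offsets.items()}


# Hypothetical 'objectively correct' offsets, unknown to the database builder.
true_offsets = {"reference": -30, "drive_a": -24, "drive_b": 73}

# What ends up in the database: the reference drive is treated as zero.
listed = listed_offsets(true_offsets, "reference")

# How far each listed offset is from the truth.
residual = {model: listed[model] - true_offsets[model] for model in listed}

print(listed)    # {'reference': 0, 'drive_a': 6, 'drive_b': 103}
print(residual)  # {'reference': 30, 'drive_a': 30, 'drive_b': 30}
```

Every drive ends up wrong by the same 30 samples, which is exactly why they all agree with each other.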
It is as if the inventors of the metre had defined it 2 millimetres short of an actual metre: we'd say a building was 45 metres tall, and it would actually be 90mm short of that, if there were somehow an objective test of what a metre really was that we could appeal to. Since we would have defined our meaning of the word 'metre' on the basis of some physical 'standard' that was only 998mm long, however, we would all still say the building was 45 metres tall; and if we said to the architect 'now I want a 90m tall building', it would genuinely end up being double the height of the first one. Everything would be relatively correct in height, therefore: it's just that an alien species visiting us using this 'objective metre standard' would beg to disagree on the specific numbers we'd assigned as the measurements in each case.
When this was pointed out to the inventor of AccurateRip (whose own Pioneer CD-ROM was the 'ground zero' device whose own internal read position error he hadn't accounted for), in the early 2000s, his response was three-fold: first, it's too late to do anything about it now, because millions of CDs have already been ripped with the (wrong) ground zero device offset as their point of comparison; second, it's physically impossible to get a 'perfect' rip of the audio signal on a CD anyway (because of the lack of true, absolute location data within the physical medium), so any read offset will only ever be an approximate number; and third, even if all 'AccurateRips' are off by 30 samples, it doesn't really matter.
Let's deal with that last part of his response first. The "error" in AccurateRip's database of read offsets is 30 samples: tell a device to read sample 1 and it will actually read sample 31, missing the data contained in the first 30 samples entirely (and adding 30 samples of blank data to the end of the last track on the CD). Remember that the audio on a standard CD is sampled 44,100 times a second. So 30 samples amounts to 30÷44100 = 0.000680272108843 seconds of audio data missed when ripping. That's to say, if your CD starts with a moment of silence, you're missing around 680 microseconds of nothing at all. If it starts with a bang, however, then yes: you're missing 680 microseconds of the bang! Can your ears actually hear that, though? "Unlikely" is a bit of an understatement, I think, unless you possess mystic superpowers: the fact is that no human ear based on messy, wet biology will ever really be able to spot that amount of missing samples, no matter how 'bang-y' your choice of music might be -and classical music in particular tends not, on the whole, to start with abrupt bangs anyway (Grieg's Piano Concerto excepted, I suppose)! This is the underpinning of the 'it doesn't really matter' argument -and I can't really argue with it.
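For anyone who wants to check the arithmetic, here is a minimal sketch. The 44,100 Hz rate comes from the paragraph above; the byte count additionally assumes that one 'sample' means one stereo frame of 16-bit audio (two channels of two bytes each), and the function names are just illustrative.

```python
SAMPLE_RATE_HZ = 44_100          # CD audio sample rate
CHANNELS = 2                     # assumption: one sample = one stereo frame
BYTES_PER_CHANNEL_SAMPLE = 2     # assumption: 16-bit audio


def samples_to_microseconds(samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a sample count into a duration in microseconds."""
    return samples / sample_rate * 1_000_000


def samples_to_bytes(samples):
    """Convert a sample count into bytes of raw audio data."""
    return samples * CHANNELS * BYTES_PER_CHANNEL_SAMPLE


print(f"{samples_to_microseconds(30):.1f} microseconds")  # 680.3 microseconds
print(samples_to_bytes(30), "bytes")                      # 120 bytes
```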
The database administrator in me, however, objects to losing 30 tiny samples of even completely zero-based data!
The inventor of AccurateRip also made the perfectly reasonable point that it's far better that 100 rips of a given audio CD should be identical with each other, even if objectively they're all 'out' by a minuscule amount, than that 100 different versions should exist -for we would never be able to know which one of the 100, if any, was 'true'. If you were to rip the same CD, would you like to know you agree with the work of 100 other people, even if you are all 'wrong' by a trivial amount; or would you prefer to be yet another unique rip? Which situation would give you more assurance that your rip was a good one and didn't include random pops and clicks caused by scratches or spots of jam on the disc surface? I again agree with Andre Wiethoff that I'd probably prefer to know I was 'in consistent company' than that my audio hardware was doing things entirely unique to me, for that uniqueness might be 'uniquely good' or 'uniquely, randomly weird and wrong'. No matter the tiny 'error' built in to AccurateRip, therefore, it still serves a very useful purpose: ensuring consistent results, even if they are consistently very, very slightly wrong results.
The database administrator in me, however, still objects to losing 30 tiny samples of even completely zero-based data!
So is AccurateRip accurate in the sense of 'guaranteed to be objectively correct'? No. It's guaranteed to mean that your optical device can output precisely the same audio signal as someone else's completely different device, given the same audio CD input. Is that good enough? Yes.
Except for obsessive-compulsive former database administrators.
For such people, the cure is to subtract 30 from whatever the official list of read offsets declares is needed for your optical drive. If the list says your offset is 6, it should actually be -24; if it says it's 103, it should actually be 73. Rips made with these adjusted offsets will never match those produced by millions of other owners of that same CD (this is the "it's too late to fix it now" argument), so you will not know whether your rip is digitally, bit-wise perfect; nor whether it agrees with all those other rips or not -but at least you will have the satisfaction of knowing your rips aren't losing 680 microseconds of anything at all at their start!
It's for this reason that I'm considering writing a new CD ripper which will have the option of setting HYPER_ACCURATERIP=1 or HYPER_ACCURATERIP=0. If set to 1, the program will subtract 30 from the 'official' offset for your optical drive. If you set it to 0 (which will be the default), your rips will use the same 'wrong' offsets that everyone else uses. Hyper-accurate rips won't be comparable to anyone else's rips of the same audio CD: the audio stream will have been ripped from a starting position that millions of other rips won't have used. So you will be special and unique; and though your ears won't thank you (because they won't be able to tell), you may still gain an inner sense of satisfaction from 'doing it right'!
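Since that ripper doesn't exist yet, the following is only a sketch of how such a switch might behave. The function names, the settings dictionary and the way the flag is parsed are all hypothetical; only the HYPER_ACCURATERIP name, its 0/1 values, the default of 0 and the subtract-30 rule come from the description above.

```python
REFERENCE_ERROR_SAMPLES = 30   # the error attributed to the original reference drive


def hyper_flag_from(settings):
    """Interpret a settings mapping; anything other than '1' means off (the default)."""
    return str(settings.get("HYPER_ACCURATERIP", "0")).strip() == "1"


def effective_offset(official_offset, hyper_accuraterip=False):
    """Return the read offset to use for a drive.

    With the flag off, the official (database) offset is used unchanged, so
    rips stay comparable with everyone else's. With it on, 30 is subtracted.
    """
    if hyper_accuraterip:
        return official_offset - REFERENCE_ERROR_SAMPLES
    return official_offset


settings = {"HYPER_ACCURATERIP": "1"}   # hypothetical user configuration
hyper = hyper_flag_from(settings)

print(effective_offset(6, hyper))       # -24
print(effective_offset(103, hyper))     # 73
print(effective_offset(103))            # 103
```

Keeping the adjustment in one small function means the 'official' offset never gets overwritten, so switching back to comparable rips is just a matter of flipping the flag.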
Am I going to use this new feature to go back and re-rip my 6000+ CDs? Not a chance! My inner database administrator may be obsessive-compulsive, but he's not stupid! I may use the new capability for any future rips I make, but I'm not sure I'd bother even so. I just think it fair the option is there to use or not as you see fit, but I'm not going to lose sleep over it.
Funnily enough, the largest 'family' of comments I saw when researching this topic in the dark undergrowth of the Internet that is audiophile discussion groups was on the theme of 'who on Earth rips physical CDs these days?'! Clearly, everyone's doing Tidal, Qobuz, Spotify and/or other music streaming these days and using physical media marks you out as some sort of dinosaur 'boomer' that is well on the way to a care home. Which is possibly true enough, I suppose, from the perspective of the music industry as a whole. Specifically classical music listeners, however, still have a fondness for physical media (including, gasp, LPs!) which means this sort of esoteric discussion is still worth having from time to time, as far as I'm concerned. Take a look at the background to any of Dave Hurwitz's videos as proof of this contention! Even I, purchaser of downloads from the likes of Prestoclassical though I am, sometimes use physical CDs: they are incredibly cheap sources of music from second-hand and charity shops, after all.
I do wonder what is going on when I buy a digital download from Presto, though: are they taking account of read-offsets when producing their FLAC files? Or does the digital data come to them in completely novel forms where this sort of concern doesn't apply? I can't answer that... but it's certainly food for thought... or lying awake at nights...!
You shouldn't worry too much about the arbitrary reference point for AccurateRip offsets. Some drives in that database go as low as -1164s and others as high as +1176s, so as a baseline it is pretty well-situated near the center.
And while a 2340-sample swing might sound like a lot, it is nothing compared to the swings one can find when comparing different pressings of the same (bit-identical) material; those can vary by half a second or more!
(All professionally-manufactured discs should be sufficiently padded on either end to accommodate this imprecision and then some.)
But offsets aside, it is important to understand that optical drives have no innate way of verifying the accuracy of their reads. Back when the Red Book standard was formalized, audio data was viewed as "error-tolerant", so the medium's corrective mechanisms intentionally stop at Good Enough.
That's basically why AccurateRip (and the later/better CUETools database) was established. Statistical certainty is the closest proxy for certainty you can get with this particular medium.
Hello Josh, and thanks for commenting...
I agree with everything you said -and said so in the post (though I realise there were a lot of words there. Succinctness has never been my strong point, I'm afraid).
Specifically, "So is AccurateRip accurate in the sense of 'guaranteed to be objectively correct'? No. It's guaranteed to mean that your optical device can output precisely the same audio signal as someone else's completely different device, given the same audio CD input. Is that good enough? Yes."
And: "I'd probably prefer to know I was 'in consistent company' than that my audio hardware was doing things entirely unique to me"
And: "the lack of true, absolute location data within the physical medium"
And: "Am I going to use this new feature to go back and re-rip my 6000+ CDs? Not a chance!"
I don't obsess about accuracy, really. The whole tenor of the piece is about consistency being more significant than accuracy, however defined. I like knowing my rips match other people's. And the discussion is twenty years old at this point, anyway: but I thought it was interesting and knocking 30 samples off an offset is trivially easy to do for any obsessive compulsives out there...
I have looked at the CUETools database, and I certainly like it in theory. Unfortunately, I'm a Windows-phobe, so unless there are Linux tools for interacting with it (which I haven't found, sadly), I'll be sticking with the shenanigans of AccurateRip (and no, WINE is not an Em...er, option!)