Hi-Res Audio - Part 56

I hate to keep banging on about hi-res audio formats (especially when I am not keen on them myself), but now that AUAC can do DSF as well as ISO conversions (see my last post), some interesting things have come out of the woodwork that needed tackling. It's also the case that as lockdown finally eases, this will likely draw to a close a period of time in which I obsess about software and not a lot else... so, it's probably best to get these things out of the way whilst there's not a lot else to be doing!

First off is the question of why AUAC treats SACD ISOs differently from SACD DSFs. In other words, when you say auac -i=iso, you have to specify -o=hires if you want high resolution FLAC files extracted from the source SACD ISO (otherwise you get standard resolution ones)... but, if you say auac -i=dsf, you don't (you'll get hi-res ones by default).

Now, there's actually a perfectly reasonable explanation for the difference, as it turns out: I've been ripping SACDs to ISOs for many years but never bothered to convert them to high resolution FLACs. Instead, I always down-sampled them to standard CD Audio FLACs. The reasons for me doing that are also perfectly reasonable: my ears cannot tell the difference between high resolution and standard resolution FLACs, but my hard disk appreciates only being asked to store 56MB for a symphony movement rather than 390MB. So: I've always gone ISO-to-standard-FLAC, and I wrote AUAC to do exactly what I wanted it to do, after years of doing something a particular way.

So why the difference with DSF ripped from SACDs? Well, I've only just started ripping to DSFs, and my hard disk capacities are not in so much danger of being exhausted these days as they were back in the early 2010s. I also have a professional database administrator's aversion to data loss: even though I can't hear any difference between a high-res FLAC and a low-res one, ripped from the same SACD, there is data loss involved in the down-sampling to standard quality FLACs. It is certainly inaudible data that's being lost, but the mere fact of its loss kind of irritates me! Just knowing it has happened bugs me, if there's no really good practical reason why the extra large file sizes involved in retaining all the data on an SACD can't be accommodated. So, I wrote AUAC to convert DSFs to high resolution FLACs by default, because hard disk circumstances have changed and my ripping practices have done so similarly.

In short, the different ways AUAC has treated ISOs and DSFs were a reflection of my own evolving relationship with high resolution audio.

But that difference has now gone. In the latest release of AUAC (version 2.07), both ISOs and DSFs will be converted to hi-res FLACs by default. Both will only ever be turned into standard resolution FLACs if you explicitly say -o=flac. I've basically admitted that consistency of approach is better than reflecting my own personal audio journey: hi-res inputs should go to hi-res outputs, unless you explicitly say otherwise.

There is a second new feature in this latest version of AUAC, however, that I want to mention now, too. It concerns what sample rate should be used when converting an SACD to a FLAC format. It's not as simple an issue as it might seem to be, because DSD (Direct Stream Digital) audio, as used on SACDs, and encapsulated within DSF files, uses a completely different way of representing sound than a FLAC (or a WAV or an MP3) does. The difference is best illustrated in this graphic:

At the top, we have a sort-of Slinky or spring that is being pushed or pulled: that sets up a 'compression' or an 'expansion' in the spring. As an area of compression/decompression moves along the spring, so energy is transmitted along the spring. This is exactly how sound works in the physical world, in fact: air molecules are compressed and de-compressed, creating sound waves which propagate through the air until they impart their energy to your ears, at which point you hear sound. We call these sorts of waves 'longitudinal waves'. Meanwhile, at the bottom of that picture, we have a more common depiction of waves, as up-and-down motions of a medium: the kind of trace an electrical signal makes on an oscilloscope, for example. Light and other electromagnetic waves are the real-world equivalents of that sort of 'transverse wave'.

And the relevance of this? Well, the transverse wave is how nearly all computer audio works. In a standard CD, for example, and in its ripped-to-FLAC equivalent, we take that up-and-down wave and chop it into 44,100 vertical slices a second. The height or depth of each sample can then be measured on a scale that permits 65,536 unique values: that number being what 16-bits buys you. It looks vaguely like this:

Each bar is a sample and the height of each bar is measured on a 65,000+ scale. If you make the bars thin enough (i.e., sample more often and make the width of each bar fine enough), you can get a very good approximation to the original wave-form: much closer than shown in the graphic, anyway. In fact, as I mentioned several posts back, the Nyquist-Shannon sampling theorem tells us that, provided the original signal contains no frequencies at or above half the sampling rate, the samples can perfectly describe the original wave-form, not just be a good approximation of it. In computer audio, we talk about this sort of audio encoding as 'pulse code modulation' or PCM. All the audio formats you're familiar with do it this way: FLAC, WAV, ALAC, APE and so on.
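To make the slicing concrete, here is a minimal Python sketch of PCM sampling. It is purely illustrative and says nothing about how AUAC works internally: the function name pcm_samples and the 440Hz test tone are invented for the example.

```python
import math

def pcm_samples(freq_hz, duration_s, sample_rate=44_100, bit_depth=16):
    """Sample a sine wave and round each sample to a signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1      # 32,767 for 16-bit audio
    count = int(sample_rate * duration_s)     # number of vertical slices
    return [
        round(max_level * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]

# One millisecond of a 440Hz tone at CD quality (hypothetical test signal).
samples = pcm_samples(440, 0.001)
print(len(samples), "samples; first few:", samples[:5])
print("distinct levels available at 16 bits:", 2 ** 16)
```

Each number in the list is the height of one bar: more samples per second means thinner bars, and more bits means a finer ruler for measuring them.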

DSF is profoundly different, however, because it uses a more 'longitudinal' way of representing audio. First, it only provides a 1-bit value for each sample, which is a hell of a lot less than 16-bits! Instead of being able to represent 65,000+ different values, a single bit can only represent 2 values: on or off, 1 or 0. This sounds like a crippling limitation (and it certainly is one!), but by way of compensation, the DSD audio stored on an SACD is sampled not at 44,100 times a second but at a staggering 2,822,400 times a second. The 1 and 0 sampling 'depth' doesn't therefore describe the height of a wave, but, roughly speaking, whether the volume in this sample is louder (1) or quieter (0) than the previous sample. The data stream in a DSF file therefore ends up as a bunch of 1s and a bunch of 0s, something like


...and if you squint a bit you'll perhaps be able to see how all those grouped 1s look like a compressed bit of spring in the earlier graphic, and all the 0s look like the uncompressed bits of spring. So this data representation may appear to be limited, but is actually capable of describing a longitudinal wave very well. On a computer, we call this representation of a longitudinal wave 'pulse density modulation'. PDM is clearly profoundly different from PCM, just as an 'in-and-out' longitudinal wave is profoundly different in character from an 'up-and-down' transverse wave. Herein lies the problem, though: how can you compare a 44,100/65,000+-value representation of a sound wave to a 2.8 million/2-value representation of it? It's almost literally an apples-to-oranges comparison! Accordingly, it's not entirely obvious whether a DSF file is best converted to, say, a 48KHz/16-bit PCM format; or an 88.2KHz/24-bit PCM format and so on.
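If you want to see a 1-bit stream for yourself, the sketch below uses a first-order sigma-delta modulator, which is the simplest textbook way of producing pulse density modulation. That is an assumption made for illustration: the encoders used to master real SACDs are considerably more elaborate, and the name pdm_encode is invented here.

```python
import math

def pdm_encode(signal):
    """Turn samples in the range -1..1 into a stream of single bits."""
    bits = []
    integrator = 0.0   # running error between the input and the bits so far
    feedback = 0.0     # the previous bit, expressed as +1.0 or -1.0
    for x in signal:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

# One cycle of a sine wave, heavily oversampled (hypothetical test signal).
one_cycle = [0.9 * math.sin(2 * math.pi * n / 64) for n in range(64)]
print("".join(str(b) for b in pdm_encode(one_cycle)))
```

The 1s crowd together where the wave is high and thin out where it is low, so it is the density of 1s, not any individual bit, that carries the signal.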

The best I can do is to direct you to this graphic from the Wikipedia article on high resolution audio formats:

What that's trying to show is that the 2.8MHz/1-bit approach used by DSFs can produce a very slightly greater dynamic range (that is, the difference between the quietest and loudest sounds) than standard CD audio, but can capture significantly higher frequencies (none of which you can actually hear, of course, because the human ear simply cannot detect sound frequencies that high). The diagram also shows you what PCM encoding most closely matches that set of DSF performance characteristics: 24-bit bit-depth, sampled at around 96KHz.
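The two things being compared there, dynamic range and highest capturable frequency, can be put into rough numbers for PCM formats. The sketch below uses the usual textbook figures (about 6dB of dynamic range per bit, and a top frequency of half the sampling rate). They are theoretical values for PCM only: DSD's real-world dynamic range depends on noise shaping and is not modelled here, and the function names are made up for the example.

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical PCM dynamic range: the ratio of loudest to quietest level."""
    return 20 * math.log10(2 ** bit_depth)

def nyquist_hz(sample_rate):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate / 2

for rate, bits in [(44_100, 16), (96_000, 24), (192_000, 24)]:
    print(f"{rate:>7}Hz/{bits}-bit: "
          f"about {dynamic_range_db(bits):.0f}dB, "
          f"up to {nyquist_hz(rate) / 1000:.2f}KHz")
```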

And this is why AUAC outputs its high-resolution formats at 88.2KHz and 24-bit. The 88.2KHz is a bit on the low side, I suppose, if we are agreed that 96KHz is 'closest', but it's a perfect doubling of the standard CD Audio sampling rate (i.e., 44.1 x 2 = 88.2), so producing 88.2KHz sampling is close enough but computationally easier than going for 96KHz.
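The 'computationally easier' point comes down to simple arithmetic: the SACD sampling rate divides into 88.2KHz a whole number of times, but not into 96KHz. Here is a quick check in Python (the name decimation_ratio is invented for the example, and whether the tools AUAC calls actually exploit this is an assumption on my part):

```python
from fractions import Fraction

DSD64_RATE = 2_822_400   # 1-bit samples per second on an SACD

def decimation_ratio(pcm_rate):
    """How many DSD samples correspond to one PCM sample at this rate."""
    return Fraction(DSD64_RATE, pcm_rate)

for pcm_rate in (44_100, 88_200, 96_000):
    ratio = decimation_ratio(pcm_rate)
    kind = "whole number" if ratio.denominator == 1 else "fractional"
    print(f"{pcm_rate:>6}Hz: {ratio} DSD samples per PCM sample ({kind})")
```

A whole-number ratio means every output sample lines up with a fixed block of input samples; a fractional one means the converter has to interpolate between them.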

But... clearly, some people will claim they can hear the difference between 88.2KHz and 96KHz (though they really can't!). And in any case, DSD technology hasn't stood still since its first incarnation. They now talk about the original 'DSD' as 'DSD64', since it samples at that insane 2.8MHz sample rate, which is 64 times the original CD's 44,100Hz sampling rate. But it didn't take long to introduce 5.6MHz sampling rates (i.e., DSD128, because that's 128 times the original CD sampling rate); or 11.2MHz (so, DSD256). There is even DSD512, with a sampling rate of 22.6MHz, and I confidently expect there to be further doublings of the relevant frequency still in the pipe-line. You can even buy recordings mastered at these sorts of sample rates: this website readily lists recordings made in DSD256, for example, though none I've seen are yet available in DSD512.
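The naming scheme is just multiplication, so the exact figures behind those rounded MHz values are easy to work out. A tiny sketch (dsd_rate is a made-up helper name):

```python
CD_RATE = 44_100   # samples per second on a standard audio CD

def dsd_rate(multiple):
    """Sampling rate in Hz for DSD64, DSD128 and so on."""
    return multiple * CD_RATE

for multiple in (64, 128, 256, 512):
    print(f"DSD{multiple}: {dsd_rate(multiple):,}Hz")
```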

Anyway: the short version is, therefore, that however you cut it, you might be inclined to regard 88.2KHz as too stingy, if not exactly now, then maybe some time in the future. A higher sampling rate than 88.2KHz therefore seems required for the ultimate audio purists ...and looking at that previous diagram, it seems fairly obvious what that super-duper-ultimate-extreme high-res audio sampling rate should be: 192KHz at 24-bits.

Which brings me, finally, back to the second new feature in AUAC version 2.07: there is now a new switch, -o=xhires, which will trigger AUAC to output extreme hi-res FLAC files that, internally, use 192KHz sampling frequencies at 24-bit bit-depth. It only really makes sense with DSF and ISO inputs, but there's nothing stopping you from asking for MP3s to be converted into extreme hi-res FLAC format, if you are mad enough to want to do so.

As ever, updates to AUAC are best achieved by running auac --checkver. Failing that, type these two commands:

wget https://absolutelybaching.com/abc_installer
bash abc_installer --auac

Or, if you prefer, simply download the script and install it manually.

Happy super-duper-extreme-ultimate-hires-FLACing. 🙂