Semplice Version 2: Audio Processing

1.0 Introduction

As a digital classical music purist and enthusiast, your general approach to your digital music collection will likely be: leave it alone, don't mess with it, don't muck it about, get it pure and leave it that way! Something along those lines, anyway. Sadly, real life has a nasty habit of intervening: your CD rip sounds a bit quiet for your tastes, perhaps? Maybe you want to take a small sample of your collection on a car or plane trip, so huge, hunking FLAC files might not be entirely appropriate? Having just ripped a 4-hour opera, how can you spot if there are any rip errors without having to physically listen to it all? And so on: these and similar considerations mean that, sometimes, you will need to rely on Semplice's ability to 'tweak' or 'fiddle' with your audio files and its related abilities to analyse your files for obvious flaws.

Thus, Semplice Version 2 has three basic audio processing functions:

The ability to apply a non-distorting volume boost to an audio signal
The ability to convert between different audio codecs (e.g., FLAC to MP3)
The ability to analyse a collection of FLACs and display the results as a spectrum analysis graph, for visual checking of the characteristics of the music signal they contain

I'll cover each of these functions in turn, as separate articles, with links at the bottom of this page. Before we get to that point, however, I think it wise to set out some preliminary information that underpins everything discussed in these sections of the Semplice user manual.

Firstly, let's just get some 'audio basics' terminology under our belts, as follows:

Sound volumes are measured in "decibels", abbreviated to dB
Decibels are measured on a logarithmic scale, so that a sound that is 10dB louder than another carries 10 times more sound power than it; an increase of 20dB implies 100 times the power (which is 10 times the 'sound pressure')
Boosting volume by "a mere 10dB" is a big deal, basically
In audio circles, sound is sort-of measured 'backwards', so that a 0dB signal is considered to be the loudest non-distorting volume an audio signal can be and something that's -5dB from that would be considered quieter. Something at -15dB might be practically inaudible unless you crank your amplifier's volume knob all the way to 11
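
To make the logarithmic scale concrete, here is a minimal Python sketch that turns a decibel difference into plain ratios. It is not part of Semplice: the function names are made up for illustration. It uses the standard conventions that a power ratio is 10^(dB/10) and an amplitude (sound pressure) ratio is 10^(dB/20).

```python
def db_to_power_ratio(db):
    """Ratio of sound power for a level difference of `db` decibels."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Ratio of sound pressure (amplitude) for a level difference of `db` decibels."""
    return 10 ** (db / 20)


# Positive values are boosts; negative values sit below the 0dB ceiling.
for db in (3, 10, 20, -5, -15):
    power = db_to_power_ratio(db)
    amplitude = db_to_amplitude_ratio(db)
    print(f"{db:+4d} dB -> power x{power:.3f}, amplitude x{amplitude:.3f}")
```

Running it shows why "a mere 10dB" is a big deal: the power ratio grows tenfold for every 10dB step, while negative values shrink the signal just as quickly.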

Next, we need to be clear about some digital music format terminology:

A "codec" (short for coder-decoder) is an algorithm that encodes an audio signal as digital data for storage and decodes that data back into something the human ear can understand
Good audio codecs aim to represent the source audio signal with high fidelity and the least data
Lossless codecs preserve all the data in the source audio signal with perfect fidelity
Lossy codecs chuck some of the audio signal away in order to achieve significantly smaller file sizes. They hope and intend that the data they dispose of is practically inaudible to most ears: for example, most humans have difficulty hearing frequencies higher than 16,000 waves per second (Hertz, or Hz), so if the audio signal contains a violin screeching at 18,000Hz, you could dispose of the top 2,000Hz of that signal and no-one is likely to notice, at least in theory. Another example is that if a harp string is plucked at the same time as a bass drum goes 'boom', it's unlikely the harp pluck would be audible to most listeners: so it could be disposed of and no-one would be the wiser. Different codecs develop different 'psychoacoustic models' in order to work out what can and can't be thrown away: some are better than others, and even the best can't be guaranteed to work the way your ears and brain do. What they toss away thinking you won't notice... you might notice!
It is always possible to convert audio from one lossless codec to another lossless codec without damaging the audio signal: the entire signal was there at the start and will be there at the end
Converting audio from one lossy codec to another lossy codec is possible, in the sense that most software permits it to happen; but turning a signal in which some data has been lost already into another signal in which more data is disposed of merely results in a final audio signal in which a lot of audio data has been lost. The results are likely to be ghastly, therefore.
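
The difference between lossless and lossy round trips can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how FLAC or MP3 actually work: zlib stands in for a lossless codec, and crude rounding of sample values stands in for a lossy one (real lossy codecs use psychoacoustic models, not simple rounding). All function names are hypothetical.

```python
import math
import struct
import zlib

RATE = 44100  # samples per second, as on an audio CD

# A tenth of a second of a 440Hz tone, stored as 16-bit integer samples.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(4410)]


def lossless_round_trip(values):
    """Compress and decompress with zlib (a stand-in for a lossless codec)."""
    raw = struct.pack(f"<{len(values)}h", *values)
    restored = zlib.decompress(zlib.compress(raw))
    return list(struct.unpack(f"<{len(values)}h", restored))


def toy_lossy_encode(values, step):
    """Toy 'lossy codec': round every sample to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def max_error(a, b):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Lossless: every sample comes back exactly as it went in.
print("lossless identical:", lossless_round_trip(samples) == samples)

# Lossy, then lossy again with a different 'codec': the errors pile up.
first = toy_lossy_encode(samples, 300)
second = toy_lossy_encode(first, 700)
print("error after one lossy pass: ", max_error(samples, first))
print("error after two lossy passes:", max_error(samples, second))
```

The second lossy pass can only work with what the first one left behind, so its error is measured against an already damaged signal; that is the generation loss described above.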

And finally, let's get some acoustical physics sorted (though this is usually the point at which the people who paid $10,000 for a speaker cable walk out in a huff!):

The perfect human ear physically cannot hear audio frequencies above around 20,000Hz (or 20kHz, i.e., 20 kilohertz). It depends on the individual, obviously, but for the vast majority of humans, 20kHz is a literal upper limit on hearing.
Human hearing deteriorates as you age: so a teenager might be able to hear 20kHz, but a retiree is unlikely to be hearing much above 14kHz. Classical music listeners being, on the average, a bit older than listeners to non-classical music, this age-related hearing attenuation means hardly any classical music listeners will be able to hear audio signals higher than 16kHz.
A fundamental result in audio processing is the Nyquist-Shannon Sampling Theorem, which states that you can perfectly reproduce an analog input signal by digitally sampling it at a rate of more than twice its highest frequency.
Back in the 1970s, Philips and Sony engineers combined these various points: they applied a filter to cut off frequencies above 20kHz in the recorded music signal, because no-one would be able to hear anything higher than that anyway; that meant it could be represented perfectly by a digital signal sampled at around 40kHz. Filter technology not being perfect, however, requesting a cut-off of frequencies at 20kHz sometimes allowed frequencies a little bit above that into the signal, so the engineers compensated and digitally sampled their audio signal at a bit above 40kHz, so that the occasional signal a little above 20kHz could appear in the audio stream without screwing everything up. Digital Audio CD was therefore invented with a sampling rate of 44.1kHz: this allows a completely perfect reconstruction of an analog signal containing frequencies higher than practically any human can hear seconds after their birth.
Volume levels are represented in digital music as numbers. Digital numbers have to be represented by 'bits': sequences of binary 1s and 0s. If I allow you 4 bits to work with, you can represent the numbers 0000 to 1111, which is decimal 0 to 15. If I let you use 8 bits, you can store values from 00000000 to 11111111 (decimal 0 to 255). If you store your numbers with 16 bits, you can have up to 65,536 unique numbers: thus, a 16-bit digital audio signal can have 65 thousand-odd different volume levels. This allows a digital audio signal to have a "dynamic range" (the difference between the quietest and the loudest part of the signal) of 96dB. The difference between the human ear hearing the quietest pin drop and the pain associated with standing next to a jet engine at full throttle is around 140dB, so 16 bits would not completely cover the entire dynamic range of which the human ear is capable. But vinyl LPs have a dynamic range of around 70dB, and remember that the 26dB gap between 70dB and 96dB is a factor of well over 100 in signal power, because decibels are measured on a logarithmic scale. So a 16-bit signal has more than 100 times the dynamic range of an LP record: that's why, again, the Philips and Sony engineers of the 1970s decided that the new-fangled CD should use a 16-bit audio signal: it's more than good enough to capture the quietest and loudest music signals likely to arise for anyone who is not a jet engine aficionado.
In practical terms, therefore, there is never a need to listen to digital music which is not a 44.1kHz signal or with anything other than each sample stored as a 16-bit value. Your ears literally cannot tell the difference if higher sampling rates are used, or a greater number of bits is used to store the resulting samples. Audiophiles will wax lyrical about 192kHz signals stored in 24 bits, but their ears cannot actually hear the difference between the identical signal encoded at 44.1kHz/16-bit and 192kHz/24-bit: it is simply physically impossible. (With the caveat that the jump from 16-bit to 24-bit samples might be perceptible as a greater dynamic range ...but it would be a close-run thing.)
For technical reasons, however, the recording industry has a legitimate reason for mastering recordings at high sampling rates and bit-depths: they are not so interested in what an ear can hear as what their recording hardware is doing with the audio signal when they're working on it, which is a technological issue, not a biological one, and hardware can use as much headroom as you can give it. It is therefore common for recording studios to record audio at 88.2kHz and higher, with each sample stored in 24 bits. That they do so, however, does not mean their signal is audibly better than when they sell you a 44.1kHz, 16-bit version of it on a CD: their hardware can tell the difference; your ears cannot.
Semplice indulges the audiophile fraternity by allowing output of 88.2kHz and 176.4kHz, 24-bit FLACs. Sensible people will not use these options, however!
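
The arithmetic behind those sampling-rate and bit-depth figures is easy to check. The following Python sketch is illustrative only (the function names are invented here, not Semplice's): it computes the highest frequency a given sampling rate can represent, the number of volume levels a given bit depth allows, and the theoretical dynamic range, using the usual formula of 20 x log10(number of levels).

```python
import math


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


def levels(bits):
    """Number of distinct values that fit in `bits` binary digits."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Theoretical dynamic range of a signal stored with `bits` per sample."""
    return 20 * math.log10(levels(bits))


for rate in (44100, 88200, 176400, 192000):
    print(f"{rate}Hz sampling -> frequencies up to {nyquist_limit_hz(rate):.0f}Hz")

for bits in (4, 8, 16, 24):
    print(f"{bits:2d} bits -> {levels(bits):>10,} levels, about {dynamic_range_db(bits):.0f}dB")

# The gap between a 16-bit CD and a roughly 70dB vinyl LP, as a power ratio.
gap_db = dynamic_range_db(16) - 70
print(f"CD vs LP: {gap_db:.1f}dB, a power ratio of about {10 ** (gap_db / 10):.0f}")
```

At 44.1kHz the limit comes out at 22,050Hz, comfortably above the 20kHz ceiling of human hearing, and 16 bits give the 65,536 levels and roughly 96dB quoted above.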

By the way, don't confuse the effects of re-mastering with the use of "high resolution digital audio" (i.e., digital signals sampled at rates higher than 44.1kHz and with more than 16 bits per sample). It is common for record companies to re-master their old recordings, to re-balance things, to filter noise out and so on: the resulting audio will very likely be easily discernible as a better (i.e., nicer to listen to) audio signal than the 1960 or 1970 original, because of the various tweaks being made to the balance, sound stage and so on. Unfortunately, the record companies then market that new mastering in 88.2kHz/24-bit hi-res audio SACDs and the like (because they charge more for such formats than 'standard audio CD'): that makes people think the nicer audio is a result of the high resolution signal. It's not. It's nicer because the remastering engineer has worked a miracle modifying and improving the original audio recording. The extra bits and sampling frequencies you are being sold on an SACD remain physically indiscernible compared to a 44.1kHz, 16-bit standard CD version of the same mastering. An analogy might be if you gave me an old family black-and-white photo to preserve: I scan it, run that scan through Photoshop, touching up the scratches, blemishes, rips and tears, improving the contrast, doing a bit of dodging and burning, etc, etc: naturally my re-worked version of the original is going to be visibly different from the original, because I'm altering the original. But if I now print the retouched photo using expensive HP ink, and once more using Amazon Choices generic brand ink... well, it's possible that a good eye might spot the difference between the two prints, but most people are unlikely to be able to. In the world of audio, there's not even a doubt about it: the human ear being what it is, you simply cannot hear the extra data in an 88.2kHz/24-bit signal as compared to the el-cheapo 44.1kHz/16-bit one.

If you've got that terminology and those concepts firmly in your head, we can move on to describing Semplice audio processing functionality in three separate parts, as follows: