How does digital audio work? | Basic Pro Audio Concepts

THE GIST

Digital audio is a representation of an analog audio signal used by computers and digital devices to record and play back sound. Similar to the frames of a video, digital audio is made up of a series of samples which recreate a sound when played back in sequence. There are many formats of digital audio, which can have varying fidelity and dynamic range.

Theory

Digital audio is inherently limited. Acoustic sound and analog signals are continuous waves, while digital audio is only an approximation of the real thing: a series of discrete snapshots, or samples, much like the frames of a video.

This article will focus on Pulse Code Modulation (PCM), the most commonly used system for encoding digital audio. Other systems, such as DTS and Dolby Digital, also exist but are more prevalent in the film and technology industries.

In PCM audio, the signal is sampled many times per second, and each sample records the wave’s amplitude at one particular moment. Because each sample can hold only a finite set of values, its amplitude is rounded up or down (quantized) to the nearest available level. When these samples are played back in sequence, the original sound can be accurately recreated.
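To make sampling and quantization concrete, here is a minimal Python sketch, assuming signed fixed-point PCM like that stored in a 16-bit WAV file. The function `sample_and_quantize` and its parameters are illustrative names, not part of any audio library: it evaluates a sine wave at each sample time and rounds the result to the nearest integer level.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bit_depth, num_samples, amplitude=1.0):
    """Sample a sine wave and round each sample to the nearest integer level.

    Assumes signed fixed-point PCM with a symmetric full scale of
    -(2**(bit_depth - 1) - 1) .. 2**(bit_depth - 1) - 1.
    """
    max_level = 2 ** (bit_depth - 1) - 1  # largest positive integer code
    samples = []
    for n in range(num_samples):
        t = n / sample_rate  # time of the nth sample, in seconds
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)  # "analog" value in [-1, 1]
        samples.append(round(value * max_level))  # quantize to the nearest level
    return samples

# First 8 samples of a 1 kHz tone at 48 kHz, 16-bit
print(sample_and_quantize(1000, 48000, 16, 8))
```

Lowering `bit_depth` to 4 in the same call shows how coarse the available levels become: every sample is forced onto one of only a handful of integers between -7 and 7.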

Just as analog audio is defined by the values of frequency and amplitude, digital audio has two main parameters: sample rate and bit depth. Sample rate is how many times per second the sound is sampled, and bit depth is the number of bits used to store each sample, which determines how much dynamic range it can capture.

Sample Rate

The standard CD-quality sample rate of 44.1kHz may seem like a random choice, but it's based on the Nyquist-Shannon Sampling Theorem, a principle stating that the sample rate must be more than twice the highest frequency to be captured. Since the upper limit of human hearing is about 20kHz, a sample rate greater than 40kHz is necessary to capture the entire audible range. The extra 4.1kHz gives the anti-aliasing filter room to remove content above half the sample rate, which would otherwise fold back into the audible range as a form of distortion called aliasing. In theory, 44.1kHz should be all we need to accurately reproduce any audible sound, but higher rates do exist.
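What happens above the Nyquist limit can be sketched in a few lines of Python. The hypothetical `alias_frequency` helper below assumes an ideal sampler with no anti-aliasing filter and computes where a pure tone ends up after sampling: a sampler cannot tell a frequency f apart from f plus or minus any multiple of the sample rate, so anything above half the sample rate folds back below it.

```python
def alias_frequency(freq_hz, sample_rate):
    """Return the apparent frequency of a pure tone after ideal sampling."""
    nyquist = sample_rate / 2
    folded = freq_hz % sample_rate  # f and f +/- k*fs produce identical samples
    # Components above Nyquist mirror back into the 0..Nyquist range
    return sample_rate - folded if folded > nyquist else folded

for tone in (1_000, 20_000, 25_000, 40_000):
    print(f"{tone} Hz sampled at 44.1 kHz -> {alias_frequency(tone, 44_100)} Hz")
```

By this arithmetic, an inaudible 25kHz tone sampled at 44.1kHz would reappear at 19,100Hz, squarely within hearing range, which is why such content must be filtered out before conversion.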

The next most common sample rate is 48kHz, the dominant standard for film and video sound. It fits neatly with film's standard frame rate of 24 frames per second (FPS), the long-established rate at which a series of still pictures reads as fluid motion: at 48kHz, every frame spans exactly 2,000 samples. At 44.1kHz, a frame would span 1,837.5 samples, a fractional count that makes keeping sound locked to picture more awkward, hence 48kHz.
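The samples-per-frame arithmetic behind this choice is easy to check. This sketch uses Python's standard `fractions.Fraction` type so the result stays exact; `samples_per_frame` is an illustrative name, and only the 24 FPS case discussed here is shown.

```python
from fractions import Fraction

def samples_per_frame(sample_rate, fps):
    """Exact number of audio samples that elapse during one video frame."""
    return Fraction(sample_rate) / Fraction(fps)

for rate in (44_100, 48_000):
    spf = samples_per_frame(rate, 24)
    whole = spf.denominator == 1  # True only if each frame holds a whole number of samples
    print(f"{rate} Hz at 24 FPS: {float(spf)} samples per frame (whole number: {whole})")
```

48kHz yields exactly 2,000 samples per frame, while 44.1kHz yields 3675/2, or 1,837.5.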

Higher sample rates are also widely used, but their necessity is debated. Proponents claim the ultrasonic content subtly increases fidelity and adds “air” to the signal, while critics argue that 44.1kHz is good enough and that anything higher simply creates larger files and the potential for artifacts when converting (resampling) down to lower sample rates.

Higher sample rates are almost always multiples of 44.1 or 48kHz: 88.2kHz (2 × 44.1), 96kHz (2 × 48), and 192kHz (4 × 48) are all common options on modern equipment and software.
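The file-size side of the debate is simple arithmetic: uncompressed PCM needs sample rate × bit depth × channels bits for every second of audio. A rough Python sketch, assuming stereo audio and ignoring the small file header; the function names are illustrative.

```python
def pcm_data_rate(sample_rate, bit_depth, channels=2):
    """Bytes per second of uncompressed PCM audio."""
    return sample_rate * bit_depth * channels // 8

def minutes_to_megabytes(minutes, sample_rate, bit_depth, channels=2):
    """Approximate size in megabytes (10**6 bytes), excluding any file header."""
    return pcm_data_rate(sample_rate, bit_depth, channels) * 60 * minutes / 1_000_000

for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate} Hz / 24-bit stereo: {minutes_to_megabytes(1, rate, 24):.1f} MB per minute")
```

By this count, one minute of 24-bit stereo takes about 15.9MB at 44.1kHz and about 69.1MB at 192kHz; CD audio (16-bit stereo at 44.1kHz) works out to 176,400 bytes per second.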

Bit Depth

The bit depth of a file determines its dynamic resolution, similar to the color depth of a digital photograph. Each additional bit doubles the number of amplitude values a sample can take (n bits give 2^n values) and adds roughly 6dB of dynamic range, so more bits per sample means greater dynamic range.
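The dynamic-range figures for each bit depth follow from expressing, in decibels, the ratio between full scale and a single quantization step. A short derivation, treating dynamic range as that ratio:

```latex
\text{levels} = 2^{n}, \qquad
\text{DR} = 20\log_{10}\!\left(2^{n}\right) = n \cdot 20\log_{10} 2 \approx 6.02\,n\ \text{dB}
```

For 16 bits this gives 16 × 6.02 ≈ 96dB, and for 24 bits about 144dB.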

This doesn't mean one bit depth is “louder” than another, but higher bit depths will sound more realistic, as they're able to recreate sounds more accurately, with a lower noise floor (like a high-resolution photo). Here's a rundown of common bit depths and their stats:

  • 4-bit: 16 possible values, 24dB dynamic range. Sometimes used for extremely low-fi “bitcrushed” audio effects.
  • 8-bit: 256 possible values, 48dB dynamic range. Used by early systems such as classic video games.
  • 16-bit: 65,536 possible values, 96dB of dynamic range. Standard bit depth of audio CDs.
  • 24-bit: 16,777,216 possible values, 144dB dynamic range. Most commonly used bit depth for recording and production.
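  • 32- or 64-bit “floating point”: a more recent format that stores each sample as a floating-point number, giving far more dynamic range and headroom than fixed-point formats and making clipping during processing much less likely. It is widely used for processing inside DAWs but less common for recording and delivery.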
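To see these figures emerge in practice, the following Python sketch quantizes a full-scale sine wave at several bit depths and measures the resulting signal-to-noise ratio. It assumes plain rounding with no dither; the 997Hz test tone and the name `quantization_snr_db` are illustrative choices. The textbook value for a full-scale sine is about 6.02n + 1.76dB, slightly above the figures in the list because it compares the sine's RMS level with the RMS of the rounding error, rather than full scale with a single step.

```python
import math

def quantization_snr_db(bit_depth, num_samples=48_000, freq_hz=997.0, sample_rate=48_000):
    """Measure the signal-to-quantization-noise ratio of a full-scale sine.

    The sine is rounded to signed integer levels (no dither), and the
    difference between ideal and rounded values is treated as noise.
    """
    max_level = 2 ** (bit_depth - 1) - 1
    signal_power = noise_power = 0.0
    for n in range(num_samples):
        ideal = max_level * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        error = round(ideal) - ideal  # rounding error = quantization noise
        signal_power += ideal * ideal
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for bits in (4, 8, 16, 24):
    theory = 6.02 * bits + 1.76  # textbook estimate for a full-scale sine
    print(f"{bits:>2}-bit: measured {quantization_snr_db(bits):6.1f} dB, theory ~{theory:.1f} dB")
```

Small deviations from the formula are expected at very low bit depths, where only a handful of levels are available.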

Formats

PCM audio can be encoded in many formats for the end user, and these formats fall into two categories: lossless and lossy. Lossless formats perfectly preserve whatever information was captured at the time of recording but can take up a lot of hard drive space.
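The defining property of lossless compression is a bit-identical round trip. As a stand-in for FLAC, which uses audio-specific linear prediction and typically does much better on real recordings, this Python sketch compresses one second of 16-bit PCM with the standard library's general-purpose `zlib` module and checks that decompression restores every byte.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit little-endian PCM (mono, 44.1 kHz)
sample_rate = 44_100
pcm = struct.pack(
    f"<{sample_rate}h",
    *(round(16_000 * math.sin(2 * math.pi * 440 * n / sample_rate)) for n in range(sample_rate)),
)

compressed = zlib.compress(pcm, level=9)  # general-purpose lossless compression
restored = zlib.decompress(compressed)

print(f"raw: {len(pcm)} bytes, compressed: {len(compressed)} bytes")
print("bit-identical after round trip:", restored == pcm)
```

A pure test tone compresses far better than real music would; the point is the exact round trip, not the ratio.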

Lossy formats create compressed files (note: data compression is different from dynamic range compression, the audio effect) which take up significantly less hard drive space but can sacrifice some audio quality or result in unpleasant artifacts. Here’s a rundown of the most common formats:

Lossless Formats

  • .WAV (Waveform Audio File Format): commonly used by recording equipment to capture raw, uncompressed audio. Broadcast Wave Format (BWF) files are WAV files that store additional metadata, such as timestamps.
  • .AIFF (Audio Interchange File Format): similar to WAV, but developed by Apple and traditionally the default on Apple devices.
  • .FLAC (Free Lossless Audio Codec): an open-source format which compresses files without sacrificing sound quality but is not supported by all players.
  • .ALAC (Apple Lossless Audio Codec): slightly less efficient than FLAC, but natively supported by Apple devices. ALAC audio is usually stored in files with the .m4a extension.
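For a concrete look at a lossless container, Python's standard-library `wave` module can write uncompressed PCM into a WAV file. This minimal sketch assumes mono, 16-bit, 44.1kHz audio and writes to an in-memory buffer rather than to disk; `write_wav` is an illustrative helper, not part of the module.

```python
import io
import math
import struct
import wave

def write_wav(fileobj, samples, sample_rate=44_100):
    """Write mono 16-bit PCM samples (ints in -32768..32767) as a WAV file."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))  # WAV PCM is little-endian

tone = [round(16_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(44_100)]
buffer = io.BytesIO()
write_wav(buffer, tone)

# Read the header back to confirm the stored format
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate(), wav.getnframes())
```

Reading the buffer back reports 1 channel, 16 bits, 44,100Hz, and 44,100 frames; the file is the 88,200 bytes of sample data plus a 44-byte header.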

Lossy Formats

  • .mp3 (MPEG Audio Layer III): by far the most common compressed format, popularized with the advent of portable music players.
  • .AAC (Advanced Audio Coding): an alternative designed to improve on the quality of mp3.
  • .OGG (Ogg Vorbis): an open-source alternative used by Wikipedia, Spotify, and certain video games, but not as popular with individual users. (Fun fact: Vorbis is a character from Terry Pratchett’s Discworld book series.)