Imagine being surrounded by 12 crystal singing bowls, awash in a bath of ringing tones and shimmering harmonics, complex overtones combining in the air around you. Despite being immersed in the sound, you can pinpoint the origin of each tone to a precise point in space, down to the gentle noise of the sticks and mallets used to resonate each bowl. Then the demo ends, and you remember that you're actually on the floor of the music industry's biggest trade show.
This was the scene at Genelec's Immersive Experience booth at NAMM 2019, where electronic musician and sound designer Richard Devine was invited to preview some of his ambisonic recordings on a 7.1.4 Dolby Atmos speaker system (that's seven directional speakers, one subwoofer, and four overhead speakers). With Genelec's state-of-the-art loudspeakers set up to Dolby's acoustical specifications, the booth was designed to envelop listeners in a fully three-dimensional hemisphere of sound.
"Everyone in the audience there was like, Holy shit, even myself," says Devine, giddily recalling the experience. "All of us were sitting there with our jaws open. The above-the-head information you're getting, the sides, and just the whole spectrum of sound, being completely immersed in it... it took you immediately there. It was like you were in that room, surrounded by these bowls."
For the last few years, Devine has been on the cutting edge of the growing field of immersive audio: the study and practice of creating convincing three-dimensional sound experiences. The singing bowl recordings, for example, were the result of a recent collaboration with Rode Microphones, creators of the Soundfield Ambisonic mic. The modular synth aficionado has also been hired by tech giants like Google, Apple, and Microsoft to contribute 3D soundscapes and music for virtual reality apps, as well as some top-secret projects.
Devine's resume alone makes it clear that immersive audio really is the future of sound, which raises the question: Why hasn't immersive music emerged as a real trend? With a few exceptions, like the quickly abandoned Quadraphonic format of the 1970s or Brian Eno's 3-speaker home stereo hack (which he described in detail in the liner notes of Ambient 4: On Land), music has never really gone beyond stereophonic sound. A small market exists for surround-sound albums, but by and large, plain old stereo has reigned supreme for over 50 years.
However, that might all be about to change with the advance of immersive audio. In fact, it may have already changed. Did you just feel something?
Whatever the future of sound holds, you're going to want to be in the know about immersive audio. Read on for a primer on how it all works, the amazing potential it holds for musicians and creators, and some tools and resources for getting started in your own sonic explorations.
What Exactly Is Immersive Audio?
Pretty much everywhere you look, media is getting more and more immersive. VR headsets are getting cheaper and more advanced. More games and apps are developed for VR every month. Blockbuster movies in 3D picture with Dolby Atmos sound are pretty much expected at any big-name theater. Sound is central to all of these experiences, and the field of immersive audio has grown at a rapid pace as a result. (The term "spatial audio" is often used interchangeably, but there are differences. See our full glossary below.) But what exactly is immersive audio, and how is it different from surround sound?
For starters, immersive audio goes above and beyond surround sound—quite literally, in fact. Even in the fanciest 9.1-channel systems, traditional surround sound is limited to a horizontal plane around the listener, making it effectively two-dimensional, a circle to immersive audio's sphere. Immersive formats like Dolby Atmos and DTS:X, on the other hand, extend the sound field into three dimensions by adding ceiling speakers to create a dimension of height.
Another major difference is that immersive audio is speaker-independent, meaning the same mix can easily be adapted for playback on different types of sound systems. Instead of requiring separate mixes for stereo, 5.1, 7.1, and so on, the same immersive audio content can be played on a cinema sound system, a network of smart speakers, or a regular pair of headphones—with maximum spatial information preserved.
The funny thing is, immersive audio isn't new at all. Ambisonics, one of the main immersive audio technologies used today, was invented back in the 1970s.
Binaural sound, another key buzzword in recent years, has also been around for quite a while (remember the Virtual Barbershop?). While the term was originally used as a synonym for stereo recording, true binaural audio uses special recording and playback techniques to reproduce sounds as two human ears would naturally hear them—and this is usually what people mean when they use the word "binaural" today.
History is full of ambitious attempts at immersive recording and listening systems, developed by engineers in lab coats as well as forward-thinking musicians.
While the concepts and technology behind immersive audio are nothing new, the recent explosion in the field can be attributed to one major factor: entertainment. Now that technologies like VR and Dolby Atmos have matured, there's finally a market for the mind-blowing sound techniques of immersive audio. You can now go to a theater and experience it, or watch the same movie on an iPad with headphones and get nearly the same effect.
Abbey Road—the venerable studio where The Beatles, George Martin, Geoff Emerick, and others helped expand the horizons of previous eras of recording—is now one of the leaders in the field of immersive audio through the Abbey Road Spatial Audio Forum. The group, founded by the studio's head of audio products, Mirek Stiles, conducts research, brings together engineers and musicians to experiment and brainstorm, and publishes helpful material about immersive audio on its website.
Explaining the origin story behind this '70s technology finally blossoming into a new field, the Abbey Road Spatial Audio Forum states, "The real breakthrough happened within the development of gaming audio delivered via headphones, the same technology now being deployed via Virtual Reality. The reason is simple: While you are interacting with 3D images, the sound should also give you the sense of space to complement the immersive experience."
But if full 360° sound is now possible with headphones, how long until immersive audio bleeds over into the larger music world?
Musical Possibilities
Since the film and game industries seem to be spearheading the sonic revolution, professional composers creating music for such worlds may well be the first class of musicians to adopt immersive audio as the new standard. Not only does the increased sonic freedom help these composers express more emotion, it also allows the score to make space for the action and dialogue in front of (or surrounding) the viewer. If you're a professional composer or aspire to be one, you can't afford to ignore immersive audio.
For the average musician, on the other hand, 360° music videos are the most obvious application. With 360° cameras and ambisonic microphones becoming more affordable, artists now have the tools to bring listeners into their performances like never before—whether it's a street performance, a live gig, or a full-on studio production.
Audio engineers and producers will definitely want to get familiar with immersive audio—after all, those forward-thinking composers and musicians will need someone to edit and mix their work. Even if you're not creating fancy immersive media content, ambisonic microphones are practically Swiss Army knives of sound. Their full-sphere sound field can be shaped into "virtual microphones" and focused on any part of the room, like a custom microphone pickup pattern.
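To make the "virtual microphone" idea a little more concrete, here is a minimal sketch that steers a pickup pattern through a single sample of a four-channel ambisonic signal. It assumes the channels are called W, X, Y, and Z, with W carrying the unscaled all-around signal (the AmbiX-style SN3D weighting; the older FuMa convention scales W by 1/sqrt(2)). The function names are made up for this example and do not come from any particular plugin.

```python
import math

def direction(azimuth_deg, elevation_deg=0.0):
    """Unit vector for a direction; azimuth runs counter-clockwise from front."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),   # x: front/back
            math.sin(az) * math.cos(el),   # y: left/right
            math.sin(el))                  # z: up/down

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Place a mono sample in the sound field as (W, X, Y, Z).
    Assumes SN3D weighting, so W is simply the sample itself."""
    x, y, z = direction(azimuth_deg, elevation_deg)
    return (sample, sample * x, sample * y, sample * z)

def virtual_mic(bformat, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Aim a virtual microphone into the sound field.
    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-8."""
    w, x, y, z = bformat
    dx, dy, dz = direction(azimuth_deg, elevation_deg)
    return pattern * w + (1.0 - pattern) * (x * dx + y * dy + z * dz)

# A bowl struck directly to the listener's left, at ear height.
bowl = encode_first_order(1.0, azimuth_deg=90)
facing_bowl = virtual_mic(bowl, azimuth_deg=90)    # cardioid aimed at the bowl
facing_away = virtual_mic(bowl, azimuth_deg=-90)   # cardioid aimed the other way
```

Aimed at the bowl, the virtual cardioid passes the sound at close to full level; aimed the opposite way, it rejects it almost entirely. Changing `pattern` after the fact is what makes the recording behave like a mic whose pickup pattern can be chosen in the mix.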
Even DJs can get in on the fun of immersive audio. Nightclubs like London's Ministry of Sound are equipping themselves with Dolby Atmos sound systems, in order to deliver an immersive experience for club-goers. Dolby has even created an app that enables DJs and electronic musicians to manipulate the sound field in real time, enveloping the crowd in a swirl of beats, synths, and samples.
In a video promoting the platform, the UK-based DJ Yousef explains, "When you listen to it, you feel like you're inside the track." In that same video, producer and DJ Deadmau5 says, "I think that's going to be my primary compositing technique. Now I'm going to write my tracks in Atmos and then sub-mix down to stereo."
Musicians with a penchant for experimentation will love immersive audio most of all—the possibilities for mind-bending studio productions, hypnotic live performances, and ambitious art installations are endless.
Composer Stephen Barton, who has made music for Titanfall, Call of Duty: Modern Warfare, and more, is enthralled by the possibilities. “The word revolutionary is thrown about very frequently in audio, and often without much justification—but it is justified here,” Barton said in an interview with Mirek Stiles of Abbey Road. “When spatial audio is done well, it's like the walls of the listening space or the headphones drop away completely—and the emotional effect of that is absurdly powerful.”
Devine explains the effect in a different but similarly impassioned way. "At least for me, in my head, the music becomes this architectural piece. I envision it like Frank Gehry's architecture, like these bending, metal and glass, rib cage–like structures twisting around you. What if music could be like that? Multi-dimensional in every direction, surrounding you, above your head, below you, and you're just experiencing it in that sort of detail. It would be amazing."
Tools of the Trade
So how does one actually get started with immersive audio? First of all, you don't need an expensive and complicated multi-channel speaker system to get the full effect. Any immersive mix can be experienced in headphones as long as it is "decoded" binaurally. Binaural decoding replicates the unique way that our two ears recognize spatial information, allowing a full 360° experience with any old pair of headphones (use a nice, open-back model for best results).
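One of the cues a binaural decoder has to recreate is the tiny delay between a sound reaching the near ear and the far ear. The sketch below estimates that delay with Woodworth's classic spherical-head approximation. The head radius and speed of sound are assumed round figures, and a real decoder also models the level and tonal differences between the ears, which this ignores.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at room temperature

def interaural_time_difference(azimuth_deg):
    """Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).
    Intended for a distant source between 0 (front) and 90 degrees (side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))

itd_front = interaural_time_difference(0)    # source straight ahead
itd_side = interaural_time_difference(90)    # source directly to one side
```

With these assumed numbers, a sound straight ahead arrives at both ears together, while a sound directly to one side reaches the far ear roughly two-thirds of a millisecond late. That is a very small gap, but it is a large part of how we tell left from right.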
While special software tools are needed to mix in immersive audio formats, many of them are freely available and surprisingly easy to use. The best place to start is with a suite of spatial audio plugins like the Ambisonic Toolkit (ATK), AmbiX, Rode's Soundfield, or Facebook's FB360 Spatial Workstation, all of which are free to download.
These packages include encoder plugins to bring mono or stereo sources into ambisonic format, tools for moving and positioning sounds in 3D space, and decoder plugins for hearing your mix on different listening systems. Spatial Workstation also includes tools for working with 360° video and VR content with full head-tracking.
If you want to record immersive audio at the source, you'll need an ambisonic microphone, which records fully 3D spatial sound on four channels. Zoom's H3-VR standalone recorder is a fairly inexpensive, entry-level option, while the pricier Rode Soundfield NT-SF1 and Sennheiser Ambeo VR pack more advanced features. With an ambisonic microphone, you can capture anything from a circle of singing bowls to a live band in the studio with incredibly lifelike detail.
The Future of Sound
Only time will tell if immersive audio truly catches on with the music world, but if recent advances in the entertainment and tech spheres are any indication, it won't be long. While it's still not likely that complex speaker arrays for home listening will find a wide audience, the increased use of headphones may be the deciding factor in the public adopting immersive audio. Whatever happens, one thing is for sure—we need creative people to push the boundaries of sound.
"It's going to take somebody weird that just wants to do it just for the sake of doing it, and then being like, 'Hey, I'm just doing it in this format. You can't experience it any other way than by listening to it this way,'" Devine says.
If that weirdo happens to be reading this article, we encourage you to go forth and blaze a trail toward a new world of sound.
- Spatial audio is essentially a more scientific way of saying immersive audio. The two names are often used interchangeably, but it could be argued that spatial audio is the more accurate term.
- In ambisonics, sounds can be positioned anywhere in a full sphere around the listener by encoding them on four channels: three directional axes for front/back, left/right, and up/down information, and one "all-around" channel. The ambisonic format must be decoded to be listened to on a specific playback system, which can be anything from a DTS:X home theater system to a traditional 5.1 surround setup or a regular pair of headphones.
- Ambisonic microphones utilize four capsules arrayed in a tetrahedral fashion to capture a full sphere of sound with accurate spatial positioning preserved. Ambisonic recordings are captured in raw four-channel A-format, which must be converted to spatialized B-format for further mixing and processing. This four-channel approach is referred to as first-order ambisonics.
- Higher-order ambisonics refers to a series of increasingly complex ambisonic formats that use more channels to achieve more accurate spatial positioning. If you visualize first-order ambisonics as a clump of three bi-directional microphones forming a "tack" shape, second- and third-order ambisonics would look similar, but with many more nodes sticking out.
- Object-based audio is another spatial audio format that works in a fundamentally different way than ambisonics. While ambisonic format is still technically channel-based, object-based audio considers each sound as an object that can be moved around a virtual space.
- Dolby Atmos is a proprietary object-based audio format and platform developed by cinema technology giant Dolby Laboratories. Atmos is actually a network of connected systems including special mixing software, specifications for cinema speaker arrays, and even immersive audio hardware for home listening, such as Atmos-equipped AV receivers and sound bars.
- DTS:X is another object-based audio technology developed by Dolby competitor DTS. It works essentially the same way and accomplishes essentially the same result, but uses different tools. The good news is that a lot of Atmos-capable equipment is also equipped to decode DTS:X format.
- Head-Related Transfer Function (HRTF) is the scientific name for the effect your head and ears have on the way you perceive sounds. Sounds coming from different directions arrive at each ear with slight timing, level, and frequency differences, which is what allows us to locate the source instinctively. In immersive audio, artificial HRTFs are used to translate complex spatial mixes into left and right channels, which produces a realistic binaural effect in headphones.
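For readers who like to see the arithmetic, the glossary entries on ambisonic microphones and higher-order ambisonics can be sketched in a few lines. This assumes ideal cardioid capsules in the common tetrahedral layout (front-left-up, front-right-down, back-left-down, back-right-up) and uses the textbook sum-and-difference conversion. Real converters also apply calibration and filtering that is left out here, and the function names are illustrative only.

```python
import math

def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from four tetrahedral capsules (A-format)
    into W, X, Y, Z (B-format) using sums and differences."""
    w = 0.5 * (flu + frd + bld + bru)   # all-around
    x = 0.5 * (flu + frd - bld - bru)   # front minus back
    y = 0.5 * (flu - frd + bld - bru)   # left minus right
    z = 0.5 * (flu - frd - bld + bru)   # up minus down
    return (w, x, y, z)

def channel_count(order):
    """Full-sphere ambisonics of a given order uses (order + 1)^2 channels."""
    return (order + 1) ** 2

def cardioid_capsule(capsule_dir, source_dir):
    """Gain of an ideal cardioid capsule for a source direction (unit vectors)."""
    dot = sum(c * s for c, s in zip(capsule_dir, source_dir))
    return 0.5 * (1.0 + dot)

# Assumed capsule directions: the four corners of a tetrahedron.
k = 1.0 / math.sqrt(3.0)
CAPSULES = {
    "flu": (k, k, k),
    "frd": (k, -k, -k),
    "bld": (-k, k, -k),
    "bru": (-k, -k, k),
}

# A sound arriving from straight ahead of the microphone.
front = (1.0, 0.0, 0.0)
gains = {name: cardioid_capsule(d, front) for name, d in CAPSULES.items()}
w, x, y, z = a_to_b_format(**gains)

# Channels needed for first-, second-, and third-order ambisonics.
counts = [channel_count(n) for n in (1, 2, 3)]
```

For the frontal source, the left/right and up/down channels cancel to zero while the front/back channel comes out positive, which is the directional information B-format is meant to carry. The channel counts work out to 4, 9, and 16, which is why higher orders demand more from recorders and mixing software.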