PlayStation VR: Soundtracking A New Reality
Virtual Reality is experiencing a rebirth and Sony is right in the game. The pioneers of PlayStation VR audio give AudioTechnology tips on how to get it right.
Story: John Broomhall
Until recently, Virtual Reality has been most comfortable in the science-fiction realm. Much like anyone wearing one of ye olde ‘futuristic’ VR headsets, early attempts at actually creating fully immersive 3D worlds stumbled along, cut off at the knees by primitive graphics, horrific frame rates and appalling motion lag.
Slowly but surely, VR technology caught up to the future, though sky-high costs rendered a consumer offering improbable until now. Consumers are buzzing at the prospect of in-home VR; at CES, punters were lining up around the block to don snow-goggle-sized headsets for a virtual tour of VR’s capabilities. Global corporations, unperturbed by 3D’s lack of wide-scale take-up, have also been digging deep, funding research and development to make Virtual Reality a reality at retail.
No surprise, then, that gaming giant PlayStation has been, and remains, at the forefront of this pioneering endeavour, with its PlayStation VR (PSVR) system due to hit the streets during 2016.
Until you’ve personally donned the PSVR headset and some decent headphones, VR may seem like an interesting idea and something you might like. However, as soon as you try it for yourself be prepared to get broadsided by a massive paradigm shift. You instantly get it. You can look up, down and all around you, and the settings — like being underwater in Sony’s The Deep scuba encounter — feel breathtakingly expansive. PSVR delivers on VR’s necessary promise; it immerses you in a deeply compelling world that makes you feel like you’re leaving the real one far behind.
VR, A NEW DAWN FOR AUDIO
The sound, music and dialogue components of these experiences are crucial. They can subtly draw a player in or psychologically bump them out, so members of Sony’s London Studio — where the operation, originally dubbed ‘Project Morpheus’, was nurtured — have put plenty of thought into PSVR’s application of audio. Alongside them in the UK capital are members of Sony Computer Entertainment Europe’s Creative Services Group: a collective of experts servicing the music, audio and video requirements for a plethora of videogame titles and their marketing campaigns.
It’s primarily fallen to these pioneering sound designers, music creatives and software engineers to establish the platform’s audio requirements; create the technical train tracks for PSVR audio to run on; and explore what does and doesn’t work creatively, for a new dimension of interactive entertainment.
London Studio Director, Dave Raynard, reckons VR is putting the spotlight on audio again. Having to wear a visor and headphone combo to experience VR properly means game developers have the player’s entire audio attention. “VR is really audio’s day in the sun,” said Raynard. “Audio plays a huge role in taking people to another place, it’s half the experience. One aim of VR experiences is to attract the player’s attention, to make them look around using the ‘VR-ness’ of it, and audio is a great way to do that. If you get audio wrong, it’ll be very, very noticeable.”
HEADPHONICS
Fortuitously, not only does Sony have a long history in audio, but Raynard says the company has also put a great deal of effort into developing binaural audio systems for PSVR.
Alastair Lindsay, Head of Music, says binaural audio is the perfect audio companion to VR: “It really helps create the illusion of a virtual 3D world. It convincingly reproduces the location of a sound: behind, ahead, above, or wherever else the sound is emitted from.”
Lindsay offered a concise explanation of binaural audio: “In short, this is achieved by taking a piece of audio and processing it such that it includes all of the key cues the brain uses to locate sounds in space.” Sony’s audio system uses head-related transfer functions (HRTF) to filter the sound emitters depending on their position in the world, taking into account things like ‘Interaural Time Difference’, i.e. the difference in time taken for a sound to reach either ear. “The small adjustments to the different sounds using this technique, as well as a number of other factors, create compelling positional audio,” said Lindsay.
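To put a number on the Interaural Time Difference Lindsay mentions, here is a minimal Python sketch using Woodworth's classic spherical-head approximation. It is not Sony's HRTF implementation, which filters each emitter with far more cues than timing alone; the head radius, speed of sound and function name are assumptions chosen purely for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875      # assumption: a commonly quoted average head radius


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is the horizontal angle away from straight ahead, limited
    here to -90..90 degrees (negative means the source is to the left).
    Uses Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be within -90..90")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        itd_us = interaural_time_difference(angle) * 1e6
        print(f"{angle:>2} degrees: {itd_us:.0f} microseconds")
```

Under these assumptions a sound directly to one side arrives at the far ear roughly two thirds of a millisecond late, which gives a sense of how small the timing cues are that a binaural system has to reproduce.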
Nick Ward-Foxton, Senior Audio Programmer, explains PSVR uses real-time binaural processing to achieve the most realistic-sounding and immersive experience possible: “Headphones are a great output format for us; the best option for delivering HRTF and binaural sound. Audio developers don’t need to worry about head tracking because the output is already aligned to head orientation.”
To create the overall soundscape in the most efficient way, the team developed options for the route a sound takes through the 3D audio system. Ward-Foxton described three of the basic paths they find really useful: Discrete 3D object, Surround Bed, and Straight To Ear.
Ward-Foxton: “Discrete 3D objects are the highest fidelity. The majority of sounds will be this type, and you may need a priority system to manage the number of voices playing versus the number of 3D voices available.
“We also have a 10-channel Surround bed made up of virtual speaker positions including height. It’s useful for lower priority positional sounds and still gives you a good sense of position. If your mix is busy, then once you hit the voice limit for 3D objects the lower priority sounds can mix down into this bed until a 3D voice is available.
“Thirdly, you can send a signal directly to the headphones, a really important option for non-diegetic music and abstract ambiences. We’re also using it as a kind of LFE channel where we send a low-passed copy of a 3D sound straight to the headphones for explosions and other big sounds. This feature also gives you the ability to play back binaurally-recorded material in the game which can give great results for certain sounds — e.g. an object striking a helmet of an in-game avatar.”
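The three paths Ward-Foxton describes, plus the priority fallback into the bed, can be sketched roughly as follows. This is an illustrative Python outline only, not the PSVR audio API: the `Sound` class, the `assign_routes` function and the "higher number wins" priority convention are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OBJECT_3D = "discrete 3D object"
    SURROUND_BED = "surround bed"
    STRAIGHT_TO_EAR = "straight to ear"


@dataclass
class Sound:
    name: str
    priority: int            # assumption: higher number = more important
    positional: bool = True  # False for non-diegetic music, abstract ambience


def assign_routes(sounds, max_3d_voices):
    """Return a {sound name: Route} mapping for the sounds playing right now."""
    routes = {}
    positional = []
    for sound in sounds:
        if sound.positional:
            positional.append(sound)
        else:
            # Non-positional material bypasses spatialisation entirely
            routes[sound.name] = Route.STRAIGHT_TO_EAR
    # Highest priority first; the sort is stable, so ties keep their input order
    positional.sort(key=lambda s: s.priority, reverse=True)
    for index, sound in enumerate(positional):
        if index < max_3d_voices:
            routes[sound.name] = Route.OBJECT_3D
        else:
            # Out of 3D voices: mix the lower priority sound into the bed
            routes[sound.name] = Route.SURROUND_BED
    return routes


if __name__ == "__main__":
    playing = [
        Sound("music", priority=0, positional=False),
        Sound("gunshot", priority=9),
        Sound("footsteps", priority=2),
        Sound("dialogue", priority=10),
    ]
    for name, route in assign_routes(playing, max_3d_voices=2).items():
        print(f"{name}: {route.value}")
```

Calling `assign_routes` again whenever a sound starts or stops would let a bed sound be promoted back to a discrete object as soon as a 3D voice frees up, which is the behaviour described in the quote.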
HOW TO MIX IT
While the origins of binaural recording can be traced back more than a century, creating and mixing real-time binaural audio for VR has yet to become anything close to commonplace. The people involved in projects like PSVR are still pioneering the format. Simon Gumbleton, Sony London Technical Sound Designer, says VR experiences require full immersion: a complete suspension of disbelief on the part of the player, a state that must be maintained by not triggering the subconscious ‘reality testing’ our brains do in the background. “The reason our dreams often seem so real is because this ‘reality testing’ mechanism is effectively switched off when we sleep,” he explained. Gumbleton shared some of the hard-fought lessons the team has learnt along the way to keep players from ‘reality testing’ VR.
APPROACH TO ASSET CREATION
Simon Gumbleton: “Choose audio content that’s as dry as possible and let the system add early reflections, late reverb and decide the balance between those and the dry signal. Fully anechoic material works best through HRTF filters and dynamic reverbs, but isn’t always practical or possible so aim to find or record material with minimum room or reflections baked in. That sonic information can end up giving the player incorrect cues and make spatial localisation difficult.
“Sounds should ideally be mono sources, so that our 3D audio system has full control over adding the spatial information. Design them how you want without worrying about the HRTF process, and don’t try to correct for it with EQ. Otherwise you’ll break the spatialisation cues added by the system.
“Humans are much better at localising sounds when they move their heads and get that change in content between ears, so you might design slightly longer sounds for situations that require precise localisation.”
PROCESSING TIPS
SG: “Channel-based content is definitely still possible in VR, but it’s important to restrict it to non-positional and non-diegetic material. Mood stuff like abstract drones works fine in 2D, but positional content that ‘sticks’ to your head when you turn is distracting.
“Keep ‘player sounds’ subtle and neutral — a subconscious reinforcement of player actions in the virtual world. If they become too noticeable you’ll create a disconnect between player and avatar. The player will be distracted, conscious they’re not making that sound.
“Too much compression on dialogue can pull it out of the world and make it feel 2D. Use less compression than normal on dialogue to keep it feeling like it’s coming from the characters, rather than a phantom centre speaker somewhere in front of your face. When recording dialogue, capture a performance that relates to how it will be played back.”
SPECIFICALLY PLACE AUDIO
SG: “Location of sound sources in the world needs to be accurate. This means using more emitters in multiple locations. For the vast majority of sounds in a scene, you can’t just emit them from the root of an object.
“Respecting head tracking is very important. Sounds should move correctly in the world. Much audio that previously might have been piped in stereo will benefit from being positioned correctly in the world. You might treat almost all the elements of an ambience as positional sources. It really helps place the player in that space; they can move around, lean in, turn their head, and what they hear makes sense in the virtual world.
“If you’re using any sort of dynamic obstruction or reverbs, you need to have a good relationship with the environment creators on the project. You also need tools which allow physics authoring that works for audio. We’ve often found that ‘accurate’ physics doesn’t always give the best results from a design perspective — you need to find what works for your audio design.”
MAKE AUDIO REACTIVE
SG: “It’s crucial the player feedback you provide with audio is believable, especially with object interactions. Design sounds for these interactions in a way that reacts believably to any player input. It really helps to be able to easily get player parameter values like speed, acceleration, rotation and angular velocity from any player input at any time so you can design reactive blend containers.
“Another key aspect of placing sounds is distance modelling – 3D audio systems don’t give you this ‘out of the box’ – you need to design it. Volume and basic filtering over distance is a great starting point and still works in VR. But there are some extras that really help sell distance. For example – dynamically driving the send levels to reverbs over distance. This works for simulating proximity as well. You can exaggerate certain properties at very short distances and drastically reduce the reverb send level to emulate very close sounds.”
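As a rough illustration of the distance modelling Gumbleton describes (volume, basic filtering and a reverb send all driven by distance), here is a minimal Python sketch. Every curve and constant in it is an assumption picked for readability rather than anything taken from Sony's system, and `distance_model` is a hypothetical name.

```python
import math


def distance_model(distance_m, ref_distance_m=1.0, max_distance_m=50.0):
    """Return (dry_gain, lowpass_cutoff_hz, reverb_send) for one emitter.

    All three curves are illustrative guesses, meant to be tuned by ear.
    """
    d = min(max(distance_m, 0.0), max_distance_m)
    # Inverse-distance law, held at unity inside the reference distance
    dry_gain = ref_distance_m / max(d, ref_distance_m)
    # Normalised distance: 0.0 at the listener, 1.0 at max_distance_m
    t = d / max_distance_m
    # Low-pass cutoff sweeps geometrically from 20 kHz (near) to 2 kHz (far)
    cutoff_hz = 20000.0 * (2000.0 / 20000.0) ** t
    # Reverb send rises with distance; very close sounds stay almost dry
    reverb_send = 0.05 + 0.75 * math.sqrt(t)
    return dry_gain, cutoff_hz, reverb_send


if __name__ == "__main__":
    for metres in (0.2, 1.0, 5.0, 25.0, 50.0):
        gain, cutoff, send = distance_model(metres)
        print(f"{metres:5.1f} m  gain {gain:.3f}  cutoff {cutoff:7.0f} Hz  send {send:.2f}")
```

The point of the sketch is the shape rather than the numbers: as an emitter moves away, the dry signal falls while the proportion sent to reverb climbs, and at very short range the send collapses towards zero to sell proximity.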
DYNAMIC MIXING
SG: “With high fidelity azimuth and elevation, plus a 360-degree sound field, 3D audio systems give you more space in the mix. You need to move most of the run-time mixing controls out of the bus structure and into your sound object hierarchy. By using lots of side-chaining, meters, states and ducking to dynamically drive different sound objects, you can still create dynamic and reactive mixes.
“However, too much ducking quickly becomes obvious and can pull the player out. The trick is lots of side-chaining in small amounts. Where dialogue needs to be heard in a busy scene, having the dialogue a little louder is better than ducking other elements too aggressively.
“You also have the ability to influence the player’s focus, which can be really powerful. Thinking back to tracking player input parameters, you know exactly where the player is looking in the scene, so you can use that data to manipulate the audio focus of the scene, enhancing specific elements whilst suppressing others.
“Decide on the focus and be aware of your ability to grab the player’s attention, particularly with respect to how new elements enter the soundscape.
“Mixing audio in VR is probably the biggest departure from our old workflows. This is primarily because 3D audio systems change the end point of our signal chain. With channels and busses, the end point is way down at the master fader, but in 3D audio systems, the end point is at each audio ‘object’.
“This has some implications for how you construct and sculpt what the player ultimately hears. Concepts like summing and buss processing don’t make sense in an object-based system. You can’t stick a multiband compressor, EQ or tape saturation on the master fader any more, not only because you need to maintain the individual object signals, but also because you would end up wrecking any ILD [interaural level difference] and HRTF filter cues.
“This means effects processing must be done at the object level which, depending on the number of objects you have, likely means an increase in real-time DSP. It also means group processing is more difficult — for example you can’t run all your vehicle sounds through a single compressor on a buss.
“By using traditional techniques such as side-chaining, states and meters, alongside a few VR specific systems we’ve built, we’ve found that even without a traditional mixer structure, we can still create immersive, dynamic and reactive mixes.”
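To make the idea of mixing at the object level a little more concrete, here is a minimal Python sketch in which each sound object computes its own gain from two of the inputs Gumbleton mentions: a gentle side-chain duck keyed from the dialogue level, and a gaze-based focus offset. It does not reflect Sony's tools; the function names, thresholds, cone angle and dB amounts are all assumptions for illustration.

```python
import math


def duck_db(sidechain_level_db, threshold_db=-30.0, ratio=0.25, max_duck_db=3.0):
    """Small, capped gain reduction (dB, never above 0) driven by a side-chain."""
    over_db = max(0.0, sidechain_level_db - threshold_db)
    return 0.0 - min(max_duck_db, over_db * ratio)


def focus_db(gaze_dir, to_source, cone_deg=30.0, boost_db=2.0, cut_db=-2.0):
    """Gain offset (dB) from the angle between the gaze and the source direction."""
    dot = sum(g * s for g, s in zip(gaze_dir, to_source))
    norm = math.sqrt(sum(g * g for g in gaze_dir)) * math.sqrt(sum(s * s for s in to_source))
    if norm == 0.0:
        return 0.0  # degenerate vector: leave the object untouched
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    if angle <= cone_deg:
        return boost_db
    if angle >= 90.0:
        return cut_db
    # Fade linearly from boost to cut between the cone edge and 90 degrees
    t = (angle - cone_deg) / (90.0 - cone_deg)
    return boost_db + t * (cut_db - boost_db)


def object_gain_db(base_db, gaze_dir, to_source, dialogue_level_db, is_dialogue=False):
    """Final per-object gain: the 'mixer' lives on the object, not on a buss."""
    gain = base_db + focus_db(gaze_dir, to_source)
    if not is_dialogue:
        gain += duck_db(dialogue_level_db)  # dialogue never ducks itself
    return gain


if __name__ == "__main__":
    looking_forward = (0.0, 0.0, 1.0)
    print(object_gain_db(-6.0, looking_forward, (0.0, 0.0, 5.0), dialogue_level_db=-22.0))
    print(object_gain_db(-6.0, looking_forward, (4.0, 0.0, 0.0), dialogue_level_db=-22.0))
```

Keeping the duck capped at a few dB follows the advice above about side-chaining in small amounts; in practice the result would also be smoothed over time so that gain changes never become audible as pumping.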
A FRESH PERSPECTIVE
Perhaps the most valuable thing you can do, said Gumbleton, is experiment. The entire VR spectrum is relatively new, and a lot of the things that worked for the team were found by simply trying out ideas. “You’re working on totally new experiences and systems,” reminded Gumbleton. “You’ll face new and exciting challenges in VR that no-one else has faced.”
Despite the infancy of this generation of VR, some are already predicting its universality. Raynard is one of those. He’s been around for the debut of many game technologies in the past — camera tech, motion gaming — but this is the first time he’s seen a technology’s potential extend so far beyond games, and that makes him excited about its future: “There are film people interested, medical people, business to business. That’s why I think it will become established as a ‘medium’ — just like theatre, radio, TV, film — and that’s what makes it so exciting.” That said, he’s wisely not putting a timeline on its adoption. “It could surprise us and be really quick or there might be a longer tail,” mused Raynard. “When film went from silent movies to sound, it took 10 years for all the cinemas in America to change over. This is just the beginning.”