Mixing AAA Videogames
Game sound now arguably exceeds film’s potential for sophistication, artistry and story-telling. Some of the best in the business explain the ‘state of the art’.
Story: John Broomhall
Today’s game audio is multi-faceted, multi-layered and multi-channel, capable of delivering a jaw-dropping aural experience to rival any cinematic extravaganza. What’s more, the time, talent and tech required to max out a videogame’s sonic potential can make movie soundtracking seem like a walk in the park.
IT’S BIG
Big? Wait… it’s massive. Case in point – God of War Ragnarök features a whopping 20 hours of interactive music, 25,000 lines of dialogue in 13 languages, the equivalent of over 100GB of uncompressed sound effects content, and over five hours of cinematics. Under the bonnet, the punchy PlayStation 5 game tech delivers dozens of audio channels running live with multiple ‘run-time’ DSP plug-ins for all kinds of FX, all delivered in 7.1 surround. It was brought to our screens by a crack team of over 100 audio professionals (not including outsourcing partners) working across seven time zones in five different countries.
So creating a fab final mix is no mean feat.
IT’S COMPLICATED
For a start, you can’t exactly ‘through-mix’ a game. Sure, there are distinctly linear sections of the experience – your cinematic intros and cut scenes – but these must dovetail perfectly with the complex ‘run-time’ audio systems underneath, which control and manage sound assets, cue them live on demand just-in-time, and digitally process them on-the-fly according to the programmed-in physics and acoustical properties of the game world – not to mention for emotive and artistic effect.
As if that weren’t enough to tame, there’s also the wildly unpredictable nature of player behaviour – free to roam and fire at will, who knows what they’ll do, where they’ll go or how fast? Hardcore gamers blast through the action while others are more cautious. There are ever-changing ambiences, with weather and time of day to account for. Meanwhile, complex weapons and vehicles are constructed from many moving sound parts to provide authenticity, believability and fidelity. And all this incredible sound, music and dialogue content could be played back on anything from lo-fi TV speakers to gaming headphones to a top-end Dolby Atmos system.
IT’S CHALLENGING
So just how do you create a beautiful mix from such a Gordian knot of audio? How do you keep control and consistency? How do you avoid messy sonic overload and instead deliver a coherent, comprehensible and kick-arse end mix, rendered specifically for each player live as they play?
The talented folks at Sony PlayStation’s Creative Arts group know more than a thing or two. Their centralised sound and music team, based in Santa Monica, get to work with multiple game developers around the globe on some of the platform’s most signature, flag-waving titles, beloved by gamers the world over – appreciated not just for their gameplay, but most definitely for their multi-award-winning audio.
CREATIVE MIX PRINCIPLES
According to Supervising Dialogue Designer, Jodie Kupsco, it all starts with ‘creative mix principles’: “Clarity, impact, immersion and spectacle are the lenses through which we evaluate all our game mixes. We want our game experience to be understandable to the player at all times and as to ‘impact’ – the mix should support and reflect a very dynamic and exciting experience. Plus, the player should be continually immersed in the game environment and never pulled out by any transitions, glitches or bugs. As for ‘spectacle’ we’re talking about those big special ‘wow’ moments we love to make sound for – we really want to focus our mixes on highlighting the storytelling as well as the gameplay ‘systems’ to create that cohesive and engaging player experience.”
Music Designer, Sonia Coronado, adds: “Each game project has its own distinct high-level vision. For example, God of War Ragnarök is about a very intimate narrative – and it’s about non-linear exploration and, of course, expressive combat – all in service of character growth – we want to make sure the mix itself creatively and technically supports these game pillars. But for, say, Returnal, they would look a bit different – you have cosmic horror, time-loop distortion and an emotional story of loss. We want always to honour the original vision of the game throughout development and carry that all the way through the mix.”
Extensive communication and collaboration are vital – not just within the audio team, but also with the game developer and creative director. And individual egos must be left at the mix room door, with everyone recognising that not all three disciplines will be front and centre all the time – ‘the whole is greater than the sum of the parts’ is the team’s mixing mantra.
ALWAYS MIXING
All well and good for the ground rules, but what about the practicalities?
The essential keys to the process are timing and control. You simply can’t leave such a gargantuan mixing process until the end of the project. Right from the off, you need to create and maintain a mix master plan and workflow which can serve those creative tenets, replete with a definitive technical bus structure through which you can set, monitor and control loudness standards for each category of audio content. Thankfully, today’s so-called game audio ‘middleware’, such as Audiokinetic’s Wwise (pronounced ‘wise’) used on God of War Ragnarök among many other titles, allows you to do exactly that. It’s kind of like the Pro Tools of game sound, allowing audio designers to add and package sound assets into the game environment, linking them to character actions and geographical locations, and determining exactly when and how they will play back – not to mention how they’ll be FX-processed and panned.
It’s a set of systemic train tracks for everything audio to run on and, to some extent, to mix itself. Except it’s not quite that simple – you’ll continually want to override and subvert those systems in the mix for informational intent or artistic effect and, most importantly, for the sake of clarity. So, sure, there will, of course, be game play-through mix passes at the project’s denouement, but these are for final tweaking and polishing. You could say mixing actually starts on Day 1.
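To make the idea concrete, here’s a minimal sketch of what a category bus hierarchy with per-bus gain offsets and loudness targets might look like in code – purely illustrative (the bus names, offsets and targets are invented, and real projects author this inside the middleware rather than hand-rolling it):

```cpp
// Illustrative sketch only -- not the Wwise API. Bus names, offsets and
// targets are hypothetical stand-ins for what a team authors in middleware.
#include <cstdio>
#include <string>
#include <vector>

struct Bus {
    std::string name;
    float gainDb;          // mix offset applied at this bus
    float targetLufs;      // loudness target for content routed here
    std::vector<Bus> children;
};

// Walk the hierarchy, accumulating parent gain into each child's effective gain.
void printEffectiveGains(const Bus& bus, float inheritedDb = 0.0f, int depth = 0) {
    float effective = inheritedDb + bus.gainDb;
    std::printf("%*s%s: %.1f dB (target %.0f LUFS)\n",
                depth * 2, "", bus.name.c_str(), effective, bus.targetLufs);
    for (const Bus& child : bus.children)
        printEffectiveGains(child, effective, depth + 1);
}

int main() {
    Bus master{"Master", 0.0f, -24.0f, {
        {"Dialogue", 0.0f, -23.0f, {{"Efforts", -3.0f, -26.0f, {}}}},
        {"SFX",     -2.0f, -24.0f, {{"Ambience", -6.0f, -35.0f, {}}}},
        {"Music",   -4.0f, -27.0f, {}},
    }};
    printEffectiveGains(master);
}
```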
Jodie: “Our dialogue pre-mix process begins right at the very beginning of production – like a pre-pre-mix! We’ll set up monitoring and listening standards for the whole team regardless of their work location to make sure everybody has confidence in what they’re hearing – especially with dialogue since it’s the core foundation of the mix.
“Also, at PlayStation, we’re looking to standardise our dialogue performance intensities (eg. shouting as opposed to normal speaking) and loudness targets for mastering so we can jump from game to game and get up to speed quickly. We’re using a lot of metadata in our speech asset databases to help us organise content and mix quickly. We then carry that over into our middleware to help group things together usefully, run queries and target different levels, mixing with broad strokes to get things 80% of the way… fast. Videogames are getting ridiculously large and there’s no time to fine-tune every last detail.”
Technically, this entails setting attenuation presets for different dialogue intensity categories running on different buses, keeping things tidy and maintaining a decent base-quality mix throughout production. Reference speech assets from the game’s main characters are set up in the project, always available to reality-check mix targets and visual metering, ensuring the team never get caught out by the inevitable nuances between actors.
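As a rough illustration of the attenuation-preset idea, the sketch below maps hypothetical dialogue intensity categories to invented distance-falloff curves and mastering targets – not PlayStation’s actual presets or Wwise’s authoring model, just the shape of the data:

```cpp
// Hypothetical sketch of per-intensity attenuation presets; category names,
// curve values and targets are invented for illustration.
#include <algorithm>
#include <cmath>
#include <cstdio>

enum class Intensity { Whisper, Spoken, Raised, Shout };

struct AttenuationPreset {
    float maxDistance;  // metres at which the line fades to silence
    float rolloff;      // exponent shaping the falloff curve
    float targetLufs;   // mastering target for assets in this category
};

AttenuationPreset presetFor(Intensity i) {
    switch (i) {
        case Intensity::Whisper: return { 5.0f, 1.5f, -30.0f};
        case Intensity::Spoken:  return {15.0f, 1.2f, -23.0f};
        case Intensity::Raised:  return {30.0f, 1.0f, -20.0f};
        case Intensity::Shout:   return {60.0f, 0.8f, -18.0f};
    }
    return {15.0f, 1.2f, -23.0f};
}

// Linear gain for a line of the given intensity heard at `distance` metres.
float gainAtDistance(Intensity i, float distance) {
    AttenuationPreset p = presetFor(i);
    float t = std::clamp(distance / p.maxDistance, 0.0f, 1.0f);
    return std::pow(1.0f - t, p.rolloff);
}

int main() {
    std::printf("Shout at 20m: gain %.2f\n", gainAtDistance(Intensity::Shout, 20.0f));
    std::printf("Whisper at 20m: gain %.2f\n", gainAtDistance(Intensity::Whisper, 20.0f));
}
```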
Jodie: “We’ll start to do volume levelling passes, quickly playing through our speech content in player experience order to make sure it flows naturally, and there aren’t lines spiking in the middle of a conversation – especially if we didn’t have a chance to record those actors together. We tweak and get everything as close as we can before the final mix playthrough. We test it all in-game – we can do a lot of work offline but playing the game is very important.”
SOUND JUDGEMENTS
No surprise, then, that there’s a similar approach for sound effects, where planning and control are arguably even more crucial given the extensive variety and staggering quantity of content involved.
The entire sound design team is mandated to work at identical SPL listening levels to aid general consistency, and an upper momentary LUFS limit is defined for the loudest sound in the game.
Alex Previty, Senior Sound Designer explains: “In God of War, there are lots of loud sounds, like explosions, so we want to see, at this certain listening level, what’s our max limit to not be blowing people’s eardrums out – nobody wants that! On a more granular level we also want to define some rough loudness ranges for different sound categories – for instance, ambience should be sitting between this and that decibel level. We have lots of different sound categories and we want to make sure things are roughly in the ballpark before a more polished final mix pass.”
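A toy version of that ballpark check might look like the following – the categories, ranges and global ceiling are made-up numbers standing in for a team’s real standards:

```cpp
// A minimal validation sketch: category ranges and the ceiling are invented
// numbers, not PlayStation's actual standards.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct LoudnessRange { float minLufs, maxLufs; };

int main() {
    const float momentaryCeiling = -8.0f;  // loudest anything may ever hit
    std::map<std::string, LoudnessRange> ranges = {
        {"ambience",  {-45.0f, -30.0f}},
        {"weapons",   {-18.0f, -10.0f}},
        {"explosion", {-12.0f,  -8.0f}},
    };

    struct Asset { std::string name, category; float measuredLufs; };
    std::vector<Asset> assets = {
        {"wind_loop_01", "ambience", -33.0f},  // in range: passes silently
        {"axe_throw_03", "weapons",  -6.5f},   // too hot: flag it
    };

    for (const Asset& a : assets) {
        const LoudnessRange& r = ranges.at(a.category);
        if (a.measuredLufs > momentaryCeiling)
            std::printf("%s exceeds the global ceiling (%.1f LUFS)\n",
                        a.name.c_str(), a.measuredLufs);
        else if (a.measuredLufs < r.minLufs || a.measuredLufs > r.maxLufs)
            std::printf("%s outside %s range (%.1f LUFS)\n",
                        a.name.c_str(), a.category.c_str(), a.measuredLufs);
    }
}
```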
There’s another layer of mix design to consider too – not just pure loudness, but the moment-to-moment importance of a given sound or sound category, which informs how things sit in the mix, effectively creating priorities for sounds – a voice volume hierarchy. Using Wwise’s HDR (high dynamic range) mixing tech, Alex can set up systems which cull sounds falling below a defined decibel ‘window’ to free up channels and thus CPU power – and ultimately prevent the ‘sum of the parts’ becoming a cluttered sonic mush. When a loud explosion happens, there’s no need to play back the low-level sounds around you – they’ll just muddy the water.
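The culling logic itself is simple to express. Below is an illustrative stand-in (not Wwise’s actual HDR implementation): find the loudest active sound, then drop anything more than an assumed 12dB ‘window’ below it:

```cpp
// Sketch of the HDR 'window' idea: voices far enough below the loudest sound
// are culled to save channels and CPU. The 12 dB window is an assumed value.
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct Voice { std::string name; float levelDb; bool active = true; };

void applyHdrWindow(std::vector<Voice>& voices, float windowDb) {
    if (voices.empty()) return;
    float loudest = -120.0f;
    for (const Voice& v : voices) loudest = std::max(loudest, v.levelDb);
    for (Voice& v : voices)
        v.active = (loudest - v.levelDb) <= windowDb;  // cull below the window
}

int main() {
    std::vector<Voice> voices = {
        {"explosion", -6.0f}, {"dialogue", -14.0f},
        {"footsteps", -32.0f}, {"room_tone", -45.0f},
    };
    applyHdrWindow(voices, 12.0f);
    for (const Voice& v : voices)
        std::printf("%-10s %s\n", v.name.c_str(), v.active ? "plays" : "culled");
}
```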
Alex: “We also have certain reference sound effects to check against and, yeah, playing the game is super-important – things might sound great in your DAW and middleware tools, but in-game, there could be a bunch of other factors such as HDR, dynamic mixing and in-game reverb affecting the mix. Creating pre-groups of attenuation presets to ensure that per category, sounds behave and fall off a certain way is one route to helping sound behaviour be more reliable. Then on a more granular level, our sound designers can define what individual voice values should be, and all this is combed through by various people to make sure everything’s set up correctly before we take it to the mix stage.”
MUSIC MATTERS
For music, a sensible folder structure and naming convention is also vital – and simple too, so anyone can jump in and find where things are. Again, there are certain music categories with varying characters and loudness targets.
Sonia: “Depending on the music style, something might be perceived louder than it actually is – orchestral music has a different mix versus say electronic music – so that’s when we need constant collaboration, and we rely on dialogue to keep us grounded as we’re testing in-game – always listening to make sure music’s never burying dialogue.”
These days many games feature interactive music layers, which have different bus needs and loudness targets. For example, Layer 1 might be a quiet, sparse ‘exploration’ vibe, but when combined with Layer 2, perhaps added due to a rise in enemy threat level, it creates a low-intensity combat vibe – and that combination still needs to hit the required loudness targets.
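There’s real arithmetic behind that: when two uncorrelated stems sum, their powers add, so the combination comes out louder than either layer alone and a compensating trim is needed to stay on target. A small sketch, with invented levels:

```cpp
// Hypothetical sketch: when two stems sum, combined loudness rises, so a
// compensating trim keeps the combination on target. Values are illustrative.
#include <cmath>
#include <cstdio>

float dbToPower(float db) { return std::pow(10.0f, db / 10.0f); }
float powerToDb(float p)  { return 10.0f * std::log10(p); }

int main() {
    float explorationDb = -27.0f;   // layer 1 alone, roughly on target
    float threatDb      = -30.0f;   // layer 2, added as enemy threat rises
    float targetDb      = -26.0f;   // loudness target for low-intensity combat

    // Incoherent sum: powers add when the layers are uncorrelated material.
    float combinedDb = powerToDb(dbToPower(explorationDb) + dbToPower(threatDb));
    float trimDb = targetDb - combinedDb;

    std::printf("Combined layers: %.1f dB -> trim %.1f dB to hit %.1f dB target\n",
                combinedDb, trimDb, targetDb);
}
```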
Sonia: “We also adjust content after playing the game to find the right pacing – sometimes we put music in a game level when, say, the ambience isn’t 100% there and maybe there’s some VO missing. So, before we do a final playthrough, while in the pre-mix stage, we check: does the music placement still make sense? Not too much, not too little – you find the right balance.”
OWNING THE BUS
You get the message: buses and bus structure are really, really important and the route to keeping the whole thing organised and predictable. So it’s wise to designate a single, specific owner or gatekeeper for the bus system.
Alex: “Everything is running through this pipe – all this behaviour is co-dependent – so if you have multiple people changing stuff it can confuse things a lot, so we have one person overseeing, ensuring things are running smoothly with no changes happening that could affect the game at a holistic level. That person must communicate with other audio departments to ensure their needs are met – it’s a constant process.
“At first, you set up your bus structure for working in broad strokes then later if you need more specific control for a certain part of a game then go ahead and add things that work for you. It’s also generally good to have straightforward naming conventions so anyone can see why things are where they are – and colour coding’s a lot of fun! Just having colours for quick visual checking is a huge plus – a small thing that goes a long way.”
With the fundamentals in place, the team then creates sub buses and splits as required. For instance, you don’t want your cinematic cut scene audio going through the in-game HDR bus. Some buses may be split out not only based on category but on mix state usage – for example if you always want to turn down a particular set of ambient sounds when combat starts, then you create a one-stop solution to automatically mix for that situation. Or you might want to selectively duck certain frequencies in the combat music using a dynamic EQ to free up audio space for dialogue. You can add whatever controls you need thanks to the Wwise tools and systems. It’s all very impressive considering it’s running live in a game.
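That ‘one-stop solution’ behaves much like a lookup of per-bus gain offsets keyed by game state. A hypothetical sketch – state names and offsets are invented, and real projects author this as states or snapshots inside the middleware:

```cpp
// A minimal mix-state sketch. State and bus names are invented; real projects
// author these as states/snapshots inside the middleware.
#include <cstdio>
#include <map>
#include <string>

using BusOffsets = std::map<std::string, float>;  // bus name -> gain offset (dB)

int main() {
    std::map<std::string, BusOffsets> mixStates = {
        {"explore", {{"Ambience",  0.0f}, {"Music", -2.0f}}},
        // One-stop solution: entering combat automatically pulls ambience down.
        {"combat",  {{"Ambience", -9.0f}, {"Music",  0.0f}}},
    };

    std::map<std::string, float> busGainDb = {{"Ambience", 0.0f}, {"Music", 0.0f}};

    auto setState = [&](const std::string& state) {
        for (const auto& [bus, offset] : mixStates.at(state))
            busGainDb[bus] = offset;
        std::printf("state '%s': Ambience %.0f dB, Music %.0f dB\n",
                    state.c_str(), busGainDb["Ambience"], busGainDb["Music"]);
    };

    setState("explore");
    setState("combat");  // combat starts: ambience ducks without manual mixing
}
```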
OPEN DIALOGUE
Jodie: “Alex and I collaborated for a good year prior to mixing, making sure that whole foundation was set up to keep things efficient. With our speech bus structure, we organise around types of dialogue. We’ll split out our ‘efforts’ (grunts, oofs etc) from our spoken words to facilitate different systems we might be setting up in different projects – we’ll do sidechain ducking potentially; we’ll set up breathing systems if the game calls for it – having those separated out helps with mixing later on.
“If a project calls for something like ambient voiceover (eg. background chatter) we’ll also treat that differently bus-wise so we can duck those conversations and have the player focus on the high priority dialogue at any given moment. In a lot of games, we’re setting up an LCR bus – this helps maintain spatialisation of the dialogue but keeps it forward-facing in the mix helping with clarity while maintaining immersion and adhering to our overall mix principles. We’ll often lock player voice to the centre channel experience – again to assist clarity by getting it out of the way of the rest of the content happening in the surround environment.”
Alex: “And we don’t want dialogue ducking just stomping on SFX all the time. There are tons of quiet moments and if we just hear a quiet stream and some wind, ducking everything will be very audible – especially when there’s a lot of broadband content – so these systems enable us to see where core speech frequencies are sitting for body and intelligibility and air – and hit those in a much more surgical fashion in the sound effects or music bus. We’ll have distinct frequency bands directly corresponding to and driving gain negatively on whatever target bus we want. This ensures dialogue is cutting through nicely and we’re maintaining clarity without sacrificing immersion.”
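Conceptually, that surgical ducking amounts to per-band gain reduction on the target bus, driven by dialogue energy in the matching bands. The sketch below uses invented band edges, thresholds and ratios just to show the shape of it:

```cpp
// Sketch of band-limited sidechain ducking: per-band dialogue energy drives
// gain reduction only in matching bands of the target bus. Band edges,
// thresholds and ratios are invented for illustration.
#include <algorithm>
#include <cstdio>

struct Band {
    const char* label;
    float loHz, hiHz;
    float thresholdDb;  // dialogue level where ducking starts
    float maxCutDb;     // deepest allowed cut in the target bus
};

// Gain reduction (dB of cut) for one band, scaled by how far the measured
// dialogue level sits over that band's threshold.
float duckDb(const Band& b, float dialogueDb) {
    float over = std::max(0.0f, dialogueDb - b.thresholdDb);
    return std::min(b.maxCutDb, over * 0.5f);  // roughly 2:1, clamped
}

int main() {
    Band bands[] = {
        {"body",             150.0f,   500.0f, -30.0f, 4.0f},
        {"intelligibility", 1000.0f,  4000.0f, -35.0f, 6.0f},
        {"air",             6000.0f, 12000.0f, -40.0f, 3.0f},
    };
    float dialogueDb[] = {-24.0f, -26.0f, -38.0f};  // measured per band

    for (int i = 0; i < 3; ++i)
        std::printf("%-16s %5.0f-%5.0f Hz: cut %.1f dB on SFX bus\n",
                    bands[i].label, bands[i].loHz, bands[i].hiHz,
                    duckDb(bands[i], dialogueDb[i]));
}
```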
DYNAMIC MIXING & METERING STREAMS
The team typically likes to have a lot of dynamic mixing systems running which leverage the relationships between different streams in the bus structure – different sound categories effectively talking to each other, able to affect one another in a cohesive way that yields an exciting, ever-changing, fluid soundscape. To keep track of what’s going on, auxiliary buses send signal to real-time metering, meaning the sound code can continually measure how loud, say, combat or music is at any moment – useful data which in turn can be leveraged for further control and dynamic mix changes.
Alex: “We also want to create spectacle – things that are a little bit more subjectively fun, for example, on God of War Ragnarök, you start off in huge battles and the music begins at lower intensity. Through the fight it ramps up to be grand and very loud and we want it to feel a bit more oppressive. So, we measure the RMS of the music with a really slow attack and release just to get a sense of ‘how big is the music right now’. And we turn down the diegetic sounds’ volume so there’s a crossfade – when you get to the end of battle and it starts to feel very emotional, we’re actually giving you a very music-forward mix, but it happens so slowly you don’t really notice – nevertheless it has a subconscious effect on the player. Since we already have the streams and metering to tap into, it’s super-easy to set that up in 10 minutes rather than do it all from scratch at the end – and that’s welcome when you’re in the thick of it.”
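The ‘really slow attack and release’ Alex describes is essentially a sluggish envelope follower on the music’s measured level, eased into a trim on the diegetic buses. An illustrative sketch, with assumed time constants and depths:

```cpp
// Sketch of the slow music-energy follower idea: a one-pole envelope with
// long attack/release tracks 'how big is the music right now' and eases the
// diegetic bus down as it grows. Time constants and depths are assumptions.
#include <cmath>
#include <cstdio>

struct SlowFollower {
    float attackCoef, releaseCoef, env = 0.0f;
    SlowFollower(float attackSec, float releaseSec, float updateHz)
        : attackCoef(std::exp(-1.0f / (attackSec * updateHz))),
          releaseCoef(std::exp(-1.0f / (releaseSec * updateHz))) {}
    float process(float rms) {
        float coef = (rms > env) ? attackCoef : releaseCoef;
        env = coef * env + (1.0f - coef) * rms;
        return env;
    }
};

int main() {
    const float updateHz = 10.0f;                  // mix update rate, not audio rate
    SlowFollower follower(8.0f, 15.0f, updateHz);  // deliberately sluggish

    // Music RMS ramps up through the battle (0..1 scale, fed from metering).
    for (int tick = 0; tick <= 600; ++tick) {
        float musicRms = std::min(1.0f, tick / 400.0f);
        float env = follower.process(musicRms);
        float diegeticTrimDb = -9.0f * env;        // crossfade: music up, world down
        if (tick % 100 == 0)
            std::printf("t=%3ds music env %.2f -> diegetic trim %4.1f dB\n",
                        tick / 10, env, diegeticTrimDb);
    }
}
```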
PARITY & PLAYER STYLE
Naturally, the team must be ever vigilant to maintain parity between various listening setups players may use. It might all sound wonderful in your 7.1 sound-treated reference listening studio but if you pop on headphones and everything sounds different in terms of frequency, volume and how things sit in the spatial field, you’ve failed your mission. And you need to experience the game played in multiple ways – different difficulty settings and play styles can lead to quite differing mix experiences.
‘PRINT IT’?
The fact that you can’t at some point say ‘print it’ about a AAA videogame’s music, sound and dialogue mix is precisely what makes it so deliciously challenging – and, when you get it right, so rewarding.
Jodie: “Our mix process is ever evolving – we’re constantly revising and adjusting. Over the last couple of years with the three of us collaborating closely, we’ve learned a lot to help standardise our processes and get those core foundations set early on, so we can have good mixes throughout production not just at the end. We’ve learned to continually work on and discuss our approach together as multi-disciplinary groups to keep driving our mixing forward.”
It’s exceptionally clever stuff, and what this crack team of audio professionals have to say is ample testament to just how far the world of game sound has come – in the right hands, with technology serving creativity, easily rivalling film’s potential for sophistication, artistry and story-telling.