Shooting for a Third Hit
DICE studio’s audio team managed to navigate the minefield of Battlefield 3’s virgin Frostbite 2 game engine, completing the mission to deliver more critically acclaimed game audio that sounds like true-to-life, unedited war.
Story: John Broomhall
Battlefield 3’s predecessor, Battlefield: Bad Company 2, deservedly scooped a BAFTA for ‘Use of Audio’. For Audio Director Stefan Strandberg and the DICE audio team, the Frostbite 1.5 game engine Bad Company 2 was built on had proved a solid, stable technology platform. But that level technological ground gave way to shifting tectonic plates when Strandberg and DICE were faced with the challenge of creating another hit game whilst simultaneously building a brand new game engine, Frostbite 2. Many basic audio features were missing or late, and nascent graphics-streaming bandwidth issues demanded a serious data management re-think.
All credit then to the intrepid audio team for not getting trapped in a bunker of coding and memory hassles. The DICE team kept their heads down and focused on sound design, not the daily bombardment of issues, while collaborating closely with other disciplines to create a conceptually consistent world. Oddly, this organised, organisation-wide approach was key to effectively simulating the disorganised clamour of war.
WAR, UNEDITED
Strandberg: “Our fundamental vision was to create the impression of unedited war. One thing that makes our game perceived as sounding ‘right’ is that we continue to hunt down all those other disciplines that affect us. This time we worked even closer with the animators and effects people when it came to camera shake and recoil. It’s super important that we collaborate and help drive analysis, so we captured a lot of explosion footage and examined it closely. Sound designers are the disciples of timing issues – they have a rhythmic sense, which other disciplines don’t necessarily have. We noticed that in order to fit together, animations had to be as snappy and short as our sounds in the way they expand over time. It took several weeks to get this timing right. We noticed that the effects artists might have 20 frames of fire, whereas in many cases, the fire in real-world explosions is consumed within approximately three frames. Even in massive explosions that propagate slowly at the outer rims, the actual core of the explosion happens so fast and expands so quickly that Hollywood slows them down. But we want our game to be real so we needed that snappiness and punch in the visuals. Four things work together – sound, camera shake, effects and rumble – it’s really powerful and a key factor in the perception of our sound being right and having impact.”
In an unusual development, the audio team ended up controlling the camera shake so they could ensure perfect synchronisation with explosions. In fact, not only are the shaking x, y and z parameters triggered and controlled from the audio engine, it also triggers rumbling of the joypad! It’s all done subtly but it makes the explosions seem louder. “It’s like we’ve got more dimensions to work in,” said Strandberg.
“We are on a quest for consistency of approach, where every sound fits in a balanced way in the universe. So there are no super-designed weapon sounds. Individually, our sounds are not that impressive, but it’s our consistency in how we treat them that counts. It’s the sum of the parts that’s impressive, not each individual component. We build the blueprint on how they’re supposed to be used rather than focusing on making the best individual effects. Sure, you can always create a better sound, but if you go down that path, you produce an ‘inflation’ – we’re more focused on asking ‘does it fit in context, is it part of the world?’ The combination of sounds works – similar to record production where individual elements combine to sound good. You carve out a space for each component – even though the drummer wants the drums loud and the bassist wants bass in your face. Sometimes it’s so tempting to make that individual big-super-mega sound, but it will actually push everything else over the edge.
“It’s the rule-set and the thought process behind it – and really staying true to those concepts – that makes people think the game sounds good. It doesn’t have to be de-codable for the average user. There’s a depth there that you could argue no one would notice, but the fact that it’s there, and that it’s followed through, proves the opposite.”
THE BUILDING BLOCKS OF WAR
With a background in computational fluid dynamics, Ben Minto should, by rights, be working for British Aerospace. Instead he’s part of the core of DICE’s crack audio team, and more than qualified to walk AudioTechnology through the team’s approach, processes and techniques for creating and delivering game audio.
Before any guns are fired, Minto and the team decide on the type of reality they’re trying to portray – for instance, for Black, Rambo films were in mind and they went a bit OTT with ’80s tape saturation. “With Bad Company 1 and to some extent Bad Company 2,” said Minto, “the idea was to emulate YouTube footage or Handycam sound – quite harsh. Whereas Battlefield 3 went more towards a documentary feel – professional microphones actually deployed in the theatre of war. From that we look at a benchmark – a key source of reference with inspirational value. We micro-analyse that clip. And as we break it down to constituents, we realise that a bullet crack on its own sounds really terrible. But you need to understand that some parts of the sound image are just horrible viewed up close. It’s the overall picture that matters.”
Strandberg places high importance on starting with the right source sound. 10 years ago he was working at a local studio, using his spare time to customise the audio content of his favourite computer games. “Even in those days the thing I cherished more than anything else was the right source sound,” said Strandberg. “I’m not a sound designer in the sense of sound manipulation. This ‘right source’ principle carries right through to what we do in DICE nowadays. There are two sides to our production: first, we are fortunate to work with fantastic sound engine technology and accompanying technical sound design talent that totally supports the ethos. Second, we are provided with great source audio and have a finely tuned selective ear. We’re not manipulating as much as people might think. Perhaps if we were making a role-playing game with spells and magic it might be different, but I’ve always worked with real stuff like cars and guns and soldiers.”
Roughly 25% of the raw audio assets, or ‘sources’, come from commercial sound effects libraries like Sound Ideas and Hollywood Edge. The remainder is an amalgam of shared online libraries created by Electronic Arts’ studios worldwide for previous titles, and DICE’s own field recordings.
Minto: “We go out and record real guns in real situations – as opposed to relying purely on more pristine ‘static’ gunshot recordings – part of the Battlefield sound comes from capturing the dirt and grit with all the mic bumps, saturation and clipping. We also buy small niche libraries from sound recordists working on documentaries about Afghanistan or Iraq.”
GO ON, COMPRESS ME
“With something like an explosion, we break it down into constituent elements. We might tag that we have a large explosion, on gravel, in an urban environment – and at run-time, pull those sound components together, effectively creating a patch, Max-MSP/Reaktor-style. It gives you more variation and makes the sounds fit the environment better. We have a memory (RAM) limit – historically between a tenth and an eighth of the total system memory – say 20–35MB for an average console title. There are two ways to get audio from the optical media. The first is streaming – pretty much the same way you listen to music off a CD – but using DVDs and Blu-rays, we have a much greater bandwidth. You might typically be able to stream four surround files, a couple of stereos and a couple of monos. The other delivery type for the main sound effects is loading them into the physical RAM and then, much like a sampler, they’re played back with variable pitch and volume.
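For the technically curious, a minimal sketch of how this kind of run-time component assembly might look is below. The tags, asset names and the assemblePatch function are illustrative inventions for the purpose of the example, not DICE’s actual Frostbite code:

```cpp
#include <string>
#include <vector>
#include <iostream>

// Hypothetical tags describing an explosion event at run-time.
struct ExplosionEvent {
    std::string size;        // "small", "large"
    std::string surface;     // "gravel", "sand", "water"
    std::string environment; // "urban", "open"
};

// A pre-authored component sample, tagged at import time.
struct Component {
    std::string tag;   // which slot it fills: "body", "surface", "environment"
    std::string match; // tag value it was authored for
    std::string wav;   // asset name (illustrative only)
};

// Assemble a one-shot "patch" by pulling one component per slot,
// in the spirit of the Max-MSP/Reaktor-style assembly Minto describes.
std::vector<std::string> assemblePatch(const ExplosionEvent& e,
                                       const std::vector<Component>& library) {
    std::vector<std::string> voices;
    for (const auto& c : library) {
        if ((c.tag == "body"        && c.match == e.size) ||
            (c.tag == "surface"     && c.match == e.surface) ||
            (c.tag == "environment" && c.match == e.environment)) {
            voices.push_back(c.wav);
        }
    }
    return voices; // each entry would be started on its own sampler voice
}

int main() {
    std::vector<Component> library = {
        {"body", "large", "expl_large_body.wav"},
        {"body", "small", "expl_small_body.wav"},
        {"surface", "gravel", "debris_gravel.wav"},
        {"environment", "urban", "slapback_urban.wav"},
    };
    for (const auto& wav : assemblePatch({"large", "gravel", "urban"}, library))
        std::cout << "start voice: " << wav << '\n';
}
```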
“These days we import WAV files straight into the game and then use various options in our tools to set audio quality, which in turn deploys different forms of data compression transparently in the background. We set those options differently depending on the individual sound and its significance/prominence. The compression can also be adaptive – the tools might do a spectrum analysis of the wave you imported at 48k and calculate that it can be downsampled to 24k with little impact on fidelity.”
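The adaptive decision Minto describes boils down to checking whether an asset has meaningful spectral content above the Nyquist frequency of the lower sample rate. Here is a crude, self-contained sketch of that idea – the probe frequencies, the 5% threshold, and the naive single-bin DFT are all invented for illustration (a real tool would use a proper FFT):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>
#include <iostream>

const double kPi = 3.14159265358979323846;

// Naive single-bin DFT magnitude (Goertzel would be the efficient choice).
double magnitudeAt(const std::vector<double>& x, double freq, double sampleRate) {
    double re = 0.0, im = 0.0;
    for (size_t n = 0; n < x.size(); ++n) {
        double phase = 2.0 * kPi * freq * n / sampleRate;
        re += x[n] * std::cos(phase);
        im -= x[n] * std::sin(phase);
    }
    return std::sqrt(re * re + im * im) / x.size();
}

// Decide whether a 48k asset could survive downsampling to 24k:
// probe a band above the would-be 12kHz Nyquist and compare it to
// a reference level lower in the spectrum.
bool canDownsampleTo24k(const std::vector<double>& x, double sampleRate) {
    double reference = magnitudeAt(x, 1000.0, sampleRate) + 1e-12;
    double high = 0.0;
    for (double f = 12500.0; f < 23000.0; f += 2000.0)
        high = std::max(high, magnitudeAt(x, f, sampleRate));
    return (high / reference) < 0.05; // negligible content above 12kHz
}

int main() {
    // Synthetic test asset: a 500Hz tone has no content above 12kHz.
    std::vector<double> tone(4800);
    for (size_t n = 0; n < tone.size(); ++n)
        tone[n] = std::sin(2.0 * kPi * 500.0 * n / 48000.0);
    std::cout << (canDownsampleTo24k(tone, 48000.0) ? "downsample" : "keep 48k") << '\n';
}
```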
JOYSTICK SYNTHESISER
According to Minto, the entire team at DICE uses Sound Forge to manipulate source sounds because they “haven’t found anything better for the final preparation of assets – making them sample accurate with tiny fades, markers and looping regions.” This could entail redeploying a lion’s roar as part of the multi-layered groan of a tower collapsing, alongside the more obvious splintering and creaking sounds. Once the sounds are ready, the thousands of parts are assembled into a frame-by-frame ‘sound picture’ developed at run-time and completely peculiar to each player – a bespoke audio experience.
“There are some instances where I have to produce a linear track but mostly we don’t work that way,” said Minto. “Imagine the sampling keyboards from the ’80s where you have a different sample under each key. We’re giving that to the player – we build the samples for them and then depending on which keys they press, i.e. how they play the game, those sounds are triggered. Of course, it’s a lot more complicated than that. You have a whole series of logic in there as well – a car is driving at a certain speed, or if somebody’s shooting at you, which gun are they shooting? You’re really creating a set of playback rules, the simplest being ‘start’ and ‘stop’. Take the sound of an explosion: we also like to know how far away it is from the listener – so we can choose a close, mid or distant sound. As well as what material it occurred on – water, gravel, sand, etc. – and whether the explosion happened next to a load of rubbish, or metal tins or a palm tree. All those answers give us flavour to add in and will also determine part of the debris tail audio.
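A playback rule like the distance-and-material selection Minto mentions could look something like this sketch – the distance bands, asset naming convention and function name are all assumptions, not documented DICE behaviour:

```cpp
#include <string>
#include <iostream>

// Pick a variant the way the playback rules describe: distance band
// first, then surface-material flavour. Thresholds are illustrative.
std::string chooseExplosionSample(float distanceMetres, const std::string& material) {
    std::string band;
    if (distanceMetres < 30.0f)       band = "close";
    else if (distanceMetres < 150.0f) band = "mid";
    else                              band = "distant";
    // Asset naming convention is hypothetical.
    return "expl_" + band + "_" + material + ".wav";
}

int main() {
    std::cout << chooseExplosionSample(12.0f, "gravel") << '\n';  // expl_close_gravel.wav
    std::cout << chooseExplosionSample(400.0f, "water") << '\n';  // expl_distant_water.wav
}
```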
“On top of this we also run ‘obstruction’ and ‘occlusion’ effects. This means doing a ‘raycast’ between the listener and the sound source in the 3D game world to see if there’s something like a wall in the way. If so, you would filter the sound on-the-fly. Or did the explosion occur in another room in a different part of the building? In which case you need to apply that room’s reverb to the explosion, transmit that to the listener by bringing it into their environment and then apply the reverb of their space over the top.
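In code terms, the obstruction test reduces to a line-of-sight query followed by a per-voice filter. The sketch below stands in for that logic; the stub raycast, the 800Hz “through a wall” cutoff and the one-pole filter are assumptions made for the example:

```cpp
#include <cmath>
#include <iostream>

struct Vec3 { float x, y, z; };

// Stand-in for the engine's raycast; a real engine would query the
// collision geometry between listener and source.
bool raycastBlocked(const Vec3& listener, const Vec3& source) {
    // Hypothetical: pretend a wall sits at x = 0 between the two sides.
    return (listener.x < 0.0f) != (source.x < 0.0f);
}

// One-pole lowpass coefficient for a given cutoff, applied per-voice
// when the source is obstructed. The cutoff value is invented.
float lowpassCoeff(float cutoffHz, float sampleRate) {
    return 1.0f - std::exp(-2.0f * 3.14159265f * cutoffHz / sampleRate);
}

int main() {
    Vec3 listener{-5, 0, 0}, source{10, 0, 2};
    if (raycastBlocked(listener, source))
        std::cout << "obstructed: filter with a = " << lowpassCoeff(800.0f, 48000.0f) << '\n';
    else
        std::cout << "clear line of sight: play dry\n";
}
```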
SOUND CULLING
“You really don’t want to start any more of those CPU-expensive ‘samplers’ than necessary – perhaps 25–30. There’s the possibility in a multiplayer game we could be receiving 1000 triggers – requests for sound replay based on game events and player actions. You need a culling system. If a rock falls on the ground five kilometres away, you obviously don’t need to hear it, so we make calculations based on drop-off of amplitude with distance.
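That distance cull is simple physics: the inverse-distance law loses roughly 6dB per doubling of distance. A minimal sketch, with an invented audibility floor and reference distance, shows why the five-kilometre rockfall never gets a voice while an explosion still does:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>

// Inverse-distance drop-off: every doubling of distance loses ~6dB.
float attenuatedLevelDb(float sourceDb, float distanceMetres, float refDistance = 1.0f) {
    return sourceDb - 20.0f * std::log10(std::max(distanceMetres, refDistance) / refDistance);
}

// Cull the trigger before ever starting a sampler voice if it cannot
// be audible at the listener position. Floor value is illustrative.
bool shouldCull(float sourceDb, float distanceMetres, float audibilityFloorDb = 20.0f) {
    return attenuatedLevelDb(sourceDb, distanceMetres) < audibilityFloorDb;
}

int main() {
    // A small rockfall (~60dB at 1m) five kilometres away: ~-14dB, culled.
    std::cout << std::boolalpha
              << shouldCull(60.0f, 5000.0f) << '\n'   // true: never starts a voice
              << shouldCull(120.0f, 5000.0f) << '\n'; // false: explosion still audible
}
```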
“We have a scoping system that works like this: say we have 20 explosions within five seconds, we’ll play the first one. If the second one isn’t louder or different enough in character we’ll probably ignore it. If the third one is bigger or it’s on a different material, then we will go ahead and play it. I think the early instances of people saying they were mixing videogames weren’t so much mixing as culling – we found we could play too much and ended up with mush.
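Translated into code, that scoping rule is a short-memory de-duplicator. In this sketch the five-second window follows Minto’s example, while the 3dB “different enough” margin and the class shape are assumptions:

```cpp
#include <string>
#include <vector>
#include <iostream>

// One remembered recent playback, for scoping decisions.
struct RecentSound {
    double time;       // seconds
    float  levelDb;
    std::string material;
};

// Scoping rule sketched from the description: within a time window,
// skip a new trigger unless it is clearly louder or on a different material.
class Scoper {
    std::vector<RecentSound> recent;
public:
    bool shouldPlay(double now, float levelDb, const std::string& material) {
        const double window = 5.0;   // seconds, from Minto's example
        const float  margin = 3.0f;  // dB "different enough" (invented)
        for (const auto& r : recent) {
            if (now - r.time > window) continue;
            if (material == r.material && levelDb < r.levelDb + margin)
                return false; // too similar to something already playing
        }
        recent.push_back({now, levelDb, material});
        return true;
    }
};

int main() {
    Scoper s;
    std::cout << std::boolalpha
              << s.shouldPlay(0.0, 110.0f, "gravel") << '\n'  // true: first one plays
              << s.shouldPlay(1.0, 109.0f, "gravel") << '\n'  // false: too similar
              << s.shouldPlay(2.0, 109.0f, "metal")  << '\n'; // true: different material
}
```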
“Think about how film is mixed: at certain times the ‘red thread’ – the narrative that they want you to experience – is very, very clear. At other times it can be ambiguous, depending on how they want the audience to react. With videogames, you have to make sure the red thread is present all the time but there’s a certain amount of fluff around it because we want the player to engage. As soon as the player is making conscious decisions to filter out fluff, to find the red thread, they are engaged in the activity of listening.
HIGH DYNAMIC RANGE
“The very earliest form of our High Dynamic Range mixing system was aimed at multiplayer games. In a 64-player game there’s a mass of sound happening. We posited a simple idea for a mixing rule – ‘loud is important’. We would define a decibel level for every single sound in the game. The base ambience might be 60dB, the waterfall 70dB, a gun might be 100dB, and an explosion 120dB. One way of culling was to represent only a specified dynamic range (depending on the system you’re listening on – for a TV the dynamic window in which we want to represent the mix might be 40dB, whereas on a home cinema system it might be 70dB).
“So in the case of TV, if we’re starting out with a base ambience at 40dB, that would be the quietest sound and therefore set the lower limit of the dynamic range window, which would not move if there was no sound louder than 80dB. Footsteps would happily live in that range, as would small rockfalls – until the gun fired at 100dB and the dynamic range window jumped up to encompass the gunshots. Because we’re only preserving a 40dB window, the lowest level that we’re now allowing in the window is 60dB, and at that point you wouldn’t hear the 40dB ambience because it would be culled. If an explosion hits 120dB, the lower limit of the window is going to be 80dB and the gun that was at 100 would be reduced by half, because the explosion needs to be perceived as twice as loud. HDR preserves decibel differences between sounds within a fixed dB window.
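The core of the HDR idea fits in a few lines: the loudest active sound pins the top of the window, everything below the bottom is culled, and survivors keep their relative dB spacing. This sketch uses the numbers from Strandberg’s own TV example; the struct and mapping details are an illustration, not Frostbite’s implementation:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// HDR mixing sketch: the loudest active sound sets the top of a fixed
// window; anything below the bottom is culled; survivors keep their
// relative dB differences when mapped to output level.
struct HdrWindow {
    float widthDb; // e.g. 40dB for TV, 70dB for home cinema

    // Returns each sound's dB position above the window bottom, or a
    // negative sentinel if culled. Mapping to actual gain is omitted.
    std::vector<float> mix(const std::vector<float>& levelsDb) const {
        float top = *std::max_element(levelsDb.begin(), levelsDb.end());
        float bottom = top - widthDb;
        std::vector<float> out;
        for (float db : levelsDb)
            out.push_back(db >= bottom ? db - bottom : -1.0f); // -1 = culled
        return out;
    }
};

int main() {
    HdrWindow tv{40.0f};
    // Ambience 40dB, footsteps 55dB, gun 100dB, explosion 120dB.
    for (float v : tv.mix({40.0f, 55.0f, 100.0f, 120.0f}))
        std::cout << v << ' ';
    std::cout << '\n';
    // Prints: -1 -1 20 40 -> ambience and footsteps culled; the gun sits
    // 20dB below the explosion, preserving their real-world difference.
}
```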
“But you find the rule ‘loud is important’ isn’t always true. For example, a tank driving past you as your friend’s dying in your arms. His quiet sobs are clearly more emotionally important than the tank. A film mixer would probably turn down the tank volume to draw focus towards this dying person. So on top of HDR we have a full snapshot mixing system. All game sounds are group-bused – e.g. enemy weapons, player weapons, footsteps, impacts, explosions – and we set default values for all of those. If we expressly want to draw the player’s attention to, say, a very loud explosion in the distance where the rule ‘loud is important’ doesn’t hold true, we will create a new snapshot that ducks everything down maybe 6dB and turns up this special event. We then pass that to the HDR system to work out.”
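A snapshot layer of that kind is essentially a per-bus trim table consulted before the HDR stage. In this sketch the 6dB duck comes from Strandberg’s description, while the bus names, the +4dB boost and the function shape are invented for illustration:

```cpp
#include <map>
#include <string>
#include <iostream>

// Snapshot mixing sketch: every sound routes to a group bus with a
// default trim; an active snapshot overrides those trims (e.g. duck
// everything 6dB, lift one special event) before HDR sees the levels.
using BusGains = std::map<std::string, float>; // bus name -> trim in dB

float finalLevelDb(float sourceDb, const std::string& bus,
                   const BusGains& defaults, const BusGains& snapshot) {
    auto it = snapshot.find(bus);
    float trim = (it != snapshot.end()) ? it->second : defaults.at(bus);
    return sourceDb + trim; // this value would then feed the HDR window
}

int main() {
    BusGains defaults = {{"enemyWeapons", 0.0f}, {"explosions", 0.0f},
                         {"footsteps", 0.0f}};
    // "Distant explosion matters" snapshot: duck the rest, lift explosions.
    BusGains snapshot = {{"enemyWeapons", -6.0f}, {"footsteps", -6.0f},
                         {"explosions", +4.0f}};
    std::cout << finalLevelDb(100.0f, "enemyWeapons", defaults, snapshot) << '\n'; // 94
    std::cout << finalLevelDb(90.0f, "explosions", defaults, snapshot) << '\n';    // 94
}
```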
Battlefield 3’s realistic audio comes out much less boom-heavy; even the machine guns have more of a raspy rat-a-tat-a-tat than layers of throaty roar. While competing first-person military shooters have been guilty of cacophony, Battlefield 3 is elevated from the noise floor by great sounds from real sources, delivered with great mix clarity by the intelligent HDR system. The result is a game that effectively plonks players right in the middle of an accurate-sounding warzone. But it’s this last consideration of a dying friend that really lends Battlefield 3 its gravitas. It’s no cleaner than any other war – there’s still blood, gore, violence, and friendly fire – but at least an intelligent approach to real-time mixing gives players the dynamic range to experience the emotion of it.
John Broomhall is a composer and independent audio director, consultant & content provider. www.johnbroomhall.co.uk