Mixing With Headphones 5
In the fifth instalment of this six-part series Greg Simmons continues distilling seven years of mixing without big monitors into useful tips for mixing on headphones, this time focusing on compression, limiting, dynamic perspective and expression inversion.
The Mixing With Headphones series aims to emphasise the important differences between mixing with speakers and mixing with headphones, while also providing beneficial tips, techniques and ‘thinking’ aimed at helping our headphone mixes serve the music.
As stated in the first instalment of this series, “headphones have supplanted speakers for the vast majority of music purchasing decisions and active music consumption… we would be foolish to underrate the importance of headphones in our mixing and monitoring decisions”. The second instalment examines the audible differences between mixing with speakers and mixing with headphones, and explains why speaker mixes usually translate well to headphone playback but headphone mixes often don’t translate well to speaker playback. In the third instalment we set up a Mixing With Headphones session template and loaded it with tools to help us work around those audible differences and make headphone mixes that translate well to both types of playback.
In the previous instalment we started putting the ‘Mixing With Headphones’ session template into use. We installed some reference tracks, established the foundation of the mix, and applied corrective, enhancing and integrating EQ to the individual sounds. Most importantly, we explored the concept of tonal perspective to ensure that all of our sounds worked together tonally in the mix.
In this instalment we’ll continue working with the ‘Mixing With Headphones’ template by discussing the strategic use of dynamics processing (e.g. compression and limiting), exploring the concepts of musical dynamic range and dynamic perspective, and understanding the problem of expression inversion. One of the biggest contributions we can make towards serving the music with our mixes is getting all of the sounds into an appropriate dynamic perspective while avoiding expression inversion, and that means understanding the use of dynamics processing to control each sound’s musical dynamic range. [For a deep dive into how dynamics processors work, see the accompanying Understanding Compression series.] Building on the previous instalment, our goal here is to get our headphone mixes sounding good enough that they should only require five minutes of corrective work in mastering to sound tonally and dynamically acceptable.
DYNAMICS & HEADPHONES
In the third instalment of this series we saw how the oft-repeated ‘just trust your ears’ advice was invalid when mixing with headphones because headphones do not give us the tonal, spatial or visceral information needed to make resilient mixes that translate well when played back through speakers.
The same ‘just trust your ears’ advice can be equally invalid for dynamics processing when mixing with headphones. One of our uses of dynamics processing is to give sounds more punch or impact, which means we must be constantly aware of our headphones’ inability to create visceral impact (i.e. the physical feeling of impact on our bodies, as explained in the second instalment of this series), and therefore we should regularly check against a known reference to make sure we are not overdoing it. If our super-compressed kick sound is 10x punchier than the punchiest kick sound in our reference tracks, it is almost certainly wrong.
DYNAMIC RANGE
Whenever we use dynamics processing we are altering the signal’s dynamic range. Expanders and noise gates increase the dynamic range, while compressors and limiters decrease the dynamic range.
What is dynamic range and how does it apply to our mixes?
The term ‘dynamic range’ generally refers to the difference between the largest and smallest values in a set of values. When used as a specification for audio equipment, ‘dynamic range’ refers to the difference between the noise floor (the lowest signal level, below which the signal is dominated by noise) and the clipping point (the highest signal level, above which clipping distortion occurs).
In a fully analogue recording/mixing system, the noise floor is the combination of the tape hiss, the self-noise of the microphones, the EIN of the mic preamplifiers used during recording, the noise of any external processing, and the noise of the mixing console’s channel paths and summing circuits. The analogue system’s clipping points will be determined by the supply rails (i.e. the voltages provided by the power supplies of the individual devices in the signal path) and the saturation point of the analogue tape.
In a fully digital recording/mixing system, the noise floor is the combination of the self-noise of the microphones, the EIN of the mic preamplifiers used during recording, the quantization error noise of the word sizes (aka bit depths) used throughout the signal path, and the accompanying dither of the digital processing. The digital system’s clipping point is the maximum signal level the system is capable of processing, which is 0dBFS.
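To put a number on that relationship between noise floor and clipping point: the theoretical dynamic range of an ideal N-bit quantizer is commonly approximated by the textbook formula 6.02 × N + 1.76 dB, measured with a full-scale sine wave and ignoring converter noise and dither. This figure is general audio theory rather than anything specific to this series, so treat the sketch below as illustrative only:

```python
def ideal_digital_dynamic_range(bits: int) -> float:
    """Textbook dynamic range (dB) of an ideal N-bit quantizer,
    measured with a full-scale sine wave: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

print(round(ideal_digital_dynamic_range(16), 1))  # 98.1 dB for CD-quality 16-bit
print(round(ideal_digital_dynamic_range(24), 1))  # 146.2 dB for a 24-bit word size
```

In practice the converters, preamplifiers and microphones in the chain all have higher noise floors than the quantizer itself, so real-world figures are always lower.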
MUSICAL DYNAMIC RANGE
In the analogue and digital contexts given above, dynamic range is a clearly defined specification – if we know the noise level and the clipping level we can determine an objective dynamic range specification. However, for our mixing purposes we need a more subjective specification because we’re not processing the dynamic range of our audio equipment. Our focus here is on altering the dynamic range of a musical performance, so let’s call it musical dynamic range and define it as the difference between the softest and loudest parts of interest within a musical performance. In other words, the difference between the level of the softest note or chord in the performance and the level of the loudest note or chord in the performance. Every individual sound within a mix will have its own musical dynamic range, and every mix will have an overall musical dynamic range.
[In situations where we are not working with music – such as dialogue, sound effects or field recordings – we can replace the term musical dynamic range with content dynamic range, and define it as the difference between the softest and loudest parts of interest in the content. For example, if the content was a recording of thunder with crickets in the background, we would consider the content dynamic range as being the difference between the level of the crickets and the level of the thunder. For the purposes of this discussion we’ll stick to the term musical dynamic range, bearing in mind that it simply refers to the difference between the level of the softest sound of interest and the loudest sound of interest, and so everything discussed below still applies.]
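Expressed as arithmetic, musical dynamic range is simply the difference between the loudest and softest levels of interest. A trivial sketch, with hypothetical note levels:

```python
def musical_dynamic_range_db(note_levels_db):
    """Difference (dB) between the loudest and softest notes of interest."""
    return max(note_levels_db) - min(note_levels_db)

# A hypothetical vocal phrase: softest note at -38dB, loudest at -12dB
print(musical_dynamic_range_db([-38, -20, -25, -12, -30]))  # 26 (dB)
```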
Sometimes we alter a signal’s musical dynamic range in an objective way to reduce peak levels and thereby allow the signal to attain a higher overall level without clipping – for example, making our mixes conform to the level requirements of a streaming platform. Other times we do it in a subjective way to reduce the perceived level differences between the softest and loudest parts of a performance – usually to help the performance maintain audibility within the mix, and often to enhance its impact.
The use of dynamics processing is very common in popular music, where a tightly controlled dynamic range is often an important part of a genre and where individual instruments are recorded at separate times. It is rarely used in acoustic jazz or similar acoustic music where all of the musicians are playing together in real time, can hear each other properly, and are responding to each other’s dynamics. In that situation, the musicians don’t want anyone messing with their performance dynamics. Similarly, dynamics processing is rarely used with orchestral music when there is a conductor controlling the overall dynamics of the performance. With these forms of acoustic music we will primarily use dynamics processing if the overall performance’s musical dynamic range exceeds the limitations of the delivery medium – such as preparing something for uploading to YouTube.
Streaming services provide very specific level recommendations that ultimately affect the dynamic range of the uploaded audio. If we want our uploaded audio to sit comfortably alongside other audio on the streaming platform (i.e. not sound suddenly too soft when swiping from one piece of music to another), we will probably have to use dynamics processing in mixing and/or mastering to keep our mix within the streaming service’s recommendations. This use of dynamics processing is explored further in ‘Compressing For The Format’ (scroll down).
DYNAMIC PERSPECTIVE
In the previous instalment we discussed the use of EQ when mixing with headphones, and introduced the concept of tonal perspective – where everything in the mix sounds as if it belongs in the mix, and no sound is too dull or too bright relative to the other sounds. When starting a mix it is likely that some of the raw sounds are going to be too bright or too dull compared to other sounds in the mix. These sounds are tonally out of perspective with the mix; at the very least they will benefit from some integrating equalisation to fit them into the mix and ensure all sounds are working together as a cohesive whole while retaining the tonal aspects of the performer’s expression.
Dynamic perspective is the same concept, except it is applied to the dynamics of the individual sounds in a mix rather than their tonalities. When starting a mix it is likely that some of the raw sounds are going to have moments when they are too loud and moments when they are too soft when heard alongside other sounds. Small and momentary level problems are easily solved with fader automation or clip gain, but this is done at the risk of inverting the performers’ expression (see ‘Expression Inversion’ below).
These problematic sounds are dynamically out of perspective with other sounds in the mix. At the very least they will benefit from some compression to fit them into the mix and ensure all sounds are working together dynamically as a cohesive whole while retaining the dynamic aspects of the performer’s expression. Putting the individual sounds into the correct dynamic perspective allows each sound to be soft in the mix without being too soft for the mix, and to be loud in the mix without being too loud for the mix. When applied correctly, the compression will convert the performer’s expression from changes of level into changes of impact, thereby maintaining the expression in a way that works alongside the other sounds in the mix.
When applying dynamic processing to a sound it is always wise to consider how that sound’s musical dynamic range sits against the other sounds in the mix. It is relatively easy to use compression creatively to maintain a consistent level for a vocal, to make an electric guitar thicker, or to push the music along by adding propulsion to a rhythm section. One of the challenges when using compression creatively is keeping all of the individual sounds in a similar dynamic perspective so that they are working together as a unified whole without losing the dynamic aspects of each performer’s expression.
Authority & Impact
The concept of authority was discussed in the previous instalment of this series (see ‘Establishing The Foundation’). It essentially refers to how much importance we give each sound in the mix – not simply how loud it is, but how impactful it is or how much weight it carries in the mix. Lightweight sounds can be very loud in the mix but have little impact or authority, while heavyweight sounds can be very soft in the mix while still having significant impact and authority. The concepts of authority and impact are important in this instalment because compression is what we reach for when we want to increase a sound’s impact and thereby increase its authority within the mix. With the right combination of compression and EQ we can give a sound the appropriate authority, impact and weight within the mix while retaining the mix’s tonal and dynamic perspectives.
Authority and impact are both difficult to judge when mixing with headphones because headphones don’t provide the visceral impact provided by speakers that allows us to feel the impact of the sounds or to appreciate their weight. However, with practice and the strategic use of reference tracks (as we’ve installed in the Mixing With Headphones template) we can get these important aspects right when mixing with headphones.
Dynamically Out Of Perspective
One common dynamic perspective problem is when a kick drum (or similar) is so heavily compressed and EQ’d that nothing else in the mix can match its authority or weight. Turning the fader down reduces the kick drum’s level but makes no appreciable difference to its authority in the mix until it suddenly becomes too soft. The kick drum is either too loud or too soft, with nothing in between. This typically occurs with mixes that begin by making a killer kick drum sound, instead of starting with the most important sound in the mix (as discussed in the previous instalment of this series) and putting all of the foundational elements of the mix into an appropriate dynamic perspective with it. The kick drum is dynamically out of perspective with the mix, and the solutions are a) reduce the amount of compression used on the kick drum, or b) apply similar amounts of compression to everything else in the mix to match the kick drum.
The ‘Karaoke Mix’ Phenomenon
Another common dynamic perspective problem is when a solo performer creates a piece of music featuring their chosen acoustic instrument (guitar, violin, bamboo flute, whatever) playing to an accompaniment made of sounds from sample libraries. Most of the sampled musical instruments have already been tonally and dynamically produced by the sample library’s audio engineers to ensure they are ‘ready to go’ for users with no audio engineering experience. Hence they are all ultimately in perspective with the sample library’s tonal and dynamic aesthetic – which is what gives each sample library its consistent ‘sound’ and creates the perception of quality. There’s no guarantee that a ‘purist’ recording highlighting the broad natural dynamic qualities of the featured acoustic instrument is going to sit comfortably within the narrower dynamic perspective of an accompaniment mix made from ‘ready to go’ sounds from a sample library. When this type of musical dynamic range mismatch occurs the result is invariably a ‘karaoke mix’, where the featured instrument is wallpapered over the accompaniment rather than being dynamically integrated into it.
This is one of very few mixing situations where the ‘start with the most important sound in the mix’ advice given in the previous instalment is not helpful because we cannot sufficiently ‘unproduce’ the sampled sounds to remove the sample library’s dynamic aesthetic and then ‘re-produce’ those sounds to put them into the same dynamic perspective of the featured instrument. Instead, we have to process the dynamics of the featured instrument to fit in with the dynamics of the accompaniment. This requires the type of processing that will upset many purists (“Compress the violin? Blasphemy!”), but if done right can be very effective.
One of the best tricks for processing the dynamics of acoustic instruments while maintaining a natural feeling is by using the author’s ‘Wrap-Around EQ’ technique, as detailed at the end of this instalment – but in order to understand how and why it works we first need to understand what we’re doing with compression. Read on…
COMPRESSION & LIMITING
Dynamics processing is primarily used to reduce a sound’s musical dynamic range, and this is achieved by altering the level of the output signal depending on whether the input signal is above or below a threshold level.
Upwards vs Downwards
There are two forms of compression: upwards and downwards. Either or both can be used to reduce the overall musical dynamic range of a signal and put it into the desired dynamic perspective in the mix (i.e. matching dynamics), or to help a mix conform to the level requirements of a streaming service or other delivery medium. The choice of upwards compression or downwards compression depends on what we want to achieve.
Upwards compression applies gain addition (i.e. increasing the gain) when the input signal’s level falls below the threshold level. The gain addition raises the lowest levels of the performance upwards towards the threshold level, thereby reducing the signal’s musical dynamic range. Upwards compression is primarily used when we want to maintain the audibility of the softest parts of a performance without affecting the louder parts, and is similar to using fader automation to keep the softest parts of a musical performance audible in the mix. Unlike downwards compression, upwards compression does not give a sound more impact or punch; rather, it allows the smaller details of a performance to be heard while retaining their subtlety.
Downwards compression applies gain reduction (i.e. reducing the gain) when the input signal’s level rises above the threshold level. The gain reduction decreases the highest levels of the performance downwards towards the threshold level, thereby reducing the signal’s musical dynamic range. Downwards compression has many uses in mixing, including altering a sound’s envelope to give it more impact and authority in the mix, and reducing peak levels to allow a signal’s overall level to be increased in the mix without clipping – in this application, if we take the downwards compressor’s settings to extremes it becomes a peak limiter.
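The two directions can be sketched as static gain computers, ignoring attack and release behaviour for now. The function names and values below are our own illustration, not taken from any particular plug-in:

```python
def downwards_gain_db(level_db, threshold_db, ratio):
    """Gain change (dB) from downwards compression: negative values
    are gain reduction, applied only above the threshold."""
    if level_db <= threshold_db:
        return 0.0
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db

def upwards_gain_db(level_db, threshold_db, ratio):
    """Gain change (dB) from upwards compression: positive values
    are gain addition, applied only below the threshold."""
    if level_db >= threshold_db:
        return 0.0
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db

print(downwards_gain_db(-14, -20, 3))  # -4.0: a loud note is pulled down
print(upwards_gain_db(-40, -20, 2))    # 10.0: a quiet detail is raised up
```

Note the symmetry: both designs nudge the signal towards the threshold, but from opposite sides, which is why both reduce the musical dynamic range.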
The vast majority of compressors on the market use downwards compression, and that is the type of compression we will focus on throughout this instalment. Understanding how downwards compression works makes it easier to understand how upwards compression works.
Gain Reduction (GR)
The concept of gain reduction is fundamental to dynamics processing. As the term implies, gain reduction is the process of reducing the gain that is applied to a signal. As shown in the Understanding Compression series, a circuit inside the dynamics processor follows the incoming signal’s envelope and uses it to control the gain of an amplifier that the incoming signal passes through. The amplifier typically has a maximum gain of 0dB (i.e. x1), so it is only capable of reducing the signal level, not amplifying it – hence ‘gain reduction’ or simply ‘GR’.
Most dynamics processors have a gain reduction meter that shows us when gain reduction is being applied or removed, how much gain reduction is being applied or removed, and how fast it is being applied or removed. Using dynamics processors becomes much easier when we understand how to interpret the GR meter. The fourth instalment of the Understanding Compression series goes into considerable detail about using the GR meter to quickly achieve the desired effect. It states, “Everything the compressor is doing to our signal is reflected in the GR meter… We correlate what we’re seeing on the GR meter with the changes we’re hearing in the signal, and make strategic adjustments based on what we’re trying to achieve.”
Feedback vs Feedforward
There are numerous ways that compressors can determine how much gain reduction to apply, and therefore there are numerous ways to categorise compressors. One of those categories is based on which signal the compressor is responding to – the input signal or the output signal. This gives us feedforward compressors and feedback compressors. What’s the difference?
Feedforward compressors respond to the signal at the input of the compressor, before the compressor has applied gain reduction. This makes them a good choice for all of the corrective, enhancing and integrating compression applications described later in this instalment – especially those that are applied to individual sound sources within a mix.
Feedback compressors respond to the signal at the output of the compressor, after the compressor has applied gain reduction. Although they can be used for the corrective, enhancing and integrating compression applications described later in this instalment, they excel at working with more complex sounds (i.e. stems, submixes, mixes, direct-to-stereo recordings, etc.) where they are often used as the ‘sonic glue’ that binds the individual sounds together.
The operational differences between these two compressor designs are discussed in more detail in the first instalment of the Understanding Compression series. Unless otherwise mentioned, for the rest of this instalment we will focus on feedforward compression because it is more appropriate for tailoring individual sounds to fit into the dynamics of the mix. When we understand how feedforward compression works, we will be able to transfer that knowledge to feedback compressors and determine which one to use for any given compression application.
THE FEEDFORWARD DOWNWARDS COMPRESSOR
The processes of downwards compression and feedforward compression are explained in the Understanding Compression series. The following provides a basic refresher for those who are familiar with the features of a typical feedforward downwards compressor as shown below.
Sensing
On the far left of the illustration above is the sensing switch. This tells the compressor which aspect of the input signal’s level to respond to: its peak level or its RMS level. We use peak sensing when we want the compressor to react quickly to changes in the sound’s envelope – such as limiting to rein in transient peaks and prevent clipping, and envelope shaping to add impact or punch to a signal. We use RMS sensing when we want the compressor to respond to changes in the sound’s level in a similar way to how our hearing perceives those changes – such as matching dynamics and leveling.
The illustration above shows the peak and RMS levels of the same signal at the same gain. The compressor will obviously respond differently depending on which of these two levels it is sensing.
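We can see why the two sensing modes disagree by computing both levels for the same signal. A minimal sketch, where the 64-sample sine and the dBFS convention are our own illustration choices:

```python
import math

def peak_db(samples):
    """Peak level in dBFS: the largest instantaneous sample magnitude."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """RMS level in dBFS: closer to how our hearing perceives level."""
    return 20 * math.log10(math.sqrt(sum(s * s for s in samples) / len(samples)))

# One cycle of a full-scale sine wave, 64 samples
sine = [math.sin(2 * math.pi * n / 64) for n in range(64)]
print(round(peak_db(sine), 2))  # 0.0 dBFS
print(round(rms_db(sine), 2))   # -3.01 dBFS
```

The same signal reads 0dBFS to a peak detector but about -3dBFS to an RMS detector; for spikier material such as drums the gap is far wider, which is why the compressor behaves so differently in each mode.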
Threshold
A downwards compressor applies gain reduction to all input signal levels that exceed the threshold level, and that’s what the threshold level control allows us to adjust: the level at which compression (i.e. gain reduction) begins. Assuming we are using a ratio greater than 1:1, all input signal levels that are higher than the threshold level will have gain reduction applied; the amount is determined by how far the signal level exceeds the threshold, along with the settings of the ratio and attack time.
Signals below the threshold level theoretically receive no gain reduction and should therefore pass through the compressor unaffected, but in practice they can be affected by leftover gain reduction if the release time is not fast enough to remove the gain reduction applied to the preceding sound. The effects of leftover gain reduction are demonstrated in the third instalment of the Understanding Compression series.
Ratio
For any given threshold level, the ratio control determines how much gain reduction will be applied based on how far the input signal’s level exceeds the threshold level. A ratio of 2:1 means that if the input signal’s level increases 2dB above the threshold level, the output signal’s level will increase by only 1dB above the threshold level. In other words, the increase of output level above the threshold will always be 1/2 of the increase of input level above the threshold. Similarly, a ratio of 3:1 means the output’s increase above the threshold will always be 1/3 of the input’s increase, a ratio of 4:1 means it will be 1/4, and so on. The difference between the input signal’s level and the output signal’s level is, of course, the gain reduction that was applied; so if the input increased 8dB above the threshold level but the corresponding output only increased 2dB above the threshold, it means that 6dB (i.e. 8dB – 2dB) of gain reduction was applied. Below the threshold level the ratio is always 1:1 and there is no gain reduction – except perhaps for leftover gain reduction due to a slow release time.
The illustration above demonstrates a downwards compressor with a threshold of -20dBFS and a ratio of 3:1. We can see that the input (horizontal axis) has exceeded the threshold level by 6dB (-20dBFS to -14dBFS), but the 3:1 ratio means the output (vertical axis) only exceeds the threshold by 6/3 = 2dB (from -20dBFS to -18dBFS). In this example 4dB of gain reduction was applied, turning an input level increase of 6dB above threshold into an output signal level increase of 2dB above threshold.
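The example above can be verified with a simple static gain computer that implements the threshold and ratio maths. This is a hard-knee sketch of the maths only; a real compressor adds attack and release behaviour on top of this:

```python
def compressor_output_db(in_db, threshold_db, ratio):
    """Static hard-knee downwards compressor: above the threshold the
    output rises at 1/ratio of the input's rise above the threshold."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio

out = compressor_output_db(-14.0, threshold_db=-20.0, ratio=3.0)
print(out)          # -18.0: the output exceeds the -20dBFS threshold by only 2dB
print(-14.0 - out)  # 4.0: the gain reduction that was applied
```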
For an in-depth understanding of the relationships between threshold, ratio and gain reduction see the first instalment of the Understanding Compression series.
Knee
A downwards compressor’s knee control determines how the compressor transitions from a ratio of 1:1 (input signal level below threshold) to the user-selected ratio (i.e. the ratio shown on the ratio control). The knee control usually has two switchable options – hard and soft – although more sophisticated units offer a knee control that is continuously variable from hard to soft. What does it do?
A hard knee means the compressor applies gain reduction at the user-selected ratio as soon as the input signal’s level exceeds the threshold level. A soft knee means the ratio gradually increases from 1:1 to the user-selected ratio as the input signal level approaches and exceeds the threshold.
The illustration above shows a compressor with a threshold level of -16dBFS and a ratio of 8:1. We can see that the hard knee applies the 8:1 ratio as soon as the input signal’s level exceeds the threshold, while the soft knee applies a more gradual transition that begins at 1:1 somewhere below the threshold level (at approx. -22dBFS) and reaches 8:1 somewhere above the threshold level (at approx. -10dBFS).
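A common digital implementation of the soft knee blends the ratio with a quadratic curve centred on the threshold. The sketch below uses the settings described above (threshold -16dBFS, ratio 8:1) and assumes a 12dB knee width to match the approximate -22dBFS to -10dBFS transition region; the knee-width value is our inference, not a stated figure:

```python
def soft_knee_output_db(x, threshold=-16.0, ratio=8.0, knee_width=12.0):
    """Static gain computer with a quadratic soft knee spanning
    knee_width dB, centred on the threshold."""
    if 2 * (x - threshold) < -knee_width:
        return x                                   # well below: ratio is 1:1
    if 2 * abs(x - threshold) <= knee_width:       # inside the knee region
        return x + (1/ratio - 1) * (x - threshold + knee_width/2)**2 / (2*knee_width)
    return threshold + (x - threshold) / ratio     # well above: full 8:1 ratio

print(soft_knee_output_db(-22.0))  # -22.0: the knee hasn't started yet
print(soft_knee_output_db(-16.0))  # -17.3125: gentle compression at the threshold itself
print(soft_knee_output_db(-10.0))  # -15.25: the full 8:1 ratio is reached
```

Notice that with a soft knee there is already a little gain reduction happening at the threshold itself, which is exactly the gradual transition the illustration shows.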
For an in-depth understanding of how the knee control works, see the second instalment of the Understanding Compression series…
Attack & Release
A downwards compressor’s attack control determines how quickly it will apply gain reduction, and its release control determines how quickly it will remove gain reduction. Although usually specified in milliseconds or seconds, the actual times taken to apply and remove gain reduction also depend on the input signal’s level at the moment the attack or release process begins.
For a full explanation of how a downwards compressor’s attack and release times affect the application and removal of gain reduction, see the second and third instalments of the Understanding Compression series.
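One common way digital compressors realise attack and release times is with one-pole smoothing of the detected level: the time constant sets how quickly the detector moves toward a new level, which is why the actual settling time also depends on how far the level has jumped. The following is a generic textbook sketch, not the specific behaviour of any compressor discussed in the Understanding Compression series:

```python
import math

def smoothing_coeff(time_s, sample_rate):
    """One-pole coefficient: the detector covers ~63% of any level
    change within time_s seconds (a common timing convention)."""
    return math.exp(-1.0 / (time_s * sample_rate))

def envelope(levels_db, attack_s, release_s, sample_rate=48000):
    """Smooth a per-sample level (dB) with separate attack/release times."""
    a = smoothing_coeff(attack_s, sample_rate)
    r = smoothing_coeff(release_s, sample_rate)
    env, out = levels_db[0], []
    for x in levels_db:
        c = a if x > env else r    # rising level: attack; falling: release
        env = c * env + (1 - c) * x
        out.append(env)
    return out

# A -40dB signal jumps to 0dB, then drops back: a 1ms attack tracks the
# jump almost instantly, while a 200ms release lets go far more slowly.
step = [-40.0] * 10 + [0.0] * 2000 + [-40.0] * 2000
tracked = envelope(step, attack_s=0.001, release_s=0.2)
```

Plotting tracked against step makes the asymmetry obvious: the detector snaps upwards at the attack rate and drifts back down at the release rate.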
Output
Using downwards compression to reduce a signal’s musical dynamic range invariably reduces the peak level of the signal, and we often do this so that we can raise the overall signal level without clipping. This is what the compressor’s Output level control is for. It increases the overall level of the signal coming out of the compressor to ‘make up’ for the loss in peak level; hence it is often referred to as ‘make-up’ gain.
For a full explanation of how a downwards compressor’s Output control is used to apply make-up gain, see the third instalment of the Understanding Compression series.
Meters
A downwards compressor will typically have metering to show the input signal’s level, the output signal’s level, and the gain reduction. These levels could be shown on three separate meters, as shown below, or could be shown with one or two meters and a switch to determine what is shown.
The following formula shows the relationship between input signal level, output signal level, and gain reduction:
Output Signal Level = Input Signal Level – Gain Reduction
In the illustration below we can see that the input signal’s level is -6dBFS and there is 4dB of gain reduction. Accordingly, the output signal’s level is -6dBFS – 4dB = -10dBFS.
MIXING WITH COMPRESSION
In the previous instalment of this six-part series we discussed the strategic use of equalisation in the mix, and saw that it served three purposes – corrective, enhancing and integrating. We saw how and when to apply each type of EQ to the individual sounds within the mix, and we explored the concept of tonal perspective to ensure that all of the sounds in our mix worked together tonally.
To reinforce the analytical approach required when transitioning from speaker mixing to headphone mixing, we used a separate plug-in for each EQ purpose and applied them in a methodical manner – starting by correcting problems in the sound, then enhancing the corrected sound and turning it into the finished sound we wanted to hear in the mix, and finally making small tonal adjustments to integrate the sound into the mix’s tonal perspective. To do this we set up a Mixing With Headphones template and configured each channel strip as shown below:
The placement of the compressor shown in the channel strip above is a good default placement for most forms of dynamics processing. It is compressing the sound after the corrective EQ and the enhancing EQ, but before the integrating EQ. Among other things, the integrating EQ will allow us to compensate for the subtle tonal changes that often result from the use of compression – as the great Barry Sherman Keene once advised, always follow compression with EQ. However, when mixing it is very important to understand the interaction between EQ and compression.
An EQ placed before a compressor has the potential to influence how the compressor works. Any significant EQ boosts will make the compressor use more gain reduction when or if the boosted frequencies dominate the signal above the threshold level, and any significant EQ cuts will make the compressor use less gain reduction when or if the cut frequencies dominate the signal above the threshold.
An EQ placed after a compressor typically requires smaller boosts and cuts than it would if placed before the compressor to achieve the same result. It can be used to enhance the impact of compressed transients, and to generally restore any aspects of the tonality that were affected by the compression.
When we combine EQ and compression on the same sound and adjust their respective settings to create the desired sound, a small change in the settings of either processor can lead to a corresponding change on the other. It is vitally important to understand this interaction; if we are using compression and EQ on the same sound, changing one will probably require a compensating change in the other.
Because this instalment of the Mixing With Headphones series focuses on dynamics processing we can pay more attention to where we place each particular type of compression in the channel strip, rather than only using the placement shown above. This approach allows us to avoid or exploit the interactions between compression and EQ in the signal path.
As we saw earlier, we use compression in a mix for three purposes. We use corrective compression to fix fundamental problems with the raw sound, enhancing compression to create the sound we want to hear in the mix, and integrating compression to fit the corrected and enhanced sound into the appropriate dynamic perspective for the mix. Let’s look at those different purposes in detail…
CORRECTIVE COMPRESSION
This is used for controlling a sound’s overall dynamic range and is therefore an objective process. It solves dynamics problems that we can define and measure in a strategic manner. This includes peak limiting to rein in unruly transients, and the subtle use of compression to keep a sound’s musical dynamic range within appropriate limits for the mix. Because corrective compression is a preventative measure we need to apply it as early in the signal path as possible. Whether we insert it before or after any corrective EQ depends on what the corrective EQ is doing, as discussed below.
If the corrective EQ includes significant cuts of low frequency energy to remove unwanted rumble or boominess, then we would be wise to put the corrective compression after the EQ. Why? So that the compressor isn’t being triggered by the unwanted low frequency energy and applying more gain reduction than is ultimately needed. If we remove that unwanted low frequency energy after the compression, we will be left with a signal that has been over-compressed in response to energy that is no longer there.
If the corrective EQ includes significant boosts of high frequency energy to brighten up an excessively dull sound or to emphasise the impact of transients, then we would be wise to put the corrective compression before the EQ. Why? So that those high frequency boosts don’t get compressed and thereby give the transients too much impact. By definition, corrective compression requires a transparent result because we are cleaning up a performance’s dynamics before adding it to the mix. We might want to compress the EQ’d attack transients later to add impact and further highlight the performer’s articulation, but that is the role of enhancing compression used in conjunction with enhancing EQ as described further on.
It is always wise to consider where we should apply corrective compression in the channel strip – before the corrective EQ or after it. The examples given above are rational guidelines to determine a good starting point; there will be cases where reversing the order sounds better, depending on how the combination of EQ and compression is altering the signal’s envelope. In any case, corrective compression should always be applied transparently; we should be able to see movement on the GR meter but not hear it.
Peak Limiting
This is a classic application of corrective compression. Let’s say we’ve started with a quick ‘fader only’ preview mix to familiarise ourselves with the material we’re going to be mixing – this is a good starting approach when mixing material we’re not familiar with or that is not fresh in our minds. During this preview mix we might find that some sounds contain high transient peak levels that prevent them from reaching the desired level in the mix without clipping. In this situation we have a few options depending on a) the importance of those transient levels, b) how much work is required to fix them, and c) how much time we have to fix them.
If the transients are vitally important to the dynamics of the performance and must be preserved in their original form, our only option is to reduce the level of all other sounds in the mix appropriately to accommodate those high transient peak levels.
If the transients are of very short duration and so fast that they’re barely audible, then we have the option of applying hard peak limiting (i.e. high threshold, high ratio, hard knee, fast attack and fast release); if time permits we can use fader automation or clip gain in conjunction with hard peak limiting, or else we can simply edit them out.
In most cases, however, we need to preserve the performance expression contained within the transients while reducing their peak levels. This is a good application for softer peak limiting that combines very fast attack and release times with lower thresholds, lower ratios, and softer knees. The process for setting up a peak limiter this way is detailed in the fourth instalment of the Understanding Compression series (see ‘Peak Limiting’), while a more mathematically detailed explanation for soft peak limiting is detailed in the first instalment (see ‘An Extremely Common Problem’ and ‘A Smarter Solution’).
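To make the hard/soft distinction concrete, here is a minimal static sketch in Python of a downward compressor’s gain curve using the standard quadratic soft-knee blend. The threshold, ratio and knee values are purely illustrative, and real attack and release behaviour is ignored:

```python
def soft_knee_gain_db(level_db: float, threshold_db: float,
                      ratio: float, knee_db: float) -> float:
    """Static gain (in dB) for a downward compressor with a soft knee.

    Within the knee region (threshold +/- knee_db/2) the ratio is
    blended in gradually, which is what makes soft peak limiting
    gentler on transients than a hard-knee 'brick wall'.
    """
    over = level_db - threshold_db
    if over <= -knee_db / 2:
        return 0.0                  # well below the knee: no change
    if over >= knee_db / 2:
        return over / ratio - over  # well above the knee: full ratio
    # inside the knee: quadratic blend between the two slopes
    x = over + knee_db / 2
    return (1 / ratio - 1) * x * x / (2 * knee_db)

# A -7dB peak with a -6dB threshold and 4:1 ratio: a (near) hard knee
# ignores it entirely, while a 6dB soft knee is already easing it down.
print(soft_knee_gain_db(-7.0, -6.0, 4.0, 0.0001))
print(soft_knee_gain_db(-7.0, -6.0, 4.0, 6.0))
```

Widening the knee starts the gain reduction earlier and more gradually, which is part of what makes ‘soft’ peak limiting easier on the expression within the transients.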
Musical Dynamic Range Control
Sometimes we simply want to reduce the musical dynamic range of an overly enthusiastic performance by a dB or two to ‘tidy it up’ dynamically and help it to sit comfortably in the mix. Here’s a quick way to do that:
Start with the settings shown in the illustration above. We’re using peak sensing here because we want to reduce the signal’s musical dynamic range without altering its envelope too much. With these initial settings we should be seeing a very large amount of gain reduction. Now we raise the threshold until we’re not seeing any gain reduction during the lowest level of interest within the performance, but we are seeing gain reduction at all levels above it. This ensures we are not compressing unnecessarily (i.e. below the lowest level of interest), and we are spreading the gain reduction over the entire musical dynamic range of the performance – thereby minimising its effect at any given point on the signal’s envelope.
With the threshold set, we now reduce the ratio until we are seeing suitably small amounts of gain reduction, typically a dB or two. Following this, we can adjust the attack and release times until the gain reduction appears to be moving in time with the signal’s input level – at which point the attack and release times are timed to the dynamics of the performance. We can change the setting of the knee to create a more subtle effect, bearing in mind that changing the knee may require adjustments to the other settings to dial the compression in even further.
This is a simple corrective compression application, and should not be confused with the more sophisticated integrating compression approach of matching dynamics (scroll down).
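As a rough sketch of the arithmetic involved: a low threshold combined with a gentle ratio spreads a small amount of gain reduction across the whole performance. The threshold and ratio values below are hypothetical, and only the static (hard knee) gain curve is modelled:

```python
def corrective_gain_db(level_db: float, threshold_db: float,
                       ratio: float) -> float:
    """Static downward-compression gain in dB (hard knee).

    Nothing happens at or below the threshold; above it, every
    `ratio` dB of input becomes 1 dB of output.
    """
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over / ratio - over  # negative = gain reduction

# Threshold at the lowest level of interest, with a gentle ratio:
threshold_db, ratio = -30.0, 1.25
for level in (-30.0, -20.0, -10.0):
    print(level, corrective_gain_db(level, threshold_db, ratio))
```

Note how the typical ‘dB or two’ of gain reduction appears in the middle of the performance’s range, with nothing happening at the lowest level of interest.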
ENHANCING COMPRESSION
This is used for sculpting an individual note or chord’s envelope, primarily to give it more punch/impact or otherwise alter it aesthetically as part of creating the sound we want to hear in the mix. It is, therefore, a subjective process in which the effects of the processing are meant to be audible. We should be able to see movement on the GR meter and hear it.
Unlike corrective compression – where we typically combine high thresholds with high ratios (limiting) or low thresholds with low ratios (musical dynamic range control) – with enhancing compression we typically set the threshold to affect all of the sound’s envelope except for the natural decay or release at the end of the note or chord. In most cases we use a moderate ratio (3:1 to 6:1), a relatively slow attack time to allow the initial transient to pass through (thereby preserving the performance’s articulation and expression), and a fast release to remove the gain reduction as soon as the signal falls below the threshold level. This form of dynamics processing enhances a sound’s punch and impact while also improving the performer’s consistency from note to note or chord to chord.
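The slow-attack/fast-release behaviour described above can be modelled with the usual one-pole smoothing of the gain-reduction trajectory. This is a simplified sketch – the coefficients merely stand in for attack and release times, and the ‘samples’ could be on any time scale:

```python
def smooth_gain_db(target_gains_db, attack_coeff, release_coeff):
    """One-pole smoothing of a static gain-reduction trajectory.

    A small attack_coeff (slow attack) lets the initial transient
    pass before the gain reduction builds up; a large release_coeff
    (fast release) removes the gain reduction quickly once the
    signal falls back below the threshold.
    """
    gain, out = 0.0, []
    for target in target_gains_db:
        coeff = attack_coeff if target < gain else release_coeff
        gain += coeff * (target - gain)
        out.append(gain)
    return out

# A burst that calls for 6dB of gain reduction over four samples:
targets = [0.0, 0.0, -6.0, -6.0, -6.0, -6.0, 0.0, 0.0]
print(smooth_gain_db(targets, attack_coeff=0.25, release_coeff=0.75))
```

The printed trajectory shows the gain reduction arriving gradually (letting the transient through) and disappearing quickly afterwards.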
It is always wise to consider where we should apply the enhancing compression in the channel strip or signal path. A safe approach is to follow the advice given earlier for corrective compression: if the enhancing EQ contains significant cuts we place the enhancing compression after the EQ, but if the enhancing EQ contains significant boosts we place the enhancing compression before the EQ.
However, because we use enhancing compression creatively to shape a sound’s envelope, we can experiment with placing the enhancing EQ before, after or around the enhancing compression. If, for example, we had used enhancing EQ to boost the attack transient of a sound and we placed that boost before the enhancing compressor, the compression will add further impact to the transient. The result will not be the transparent or natural sound we aspire to for corrective purposes, but that’s not the point of using enhancing EQ or enhancing compression. We might find that we get the best results by placing some parts of the enhancing EQ before the enhancing compression, and other parts after the enhancing compression. For more information about this approach see the author’s ‘Wrap-Around EQ’ below…
Impact
An example of using enhancing compression to add impact was given in the fourth instalment of the Understanding Compression series (see ‘Adding Punch’), where enhancing compression is used to give a percussive sound (e.g. drum hit, plucked guitar string) more impact and punch, as shown in the illustration below. Start with the fastest attack time possible and subtly slow it down to allow more of the attack transient through, knowing that if we make it too long we lose the impact-enhancing effect of the compression. Tweaking the attack time is, therefore, critical for fine-tuning this type of impactful enhancing compression.
Visceral Impact & Enhancing Compression
The subjective use of compression (e.g. to provide impact to a sound) is difficult to judge when mixing on headphones because headphones offer no visceral impact – which means we have to be very analytical when it comes to making a sound ‘punchy’. However, with a bit of experience and by comparing against the relevant reference tracks in our Mixing With Headphones template we can develop a feel for the right amount of compression. Working around the lack of visceral impact is discussed in the fourth instalment of this series (see ‘Visceral Elusion’). As stated earlier in this instalment, if our super-compressed kick sound is 10x punchier than the punchiest kick sound in our reference tracks, it is almost certainly wrong.
INTEGRATING COMPRESSION
This form of dynamics processing is primarily used to fine-tune an individual sound’s musical dynamic range to match the musical dynamic range of other sounds within the mix or within the overall dynamic range of the mix (scroll down to ‘Dynamic Perspective’ and ‘Matching Dynamics’). It allows individual sounds to be dynamically integrated into the mix without requiring huge fader movements to keep them audible – thereby avoiding the expression inversion that occurs when performers are trying to step to the front of the soundstage but the automation is pulling them back.
Integrating compression can often be achieved by fine-tuning the settings of any corrective and/or enhancing compression that’s already in use on the sound, but if those settings are working well together and with their respective EQs it is best to leave them alone. Why? Because when we are chaining numerous EQs and compressors together, each one affects the other and all it takes is one well-intended tweak to change the sound significantly. After that happens no amount of Ctrl-Z will convince us that it sounds the same again – even if everything has been returned to exactly the same settings, there is always a nagging doubt that it doesn’t feel the same any more and we ultimately wish we never touched it. To avoid this form of mixing neurosis it is smarter to add integrating compression as a separate part of the signal path.
It is always best to place the integrating compression before the integrating EQ. Why? Because the process of compression often affects a sound’s tonality, and these tonal changes can be compensated for with the integrating EQ.
Matching Dynamics
This is the primary use of integrating compression. Sometimes a sound does not sit well alongside other sounds in the mix – not due to overlapping frequency spectrums or ridiculous compositional ideas (as discussed in the previous instalment), but because the sound’s performance has moments when it is too soft in the mix and moments when it is too loud. The sound’s musical dynamic range is so large that no single fader position provides an acceptable reference for the majority of the mix; this suggests that the sound is dynamically out of perspective with the mix.
There’s nothing wrong with moving the faders during a mix; that’s why we have linear faders on our mixing consoles rather than rotary knobs. Large fader moves are appropriate for fading a sound completely in or out of a mix. Small fader moves (e.g. ±2dB) are appropriate to maintain a sound’s place within the mix. Barely perceptible fader moves (e.g. ±0.5dB) are excellent for serving the performance, e.g. subtly raising the level when it feels like the performer is trying to step to the front of the stage, and subtly lowering the level when it feels like the performer is trying to step to the back of the stage. This awareness of the feel of a performance allows us to transform a balance into a mix.
However, if a sound requires large fader changes (i.e. greater than ±3dB) simply to maintain an acceptable level in the mix then we should ask why. Perhaps the sound was a vocal track compiled from numerous takes, and the vocalist’s energy and/or distance from the microphone changed from take to take. Perhaps it is an exotic percussion track compiled from numerous sample libraries. These are good justifications for making large fader changes, although a better solution in these cases would be to apply clip gain or similar level automation at the top of the channel strip so that the important level-balancing adjustments occur before the sound enters the channel strip and therefore before any dynamics processing is applied.
What if the problematic sound was captured in a single uninterrupted take but requires large fader movements (e.g. ±3dB or more) because the performance itself gets too loud at some moments in the mix and too soft at others? These changes in level, although perhaps extreme and possibly due to a poor monitor mix during recording, were intentionally made by the performer and are an important part of the expression within the performance. The simplistic solution of turning it up when it’s too soft and turning it down when it’s too loud solves the perceived level problem but risks creating expression inversion (see below). Let’s solve this problem in a smarter way, and then we’ll see how it also minimises expression inversion.
The illustration above shows the levels of two sounds that are dynamically out of perspective. Sound A has a musical dynamic range of 15dB, and Sound B has a musical dynamic range of 5dB. There are times when Sound A becomes too loud for Sound B and times when it becomes too soft, and using faders or other level automation clearly results in expression inversion.
Let’s say that Sound B sits nicely in the mix and we don’t want to alter its musical dynamic range, therefore we need to reduce the musical dynamic range of Sound A to approach or match the musical dynamic range of Sound B. In this case we ideally need to reduce 15dB down to 5dB, which will require a ratio of 3:1 (i.e. 15/5 = 3). If we applied downwards compression to Sound A with a ratio of 3:1 and the threshold set to the softest moment of musical interest, it will reduce Sound A’s musical dynamic range from 15dB to 5dB, as shown below.
Now both sounds have the same musical dynamic range, but Sound A has lost considerable level. Adding some make-up gain brings it up to a more realistic level for use in the mix.
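The arithmetic is simple enough to sketch in a few lines (values taken from the example above):

```python
sound_a_range_db = 15.0  # Sound A's musical dynamic range
sound_b_range_db = 5.0   # Sound B's musical dynamic range

# The ratio needed is simply the two ranges divided:
ratio = sound_a_range_db / sound_b_range_db     # 3.0, i.e. a 3:1 ratio
compressed_range_db = sound_a_range_db / ratio  # back to 5.0 dB

# With the threshold at the softest moment of interest, the loudest
# moment loses the most level - hence the need for make-up gain:
max_gain_reduction_db = sound_a_range_db - compressed_range_db  # 10.0 dB
print(ratio, compressed_range_db, max_gain_reduction_db)
```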
The extreme example above demonstrates the process of matching dynamics, but does not consider the audible effect that such compression might have on Sound A. Let’s say it now sounds too compressed. In the example below the ratio has been reduced, increasing the musical dynamic range of Sound A from 5dB to 7dB, and the make-up gain has been reduced accordingly. Because there is now only a small difference between each sound’s musical dynamic range, any level discrepancies can be easily solved with small fader movements without causing obvious expression inversion.
A more elegant solution is shown in the illustration below. It combines the downwards compression process described above with upwards compression. We set the downwards compressor’s threshold high enough that it only applies gain reduction to parts of the signal that go too high, and we set the upwards compressor’s threshold low enough so that it only applies gain addition to parts of the signal that go too low.
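As an illustrative sketch, the two stages can be expressed as a pair of static gain curves. The thresholds and ratios below are hypothetical, hard knees are assumed, and attack and release behaviour is ignored:

```python
def downward_gain_db(level_db, threshold_db, ratio):
    """Gain reduction (negative dB) applied above the threshold."""
    over = max(0.0, level_db - threshold_db)
    return over / ratio - over

def upward_gain_db(level_db, threshold_db, ratio):
    """Gain addition (positive dB) applied below the threshold."""
    under = max(0.0, threshold_db - level_db)
    return under - under / ratio

def combined_gain_db(level_db):
    # The high threshold pulls down only the loudest moments;
    # the low threshold lifts only the softest moments. Levels
    # between the two thresholds are untouched.
    return (downward_gain_db(level_db, -10.0, 2.0)
            + upward_gain_db(level_db, -25.0, 2.0))

for level in (-30.0, -20.0, -5.0):
    print(level, combined_gain_db(level))
```

Note that the middle of the signal’s range passes through with no gain change at all, which is precisely why this approach is kinder to the performer’s expression than a single heavy-handed compressor.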
The strategic application of dynamics processing for matching dynamics – including choosing the appropriate settings for attack, release and knee – is detailed in the fourth instalment of the Understanding Compression series.
EXPRESSION INVERSION
Solving musical dynamic range problems with large fader movements invariably results in a situation where performers are being pulled back in the mix (turned down) when they’re trying to push forward (playing louder or stronger), and are being pushed forward in the mix (turned up) when they’re trying to pull back (playing softer or weaker). The performers’ expression has been inverted: they’re being made softer when they’re trying to play louder, and they’re being made louder when they’re trying to play softer. We are literally mixing the performance out of the sound. We wouldn’t tell a musician to turn down when they’re intending to play loudly or vice versa, so why would we do that to them with the fader?
Once heard in a mix, expression inversion cannot be unheard; it is the work of novices and fader-riders who only focus on levels without considering performance dynamics or expression. It is forcing the music to serve the purposes of the mix, when it should be the other way around: the mix should be serving the purposes of the music.
The solution here is the strategic use of compression, as described throughout this instalment, to reduce the performance’s musical dynamic range while maintaining the performer’s expression. Unlike fader movements and automation, downwards compression has the added benefit of altering the sound’s envelope in ways that enhance its perceived impact; thereby making it ‘punchier’ and preserving the performer’s expression by converting perceived level differences into perceived impact differences. Fortuitously, changes in perceived impact were what the performer was trying to achieve by playing louder or softer anyway.
With the strategic application of compression the perceived level of a sound may not change significantly within a mix, but its perceived impact does. If we get it right we can place sounds that were captured with very large musical dynamic ranges alongside sounds that were captured with very small musical dynamic ranges without causing any expression inversion.
What & Where Is The Expression?
The expression is the feeling the performer puts into the performance. Performers with sufficient mastery of their instruments do much more than simply play the right notes and chords at the right times.
A performer can imbue expression (aka feeling) into their performance at any point where they make physical contact with the instrument. This includes when and where they put energy into the instrument to create a sound (drum stick hitting skin, plectrum plucking string, piano hammer hitting strings, mouth on reed, etc.) and/or dampen a sound (e.g. placing a palm upon a drum skin or acoustic guitar strings/body to stop the sound suddenly). These are all points where musicians are able to change the tonality and dynamics of the sound they’re creating depending on how they want it to feel, and that is where we find the expression.
No discussion about expression would be complete without mentioning timing. Experienced musicians will use subtle timing changes to imbue expression, for example, playing ever-so-slightly ahead of the beat to create a sense of urgency, or ever-so-slightly behind the beat to create a more relaxed feeling. This is not the kind of expression we should be altering with tonal or dynamics processing, but it is something we must be aware of when editing a performance, and also when applying spatial processing (as discussed in the next instalment of this series).
Whenever we use dynamics processing we should be aware of how it affects the dynamics of the expression that the performer has embedded into the performance.
When using corrective compression as described earlier we are primarily trying to reduce a sound’s dynamic range to make it more useable within the mix. If we are using peak limiting to control a few wayward transients then we’re less likely to be worried about expression during those very brief moments when gain reduction is applied – because nobody is going to care about one sound in the mix being over-compressed for 0.3s if it means that the remaining three minutes of the mix feels right. It’s a worthwhile compromise. However, if significant gain reduction is being applied to a sound throughout the mix then we must strive to preserve the expression embedded into the performance of that sound.
When using enhancing compression as described earlier our goal is to give the sound impact and punch. For sounds with percussive envelopes (drums, piano, plucked strings, etc.) most of the expression is in the attack of the envelope, so we adjust the attack time to determine how much of the expression gets affected by the processing. We follow this by adjusting the release time to get the processing out of the way as quickly as necessary.
When using integrating compression as described above to match the dynamics of a sound to the dynamics of other sounds or the overall mix, we need to be very careful how it affects the expression contained within the sound. One thing we need to be mindful of is the opposite of expression inversion, which is…
Expression Exaggeration
A performer’s expression can extend beyond the notes and chords, and often to good effect. A classic example of this is the sound that a guitarist’s fingers make when sliding up and down the neck between notes. Another example is the breathing and guttural sounds that vocalists make between verses when singing. Both of those examples can work very well to bring expression into a performance, and it is not unusual to make automation adjustments to ‘dial them in’ so that they’re at just the right levels in the mix. It is often the subtlety of those sounds that makes them effective as expressive elements, and it is their relative ‘smallness’ that helps to make the musical aspects of the performance ‘bigger’. However, if we lower the threshold enough to compress those performance sounds/noises we risk giving them too much impact, and they no longer serve their contrasting role against the performance. It is not a problem if that’s the effect we want, but it’s important to be aware of it during the mix because expression exaggeration is easy to miss if we’re not listening for it, and, like expression inversion, once heard in a mix it cannot be unheard.
MAKING GAINS
When working with dynamics processors we have to do more than simply listen to the performance – we have to feel the performance too, because we are altering the performers’ dynamics and therefore altering their expression. When mixing we have to try to preserve, and perhaps enhance, the expression while also fitting the sound into the desired aesthetic and into the mix’s dynamic perspective.
Whenever we use compression we reduce a sound’s dynamic range, and that means we reduce the difference between its loudest and softest moments. When done carefully this makes it easier to place a sound within the mix, but it can also exaggerate performance noises and make it harder to isolate one part of a sound from another because it reduces the important level contrasts between the different parts of a sound. Parts that are meant to be soft become louder, and parts that are meant to be loud become softer.
When mixing we should always make sure we are effectively using our compressors and EQs together, because each affects the other. Nobody cares how well the expression has been preserved if the mix has no clarity or separation, and nobody cares about clarity and separation if the mix has no expression or feeling.
Throughout this and the previous instalment we have focused on using spectral processing and dynamics processing to create and/or maintain tonal perspective and dynamic perspective within our mixes. In the next instalment we will explore the use of spatial processing (delays, reverberation, etc.) to create the desired spatial perspectives within our mix – whether that means giving each sound in the mix a unique spatial perspective for maximum separation, putting all of the sounds in the mix into the same spatial perspective to create a sense of ensemble, or using a combination of both.
Next instalment: Spatial Processing, coming soon...
The general guidelines for ‘wrap-around EQ’ are simple: cuts before compression, boosts after compression. If used in conjunction with the corrective compression example shown above, ‘wrap-around EQ’ allows us to apply considerably more compression before sounding unnatural or constricted, and is a very helpful approach when it is important to maintain an ‘unprocessed’ tonality.
Hi Greg, this is a fantastic series – you’ve explained clearly and in great detail things I’ve noticed but not fully understood for years!
One question I’ve also struggled with is determining the best headphone SPL level that gets me closest to the 80 phons contour.
I saw a device marketed fairly recently that purports to measure headphone SPL and give a reading, but it is a bit expensive and I’m not sure how well it works.
Unless I missed it in the series, how do you go about setting your headphone level so that all of the tips and concepts in this series fall into place?
Thanks again for a fantastic series! It’s been extremely informative and helpful.
I am glad you found this information useful, Glenn! Regarding measuring the SPL of headphones, I am sure the product you’re referring to will work well enough. I took a different approach, however…
For years I was using a pair of Audio-Technica’s M50X plugged directly into the headphone socket of my early 2013 MacBook Pro. I had already established (by comparison with calibrated monitor speakers years ago) that I was close enough to a calibrated level if a) I had the MacBook Pro’s headphone level at a certain number of bars on the Mac’s screen, and b) the metered level of the sound I was working on was averaging around -20dBFS.
Soon after their release I got a pair of Neumann’s NDH20 closed back headphones. I was initially trying to match their level against the M50X, but the NDH20’s higher impedance made it difficult to drive them from my old MacBook Pro’s headphone amplifier – as soon as any significant low frequency energy appeared in the content you could hear the headphone amplifier struggling – which left me wondering what I was actually hearing. I ended up with a CEntrance USB headphone amplifier (a remarkable product) to drive the NDH20s.
To calibrate the NDH20s I did a theoretical SPL estimate using a simple voltmeter with RMS reading. I cut open a cheapo minijack extension lead and connected the voltmeter across the wires for one channel.
The NDH20s have a stated Sensitivity specification of 114dB at 1kHz at 1 volt RMS. That’s 34dB higher than I needed (I was aiming for 80dB SPL), so I figured that if the measured voltage coming out of the CEntrance was 34dB lower than a 1kHz sine wave at 1 volt RMS then the headphones’ SPL would be somewhere around 80dB at 1kHz (i.e. 80 phons). Volts and SPL are both measurements of pressure and therefore both use the ‘20log(n)’ formula to convert to decibels, so dropping one by 34dB will drop the other by 34dB.
Mathematically, a drop of 34dB is close enough to dividing the voltage by 50, so…
With a 1kHz sine wave coming out of my DAW at -20dBFS (mix bus level with RMS metering), I adjusted the CEntrance’s headphone level until the measured voltage (with the headphones connected) was 1/50 of a volt, or 0.02 volts RMS. That should theoretically give me an SPL of around 80dB SPL. There are a couple of loose elements to this approach – I am going from memory here – but the resulting difference in perceived level between the NDH20/CEntrance combo and my usual M50X/MacBook combo at 1kHz seemed close enough.
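For anyone wanting to check that arithmetic, here it is in a few lines of Python (values as described above):

```python
sensitivity_db_spl = 114.0  # NDH20: dB SPL at 1kHz for 1 volt RMS
target_db_spl = 80.0        # desired listening level

# Voltage and sound pressure both use the 20log(n) decibel formula:
drop_db = sensitivity_db_spl - target_db_spl  # 34 dB
voltage_ratio = 10 ** (drop_db / 20)          # ~50.1, i.e. 'divide by 50'
target_volts = 1.0 / voltage_ratio            # ~0.02 volts RMS

print(round(voltage_ratio, 1), round(target_volts, 4))
```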
It is more important to remain at a consistent SPL throughout your work rather than getting stuck on being at exactly the reference SPL (whatever that might happen to be).
I now have two tiny dabs of white correction fluid inside the ‘teeth’ of the CEntrance’s level control; one for the NDH20s, and one for the NDH30s (which have 10dB less sensitivity). I alter the CEntrance’s volume appropriately depending on which headphones I’m using (it’s mostly the NDH30s these days), and try to keep my DAW’s metered level sitting around -20dBFS. It’s all ‘close enough’.
I hope that was helpful…