Automation Options in the DAW

One of the key advantages of the digital functionality of a DAW is its ability to dynamically automate practically every individual parameter, every rotary knob, switch and fader, of both the virtual mixer and each of the plugins used in the session. With the help of graphically represented automation tracks, the value of every parameter can be freely defined at any point in time, and, as if by magic, the DAW will carry out all these editing moves automatically. The ability to dynamically change every aspect of audio editing is only possible within the virtual environment – in an analogue studio one would require an infinite number of sound engineers’ hands to carry out all the changes manually in real time during playback.

In the processing of vocal signals, DAW automation offers an enormous range of possibilities and represents a real alternative for many editing processes. For example, the control processes carried out automatically by a compressor can also be reproduced by appropriate dynamic automation of the volume fader. In some cases, such automation can even offer advantages over using a compressor. There may be situations in which you do not want a compressor working on a lead vocal track at all; here, volume automation represents a more flexible solution worth considering. The sonic by-products of the compression process can also be avoided in this way.

The same applies in principle to the de-essing process. Why not reduce harsh ‘S’ sounds with careful automation of the volume fader? You would thereby achieve de-essing without the possible unwanted shortcomings and artefacts of a dedicated de-esser.
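To make this concrete, here is a minimal sketch of the idea, assuming a mono signal as a NumPy array: a hand-drawn gain envelope (a list of time/dB breakpoints, just like automation nodes on a fader lane) is interpolated and applied to the signal, dipping the level around a harsh ‘S’. The breakpoint times and the placeholder signal are, of course, hypothetical.

```python
import numpy as np

def apply_fader_automation(audio, sr, breakpoints):
    """Apply volume automation to a mono signal.

    breakpoints: list of (time_in_seconds, gain_in_dB) pairs,
    linearly interpolated between points - much like drawing
    automation nodes on a DAW fader lane.
    """
    times = np.array([t for t, _ in breakpoints])
    gains_db = np.array([g for _, g in breakpoints])
    t = np.arange(len(audio)) / sr
    gain_db = np.interp(t, times, gains_db)      # piecewise-linear curve
    return audio * 10.0 ** (gain_db / 20.0)      # dB -> linear factor

# Example: dip the level by 6 dB around a harsh 'S' at 2.30-2.38 s
sr = 44100
vocal = np.random.randn(sr * 4) * 0.1            # placeholder for a real take
deessed = apply_fader_automation(vocal, sr, [
    (0.00, 0.0), (2.28, 0.0), (2.30, -6.0),
    (2.38, -6.0), (2.40, 0.0), (4.00, 0.0),
])
```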

Since all parameters can be automated as required, dynamic frequency editing is also conceivable. You may, for example, want to attenuate the low-mids of individual words in a verse by differing amounts. A static EQ cannot do this; a dynamic EQ might, but simply automating the gain of the relevant EQ band achieves the same effect.

On the other hand, you should not overdo automation – there are enough tasks and requirements which can be fulfilled very well and completely adequately with classic gear or plugins. Every now and again, however, it pays to ask yourself whether a task could be accomplished more easily and quickly, and with a better sound result, using careful and attentive automation.

In any case, volume automation with the fader, more than anything else, can help bring out the dynamics of a vocal performance. If the singer shows only slight differences between aggressively loud and emotionally quiet passages in the recordings, it may help to subtly automate the volume during post-editing. Often one will want to make the verses a little quieter, so that a small volume increase gives the refrain more energy. Bridges, and in particular breakdown parts, are well suited to bringing the voice very directly and immediately to the front. Volume automation can carefully reinforce all these subtle impressions and shape the dynamic course of a vocal performance.

 

Spatiality in the Mix

For various reasons, some of which we have already discussed in previous chapters, modern-day singing and speech are mostly recorded in very anechoic rooms or even booths with very short reverberation times – such are modern productions. However, this was not always the case. In the early years of sound recording, the natural spatial information of large studio rooms was used deliberately – especially for vocal recordings – to give the vocal signal the greatest possible naturalness. After all, every acoustic signal only develops its natural character in interplay with the reflective behaviour of the room around it.

Nevertheless, in modern sound engineering, recording vocals as neutrally and as “dry” as possible has become the standard, in order to retain all possibilities and options for artistic and production-related spatial changes in the later post-editing stage. There are countless products available on the market today for artistic and, above all, virtual simulation of spatial sound. These recreate almost every kind of natural sound behaviour in such an amazingly authentic way that, when combined with a compact mix rich in signals, it is tough to tell the difference from real three-dimensionality.

It is not, however, only the authentic image of real spaces that can be effortlessly reproduced with the help of devices and plugins. It is also possible to create very effective, totally unreal spatial projections which could not exist in reality and in which one could never have recorded any singing. With these comprehensive options, vocals can be placed in very impressive soundscapes, which in turn can make the sound so unique and exciting that the listener will remember the vocals, and the artistic-emotional statement connected with them, all the more quickly.
 

The two most important spatial effects are, naturally, reverb and echo (delay). In vocal production, both are in most cases integrated into the mixer routing as send effects – the actual effects device is looped into the insert of its own effects channel. All tracks which are to use the effect send a certain portion of their direct signal level to this effect channel, where it is processed 100 % wet, and the acoustic result is heard on the main outputs. The dry, unedited signal is forwarded to the main outputs in parallel, so that one hears a mix of processed and dry signal. The ratio of this mix is ultimately determined by the channel send level.
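The summing involved can be written down in a few lines. The following sketch (assuming mono NumPy signals, with a crude three-tap echo standing in for a real reverb plugin) shows the send/return topology just described: the effect channel runs 100 % wet, the dry signal passes in parallel, and the send level sets the ratio.

```python
import numpy as np

def reverb_100_wet(x):
    """Stand-in for an effect channel insert running 100% wet.
    (Here: a crude three-tap echo instead of a real reverb plugin.)"""
    y = np.zeros(len(x) + 30000)
    for delay, gain in [(8000, 0.6), (17000, 0.35), (29000, 0.2)]:
        y[delay:delay + len(x)] += gain * x
    return y[:len(x)]

def send_return_mix(dry, send_level):
    """Classic send/return: the dry signal goes straight to the main out,
    a copy scaled by the channel send level feeds the effect channel,
    and both are summed on the main bus."""
    wet = reverb_100_wet(dry * send_level)    # effect channel input = send tap
    return dry + wet                          # parallel summing on the main out

vocal = np.random.randn(44100) * 0.1          # placeholder signal
main_out = send_return_mix(vocal, send_level=0.25)
```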

The processing of vocals with reverb and delay can serve different purposes:

  • Simulation of an artistic space appropriate to the production, whereby originally dry vocals subsequently receive a natural spatial sound.
     
  • Deliberate exploitation of sound effects for the depth staggering of signals: with skilled editing, vocals can be positioned more in the foreground or background of the mix. They may, for example, float clearly and concisely “in front of the mix” or be homogeneously integrated into the overall sound.
     
  • Flexible spatial simulation, and the possibility of creating unreal spatial conditions, can equip vocal signals with very impressive and effective sound aspects. These can help to hold the attention of the listener and, at the same time, emphasise the message of the song.
     

Algorithmic Reverb

Let us have a quick look at what happens in a real space: how sound propagates there and ultimately becomes a spatial impression which we perceive as inseparable from the sound itself. Some of the original sound waves reach our ears directly, while others are reflected once (or a few times) in the immediate vicinity; still others are reflected over and over again until they combine into a diffuse, reverberant sound image that depends on the space, its size and the surfaces located within it.

Over time, all these sound waves lose their energy, with the higher frequencies decaying fastest while the lower ones are preserved longer. The three stages described here are known as direct sound, early reflections and the reverb (tail). These three components and their individual levels are especially important when it comes to describing and perceiving spatiality and reverberation.

Two additional time constants are also important, both for the development of reflected sound in a space and for the simulation of this process in devices and plugins. These are the time intervals between the direct sound and the arrival of the first reflections at the listening point (or ear), and between the direct sound and the onset of the actual diffuse reverb tail.

The first time constant, the Initial Time Delay Gap (ITDG), is provided as a reverb simulation parameter only very rarely, while the second, the pre-delay, is provided all the more frequently. Manipulating the pre-delay in combination with the decay time of the reverb tail, together with adjustments to the high reverb frequencies, allows signals to be positioned within the perceived spatial depth.

Some reverb devices and plugins offer additional parameters, e.g. size (spatial size) or density (density of the reflections). Ultimately, however, these are specific modifications of the underlying algorithm, that is, of the calculation model which computes and simulates the reflections.

So much for the parameters. But what is an “algorithmic” reverb? The word “algorithmic” already gives it away: it is a purely synthetic reverb which can be adjusted via the aforementioned parameters. The knobs and variables you change are anchored in an algorithm which precisely calculates the behaviour of the reverb.

This means that essentially countless echoes are generated, which take over the role of the early reflections and the reverb tail. In this way, the algorithm keeps the diffusion, the frequency image and the duration of the reverb under control, using modulations.

First, the dry signal is sent through a variety of “delay lines”. This results in delays which follow one another quickly and lie close together. Exactly how these delays take shape depends on the settings for the size and form of the “theoretical” space. Mathematical algorithms regulate the timing, volume and sound of the delays according to these parameters, much as the surfaces of a real space would.

After the early/initial reflections, the late echoes follow – also known as the “reverb tail”. It helps to keep in mind what these actually are: early reflections which have struck further surfaces!

In order to replicate this, the reverb uses feedback loops to send the generated echoes through the algorithm again. The spatial properties are re-applied to the reflections already generated by the algorithm, and “late reflections” arise.

At this point, however, the reverb algorithm has other variables, which influence timing, volume and the sound of the feedback loop.

The length of the reverb can now be determined by how often the signal is sent through the feedback loop: the more often it passes through the loop, the longer the reverb.

With the help of these processes, an algorithmic reverb can generate a quite convincing impression of a real space. But it also offers the possibility of generating “surreal” spaces which convey a strange and unreal sound impression. This provides plenty of material for creative, crazy or entirely natural reverbs.
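As an illustration of the delay-line and feedback-loop structure just described, here is a minimal sketch of a classic Schroeder-style algorithmic reverb in Python, assuming a mono NumPy signal. The delay times and feedback values are illustrative, not taken from any particular product; real plugins add modulation, damping filters and far denser networks.

```python
import numpy as np

def feedback_comb(x, delay, feedback):
    """One 'delay line' with a feedback loop: each pass through the loop
    adds a quieter, later copy - the raw material of a reverb tail."""
    y = np.copy(x)
    for n in range(delay, len(x)):
        y[n] += feedback * y[n - delay]
    return y

def allpass(x, delay, gain=0.5):
    """Allpass stage to increase echo density (diffusion) without
    colouring the overall frequency balance too strongly."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + xd + gain * yd
    return y

def schroeder_reverb(x, sr):
    # Parallel combs with mutually detuned delay times stand in for room
    # modes; higher feedback = the signal circulates longer = longer reverb.
    combs = [(int(sr * t), fb) for t, fb in
             [(0.0297, 0.77), (0.0371, 0.74), (0.0411, 0.72), (0.0437, 0.70)]]
    y = sum(feedback_comb(x, d, fb) for d, fb in combs) / len(combs)
    for d in (int(sr * 0.005), int(sr * 0.0017)):   # serial allpass diffusion
        y = allpass(y, d)
    return y
```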

Various digital reverbs have established themselves as plugins. Here are just a few of them:

The true classic among digital reverbs is the Lexicon 224, which was first used in music productions in 1978. Even if it was not the first digital reverb device, it is certainly one of the most famous.

Convolution Reverb

As an alternative concept to an algorithmic reverb based on complex mathematical models, reverb devices have also been around for some time which create artificial reverberation using sampled characteristics of real spaces and spatial surroundings.

This so-called “convolution reverb” uses samples (spatial impulse responses), which it applies to the dry input signal. The samples are created by exciting an actual space with a very short acoustic impulse (a bang, a Dirac impulse, a sine sweep) and recording the result (the “answer”, or reaction, of the room). This gives you a so-called “impulse response”, or IR for short.

Convolution reverb? Why “convolution”? It is not meant that the reverb is somehow folded up and inserted into a reverberation device; the term comes from mathematics. A mathematical convolution combines two functions into a third; put very simply, the frequency spectrum of the signal is multiplied by that of the impulse response. We do not, however, want to get too theoretical at this point.

This “response sample” represents the individual and unmistakable spatial behaviour of that specific space, and it can be applied to any audio signal using a corresponding plugin algorithm (the convolution reverb). The audio result is the same as if the dry source signal had actually decayed in this space – it produces extremely realistic reverberation behaviour, the naturalness of which can hardly be surpassed.
Thus, it is possible to place a dry signal in every conceivable real space. These may include legendary concert halls or studio rooms, sampled hardware reverb devices, or unusual locations such as the inside of an oil tanker, a plastic bucket or a car boot.
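In code, the whole process reduces to a single convolution. Here is a minimal sketch using SciPy, assuming mono WAV files and hypothetical file names; a real plugin would use low-latency partitioned convolution, but the underlying operation is exactly this.

```python
import numpy as np
from scipy.signal import fftconvolve
import soundfile as sf   # pip install soundfile

# Hypothetical file names - substitute your own mono recordings.
dry, sr = sf.read("dry_vocal.wav")
ir, sr_ir = sf.read("concert_hall_ir.wav")
assert sr == sr_ir, "signal and impulse response must share a sample rate"

wet = fftconvolve(dry, ir)                    # the actual convolution
wet /= np.max(np.abs(wet)) + 1e-12            # normalise to avoid clipping
mix = dry + 0.3 * wet[:len(dry)]              # blend like a send return
sf.write("vocal_in_hall.wav", mix / np.max(np.abs(mix)), sr)
```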

However, the reverberation behaviour of a convolution reverb often cannot be edited as flexibly as the parameters of algorithmic reverb devices and plugins allow, although it is considerably more natural and realistic. When it comes to spatial editing, therefore, you should think carefully about what you value most for the signal: either you want the maximum possible naturalness and the amazingly realistic reverb of a convolution reverb, or you rely on the flexible simulation and calculation of echo behaviour offered by an algorithmic reverb, which can be adjusted much more strongly to your own wishes and requirements. The decision should be made depending on the desired sound image for the particular song and the sound of the vocals.

Convolution reverbs are not just useful in music production – practically every location can be “convolved”. Thus, film sound for a picture shot in front of a green screen can be brought to life, on an acoustic level, using the convolution reverb of a real environment.

Plate Reverb

Over the decades, the particular sound character of the so-called plate reverb has established itself as a consistent stylistic device and a good choice for vocal reverb. The German company EMT Franz introduced a monstrous reverberation device in the 1950s which creates echoes via a freely swinging metal plate. The reverbs of the EMT 140 and its successors are very compact, with a high-mid, metallic sound character, lending a pleasant and strikingly fresh artificial space to vocal signals in particular. The original EMT 140 plate reverb, more than 2 metres in length, has been replicated by many manufacturers as a virtual plugin emulation, and it is a good choice for a fine vocal reverb. The plate sound is popular and serves both as a template for the sound behaviour of algorithmic reverbs and as source material for convolution reverbs. It has ensured an unbeatably open and light vocal sound on countless releases.

Several emulations have established themselves in the plugin world in particular, conveying a very realistic, true-to-original sound impression.

Reverb Selection for Vocals

What kind of reverb should one use when processing vocals? As is so often the case, there can be no general answer or recommendation here either, since the decision depends strongly on the respective sound objective and the requirements of the song. One will generally have the least parameter-editing work to do after selecting a suitable convolution reverb, since these devices come with fewer settings. Even where some parameters can be changed, in case of doubt it is better to choose a different, more suitable impulse response rather than manipulate the selected one and, ultimately, distort its sound. After all, that was not the idea behind the concept of convolution reverb: you are looking for precisely the unique, particular form of echo found in the space whose impulse response you have chosen. Changing or adapting this sample will typically lead to significantly worse sound results than simply selecting another impulse response.

On the other hand, if you decide on an algorithmic reverb, you have many more manipulation options, but you will never achieve the realism of a convolution reverb. Ultimately, outstanding results can be obtained with both concepts. How much reverb one should use on a vocal signal depends very strongly on the current zeitgeist. A reasonable and necessary amount of space is good for every vocal signal, since it gives a very dryly recorded vocal a basic level of naturalness. The length, sound and “colouring” of the reverb signal will vary – we are all familiar with, for example, pop productions drowned in very clear and long reverb tails. Every style and every pop music epoch seems to deal with reverb differently, no doubt to clearly distinguish and differentiate its work from everything that has gone before. Thus, in the 1980s, long, bright, high-frequency reverbs were fashionable, whereas for the past several years we have experienced a trend towards drier productions, in which space is often created only with early reflections and/or delays, and with spatial treatments using very little reverberation. The Rock’n’Roll era of the 1950s used very short reverb times and the legendary slap delay – hardly any recordings by Elvis Presley or other representatives of this genre got by without this punchy and in-your-face bathroom sound.

Algorithmic Reverb:

  • Free and flexible processing of all relevant parameters.
  • Unreal and unusual spaces can be created through deliberate “abuse” of the parameters.
  • Less realistic results than with convolution reverb technology.
  • More resource-friendly than convolution reverb.
  • Fully suitable for most tasks in a mix, especially when editing less important signals.

Convolution Reverb:

  • Fixed characteristic echo patterns and reverberation behaviour of the convolved space.
  • Extremely realistic sound.
  • The opportunity to use the fantastic rooms of large concert halls, studios, etc.
  • Very unusual sound spaces available (cans, forest, shoebox, etc.).
  • Especially suitable for very high-value and important signals which sit far at the front in the mix.

Tips for Processing Vocal Reverb

No matter the current taste in music, when it comes to editing vocals with reverb, it is, as we have seen, not just a purely realistic simulation of space that matters. Much importance is also placed on editing which makes the main voice (ultimately the most important element of the song) radiant, assertive and unusually attractive. It is not uncommon to use multiple reverb devices for this, with different settings, each giving the voice different sound aspects (even if only in small proportions).

A device with a rather short and compact plate reverb emulation, rich in early reflections, ensures a full, voluminous and substantial basic sound. Another reverb with a slightly longer reverberation time helps to embed the voice in the accompanying arrangement and gives it depth and substance at the same time. This vocal reverb can also be rich in high-frequency components, resulting in a more radiant and shiny impression. Often one will also work with a rather long pre-delay (popularly around 100 ms), which decouples the direct sound of the voice from the reverberation, with the consequence that the voice is still perceived as very present and clear in the near foreground of the sound image. Naturally, one can also get very good results with only a single reverb device or convolution reverb plugin.

  • Two different reverb devices can help you to achieve different spatial sound aspects.
     
  • A very compact, rather short plate reverb (EMT 140 Simulation, Plate etc.) but rich in early reflections for a voluminous and assertive voice
     
  • A longer, finer reverberation for embedding the voice in the playback, for depth and a refined appearance
     
  • High-frequency components in the reverb help promote the shine and radiance of the vocals.
     
  • If you want to make the reverberation more natural and inconspicuous, on the other hand, you should dampen the high-frequency components somewhat (Low-Pass/High-Cut Filter). This corresponds to the normal decay behaviour of echoes in nature.
     
  • An often longer pre-delay (not rarely up to 100 ms, depending on the rhythm of the song) decouples the direct sound component of the voice from the reverberation. The voice thus gains significance and clarity, and the impression of closeness – despite substantial reverberation – is preserved.
     

Here is a sample representation of the parameter settings of two algorithmic reverb devices which illustrate what has just been described: 

Delay (Echo)

An effect closely related to reverb, which plays an almost equally important role in the editing of vocals in the mix and is one of the standard audio editing tools, is echo, mostly known as delay. As we learned in the previous chapter on reverb, echo effects are, strictly speaking, the acoustic basis for what we designate as reverb and attempt to simulate with the help of complex algorithms or elaborate convolution, whether in hardware devices or virtual plugins. After all, the echoes that are so important for the creation of a compact and more or less long reverb tail are nothing other than short delays which bounce off walls, floors and ceilings, shaped by the nature of the space and its reflecting surfaces, and which overlay one another. The early reflections responsible for reverb creation, and the pre-delay already mentioned, are likewise built on very short delays (in the single- to double-digit millisecond range).

Delays thus carry very important additional sound information which subconsciously helps us to relate an audio event to a surrounding space typical of it. With this information, we can understand how far away from us a sound event is taking place, and what the space and its reflective surfaces might be. A delay, in connection with (and ultimately as an important component of) reverb, is therefore the effect which allows us to give a relatively dry audio signal with little natural spatial information, such as a vocal recording, a three-dimensionality suited to the respective production and the desired sound concept – an artistic reality, created by us, for the listener.

Technically speaking, delay is not a complex effect. A signal present at the input of the device is output again after an adjustable delay time. If the delayed signal is fed back into the input, to be delayed there once more, the result is a so-called feedback loop, which continues for as long as the audio signal retains enough energy to be thrown back on itself.
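The following minimal sketch (mono NumPy signal assumed) unrolls exactly this feedback loop: each pass multiplies the signal by the feedback factor and shifts it later by the delay time, until the repeats die away.

```python
import numpy as np

def feedback_delay(x, sr, delay_ms, feedback, repeats=12):
    """A signal at the input reappears at the output after delay_ms;
    feeding the output back to the input yields a decaying echo chain."""
    d = int(sr * delay_ms / 1000.0)
    out = np.zeros(len(x) + d * repeats)
    out[:len(x)] += x                          # direct signal
    buf = np.copy(x)
    for k in range(1, repeats + 1):
        buf = buf * feedback                   # each pass loses energy...
        out[k * d:k * d + len(buf)] += buf     # ...and arrives d samples later
        if np.max(np.abs(buf)) < 1e-4:         # the loop has died out
            break
    return out
```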

Tape Delay

The concept of delay, which is simple in itself, can be found in different device types (in a slightly modified form and with correspondingly different parameterisation) creating a wide variety of options for editing audio material.

A so-called tape delay simulates the echo behaviour of the original hardware devices. In these early devices, the delayed signal was produced by successively recording and replaying the signal on tape, using several record/play heads arranged in a row. The delays achieved in this manner were mostly of increasingly poor sound quality, as the signal lost more and more of its high frequencies with each repetition and became increasingly distorted.

This gradual worsening of the delay sound became so characteristic and striking that its distinctive sound decisively shaped entire music styles (e.g. reggae, dub, etc.). This sound behaviour can be elaborately simulated and most authentically reproduced in today’s virtual tape delay plugins. To do this, the individual delay repetitions are edited (more or less intensively) with high-cut filters, special EQ settings, compression and distorting tape saturation simulation, so that the delay signal acquires that coveted vintage sound.

Many legendary tape delay device emulators also still offer massively extended filter and editing options, with which very interesting and unusual effects can be achieved, including sound design options.

Sound Example 5

Multi-Tap Delay

So-called multi-tap delays are another kind of contemporary delay device/plugin, today delivered mostly on a virtual basis. They offer a variety of simultaneous and very flexibly selectable signal taps. These allow individual delay signals to be edited individually with filters and EQs, and also to be rhythmically synchronised precisely with the song tempo in different note values. This comprehensive parameterisation can give rise to very intricate repetition patterns, capable of turning a possibly one-dimensional and unspectacular dry signal into a real wonder of acoustic and rhythmic complexity. In electronic music in particular, these complex delays are readily applied to all possible instruments, giving the track a special mood and atmosphere. With vocal signals, too, the sound effects of a multi-tap delay work very well (especially in electronic genres); they can help an edited voice remain impressive in the mind of the listener. The strong recognition effect can help to make a “normal” song a hit.

Sound Example 6

Multi-Effect Delay

If one combines the concept of multi-tap delay along with the ability to modulate the individual parameters of other source signals, thus letting them change dynamically, one can get very complex multi-effect delays or even modulation delays, with which particularly unusual effects can be created. Instrumental or vocal signals edited in this way can take on completely different sound characteristics and ultimately be “edited up” until they are completely unrecognisable, which in turn can be very practical and helpful for many applications, especially in electronic music.

Exactly which kind of delay you decide to use when editing vocal recordings, depends very much on what kind of music you are currently producing and your specific sound goals for the production. In just about every case, you will want to set the vocal delay in rhythmic relation to the tempo and groove of the song. A classic application of this technique often used by sound engineers when mixing vocals is to time the delay effects to the song’s tempo in eighth or quarter notes.

You can also get quite good results with triplet or dotted delay time values. The delay times and the feedback values should not be selected too long or too high, respectively, lest the repetitions of one sung passage run into the words of the following phrase, and so on, until sooner or later you are left with a mass of incomprehensible gibberish. If the individual vocal phrases are separated by long pauses, somewhat longer delay values become possible.
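The underlying arithmetic is simple: a quarter note lasts 60,000 / BPM milliseconds, a dotted value is 1.5 times as long, and a triplet value two-thirds as long. A small helper, as a sketch:

```python
def delay_ms(bpm, note="1/8", flavour="straight"):
    """Delay time in milliseconds for a tempo-synced echo."""
    quarter = 60000.0 / bpm                     # one quarter note in ms
    base = {"1/4": quarter, "1/8": quarter / 2, "1/16": quarter / 4}[note]
    factor = {"straight": 1.0, "dotted": 1.5, "triplet": 2.0 / 3.0}[flavour]
    return base * factor

# At 120 BPM: 1/8 = 250 ms, dotted 1/8 = 375 ms, 1/8 triplet ~ 166.7 ms
print(delay_ms(120, "1/8"),
      delay_ms(120, "1/8", "dotted"),
      delay_ms(120, "1/8", "triplet"))
```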

If the individual sentences follow each other very closely, you will have to work with very short delay times (if any at all). One form of editing which is quite often practised is to let the delay effect begin only at the ends of sentences/phrases, or at least to dampen it significantly within sentences. This can be achieved with volume automation of the delay send level, by turning it down, or even by completely muting the delay channel. How to ensure the intelligibility of the song despite the use of delays must be decided case by case, depending on the audio material.

One should also consider the spatial positioning of the delay signal in the stereo panorama. A delay does not always have to occupy the entire stereo width; it is often more effective to place a delay return on one particular side and keep it mono.

Sound Example 7

Cross-Routing Reverb and Delay

In many cases, it makes sense to route the individually controlled send effects, reverb and delay, back into each other by so-called cross-routing. Part of the delay effect channel’s level can be fed into the reverb effect channel using a send. The delay is thus partly processed with additional reverb, resulting in a very dense effect that differs significantly from parallel signal routing of the two effects. Alternatively, a portion of the reverb channel’s signal can be sent into the delay effect channel. You should also not be afraid to edit the effect channels extensively with additional insert effects. Very interesting sounds may be created by compressors, EQs/filters, distortion and more complex modulation effects that heavily process the reverb/delay signal in the inserts. As so often, the motto here is: experimenting with routing possibilities is not only allowed, it can also lead to very interesting and useful effects – there is no right or wrong, anything that sounds good is allowed.

As a starting point for processing vocal signals with delay in the mix, we want to make a suggestion here. As already mentioned, however, other processing possibilities and configurations are just as valid and goal-oriented.

  • Depending on the tempo and rhythm of the song as well as the density of the individual vocal phrases, you should choose 1-3 different delay effect channels, which are individually applied in the DAW. You can name these channels, for example, DEL 8th, DEL 4th, DEL Tape and assign separate aux or bus routings to them.
     
  • The individual delays are loaded into the inserts of the effect channels (e.g. a simple delay in 1/8 note rhythm, one in 1/4 note rhythm, and a tape delay with dotted 1/8 note groove).
     
  • The feedback of the individual delays is kept very moderate, especially if the text phrases of the vocals are already quite dense. Feedback values of about 10-15% are usually enough. The tape delay’s feedback can be a bit more pronounced.
     
  • The delay plugins are all set to 100% wet, which means that the incoming send portions of the vocal channels are processed at 100%. The faders of the effect channels remain at the 0 dB position, but you can distribute the three delays a little in the stereo panorama if necessary.
     
  • Three different sends with their individual levels now feed different signal components to the delay effect channels. This does not have to be much; very low send levels are usually sufficient for the desired effect. As already mentioned, the send levels are tapped in POST-fader mode, which means that if the channel strip volume is reduced via the fader, the tapped send signal also becomes correspondingly quieter. Although there are situations in which the exact opposite is desired and can be realised via a PRE-fader configuration, the POST-fader variant is usually better for this purpose (see the sketch after this list).
     
  • If you want to use even more prominent and distinctive delay effects, you can either create new additional effect channels with corresponding multi-tap or modulation delays, or you can edit existing delay channels with corresponding insert plug-ins.
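As promised above, here is a minimal sketch of the difference between the two tap points, assuming mono NumPy signals and linear gain factors:

```python
import numpy as np

def post_fader_send(x, fader_gain, send_level):
    """POST-fader: the send tap sits after the channel fader, so pulling
    the fader down also lowers what reaches the delay channel."""
    channel_out = x * fader_gain
    return channel_out, channel_out * send_level

def pre_fader_send(x, fader_gain, send_level):
    """PRE-fader: the send tap is independent of the fader position."""
    return x * fader_gain, x * send_level

vocal = np.random.randn(1000) * 0.1
# Halving the fader also halves the post-fader send signal...
_, send_post = post_fader_send(vocal, fader_gain=0.5, send_level=0.2)
# ...but leaves the pre-fader send untouched.
_, send_pre = pre_fader_send(vocal, fader_gain=0.5, send_level=0.2)
```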
     

Cross-routing in Cubase (Lead Vocals to Reverb + Delay and Delay to Reverb)

Cross-routing in Pro Tools (Lead Vocals to Reverb + Delay and Delay to Reverb)

Likewise, dynamic automation of the individual parameters or the applied send level can help to create more unusual and exciting effects.

Sound Example 8

Other Send-Effects

In addition to classic echoes and reverbs, there is a whole range of other effects, which can easily be controlled with an aux send and added to the mix at will. We would like to have a brief look at a few typical representatives of these here:

Slapback

Slapback is essentially nothing more than a simple, one-time echo. This one-time repetition originates from tape machines: such devices have a small spatial distance between the record and playback heads, and – depending on the tape speed and the distance between the heads – a certain period of time passes before the recorded signal is output again.

In the digital age, slapback is achieved by setting a simple delay (without feedback) on the send channel. Common delay times here are approx. 45–70 ms. To make the delayed signal a bit more interesting in terms of sound and a bit more tape-like, it is not uncommon to give it some distortion and to lower the highs. A slapback can also lead to interesting results in combination with other effects.
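The original tape-machine timing is easy to reconstruct: the delay equals the head gap divided by the tape speed. A quick worked sketch (the head gap value is hypothetical):

```python
def tape_slapback_ms(head_gap_cm, tape_speed_ips):
    """Delay produced by the record/play head gap of a tape machine.
    Tape speed in inches per second (common speeds: 7.5 or 15 ips)."""
    speed_cm_s = tape_speed_ips * 2.54
    return head_gap_cm / speed_cm_s * 1000.0

# A 2 cm head gap: ~52 ms at 15 ips, ~105 ms at 7.5 ips -
# in (or just above) the classic 45-70 ms slapback range.
print(tape_slapback_ms(2.0, 15), tape_slapback_ms(2.0, 7.5))
```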

Sound Example 9

Slapback Delay


Widener

The word “widener” is, strictly speaking, a collective term for tools which allow for spatial broadening. In popular music, one often has a mono signal for the lead vocals and would like to stretch it out to a more space-filling width. If one already has a stereo signal, a simple broadening can of course be achieved by raising the side channel, but this has no effect on a mono signal, since there is no side component at all. For broadening mono signals, there are different techniques.

The following, for example, are suitable for giving a mono signal a certain width: a short stereo delay, a doubler or a stereo chorus (see also the “Classic Modulation Effects” chapter). Other techniques work by processing the left and right audio channels with different frequency responses. A one-sided time delay also creates a stereo image – but be careful: the perception of the signal then shifts towards the side on which it sounds first. One simple widening technique which is usually quite effective with vocals is to detune the vocal track on a send channel slightly differently on the left and the right side.

It is often sufficient, for example, to pitch the left channel one-tenth of a semitone lower and the right channel one-tenth of a semitone higher to get a “floating” width – this technique works very well, especially with singing. Of course, all these broadening techniques should not be used in place of doubling; rather, they can help to achieve additional “size” in the mix.
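Here is a minimal sketch of that detune trick, using librosa’s pitch shifting as one convenient (offline) way to shift by a tenth of a semitone; a real-time widener plugin would use a low-latency pitch shifter instead.

```python
import numpy as np
import librosa  # pip install librosa - used here for a quick pitch shift

def micro_pitch_widener(mono, sr, cents=10):
    """Detune a mono source a tenth of a semitone down on the left and
    up on the right; the tiny discrepancy reads as stereo width."""
    steps = cents / 100.0                                  # cents -> semitones
    left = librosa.effects.pitch_shift(mono, sr=sr, n_steps=-steps)
    right = librosa.effects.pitch_shift(mono, sr=sr, n_steps=+steps)
    return np.stack([left, right], axis=-1)                # L/R stereo pair

# Blend the widened copy under the dry mono vocal like a send effect,
# rather than replacing the dry signal with it.
```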

Sound Example 10

With Mono-to-Stereo


Without Mono-to-Stereo


Routing is Everything!

In modern mixing practice, several different send effects are usually used on vocals which, through well-thought-out cross-routing (i.e. mutual influence), often converge with the dry vocals on a bus in the end and are then “glued” together again by bus compression. Only when you listen closely to current music mixes do you notice how many different send effects were used. Usually it is precisely the combination of many different effects in relatively small proportions, and their intelligent interplay, which creates a sound image that really shows the vocals to their best advantage.

We have already shown that a cross-routing between delay and reverb is a very sensible procedure. As already explained, however, this, of course, applies not only for reverbs and echoes, but basically for all conceivable signals which may be applied in a mix.

For each one, you can ask yourself into which other effects the signal should be fed, and at what ratios the effects should work together. It is not uncommon for the chosen cross-routing to have a considerable influence on the hierarchy of the signals and their spatial effect in relation to each other. It is therefore worthwhile to explore different routing options, signal flow sequences and ratios in a playful way, and always to try something new and unconventional once again.

One should also not forget the sidechain function of gates and compressors! This is how the famous “gated snare” effect came about, in which the reverb of a snare was gated by the snare signal. It results in an unnaturally large, choppy reverberation space, the likes of which shaped 80s pop music enormously. This too is, of course, a conceivable effect on vocal signals. Another technique frequently used in modern pop music is, for example, ducking your own echo. The procedure is relatively simple: the dry voice is sent to a delay via send. On the delay channel, there is also a compressor, which should compress the echo.

On this compressor, however, the external sidechain is activated, so that it responds not to the echo signal itself but to an externally supplied signal. If you send the dry vocals to this sidechain input, the echo is turned down whenever the singing sounds. In this way, a relatively loud echo can be used and a compact “vocal carpet” created, without it competing unfavourably with the direct signal. The vocals stay firmly at the front, and the echo tail only really blooms in the singing breaks.
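A minimal sketch of this ducking, assuming mono NumPy signals of equal length: a simple envelope follower on the dry vocal acts as the sidechain detector and pulls the delay return down whenever the voice is active. Threshold and depth values are illustrative.

```python
import numpy as np

def envelope(x, sr, attack_ms=5, release_ms=120):
    """Simple peak follower used as the sidechain detector."""
    a = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    r = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env, level = np.zeros_like(x), 0.0
    for n, v in enumerate(np.abs(x)):
        coef = a if v > level else r          # fast rise, slow fall
        level = coef * level + (1 - coef) * v
        env[n] = level
    return env

def duck_delay_by_vocal(delay_return, dry_vocal, sr,
                        threshold=0.05, depth_db=-12.0):
    """Whenever the dry vocal exceeds the threshold, the delay return is
    pulled down by depth_db; in the singing pauses it comes back up."""
    env = envelope(dry_vocal, sr)
    gain = np.where(env > threshold, 10 ** (depth_db / 20.0), 1.0)
    n = min(len(delay_return), len(gain))     # assume aligned signals
    return delay_return[:n] * gain[:n]
```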

In mixing practice, moreover, it is not just dedicated send effects that are mixed in on additional channels; dynamics, distortion and EQs are also used on send channels. Classic parallel compression (“New York style”), compression or frequency editing of reverbs or echoes, and even the isolation of certain frequency ranges for more controlled use, are all popular mixing tools among professionals. One can, for example, develop a “vocal shimmer” by isolating only the highs of the vocals on a send channel and condensing them with relatively strong compression (together with de-essing, if necessary). This “shimmer” can then be added to the mix in a carefully measured manner.

As these examples show, the same is true of the various send effects as of the entire mix: only the interaction of the different elements produces the overall sound image, and it is well worth experimenting with different routing options. In this area in particular, the possibilities are endless.

Exercise 4

  1. What is the difference between destructive and non-destructive work?
     
  2. In which frequency ranges can the components of a voice be classified (fundamental range, presence, sibilants)?
     
  3. How can a low-cut filter help when it is used on vocal tracks?
     
  4. How is an excessively long attack time expressed in sound terms in the compression of vocals?
     
  5. Moreover, how is an excessively long release time expressed in sound terms?
     
  6. Which parameter is influenced by the hard/soft knee in a compressor?
     
  7. Outline how an algorithmic reverb works.
     
  8. If you had to choose the reverb for a vocal recording of a classical song, would you prefer a convolution reverb or an algorithmic reverb? Why?
     
  9. Why can it make sense to send the delay into the reverb along with the vocals?
     

Pitch Correction

One of the biggest hurdles on the path to a perfect vocal performance is the intonation of the singer, which should be correct and as clean as possible. During the recordings, along with the artistic emotional expression, the singer should above all hit the melody notes with the correct and appropriate vocal colouring, dynamics and the corresponding lyrics.

However, it is not just major intonation errors that should be avoided; the singer should also pay attention to correct micro-intonation. There are many (mostly inexperienced and untrained) studio singers who tend to intonate too low (flat) or too high (sharp). This may also be due to a headphone mix that is not optimally set: an excessively loud mix in the headphones tends to lead to sharp intonation, while a mix that is too quiet in many cases leads to flat vocals.

What should be done if you have been unable to fully eliminate these strong and/or minimal pitch fluctuations during recording? Subsequent pitch correction is necessary – an operation which, in earlier decades, was possible only with the help of complex pitch shifting and time stretching of the audio passages in question.

The resulting adjustments to the pitch of the vocal signal were mostly accompanied by rather poor and, above all, clearly audible changes in quality. No one would have claimed that such pitch correction was unobtrusive.

Manual Pitch Correction

Behind the bulky term “manual pitch correction” lies what is probably the most controversial trick of sound technology: correcting a singer’s wrong notes! This is perhaps an oversimplification.

Tools and editors which offer manual pitch correction are capable of analysing audio material (mainly vocal signals) and detecting the course of its pitch. This pitch information can then be used, in a manner similar to a MIDI editor, to edit the song’s melody.

Studio One with Melodyne (via ARA2)

Logic Pro X with Flex Pitch

Cubase Pro with Variaudio

Melodyne – the Revolution

At the start of the 2000s, the German company Celemony developed an independent and at the same time revolutionary way to edit the pitches of audio material graphically and with the highest sound quality, in the form of Melodyne, whereby monophonic or polyphonic audio material is analysed in detail and its pitches, note lengths and levels are displayed graphically.

Following the analysis, all these parameters are freely accessible, and they can be edited and shifted with total flexibility. This allows not only for pitch corrections of the highest audio quality; the length, articulation, volume, transitions, formant components and vibrato of individual phrases, words and sounds can also be edited individually.

The resulting corrections are very realistic within a wide range of values and represent the absolute professional standard with regard to audio manipulation, time-stretching and pitch correction. With Melodyne it is, to a certain extent, possible to transform any vocal performance into a completely different melody and rhythmic structure.

The possibilities of manipulation are almost frightening, and one can certainly argue about the sense and nonsense of such powerful manipulation. In any case, Celemony’s Melodyne offers the most impressive and comprehensive range of functions for editing vocal tracks and can therefore be recommended as the absolute standard for professional vocal editing and/or creative sound design for any sound engineer.

Auto-Tune

But there is also a fully-automatic form of pitch correction which occurs (at least approximately) in real-time! The effect is quite well-known as “auto-tune”, even though this is actually a brand name.

The software plugin Auto-Tune from the company Antares was revolutionary in this area, as it enabled authentic-sounding pitch correction for the first time by forcing the pitch of the input signal’s melody to match the predefined frequencies of a reference scale. If any pitches were detected in the original signal which did not coincide with the values of the target scale, they were adjusted accordingly.

The amount and speed of the corrections can be freely adjusted. By quickly and radically snapping pitches to just a few “allowed” reference tones, very distinctive and immediate frequency jumps emerge at note transitions. These were first used as a conscious sound effect, and taken to extremes, in the mega-hit “Believe” by the artist Cher.

The resulting sound effect, applied to a human singing voice, was so characteristic and had such a strong recognition value that people still speak of the so-called “Cher effect” today. It is actually the result of an extremely fast retune-speed setting combined with a drastic scale limitation in Auto-Tune.
This world-famous effect basically came about through “incorrect”, or at least very unorthodox, handling of the plugin – a prime example of innovation through unprejudiced experimentation with the tools at hand.
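Conceptually, the effect is pitch quantisation. The following sketch is not Antares’ algorithm, just a minimal illustration of the principle: a detected pitch track is snapped to the nearest note of an allowed scale, with a “retune speed” controlling how abruptly the correction is applied (values near 1.0 give the hard “Cher” snap, small values a gentle correction).

```python
import numpy as np

A4 = 440.0
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]               # allowed semitones per octave

def nearest_scale_hz(f0):
    """Snap a detected frequency to the nearest note of the target scale."""
    midi = 69 + 12 * np.log2(f0 / A4)
    octave, semi = divmod(midi, 12)
    target = min(C_MAJOR + [12], key=lambda s: abs(s - semi))  # incl. wrap
    return A4 * 2 ** ((octave * 12 + target - 69) / 12)

def retune_track(f0_track, speed=0.15):
    """speed near 1.0 = instant snapping (the 'Cher effect');
    small values = gentle, largely inaudible correction."""
    out, current = [], float(f0_track[0])
    for f0 in f0_track:
        target = nearest_scale_hz(f0) if f0 > 0 else current  # hold unvoiced
        current += speed * (target - current)  # glide towards the scale note
        out.append(current)
    return np.array(out)
```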

Moderate and gentle use of Auto-Tune, on the other hand, leads to quite good results when editing vocal signals, and unwanted intonation fluctuations are corrected safely and largely inconspicuously. However, you should really only edit the critical passages of the vocal performance with the options Auto-Tune offers – the effect is usually not completely inaudible, even with the most careful use.

Today the effect can be found in countless rap, trap and cloud rap productions; it has massively shaped the current epoch of hip-hop music. But Auto-Tune and its relatives are also used extensively in pop, EDM, rock and folk, even if usually less noticeably.

In addition to the various versions and offshoots of the classic “Auto-Tune” from Antares, there are of course many other suppliers of comparable software as well. Incidentally, in everyday life in modern music production, it is advisable to have a few different versions of these tools in stock. In many cases, a direct comparison shows that different Autotune tools do not function equally well with a specific voice. It is, therefore, difficult to provide a general assessment; rather, it is always a question of the desired effect and the respective voice. Here are a couple of suggestions for different Auto-tune tools: Waves Tune (Real-Time), GVST GSnap, Auburn Graillon or Izotope VocalSynth. In addition, many of the commercially available DAWs already include real-time pitch correction. These alternatives are definitely also worth trying!

Auto-Tune in Production Usage

The question often arises as to the point in a production at which Auto-Tune should be used. While one artist may insist on hearing the effect at 100% over their headphones during recording, in other cases the better solution is to record the vocals “clean” and wait until the mixing stage to do the pitch correction work, whether to improve intonation or to add a certain sound “colouring” in the vein of the “Cher effect”.

Whether a real-time Auto-Tune effect on headphones during recording is sensible naturally depends on technical feasibility, but above all on the desired sound effect. Many singers deliberately play with the note transitions, which generate a certain “flutter” through pitch quantisation. If such a striking Auto-Tune sound is desired, using the plugin on the headphone mix makes sense – the singer can then “play” the effect during the performance. This approach is frequently used in modern hip-hop, EDM or trap. If, on the other hand, a more natural singing performance is wanted in the foreground, and pitch correction is to be used only as a subtle sound “colouring”, or simply as an inaudible correction, it is advisable to put the vocals on the headphones without Auto-Tune, so that the singer can better control their actual intonation without being distracted from their performance by the effect. In pop or rock music, one would tend to choose this route. Of course, the needs of the artist are as individual as art itself; for this reason, this topic should never be decided without consulting the singer.

In any case, however, the effect is not directly recorded, meaning that, afterwards, you also have the opportunity to change settings or to apply plugins prior to the pitch correction.

As a matter of principle, it should be borne in mind that real-time editing represents a certain technical challenge when it comes to headphone monitoring. As with any digital editing, a certain buffer time is required for calculation in the case of pitch correction as well – and this, of course, creates a certain delay on the headphones. Exactly how long this period is depends on the complexity of the editing algorithm, the available CPU power and, of course, the other tasks that have to be calculated alongside it. Under such circumstances one needs a relatively powerful computer and a resource-saving algorithm, and it also makes sense to close, or bounce/freeze, all unnecessary background programs and plugins, to keep the latency as low as possible. For more complex vocal recordings, a dedicated session is often set up in which only exported group tracks of the playback are available. Since these only need to be played back, with all effects already printed, the studio computer is significantly relieved. During recording, a simpler Auto-Tune effect can also be used and later replaced by a more resource-intensive algorithm.

Some audio interfaces also offer the option of creating a mix, including complex effects, in near-real-time, with the help of internal processors. This solution allows you to apply Auto-tune effects to the voice during recording without being dependent on the computing power of the recording computer.

Sadly, the term “Auto-Tune” has taken on a negative meaning on account of its frequent use as a strong effect. Yet it is also a helper which can work almost invisibly on a vocal track, correcting the voice to a predefined scale. This eliminates the need for costly manual correction if the singer has not sung perfectly in tune.

Here are some examples of both the “effect usage” and the soft, “invisible” use. In both examples, Antares Auto-Tune Pro was used.

Sound Example 11

Why Make Corrections Anyway?

The question of whether one should correct vocal recordings in terms of pitch and timing initially seems somewhat banal. Obviously, incorrectly-sung tones and poor intonation should be corrected – this much is clear! However, manual or even automatic pitch and timing corrections also find their way onto the recordings of highly professional singers whose performances are immaculate.

In modern music, it is common for the instrumental to be composed entirely of electronic or virtual instruments; pop music, in particular, usually contains only very few “natural” instruments. It should be clear that a virtual instrument, in its full digital splendour, practically never wavers in pitch or goes out of tune. The exact intonation of these instruments can, however, become a challenge when combined with a natural voice. Under these circumstances, even the most natural deviations of the voice become risky, and slight deviations or a strong vibrato might no longer sound completely “clean”. Therefore, pitches may be precisely corrected even in seemingly perfect vocal takes.

Even if such corrections are barely perceptible the first time you listen, they still ensure the usual “perfect” vocal sound, as we have all heard from Rihanna, Katy Perry and Madonna. 

ARA 2

Those who like Melodyne should also like ARA! When it comes to correcting pitch and timing, many different programs have been developed over the past years – Melodyne, Antares Auto-Tune, SynchroArts VocAlign and ReVoice, and more. However, using these as separate editors was only moderately practical: to a certain extent, audio tracks had to be “recorded” into programs like Melodyne again before they could be edited.

However, this changed with the idea of the VST and AU plugin extension ARA. ARA stands for Audio Random Access, and it has revolutionised the pitch correction sector. In 2018 and 2019, version 2 was integrated into various DAWs. A team from Celemony (Melodyne) and Presonus (Studio One) played a key role in its development. In 2019, Steinberg also implemented the ARA extension into the DAWs Cubase and Nuendo.

How does it work?

With classic plugin formats, data can only be edited in real time. This means that for a plugin to work, there has to be a data stream at its input; this stream is received, edited and output again serially. It limits what the plugin can see and edit to the section which is currently being played back.

For editors like Melodyne, which have their own view of the audio events (so-called “blobs”), this is clearly problematic. These plugins essentially require random access to the complete audio data on which the edits are applied. And that is where ARA comes in.

ARA enables permanent access to the audio files present in the DAW’s sequencer – completely independent of the play position. With this, Melodyne and the DAW sequencer can work as independent editors while always knowing what the other is doing. If one moves a clip in the DAW, the event is automatically moved in Melodyne as well. If the length of a note is changed in Melodyne, it is immediately reflected in the DAW sequencer. This greatly simplifies the workflow and speeds up work with programs such as Melodyne or VocAlign.

Tips and Tricks for Vocal Processing

In the following we would like to offer some useful tips and tricks derived from regular experience in the recording studio. These are applicable to daily work and especially for post-recording vocal signal processing. Some of these are based on principles and concepts already introduced in previous chapters.

Recommended Routing Setup for Vocal Mixes

A suitable routing setup will, of course, look slightly different for each music production and will be adjusted to its respective requirements: the number of tracks and the integration of effects. However, it makes sense to consider a standard routing setup which can, in many cases, be used as a general template. This provides a quick session overview, since the same configuration can always be found in a similar place, and consequently enables faster and more purposeful work.

  • It is important to name the individual tracks in a meaningful and clear manner and, if appropriate, give them a specific colour scheme, most of all with regard to the routing configuration of vocal tracks during a given session.
     
  • You should give some real consideration as to whether you will route each of the vocal tracks individually and directly to the stereo sum (the main stereo output), or whether to combine them into one or more submix channels. A vocal stereo submix which combines all vocal signals before the main out offers the advantage that the vocals can be quickly removed from the signal path, or soloed, with a single click on the channel’s mute/solo switch. This can be very helpful in many situations. In addition, you could apply common processing to all vocal signals via the submix channel’s inserts – perhaps a slight common compression, a bit of tape saturation or a final EQ. Using this method you could, in some situations, even give all vocal signals a shared reverb or delay effect via a send on the submix channel; however, it is usually better to route reverb and delay effects via the individual sends of the individual channels.
     

Parallel Compression and Parallel Processing

One popular type of compression treatment, called parallel compression, was originally used to gain flexibility when compressing drums and percussive signals, and has become quite common for taming the dynamic range of a vocal signal. As early as the 1960s, this practice was known as the “New York Drums Trick” or the “Motown Trick”. The principle behind this compression technique is quickly explained: parallel to the actual vocal signal, which remains untouched by the dynamic editing, an identical copy of the track is strongly compressed using an extreme compressor setting.

The compressor settings required for this have such an unusually strong effect that signals compressed in this manner sound, heard alone, squashed almost to the point of being unrecognisable. However, if this supercompressed signal is quietly mixed in under the main vocal track at a subtle volume level, the combined result receives a convincing and effective kind of stability. The mixture of the two signals does not really sound compressed; nevertheless, it has all the sound characteristics of a sufficiently compressed vocal signal. The advantage of parallel compression is that none of the negative aspects normally found in strong compression become evident.
 

  • The percussive transients at the beginning of words and consonant syllables remain completely preserved in the unedited signal, which the listener will perceive as a transparent and direct naturalness of sound and performance. In addition, the transients define the “punch” and the liveliness of the vocal performance.
     
  • The strongly compressed parallel signal ensures a constant, wide and compact sound basis for the overall sound which, when subtly mixed together, gives the combined output signal a noticeable (rather than directly audible) kind of fullness and stability. The overall sound becomes significantly more powerful and assertive, without sounding noticeably compressed.
     
  • Depending on the chosen time constants of the compressor – attack and release – a strongly pumping sustain can be mixed in with the unedited signal, which (with precise settings) fills the time gaps of the song and thus enables a sound density that one could never achieve with normal compression. This can be used very effectively with singing in particular: the “pumping back” of the compressor signal fills the spaces between the individual sentences and phrases, which leads to a particularly compact overall performance.
     

Exactly how you implement the parallel tap of the vocal signal depends on the circumstances and requirements of the session and production, and on the tracks present. One can, for example, simply create an identical duplicate of the original track – this is certainly the easiest way to obtain a parallel signal. Deriving the parallel signal by branching off a portion of the level via the send routing of the DAW mixer is somewhat more complex. In any case, you must make sure that both signals are played back identically, both sample-wise and timing-wise; otherwise, you can experience frequency cancellations resulting from phase shifts. The level of the parallel compression track does not need to be set high; even a very quiet admixture will bring about the desired effect.

It is not just the effect of compression that is available for parallel editing by using duplicate tracks; this can be done just as effectively with effects added to the original signal in parallel, as well as virtual tape saturation emulation, EQing, filtering and all types of distortion.

One good example of such parallel editing in the real world of professional music production is the vocal sound of the Nickelback singer Chad Kroeger. When you listen closely, you can recognise parallel editing characteristics on Chad’s lead vocals, on the band’s early recordings (“How you remind me”/Silver Side Up) in particular, but also on its later hits (“Photograph”/All the right reasons or “I’d come for you”/Dark Horse). Amp simulations or parallel vocals distorted with bit-crusher effects, some of which have a clear tremolo effect, support and compress the main vocal track and give it an unmistakably rough character with a definite presence, despite being used relatively subtly.

Meanwhile, many compressor plugins include a dry/wet control, which makes parallel compression easy without any complex routing. Here, the wet value determines the volume of the compressed signal in relation to the unedited original signal.
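
To make the signal flow concrete, here is a minimal, illustrative Python sketch of such a dry/wet parallel compression. The static compressor is deliberately crude (no attack/release envelope), and the threshold, ratio and wet values are arbitrary assumptions for demonstration, not recommended settings:

```python
import numpy as np

def crude_compressor(x: np.ndarray, threshold: float = 0.1,
                     ratio: float = 20.0) -> np.ndarray:
    """Deliberately extreme static compression: everything above the
    threshold is flattened hard (no attack/release behaviour)."""
    mag = np.abs(x)
    over = mag > threshold
    y = x.copy()
    y[over] = np.sign(x[over]) * (threshold + (mag[over] - threshold) / ratio)
    return y

def parallel_compress(x: np.ndarray, wet: float = 0.2) -> np.ndarray:
    """Dry/wet blend: the original stays dominant, while the
    supercompressed duplicate is mixed in quietly underneath."""
    return (1.0 - wet) * x + wet * crude_compressor(x)
```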

Sound Example 13

Panning – Centre, Left or Right?

Although in most productions the lead vocal track is placed in the centre, you should not regard this rule as obligatory and may experiment with different positioning options in the stereo field. If you want to especially highlight and emphasise specific sentences, phrases or vocal fills, an unusual panorama position can help to emphasise the desired effect.

In most cases, doubled tracks, harmony vocals and choir arrangements should be distributed over the stereo field, and this – in combination with the central lead vocal sound – will yield a wide and impressive sound image.

Stereo Widening with Short L/R Delays

To give the mono recording of a singing voice a breath of stereo and, in this way, make it “wider” in the truest sense of the word, one will normally use a simple stereo delay with different values (adjustable independently of each other) for the left and the right side. Such a delay is used in an insert of an effect send channel, with slightly different, quite short delay values in the millisecond range for each side (e.g. left side 10-15 ms, right side 20-40 ms). The effect is set to 100 % wet – after all, it is supposed to process the entire signal portion sent to it. If you now send a certain amount of the central mono original into the effect send, a moderate level will result in a fine widening of the vocal sound. You might also apply a light modulation of the delay times using an LFO – this creates a more natural beating, and the widening will not be quite as static. The more extreme the differences in the delay times, the wider the sound will be. Of course, you should pay attention to the phase coherence of the signal; for this reason, modest send levels above all are recommended.
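
The following Python sketch illustrates the principle under simplified assumptions: a mono signal, two static short delays (the 12 ms and 30 ms values are examples from the range above) returned 100 % wet and mixed in at a low send level. A real implementation would additionally modulate the delay times slightly with an LFO, which requires fractional-sample interpolation and is omitted here:

```python
import numpy as np

def widen(mono: np.ndarray, sr: int = 44100,
          delay_l_ms: float = 12.0, delay_r_ms: float = 30.0,
          send: float = 0.2) -> np.ndarray:
    """Mono-to-stereo widening with two short, unequal delays.
    Returns an (n, 2) stereo array: the dry mono signal in the centre
    plus the fully wet delayed copies, mixed in at a low send level."""
    def delayed(x: np.ndarray, ms: float) -> np.ndarray:
        n = int(sr * ms / 1000.0)                      # delay in samples
        return np.concatenate([np.zeros(n), x])[: len(x)]
    left = mono + send * delayed(mono, delay_l_ms)
    right = mono + send * delayed(mono, delay_r_ms)
    return np.stack([left, right], axis=1)
```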

Ping Pong Delay in Ableton Live


Doubling – Rich Vocals Through “Real” Duplications

A quite popular and at the same time obvious technique for making lead vocals more powerful and present is to double the main voice. As with guitar recordings, where distorted rhythm guitars in particular are recorded several times as congruently as possible, this method can also work very well on vocal tracks. Between the almost identical double track and the original there are minimal timing and pitch fluctuations, which create a kind of natural chorus effect.

If both (or more) signals are distributed a little across the stereo panorama, you will get a powerful and wide sound. In music styles in which the singing is at the foreground of the sound image in particular, doubling the whole take, or at least individual important words or phrases, is an absolute must and contributes to the genre-typical vocal sound. In rock/pop music and related styles, too, doubling is almost a standard recording technique, employed above all in the convincing sound design of refrains. A vocal track used in a refrain can be given the required lift with doubling (or even tripling or quadrupling); in addition, together with the textual statement, it gains assertiveness and significance.

Doubling of vocal recordings in Cubase

Doubling of vocal recordings in Logic

Doubling of vocal recordings in Pro Tools

To correctly double vocal tracks, in most cases the same singer should also sing a take which is as congruent as possible to the main track. Of course, there are also techniques and situations in which it makes sense to have the duplications carried out by another voice.

However, you will normally get the best results if the same singer sings once again with the same pitch and voice “colouring”. The important thing here is that the singer be able to reproduce the phrasing, pitches and timing of the original take as well as possible. Small fluctuations are allowed to a certain extent and even desired; however, it becomes difficult as soon as the timing audibly diverges or the phrasing no longer matches the original. At that point, duplications no longer offer psychoacoustic support; rather, they are perceived as independent vocal voices, which will only confuse and irritate the listener instead of creating a “powerful sound”.

When it comes to duplication, special attention is required not just for an as-precise-as-possible copy of the original take (i.e. pitch and timing) but also for careful handling of the conspicuous consonants, e.g. “S”, “T”, “K”, “P” or “B”. Even for very experienced studio singers, it is almost impossible to sing the consonants exactly congruently take after take, and this can result in strange hissing effects and doubled transients. When both vocal tracks are played together, they stand out as separate tracks – an effect you certainly want to avoid.

A common solution for this problem is to ask the singer to leave out the sharp consonants in the duplication, or at least to attenuate them significantly. Of course, this will not come easily to the singer – it takes quite a bit of experience to be able to sing sentences that sound this strange. One alternative is to cut out or attenuate the disruptive doubled consonants in the editing stage. This usually involves extensive editing work; however, very useful results can be achieved from doubled vocal performances this way.

Decrease of the sibilants in the doubled tracks (by automation)

Additional Tips and Tricks

  • With a slight variation of the voice “colouring” used for the duplications, one can create a fine contrast to the lead voice. In many cases, it is advisable to sing the duplications with somewhat less character, i.e. in a more “neutral” way. After all, on a sound level, duplications should compete as little as possible with the main voice.
  • Voice colour variations represent one opportunity to achieve an interesting contrast to the lead vocal; the pitch/register is an additional parameter which can be used effectively. This opens up many possible sound combinations, from upper or lower octaves to breathed or whispered duplications. Interesting sound results can be achieved with the latter two in particular – and of course, basically everything that serves the desired sound vision is allowed.
  • In music genres in which the vocals are the dominant element (hip-hop, rap, R&B etc.) in particular, so-called “shouting” is recommended for duplications of individual words or phrases, to give them more expression and weight. These duplications are usually not sung/spoken with deliberate precision; rather, they are deliberately articulated and phrased in a more aggressive and expansive manner. They thus become conspicuously massive and “rough”; unlike typical duplications in rock/pop music, they take on an independent character and help define the recognition value of the refrain or the textual statement.
  • Background voices (duplications, backing vocals, ad-libs) are usually mixed somewhat duller than lead vocals, so that the two are more clearly separated from each other and do not compete. In addition, background voices tend to be compressed somewhat more strongly and given a bit more space. During the recording, it can make sense to select a duller microphone and/or a somewhat greater microphone distance for background voices.

Audio Alignment

These days, there are very useful tools which – mostly in the form of plugins – simplify the editing and alignment of duplicate takes and, in part, even automate it. One of these specialised plugins is VocAlign from SynchroArts. This tool compares the amplitude envelopes of the individual syllables and phrases of both the original and the double track and corrects the double with time-stretch algorithms or automatically set cuts and fades. Since version 10, Cubase has also offered this function under the name “Audio Alignment”. The result – provided the source takes are usable enough – is quite exact and synchronous duplications, which are suitable both for thickening lead vocals and for creating compact choir arrangements.
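
The following Python sketch illustrates the basic idea in a greatly simplified form: it compares the amplitude envelopes of two takes by cross-correlation and estimates a single global offset. Dedicated tools such as VocAlign go much further and correct the timing continuously, syllable by syllable, with time-stretching; this sketch only covers the first step of such a process:

```python
import numpy as np

def envelope(x: np.ndarray, win: int = 1024) -> np.ndarray:
    """Coarse amplitude envelope: RMS over non-overlapping windows."""
    n = len(x) // win
    return np.sqrt(np.mean(x[: n * win].reshape(n, win) ** 2, axis=1))

def align_offset(lead: np.ndarray, double: np.ndarray,
                 win: int = 1024) -> int:
    """Estimate the global sample offset between two takes by
    cross-correlating their amplitude envelopes."""
    e1, e2 = envelope(lead, win), envelope(double, win)
    m = min(len(e1), len(e2))
    corr = np.correlate(e1[:m] - e1[:m].mean(),
                        e2[:m] - e2[:m].mean(), mode="full")
    lag_windows = int(corr.argmax()) - (m - 1)
    return lag_windows * win  # positive: the double starts too early

# the double track would then be shifted by the returned sample count
```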

SynchroArts VocAlign Pro

Audio Alignment dialog in Cubase 10

Sound Example 5

Special Cases in Vocal Production

We have already looked at the techniques one can use to record most forms of singing, but a “clean” singing take is not always what is wanted, and there are definitely vocal performances which certainly have a musical element but cannot be counted among those using classical singing techniques. In what follows, we would like to look at a couple of these individual cases and discuss the unique features which come into play and should be considered during recording/mixing.

Vocals in Hip-Hop and Rap

Chant-oriented music styles have developed into very influential genres over the past decades. In the case of hip-hop and rap, rapped vocals stand in the foreground very clearly as they define and shape the music. Given the commercial success of these genres in recent years, this principle has also been reflected in other styles of music. As a result, rapped vocals can be repeatedly found in many styles of today’s black music, R&B and nu-soul, but they can also be found in indie rock, EDM and nu-metal. In addition, there are naturally also many special kinds of rap in modern hip-hop and pop derivatives, such as trap, emo rap or cloud rap. This type of singing has long since developed beyond the narrow genre boundaries of hip-hop music and is now indispensable as an independent element of modern pop music.

When it comes to recording rap vocals, first and foremost the same basic principles apply as with the recording of conventional sung vocals. Nevertheless, spoken vocal content automatically sits much more clearly and immediately at the centre of the listener’s attention, which is why a number of things need special consideration during the recording and editing of these performances.

Presence, Clarity and Assertiveness of Rap Vocals

If, when dealing with sung vocals, one often strives to “embed” them (in the truest sense of the word) in the overall sound structure of the mix and the arrangement of the other instruments as harmoniously and elegantly as possible, then the opposite is usually true in the case of spoken vocals. Vocals in hip-hop/rap should, first and foremost, be clear and distinct, and easily understandable, with a presence in the foreground of the mix. Rap music is usually characterised by an overly present “in your face” character in the vocals, which also has much to do with the underlying aggressiveness and energy of its artistic performances, and, in many cases, with the textual statement as well. The vocals, therefore, should have an unrivalled presence at the foreground of all musical events – the accompanying arrangements are often very strongly reduced, with an absence of showy and grandiose individual instrumental performances, in order not to distract from the textual statement of the vocal performance. 

All recording preparation and subsequent editing should be subordinated to the goal of guaranteeing a strong concentration on the spoken vocals and the textual content:

  • When selecting a microphone and EQ-ing for rap vocals, the classic focus is primarily on the presence area and the highs, because this is where speech intelligibility and the perceived proximity to the listener are located. Consequently, rather bright condenser microphones are mostly used for rap recordings, and the high range is often raised by another few dB; the fundamental range is kept in proportion to this (a minimal sketch of such a presence lift follows this list). Naturally, practice differs a bit between genres and scenes – dynamic microphones, for example, are used more frequently in America than in Germany, and in genres such as trap or cloud rap, speech intelligibility is often no longer seen as essential, which is why a somewhat duller and muddier vocal sound can definitely work in these styles.
  • Even during the recording stage, you should keep the voice as dry and direct as possible, without much room. There is a very simple reason for this: spoken vocal lines contain a significantly larger number of words than sung ones. If you use long-lasting room ambience or even reverb here, words – and with them the textual statement and, not least, the energetic “punch” and aggressiveness of the performance – would inevitably and literally sink into the reverb. Words or parts of words would be obscured; the entire expressiveness and, ultimately, the decisive style-defining characteristic of the genre would suffer if you were to work with too much room. For this reason, an atmosphere of a (preferably small) recording area which is as dry as possible – ideally acoustically optimised in a truly “compact” way – is a recommended spatial environment when recording rap/hip-hop vocals. A good mix of absorbed space and high sound diffusivity will also deliver very good results here.
  • The subsequent use of effects is rather limited in classic hip-hop genres. Reverb here is mostly taboo; instead, one essentially works with short and subtle delays or ambiences, to give the speaking voice sufficient dimension and liveliness. Often, however, the recorded room sound of the recording space alone is sufficient to achieve this characteristic.
  • In some more modern styles, such as cloud rap, emo rap or trap, on the other hand, effects are sometimes employed very unsparingly – in addition to the excessive use of autotuning, very prominent reverb spaces, delays, vocoders and distortion are often used. On the one hand, speech intelligibility in these styles is often no longer as crucial as it was in the more classic hip-hop genres; on the other hand, and relatedly, there is usually more time between the text lines, allowing the reverb and echo tails to develop quite well without having to compete with the text all that much.
  • Naturally, compression also plays a major role with vocals in this genre, and the approaches are not fundamentally different from those for sung vocals. With rap vocals, however, a somewhat clearer and more audible compression effect is usually employed, to give the voice even more presence in the foreground. The time parameters of the compressors tend to be set somewhat shorter, so that even short transients in the speech are kept under control. But here, too, excessive compression should not be the norm: pressure, punch and aggression in the vocals come mostly from the performance of the rapper – they cannot be created artificially through extreme compression.
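
As a rough illustration of the presence/high-range emphasis mentioned in the first point above, the following Python sketch adds a high-passed copy of the signal back to itself, which acts like a crude first-order high shelf. The corner frequency and gain are arbitrary example values, not a mixing recipe, and a real EQ plugin would of course use proper shelving filters:

```python
import numpy as np
from scipy.signal import butter, lfilter

def presence_lift(x: np.ndarray, sr: int = 44100,
                  corner_hz: float = 6000.0,
                  gain_db: float = 3.0) -> np.ndarray:
    """Crude high-shelf-like boost: add a high-passed copy of the
    signal to itself. Below the corner the gain stays near unity,
    above it the signal is raised by roughly gain_db."""
    b, a = butter(1, corner_hz / (sr / 2), btype="highpass")
    highs = lfilter(b, a, x)
    amount = 10.0 ** (gain_db / 20.0) - 1.0  # extra linear gain for the highs
    return x + amount * highs
```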

Treatment of Ad-libs, Shouts and Doublings

Alongside the main rap, which can be compared with the lead vocal of a sung performance, there are two more important stylistic elements in hip-hop/rap:

  • Duplications are, as previously indicated, particularly crucial in this genre. With their help, vital textual statements and emotions are impressively reinforced. Sometimes even whole song parts (refrains, hooks) are completely duplicated; in other cases, only certain individual phrases or words. Experienced rappers are very accomplished at performing duplications as precisely as possible, and one should also give “real” duplications absolute preference over artificial duplication of the lead track. Duplications which are deliberately performed in a different sound/voice can be particularly effective. For complete duplications of an entire refrain, one can also use duplicated versions of the parts rapped a whole octave higher or lower. The technique of breathing/whispering a duplication as much as possible is equally widespread. Strongly compressed and mixed under the actual lead, these whisper duplications promote the main singing element, giving it a particularly powerful and assertive sound.
  • Just as important as duplications are so-called ad-libs or shouts, which are called out, “groaned” or mumbled on many extra tracks in the remaining gaps of the lead rap (“Yeah”, “Aha”, “C’mon”, “What?” etc.). These short phrases are supposed to spur the rapper on and maintain the flow of the overall performance; they bridge gaps in order to avoid breaks, and can intensify the whole atmosphere and message of the performance, making it all the more expressive. In this respect, these short insertions are not just nice additions to the actual vocal performance; they are an essential component of hip-hop culture and an indispensable vocal element. In terms of sound, these ad-libs are often deliberately altered to differentiate them from the actual rap vocals. Alongside the somewhat old-fashioned telephone effect, all types of distortion, bit crushers and other sound destroyers are of course suitable for this. Often, however, just the selection of a dynamic microphone or a matching EQ can deliver the desired effect.

Beatboxing

Beatboxing is a very independent art which involves using the human speech apparatus as a musical instrument. Culturally, beatboxing is closely tied to hip-hop culture – but forerunners and related forms can also be found in scat singing, partly in blues or jazz, and even in pop music – one of Michael Jackson’s trademarks, for example, was embellishing his vocal performances with percussive sounds. It was not until the 80s, however, that beatboxing came about as an independent art form – to this day it is practised competitively in battles and competitions and is used in many different music styles.

Beatboxing is about generating percussive sounds with the help of the respiratory and vocal apparatus – sounds which are supposed to resemble those of a drum kit. On a musical level, the beatbox – when performed and mixed properly – can even pass as a perfect substitute for classic drum sounds. Even more commonly, beatbox elements are used as a sound addition, to give the rhythm section a bit more of a human touch.

Incidentally, the term beatbox derives from a slang term for a drum machine. Beatboxers of course sometimes also use their vocal cords and the resonances of the mouth and throat area to create tonal sounds. Basslines and synthesiser sounds are imitated in this way, and naturally, speech or singing inserts are also allowed in a beatbox performance.

Miking

Beatboxing is not about capturing the voice as neutrally or naturally as possible, but about noises which (in part) require relatively strong acoustic manipulation in order to generate the desired sound. With many beatboxing techniques, the use of the microphone is therefore part of the performance; on a sound level, it is indispensable. For this reason, it is important for the sound engineer to know that many beatboxers use the microphone specifically to generate or amplify certain sounds.

In almost every other case, a vocalist will provoke the ire of any sound engineer by wrapping their hand around the basket of a handheld microphone. This leads to violent resonances inside the closed hand, and sound from the rear can no longer reach the microphone capsule properly, which ruins the directional characteristic of the microphone. However, what is normally considered a no-go is often specifically employed in beatboxing to generate a powerful and resonant sound.

In addition, in beatboxing, microphones are often held not only at the mouth – sometimes they are also held on the cheek, the larynx or the nose – depending on which sound characteristic or resonance is to be emphasised.

Most singing techniques work fully independently of the miking, so when recording them the sound engineer can place any microphone wherever it will not disrupt the performance and still capture a good signal. In the case of beatbox recordings, however, this approach will not lead to satisfactory results in most cases.

In order that the beatboxer may be able to integrate the microphone as part of their performance, some typical approaches to vocal recording must be discarded here. The following key points should be noted:

  • Beatboxers must always be able to hold their microphone in their hand, so that they can freely choose the distance and placement!
     
  • Given this purpose, the otherwise popular large-diaphragm condensers are out of the question for beatbox recordings!
     
  • Dynamic handheld microphones with integrated pop protection are the right choice! The classic: the Shure SM 58.

Processing

Beatboxing has a completely different function in a piece of music from that of singing or even rap; for this reason, it normally also requires a different approach in the post-editing stage. A few pointers:

  • Given the high proportion of percussive sounds in the close-up range of the microphone, unusually high levels can sometimes be expected. Accordingly, sufficient headroom should be allowed for when setting levels.
     
  • Since the powerful, bass-drum-like sounds have fundamentals far below the regular vocal range, care is needed even when simply using low cuts. Beatbox recordings are better made without the microphone’s rumble filter, and even in the post-editing stage, any low cut should be positioned carefully.
     
  • In beatboxing, breathing, smacking and similar noises are also used as sound effects. Some of these noises are relatively quiet and need to be raised with the help of relatively strong compression with short release times.
     
  • Sounds from the mouth and throat area are often not as percussive as real drums; it may be necessary to do a bit more work on the transients in order to achieve the desired punchy sound. During compression, the attack time should therefore not be too short, so as not to blunt the transients. Additionally, transient designers, for example, can give a beatbox performance that little something extra (a simplified sketch of the principle follows this list).
     
  • When it comes to modulation effects, delays, vocoders, reverb, distortion and the like, creativity in a beatbox performance is in no way limited. It is by no means the naturalness of the signal that is in the foreground here; rather, the end justifies the means. However, as with normal vocal recording, it is advisable to record the performance dry and add effects only afterwards, so as to avoid unwanted results in the mix stemming from effects committed during recording.
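
To illustrate the transient-designer principle mentioned above, here is a greatly simplified Python sketch: it compares a fast and a slow amplitude envelope and boosts the signal wherever the fast envelope exceeds the slow one, i.e. at the attacks. All time constants and the boost amount are arbitrary example values, not settings taken from any particular plugin:

```python
import numpy as np

def transient_boost(x: np.ndarray, sr: int = 44100,
                    fast_ms: float = 1.0, slow_ms: float = 50.0,
                    amount: float = 4.0) -> np.ndarray:
    """Very simplified transient designer: boost the signal where a
    fast amplitude envelope exceeds a slow one (the attack phase)."""
    def smooth(sig: np.ndarray, ms: float) -> np.ndarray:
        # one-pole smoothing with the given time constant
        a = np.exp(-1.0 / (sr * ms / 1000.0))
        out = np.empty_like(sig)
        acc = 0.0
        for i, v in enumerate(sig):
            acc = a * acc + (1.0 - a) * v
            out[i] = acc
        return out
    env_fast = smooth(np.abs(x), fast_ms)
    env_slow = smooth(np.abs(x), slow_ms)
    gain = 1.0 + amount * np.maximum(0.0, env_fast - env_slow)
    return x * gain
```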

Beatbox and Loop Stations

Many beatboxers also work with a loop station and, if appropriate, additional effects pedals, so that they can accompany their live performances with singing, rap or further percussive elements. One well-known representative of this approach to beatboxing and singing is the Australian Dub FX. In addition to the human voice, other instruments are of course also frequently used, and these days many loop musicians also rely on a computer-based workflow, for example with the help of Ableton Live.

Here the focus clearly lies on live performance, and a loop-station artist is often regarded as a one-man band. When recording, in such cases, one should generally be prepared to do some live work in order to preserve this sound. Since comping from different takes is often difficult or impossible – the looped accompaniment usually sounds different in every take – it is common simply to let the artist perform the piece live in full and select the best take from many.

Unlike in a conventional recording situation, for loop artists the use of effects devices is usually part of the performance itself; thus, it is advisable to record the performance “as a whole”, that is, including the effects. However, depending on the setup, you can use a signal splitter at strategically sensible points in the signal chain (e.g. in front of an effects pedal), so that, during post-production, the effects-free version can be accessed if necessary.


Growling, Grunting, Screaming & co.

In this chapter, we will summarise some particular forms of singing found in popular music – in death and black metal in particular, but also, in part, in other sub-genres of metal, as well as in grindcore and in the industrial field. Common to all of them is that the human voice is not sung or spoken “cleanly”; rather, it is distorted or manipulated through the conscious shaping of the larynx. The origins of larynx techniques in sound generation, however, are clearly older than all of these genres; to this day they can still be found in the folk music of many different cultures (often in a shamanic context). Throat singing techniques are known among, for example, the Inuit, the Sami, the Mongols and the Tibetans, but such singing techniques can also be found among the Xhosa in South Africa, or in alpine yodelling.

Larynx Singing

While it is mostly the vocal cords that are responsible for sound formation in speaking and singing, in guttural singing the so-called vestibular folds above the actual vocal cords are used. By narrowing the larynx, these false vocal folds are made to vibrate, and a strange, low-pitched noise arises – by forming sounds in the oral cavity, it can be deliberately shaped into words.

Guttural singing techniques require a bit of practice before they can be used in a controlled way. In addition, unprofessional attempts at larynx singing carry a not-inconsiderable risk! Anyone who wants to learn, for example, growling or screaming should definitely seek a professional teacher who specialises in such singing techniques – otherwise, incorrect larynx singing can permanently damage the vocal cords!

Recording Technology

When recording growls, screams, shouts, grunts, squeals and the like, there are not too many special features to consider; nevertheless, not every approach from traditional vocal production will achieve the desired result. Here are a couple of aspects to observe during production:

  • Given the sometimes high sound pressure levels and the sound “colouring” that is fully desired in this genre, you should use a dynamic moving-coil microphone for recording these singing techniques! In this context, the Shure SM7B is particularly popular – for a dynamic microphone, it is tuned in an extremely detailed and balanced way.
     
  • With screaming, growling and grunting in particular, an extremely powerful and solid sound is desired. It therefore makes sense to take advantage of the microphone’s proximity effect, to boost the low end and create a close-up impression.
     
  • With larynx singing, the fundamental is often significantly lower than in a traditional singing performance. The low cut should therefore always be handled with care, so that these fundamental components are not affected.

The following sound examples show quite clearly how great the influence of the microphone and the recording distance is. Let us start by listening to the SM7B and comparing two different distances during recording: the first recording was made at a distance of approx. 2 cm; for the second, the distance was increased to approx. 15 cm. The influence of the proximity effect is conspicuously audible in the bass range:

Sound Example 16

Growling_SM7B_close


Growling_SM7B_far


Comparison of the frequency curves at different recording distances: 2 cm (green) and approx. 15 cm (red)


For the sake of comparison, we repeated the same test setup using a second, equally popular microphone: the Neumann U87 AI, a large-diaphragm condenser which is used worldwide, especially for high-resolution speech and vocal recordings.

Growling_U87_close


Growling_U87_far


As can clearly be heard, the more detailed and significantly brighter recording with the U87 is rather unsuitable in this context compared with the dynamic version. The frequency response does not meet the expectations for a growling recording, and quiet mouth and breathing noises are also more prominent in the foreground. Particularly striking is that, with increasing distance, the low-end pressure decreases even more strongly than with the comparable SM7B recording.

What produces rather thin and pressureless recordings in this situation – and in part even captures unwanted noises – may of course be entirely desirable in other cases. Once again, this clearly shows one thing: there is no “one size fits all” microphone; rather, the decision must be made individually, according to the desired sound impression.

Comparison of the frequency curves in the proximity range: SM7B (green) and U87 (red)

Choral Vocals

Recording choirs is significantly different from recording individual voices. This is due mainly to the fact that, in a choral situation, it is not the voice of an individual singer that is in focus; rather, many singers together form a large singing ensemble, and it should be depicted as such.

Choral Lineups

Most choral pieces are written for four voices, meaning that there are four separate musical parts, each of which is sung jointly by multiple singers. The four voices, from highest to lowest, are soprano, alto, tenor and bass. Accordingly, for each voice there will be a smaller or larger group of singers in the room. The four voice groups can, in turn, be set up in different ways, and this spatial distribution is a decisive factor with regard to the resulting sound and the recording options.

As with any larger acoustic sound source, a main microphone forms the basis for recording a choir. It is usually located at the conductor’s desk or a few metres behind it and often uses a classic stereophonic method such as an AB, XY or ORTF setup. In exceptional cases, more unusual variants such as Blumlein or dummy-head systems can also be found. When producing for surround sound (e.g. in film music), multi-channel methods are often used. In any case, the goal of the main miking is to pick up the entire ensemble as naturally and completely as possible.

In addition to the main miking, spot microphones are used where necessary and possible, so that individual voices can be brought out or individually processed in the mix.

The main choral lineups are as follows:


Choral lineup variant 1: (from left to right) soprano, alto, tenor, bass

This widespread lineup essentially resembles the classic American string lineup in an orchestra or a string quartet. The highest (and, in most cases, melodic) voice is on the far left, followed by the next-lower voices; the bass, the deepest voice, is accordingly the farthest to the right.

This lineup has the technical advantage that the individual voices can be picked up very easily with spot microphones. This can be done with an individual cardioid condenser microphone per voice, or with a set of spot microphones whereby each voice receives its own stereo image. The spot signals are distributed in the panorama according to the spatial layout.

Distribution of the voices from left to right, however, will result in a slightly uneven frequency distribution. Such a condition is not desirable in a music mix. For one thing, consumers sometimes listen to music only with one speaker/earphone or are close to one of the two speakers; this means they “miss” either the bass or the melody voice. Also, an unequal energy distribution will mean an unbalanced utilisation of the technical devices, which can lead to difficulties in the mastering process.

 Choral lineup variant 2: tenor and bass behind

The second choral lineup variant can frequently be found with, for example, mixed men’s and boys’ choirs, where the men, with their deeper voices, stand in the back rows and sing over the heads of the boys.
This two-row lineup has the advantage that the energy is distributed somewhat more evenly in the panorama.

However, here the voices can no longer be miked so easily with spot microphones – at the soprano spot mic, for example, a certain portion of the tenors will inevitably be heard as well if the microphone is positioned at the front. One good option in such a case is to position the microphone above the group: using high stands, trusses and the like, spot microphones can be lowered from above into the centre of any singing group. If small-diaphragm condensers are used, these can often simply be hung from their cables.

Choral lineup variant 3: tenor and bass in the centre

In this third option, only bass and tenor are set up in two rows, placing the deepest voices in the centre. This also results in a relatively uniform energy distribution across the panorama. Soprano and alto face each other, and the spatial distance makes it easier to perceive them separately from one another. This lineup has parallels to a German symphony orchestra lineup, and it is especially suited to pieces in which soprano and alto are interrelated.

Soprano and alto voices can be recorded from the forefront relatively well using spot microphones. Bass and tenor can be supported either together and from the front or individually and from above.

There are also choral formations in which the four singing groups are mixed. Normally in these circumstances the voices alternate, so that no soprano stands next to a second soprano, no tenor stands next to another tenor and so on. On a sound level, this has the advantage that the voices merge with each other, thus leading to a more homogenous sound image. On the one hand, this suits the sound engineer, because the panoramic frequencies are evenly distributed, and the sound image in the mix does not “tilt” to one side (highs or lows). On the other hand, with a mixed setup, it is not feasible to place spot mics for the individual voices. Thus, a lot more attention is required to achieve the desired sound in the main microphone, since separate processing of the individual voices in the mix is not possible.

Of course, there is also choral literature in which more than four voices are sung. If necessary, a voice group can then be divided by voice type, resulting in intermediate voices, for example mezzo-soprano (between soprano and alto) or baritone (between tenor and bass). The selected lineup depends on the literature and the desired audio image.

Those in charge of the recording should always, prior to recording, determine which voices are set and which lineup has been selected. Not only does this facilitate communication, but it is also, more than anything else, relevant to the lineup of the microphones and the selection of the acoustic measures. Normally those in charge of the recording will have a copy of the score in hand prior to the start of the recording session, and the choir director will be consulted beforehand, to get the best result.

Acoustics in Choral Recording

Just like with every larger sound ensemble, a sufficiently spacious environment is not just important for accommodating the large number of people. Above all, a large and high space is also needed so that the choir can properly develop acoustically. Just like with a classic orchestra, the space is an important part of the desired sound character. The space and the sound character of a choral recording, however, naturally also depend on the musical context.

Classical Music

In classical choral recordings, the aim is usually a large, comparatively reverberant and ambient sound. The different singing voices should blend together. Accordingly, a particularly large space with a high ceiling and a long reverberation time is best suited for this. In many cases, good-sounding church vaults are the right acoustic environment; naturally, these are especially suited for sacred music in particular. For more secular works with a large chorus, the best choice would be concert buildings which normally combine a long reverberation time with more controlled acoustics. Room microphones are a good choice in addition to the main mics, for capturing the character of the space and can be added to the mix as needed.

Jazz/Gospel

Gospel choirs, too, have a sacred background; however, with gospel choirs, the sound image is usually not as large and reverberant as in classical music. Instead of large stone cathedrals, somewhat smaller churches with plenty of wood in the interior (for example) are better suited for this. In this musical context, choirs usually perform with more emotion, and sometimes there will also be dancing and clapping during the performance. This should be considered and, if appropriate, discussed with the choir director prior to the recording. Light footwear and airy cotton clothing can, for example, reduce disruptive noise during dancing or swaying.

In gospel music, there is normally a lead voice which – in the African-American tradition of the “Master of Ceremony” – first sings a line which is then repeated or answered by the choir. This soloist naturally has a prominent position and must be miked individually.

Choirs in Pop/Rock

If choirs are used in pop and rock music, they tend to play a mostly subordinate role; they are often expected to give the band arrangement and the lead singer(s) a certain sound “lubrication” and ensure a bigger sound image. In such circumstances, a larger recording room in a recording studio – rather dry compared to a concert hall or even a church – is mostly used. Nevertheless, to avoid inconvenient echoes from nearby walls, spaces with a floor area of at least approx. 100 m² and a ceiling height of more than 4-5 metres are advisable here too. A certain amount of room is thus important here as well, but additional reverberation is usually added artificially.

A Cappella Band

An a cappella band is not a choir in the proper sense of the word. It is a group of singers, but not one with the intention of merging into a unified sound ensemble. Rather, an a cappella band must actually be understood as a band, in which the individual members take on equally weighted and unique functions in the piece. This also applies to smaller vocal ensembles in which every singer has their own part, as in an American barbershop quartet. In this case, individual miking of each singer is the best choice, so that one has sufficient control over all of the voices. This applies in particular if other instruments are to be imitated with the voice. In the studio, overdub production is preferred, to enable maximum separation and control.

For popular choirs, shanty choirs, singing clubs, etc., depending on the repertoire, the somewhat more controlled acoustic conditions one would also choose for a jazz or pop choir mostly apply as well. If more space is wanted, it is normally added with artificial reverberation.

If a stronger acoustic separation of different vocal groups is desired, this can be achieved by changing the lineup. Often, the simplest solution is to place the groups a bit further apart from each other. If complete separation is desired, then an overdub recording is best. For better control in the mix, especially in popular music, mixed choirs are often recorded separately by gender; a separation by vocal group is naturally also possible. The overdub process always has the advantage that more attention can be paid to the performance of the individual, and a more precise production is normally possible. However, an overdub recording is also considerably more time-consuming, and, above all, the sound of a choir recorded in parts and summed electrically or digitally will never be as natural and organic as a real lineup in a big space. This can be counteracted a little by, for example, spreading the vocal groups in the stereo image as they would stand in a joint performance. Whereas one would hardly decide on a split choir recording for a classical or jazz production, this technique is more commonly found in the rock/pop genre.

Soloists

In many situations, it is not only a choir (as a joint sound ensemble) that sings; instead, individual soloists fulfilling a solo function are employed. This may be an individual singer who interacts with the choir or who is accompanied by the choir. However, it may also be individual choir members who will step out from the choir ensemble for specific passages (usually also physically) and who temporarily take on a lead voice.

The soloist should always be given their own support microphone, which allows for better control of this especially important voice in a later mix. Even if the soloist has a well-trained singing voice (in the classical context) which requires no technical support, an independent signal is still worthwhile when recording in order to later be able to mix the choir and soloist as independently from one another as possible.

If the soloist is recorded at the same time as the choir, they can be spatially separated in order to avoid crosstalk between the microphones. You can use a separate recording room (ideally within line of sight), create an acoustic separation in the same space with partitions/gobos, or minimise crosstalk through an appropriate spatial arrangement combined with suitable directional characteristics. A feasible example of the latter would be to use a directional main microphone for the choir and position the soloist behind it.

Exactly which type of lineup and acoustic separation is selected, will naturally depend on the spatial and technical conditions and the desired sound image. In any case, it makes sense to think of the mix, and to direct the sound image of the space in the right direction, even during the recording stage.

If it is a matter of temporary soloists who emerge from the choir for a solo passage, then they should also be given their own microphone, which, in the mix, will typically also be used only for the corresponding passage.

Monitoring

In the case of a choir, monitoring can be quite a challenge. As a matter of principle, individual headphones for every choir member – especially considering the high cabling requirement – should only be used if it is necessary on a musical and technical level or is at least sensible.

Without Headphones

A choir recording in its classical sense is normally a joint live performance taking place in one room. In such a case, headphones are not only unnecessary – they can even disrupt the performance. Ultimately, all choir members want to be able to hear the voices of their colleagues, and any accompanying instruments, just as they are accustomed to in a live performance or rehearsal. All that is really required, then, is a communication path from the control room to the recording room for talkback. A single speaker in the recording room is fully sufficient, so that those in charge of the recording can communicate with the choir between takes. Alternatively, or in addition, a single headphone feed can be established for the conductor/choir director. The technical and musical directors can then communicate with each other without having to involve the entire choir.

With Headphones

If a choir is recorded for an overdub production, then, naturally, each choir member must also hear the remaining instruments during the singing. This is important not only for the timing but also – and more than anything else – for the intonation! Since the human voice allows for free intonation, it is of enormous importance that singers simply hear a reference – for example, a piano accompaniment – so that they can adjust their own pitch to it.

In this context, it therefore makes sense to give every individual choir member their own headphones. Of course, this poses a certain challenge for the sound engineer, since it requires many headphones along with a correspondingly high number of headphone amplifiers. Fortunately, the individual choral singers normally do not require separate mixes, which means work can be done with a single stereo headphone mix distributed to each singer through headphone amplifiers. Singers often find it useful to wear the headphones on one side only, so that they can better hear the rest of the choir and their own voice in the room. Some studios have special headphones for this purpose, with an earpiece on only one side; a standard pair of stereo headphones can, however, also be used by sliding one side off the ear.

In certain circumstances the choir director may be granted an independent headphone mix, to allow for direct communication between director and engineer and/or to give the director other volume ratios or, for example, a metronome click on the headphones.

By the way, it is not necessary to give each choir member a high-priced pair of studio headphones. A slightly cheaper, durable and lightweight model is normally sufficient. For psychological reasons, and in the interest of not slowing down the session unnecessarily, it actually makes sense to be able to offer 40-50 headphones of the same model for choral recordings. With this arrangement, no-one feels disadvantaged, and there are also no delays resulting from different preferences. However, it makes sense to equip soloists, conductors, producers, arrangers and of course sound engineers with better headphones, since these individuals require a more detailed playback considering their unique role.

Since the wiring requires a fair bit of time, the exact number of singers should be queried in advance, and the lineup should be accordingly planned so that all the cabling can be prepared before the choir comes into the studio.

With Loudspeakers

As already mentioned, in a choir-only production it is possible to work with one simple speaker in the recording room, used only for communication between takes. Of course, the speaker must be switched off during recording, and under no circumstances should the choir microphones be played back via the loudspeaker, in order to avoid unnatural colouration and feedback.

There are also methods in which choir monitoring can be achieved with the help of speakers. However, in such cases, it is necessary to ensure that as little sound from the speakers as possible is recorded by the microphones. Since, however, a certain level of crosstalk is always inevitable, and the sound quality of the recording ultimately suffers as a result of it, these should be considered “emergency solutions” which are only to be applied if an individual monitoring solution is not technically feasible.

If you must use loudspeakers for monitoring, you should, above all, take advantage of the directional characteristics of the microphones. One possible solution, for example, is to install loudspeakers on the ceiling (on a truss or the like) and provide monitoring for the choir from above; the microphones are then also suspended from above. If directional microphones with cardioid characteristics are used, a relatively clean choir signal can be recorded. With this setup, however, a main microphone, or even a room microphone, would contain a great deal of crosstalk from the speakers and, as such, probably be useless. Naturally, placing the loudspeakers in front of the choir is also conceivable, if cardioid miking is also to be done from the front. One great advantage of monitoring from above, however, is that the choir itself absorbs the sound energy from the loudspeakers quite efficiently before it is reflected back in the direction of the microphones – ultimately, the sound waves have to travel down between the singers to the floor and then take the same route back. The following diagram shows an example of this setup:

Choir recording with monitoring and miking from above

This technique is frequently employed in the live events sector, since it combines the advantage of an uncluttered stage with a relatively low cabling demand. During live performances, one can also assume that the choir itself is already filling the space with sound relatively indirectly, meaning that a main microphone and/or room microphone may, depending on the circumstances, be unnecessary, and direct miking of the individual voices would be sufficient for the live mix. For a studio recording, on the other hand, this would be a rather inadequate solution, as a main and room microphone arrangement is hardly feasible in this kind of setup without substantial crosstalk.
