  • Phantom


    • Ghazal: floaters and the loss of an eye
    • FGM: a woman hides her period for a few years but is then found out and genitally mutilated; the mutilation leads to a severe infection, she loses her uterus, and phantom pain follows
    • Cutting off hair:
      A woman is held in prison. Over time she tries to push through longer yard time, arguing that her hair needs to get some air.

    Possible losses of organs/body parts: eye, genitals, child, anti-tank mine, acid, hair (shame, grief, protest)

    • Resistance: between indifference, rage, hope; between vulnerability and empowerment

    Wit & humour: Will my ghost have only one leg too? Why does that matter? Because descending the stairs to hell is especially arduous on one leg.

    Scene 1:

    Bahar on stage, filming her surroundings, filming parts of her body, then her face, her eyes

    I remember that boredom. When I was a child… Whenever I had to be alone with myself, I could hardly bear it.

    switches on the live camera and sits down in the frame

    It is a hot midday; as on every other day, my family takes its midday rest. The windows are closed, everyone withdraws into their corner to sleep or rest… it is as if the world simply stood still for a short while and waited for us. I feel hemmed in by this silence. The standstill paralyses and oppresses me. So I look for something that is in motion. Something my eyes can hold on to… But everything rests. Even the little plastic soldier on the table rests.
    …until, slowly, the eye play begins

    When the world around me stands still, I move it so that it can move me again…

    Eye play with the toy soldier (winking)

    Left, right, left, right… the soldier moves…
    Video: toy soldier in profile, turns and shoots; bullets come out of the rifle barrel and become floaters… an eye appears

    Music…

    Scene 1: backstory (activism)
      takes her hand off her eye

      I stand in the street and feel alive. The fresh autumn wind brushes my neck, winds its way past it and points me in a direction. I start moving and I feel myself. Worried, but full of hope. Slowly the blood rises higher in me… pulsing warm in my limbs/lids. And around me, too, there is pulsing. The rhythm of my pulse and the rhythm of the street enter into a turbulent relationship: they call to each other, answer, introduce themselves, become one, drift apart again, wait for each other, form something new together, turn to other beats and, sooner or later, return to their own measure.
      I keep moving forward. Then I let myself drop back briefly to keep an overview. Whether I dive into the crowd or stand out from it… I am “we” for as long as it takes. We press on, through narrow alleys, over bridges, past important and unimportant houses, cross intersections and head for a great square. Here we billow like an unruly flame. While the old is dying, the new fights to come into the world. Chants, calls, screams… Then: the flame scatters apart like sparks. The sparks take on lives of their own; some fall to the ground and burn out faster than my eyes can follow them. I, too, want to save the embers, but I don't yet know how. Stand still or run away? Hide or charge ahead? Before I decide, I look back one more time…
      And the last thing I see is his smile.

    Video: toy soldier turns and shoots; the bullets become floaters


    Transition, floaters:
    The soldier no longer moves.

    Since one of my eyes was taken from me, I see things I did not see before. What my gaze out into the world no longer shows me is now a teeming swarm inside me.

    When I look into myself like this, I am only with myself. I mean, by myself… not in the sense of resting in myself or being at peace with myself or anything like that, no… simply being confronted with myself. Really I strive for the outside, I want to be with it, to become through it, but there is this time in which I am thrown back entirely on myself. I knew this state even back then; today I experience it in a new form… The reckoning with myself before I let the world act on me again.

    But lately, when I was with myself like this, I looked a little closer and saw: something is looking with me…
    the Erinyes sound begins
    Murky, uncanny figures… in washed-out, bulging coats; some entirely naked and emaciated… they crawl, creep, curl up, scurry into the frame. Are you following me, or am I the one driving you on!? …our paths cross at every moment. I cannot avoid you, and when I catch sight of you, I see what I can no longer see. You torment me, haunt me, wail and croak and growl…
    the Erinyes growl, sigh, scream and lament

    Were you always there? Where have you come from?

    You bring me dreams of wrath, you want to polish me, grind me, scrape me away… Why do you roll over my eyelids like the torrent over the pebbles? Your hatred floods me, takes my breath away, rises like blood into my eyes…

    the Erinyes murmur
    “We scratch, prick, stab, shove… we want to tug, bite, tear, bore… What love could delight us as hatred does? Bzz, bzz, bzz, bzz, bzz. We are the staring eyes of the houses. The droning that flies across the sky above your head. The murmuring of the forest, the whistling, the cracking, the hissing, the howling. We are the night. The dense night of your soul. Bzz, bzz, bzz. Heyah! Heyah! Heyaha!

    screams: Begone!


    …and everywhere you will carry the night around with you on your head.”

    Video: floaters, zoom in

    Virginia: 

    It is a very special day. I don't quite know why… but it is only like this when someone has a birthday. My family is very excited. The smell of fresh cake wafts from the kitchen. I am told to put colourful flowers on the tables all over the house. Only the hallway to the bathroom is off limits to me. In the living room there is a big pile of wrapped boxes and bags, and I think someone really must be having a birthday… my sister? Now the excitement grows even bigger… More and more relatives arrive, laden with presents and full of joy. I think my sister is happy. Then all the women start to dance and lead my sister out of the living room in a round dance…

    A scream

    30 years later: I am a gifted doll-maker. My work consists of crafting and restoring. I am able to bring figures back to life, however great the damage. I can give them a new face, if that is wanted, or restore their old form. Knife, needle, scissors, light…
    No matter which body part has gone missing or been damaged, I always have a suitable healing procedure at hand. It fills me with joy to join old material with new, to model it, to fix it in place and to create entirely new possibilities. But the side effects of my treatment are almost more beautiful still. Behind the material there always stands a person for whom its survival has a very particular meaning. I create the possibility of joining loss and invention.

    But this old doll…

    I no longer know when or where it was, but at some point my sister told me what really happened on that special day. The story was short, but its unspoken epilogue lasts to this day. It went like this: flowers, peach, bathroom, presents, knife, cake. I did not understand much, but I understood that it was terrible for her. “Believe me, you don't want to understand this. If you never understand it, we did everything right.” And then our invisible fight began. And we became sisters in a way we had never been before. I understood that we are the same, because one day that day would be waiting for me too. But what made us sisters was fighting together so that that day would never come for me. Our accomplice was an old doll. Over weeks we slowly hollowed her out, over months we filled her with bloody rags, and for years we used her as a hiding place, before we finally buried the crusted cloths, the witnesses of my bleeding. And so years of our efforts went by. For a long time my family believed I was a late developer. Perhaps they simply had no idea about hormones, or perhaps a trace of hope lived in them too. But however cleanly we worked, at some point time became our betrayer.
    And so the day came. Flowers, peach, bathroom, presents, knife, cake. That is how my clitoris was taken from me.
    Because I could not punish time, our former accomplice had to take the blame.
    on video, the big doll loses an eye
    Many years later I found the doll again. Back then I was so angry at her. When I looked at her, I saw the embodiment of my disappointed hopes. But when I held her in my hands this time, I felt something new: the wish to leave the anger behind me. So I began to care for the doll. I replaced the eye that I had once torn out of her in pain; I gave her hair and sewed a suit for her. As I set my needle to work, something happened that I had not expected: the pain over my loss transformed itself… with every change I created on the doll, new versions of myself appeared. Visions became attempts; attempts became practice, practice became experience. While the doll kept taking on new shapes, I discovered myself anew… I built a relationship to my own history, I found joy in my sexuality, I got to know myself as a mother, I tracked down my creative talent. The anger receded further and further into the background… by now it only rarely comes to visit. The doll I once wanted to destroy and forget is another today, and she can still take on endless new shapes.

    Video: the doll transforms into floaters

    Transition, floaters:

    I no longer fear you. Your lament shall not frighten me; it invites me in… Only my joining in your song can transform it. And your calls and chants gradually begin to sound different… and sounds ring out that I had never heard before…

    Camera

    One eye (experience, suffering, rehab):

    Am I in pain? Yes. Did I cry? Yes. Was I desperate? Yes. Did I want revenge? Yes. Did I isolate myself? Yes. Was I isolated? Yes. Did I laugh about it? Yes. Am I lonely? Yes. Did I no longer want to look at myself in the mirror? Yes. Was I indifferent? Sometimes. Did I want to kill myself? Yes. Was I angry? Yes. Could I…

    Do I want to go on? Yes.

    Silence – Why? (with dance)

    Dance sequences from an Indian film

    • the first time I saw myself in the mirror, I had to laugh. A bitter laugh, but a liberating one all the same
    • eye prosthesis (I see my eye, but it does not see me)
    • I prefer my empty eye socket to a piece of glass inside me
    • sexy eye prosthesis

    Transition, floaters:

    The brighter and clearer we see, the more floaters we perceive.


    Idea: floater choreography – “floaters that shift together with the direction of gaze in their characteristic darting way, while performing slowly swinging movements around a base position.”

    And if you gaze long into an abyss, the abyss also gazes into you.
    And if the abyss gazes long into you, you also gaze into it.


    Figures with hair are trimmed by scissors

    Scissors – stone – paper



    First, a woman tells the story

    Medusa: 

    The bell of a railway station, before the curtain opens. The inscription: Haft – City of the Women's* Prison
    As Medusa steps onto the platform, a rough wind rises and shakes the little town. She is led straight to the entrance of the prison. Before she crosses the threshold, one last gust comes up and strokes through her hair. Her detention begins.

    No one knows who she actually is or what she used to do. Medusa is a symbol. Some say she was a baker; others believe she was a procuress, a carpenter, a butcher, a whore, a shoemaker, a doctor… She herself is said to have once claimed to be a witch. Some believe it, others doubt it, many laugh about it.
    Medusa often stands in the prison courtyard, waiting for the wind. Her hair grows longer and longer. But there are days when the air stands still and her hair remains untouched. Then she insists on waiting for the wind and refuses to leave the yard. She misses the wind and holds out. Her defiance can win her only one more minute outside, but in that minute the wind comes and briefly kisses her hair… until she is seized and led back inside.
    Day after day, Medusa defies time, stays a little longer in the yard and awaits the wind. “What does this woman want, who claims to be a witch?” people ask themselves. “What is this fight for? Who is it supposed to help?” Medusa's behaviour unsettles them. “If she keeps this up, one day that minute will be taken away from us,” think the inmates. “If she keeps this up, one day everyone will want to stay a minute longer,” think the guards. And so they decide to put an end to Medusa's strange doings.


    The prison bell, before the door of a solitary cell closes. I sit down on a chair in the middle of the cell. I am awake, fresh and hard, my soul is made of copper – and I feel holy. I see the guards in their uniforms, I see my fellow inmates in their uniforms, I see the anticipated relief in their faces. They grab my limbs and pin them down. They grip the blade and set it against me. They swing the knife over my skull. My hair falls. A draught of air surrounds my scalp.
    her hair is cut off and snakes grow out of her head
    Medusa sings/hums a song?

    New day: They come, they grab my limbs and pin them down. They grip the blade and set it against me. They swing the knife over my skull. A draught of air surrounds my scalp.

    New day: They come, they grab my limbs and pin them down. They grip the blade and set it against me. I dreamed that I bit.

    New day: They come, they grab my limbs and pin them down. A draught of air surrounds my scalp.
    New day: They come. They behold the Gorgon and turn to stone.

    They say I still wander the prison courtyard today. With hissing head I take my minute and receive the wind.

    the Gorgon's head grows smaller and is encircled by a ring
    Ghazal holds her face to the camera and positions the “Medusa cap” over her eye

    Ghazal:

    What would have happened if the song had also been sung in the neighbouring cells? If others had joined in… what then? Would the song have changed? Would they have marched along? Would they have made everything tremble? Would they have danced to it, as if to banish them? Would even the guards have sung along?

    Would they not have had to bite?

    Scissors – stone – paper
    How can paper win against scissors? I don't want to become scissors, and I don't want to become stone… I want to be paper, but I also want to win.
    By growing thicker. When many sheets of paper are layered on one another, the paper is no longer just paper. I multiply through the cuts. And the scissors grow blunt.


    Transition: What if all the prisoners had joined in with her?

    Ghazal in transformation:

    – My weakness is their strength, and their weakness is my strength.

    – I want to tell the stories of other women* (many paper scraps)

    Camera:

    I have to record this…

    Remnants:
    – The twofold comparison of two eyes used to bring depth into the world; today I experience the depth of the world one-dimensionally.

    Floaters as Erinyes:

    – women dressed in black enter in a procession
    – the Erinyes growl

    – stretches Haaah! I slept standing up, bolt upright with rage; I had monstrous dreams of wrath, beautiful flower of rage.

    – I will roll over her belly and over her breast like a torrent over the pebbles. I will patiently polish this fine flesh, I will grind it, I will scrape it away, I will gnaw it down to the bone.

    – I am awake, fresh and hard, my soul is made of copper – and I feel holy.
    – Hatred floods me and takes my breath away; it rises like milk into my breasts. Wake up, my sisters, wake up; it is morning.

    – I dreamed that I bit. Haa! I want to scratch.

    – I will take his pale neck onto my knees, I will stroke his hair. And then, with one thrust, I will bore these two fingers here into his eyes.

    – They sigh, they stir, their awakening is near. Come, my sisters, my sisters the flies, let us tear the sinners from their slumber with our song.

    – What love could delight us as hatred does? Bzz, bzz, bzz, bzz, bzz. We are the staring eyes of the houses. The droning that flies across the sky above your head. The murmuring of the forest, the whistling, the cracking, the hissing, the howling. We are the night. The dense night of your soul. Bzz, bzz, bzz, bzz, bzz. Heyah! Heyah! Heyaha!
    – …and everywhere you will carry the night around with you on your head.


  • Automation Options in the DAW

    One of the key advantages of a DAW's digital functionality is its ability to dynamically automate practically every individual parameter, rotary knob, switch and fader of the virtual mixer, as well as each of the plugins used in the session. With the help of graphically represented automation tracks, the values of all parameters can be freely defined at any point in time, and, as if by magic, the DAW will carry out all these editing moves automatically. This ability to dynamically change every aspect of audio editing is only possible within the virtual environment – in an analogue studio one would need countless sound engineers' hands to carry out all the changes manually in real time during playback.

    In the processing of vocal signals, DAW automation offers an enormous range of possibilities and represents a real alternative for many editing processes. For example, the control moves a compressor performs automatically can also be applied through appropriate dynamic automation of the volume fader. In some cases, such automation can even offer advantages over a compressor: there are situations in which you don't want a compressor working on a lead vocal track at all, and volume automation is then a more flexible solution worth considering. It also avoids the sonic by-products of the compression process.
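
    As a rough illustration, here is a minimal Python/NumPy sketch of what a fader automation lane amounts to (function names and values are our own; the DAW does this internally):

    import numpy as np

    def apply_fader_automation(signal, sample_rate, breakpoints):
        # breakpoints: (time_in_seconds, linear_gain) pairs, like the nodes
        # drawn on an automation lane. Gain between nodes is linearly
        # interpolated, and the signal is simply multiplied by the envelope.
        times = np.array([t for t, _ in breakpoints])
        gains = np.array([g for _, g in breakpoints])
        sample_times = np.arange(len(signal)) / sample_rate
        envelope = np.interp(sample_times, times, gains)
        return signal * envelope

    # Example: ride a (synthetic) vocal down by roughly 6 dB during one loud phrase.
    sr = 44100
    vocal = np.sin(2 * np.pi * 220 * np.arange(sr * 4) / sr)
    automated = apply_fader_automation(vocal, sr, [
        (0.0, 1.0), (1.0, 1.0), (1.2, 0.5), (2.0, 0.5), (2.2, 1.0), (4.0, 1.0)])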

    The same applies in principle to de-essing. Why not reduce harsh ‘S’ sounds with careful automation of the volume fader? You would thereby achieve de-essing without the unwanted shortcomings and artefacts a dedicated de-esser can introduce.

    Since all parameters can be automated as required, dynamic frequency editing is also conceivable. You may want to attenuate the low mids of individual words in a verse by differing amounts. A static EQ cannot do this, and while a dynamic EQ might, simply automating the relevant EQ band achieves the same effect.

    On the other hand, you should not overdo automation – there are enough tasks and requirements which can be fulfilled very well and completely adequately with classic gear or plugins. Every now and again, however, it pays to ask yourself whether some tasks can't be accomplished more easily and quickly, and with a better sound result, by careful and attentive automation.

    In any case, volume automation with the fader can do more than almost anything else to support a dynamic vocal performance. If the singer's recordings show only slight differences between aggressively loud and emotionally quiet passages, it may help to subtly automate the volume during post-editing. Often, one will want to make the verses a bit quieter, to give the refrain more energy after a small volume increase. Bridges, and in particular breakdown parts, are suitable for bringing the voice upfront very directly and immediately. Volume automation can carefully promote all these subtle impressions and shape the dynamic course of a vocal performance.

     

    Spatiality in the Mix

    For various reasons, some of which we have already discussed in previous chapters, modern-day singing and speech are mostly recorded in very anechoic rooms or even booths with very short reverberation times – such are modern productions. This was not always the case: in the early years of sound recording, people made effective use of the natural spatial information of large studio rooms, including and especially for vocal recordings, to give the vocal signal the greatest possible naturalness. After all, an acoustic signal only fully develops in the interplay between its own natural sound and the reflective behaviour of the room around it.

    Nevertheless, recording vocals as neutrally and “dry” as possible has become standard practice in modern sound engineering, in order to retain all possibilities and options for artistic and production-related spatial changes at the later post-editing stage. There are countless products on the market today for the artistic and, above all, virtual simulation of spatial sound. These recreate almost every kind of natural sound behaviour in such an amazingly authentic way that, within a compact mix rich in signals, it is tough to tell the difference between the simulation and real three-dimensionality.

    It is not, however, only the authentic image of real spaces that can be effortlessly reproduced with the help of devices and plugins. It is also possible to create very effective, totally unreal spatial projections which could not exist in reality and in which one could never have recorded any singing. With these comprehensive options, vocals can be placed in very impressive soundscapes, which in turn can make the sound so unique and exciting that the listener will remember the vocals, and the artistic-emotional statement connected to them, all the more quickly.
     

    The two most important spatial effects are, naturally, reverb and echo (delay). On the vocal side of music production, both are in most cases integrated into the mixer routing as send effects: the actual effects device is looped into the insert of its own effects channel. All tracks which are to use the effect send a certain portion of their direct signal level to this effect channel, where that signal component runs through the effect device 100 % wet, and the acoustic result is heard on the main outputs. The dry, unedited signal is forwarded to the main outputs in parallel, so that one listens to a mix of processed and dry signal. The ratio of this signal mix is ultimately determined by the channel send level.
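
    Expressed as a small, hedged Python sketch (names are our own; any DAW wires this up internally), the send/return routing just described looks roughly like this:

    import numpy as np

    def send_return_mix(dry_tracks, send_levels, effect, sr):
        # dry_tracks: list of equal-length numpy arrays (the channel signals).
        # send_levels: one linear send level per track (e.g. 0.2).
        # effect: a function processing the effect channel 100% wet and
        #         returning a signal of the same length.
        main_bus = np.sum(dry_tracks, axis=0)     # dry signals go to the main outputs
        send_bus = np.sum([lvl * trk for lvl, trk in zip(send_levels, dry_tracks)],
                          axis=0)                 # what the sends branch off
        wet = effect(send_bus, sr)                # the effect channel output
        return main_bus + wet                     # dry and wet meet at the main outputs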

    The processing of vocals with reverb and delay can serve different purposes:

    • Simulation of an artistic space appropriate to the production, whereby originally dry vocals subsequently receive a natural surround sound.
       
    • Deliberate exploitation of sound effects on the depth offset of signals: vocals can be positioned more in the foreground or background of the mix with the application of skilled editing work. With this, they may, for example, float clearly and concisely “in front of the mix” or be homogeneously integrated into the overall sound.
       
    • Flexible spatial simulation, including the possibility of creating unreal spatial conditions, can equip vocal signals with very impressive and effective sound characteristics. These can help capture the listener's attention and, at the same time, emphasise the message of the song.
       

    Algorithmic Reverb

    Let us have a quick look at what happens in a real space: how sound propagates there and ultimately becomes a spatial impression which we perceive as inseparable from the sound itself. Some of the original sound waves reach our ears directly, while others are reflected (possibly more than once) in the immediate vicinity beforehand; still others are reflected again and again until they combine into a diffuse, reverberant sound image that depends on the space, its size and the surfaces within it.

    Over time all these sound waves lose their energy, with the higher frequencies decaying in level fastest while the lower ones are preserved longer. The three stages described here are known as direct sound, early reflections and reverb (tail). These three components and their individual levels are especially important when it comes to describing and perceiving spatiality and reverberation.

    Two additional time constants are also important, both for the development of reflected sound in a space and for the simulation of this process in devices and plugins. These are the interval between the direct sound and the arrival of the first reflections at the listening point (or ear), and the interval between the direct sound and the onset of the actual diffuse reverb tail.

    The first time constant, the Initial Time Delay Gap (ITDG), is provided as a reverb simulation parameter only very rarely, while the second, the pre-delay, is provided all the more frequently. Manipulating the pre-delay in combination with the decay time of the reverb tail, along with adjustments to the high reverb frequencies, makes it possible to position signals within the perceived spatial depth.

    Some reverb devices and plugins offer additional parameters, e.g. size (spatial size) or density (density of the reflections). Ultimately, however, these are specific modifications of the underlying algorithm, that is, of the calculation model which simulates the reflections.

    So much for parameters. But what is an “algorithmic” reverb? The word “algorithmic” already gives it away: it is a purely synthetic reverb which can be adjusted via the aforementioned parameters. The many adjustment knobs and variables feed into an algorithm which calculates the behaviour of the reverb precisely.

    This means that countless echoes are generated which take over the role of the initial echoes and the reverb tail. In this way, the algorithm keeps the diffusion, the frequency image and the duration of the reverb under control, using modulations.

    First, the dry signal is sent through a variety of “delay lines”. This results in delays which follow one another quickly and lie close together. Exactly how these delays take shape depends on the settings for the size and form of the “theoretical” space. Mathematical algorithms regulate the timing, volume and sound of the delays using these parameters, much as the surfaces in a real space would.

    After the early/initial echoes, the late echoes follow – also known as the “reverb tail”. It is worth keeping in mind what these actually are: early echoes which have in turn struck further surfaces!

    In order to replicate this, the reverb uses feedback loops to send the generated echoes through the algorithm again. The spatial properties are re-applied to the echoes the algorithm has already produced, and “late reflections” arise.

    At this point, however, the reverb algorithm has further variables which influence the timing, volume and sound of the feedback loop.

    The length of the reverb is then determined by how often the signal is sent through the feedback loop: the more often it passes through the loop, the longer the reverb.
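
    The classic textbook realisation of this principle is the Schroeder reverberator: parallel feedback comb filters (the delay lines with feedback loops described above) followed by allpass stages that thicken the echo density. Here is a rough Python sketch of that idea; the delay times and gains are illustrative assumptions, not the recipe of any commercial device:

    import numpy as np

    def comb(x, delay, feedback, damp):
        # Feedback comb filter: one delay line whose output is fed back into
        # itself, with a one-pole lowpass in the loop so highs die away faster.
        y = np.zeros(len(x))
        store = 0.0
        for n in range(len(x)):
            read = y[n - delay] if n >= delay else 0.0
            store = read * (1 - damp) + store * damp
            y[n] = x[n] + feedback * store
        return y

    def allpass(x, delay, gain=0.5):
        # Allpass stage: increases echo density without colouring the level.
        y = np.zeros(len(x))
        for n in range(len(x)):
            xd = x[n - delay] if n >= delay else 0.0
            yd = y[n - delay] if n >= delay else 0.0
            y[n] = -gain * x[n] + xd + gain * yd
        return y

    def schroeder_reverb(x, sr):
        # Mutually detuned comb delay times (seconds) avoid metallic ringing.
        comb_delays = [int(sr * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
        wet = sum(comb(x, d, feedback=0.82, damp=0.3) for d in comb_delays)
        wet /= len(comb_delays)
        for d in (int(sr * 0.005), int(sr * 0.0017)):
            wet = allpass(wet, d)
        return wet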

    With the help of these processes, an algorithmic reverb can generate a quite convincing impression of a real space. But it also offers the possibility of generating “surreal” spaces which convey a strange and unreal sound impression. This provides plenty of material for creative, crazy or even natural reverbs.

    Various digital reverbs have established themselves as plugins.

    The true classic among digital reverbs, however, is the Lexicon 224, which was first used in music productions in 1978. Even if it was not the first digital reverb device, it is certainly one of the most famous.

    Convolution Reverb

    As an alternative concept to an algorithmic reverb based on complex mathematical models, reverb devices have also been around for some time which create artificial reverberation using sampled characteristics of real spaces and spatial surroundings.

    This so-called “convolution reverb” uses samples (spatial impulse responses) for this purpose, which it folds into the dry output signal. The samples are created by exciting an actual space with a very short acoustic impulse (a bang, a Dirac impulse, a sine sweep) and recording the result (the room's “answer” or reaction). This gives you a so-called “impulse response”, or IR for short.

    Convolution reverb? Why “convolution”? It is not meant that the reverb is symbolically folded up and inserted into a reverberation device; the term comes from mathematics. A mathematical convolution combines two functions: in effect, every sample of the dry signal is replaced by a copy of the impulse response scaled by that sample, and all these copies are summed. Or, to put it very simply in frequency terms: the frequency image of the signal is multiplied by that of the impulse response. We do not, however, want to get too theoretical at this point.
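
    In code, the whole concept reduces to a single operation. A hedged SciPy sketch, with a synthetic decaying noise burst standing in for a sampled impulse response of a real room:

    import numpy as np
    from scipy.signal import fftconvolve

    sr = 44100

    # Stand-in impulse response: exponentially decaying noise. In practice
    # this array would be a recorded IR of a real space.
    t = np.arange(int(sr * 1.5)) / sr
    ir = np.random.randn(len(t)) * np.exp(-4.0 * t)

    dry = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # stand-in dry signal

    # The convolution reverb itself: one convolution of signal and IR.
    wet = fftconvolve(dry, ir)
    wet /= np.max(np.abs(wet))                           # normalise to avoid clipping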

    This “response sample” represents the individual and unmistakable spatial behaviour of that particular space, and it can be applied to any audio signal using a corresponding plugin algorithm (the convolution reverb). The audio result is the same as if the dry output signal had actually decayed in this space – extremely realistic reverberation behaviour whose naturalness can hardly be surpassed.
    Thus, it is possible to place a dry signal into every conceivable real space. These may include legendary concert halls or studio rooms, classic reverb devices, or unusual locations such as the inside of an oil tanker, a plastic bucket or a car boot.

    However, the reverberation behaviour of a convolution reverb often cannot be edited as flexibly as the parameters of algorithmic reverb devices and plugins allow; in return, it is considerably more natural and realistic. When it comes to spatial editing, you should therefore think carefully about what you value most for the signal: either you want the maximum possible naturalness and the amazingly realistic reverb of a convolution reverb, or you rely on the flexible simulation and recalculation of echo behaviour offered by an algorithmic reverb, which can be adjusted extensively to your own wishes and requirements. The decision should be made depending on the desired sound image for the particular song and the sound of the vocals.

    Convolution reverbs are not just useful in music production – practically every location can be “convolved”. Thus, film sound shot in front of a green screen can be brought to life on an acoustic level using the convolution reverb of a real environment.

    Plate Reverb

    Over the decades, the particular sound character of the so-called plate reverb has established itself as a consistent stylistic device and a good choice for vocal reverb. The German company EMT Franz introduced a monstrous reverberation device in the 1950s, which creates echoes via a freely swinging metal plate. The echoes of the EMT 140 and its successors are very compact, with a high-middle pitched, metallic sound character, lending a pleasant and strikingly fresh artificial space to vocal signals in particular. The original EMT 140 plate reverb, more than 2 metres in length, has been replicated by many manufacturers as virtual plugin emulations, and it remains a good choice for a fine vocal reverb. The plate is a popular sound which serves both as a template for the sound behaviour of algorithmic reverbs and as a convolution source, and it has provided an unbeatably open and light vocal sound on countless releases.

    Several models have become established, particularly in the plugin world, which convey a very realistic, true-to-original sound impression.

    Reverb Selection for Vocals

    What kind of reverb should one use when processing vocals? As is so often the case, there is no general answer or recommendation here either, since the decision depends strongly on the respective sound objective and the requirements of the song. One will generally have the least parameter editing to do after selecting a suitable convolution reverb, since these devices come with fewer settings. Even where some parameters can be changed, in case of doubt it is better to choose a different, more suitable impulse response rather than to manipulate the selected one and ultimately distort its sound. After all, that was never the idea behind convolution reverb: you are looking for precisely the unique, particular form of echo found in the space whose impulse response you have chosen. Changing or adapting this sample will typically lead to significantly worse sound results than simply selecting another impulse response.

    If, on the other hand, you decide on an algorithmic reverb, you have many more manipulation options, but you will never quite achieve the realism of a convolution reverb. Ultimately, outstanding results can be obtained with both concepts. How much reverb one should use on a vocal signal depends very strongly on the current zeitgeist. A reasonable amount of space is good for every vocal signal, since it restores a basic level of naturalness to a very dryly recorded vocal. The length, sound and “colouring” of the reverb signal will vary – we are all familiar with pop productions drowned in very clear and long reverb tails. Every style and every pop music epoch seems to deal with reverbs differently, no doubt to clearly distinguish and differentiate its work from everything that has gone before. Thus, in the 1980s, long, bright, high-frequency reverberation rooms were modern, whereas for the past several years we have seen a trend towards drier productions, with space often created only with early reflections and/or delays, or with spaces carrying very little reverb level. The Rock’n’Roll era of the 1950s used very short reverb times and the legendary slap delay – hardly any recordings by Elvis Presley or other representatives of this genre got by without this punchy and in-your-face bathroom sound.

    Algorithmic Reverb
    • Free and flexible processing of all relevant parameters.
    • Unreal and unusual spaces can be created through deliberate “abuse” of the parameters.
    • Less realistic results than convolution reverb technology.
    • More resource-friendly than convolution reverb.
    • Fully suitable for most tasks in a mix, especially when editing less important signals.

    Convolution Reverb
    • Fixed characteristic echo patterns and reverberation behaviour of the convolved space.
    • Extremely realistic sound.
    • The opportunity to use the fantastic rooms of large concert halls, studios, etc.
    • Very unusual sound spaces available (cans, forest, shoebox, etc.).
    • Especially suitable for very high-value and important signals placed far at the front of the mix.

    Tips for Processing Vocal Reverb

    No matter the current taste in music, when it comes to editing vocals with reverb, as we have seen, it is not just a purely realistic simulation of space that needs to be in the foreground. Much importance is also placed on editing, which is capable of making the main voice (which ultimately is the most important element of the song) radiant, assertive and unusually attractive. It is not uncommon to use multiple reverb devices for this with different settings and which give the voice different sound aspects (even if only in small proportions).

    A device with a rather short and compact plate reverb emulation, rich in early reflections, ensures a full, voluminous and significant basic sound. Another reverb with a slightly longer reverberation time helps to embed the voice in the accompanying arrangement of the other instruments and gives it depth and substance at the same time. The vocal reverb may well be rich in high-frequency components, resulting in a more radiant and shiny impression. Often one will also work with a rather long pre-delay (popularly around 100 ms), which decouples the direct sound of the voice from the reverberation, with the consequence that the voice is perceived as very present and clear in the near foreground of the sound image. Naturally, one can also get very good results with just one reverb device or one convolution reverb plugin.

    • Two different reverb devices can help you to achieve different spatial sound aspects.
       
    • A very compact, rather short plate reverb (EMT 140 Simulation, Plate etc.) but rich in early reflections for a voluminous and assertive voice
       
    • A longer, finer reverberation for embedding the voice in the playback, for depth and a refined appearance
       
    • High-frequency components in the reverb help promote the shine and radiance of the vocals.
       
    • If you want to make the reverberation more natural and inconspicuous, on the other hand, you should dampen the high-frequency components somewhat (Low-Pass/High-Cut Filter). This corresponds to the normal decay behaviour of echoes in nature.
       
    • A longer pre-delay is often used (not rarely up to 100 ms, depending on the rhythm of the song) to decouple the direct sound component of the voice from the reverberation. The voice thus gains significance and clarity, and the impression of closeness – despite high-quality reverberation – is promoted.
       

    Here is a sample representation of the parameter settings of two algorithmic reverb devices which illustrate what has just been described: 
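
    One plausible pair of starting points, written here as Python-style settings (the exact values are our illustrative assumptions, derived from the tips above, not the original figures):

    # Assumed settings: a short plate for body plus a longer reverb for depth.
    reverb_a = {"type": "plate", "pre_delay_ms": 20, "decay_s": 0.9,
                "early_reflections": "prominent", "high_cut_hz": 9000}
    reverb_b = {"type": "hall", "pre_delay_ms": 100, "decay_s": 2.2,
                "early_reflections": "low", "high_cut_hz": 6000}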

    Delay (Echo)

    An effect closely related to reverb, which plays an almost equally important role in the editing of vocals in the mix and is one of the standard audio editing tools, is echo, known mostly as delay. As we learned in the previous chapter, echoes are, strictly speaking, the acoustic basis of what we designate as reverb and attempt to simulate with the help of complex algorithms or elaborate convolution, in hardware devices or virtual plugins. After all, the reflections so important for the creation of a more or less long reverb tail are nothing other than short echoes which bounce off walls, floors and ceilings, shaped by the nature of the space and its reflecting surfaces, and which overlay each other. The early reflections responsible for the creation of reverb, and the pre-delay already mentioned, are likewise built on very short delays (in the single- to double-digit millisecond range).

    Delays thus carry very important additional sound information which subconsciously helps us to relate an audio event to a typical surrounding space. From this information we understand how far away from us a sound event is taking place and what the space and its reflective surfaces might be. A delay, as a close relative of (and, ultimately, an important component of) reverb, is therefore the effect which allows us to give a relatively dry audio signal with little natural spatial information, such as a vocal recording, a three-dimensionality suitable for the respective production and the desired sound concept – an artistic reality of our own creation, as far as the listener is concerned.

    Technically speaking, delay is not really a complex effect. A signal present at the input of the device is issued again at the output after an adjustable delay time. If the delayed signal is fed back to the input, to be delayed there again, the result is a so-called feedback loop, which continues for as long as the audio signal retains enough energy to be thrown back on itself.
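
    As a minimal Python sketch of exactly this structure (names and default values are our own assumptions):

    import numpy as np

    def delay_effect(x, sr, time_ms=375.0, feedback=0.35, mix=0.25):
        # Feedback delay: each repetition is the previous one, attenuated by
        # the feedback factor, until its energy has died away.
        d = int(sr * time_ms / 1000)
        wet = np.zeros(len(x))
        for n in range(len(x)):
            src = x[n - d] if n >= d else 0.0      # the delayed input
            fb = wet[n - d] if n >= d else 0.0     # the feedback loop
            wet[n] = src + feedback * fb
        return x + mix * wet                        # dry signal plus echo tail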

    Tape Delay

    The concept of delay, which is simple in itself, can be found in different device types (in a slightly modified form and with correspondingly different parameterisation) creating a wide variety of options for editing audio material.

    A so-called tape delay simulates the echo behaviour of the original hardware devices. In these early devices, the delayed signal was produced by successively recording and replaying the signal on tape, using several record/play heads arranged in a row. The delays achieved in this manner were of increasingly poor sound quality: with each repetition, the signal lost more of its high frequencies and became increasingly distorted.

    This gradual worsening of the delay sound became so characteristic and striking that its distinctive character decisively shaped entire music styles (e.g. reggae, dub, etc.). This sound behaviour can be elaborately simulated and most authentically reproduced in today's virtual tape delay plugins. To do this, the individual delay repetitions are edited (more or less intensively) with high-cut filters, special EQ settings, compression and distorting tape saturation simulation, so that the delay signal gets that coveted vintage sound.

    Many legendary tape delay device emulators also still offer massively extended filter and editing options, with which very interesting and unusual effects can be achieved, including sound design options.

    Sound Example 5

    Multi-Tap Delay

    So-called multi-tap delays make up another kind of contemporary delay device/plugin, mostly delivered on a virtual basis. They offer a variety of simultaneous and very flexibly selectable signal taps, which allow individual delay signals to be edited individually with filters and EQs and rhythmically synchronised precisely with the song tempo in different note values. This comprehensive parameterisation can give rise to very intricate repetition patterns which are capable of turning a possibly somewhat one-dimensional and unspectacular dry signal into a real wonder of acoustic and rhythmic complexity. In electronic music in particular, these complex delays are readily applied to all possible instruments, giving the track a special mood and atmosphere. With vocal signals, too, the sound effects triggered by a multi-tap delay work quite well (especially in electronic genres); they can help an edited voice remain impressive in the mind of the listener, and the strong recognition effect can help make a “normal” song a hit.

    Sound Example 6

    Multi-Effect Delay

    If one combines the concept of multi-tap delay along with the ability to modulate the individual parameters of other source signals, thus letting them change dynamically, one can get very complex multi-effect delays or even modulation delays, with which particularly unusual effects can be created. Instrumental or vocal signals edited in this way can take on completely different sound characteristics and ultimately be “edited up” until they are completely unrecognisable, which in turn can be very practical and helpful for many applications, especially in electronic music.

    Exactly which kind of delay you decide to use when editing vocal recordings depends very much on what kind of music you are producing and your specific sound goals for the production. In just about every case, you will want to set the vocal delay in rhythmic relation to the tempo and groove of the song. A classic application of this technique, often used by sound engineers when mixing vocals, is to time the delay effects to the song's tempo in eighth or quarter notes.

    You can also get quite good results with triplet or dotted delay time values. The delay times and the feedback values should not be selected too long or too high, respectively, lest the repetitions of one singing passage run into the words of the following one, and so on, until sooner or later you are left with a load of incomprehensible gibberish. If the individual singing phrases are separated by long breaks, somewhat longer delay values are possible.
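
    The arithmetic behind tempo-synced delay times is simple enough to write down (a small Python helper with assumed names):

    def delay_time_ms(bpm, note=1/4, dotted=False, triplet=False):
        # A quarter note lasts 60000 / bpm milliseconds; other note values
        # scale proportionally. Dotted notes are 1.5x, triplets 2/3 of the
        # straight value.
        ms = (60000.0 / bpm) * (note * 4)
        if dotted:
            ms *= 1.5
        if triplet:
            ms *= 2.0 / 3.0
        return ms

    # At 120 BPM: eighth = 250 ms, dotted eighth = 375 ms, quarter triplet = 333.3 ms.
    print(delay_time_ms(120, note=1/8))
    print(delay_time_ms(120, note=1/8, dotted=True))
    print(delay_time_ms(120, note=1/4, triplet=True))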

    If the individual sentences follow each other very closely, you will have to work with very short delay times (if you work with any at all). One form of editing which is quite often practised is to let the delay effect start only at the ends of sentences/phrases, or at least to dampen it significantly within sentences. This can be achieved with volume automation of the delay send level, by turning it down, or even by completely muting the delay channel. How best to ensure the intelligibility of the song despite the use of delays must be decided case by case, depending on the audio material.

    One should also consider the spatial positioning of the delay signal in the stereo panorama. A delay does not always have to fill the entire stereo width; it is often more effective to place a mono delay return on one particular side.

    Sound Example 7

    Cross-Routing Reverb and Delay

    In many cases, it makes sense to route the individually controlled send effects reverb and delay back into each other with so-called cross-routing. Part of the delay effect channel's level can be fed into the reverb effect channel using a send. The delay is thus partly processed with additional reverb, resulting in a very dense effect that differs significantly from parallel signal routing of the two effects. Alternatively, a portion of the reverb channel's signal can be sent into the delay effect channel. Nor should you be afraid to edit the effect channels extensively with additional insert effects: very interesting sounds may be created by compressors, EQs/filters, distortion and more complex modulation effects that heavily process the reverb/delay signal in the inserts. As so often, the motto here is that experimenting with routing possibilities is not only allowed, it can also lead to very interesting and useful effects – there is no right or wrong; anything which sounds good is allowed.
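
    A hedged Python sketch of the first variant (the effect functions are placeholders for any delay/reverb processors, such as the sketches in the earlier sections):

    import numpy as np

    def crossrouted_sends(vocal, sr, delay_fx, reverb_fx,
                          send_to_delay=0.2, send_to_reverb=0.15,
                          delay_to_reverb=0.3):
        # The vocal feeds both effect channels; a portion of the delay return
        # is additionally sent into the reverb, so the echoes themselves
        # acquire a reverb tail.
        delay_return = delay_fx(send_to_delay * vocal, sr)
        reverb_input = send_to_reverb * vocal + delay_to_reverb * delay_return
        reverb_return = reverb_fx(reverb_input, sr)
        return vocal + delay_return + reverb_return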

    As a starting point for processing vocal signals with delay in the mix, we want to make a suggestion here. As already mentioned, however, other processing possibilities and configurations are just as possible and can be just as effective.

    • Depending on the tempo and rhythm of the song as well as the density of the individual vocal phrases, you should choose 1-3 different delay effect channels, which are individually applied in the DAW. You can name these channels, for example, DEL 8th, DEL 4th, DEL Tape and assign separate aux or bus routings to them.
       
    • The individual delays are loaded into the inserts of the effect channels (e.g. a simple delay in 1/8 note rhythm, one in 1/4 note rhythm, and a tape delay with dotted 1/8 note groove).
       
    • The feedback of the individual delays is kept very moderate, especially if the text phrases of the vocals are already quite dense. Feedback values of about 10-15% are usually enough. The tape delay’s feedback can be a bit more pronounced.
       
    • The delay plugins are all set to 100% wet, which means that the incoming send portions of the vocal channels are processed at 100%. The faders of the effect channels remain in 0dB position, but you can distribute the three delays a bit in the stereo panorama if necessary.
       
    • Now three different sends and their individual levels send different signal components to the delay effect channels. That does not have to be much in individual cases; here usually already very low send levels are sufficient for the desired effects. As already mentioned, the send levels are branched off in POST fader mode, which means that if the channel strip volume is reduced via the fader, the branched-off send signal also becomes correspondingly quieter. Although there are situations in which the exact opposite is desired and can be realized via a PRE fader circuit, the POST fader variant is usually better for this purpose.
       
    • If you want to use even more prominent and distinctive delay effects, you can either create new additional effect channels with corresponding multi-tap or modulation delays, or you can edit existing delay channels with corresponding insert plug-ins.
       

    Cross-routing in Cubase (Lead Vocals to Reverb + Delay and Delay to Reverb)

    Cross-routing in Pro Tools (Lead Vocals to Reverb + Delay and Delay to Reverb)

    Likewise, dynamic automation of the individual parameters or the applied send level can help to create more unusual and exciting effects.

    Sound Example 8

    Other Send-Effects

    In addition to classic echoes and reverbs, there is a whole range of other effects, which can easily be controlled with an aux send and added to the mix at will. We would like to have a brief look at a few typical representatives of these here:

    Slapback

    Slapback is essentially nothing more than a simple, one-time echo. This one-time repetition originates from tape machines: with such devices there is a small physical distance between the recording and playback heads, and – depending on the tape speed and the distance between the heads – a certain period of time passes before the recorded signal is played back again.

    In the digital age, slapback can be produced by placing a simple delay on the send channel. Common delay times here are approx. 45 – 70 ms. To make the delayed signal a bit more interesting and a bit more tape-like, it is not uncommon to give it some distortion and to lower the highs. Blended into the mix, even in combination with other effects, such a short delay can often lead to interesting results.
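
    A self-contained sketch of this, with assumed names (the tone control is a crude one-pole lowpass, not a real tape model):

    import numpy as np

    def slapback(x, sr, time_ms=60.0, level=0.5, tone=0.3):
        # One tape-style repeat: a single delayed copy, no feedback.
        d = int(sr * time_ms / 1000)
        echo = np.zeros(len(x))
        echo[d:] = x[:len(x) - d]
        # One-pole lowpass on the echo only (tone: 0 = dark ... 1 = bright).
        for n in range(1, len(echo)):
            echo[n] = tone * echo[n] + (1 - tone) * echo[n - 1]
        return x + level * echo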

    Sound Example 9

    Slapback Delay


    Widener

    The word “widener” is, strictly speaking, a collective term for tools which allow spatial broadening. In popular music, the lead vocal is often a mono signal which one would like to stretch out to fill the space. If one already has a stereo signal, a simple broadening can, of course, be achieved by raising the side channel, but with a mono signal this has no effect, since no side component exists at all. There are different techniques for broadening mono signals.

    The following, for example, are suitable for giving a mono signal a certain width: a short stereo delay, a doubler or a stereo chorus (see also the “Classic Modulation Effects” chapter). Other techniques work by processing the left and right audio channels with different frequency treatments. A one-sided time delay also creates a stereo image – but be careful: the perceived position of the signal then shifts to the side on which it sounds first. One simple widening technique which is usually quite effective with vocals is to run the vocal track on a send channel with different pitch shifts on the left and the right side.

    It is often sufficient, for example, to pitch the left channel one-tenth of a semitone lower and the right channel one-tenth of a semitone higher to get a “floating” width – this technique often works very well, especially with singing. Of course, all these broadening techniques should not be used in place of a proper double; rather, they can help to achieve additional “size” in the mix.
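
    Sketched in Python, assuming librosa's phase-vocoder pitch shifter as a stand-in for a studio pitch plugin (names and the 10-cent value follow the example above):

    import numpy as np
    import librosa

    def detune_widener(mono, sr, cents=10.0):
        # Left channel a fraction of a semitone down, right the same amount up.
        steps = cents / 100.0           # 10 cents = one-tenth of a semitone
        left = librosa.effects.pitch_shift(mono, sr=sr, n_steps=-steps)
        right = librosa.effects.pitch_shift(mono, sr=sr, n_steps=+steps)
        return np.stack([left, right])  # shape (2, n): a stereo pair

    # Typically blended in behind the dry centre vocal rather than replacing it.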

    Sound Example 10

    With Mono-to-Stereo


    Without Mono-to-Stereo


    Routing is Everything!

    In modern mixing practice, several different send effects are usually used on vocals. Through well-thought-out cross-routing (i.e. mutual influence), they often converge in the end with the dry vocals on a bus and are then frequently “glued” together again by bus compression. You often only notice how many different send effects were used when you listen closely to current music mixes. Usually it is precisely the combination of many different effects in relatively small proportions, and their intelligent interplay, which creates a sound image that really showcases the vocals to their best advantage.

    We have already shown that a cross-routing between delay and reverb is a very sensible procedure. As already explained, however, this, of course, applies not only for reverbs and echoes, but basically for all conceivable signals which may be applied in a mix.

    With each one, you can ask yourself in which other effects the signal should be distributed and at what ratios the effects should work together. It is not uncommon for the selected cross-routing to have a considerable influence on the hierarchy of the signals and their spatial effect in relation to each other. It is therefore worthwhile exploring different routing options, signal flow sequences and ratios in a playful way and always trying something new and unconventional once again.

    One should also not forget the sidechain function of gates and compressors! This is how the famous “gated snare” effect came about, in which the reverb of a snare was gated by the snare signal. It results in an unnaturally large, choppy reverberation space, the likes of which shaped 80s pop music enormously. This too is, of course, a conceivable effect on vocal signals. Another technique frequently used in modern pop music is, for example, ducking your own echo. The procedure is relatively simple: the dry voice is sent to a delay via send. On the delay channel, there is also a compressor, which should compress the echo.

    On this compressor, however, the external sidechain is activated, so that it responds not to the echo signal itself but to an externally supplied signal. If you send the dry vocals to this sidechain input, the echo will be turned down whenever the singing sounds. In this way a relatively loud echo can be used and a compact “vocal carpet” created without it competing unfavourably with the direct signal. The vocals stay definitively at the front, and the echo tail only really comes out in the singing breaks.
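
    A rough Python sketch of the idea; the envelope follower stands in for the compressor's sidechain detector, and all parameter values are assumptions:

    import numpy as np

    def duck_echo(echo, dry_vocal, sr, depth=0.8, attack_ms=5.0, release_ms=250.0):
        # Assumes echo and dry_vocal have equal length and sample rate.
        atk = np.exp(-1.0 / (sr * attack_ms / 1000))
        rel = np.exp(-1.0 / (sr * release_ms / 1000))
        env = np.zeros(len(dry_vocal))
        level = 0.0
        for n, s in enumerate(np.abs(dry_vocal)):
            coeff = atk if s > level else rel   # fast attack, slow release
            level = coeff * level + (1 - coeff) * s
            env[n] = level
        env /= (env.max() + 1e-12)              # normalised sidechain envelope
        gain = 1.0 - depth * env                # loud vocal -> echo pulled down
        return echo * gain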

    In addition, mixing practice does not restrict send channels to dedicated send effects; dynamics, distortion and EQs are also used on them. Classic parallel compression (“New York style”), compression or frequency editing of reverb spaces and echoes, and even the isolation of certain frequency ranges for more controlled use are all popular mixing tools among professionals. One can, for example, create a “vocal shimmer” by isolating only the highs of the vocals on a send channel and condensing them with relatively strong compression (along with counter de-essing, if necessary). This “shimmer” can then be added to the mix in carefully measured doses.

    As these examples show, what holds for an entire mix also holds for the various send effects: it is only the interaction of the different elements that creates the overall sound image, and it is well worth experimenting with different routing options. In this area in particular, the possibilities are endless.

    Exercise 4

    1. What is the difference between destructive and non-destructive work?
       
    2. In which frequency ranges can the components of a voice be classified (fundamental range, presence, sibilants)?
       
    3. How can a low-cut filter help when it is used on vocal tracks?
       
    4. How is an excessively long attack time expressed in sound terms in the compression of vocals?
       
    5. And how is an excessively long release time expressed in sound terms?
       
    6. Which parameter is influenced by the hard/soft knee in a compressor?
       
    7. Outline how an algorithmic reverb works.
       
    8. If you had to choose the reverb for a vocal recording of a classical song, would you prefer a convolution reverb or an algorithmic reverb? Why?
       
    9. Why can it make sense to send the delay into the reverb along with the vocals?
       

    Pitch Correction

    One of the biggest hurdles on the path to a perfect vocal performance is the intonation of the singer, which should be as correct and clean as possible. During the recordings, along with the artistic emotional expression, the singer should above all hit the melody notes with the correct and appropriate voice “colouring” and dynamics and deliver the corresponding song text.

    However, it is not just gross intonation errors that should be avoided. The singer should also pay attention to correct micro-intonation. Many (mostly inexperienced and untrained) studio singers tend to intonate too low (flat) or too high (sharp). This may also be due to a headphone mix that is not optimally set: an excessively loud mix in the headphones tends to lead to sharp intonation; conversely, a mix that is too quiet in many cases leads to flat vocals.

    What should be done if you have been unable to fully eliminate these strong and/or minimal pitch fluctuations during recording? Subsequent pitch correction is necessary – an operation which, in earlier decades, was possible only with the help of complex pitch shifting and time stretching of the audio passages in question.

    The resulting adjustments to the pitch of the vocal signal were mostly accompanied by clearly audible losses in quality. No-one would have argued that such pitch correction was unobtrusive.

    Manual Pitch Correction

    Behind the bulky term “manual pitch correction” lies what is probably the most controversial trick in audio engineering: correcting a singer’s wrong notes! This is, admittedly, an oversimplification.

    Tools and editors which offer manual pitch correction are capable of analysing audio material (mainly vocal signals) and determining the pitch progression. This pitch information can then be used, in a manner similar to a MIDI editor, to edit the song’s melody.

    Studio One with Melodyne (via ARA2)

    Logic Pro X with Flex Pitch

    Cubase Pro with Variaudio

    Melodyne – the Revolution

    At the start of the 2000s, the German company Celemony developed an independent and at the same time revolutionary way to edit the pitches of audio material graphically and with the highest sound quality: Melodyne. Monophonic or polyphonic audio material is analysed in detail, and its pitches, note lengths and levels are displayed graphically.

    Following the analysis, all these parameters are freely accessible and can be edited and shifted with total flexibility. This allows not only for pitch corrections of the highest audio quality; length, articulation, volume, transitions, formant content and vibrato of individual phrases, words and sounds can also be edited individually.

    The resulting corrections sound very realistic within a wide range of values and represent the absolute professional standard for audio manipulation, time-stretching and pitch correction. With Melodyne, it is possible, to a certain extent, to transform any vocal performance into a completely different melody and rhythmic structure, with its own melodic delivery and timing.

    The possibilities of manipulation are almost frightening, and one can certainly argue about the sense and nonsense of such powerful manipulation. In any case, Celemony’s Melodyne offers the most impressive and comprehensive range of functions for editing vocal tracks and can therefore be recommended to any sound engineer as the absolute standard for professional vocal editing and/or creative sound design.

    Auto-Tune

    But there is also a fully-automatic form of pitch correction which occurs (at least approximately) in real-time! The effect is quite well-known as “auto-tune”, even though this is actually a brand name.

    The software plugin Auto-Tune from the company Antares was revolutionary in this area, as it enabled authentic-sounding pitch correction for the first time, by forcing the pitch curve of the input signal to match the predefined frequencies of a reference scale. If pitches are detected in the original signal which do not coincide with the values of the target scale, they are adjusted accordingly.
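
    The underlying pitch-snapping idea can be illustrated in a few lines of Python. This is emphatically not Antares’ algorithm – real plugins track a continuous pitch curve and resynthesise the audio phase-coherently – but it shows how a detected frequency is mapped to the nearest note of a reference scale (here C major; all names are made up for this sketch):

        import numpy as np

        A4 = 440.0
        C_MAJOR = [0, 2, 4, 5, 7, 9, 11]   # allowed pitch classes (semitones from C)

        def snap_to_scale(freq_hz, scale=C_MAJOR):
            """Return the frequency of the nearest note of the reference scale."""
            midi = 69 + 12 * np.log2(freq_hz / A4)       # continuous MIDI number
            candidates = [12 * octave + pc
                          for octave in range(11) for pc in scale]
            target = min(candidates, key=lambda n: abs(n - midi))
            return A4 * 2 ** ((target - 69) / 12)

        # A slightly flat A4 (434 Hz) is pulled to 440 Hz:
        print(round(snap_to_scale(434.0), 1))   # -> 440.0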

    The amount and speed of the corrections can also be freely adjusted. By quickly and radically forcing pitches onto just a few “allowed” reference tones, very distinctive, abrupt frequency jumps emerge between note transitions. These were initially used as a deliberate sound effect and later taken to extremes in the mega-hit “Believe” by the artist Cher.

    The resulting sound effect, as applied to a human singing voice, was so characteristic and had such strong recognition value that people still speak of the so-called “Cher effect” today. It is actually the result of an extremely fast correction-speed setting combined with a drastic scale limitation in Auto-Tune.
    This world-famous effect basically came about through an “incorrect”, or at least very unorthodox, handling of the plugin – a prime example of innovation through unprejudiced experimentation with the tools at hand.

    Moderate and gentle use of Auto-Tune, on the other hand, leads to quite good results when editing vocal signals, and unwanted intonation fluctuations are corrected reliably and largely inconspicuously. However, you should really only edit the critical passages of the vocal performance with the options Auto-Tune offers – the effect is usually not completely inaudible, even with the most careful use.

    Today the effect can be found in countless rap, trap and cloud rap productions; it has massively shaped the current era of hip-hop music. But Auto-Tune and its kin are also used extensively in pop, EDM, rock and folk, even if usually less conspicuously.

    In addition to the various versions and offshoots of the classic Auto-Tune from Antares, there are of course many other suppliers of comparable software. Incidentally, in everyday modern music production it is advisable to have a few different versions of these tools in stock. In many cases, a direct comparison shows that different auto-tune tools do not work equally well on a specific voice. It is therefore difficult to give a general recommendation; rather, it is always a question of the desired effect and the respective voice. Here are a couple of suggestions for different auto-tune tools: Waves Tune (Real-Time), GVST GSnap, Auburn Graillon or iZotope VocalSynth. In addition, many commercially available DAWs already include real-time pitch correction. These alternatives are definitely also worth trying!

    Auto-Tune in Production Usage

    The question often arises of the point at which auto-tune should be used in a production. While one artist may insist on hearing the effect at 100% over their headphones during recording, in other cases the better solution is to record the vocals “clean” and leave the pitch correction – whether to fix intonation or to impose a certain sound “colouring” in the vein of the “Cher effect” – for the mixing stage.

    Whether a real-time auto-tune effect on the headphones during recording makes sense naturally depends on technical feasibility, but above all on the desired sound. Many singers deliberately play with the note transitions, which generate a certain “flutter” through pitch quantisation. If such a striking auto-tune sound is desired, using the plugin on the headphone mix makes sense – the singer can then “play” the effect during the performance. This approach is frequently used in modern hip-hop, EDM or trap, for example. If, on the other hand, a natural singing performance is to be in the foreground, and pitch correction is to be used only as a subtle sound “colouring” or even as an inaudible correction, it is advisable to put the vocals on the headphones without auto-tune, so that the singer can better control their actual intonation without being distracted from their performance by the effect. In pop or rock music, one would tend to choose this route. Of course, the needs of the artist are as individual as art itself; for this reason, this topic should never be decided without consulting the singer.

    In any case, the effect itself is not printed to the recording, meaning that afterwards you still have the opportunity to change its settings or to insert plugins ahead of the pitch correction.

    As a matter of principle, it should be borne in mind that real-time editing poses a certain technical challenge for headphone monitoring. Just like any digital processing, pitch correction requires a certain buffer time for its calculations – and this, of course, creates a temporal delay on the headphones. Exactly how long this delay is depends on the complexity of the editing algorithm, the available CPU power and, of course, the other tasks that have to be calculated alongside it. Under such circumstances, one needs a relatively powerful computer and a resource-saving algorithm, and it also makes sense to close, or bounce/freeze, all unnecessary background programs and plugins to keep the latency as low as possible. For more complex vocal recordings, a special session is often prepared in which only exported group tracks of the playback are available. Since these only need to be played back, with all effects already printed, the studio computer is significantly relieved. During recording, a simpler auto-tune effect can also be used, to be replaced later by a more resource-intensive algorithm.
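
    For a rough orientation, the buffer-related part of this latency can be estimated from the buffer size and the sample rate – one buffer on the way in and one on the way out; converter, driver and plugin-internal lookahead come on top of these idealised figures:

        def io_latency_ms(buffer_samples, sample_rate):
            # One buffer of latency on the way in and one on the way out
            return 2 * buffer_samples / sample_rate * 1000

        for buf in (64, 128, 256, 512):
            print(buf, "samples ->", round(io_latency_ms(buf, 48000), 1), "ms")
        # 64 -> 2.7 ms, 128 -> 5.3 ms, 256 -> 10.7 ms, 512 -> 21.3 ms

    As a rule of thumb, round-trip latencies much beyond roughly 10 ms are often perceived as disturbing by singers monitoring their own voice, which is why small buffers matter here.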

    Some audio interfaces also offer the option of creating a mix, including complex effects, in near-real-time, with the help of internal processors. This solution allows you to apply Auto-tune effects to the voice during recording without being dependent on the computing power of the recording computer.

    Sadly, the term “auto-tune” has taken on a negative connotation on account of its frequent use as a blatant effect. Yet it is also a helper which can work almost invisibly on a vocal track and correct the voice to a predefined scale. This eliminates the need for costly manual correction if the singer has not sung strictly “on pitch”.

    Here are some examples of overt “effect” usage as well as soft, “invisible” use. In both examples, Antares Auto-Tune Pro was used.

    Sound Example 11

    Why Make Corrections Anyway?

    The question of whether one should correct vocal recordings in terms of pitch and timing initially seems somewhat banal. Obviously, incorrectly-sung tones and poor intonation should be corrected – this much is clear! However, manual or even automatic pitch and timing corrections also find their way onto the recordings of highly professional singers whose performances are immaculate.

    In modern music, the instrumental backing is often comprised entirely of electronic or virtual instruments. Pop music, in particular, usually contains only very few “natural” instruments. It should be clear that a virtual instrument, in its full digital splendour, only rarely varies in pitch or is out of tune. However, the exact intonation of these instruments can be unforgiving when combined with a natural voice. Under these circumstances, even the most natural deviations of the voice are risky, and slight drift or a strong vibrato might not sound completely “clean”. Therefore, pitches may be precisely corrected even in seemingly perfect vocal takes.

    Even if such corrections are barely perceptible the first time you listen, they still ensure the usual “perfect” vocal sound, as we have all heard from Rihanna, Katy Perry and Madonna. 

    ARA 2

    Those who like Melodyne should also like ARA! When it comes to correcting pitch and timing, many different programs have been developed over the past years – Melodyne, Antares Auto-Tune, SynchroArts VocAlign and ReVoice, and then some. However, using these as separate editors is only moderately practical: audio tracks first had to be “transferred” (recorded into) programs like Melodyne before they could be edited.

    However, this changed with the idea of the VST and AU plugin extension ARA. ARA stands for Audio Random Access, and it has revolutionised the pitch-correction sector. In 2018 and 2019, version 2 was integrated into different DAWs. A team from Celemony (Melodyne) and Presonus (Studio One) played a key role in its development. In 2019, Steinberg also implemented the ARA extension into the DAWs Cubase and Nuendo.

    How does it work?

    With classic plugin formats, data can only be edited in real-time. For a plugin to be able to work, there has to be a data stream at its input; this stream is, so to speak, received, edited and passed on serially. This limits what the plugin can see to the section that is currently being played back.

    For editors like Melodyne, which have their own views of events (so-called “blobs”), this is clearly problematic. These plugins essentially require access to the complete, static audio data on which the edits are applied. And that is where ARA comes in.

    ARA enables permanent access to the audio files present in the DAW’s sequencer – completely independent of the play position. With this, Melodyne and the DAW sequencer can work as independent editors while always knowing what the other is doing. If one moves a clip in the DAW, the corresponding event is automatically moved in Melodyne; if the length of a note is changed in Melodyne, it is immediately reflected in the DAW sequencer. This simplifies the workflow considerably and speeds up work with programs such as Melodyne or VocAlign.
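
    The conceptual difference can be caricatured in a few lines of Python. This is not the real ARA programming interface, merely an illustration of “stream access” versus “random access” (all class and method names are invented):

        import numpy as np

        class StreamPlugin:
            # Classic insert effect: sees only the buffer currently being played
            def process(self, block: np.ndarray) -> np.ndarray:
                return block * 0.5                 # e.g. a simple gain change

        class AraStyleEditor:
            # ARA-like idea: holds a reference to the COMPLETE clip, so it can
            # analyse and edit any position, independent of the play position
            def __init__(self, whole_clip: np.ndarray):
                self.clip = whole_clip             # random access to all samples
                self.analysis = np.abs(whole_clip) # stand-in for pitch analysis

            def edit(self, start: int, stop: int, gain: float):
                self.clip[start:stop] *= gain      # edit anywhere, at any time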

    Tips and Tricks for Vocal Processing

    In the following we would like to offer some useful tips and tricks derived from regular experience in the recording studio. These are applicable to daily work and especially for post-recording vocal signal processing. Some of these are based on principles and concepts already introduced in previous chapters.

    Recommended Routing Setup for Vocal Mixes

    A suitable routing setup will, of course, look slightly different for each music production and be adjusted to its respective requirements: the number of tracks and the integration of effects. However, it makes sense to consider a standard routing setup which can, in many cases, be used as a general template. This facilitates a quick session overview, since the same configuration always appears in a similar way at the same juncture, and consequently enables faster and more purposeful work.

    • It is important to name the individual tracks in a meaningful and clear manner and, if appropriate, give them a specific colour scheme, most of all with regard to the routing configuration of vocal tracks during a given session.
       
    • You should give some real consideration as to whether you will route each of the vocal tracks individually to the stereo bus (the main stereo output) or combine them into one or more submix channels. A vocal stereo submix, which collects all vocal signals before the main out, offers the advantage that the vocals can be removed from the signal path, or soloed, with a single click on the channel’s mute/solo switch. This can be very helpful in many situations. In addition, you could apply common processing to all vocal signals via the submix channel’s inserts – perhaps slight common compression, a bit of tape saturation or a final EQ. Using this method you could, in some situations, even give all of the vocal signals a common reverb or delay effect via a send on the submix channel; usually, however, it is better to route reverb and delay effects via the individual sends of the individual channels.
       

    Parallel Compression and Parallel Processing

    One popular compression technique, called parallel compression, was originally used to gain flexibility when compressing drums and percussive signals, and has since become quite common for controlling the dynamic range of a vocal signal. Back in the 1960s, this practice was named the “New York drums trick” or the “Motown trick”. The principle behind it is quickly explained: parallel to the actual vocal signal, which remains untouched during the dynamic editing stage, an identical copy of the track is strongly compressed using an extreme compressor setting.

    The compressor settings required for this have such an unusually strong effect that signals compressed in this manner, heard on their own, sound squashed almost to the point of being unrecognisable. However, if this super-compressed signal is quietly mixed in under the main vocal track at a subtle volume level, the combined result gains a convincing and effective kind of stability. The mixture of the two signals does not really sound compressed; nevertheless, it has all the sound characteristics of a sufficiently compressed vocal signal. The advantage of parallel compression is that none of the negative side effects normally associated with strong compression become evident.
     

    • The percussive transients at the beginning of words and consonant syllables remain completely preserved in the unedited signal, which the listener will perceive as a transparent and direct naturalness of sound and performance. In addition, the transients define the “punch” and the liveliness of the vocal performance.
       
    • The strongly compressed parallel signal ensures a constant, wide and compact sound basis for the overall sound which, when subtly mixed together, gives the combined output signal a noticeable (rather than directly audible) kind of fullness and stability. The overall sound becomes significantly more powerful and assertive, without sounding noticeably compressed.
       
    • Depending on the chosen time constants of the compressor – attack and release – a strongly pumping sustain can be mixed in with the unedited signal, which (with precisely tuned settings) fills the time gaps of the song and thus enables a sound density that one could never achieve with normal compression. This can be used very effectively with singing in particular: the “pumping back” of the compressed signal fills the spaces between the individual sentences and phrases, which leads to a particularly compact overall performance.
       

    Exactly how you implement the parallel tap of the vocal signal depends on the circumstances and requirements of the session and the production, considering the tracks present. One can, for example, simply create an identical duplicate of the original track – this is certainly the easiest way to get a parallel signal. Deriving the parallel signal by branching off a level component via the send routing of the DAW mixer is somewhat more complex. In any case, you must make sure that both signals are played back sample-identically and in time; otherwise, phase shifts can cause frequency cancellations. The level of the parallel compression track does not need to be set very high for vocal editing purposes; even a very quiet intermixture will bring about the desired effect.
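
    As a minimal illustration of the principle – a static, drastically simplified “compressor” without attack and release, with all thresholds and levels chosen purely for demonstration:

        import numpy as np

        def hard_compress(x, threshold=0.05, ratio=20.0):
            # Crude static compression: everything above the threshold
            # is squashed by the ratio (no time constants, illustration only)
            mag = np.abs(x)
            over = mag > threshold
            out = x.copy()
            out[over] = np.sign(x[over]) * (threshold + (mag[over] - threshold) / ratio)
            return out

        def parallel_compress(dry, wet_level=0.25):
            # New York style: the untouched dry track plus a quietly mixed,
            # heavily squashed duplicate. Both stem from the same samples,
            # so they are automatically sample- and phase-aligned.
            squashed = hard_compress(dry)
            squashed *= 0.5 / (np.max(np.abs(squashed)) + 1e-12)  # make-up gain
            return dry + wet_level * squashed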

    It is not just compression that lends itself to parallel editing via duplicate tracks; other processes can be blended in parallel with the original signal just as effectively: virtual tape saturation, EQ-ing, filtering and all types of distortion.

    One good example of such parallel editing in the real world of professional music production is the vocal sound of the Nickelback singer Chad Kroeger. When you listen closely, you can recognise parallel editing on Chad’s lead vocals, on the band’s early recordings (“How You Remind Me”/Silver Side Up) in particular, but also on its later hits (“Photograph”/All the Right Reasons or “I’d Come for You”/Dark Horse). Amp simulations or parallel vocals distorted with bit-crusher effects, some with a clear tremolo, support and thicken the main vocal track and give it an unmistakably rough character and a definite presence, despite being used relatively subtly.

    Meanwhile, many compressor plugins include a dry/wet control, which makes parallel compression easy without any complex routing. The wet value then determines the volume of the compressed signal in relation to the unedited original signal.

    Sound Example 13

    Panning – Centre, Left or Right?

    Although in most productions the lead vocal track is placed in the centre, you should not regard this rule as obligatory and may experiment with different positioning options in the stereo field. If you want to especially highlight and emphasise specific sentences, phrases or vocal fills, an unusual panorama position can help to emphasise the desired effect.

    In most cases, doubled tracks, harmony vocals and choir arrangements should be distributed over the stereo field, and this – in combination with the central lead vocal sound – will yield a wide and impressive sound image.

    Stereo Widening with Short L/R Delays

    To give the mono recording of a singing voice a breath of stereo and, in this way, make it “wider” in the truest sense of the word, one will normally use a simple stereo delay with different values (adjustable independently of each other) for the left and the right side. Such a delay is used as an insert on an effect send channel, with slightly different, quite short delay values in the millisecond range for each side (e.g. left side 10-15 ms, right side 20-40 ms). The effect is set to 100% wet – after all, this is a send effect which is expected to process the entire signal portion assigned to it. If you send a certain level portion of the central mono original into the effect send, a moderate level will result in a fine widening of the vocal sound. You might also apply a light modulation of the delay times using an LFO – this creates a more natural beating, and the widening will not be quite as static. The more extreme the differences in the delay times, the wider the sound. Of course, you should pay attention to the phase behaviour of the signal; for this reason, modest send levels, more than anything else, are recommended.
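
    A bare-bones sketch of this widening in Python/NumPy (hypothetical function name; the LFO modulation mentioned above is omitted). The dry mono signal stays in the centre, while a quiet, 100% wet copy is delayed differently per side:

        import numpy as np

        def widen(mono, sr, left_ms=12.0, right_ms=30.0, send_level=0.3):
            # Different short delay times per side create the width
            dl = int(sr * left_ms / 1000)
            dr = int(sr * right_ms / 1000)
            left = mono.copy()
            right = mono.copy()
            left[dl:] += send_level * mono[:-dl]    # delayed copy, left side
            right[dr:] += send_level * mono[:-dr]   # delayed copy, right side
            return np.stack([left, right])          # (2, n) stereo signal

    When checking mono compatibility, simply sum both channels: the short delays act as comb filters, which is another reason why low send levels are recommended.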

    Ping Pong Delay in Ableton Live

    • If you decide to use a vocal submix, you could also use different submix channels for the different vocal track types. All exclusive lead vocal tracks can be merged on a lead vox submix, all harmony and backing vocals on a backing vox submix, and finally all ad-libs on an ad-lib vox submix. However, routing this complex only makes sense if you have a lot of individual vocal tracks to manage in your mix.
       
    • For a standard effects track configuration, you should create at least 1-2 stereo reverb channels and 1-2 stereo delay channels, and make them available via the send architecture of the DAW. Reverb 1 could be a very short, basic three-dimensionality comprised almost entirely of early reflections, and reverb 2 a somewhat longer plate reverb matched to the pace and atmosphere of the song (approx. 1.2-1.4 seconds is a good starting value). Delay 1 and delay 2 each offer a delay timed to the song tempo at 1/8 and 1/4, with initially small feedback values which can be fine-tuned later (see the small tempo/delay calculation after this list).
       
    • Additional standard effects tracks, each with a flanger and a short L/R Delay (left side about 15 ms, right side about 25-40 ms) could also be created and made available; whether or not they are ultimately needed would then be seen during the mix. The quietly mixed delay helps to make the vocal signal a bit wider, and it is especially well suited for refrains. A flanger, when very discreetly used, gives the vocal signal a subtle dynamic sound change and movement. 
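
    The tempo-synced delay times mentioned in the list above are quickly calculated – one quarter note lasts 60000/bpm milliseconds:

        def delay_ms(bpm, note_value):
            # note_value: 0.25 = quarter note, 0.125 = eighth note, ...
            quarter_ms = 60000.0 / bpm
            return quarter_ms * note_value / 0.25

        for note in (0.25, 0.125):                 # 1/4 and 1/8 at 100 bpm
            print(note, "->", delay_ms(100, note), "ms")
        # 0.25 -> 600.0 ms, 0.125 -> 300.0 ms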
       

    Doubling – Rich Vocals Through “Real” Duplications

    A quite popular and at the same time obvious technique for making lead vocals more powerful and present is to double the main voice. As with guitar recordings, where distorted rhythm guitars in particular are recorded several times as congruently as possible, the method can also work very well on vocal tracks. Between the almost identical double track and the original, there are minimal timing and pitch fluctuations, which create a kind of natural chorus effect.

    If both (or more) signals are distributed a little in the stereo panorama, you get a powerful and wide sound. In music styles in which the singing is at the foreground of the sound image, in particular, doubling the whole take, or at least individual important words or phrases, is an absolute must and contributes to the genre-typical vocal sound. In rock/pop music and related styles, as well, doubling is almost a standard recording technique, employed above all in the sound design of refrains. A vocal track used in a refrain can be given the required lift with doubling (or even tripling or quadrupling); in addition, the textual statement gains assertiveness and significance.

    Doubling of vocal recordings in Cubase

    Doubling of vocal recordings in Logic

    Doubling of vocal recordings in Pro Tools

    To correctly double vocal tracks, in most cases the same singer should also sing a take which is as congruent as possible to the main track. Of course, there are also techniques and situations in which it makes sense to have the duplications carried out by another voice.

    However, you will normally get the best results if the same singer sings the part again in the same pitch and voice “colouring”. The important thing here is that the singer is able to reproduce the phrasing, pitches and timing of the original take as closely as possible. Small fluctuations are permissible to a certain extent and even desirable; however, it becomes difficult as soon as the timing audibly diverges or the phrasing departs from the original. Then the duplications no longer offer psychoacoustic support; rather, they are perceived as independent vocal voices, which leads to confusion and irritation for the listener instead of a “powerful sound”.

    When it comes to duplication, special attention is required not just for as-precise-as-possible copying of the original take (i.e. pitch and timing) but also for careful handling of the conspicuous consonants, e.g. “S”, “T”, “K”, “P” or “B”. Even for very experienced studio singers, it is almost impossible to sing the consonants exactly congruently take after take, and this can result in strange hissing effects and doubled transients. When both vocal tracks are played, they stand out as separate tracks – an effect you certainly want to avoid.

    A common solution for this problem is to ask the singer to leave out the sharp consonants in the duplication, or at least to attenuate them significantly. Of course, this does not come easily to the singer – it takes quite a bit of experience to sing sentences that sound so strange. One alternative is to cut out or duck the disruptive doubled consonants in the editing stage. This usually involves extensive editing work; however, very useful results can be achieved from doubled vocal performances this way.

    Decrease of the sibilants in the doubled tracks (by automation)

    Additional Tips and Tricks

    • With a slight variation of the voice “colouring” used for the duplications, one can create a fine contrast to the lead voice. In many cases, it is advisable to sing the duplications with somewhat less character, i.e. in a more “neutral” way. After all, on a sound level, duplications should compete as little as possible with the main voice.
    • The voice-colour variations represent an opportunity to achieve an interesting contrast with the lead vocal. The pitch/register is an additional parameter which can be used effectively. This opens up many sound combinations, from upper or lower octaves to breathed or whispered duplications. Interesting results can be achieved with the latter two in particular – and of course, basically everything is allowed that serves the desired sound vision.
    • In music genres in which the vocal is the dominant element (hip-hop, rap, R&B etc.), in particular, so-called “shouting” is recommended for duplications of individual words or phrases, to give them more expression and weight. These duplications are usually not sung/spoken with deliberate precision; rather, they are articulated and phrased in a deliberately more aggressive and expansive manner. They become conspicuously massive and “rough”; in contrast to typical rock/pop duplications, they take on an independent character and help define the recognition value of the refrain or the textual statement.

    Audio Alignment

    These days there are very useful tools which – mostly in the form of plugins – simplify the required editing and alignment of duplicate takes, and in part even automate it. One of these specialised plugins is VocAlign from SynchroArts. This tool compares the amplitude envelopes of the individual syllables and phrases of the original and the double track and corrects the double with time-stretch algorithms or automatically placed cuts and fades. Since version 10, Cubase has also offered this function as “Audio Alignment”. The result – given enough usable source takes – is quite exact, synchronous duplications, suitable both for thickening lead vocals and for creating compact choir arrangements.
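
    The basic idea of envelope-based alignment can be shown with a toy global-offset estimator in Python/NumPy. VocAlign and Audio Alignment work phrase by phrase with time-stretching; this sketch deliberately only finds one overall offset:

        import numpy as np

        def align_offset(lead, double, hop=256):
            # Coarse amplitude envelopes of both takes
            def envelope(x):
                n = len(x) // hop * hop
                return np.abs(x[:n]).reshape(-1, hop).mean(axis=1)
            e1, e2 = envelope(lead), envelope(double)
            m = min(len(e1), len(e2))
            # Cross-correlate the (mean-free) envelopes to find the best lag
            corr = np.correlate(e1[:m] - e1[:m].mean(),
                                e2[:m] - e2[:m].mean(), mode="full")
            lag_frames = np.argmax(corr) - (m - 1)
            return lag_frames * hop   # in samples; positive: the double starts earlier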

    SynchroArts VocAlign Pro

    Audio Alignment dialog in Cubase 10

    Sound Example 5

    Special Cases in Vocal Production

    We have already looked at the techniques one can use to record most forms of singing, but a “clean” singing take is not always what is wanted, and there are definitely vocal performances which have a musical element but cannot be counted among the classical singing techniques. In what follows, we would like to look at a couple of these special cases and discuss the unique features which come into play and should be considered during recording and mixing.

    Vocals in Hip-Hop and Rap

    Chant-oriented music styles have developed into very influential genres over the past decades. In the case of hip-hop and rap, rapped vocals stand in the foreground very clearly as they define and shape the music. Given the commercial success of these genres in recent years, this principle has also been reflected in other styles of music. As a result, rapped vocals can be repeatedly found in many styles of today’s black music, R&B and nu-soul, but they can also be found in indie rock, EDM and nu-metal. In addition, there are naturally also many special kinds of rap in modern hip-hop and pop derivatives, such as trap, emo or cloud rap. This type of singing has long since developed beyond the narrow genre boundaries of hip-hop music and is now indispensable as an independent element of modern pop music.

    When it comes to recording, first and foremost the same basic principles apply as for conventional sung vocals. Nevertheless, spoken vocal content automatically sits more clearly and immediately at the centre of the listener’s attention, which is why a number of things need special consideration during the recording and editing of these performances.

    Presence, Clarity and Assertiveness of Rap Vocals

    If, when dealing with sung vocals, one often strives to “embed” them (in the truest sense of the word) in the overall sound structure of the mix and the arrangement of the other instruments as harmoniously and elegantly as possible, then the opposite is usually true in the case of spoken vocals. Vocals in hip-hop/rap should, first and foremost, be clear and distinct, and easily understandable, with a presence in the foreground of the mix. Rap music is usually characterised by an overly present “in your face” character in the vocals, which also has much to do with the underlying aggressiveness and energy of its artistic performances, and, in many cases, with the textual statement as well. The vocals, therefore, should have an unrivalled presence at the foreground of all musical events – the accompanying arrangements are often very strongly reduced, with an absence of showy and grandiose individual instrumental performances, in order not to distract from the textual statement of the vocal performance. 

    All recording preparation and subsequent editing should be subordinated to the goal of guaranteeing a strong concentration on the spoken vocals and the textual content:

    • When selecting a microphone and EQ-ing rap vocals, the classic focus is primarily on the presence range and the highs, because this is where the intelligibility of the speech and the perceived proximity to the listener are located. Consequently, rather bright condenser microphones are mostly used for rap recordings, and the treble range is often boosted by another few dB. The fundamental range is mostly kept in proportion to this. Naturally, individual genres and voices will differ a bit here – for example, dynamic microphones are used more frequently in America than, say, in Germany, and in genres such as trap or cloud rap, speech intelligibility is often no longer seen as essential, which is why a somewhat duller and muddier vocal sound can definitely work in these styles.
    • Even during the recording stage, you should keep the voice as dry and direct as possible, without much room. There is a very simple reason for this: spoken vocal lines contain significantly more words than sung ones. If you use long-lasting room ambience or even reverb here, words – and with them the textual statement and, last but not least, the energetic “punch” and aggressiveness of the performance – would literally sink into the reverb. Words or parts of words would be concealed; the entire expressiveness and, ultimately, the decisive style-defining characteristic of the genre would suffer if you were to work with too much room. For this reason, a (preferably small) recording area which is as dry as possible – ideally acoustically optimised in a truly “compact” way – is the recommended spatial environment when recording rap/hip-hop vocals. A good mix of absorption and high sound diffusivity will also deliver very good results here.
    • The subsequent use of effects is rather limited in classic hip-hop genres. Reverb here is mostly taboo; instead, one essentially works with short and subtle delays or ambiences to give the speaking voice sufficient dimension and liveliness. Often, however, the recorded ambience of the recording room alone is sufficient to achieve this.
    • In some more modern styles, such as cloud rap, emo rap or trap, on the other hand, effects are sometimes employed very unsparingly – in addition to the excessive use of auto-tuning, very audible reverbs, delays, vocoders and distortion are often used. On the one hand, in these styles speech intelligibility is often no longer as crucial as in the more classic hip-hop genres; on the other hand, there is usually more time between the text lines, allowing the reverb and echo tails to develop quite well without having to compete with the text all that much.
    • Naturally, compression also plays a major role with vocals in this genre; the approaches are not fundamentally different from those for sung vocals. That said, with rap vocals a somewhat clearer and more audible compression effect is mostly used, to give the voice even more presence in the foreground. The time parameters of the compressors tend to be set somewhat lower, so that even short transients in the speech are kept under control. But here, too, excessive compression should not be the norm. Pressure, punch and aggression in the vocals come mostly from the performance of the rapper – they are not achieved artificially through extreme compression.

    Treatment of Ad-libs, Shouts and Doublings

    Alongside the main rap, which can be compared with the lead vocals of a sung performance, there are two more important stylistic elements in hip-hop/rap:

    • Duplications are, as previously indicated, particularly crucial in this genre. With their help, vital textual statements and emotions are impressively reinforced. Sometimes even whole song parts (refrains, hooks) are completely duplicated; in other cases, only certain individual phrases or words. Experienced rappers are very accomplished at performing duplications as precisely as possible, and one should also give “real” duplications absolute preference over artificial duplication of the lead track. Duplications which are deliberately performed in a different sound/voice can be particularly effective. For complete duplications of an entire refrain, one can also use duplicated versions of the parts rapped a whole octave higher or lower. The technique of breathing/whispering a duplication as much as possible is equally widespread. Strongly compressed and mixed under the actual lead, these whisper duplications promote the main singing element, giving it a particularly powerful and assertive sound.
    • Just as important as duplications are so-called ad-libs or shouts, which are called out, “groaned” or mumbled on many extra tracks in the remaining gaps of the lead rap (“Yeah”, “Aha”, “C’mon”, “What?” etc.). These short phrases are supposed to spur the rapper on and maintain the flow of the overall performance; they bridge gaps in order to avoid breaks, and can intensify the whole atmosphere and message of the performance, making it all the more expressive. In this respect, these short insertions are not just nice additions to the actual vocal performance; they are an essential component of hip-hop culture and an indispensable vocal element. In terms of sound, these ad-libs are often deliberately altered to differentiate them from the actual rap vocals. Alongside the somewhat old-fashioned telephone effect, all types of distortion/bit crushers and other sound destroyers are of course suitable for this. Often, however, just the selection of a dynamic microphone or a matching EQ can deliver the desired effect.

    Beatboxing

    Beatboxing is a very independent art which involves using the human speech apparatus as a musical instrument. Culturally, beatboxing is closely tied to hip-hop culture – but forerunners and related practices can also be found in scat singing, partly in blues or jazz, and even in pop music: one of Michael Jackson’s trademarks, for example, was embellishing his vocal performances with percussive sounds. It was not until the 80s, however, that beatboxing came about as an independent art form – to this day it is practised competitively in battles and competitions and is used in many different music styles.

    Beatboxing is about generating percussive sounds with the help of the respiratory and vocal apparatus – sounds which are supposed to resemble those of a drum kit. On a musical level, the beatbox – when performed and mixed properly – can pass as a perfect substitute for classic drum sounds. Even more common are beatbox elements as a sound addition, giving the rhythm section a bit more of a human touch.

    Incidentally, the term beatbox derives from a slang term for a drum machine. Beatboxers of course sometimes also use their vocal cords and the resonances of the mouth and throat to create tonal sounds. Basslines and synthesiser sounds are imitated in this way, and naturally, speech or singing inserts are also allowed in a beatbox performance.

    Miking

    Beatboxing is not about capturing the voice as neutrally or naturally as possible, but about noises which (in part) require relatively strong acoustic manipulation to generate the desired sound. With many beatboxing techniques, the use of the microphone is therefore part of the performance; on a sound level, it is indispensable. For this reason, it is important for the sound engineer to know that many beatboxers use the microphone specifically to generate or amplify certain sounds.

    In almost every other case, a vocalist will provoke the ire of any sound engineer by wrapping their hand around the basket of a handheld microphone. This causes violent resonances in the closed hand, and sound can no longer reach the rear of the microphone capsule, which ruins the directional characteristic of the microphone. However, what is normally considered a no-go is often specifically employed in beatboxing to generate a powerful, resonant sound.

    In addition, in beatboxing, microphones are often held not only at the mouth – sometimes they are also held on the cheek, the larynx or the nose – depending on which sound characteristic or resonance is to be emphasised.

    Since most singing techniques work fully independently of the miking, when recording singing the sound engineer can place any microphone wherever it will not disrupt the performance and still capture a good signal. In the case of beatbox recordings, however, this approach will not lead to satisfactory results in most cases.

    In order that the beatboxer may be able to integrate the microphone as part of their performance, some typical approaches to vocal recording must be discarded here. The following key points should be noted:

    • Beatboxers must always be able to hold their microphone in their hand, so that they can freely control distance and placement!
       
    • Considering the purpose, the otherwise popular large-diaphragm condensers are out of the question for beatbox recordings!
       
    • Dynamic handheld microphones with integrated pop protection are the order of the day! The classic: the Shure SM58.

    Processing

    Beatboxing has a completely different function in a piece of music from that of singing or even rap; for this reason, it normally also requires a different approach in the post-editing stage. A few pointers:

    • Given the high proportion of percussive sounds in the close-up range of the microphone, unusually high levels can sometimes be expected. Sufficient headroom should therefore be planned for when setting levels.
       
    • Since the powerful, bass-drum-like sounds have fundamentals far below the regular vocal range, care is needed even with simple low-cuts. Beatbox recordings are better made without the microphone’s rumble filter engaged, and even in the post-editing stage, the low-cut should be positioned carefully (see the filter sketch after this list).
       
    • In beatboxing, breathing, smacking and similar noises are also used as sound effects. Some of these noises are relatively quiet and need to be raised with the help of relatively strong compression with short release times.
       
    • Sounds from the mouth and throat area are often not as percussive as drums; it may be necessary to work a bit more with the transients in order to generate the desired punchy sound. During compression, the attack time should therefore not be too short, so as not to flatten the transients. Additionally, transient designers (for example) may be used to give a beatbox performance that little something extra.
       
    • When it comes to creative modulation effects, delays, vocoders, reverb, distortion and the like, creativity in a beatbox production is in no way limited. It is by no means the naturalness of the signal that is in the foreground; rather, the end justifies the means. However, as with normal vocal recording, it is advisable to record the performance dry and add effects only afterwards, so as to avoid unwanted results in the mix stemming from effects printed too early.
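
    As a sketch of the careful low-cut mentioned in the list above – SciPy biquads; the 40 Hz corner frequency and the gentle slope are example values, not a rule:

        import numpy as np
        from scipy.signal import butter, sosfilt

        def careful_lowcut(x, sr, cutoff_hz=40.0, order=2):
            # Gentle high-pass: a low corner and a shallow slope, so that
            # bass-drum-like beatbox fundamentals (often 50-80 Hz) survive
            sos = butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
            return sosfilt(sos, x)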

    Beatbox and Loop Stations

    Many beatboxers also work with a loop station and, if appropriate, additional effect pedals, so that their live performances can be accompanied by singing, rap or other percussive elements. One well-known representative who uses this approach with beatboxing and singing is the Australian Dub FX. In addition to the human voice, one can of course also frequently use other instruments, and these days many loop musicians also rely on a computer-based workflow, for example, with the help of Ableton Live.

    Here the focus clearly lies on live performance, and a loop-station artist is often regarded as a one-man band. During recording, in such cases, one should generally be prepared to work live as well, in order to preserve this sound. Since comping from different takes is often difficult to impossible – the loop accompaniment usually sounds different in every take – it is common to let the artist perform in full and simply select the best take from many.

    Unlike in a conventional recording situation, for loop artists the use of effect devices is usually part of the performance as well; it is therefore advisable to record the performance “as a whole”, that is, including the effects. Depending on the setup, however, you can use a signal splitter at strategically sensible points in the signal chain (e.g. in front of an effects pedal), so that an effects-free version can be accessed in post-production if necessary.

    • Background voices (duplications, backing vocals, ad-libs) are usually mixed somewhat darker than lead vocals, so that the two are clearly separated and do not compete. In addition, background voices tend to be compressed somewhat more strongly and given a bit more room. During the recording, it can make sense to select a duller microphone and/or a somewhat greater microphone distance for background voices.

    Growling, Grunting, Screaming & co.

    In this chapter, we summarise some special forms of vocals found in popular music – in death and black metal in particular, but also in other subgenres of metal, as well as in grindcore and industrial. Common to all of them is that the human voice is not sung or spoken “cleanly”; rather, it is distorted or manipulated through conscious shaping of the larynx. The origins of such larynx techniques, however, are clearly older than all of these genres; to this day they can be found again and again in the folk music of many different cultures (often in a shamanic context). Throat singing techniques are known, for example, among the Inuit, the Sami, the Mongols and the Tibetans, but they can also be found among the Xhosa in South Africa, or in alpine yodelling.

    Larynx Singing

    While it is mostly the vocal cords that are responsible for sound formation in speaking and singing, guttural singing uses the so-called vestibular folds (“false vocal cords”) above the actual vocal cords. By narrowing the larynx, these false vocal folds are set vibrating, and a rough, low-pitched noise arises, which can be deliberately formed into words by shaping sounds in the oral cavity.

    Guttural singing techniques require a bit of practice before they can be used in a controlled way. In addition, unprofessional attempts at larynx singing carry a not-inconsiderable risk! Anyone who wants to learn growling or screaming, for example, should definitely seek out a professional teacher who specialises in such singing techniques – incorrect larynx singing can permanently damage the vocal cords!

    Recording Technology

    When recording growls, screams, shouts, grunts, squeals and the like, there are not too many special features to consider; nevertheless, not every approach from traditional vocal production will achieve the desired result. Here are a couple of aspects to observe during production:

    • Given the sometimes high sound pressure levels and the sound “colouring” that is fully desired in this genre, you should reach for a dynamic moving-coil microphone when recording these singing techniques! The Shure SM7B in particular is very popular in this context – for a dynamic microphone, it has an extremely detailed and balanced sound.
       
    • With screaming, growling and grunting in particular, an extremely powerful and solid sound is desired. It therefore makes sense to take advantage of the microphone’s proximity effect, to boost the lows and create a close, imposing impression.
       
    • With larynx singing, the fundamental is often significantly lower than in a traditional singing performance. The low-cut should therefore always be handled with care, so that these fundamentals are not affected.

    The following sound examples show quite clearly how great the influence of the microphone and the recording distance is. Let us start by listening to the SM7B at two different distances: the first recording was made at approx. 2 cm; for the second, the distance was increased to approximately 15 cm. The influence of the proximity effect is conspicuously audible in the bass range:

    Sound Example 16

    Growling_SM7B_close


    Growling_SM7B_far


    Comparison of the frequency curves with different recording distance: 2 cm (green) and approx. 15 cm (red)


    For comparison, we repeated the same test setup with a second, equally popular microphone: the Neumann U87 AI, a large-diaphragm condenser which is used worldwide, especially for high-resolution voice and singing recordings.

    Growling_U87_close


    Growling_U87_far


    As can be clearly heard, the more detailed and significantly brighter recording with the U87 is rather unsuitable in this context compared with the dynamic version. The frequency response does not meet the expectations for a growling recording, and quiet mouth and breathing noises are also more prominent in the foreground. What is particularly striking is that, with increasing distance, the low-end pressure decreases even more strongly than with the comparable SM7B recording.

    What produces rather thin, pressureless recordings in this situation – and even captures unwanted noises – may, of course, be exactly what is wanted in other cases. Once again, this clearly shows one thing: there is no “one size fits all” microphone; rather, the decision must be made individually, appropriate to the desired sound impression.

    Comparison of the frequency curves in the proximity range: SM7B (green) and U87 (red)

    Choral Vocals

    Recording choirs is significantly different from recording individual voices. This is due mainly to the fact that, in a choral situation, it is not the voice of an individual singer that is in focus; rather, many singers together form a large singing ensemble, and it should be depicted as such.

    Choral Lineups

    Most choral pieces are written for four voices, meaning that there are four separate musical parts, each sung jointly by multiple singers. The four voices, from highest to lowest, are soprano, alto, tenor and bass. Accordingly, for each voice there will be a smaller or larger group of people in the room. The four voice groups can, in turn, be set up in different ways, and this spatial distribution is a decisive factor for the resulting sound and the recording options.

    As with any larger acoustic sound source, a main microphone forms the basis for recording a choir. It is usually located at the conductor’s desk or a few metres behind it and often uses a classic stereophonic method such as an AB, XY or ORTF setup. In exceptional cases, more unusual variants such as Blumlein or dummy-head systems can be found. When producing for surround sound (e.g. in film music), multi-channel methods are often used. In any case, the goal of the main miking is to pick up the entire ensemble as naturally and completely as possible.

    In addition to the main miking, spot microphones are used where necessary and possible, so that individual voices can be brought out or individually processed in the mix.

    The main choral lineups are as follows:


    Choral lineup variant 1: (from left to right) soprano, alto, tenor, bass

    This widespread lineup essentially resembles the classic American string lineup in an orchestra or a string quartet. The highest (and, in most cases, melodic) voice is on the far left, followed by next-lower voices; the bass, the deepest voice, is, accordingly, the farthest one on the right.

    This lineup has the technical advantage that the individual voices can very easily be picked up with spot microphones. This can be done with a single condenser microphone with cardioid characteristic per voice group, or with a pair of spots per group, so that each voice receives its own stereophony. The spots are distributed in the panorama according to the spatial layout.

    Distributing the voices from left to right, however, results in a slightly uneven frequency distribution across the panorama – a condition that is not desirable in a music mix. For one thing, consumers sometimes listen to music with only one speaker/earphone, or sit close to one of the two speakers; they then “miss” either the bass or the melody voice. Moreover, an unequal energy distribution means unbalanced utilisation of the technical equipment, which can lead to difficulties in the mastering process.

     Choral lineup variant 2: tenor and bass behind

    The second choral lineup variant can frequently be found with, for example, mixed men’s and boys’ choirs, where the men, with their deeper voices, stand in the back rows and sing over the heads of the boys.
    This two-row lineup has the advantage that the energy is distributed somewhat more evenly in the panorama.

    However, here the voices can no longer be miked so easily with spot microphones – for example, a soprano spot mic positioned at the front will inevitably also pick up a certain portion of the tenors. One good option in such a case is to position the microphone above the group: using high stands, trusses and the like, spot microphones can be lowered from above into the centre of any singing group. If small-diaphragm condenser microphones are used, these can often simply be hung from their cables.

    Choral lineup variant 3: tenor and bass in the centre

    In this third option, only bass and tenor are set up in two rows, placing the deepest voices in the centre. This also results in a relatively uniform energy distribution in the panorama. Soprano and alto face each other, and the spatial distance makes it easier to perceive them separately from one another. This lineup has parallels to the German symphony orchestra lineup and is especially suited for pieces in which soprano and alto are interrelated.

    Soprano and alto voices can be recorded relatively well from the front using spot microphones. Bass and tenor can be supported either together from the front or individually from above.

    There are also choral formations in which the four singing groups are mixed. Normally in these circumstances the voices alternate, so that no soprano stands next to a second soprano, no tenor stands next to another tenor and so on. On a sound level, this has the advantage that the voices merge with each other, thus leading to a more homogenous sound image. On the one hand, this suits the sound engineer, because the panoramic frequencies are evenly distributed, and the sound image in the mix does not “tilt” to one side (highs or lows). On the other hand, with a mixed setup, it is not feasible to place spot mics for the individual voices. Thus, a lot more attention is required to achieve the desired sound in the main microphone, since separate processing of the individual voices in the mix is not possible.

    Of course, there is also choral literature in which more than four voices are sung. If necessary, a group of voices can then be divided into voice types; there may be resulting “voices within voices”, for example, mezzo-soprano (between soprano and alto) or baritone (between tenor and bass). The selected lineup depends on the literature and the desired audio image.

    Those in charge of the recording should always determine, prior to recording, which voices are used and which lineup has been selected. Not only does this facilitate communication, but it is also, above all, relevant to the placement of the microphones and the selection of acoustic measures. Normally those in charge of the recording will have a copy of the score in hand prior to the start of the session, and the choir director will be consulted beforehand to get the best result.

    Acoustics in Choral Recording

    Just like with every larger sound ensemble, a sufficiently spacious environment is not just important for accommodating the large number of people. Above all, a large and high space is also needed so that the choir can properly develop acoustically. Just like with a classic orchestra, the space is an important part of the desired sound character. The space and the sound character of a choral recording, however, naturally also depend on the musical context.

    Classical Music

    In classical choral recordings, the aim is usually a large, comparatively reverberant and ambient sound. The different singing voices should blend together. Accordingly, a particularly large space with a high ceiling and a long reverberation time is best suited for this. In many cases, good-sounding church vaults are the right acoustic environment; naturally, these are especially suited for sacred music. For more secular works with a large chorus, the best choice is a concert hall, which normally combines a long reverberation time with more controlled acoustics. Room microphones are a good addition to the main mics for capturing the character of the space; they can be added to the mix as needed.

    Jazz/Gospel

    Gospel choirs, too, have a sacred background; however, with gospel choirs, the sound image is usually not as large and reverberant as in classical music. Instead of large stone cathedrals, somewhat smaller churches, for example with plenty of wood in the interior, are better suited for this. In this musical context, performances are usually more emotional, and sometimes there will also be dancing and clapping. This should be considered and, if appropriate, discussed with the choir director prior to the recording. Light footwear and airy cotton clothing can, for example, reduce disruptive noise during dancing or swaying.

    In gospel music, there is normally a lead voice which – in the African-American tradition of the “Master of Ceremony” – initially sings a line which is then repeated or answered by the choir. This soloist naturally has a prominent position and must be microphoned individually.

    Choirs in Pop/Rock

    If choirs are used in pop and rock music, they tend to play mostly subordinate roles; they are often expected to give the band arrangement and the lead singer(s) a certain level of sonic “lubrication” and ensure a bigger sound image. In such circumstances, a larger recording room in a recording studio – one which is rather dry compared to a concert hall or even a church – is mostly used. Nevertheless, to avoid inconvenient echoes off nearby walls, here too, spaces with a floor area of at least approx. 100 m² and a ceiling height of more than 4-5 metres are advisable. A certain amount of room sound is thus important here as well, but additional reverberation is usually added artificially.

    A Cappella Band

    An a cappella band is not a choir in the proper sense of the word. It is a group of singers, but not one with the intention of merging into a unified sound ensemble. Rather, an a cappella band must actually be understood as a band in which the individual members take on equally weighted and unique functions in the piece. This also applies to smaller vocal ensembles in which every singer has their own voice, as in an American barbershop quartet. In this case, individual miking of each singer is the best choice, so that one has sufficient control over all of the voices. This applies in particular if instruments are to be imitated with the voice. In the studio, overdub productions are preferred, for the purpose of enabling maximum separation and control.

    In the case of popular choirs, shanty choirs, singing clubs, etc., depending on the repertoire, the somewhat more controlled sound conditions which one would also choose for a jazz or pop choir, also mostly apply. If more space is wanted, this will normally be achieved with artificial reverberation spaces.

    If a stronger acoustic separation of different vocal groups is desired, this can be achieved with a change to the lineup. Often, a simple solution is to place the groups a bit further apart from each other. If complete separation is desired, then an overdub recording is best. For the purpose of better control in the mix, especially in the context of popular music, mixed choirs are often recorded separately by gender. However, a separation based on vocal groups can naturally also be made. The overdub process always has the advantage that more attention can be paid to the performance of the individual parts, so a more precise representation is normally possible. However, an overdub recording is also considerably more time-consuming, and, above all, a choir recording summed electrically or digitally from separate parts will never sound as natural and organic as a real lineup in a big space. This can, however, be counteracted a little, for example by spreading the vocal groups in the space as they would stand in a joint performance. Whereas one would hardly decide on a split choir recording in the case of a classical or jazz production, this technique is more commonly found in the rock and pop genres.

    Soloists

    In many situations, it is not only a choir (as a joint sound ensemble) that sings; instead, individual soloists fulfilling a solo function are employed. This may be an individual singer who interacts with the choir or who is accompanied by the choir. However, it may also be individual choir members who will step out from the choir ensemble for specific passages (usually also physically) and who temporarily take on a lead voice.

    The soloist should always be given their own support microphone, which allows for better control of this especially important voice in a later mix. Even if the soloist has a well-trained singing voice (in the classical context) which requires no technical support, an independent signal is still worthwhile when recording in order to later be able to mix the choir and soloist as independently from one another as possible.

    If the soloist is recorded at the same time as the choir, he or she can be spatially separated in order to avoid crosstalk between the microphones. This can be done with a separate recording room (ideally within line of sight), with partitions/gobos creating acoustic separation in the same room, or by minimising crosstalk through an appropriate spatial arrangement combined with suitable directional characteristics. A feasible example of the latter would be to use a directional main microphone for the choir and position the soloist behind it.

    Exactly which type of lineup and acoustic separation is selected, will naturally depend on the spatial and technical conditions and the desired sound image. In any case, it makes sense to think of the mix, and to direct the sound image of the space in the right direction, even during the recording stage.

    If it is a matter of temporary soloists who emerge from the choir for a solo passage, then they should also be given their own microphone, which, in the mix, will typically also be used only for the corresponding passage.

    Monitoring

    In the case of a choir, monitoring can be quite a challenge. As a matter of principle, individual headphones for every choir member – especially considering the high cabling requirement – should only be used if it is necessary on a musical and technical level or is at least sensible.

    Without Headphones

    A choir recording in the classical sense is normally a joint live performance taking place simultaneously in one room. In such a case, headphones are not only unnecessary, they can even disrupt the performance – ultimately, all choir members want to hear the voices of their colleagues, and any accompanying instruments, just as they are accustomed to hearing them in a live performance or rehearsal. Given this, all that is really required is a communication path from the control room to the recording room for talkback. A single speaker in the recording room is fully sufficient for this, so that those in charge of the recording can communicate with the choir between takes. Alternatively, or in addition to this, a single headphone path can be established for the conductor/choir director. The technical and musical directors can then communicate with each other without having to involve the entire choir.

    With Headphones

    If a choir is recorded for an overdub production, then, naturally, each choir member must also hear the remaining instruments during the singing. This is important not only for the timing but also – and more than anything else – for the intonation! Since the human voice allows for free intonation, it is of enormous importance that singers simply hear a reference – for example, a piano accompaniment – so that they can adjust their own pitch to it.

    In this context, it therefore makes sense to give every individual choir member their own headphones. Of course, this poses a certain challenge for the sound engineer, since it requires many headphones along with a correspondingly high number of headphone amplifiers. Fortunately, the different choral singers normally do not require separate mixes, which means work can be done with a single stereo headphone mix, which is then distributed to each singer through headphone amplifiers. Singers often find it useful to wear the headphones on one side only, so that they can better hear the rest of the choir and their own voice in the room. Some studios have special headphones for this purpose, with an earpiece on only one side. A standard pair of stereo headphones can, however, also be used by sliding one side off the ear.

    In certain circumstances the choir director may be granted an independent headphone mix, to allow for direct communication between director and engineer and/or to give the director other volume ratios or, for example, a metronome click on the headphones.

    By the way, it is not necessary to give each choir member a high-priced pair of studio headphones. A slightly cheaper, durable and lightweight model is normally sufficient. For psychological reasons, and in the interest of not slowing down the session unnecessarily, it actually makes sense to be able to offer 40-50 headphones of the same model for choral recordings. With this arrangement, no-one feels disadvantaged, and there are also no delays resulting from different preferences. However, it makes sense to equip soloists, conductors, producers, arrangers and of course sound engineers with better headphones, since these individuals require a more detailed playback considering their unique role.

    Since the wiring requires a fair bit of time, the exact number of singers should be queried in advance, and the lineup should be accordingly planned so that all the cabling can be prepared before the choir comes into the studio.

    With Loudspeakers

    As already mentioned, in a choir-only production it is possible to work with one simple speaker in the recording room, used only for communication between takes. Of course, the speaker must be switched off during recording, and under no circumstances should the choir microphones be played back via the loudspeaker, in order to avoid unnatural discolouration and feedback.

    There are also methods in which choir monitoring can be achieved with the help of speakers. However, in such cases, it is necessary to ensure that as little sound from the speakers as possible is recorded by the microphones. Since, however, a certain level of crosstalk is always inevitable, and the sound quality of the recording ultimately suffers as a result of it, these should be considered “emergency solutions” which are only to be applied if an individual monitoring solution is not technically feasible.

    If you must use loudspeakers for monitoring, you should, above all else, take advantage of the directional characteristics of the microphones. One possible solution is to install loudspeakers on the ceiling (for example, on a truss or the like) and provide monitoring for the choir from above. The microphones are then also suspended from above; if directional microphones with cardioid characteristics are used, a relatively clean choir signal can be recorded. With this setup, however, the main microphone, or even a room microphone, would contain a great deal of crosstalk from the speakers and, as such, probably be useless. Placing the loudspeakers in front of the choir is naturally also conceivable if cardioid miking is likewise done from the front. One great advantage of sound from above, however, is that the choir itself absorbs the sound energy from the loudspeakers quite efficiently before it is reflected back in the direction of the microphones – ultimately, the sound waves must travel down between the singers to the floor and then take the same route back. The following diagram shows an example of this setup:

    Choir recording with monitoring and miking from above

    This technique is frequently employed in the live events sector, since it combines the advantage of a free stage with a relatively low cabling demand. During live performances, one can also assume that the choir itself already fills the room with enough indirect sound; the main microphone and/or a room microphone may then, depending on the circumstances, be unnecessary, and direct miking of the different voices will be sufficient for the live mix. For a studio recording, on the other hand, this would be a rather inadequate solution, as a main and room microphone arrangement is hardly feasible in this kind of setup without substantial crosstalk.

  • Embedding in the Mix of the Song

    After the recorded vocals have been reviewed in the editing stage, the best versions (takes) have been chosen – mostly in consultation with the singer and/or the entire band – and the final vocal tracks have been created by combining passages from different takes in the sequencer (comping), the next step is to embed these final vocal tracks among the other instrumental tracks of the song.

    During the mixing stage, first, the volume levels of the individual signals, and their distribution in the stereo panorama are approximated, up until the creation of a preliminary mix – the so-called rough mix.

    This is an approximate version of the song with all instrumental and vocal recordings which have been declared good enough; however, the fine-tuning of the tonal and dynamic ratios and the dimensional and spatial positioning of the individual elements are still lacking. As part of this, the individual tracks are processed more or less intensively in the mixer, with EQs, filters, compressors, limiters and all manner of spatial effects, such as reverb, delay or modulation effects like chorus, flanger, phaser, ring modulator etc., with the objective of creating a coherent and balanced overall musical performance from many tonally independent individual signals.

    Mixing is thus a very creative and, at the same time, subjective process. This difficult, but also exciting, task involves developing one’s own sound concept of the song and realising it through the appropriate processing of the individual tracks so that, ultimately, the song comes across as a meaningful and consciously created sound compilation. As far as the recorded vocals are concerned, in most cases this means working with EQs and filters, shaping the frequency response of the signal so that the vocals come close to the particular sound concept while remaining capable of conveying the emotional and artistic message of the song in a transparent and coherent manner.

    The use of compressors can help to keep the vocals stable and clearly audible (in the foreground of perception) by narrowing the dynamic differences so that they can prevail against the volume of the accompanying instruments. Editing with spatial effects like reverb and delay lends three-dimensionality, depth and dimension to a (mostly) very dry voice with no noteworthy reverberation, and combines it with the other instrumental elements by simulating an everyday spatial environment.

    Additional special effects can (optionally) serve to give vocals unusual tonal aspects which emphasise the artistic statement of the song/text, capture the listener’s attention and lend the vocals a high recognition value. The options available for processing vocals during the mixing phase can be very different from one case to the next; they depend a great deal on the prerequisites/requirements of the song, the production and the desired sound vision.

    For this reason, it is difficult to give hard and fast guidelines for processing vocals in a mix: apart from a few purely technical measures aimed at improving the sound quality or at embedding and combining the signal with others, all further decisions are subjectively and artistically motivated.
    Generally, it can be said that it is worth striving for the best possible recording result – by selecting the right microphone, preamp and recording space and, most importantly, by capturing a convincing performance from the singer/musician – so that very little technical post-processing is required afterwards.

    Laborious subsequent EQing with the aim of optimally shaping the frequency response of the signal can often be avoided during the recording stage by simply selecting another microphone with a more suitable sound, or by experimenting with the mic position or the spatial environment, etc. The less technical editing is required in the subsequent mix, the better the mix result and the general sound quality of the finished song will be – excessive dynamic (compressor) or spectral (EQ) processing only degrades the original quality of the recorded signal more and more.

    For this reason, experienced recording engineers try to capture the best possible sound by means of microphone selection and the best possible combination of preamp settings and spatial positioning: a sound which, in the subsequent mix, needs only to be minimally edited, if at all.
     

    “We’ll fix it in the mix” should not be the motto when recording, least of all when it comes to recording the human voice. True, digital post-processing offers all imaginable possibilities for “fixing” poorly recorded signals; however, an excessively processed audio signal will by no means sound better. The original quality deteriorates with every necessary intervention – especially when drastic post-processing becomes necessary. For this reason, when recording, make sure that the best possible signal is captured; a signal which, ideally, already sounds just like you would want to hear it later in the mix. Careful preparation, selection and positioning of the microphone and, in particular, a convincing and meaningful performance on the part of the singer/musician all contribute to the quality of the recording result.

    The Equalizer (EQ)

    The classic tool for adjusting the frequency response of an audio signal is the equaliser (EQ) – the German term (“Entzerrer” – entzerren = deskew, rectify, straighten) describes its original purpose very well: in the early days of recording studio technology, it was predominantly about adjusting the frequency response of the audio signal which was often rendered unfavourable by the technical limitations of the recording conditions (microphones, preamps, spatial resonances) and compensating for any existing overemphasis or “dips”. A parametric filter EQ, in particular, allows for the raising or lowering of one or more frequency ranges, whereby these can be determined more or less flexibly and freely.

    The EQ is, therefore, also one of the most essential tools for the tonal adjustment and optimisation of the vocal signal (technical sound processing). It is basically about achieving a frequency response which is not only as balanced as possible but also attractive, so that the vocals embed well and assert themselves in the overall mix.

    The EQ and its individual filter variants are classic insert effects which are fully integrated into the sound control of the analogue or virtual-digital mixer channel. The signal present in the channel is routed entirely through the respective effect, processed by it and afterwards returned to the signal flow of the channel. Insert effects usually affect the entire signal, so it is usually not possible to process the frequencies just “a bit” through EQ usage (exceptions: effects with a dry/wet control, which enable parallel processing). Classic send effects, by contrast, affect only a freely definable share of the original signal; they process it on a separate effects channel, and the result is added to the dry portion of the signal.
    Classic send effects include, e.g. reverb and delay – the usual insert effects include, in addition to the already mentioned EQ, all tools used in dynamics processing (compressor, gate, expander, limiter).

    Low-Cut Filtering

    Below about 80-100 Hz, there are no relevant frequency components in a vocal signal. Anything recorded in this range would only conflict with the frequency components of low-frequency instruments such as bass, bass drum, low guitars or keyboard instruments. Nevertheless, low-frequency rumble and similar sounds often end up on the recording – picked up, for example, as structure-borne impact sound.

    For this reason, it is recommended that this frequency range be (more or less) cut out. This task is best achieved with a low-cut filter. An alternative description for the low-cut filter is a high-pass filter – the way it works is already signified in its name.

    This filter allows high frequencies above a freely chosen cutoff frequency to pass unhindered; it attenuates the low-frequency components below the cutoff frequency. This attenuation can be varied by adjusting the so-called slope. Simple low-cut filters are, in most cases, integrated into the EQs themselves and can be switched on or off at the push of a button. The slope determines how steeply the signal is lowered below the cutoff frequency: by 6, 12, 18, 24, 36 or even 48 dB/octave. Steeper slopes can sound slightly unnatural, as the bass content is cut off abruptly.

    Softer, more gradual slopes (6, 12 or 18 dB/octave), on the other hand, are often recommended. As a cutoff frequency for vocals, one can select a value between 70 and 80 Hz (for men) and approx. 100 Hz (for women); however, you should always check, through attentive listening, whether the filter causes any audible differences. If it does, this usually means that the cutoff frequency has been set too high.

    The filter is only supposed to effectively eliminate unwanted rumbling, humming and undefined bass and sub-bass content.
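    As a minimal sketch of the relationship between cutoff frequency and slope (assuming Python with NumPy/SciPy; the 48 kHz sample rate and all parameter values are illustrative, not prescriptive): a Butterworth high-pass gains roughly 6 dB/octave of slope per filter order.

        import numpy as np
        from scipy.signal import butter, sosfilt

        def low_cut(signal, fs, cutoff_hz=80.0, slope_db_per_oct=12):
            # A Butterworth high-pass ("low-cut"): each filter order adds
            # roughly 6 dB/octave of slope below the cutoff frequency.
            order = max(1, slope_db_per_oct // 6)
            sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
            return sosfilt(sos, signal)

        fs = 48000
        vocals = np.random.randn(fs)            # stand-in for a vocal take
        cleaned = low_cut(vocals, fs, cutoff_hz=80.0, slope_db_per_oct=18)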

    Steinberg frequency

    HOFA IQ-EQ

    FabFilter Pro-Q 3

    EQ Processing in the Bass Range

    The bass range contains the frequencies which represent the “belly” and the fullness, as well as the fundamental notes of most voices and their vowels. There are no categorical recommendations or rules for dealing with this frequency range.

    In general, vocal signals can often be raised slightly, by a few dB, in the range of around 120-150 Hz to lend the vocals more “weight” and “fullness”. However, you should refrain from boosting anything if the frequency response of the microphone signal already shows an overemphasis in the bass range. To check this, in addition to listening closely, it is also worth having a look at an analyser, which depicts the frequency content of the signal graphically.

    Using an analyser can be helpful when evaluating and assessing the various frequency ranges of a signal; however, one should not use it “blindly” (or rather deafly :-)) and rely only on its visual representation. An acoustic assessment of the signal – under sufficiently good listening conditions (acoustically optimised monitoring room, transparent sound image and impulse behaviour of the monitors) – is very important, since it sharpens your hearing and helps you to gain experience in dealing with audio signals. However, you should not forgo the help of a graphic analyser as an additional method for checking the signal. When assessing very low frequencies, in particular, a visual depiction is often needed, because many near- and mid-field monitors cannot reliably reproduce the bass frequencies in question due to their design.

    Whether you turn to an EQ shelving filter to raise or lower bass frequencies or select a fully parametric mid band for your work, you should make your decisions on a case by case basis. With vocal recordings, combinations of a low-cut filter and a low shelving band on top of it often deliver good results.

    However, other EQ filter combinations can also make for a balanced bass range of a vocal recording.
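    To make the filter types mentioned here concrete, the following is a minimal Python sketch (not from the text above) of the standard biquad “bell” and shelving filters following Robert Bristow-Johnson’s well-known Audio EQ Cookbook; the sample rate, frequencies and gains are hypothetical placeholder values. The snippets later in this chapter reuse this eq_band helper.

        import numpy as np
        from scipy.signal import lfilter

        def rbj_coeffs(f0, fs, gain_db, q, kind):
            # Biquad coefficients after the RBJ "Audio EQ Cookbook".
            A = 10 ** (gain_db / 40.0)
            w0 = 2 * np.pi * f0 / fs
            cw, sw = np.cos(w0), np.sin(w0)
            alpha = sw / (2.0 * q)
            sqA = np.sqrt(A)
            if kind == "peak":                       # parametric bell
                b = [1 + alpha * A, -2 * cw, 1 - alpha * A]
                a = [1 + alpha / A, -2 * cw, 1 - alpha / A]
            elif kind == "lowshelf":
                b = [A * ((A + 1) - (A - 1) * cw + 2 * sqA * alpha),
                     2 * A * ((A - 1) - (A + 1) * cw),
                     A * ((A + 1) - (A - 1) * cw - 2 * sqA * alpha)]
                a = [(A + 1) + (A - 1) * cw + 2 * sqA * alpha,
                     -2 * ((A - 1) + (A + 1) * cw),
                     (A + 1) + (A - 1) * cw - 2 * sqA * alpha]
            elif kind == "highshelf":
                b = [A * ((A + 1) + (A - 1) * cw + 2 * sqA * alpha),
                     -2 * A * ((A - 1) + (A + 1) * cw),
                     A * ((A + 1) + (A - 1) * cw - 2 * sqA * alpha)]
                a = [(A + 1) - (A - 1) * cw + 2 * sqA * alpha,
                     2 * ((A - 1) - (A + 1) * cw),
                     (A + 1) - (A - 1) * cw - 2 * sqA * alpha]
            else:
                raise ValueError(kind)
            b, a = np.asarray(b, float), np.asarray(a, float)
            return b / a[0], a / a[0]

        def eq_band(x, fs, f0, gain_db, q=0.7, kind="peak"):
            b, a = rbj_coeffs(f0, fs, gain_db, q, kind)
            return lfilter(b, a, x)

        # Hypothetical example: +2 dB low shelf at 140 Hz for more "fullness".
        fs = 48000
        vocals = np.random.randn(fs)             # stand-in for a vocal take
        fuller = eq_band(vocals, fs, 140.0, +2.0, q=0.7, kind="lowshelf")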

    Low-Cut in the bass range

    Low-Cut with resonance at the cutoff frequency

    Bell in the fundamental range

    Low-shelf in the fundamental and bass range

    EQ Processing in the Low-mid Range

    Fundamental range balance is essential for a modern vocal sound. If the voice sounds “boomy”, the fundamental range can be cleaned up a little. At the same time, of course, the voice must not lose too much energy there, because this leaves it sounding thin and weak.

    Many male as well as female voices come with a natural overemphasis in approximately the 300-600/700 Hz range, which partly defines the characteristic sound of the voice; often, though, it is too strong. Additionally, a not completely linear frequency response of the microphone and/or preamp can lead to an unpleasant overemphasis in this range.

    Given that very dominant elements of practically all other instruments in the mix are present especially in the low-mid range, this frequency range should be edited very carefully, and you should try to avoid excessive overlaps.

    Excessively strong low-mids in the mix usually manifest as an unpleasant overall sound, leaving the sound image spongy and undefined, mushy and dull. It is best to use a bell or shelving filter for editing. To track down the overemphasised elements and disruptive resonances in question, it is recommended that you sweep slowly through the frequency range with a narrow filter (high Q factor) and a large gain boost, paying attention to any noticeable level jumps and/or distortions.

    If these occur, the overemphasised core frequency has been found, and at this point one can lower this range gently with a slightly wider Q factor. A few dB are usually enough here, because you do not want to thin out the signal unnecessarily – only eliminate the overemphasised elements in order to obtain a balanced frequency response. Working with a dynamic EQ is appropriate in the fundamental range in particular: some overemphasised elements become apparent only with certain notes, so a static cut can often change the sound too much. This is further discussed in the chapter “Dynamic EQs”.

    The result is that, when this measure is applied, the clarity and conciseness of the vocal signal are increased; however, you should make sure that it does not sound unnaturally thin or nasal. This character is an indication that the reduction in the low-mids was too strong.
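    A sketch of this sweep-and-cut technique, reusing the eq_band helper from the earlier snippet (the 450 Hz resonance is a hypothetical example):

        # Hunt with a narrow, strong boost while listening for the resonance...
        sweep = eq_band(vocals, fs, 450.0, +12.0, q=8.0, kind="peak")
        # ...then cut the found core frequency gently with a somewhat wider Q.
        cleaned = eq_band(vocals, fs, 450.0, -3.0, q=2.0, kind="peak")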

    For this work with low-mid frequencies, a graphical analyser can also serve as a valuable complementary tool. With increasing time and experience dealing with vocal signals, you will learn how to reliably read and assess the frequency images of the analyser. However, as already mentioned, you should never wholly and completely depend on the visual monitoring of the signals – good sound should still be processed mainly with ears and acoustic reproduction via studio monitors. An analyser is always good for a sound second opinion, but it should never be the first instance.

    Normally vocals, along with the low cut, can also be cleaned up a bit with a bell filter in the low-mids. A typical curve would look like this:

    Low-cut and low-mid attenuation

    In rare cases, a microphone signal which may be a bit too thin needs to be boosted in the low-mid frequencies in order to attain more fullness. However, you should carry out such a balancing increase very carefully; 1-2 dB gain is usually more than enough in most cases.

    Low-cut and low-mid boost

    If you have found distinct frequency resonances in the low-mid range – which can easily arise from problematic acoustic conditions in the recording room – then the use of a so-called notch filter is also recommended; it is suitable for eliminating very narrowly confined frequency ranges with a very narrow filter (high Q).

    In this regard, again, you should not overdo the level reduction of the frequency range in question, since resonances and distortions can appear as a result of an excessively steep intervention. Additionally, the natural sound image of the voice will be strongly influenced. For this reason, notch filters should actually be used only to eliminate interference frequencies.
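    A minimal sketch, assuming SciPy and a hypothetical room resonance at 240 Hz:

        import numpy as np
        from scipy.signal import iirnotch, lfilter

        fs = 48000
        vocals = np.random.randn(fs)            # stand-in for a vocal take
        # Q = 30 keeps the cut extremely narrow, removing little around it.
        b, a = iirnotch(w0=240.0, Q=30.0, fs=fs)
        notched = lfilter(b, a, vocals)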

    Notch filter

    The above-mentioned problems with an overemphasis of individual frequency ranges in the low-mid range can also appear in higher frequency ranges. The mid and high-mid sections are far less affected by this; however, it is worth checking the vocal signal in these ranges as well and, where appropriate, processing them in a similar fashion.

    EQ Processing in the High-mid Range

    One important factor you should consider when embedding the vocal signals in the instrumental mix is to ensure that the text of the song is conveyed with the best possible intelligibility. After all, it contains a large part of the information relevant to the statement and the emotional effect of the song.

    The so-called “speech intelligibility” of the human voice is defined by the tonal expression of the consonants; in this regard, the corresponding frequency portions lie between 3 and 6 kHz depending on the voice “colouring” and the gender of the singer.

    A slight boost in the 3-4 kHz range emphasises the noise components of the consonants somewhat, which leaves the voice sounding more direct and clear. Here, the filter should be set as wide as possible (low Q factor) to avoid additional resonances and unnatural “colourations”. For reasons of speech intelligibility, our ear is very sensitive in this frequency range, which is why we detect unnatural processing quickly and accurately.

    For this reason, very moderate boosts (max. +2 to +3 dB) are mostly sufficient; otherwise, the voice will quickly begin to sound “tinny”, “nasal” or “metallic”. A boost in the frequencies relevant to speech intelligibility will, in most cases, also amplify any already very prominent and unwanted S-sounds and sibilants.

    This should be tolerated during this processing stage – a narrow-band and level-dependent reduction can/must take place later, e.g. with the help of a de-esser.
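    In terms of the eq_band sketch introduced above, such a careful presence lift might look like this (values illustrative):

        # Wide (low-Q), gentle bell boost around 3.5 kHz for intelligibility.
        clearer = eq_band(vocals, fs, 3500.0, +2.0, q=0.7, kind="peak")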

    Boost with bell in high mids

    EQ Processing in the High Range (Air)

    Frequencies above approximately 5 kHz are responsible for the brilliance of the vocal sound and for open transparency in the presentation of the human voice. If a voice in this upper-frequency range is given “more highs”, not only will the recorded material sound “finer”, the voice will also be shifted further to the front of the mix. Since we automatically and associatively combine distinct high frequencies with close signals, we can take advantage of this characteristic of our hearing to position the vocals in the mix.

    More distant acoustic signals quickly lose their brilliant high-frequency elements on the way to our ears, which is why the proportion of high frequencies in an instrument or a voice can be used to position it spatially as part of our depth impression. In practice, treble boosts are typically achieved with a very broadband high shelving filter at a frequency around 5-8 kHz, whereby the filter quality is used to make the transition into the boost as soft as possible.

    In some cases, a significant increase of the high frequencies using a parametric peak/bell filter with a high Q-factor also leads to outstanding results, sometimes in combination with more subtle shelving below.
    With the so-called air range (from approx. 10-12 kHz upwards) in particular, you can give the vocals more “quality” and “shine” with a significant increase of these frequencies using a peak filter.
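    Again in terms of the earlier eq_band sketch, an “air” boost could be tried as follows (frequency and gain are placeholder values):

        # Soft high shelf for "air"; alternatively a bell around 10-12 kHz.
        airy = eq_band(vocals, fs, 8000.0, +2.0, q=0.6, kind="highshelf")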

    As always, one should not overdo this effect, however appealing it may sound at first. All too often one falls into the “treble trap”, mixing extremely overemphasised highs which may sound very good during processing but are clearly too strong when the mix is assessed objectively.

    Again, do not forget to consider careful acoustic and also visual control – studio monitors and the analyser are indispensable tools.

    “Air” boost with high shelf

    “Air” boost with bell

    Individual EQ Processing

    This description of basic vocal EQing should only serve as an approximate and very generalised guide, since each audio signal and, in particular, each vocal recording comes with unique and very different preconditions and frequency images.

    Fortunately, each human singing voice is unique and like no other in its voice “colouring” and its striking resonances – this is what gives a voice, and therefore a singer, an individual character, which we value so much in music. The EQing above should serve as a starting point and inspiration for the appropriate required measures. These can be used in a more refined and modified form, to integrate the voice into the mix as completely as possible without it losing any of its natural qualities through processing.
    For this reason, once again we would like to point out that, when dealing with vocal signals, it is particularly important to develop a clear sound concept and to maintain this during production and following processing stages.

    As part of this, you can listen to professional productions and take inspiration from them. If you imagine the vocal sound from artist XY for your production, then you should try to reproduce it as exactly as possible. This orientation will not only give you valuable experience in dealing with vocals; you will gradually develop your own, ever more pronounced, taste, which you will want to put into effect as much as possible.

    The following steps are essential during EQ processing:

    1.) Attentive and careful control of sound changes in a typical listening environment with loudspeakers that are as transparent as possible, and which have a mostly neutral frequency response: You can only edit what you have actually heard.
    If the speakers reproduce specific frequency ranges unnaturally or with shifted emphasis, you will hardly be able to make the right decisions regarding processing. Purposeful and useful work is then not only difficult but almost impossible.

    2.) Frequent A/B comparison (before/after) of your changes and their effects on the sound of the vocals within the overall mix is imperative, so that you do not lose your bearings and end up making edits which are useless or exaggerated with respect to your sound concept. Pressing the bypass switch, which is available in every plugin and every hardware device, should become routine.

    3.) “The right thing is what sounds good” – this principle applies without limitation: there are no established rules for EQing vocals in a certain mix/song. The possibilities are infinitely diverse, and editing is a creative process with its own freedom.

    4.) Based on experience, the expression “less processing is more” can be a good, valid piece of advice when dealing with vocals. In most cases, you will strive for a presentation of the human voice, which is as natural and organic as possible. This is certainly not achievable through extreme EQing; on the contrary, excessive EQ processing increasingly distorts the natural frequency response of the vocal signal.

    In many cases, it is best to proceed by carefully testing your possibilities. If the desired sound vision can only be achieved with very strong EQing, or if heavy EQing is necessary to bend the frequency response of a vocal track back towards “naturalness”, you should consider whether it would not be more sensible to record the vocals again using a more appropriate microphone. Often this extra work leads to significantly better results than extreme EQing.

    Different Equaliser Types

    It is not just the unique character of the respective voice that strongly influences and decides which EQ editing should/must be done with the respective vocal signal; the different EQs themselves, as well, deliver an almost unmanageable diversity of sound options.

    Their exact technical way of working can also be used for conscious sound design: only very few EQs work largely linearly and transparently. As already explained in the discussion of microphones, tube-based devices add certain harmonic distortions and pleasant “colourings” to the signal, but transistor-based EQs, too, have their own, mostly highly individual, sound character.

    After all, it is not just analogue hardware EQs which strongly differ in their sound behaviour and the way they enable frequency processing. The most diverse digital plug-in variants of EQs and filters can be found on the market, and most have their own sound properties, which are, more than anything else, based on their particular respective frequency manipulation calculation algorithms.

    Different filter slopes, alternative shelving curves, emulated distortions of the analogue circuits of vintage devices, resolutions beyond the human hearing range and, last but not least, phase-linear processing: it would seem that selecting the right EQ is an art in itself. One is often overwhelmed wondering which of the 10-20 EQs available in one’s own DAW environment to use.

    Here too, sadly, there is no generally applicable right method. Sure, some EQ concepts and models are better suited for specific tasks and desired sound results than others, but almost every EQ can be worked with purposefully and objectively.

    As is so often the case, it is not so much “with what”, but rather “how”, that counts with EQing in actual practice. The internal EQs in the mixer channel strips of any DAW alone provide sufficiently good editing options for most problems – a good and transparent mix can also be created with them.

    In general, put loosely: for all technical (corrective) EQ work, you should choose EQs which offer sufficient parameterisation and sound as neutral as possible (i.e. with as little colouration as possible) – a strongly “colouring” emulation of a vintage tube EQ is not the right tool for damping or filtering out disturbing resonances in a narrow band. Conversely, when you want to create a specific “sound”, it is better to choose precisely those models in order to give the vocal signal a certain “colouring” character. But here as well, in every situation the rule “What is right is what you like” applies. As you gain experience, you will discover which devices give you the desired sound in the easiest and most reliable way.

    Linear Phase EQs

    Standard digital EQ circuits work according to the so-called Infinite Impulse Response principle (IIR) and behave very similarly to their analogue predecessors. Here, the level change is achieved through complex feedback in the respective frequency band. This feedback results in a minimal temporal offset (delay) within these frequency ranges, which inevitably causes some degree of phase shift within the signal. For this reason, traditional EQs distort and influence the frequency image simply as a result of how they work, depending on the extent of the boost/cut. For a number of years, the concept of phase-linear EQ circuits has offered an alternative which maintains the phase relationships of the sound material: it delays the entire signal by the same time value instead of delaying the frequency ranges by different amounts and thus influencing the waveform. Linear-phase EQs therefore offer the advantage of not unintentionally influencing the phase relationships of the signal and, as such, better transparency. However, given how they work, these EQs often generate significantly more pre-ringing – a type of “pre-echo” which affects the transients of a signal in particular. Additionally, linear-phase EQs require considerably more computing power.

    Thus, given that even linear phase EQs feature audio-related disadvantages, it does not make sense to always use them as a matter of principle. Linear-phase processing is recommended for multi-microphone sessions in particular: in such cases, the phase relationship between multiple microphone signals should remain intact. However, in such cases, it is recommended that you edit in a sub-group (where feasible), where the signals are processed as a whole. The adverse effects of linear-phase EQs also tend to be less relevant in the highs than in the basses. In practice, you are more likely to use a standard EQ and put up with the mostly non-problematic phase shifts and get linear-phase colleagues involved only in exceptional circumstances.

    Incidentally, there are also mixed forms between non-linear and linear-phase EQs, and manufacturers use different approaches when coordinating the disadvantages of the two filter forms against each other or balancing them. Technically, however, there is no perfect equaliser.
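    As an illustration of the principle (a sketch, not how any particular plugin is implemented): a digital FIR filter with a symmetric impulse response is inherently linear-phase, because it delays all frequencies by the same (numtaps − 1)/2 samples. All values below are placeholders.

        import numpy as np
        from scipy.signal import firwin2, lfilter

        fs = 48000
        numtaps = 1023                       # odd length -> symmetric FIR
        # A gentle treble lift (+2 dB above ~8 kHz) as a linear-phase FIR "EQ".
        freqs = [0, 4000, 8000, fs / 2]
        gains = [1.0, 1.0, 1.26, 1.26]       # 1.26 ~ +2 dB
        taps = firwin2(numtaps, freqs, gains, fs=fs)

        x = np.random.randn(fs)              # stand-in for an audio signal
        y = lfilter(taps, [1.0], x)          # all frequencies delayed equally,
                                             # by (numtaps - 1) / 2 samples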

    Waves SSL E-Channel (partially semiparametric, partially fully parametric)

    Mäag EQ 4 (partly semi-parametric)

    Waves Linear Phase EQ (Fully Parametric)

    Softube Tube-Tech PE 1C (partly semi-parametric, partly fully parametric)

    FabFilter PRO-Q 3 (fully parametric)

    Sonible smart:EQ 2 (fully parametric controlled via AI)

    Dynamic EQ

    Dynamic EQ can be viewed as a combination of the two classic insert effects EQ and compressor. This innovative concept among signal processing devices can now also be found in ever more plugins.

    While “normal” EQs work statically, dynamic equalisers are able to regulate individual frequency ranges dynamically, thus combining the possibilities of an equaliser and a compressor. With this combination of both editing concepts, very complex processes are possible whose sound results could not be achieved with traditional EQs and compressors, or only with a great deal of trouble. For many vocal editing tasks, a dynamic EQ’s flexible range of functions is a perfect choice. It naturally allows for relatively simple control processes, such as de-essing, but also far more specialised editing techniques, which can be implemented very quickly and effectively.

    We can compare the function to that of a compressor: the user specifies the frequency range to be processed, and a threshold or a range (maximum gain reduction) is established in order to activate the dynamic control. If the level of the specified frequency range exceeds the user-defined or automatically determined threshold, the EQ begins reducing this range. Put more simply, the dynamic EQ only intervenes if the signal in the problematic frequency range is loud enough.
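    A heavily simplified sketch of this idea – a band is attenuated only while its envelope exceeds a threshold; real dynamic EQs modulate proper filter gains, and all parameter values here are hypothetical:

        import numpy as np
        from scipy.signal import butter, sosfilt

        def dynamic_band_cut(x, fs, f0, bw, threshold_db, max_cut_db,
                             attack_ms=5.0, release_ms=80.0):
            # Isolate the problem band with a band-pass; it acts as both the
            # detector and the component we subtract from the signal.
            sos = butter(2, [f0 - bw / 2, f0 + bw / 2], btype="bandpass",
                         fs=fs, output="sos")
            band = sosfilt(sos, x)

            # One-pole envelope follower with attack/release ballistics.
            atk = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            env = np.zeros_like(x)
            e = 0.0
            for n, v in enumerate(np.abs(band)):
                coeff = atk if v > e else rel
                e = coeff * e + (1.0 - coeff) * v
                env[n] = e

            thr = 10 ** (threshold_db / 20.0)
            max_cut = 1.0 - 10 ** (-max_cut_db / 20.0)  # fraction to remove
            # Reduction engages only above threshold, capped at max_cut.
            over = np.clip((env - thr) / thr, 0.0, 1.0)
            return x - max_cut * over * band

        fs = 48000
        vocals = np.random.randn(fs) * 0.1          # stand-in for a vocal take
        tamed = dynamic_band_cut(vocals, fs, f0=450.0, bw=120.0,
                                 threshold_db=-30.0, max_cut_db=15.0)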

    The following example will demonstrate the difference between using static or dynamic EQ. The singer has an unfavourable resonance at the beginning and at the end of the recording – the editing was done once with a static EQ and once with a dynamic EQ. The static reduction is roughly equal to the maximum dynamic reduction.

    Sound Example 1

    Vocals with Resonance


    Processing is done with a dynamic EQ: the disruptive frequency is reduced by the dynamic EQ with a maximum gain reduction of 15 dB.

    Resonant Vocals with dynamic EQ


    The 15 dB reduction may initially appear quite drastic, but since the dynamic EQ only intervenes when the disturbing frequency range reaches an excessively high level, this apparently drastic intervention is really not that severe. If you apply the same reduction using a static EQ, the voice will sound much thinner; it permanently loses considerable energy in the fundamental range.

    Resonant Vocals with static EQ


    Processing with dynamic EQ

    Processing with static EQ

    Common dynamic EQs include, in addition to the HOFA IQ-EQ, the Waves F6 and the Sonnox Oxford Dynamic EQ.

    Sonnox Oxford Dynamic EQ

    Waves F6

    The Compressor/Limiter

    When embedding a recorded vocal signal in a mix made up of several mostly loud instruments of a band arrangement, optimising the dynamic range plays a central role alongside processing and adjusting the frequency image. With vocal recordings in particular, the range between the quietest and loudest levels of the signal is especially large.

    This is desirable on a musical level: significant differences and variations in volume are the basis for the expression of emotion in music. However, a very wide dynamic range becomes a technical problem as soon as the signal needs to prevail over much louder and less dynamic instruments in the mix. Even a voice amplified via microphone and preamp stands no chance, especially against electrically amplified or very level-intensive instruments such as electric guitars or drums.

    Loud level peaks will come through well with sufficient amplification; however, the very quiet passages will be drowned out hopelessly. If you turn the entire vocal signal up so far that even these quiet words and sentences are easily audible, the loud level peaks will stand out unpleasantly.

    This should be enough of a description of the problems that occur when the dynamic range is too large. It is therefore necessary to find a way to process the signal technically so that the large level differences are effectively compensated for, allowing the more compact signal as a whole to be raised without individual peaks standing out too strongly and causing distortion.

    A solution was found very early on in sound engineering: a so-called compressor was used to automatically reduce the audio signal according to a definable ratio once a certain volume level had been exceeded. If the signal fell below the defined level threshold (threshold) again, it would pass through the compressor unhindered. Thus, the compressor regulated only the loudest parts of the audio signal, making the overall signal quieter and “more compact” – hence the term “compressor”: the dynamic range occupied by the signal is effectively reduced, i.e. compressed.

    The signal – now quieter and with fewer volume fluctuations – can be made louder as a whole in the final step. Thus, after compressing the signal, you raise it again far enough that the average level matches that of the unedited signal. What is much more important, however, is that the quieter passages are also significantly raised, and the internal differences between the quieter and louder passages are diminished.
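    As a worked numerical illustration (values hypothetical, not taken from the text): above the threshold T, a compressor with ratio R maps an input level L_in to an output level of

    $$
    L_{\text{out}} =
    \begin{cases}
    L_{\text{in}}, & L_{\text{in}} \le T,\\[4pt]
    T + \dfrac{L_{\text{in}} - T}{R}, & L_{\text{in}} > T.
    \end{cases}
    $$

    With T = -20 dBFS and a ratio of 4:1, a peak at -8 dBFS lies 12 dB above the threshold and comes out at -20 + 12/4 = -17 dBFS – that is, 9 dB of gain reduction. The makeup gain described above then raises the compacted signal as a whole.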

    The “Automated Sound Engineer”

    To illustrate the operation and control processes of a compressor, sound engineers often like to compare it to manually riding the volume with a channel fader. A compressor can thus be described as an “automated sound engineer” who acts as a “level controller” for the signal.

    As soon as the level of the signal exceeds a particular threshold value (threshold), the automated engineer will turn down the volume at a specific ratio. The reaction time the automatic engineer needs until it has actually turned down the fader after the threshold has been exceeded is known as attack time.

    There is also the time in which it regulates the signal back to its original value after the level has fallen below the threshold again – this, accordingly, is called the release time. The engineer may also have been told that, after decreasing the level (attack), he should wait a certain period before turning it back up to the original value (release). This short waiting period is known as the hold time.

    Parallel to these control processes, the automatic engineer will also increase the overall output level of the signal, in most cases by roughly the amount by which the peak levels were reduced. However, we do not need a sound engineer for all these editing procedures on a single track, because the described tasks are reliably and automatically performed by a compressor.

    The compression effect also has, in addition to narrowing the dynamic range and the resulting possibility of a higher level, another significant side effect: the quieter passages are raised and can therefore be heard much more clearly in the mix. This results in significantly better speech intelligibility, and emotionally significant parts of the performance (breathing, sighing, groaning, etc.) are brought up to an audible level.

    The adjustable control times (attack and release) in particular give the sound engineer a great deal of control when shaping the signal – they make compressors the mixing engineer’s “best friend”. Percussive portions can be rounded off with fast attack times; longer attack times can help to emphasise transients. Short release times can generate an effective pumping, and long release times allow for longer-term level reduction. The adjustable times are much faster than any person could react, and they allow a signal to be reshaped anywhere from imperceptibly to drastically, thus significantly changing the character of the sound.
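    The following minimal Python sketch ties these parameters together – threshold, ratio, attack, release and makeup gain in a simple feed-forward design (all values are illustrative; real compressors add knees, hold, look-ahead and far subtler detectors):

        import numpy as np

        def compress(x, fs, threshold_db=-20.0, ratio=4.0,
                     attack_ms=10.0, release_ms=120.0, makeup_db=6.0):
            # One-pole smoothing coefficients for the level detector.
            atk = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            rel = np.exp(-1.0 / (fs * release_ms / 1000.0))

            y = np.empty_like(x)
            env_db = -120.0                 # smoothed signal level in dBFS
            for n, v in enumerate(x):
                level_db = 20.0 * np.log10(max(abs(v), 1e-9))
                # Rising levels follow the attack time, falling ones the release.
                coeff = atk if level_db > env_db else rel
                env_db = coeff * env_db + (1.0 - coeff) * level_db

                # Gain computer: above the threshold, the output rises by only
                # 1/ratio dB per dB of input ("the fader is pulled down").
                over = env_db - threshold_db
                gain_db = -over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0
                y[n] = v * 10.0 ** ((gain_db + makeup_db) / 20.0)
            return y

        fs = 48000
        vocals = np.random.randn(fs) * 0.1   # stand-in for a vocal take
        squeezed = compress(vocals, fs)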

    In this regard, vocal compression has many benefits: 

    • The large dynamic range of the vocal signal is effectively limited and literally compressed. A higher overall signal level thus becomes possible, and dangerously clipping level peaks can be intercepted and attenuated. At the same time, this also provides preventive protection against overloads from signals that reach the upper level threshold.
       
    • The quietest passages in the signal become louder as a result of the overall raising of the signal level. This promotes the intelligibility of the voice and its assertiveness in the mix as compared to the other instruments. The voice sounds more direct and concise, and it can be better placed into the mix.
       
    • By “bringing out” quieter voice segments that would have been drowned out very easily without editing, often overlooked and characteristic vocal moments can be emphasised and amplified (e.g. breathing, scratching, breaking etc.). Thus, compression can definitely help lend the voice a distinctive sound.


    Another additional factor which has not yet been mentioned is that, due to their design-specific tonal differences, different compressor models can be used quite objectively and effectively in the sound design of an audio signal.

    Each compressor sounds different and adds certain audio nuances to an audio signal so that a compressor can be used in a thoroughly musical and sound-shaping way. Particularly drastic dynamic interventions to the signal using certain compressor models give the sound very characteristic “colourings”. 

    Just like with microphones and preamplifiers, we also find conceptual differences in compressor circuit designs. Depending on whether compressors are based on transistors, tubes or photoelectric components, they all differ in their individual control behaviour and, accordingly, in their characteristic sound. 

    Here is an introduction to common types of compressors, explained by using some – partly legendary – studio equipment. 

    1176 by Urei (later Universal Audio) – Transistor FET Compressor

    This is a true legend among compressor models, part of the inventory of countless large recording studios and available as a mono unit as well as a stereo version (1178). The signal is regulated using a so-called field effect transistor – as is typical with transistors, the reaction time is very fast; however, it also creates audible distortions, and it is precisely these that make up the sound character of the compressor and can easily be used for “colouring” the sound.

    The 1176 is built entirely with discrete components, and the device is equipped with high-quality transformers at the input and output. Its attack and release times can be set extremely fast, which also makes it easy to use for compressing fast and transient-rich audio signals (drums). The 1176 has no separate control for the threshold – this is fixed internally. Instead, control is done via the input knob: rather than moving the threshold, you simply “shift” the level of the input signal relative to it. The ratio can be selected using four switches: 4:1, 8:1, 12:1 and 20:1 (limiting). If all of the buttons are pressed at the same time, the 1176 works in the legendary “All Buttons” mode, which creates a very distinctive character and quite idiosyncratic control behaviour, the results of which can be heard on the drum room signals of countless productions.

    The 1176 is one of the most popular compressors for vocals, and a new version based on the original circuit design is now available from Universal Audio. In addition, many plugin companies offer more or less exact emulations of the classic, some of which come amazingly close to the sound of the 1176, even though they will never fully capture the individual sound and character of the hardware. To be fair, one also has to admit that every unit of this analogue compressor (the older devices in particular) has a slightly different sound, which can hardly be reproduced by a digital emulation.

    Above: 1176 Rev. D; Below: 1176 Rev. A

    UAD 1176 Plugin

    Bomb Factory 1176 (BF76) Plugin

    Waves CLA-76 “Blacky” Plugin

    Waves CLA-76 “Bluey” Plugin

    LA-2A by Teletronix – Optocompressor

    This compressor, initially built by Teletronix in the 1960s, limits the dynamic range of signals using a completely different concept. The LA-2A belongs to the family of optocompressors, which regulate the extent of the compression with the help of a so-called optocoupler. This simple component consists of two parts: a light-emitting diode (LED) and a photoresistor. The louder the input signal, the brighter the diode lights up and the more the signal is reduced via the changing photoresistance. The response time of an optocoupler is naturally quite long, which is why optocompressors tend to be among the slower animals of their kind.

    Since the reaction time of the optocoupler cannot be directly adjusted, optocompressors lack the typical time parameters of other compressors (attack, release, hold); with this compressor type, the control behaviour is fixed and very characteristic.

    The natural attack time feels very musical and is particularly suitable for compressing vocals, but also bass or drums. The release behaviour is also quite unusual: a gain-reduced signal initially recovers quickly, while the remaining reduction is released back to the original level much more slowly. In addition, the optocompressor’s sensitivity differs across frequency ranges, resulting in a natural, frequency-selective compression that sounds particularly pleasant, musical and realistic for vocal editing.

    Given the absence of time parameters, an optocompressor typically comes with very few setting options – another reason for its reputation for “musical results”: if you cannot adjust much, you have to rely on your ears and cannot hide behind exaggerated parameter editing. Universal Audio has the LA-2A back in its product line, and as with the Urei 1176, there are also countless digital emulations available as plugins.

    Teletronix LA-2A

    UAD LA-2A Plugin

    Bomb Factory LA-2A Plugin

    Waves CLA-2A Plugin

    SSL Bus Compressor – VCA Compressor

    The “cleanest” compressors are the so-called VCA compressors. VCA stands for “Voltage Controlled Amplifier”, which also describes how these devices work: an amplifier unit whose degree of amplification or attenuation is regulated by a control voltage. This control voltage is in turn derived from the input signal through rectification: the higher the input level, the lower the amplification. This circuit design has some technical advantages: VCAs have a very linear characteristic, they hardly “colour” the signal, and the design allows the control times to be regulated very precisely.

    VCA compressors can be found in many mixing consoles, and with their low signal colouration and flexible setting options, they are the typical “all-round” devices for general dynamics control.

    One of the most popular and well-known VCA-type compressors is the bus compressor from SSL, built into the master section of the classic SSL consoles. This device is now also available as stand-alone hardware, for example in the API 500 module format, in SSL’s own X-Rack or in a classic 19″ casing, and of course there are countless digital replicas of this studio legend. One practical speciality of the SSL bus compressor is its automatic release function, which makes the release time dependent on the duration of the reduced level peak. This is especially useful for musical results, particularly with complex signals.

    Even though the SSL bus compressor is a stereo device, it is quite well suited for processing vocal signals. It also often performs well on a vocal bus, where it provides a pleasant “glueing” of several vocal tracks without colouring the signal too much.

    Of course, there are also mono compressors based on VCA technology, e.g. the legendary DBX 160. VCA compressors are also used in many channel-strip solutions, for example the ISA 131 from Focusrite.

    SSL Bus Compressor

    Waves SSL Bus Compressor

    Fairchild 660/670 – Vari-Mu compressor

    Naturally, in addition to the transistor-based VCA and FET compressors and the optocompressors, there is also a family of compressors which carry out the level reduction with tube circuits. These so-called Vari-Mu compressors use the tube itself as the gain-reduction element – not to be confused with compressors that merely make up their level loss with a tube amplifier (e.g. the Tube Tech CL1B, which is an optocompressor)!

    Since tubes can easily be driven into saturation, these compressors produce a very characteristic “colouring” through added harmonic distortion, especially at “hotter” levels. They also have very individual, not always linear control and level-reduction characteristics (depending on their design), which are not necessarily transparent or true to the source but can result in a thoroughly pleasant colouring.

    Tube compressors are primarily used as a creative tool with which the sound engineer can deliberately “make sound”. These devices are not suitable for purely technically motivated compression that should remain as inconspicuous as possible – very transparent VCA compressors or digital plugins are available for that. The sound of tubes can have a very positive impact on the production of vocal signals. However, you should consider carefully whether you really want the distinct sound characteristic of tubes at every stage of the signal chain (microphone, preamp, compressor).

    Sometimes less is more here, too. The most prominent tube-based compressor is certainly the Fairchild 660 from the company of Sherman Fairchild. Acquiring one can be difficult: they are no longer built, used units are hard to find and, above all, they are sinfully expensive. However, there are digital plugin emulations of this device, making a touch of the Fairchild 660 available for your own production.

    UAD Fairchild 660 

    UAD Fairchild 670 

    Bomb Factory Fairchild 660 Plugin

    Bomb Factory Fairchild 670 Plugin

    Waves Fairchild 660 Plugin

    Waves Fairchild 670 Plugin

    Basic Compressor Settings

    Whichever compressor (or compressors!) you choose for editing vocal recordings, exact parameter settings are, as with EQs, difficult to capture in a generally valid recipe. The relevant factors vary too much – for example the nature of the source signal, the control behaviour and sound of the compressor (depending on its design) and, above all, the musical or technical objectives of the respective production.

    Nevertheless, here is a short guide to some basic settings for vocal compression, based on a completely normal compressor design which provides the time constants attack and release and which can be found, in this or a similar form, as the standard compressor in all current DAWs. The following parameters are only to be understood as guidelines, as every signal and every piece of editing requires a very individual degree of intervention. The values are suitable as an initial orientation from which to make gradual final adjustments.

    Insert Compressor:

    All devices and plugins used for dynamics processing are classic insert effects. This means the signal to be edited should run through them at 100%; this is why we place the compressor of our choice as an insert on the channel strip of the vocal signal to be processed.

    Compressor as an insert in Cubase

    Compressor as an insert in Pro Tools

    Adjust Ratio Moderately

    To get a feeling for the compression to be expected, a slight to average ratio setting is recommended. A ratio of 2:1 to about 4:1 should be sufficient to start with. Higher values usually produce results which already tend strongly towards effect processing.
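
    In terms of levels, the ratio defines a simple static curve: below the threshold nothing happens, and above it only every “ratio-th” dB gets through. A minimal sketch (our own illustrative function, working purely in dB):

        def compress_db(level_db, threshold_db=-18.0, ratio=3.0):
            """Static hard-knee compressor curve: output level for a given input level."""
            if level_db <= threshold_db:
                return level_db  # below the threshold: unchanged
            # Above the threshold: the overshoot is divided by the ratio.
            return threshold_db + (level_db - threshold_db) / ratio

    For example, compress_db(-10.0) with these defaults returns about -15.3 dB: a peak 8 dB above the threshold comes out only 8/3 ≈ 2.7 dB above it, i.e. roughly 5 dB of gain reduction – right in the 3-6 dB region recommended below.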

    Set ratio in Cubase

    Set ratio in Pro Tools

    Adjust Time Parameters (Attack and Release)

    Next we select time parameters that are common for compression purposes, staying with moderate values for the time being. We will fine-adjust the controls for attack, release and, if available, hold in a later step; for the moment it is sufficient to get the compressor working. An average attack time of approx. 20-50 ms and a rather short release time of approx. 100-200 ms are good standard values for getting familiar with the reaction of the compressor and its sound.

    Attack and Release settings in Cubase

    Set Attack and Release in Pro Tools

    Select Threshold

    By moving the threshold into the negative dB range, we determine the threshold value above which the audio signal is compressed at the specified ratio. If we experiment a bit with this value, we will very quickly and clearly hear the result in the form of a significantly level-reduced signal.

    In most cases, the compressor/plugin has a so-called gain-reduction display, which may take the form of an LED chain, a standard VU meter or a graphic level bar. Here we aim for a maximum gain reduction of approximately 3-6 dB, meaning that the audio signal is reduced in level by up to 3-6 dB whenever it exceeds the adjusted threshold value.

    Set Threshold in Cubase

    Set Threshold in Pro Tools

    Fine-adjust Time Parameters (Attack and Release)

    Now we return to the time parameters. Since the compression, and with it the level reduction, now affects the vocal signal thanks to the lowered threshold, we can hear significant changes when the attack and release values are altered. General advice is difficult here, since vocal signals and compressor models react very differently to the time parameter settings.

    Generally, it can be said that extremely short attack times also compress the important early transients and thus the beginnings of words – sometimes desired in order to intercept dangerous level peaks as quickly as possible, but often avoided in order to retain the original impulses of the vocal performance. For this reason, in most cases one will select the attack time such that the compressor still lets the important transients of the signal pass.

    The release time generally depends above all on the speed of the level impulses. If it is set so long that the next impulse (e.g. the next sung phrase) still falls within the recovery period of the previous level reduction, you will get a very inhomogeneous and unstable sound: some notes will be at normal volume, while others are still reproduced “release-compressed”. One should therefore attempt to have the level-recovery process completed shortly before the next peak / the next phrase.

    Nevertheless, excessively short release times must also be avoided – fast and constant gain changes can result in a very conspicuous and uncomfortably unnatural “fluttering” of the singing performance.

    Setting the time parameters is thus a tricky task which depends very much on the nature of the audio signal and the musical context. Normally, compression that is as inconspicuous as possible is desirable – compression which preserves the greatest possible naturalness of the vocal signal. Even very experienced sound engineers mostly find the best time parameter settings through careful trial and error and attentive listening.

    Output Amplification (Make-up Gain):

    After all parameters influencing the compression process have been adjusted, the level of the vocal signal has been reduced more or less strongly. The result is a quieter signal, but one whose overall dynamic range has been compressed.

    The lost average level is raised again after compression in order to maintain a sensible level across the entire signal chain (keyword: gain staging) and to avoid the complications caused by differing levels when making a direct A/B comparison before and after compression.

    The easiest way to recover this gain is by using the bypass function, with which the compressor/plugin can be deactivated briefly so that the original, unedited vocal signal can be auditioned. By switching back and forth, the original volume can be compared with that of the compressed signal, and the latter adjusted accordingly. Once both signal variants are “felt” to be equally loud, the influence of the compression on the vocal signal can also be properly judged. With a bit of practice, an experienced sound engineer can usually estimate the necessary make-up amount quite accurately just by looking at the gain-reduction display.

    Set Make-Up Gain in Cubase

    Set Make-Up Gain in Pro Tools

    Here are some more principles for vocal compression and compression processes in general:

    • A very low threshold value will cause early, usually permanent compression of the audio material. This leads to a very stable signal with a compact dynamic range. The risk lies in possibly crushing the sound image unnaturally.
       
    • A very high threshold lets the compressor respond only to the highest signal peaks and intercept them – for example, to protect the recording or playback system against overload or to catch sudden short signal bursts.
       
    • A good starting point for the threshold is to pull the control down just far enough that the greatest peaks clearly exceed it. This guarantees that the compressor works in a largely normal control range.
       
    • Small ratio values compress inconspicuously and naturally; high settings, on the other hand, intervene deeply in the dynamics of the signal and cause clearly audible level reduction.
       
    • Excessively short attack times stifle the initial impulses (transients) of the signal and in the worst case can trigger unpleasant crackling. They are therefore only recommended if very extreme level peaks must be caught as quickly and safely as possible, or if a signal is to be artificially “blurred”.
       
    • Excessively long attack times delay the compression. The beginnings of words are “missed”, and the sound becomes unnatural. Also, once the attack period expires, the compression suddenly kicks in, which can lead to unnatural pumping.
       
    • Very short release times return the reduced level to the unedited volume very quickly. This can lead to hectic and clearly perceptible jumping back and forth of the volume. With extremely short release times, the constant regulation can even cause crackling or distortion.
       
    • Nevertheless, with exposed individual signals in particular, the release process should, if possible, be completed by the time the following note begins; otherwise it would be reproduced too quietly. Excessively long release times thus prevent a concise sound image.
       
    • If mainly level peaks are to be intercepted and reduced effectively, extremely short attack/release values can be used. Here the aim is the fastest possible level reduction and an equally fast recovery, which in the best case will not be noticed at all. Limiting in particular, and special tasks such as de-essing, are based on this principle of a super-fast compressor reaction.
       
    • A rule of thumb (to be used with caution) regarding the relationship between the musical content of the audio material and the corresponding time parameters could be: fast notes and a fast song tempo tend to require short values, while longer notes and slower tempos usually sound significantly better with longer time values.
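
    To tie the preceding steps together, here is a minimal feed-forward compressor sketch (Python with numpy; a simplified illustration under our own assumptions, not a production-ready design) combining detector, static curve, attack/release ballistics and make-up gain:

        import numpy as np

        def compress(x, fs, threshold_db=-18.0, ratio=3.0,
                     attack_ms=20.0, release_ms=150.0, makeup_db=0.0):
            """Compress a mono float signal x sampled at fs."""
            # 1) Detector: instantaneous level in dB.
            level_db = 20.0 * np.log10(np.abs(x) + 1e-9)
            # 2) Static curve (hard knee): desired gain reduction in dB.
            over_db = np.maximum(level_db - threshold_db, 0.0)
            gr_db = over_db * (1.0 / ratio - 1.0)  # negative values = reduction
            target = 10.0 ** (gr_db / 20.0)        # linear target gain
            # 3) Attack/release ballistics (one-pole smoothing, as sketched earlier).
            a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
            a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            g = np.empty_like(x)
            state = 1.0
            for n, tgt in enumerate(target):
                a = a_att if tgt < state else a_rel
                state = a * state + (1.0 - a) * tgt
                g[n] = state
            # 4) Apply the gain curve plus make-up gain.
            return x * g * 10.0 ** (makeup_db / 20.0)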

    Hard Knee and Soft Knee

    A compressor can exhibit still other differences in its compression behaviour. The characteristic shape of the compressor’s transfer curve at the kink – quite vividly referred to as the “knee” – distinguishes hard-knee from soft-knee compression.

    Hard knee is the option of choice if, as with technical compression, a very effective and fast level reduction is to be achieved. Soft knee reacts more softly and inconspicuously, making it better suited for compressing all audio signals where one would prefer not to hear the actual compression process at all.

    The dynamic editing of vocals, in particular, where one places great importance on their natural sound development, is preferably done with soft knee procedures.

    Again, the typical characteristics of optocompressors are ideal for this, which is why this technology – not the youngest, as we have already mentioned – is still very popular and frequently used today.

    Of course, there is also a large number of current devices which provide this particular type of compression – most of the time one can even choose between hard knee and soft knee. Many digital compressor plugins offer maximum flexibility here: they often allow a continuous blend between the two options, creating an individual characteristic curve adapted to the respective musical situation.
     


    As can be seen in the figure above, the characteristic curves can be clearly matched to their names. With a hard knee, there is a fixed point at which compression begins immediately. With a soft knee, the curve approaches the “final” ratio value gradually: the higher the input level, and therefore the further the threshold is exceeded, the closer the curve comes to the set ratio.
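
    One common formulation of such a blended curve, found in the technical literature, interpolates quadratically across a knee of adjustable width; a knee width of 0 dB gives the hard knee again (illustrative Python, dB in and dB out):

        def static_curve_db(x_db, threshold_db=-18.0, ratio=3.0, knee_db=6.0):
            """Static compressor curve with an adjustable knee width."""
            d = x_db - threshold_db
            if knee_db <= 0.0:  # hard knee
                return x_db if d <= 0.0 else threshold_db + d / ratio
            if 2.0 * d < -knee_db:
                return x_db                      # below the knee: unchanged
            if 2.0 * d > knee_db:
                return threshold_db + d / ratio  # above the knee: full ratio
            # Inside the knee: quadratic blend between the two straight segments.
            return x_db + (1.0 / ratio - 1.0) * (d + knee_db / 2.0) ** 2 / (2.0 * knee_db)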

    Studio One’s internal hard knee and soft knee compressor settings look like this:

    Hard Knee compression in Studio One

    Soft Knee compression in Studio One

    The following sound examples should help illustrate the difference between hard-knee and soft-knee compression. To make the difference more clearly perceptible, we decided on a very percussive snare signal and a very high ratio of 12:1 – these are roughly the settings shown in the screenshots above. In a direct comparison of these extreme settings, the influence of the knee can be heard quite clearly: the hard knee interferes with the signal much more conspicuously, while the soft knee makes the sound audibly softer and less noticeable – even with such a large gain reduction and short attack time – so that the drum’s transients are not undermined quite so strongly:

    Sound Example 2

    Snare without Compression


    Snare Hard-Knee


    Snare Soft-Knee


    As these sound examples show quite well, there is indeed an audible difference between hard knee and soft knee, but the influence of this parameter is significantly smaller than that of the attack and release times. When making adjustments, one can therefore initially leave this parameter aside and fine-tune the compression behaviour with the other controls. As a basic principle, however: if the compression should come across as inaudible and unobtrusive, a soft knee is preferable.

    Limiting

    The limiter is the closest relative of the classic compressor. Its functionality is not fundamentally different; only the strength of the compression and its effect on the sound of the audio material make it a specialised tool for dynamics processing.

    In simplified terms, one could describe the limiter as a high-speed compressor with a very high, mostly preset ratio – at least 10:1, more commonly values between 20:1 and infinity:1. The attack time of a limiter must be ultra-short (values well below 1 ms), to allow immediate and safe interception of level peaks. Digital limiters therefore normally work with look-ahead technology, with which the limiter can, in a sense, look into the future and identify peaks even before they reach the threshold.
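
    The principle is easy to sketch in code: delay the audio, compute the gain needed to keep every sample under the ceiling, and let the gain computer “see” the delayed samples in advance (Python with numpy; all names and values are our own illustration, and real limiters ramp the gain far more smoothly than this):

        import numpy as np

        def lookahead_limit(x, fs, ceiling_db=-0.3, lookahead_ms=1.5, release_ms=80.0):
            """Minimal look-ahead brick-wall limiter sketch for a mono float signal."""
            la = max(1, int(fs * lookahead_ms / 1000.0))
            ceiling = 10.0 ** (ceiling_db / 20.0)
            # Gain needed so that no sample exceeds the ceiling.
            needed = np.minimum(1.0, ceiling / np.maximum(np.abs(x), 1e-12))
            # Running minimum over the last 'la' values: relative to the delayed
            # audio below, this gain already "knows" about peaks before they arrive.
            ahead = np.array([needed[max(0, n - la):n + 1].min() for n in range(len(x))])
            # Instant attack, smoothed release.
            a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
            g = np.empty_like(x)
            state = 1.0
            for n, tgt in enumerate(ahead):
                state = tgt if tgt < state else a_rel * state + (1.0 - a_rel) * tgt
                g[n] = state
            # Delay the audio by the look-ahead time so gain and signal line up.
            delayed = np.concatenate([np.zeros(la), x])[:len(x)]
            return delayed * g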

    As with a normal compressor, the release time regulates how quickly the gain returns to unity – i.e. how quickly the signal is again passed unprocessed from input to output – after excessively loud peaks have been limited at the threshold.

    One of the most important areas of application for limiters is the safe interception of peak levels which could overload a system (recording or playback) at its input or output. In this respect, it is a very reliable and important protective device, used mainly as the last element of a signal chain or channel (output ceiling mostly -0.1 or -0.3 dBFS).

    In the editing of vocal signals, the limiter plays a rather subordinate role – vocal signals that have been heavily limited sound very unnatural and offer no pleasant dynamic behaviour. However, a limiter can easily be used as overload protection at the end of a vocal chain, as well as on the stereo output of the overall mix.

    The use of a limiter in mobile speech recording – for example on a film set or in electronic news gathering – is widespread and very important, since a usually unrepeatable sound event needs to be captured in a technically correct way even in the case of unpredictable peaks. In music production, or generally in the studio, however, the use of a limiter during recording is practically taboo: here all options for correct level adjustment are available, peaks can be anticipated and takes can, if necessary, be repeated with corrected levels. In this situation a limiter could only degrade the signal, and it should be used during recording at most as a protective mechanism just before clipping.

    Established limiters include the following examples: 

    FabFilter Pro-L 2

    A.O.M. Invisible Limiter

    Waves L2 Ultramaximizer

    Since a limiter must work with ultra-short attack times and extremely high ratios to perform its protective function, it inevitably has an acoustic effect on the signal. It is easy to see that, due to the fast reaction, the waveform inevitably becomes “bent”. This quickly produces artificial “pumping” and audible distortions – not as drastic as those of the clipping the limiter is supposed to protect against, but the signal is still considerably degraded. For this reason, a limiter should only ever be used when it is really necessary. To demonstrate how strong limiting can affect a voice, listen to the following two extreme cases:

    Sound Example 3

    Speech without Limiter


    Speech Limiter 8 dB Gain Reduction


    Speech Limiter 18 dB Gain Reduction


    Naturally, these examples are extreme cases, but they show very clearly what careless reliance on a limiter (e.g. during a recording) can lead to. Moreover, these examples were created with a very high-quality limiter; many models would produce considerable loss of sound quality even at significantly lower gain reductions. As a rule of thumb, one could say that the quality of a limiter can largely be measured by how much level it can absorb without audible distortion.

    De-Esser

    A de-esser should process only overly emphasised, disturbing hissing and ‘S’ sounds in the singing voice and dampen them as effectively as possible. De-essers work in different ways – the most common design is similar to a multiband compressor which compresses only the frequency range in which the ‘S’ sounds are present. All other ranges are left unprocessed.

    To achieve this, the compressor’s detector circuit is fed only with the high-frequency ‘S’ or hiss portion of the signal. In a de-esser, this is done using a rather narrow filter which lets only the problematic frequency range pass – typically above approximately 6-7 kHz.

    Another very common strategy is to reduce the sibilants by using targeted phase cancellations in a specific frequency range. This makes the process less obvious and normally leads to more natural-sounding results. The company SPL developed this form at the start of the 1990s.

    The control times of a de-esser must be very short and are fixed in most models. Some de-essers also offer the option of making the reduction range wider or narrower around the selected frequency. In many cases, there is also a function for solo-listening to the filtered high-frequency signal. When using a de-esser, one should make sure the compression is not too strong or noticeable: concentrating on the ‘S’ sounds for too long easily leads to lowering the sibilants too much, so that an annoying lisp arises. It is therefore not advisable to focus your hearing on ‘S’ sounds for too long at a stretch. In case of doubt, de-ess the signal a little less, so that the voice is not affected too strongly.

    In addition to classic frequency-selective compression, there are other, more experimental approaches. For example, the company Waves offers a de-esser which, through resynthesis, digitally reproduces ‘S’ sounds and inserts them into the signal at an amplitude defined by the user. This is intended to prevent the typical compression artefacts (pumping etc.) and results in a more natural reduction of sharp sibilants. (Waves Sibilance)

    Here is a quick guide to practical de-essing of a vocal track (a small code sketch of the underlying principle follows after the list):

    1. Look for an area in the audio recording in which the problem of emphasised S/hiss sounds can be heard particularly clearly. The best thing to do is have this part played in a loop.
       
    2. A graphic analyser set to peak mode can help determine the problem frequency in question. If a frequency must be specified, transfer it to the de-esser as the core frequency parameter.
       
    3. Solo-monitoring the filtered hiss signal (often labelled “filter” or “HF only”) makes it easy to pinpoint the problem frequency more accurately. The core frequency should be chosen at the point where the hiss or ‘S’ sounds are loudest.
       
    4. In most cases, a threshold control sets the damping of the frequency range to be more or less intense. The threshold works similarly to that of a normal compressor: if it is exceeded, the de-esser quickly reduces the level of the band, and the fixed release time ensures an equally fast return of the level to its original value.
       
    5. You should be very careful when adjusting the threshold, as excessively strong damping leads to unnaturally quiet ‘S’ sounds in the vocal signal. If this happens, an ‘S’ turns into a kind of ‘F’, as if the singer/speaker had a lisp. A maximum level reduction of about 6 dB is usually a good value for natural de-essing.
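
    As announced above, here is a minimal sketch of the frequency-selective principle behind most de-essers (Python with scipy; the band split is deliberately crude, and all names and default values are our own illustration, not any manufacturer’s algorithm):

        import numpy as np
        from scipy.signal import butter, sosfilt

        def deess(x, fs, center_hz=7000.0, threshold_db=-30.0,
                  max_cut_db=6.0, release_ms=30.0):
            """Duck only the sibilance band, leaving the rest of the signal untouched."""
            # Split off roughly an octave around the core frequency.
            lo = center_hz / np.sqrt(2.0)
            hi = min(center_hz * np.sqrt(2.0), fs / 2.0 - 1.0)
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfilt(sos, x)
            rest = x - band  # crude complementary split; real de-essers use matched crossovers
            # Envelope of the sibilance band: instant attack, smoothed release.
            env = np.abs(band)
            a = np.exp(-1.0 / (fs * release_ms / 1000.0))
            for n in range(1, len(env)):
                env[n] = max(env[n], a * env[n - 1])
            env_db = 20.0 * np.log10(env + 1e-12)
            # Gain reduction on the band only, capped at max_cut_db (see step 5).
            cut_db = np.clip(env_db - threshold_db, 0.0, max_cut_db)
            return rest + band * 10.0 ** (-cut_db / 20.0)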

    Waves Sibilance

    HOFA IQ-Series DeEsser

    SPL De-Esser Classic (Plugin Replica)

    SPL De-Esser (Dual Band)

    In addition to dedicated de-essers, there are other ways to process the signal so that sharp ‘S’ sounds are reduced. For example, you can use a multiband compressor, in which different frequency ranges are separated and compressed independently of each other. Since, for de-essing, only the bothersome frequency range matters (depending on the voice and the recording, approx. 6-9 kHz), you can set one band accordingly and switch off the compressors on all other bands. With ultra-fast attack and release times, this yields a kind of compression which catches only the ‘S’ sounds.

    De-Essing with multiband compression in HOFA system

    Another very flexible solution is a dynamic EQ, where you only need to adjust the frequency and Q-factor of one dynamic band. This band is used with very fast control times so that the ‘S’ sounds are reduced as effectively as possible.

    De-Essing with a dynamic EQ

    To demonstrate the different effects of a de-esser “in front of your ears”, listen to the following sound examples:

    Sound Example 4

    Vocals without De-Esser


    Vocals with De-Esser


    Vocals with incorrectly adjusted De-Esser


    As this example clearly shows, a suitably adjusted de-esser can make the voice sound much softer and more pleasant. If the settings are too extreme, however, a very artificial vocal sound results: the ‘S’ sounds are audibly over-reduced and a slight lisp appears.

    As you may have noticed, a de-esser often does not respond equally to every ‘S’. Different sibilants can have different frequency spectra, and the extent of the reduction also depends on the level of the sound. One should keep this in mind when working with de-essers and make sure the de-esser does not act where it is not needed. If only one individual sound stands out in the whole performance, it makes more sense to edit it manually (with standard editing or level automation), so as to avoid compromises at other points.

    Automation Options in the DAW

    One of the key advantages of a DAW’s digital environment is the ability to dynamically automate practically every parameter, rotary knob, switch and fader of the virtual mixer, as well as of every plugin used in the session. With the help of graphically represented automation tracks, the value of every parameter can be freely defined at any point in time, and, as if by magic, the DAW carries out all these editing processes automatically. The ability to dynamically change every aspect of audio editing is only possible within the virtual environment – in an analogue studio one would need an infinite number of sound engineers’ hands to carry out all the changes manually in real time during playback.

    In the processing of vocal signals, DAW automation offers an enormous range of possibilities, and it represents a real alternative for many editing processes. For example, the control processes conducted automatically by a compressor could also be applied with the appropriate dynamic automation of the volume fader. In some cases, such automation could even offer advantages when compared to using a compressor. There could be situations in which you don’t want to have a compressor working on a lead vocal track. In such a case, volume automation could represent a more flexible solution worth considering. It is also possible to avoid the sound by-products of the compression process in this way.

    The same applies in principle to de-essing. Why not reduce the dangerous ‘S’ sounds with careful automation of the volume fader at exactly those points? You would thereby achieve de-essing without any of the possible unwanted shortcomings and artefacts.
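
    Conceptually, volume automation is nothing more than a gain envelope drawn over time and multiplied with the signal. Here is a minimal sketch (Python with numpy; illustrative names) of how a DAW renders a breakpoint lane:

        import numpy as np

        def apply_volume_automation(x, fs, points):
            """Render a volume automation lane onto a signal.

            points: list of (time_in_seconds, gain_in_dB) breakpoints."""
            times = np.array([t for t, _ in points])
            gains_db = np.array([g for _, g in points])
            t = np.arange(len(x)) / fs
            # Linear interpolation between breakpoints, like a DAW automation lane.
            env_db = np.interp(t, times, gains_db)
            return x * 10.0 ** (env_db / 20.0)

        # e.g. dip a single sharp sibilant at 12.30 s by 5 dB for about 80 ms:
        # y = apply_volume_automation(x, 44100,
        #         [(12.26, 0.0), (12.30, -5.0), (12.34, -5.0), (12.38, 0.0)])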

    Since all parameters can be automated as required, dynamic frequency editing is also conceivable. You may, for example, want to regulate the low-mid components of individual words in a verse at differing levels. This is not feasible with a static EQ; a dynamic EQ might manage it, but automating the relevant EQ band achieves the same effect.

    On the other hand, you should not overdo automation – there are enough tasks and requirements which can be fulfilled very well and completely adequately with classic gear or plugins. Every now and again, however, it pays to ask yourself whether some tasks can’t be accomplished more easily and quickly, and with a better sound result, through careful and attentive automation.

    In any case, volume automation with the fader can, above all, help promote a maximally dynamic vocal performance. If the singer shows only slight differences between aggressively loud and emotionally quiet passages in the recordings, it can help to subtly automate the volume during post-editing. Often one will want to make the verses a bit quieter so as to give the refrain more energy after a small volume increase. Bridges, and breakdown parts in particular, are suitable for bringing the voice very directly and immediately to the front. Volume automation can carefully support all these subtle impressions and shape the dynamic course of a vocal performance.

    Automation

    Spatiality in the Mix

    For various reasons, some of which we have already discussed in previous chapters, modern-day singing and speech are mostly recorded in heavily damped rooms, or even in booths with very short reverberation times – such are modern productions. This was not always the case: in the early years of sound recording, the natural spatial information of large studio rooms was used deliberately – including and especially for vocal recordings – to give the vocal signal the greatest possible naturalness. After all, every acoustic signal only fully develops in the interplay between its own sound and the reflective behaviour of the room around it.

    Nevertheless, recording vocals as neutrally and “dry” as possible has become standard practice in modern sound engineering, in order to retain all options for artistic and production-related spatial decisions at the later post-editing stage. Countless products are available today for the artificial and, above all, virtual simulation of spatial sound. These recreate almost every kind of natural reverberation so amazingly authentically that, in a compact mix rich in signals, it is very hard to tell the difference from real three-dimensionality.

    It is not, however, only the authentic image of real spaces that can be reproduced effortlessly with the help of devices and plugins. It is also possible to create very effective, totally unreal spatial projections which could not exist in reality and in which one could never have recorded any singing. With these comprehensive options, vocals can be placed in very impressive soundscapes, which in turn can make them so unique and exciting that the listener remembers the sound of the vocals – and the artistic-emotional statement connected with them – more readily.
     

    The two most important spatial effects are, naturally, the reverb and the echo (delay). In vocal production, both are usually integrated into the mixer routing as send effects: the actual effects device is placed in the insert of its own effects channel. All tracks which are to use the effect send a certain portion of their direct signal to this effects channel, where this signal component runs through the effects device at 100%, and the result is heard on the main outputs. The dry, unedited signal is forwarded to the main outputs in parallel, so that one hears a mix of processed and dry signal. The ratio of this mix is ultimately determined by the channel send level.
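
    In signal-flow terms, the send/return structure just described can be sketched in a few lines (illustrative Python; “effect” stands for any wet-only processor, such as the reverb sketches further below):

        def mix_with_send(tracks, send_levels, effect):
            """Send-effect routing: dry signals plus a shared, 100 % wet effect channel."""
            # Dry signals go straight to the main outputs...
            dry = sum(tracks)
            # ...while each track also feeds the effect channel at its own send level.
            bus = sum(level * track for level, track in zip(send_levels, tracks))
            # The effect output is mixed in parallel with the dry sum.
            return dry + effect(bus)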

    The processing of vocals with reverb and delay can serve different purposes:

    • Simulation of a space appropriate to the production, whereby originally dry vocals subsequently receive natural-sounding surroundings.
       
    • Deliberate exploitation of the depth staggering of signals: with skilled editing, vocals can be positioned more in the foreground or background of the mix. They may, for example, float clearly and concisely “in front of the mix” or be homogeneously embedded in the overall sound.
       
    • Thanks to flexible spatial simulation and the possibility of creating unreal spatial conditions, vocal signals can be given very impressive and effective sound aspects. These can help capture the listener’s attention and, at the same time, underline the message of the song.

    Algorithmic Reverb

    Let us have a quick look at what happens in a real space: how sound propagates there and ultimately becomes the spatial impression which we associate inseparably with the sound. Some of the original sound waves reach our ears directly, while others are reflected (possibly more than once) in the immediate vicinity beforehand; still others are reflected again and again until they combine into a diffuse, reverberant sound image, depending on the space, its size and the surfaces within it.

    Over time, all these sound waves lose their energy, with the higher frequencies decaying fastest while the lower ones are preserved longer. The three stages described are known as direct sound, early reflections and the reverb (tail). These three components and their individual levels are especially important for describing and perceiving spatiality and reverberation.

    Two additional time constants are also important, both for the development of reflected sound in a space and for the simulation of this process in devices and plugins: the interval between the direct sound and the arrival of the first reflections at the listening position (or ear), and the interval between the direct sound and the onset of the actual diffuse reverb tail.

    The first time constant, the Initial Time Delay Gap (ITDG), is only rarely provided as a parameter in reverb simulations, while the second, the pre-delay, is provided all the more frequently. By manipulating the pre-delay in connection with the decay time of the reverb tail and the level of the high reverb frequencies in particular, signals can be positioned in the perceived spatial depth.

    Some reverb devices and plugins offer additional parameters, e.g. size (room size) or density (density of the reflections). Ultimately, however, these are specific modifications of an underlying algorithm – a calculation model which computes and simulates the reflection behaviour.

    So much for the parameters. But what is an “algorithmic” reverb? The word “algorithmic” already gives it away: it is a purely synthetic reverb which can be adjusted via the aforementioned parameters. The adjustment knobs change variables anchored in an algorithm which calculates the behaviour of the reverb precisely.

    This means that, in essence, countless echoes are generated which take over the role of the early reflections and the reverb tail. The algorithm keeps the diffusion, the frequency image and the duration of the reverb under control, also using modulation.

    First, the dry signal is sent through a series of “delay lines”. This results in delays which follow one another quickly and lie close together. Exactly how these delays take shape depends on the settings for the size and shape of the “theoretical” space. Mathematical algorithms regulate the timing, volume and sound of the delays using these parameters – quite similar to the effect of the surfaces in a real space.

    After the early reflections, the late echoes follow – also known as the “reverb tail”. Keep in mind where these come from: they are early reflections which have in turn hit further surfaces!

    To replicate this, the reverb uses feedback loops to send the generated echoes through the algorithm again. The spatial properties are thus re-applied to the early reflections already generated, and the “late reflections” arise.

    At this point, the reverb algorithm has further variables which influence the timing, volume and sound of the feedback loop.

    The length of the reverb is then determined by how often the signal is sent through the feedback loop: the more often it passes through the loop, the longer the reverb.

    With the help of these processes, an algorithmic reverb can generate a quite convincing impression of a real space. But it also offers the possibility of generating “surreal” spaces which convey a strange, unreal sound. This provides plenty of material for creative, crazy or even natural reverbs.
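
    A classic minimal construction in exactly this spirit is the Schroeder reverb: parallel feedback comb filters (delay lines with feedback loops) build the dense tail, and allpass filters increase the echo density. A sketch under our own assumptions (Python with numpy; the delay times and feedback values are illustrative):

        import numpy as np

        def comb(x, delay, feedback):
            """Feedback comb filter: one delay line with a feedback loop."""
            y = np.zeros_like(x)
            buf = np.zeros(delay)
            for n in range(len(x)):
                out = buf[n % delay]  # signal written 'delay' samples ago
                y[n] = out
                buf[n % delay] = x[n] + feedback * out
            return y

        def allpass(x, delay, gain=0.5):
            """Allpass filter: adds diffusion without colouring the spectrum."""
            y = np.zeros_like(x)
            buf = np.zeros(delay)
            for n in range(len(x)):
                d = buf[n % delay]
                y[n] = -gain * x[n] + d
                buf[n % delay] = x[n] + gain * y[n]
            return y

        def schroeder_reverb(x, fs):
            """Return the wet signal only – blend it in via a send, as described above."""
            # Non-harmonic comb delays spread the echoes out; higher feedback
            # means more passes through the loop, i.e. a longer tail.
            combs = [(0.0297, 0.77), (0.0371, 0.80), (0.0411, 0.82), (0.0437, 0.78)]
            wet = sum(comb(x, int(fs * t), fb) for t, fb in combs)
            for t in (0.005, 0.0017):
                wet = allpass(wet, int(fs * t))
            return wet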

    Various digital reverbs have established themselves as plugins. Here are just a few of them:

    The true classic among digital reverbs, however, is the Lexicon 224, which was first used in music productions in 1978. Even if it was not the first digital reverb device, it is certainly one of the most famous.

    Convolution Reverb

    As an alternative to algorithmic reverbs based on complex mathematical models, there have also long been reverb devices which create artificial reverberation using sampled characteristics of real spaces and environments.

    This so-called “convolution reverb” uses samples (spatial impulse responses) which it folds into the dry source signal. The samples are created by exciting an actual space with a very short acoustic impulse (a bang, a Dirac impulse, a sine sweep) and recording the result (the “answer” or reaction). This yields a so-called impulse response, or IR for short.

    Convolution reverb? Why “convolution”? It does not mean that the reverb is somehow symbolically folded up and inserted into a device. The term comes from mathematics: a convolution combines two functions, and in the frequency domain it corresponds to a multiplication. To put it very simply: the frequency spectrum of the signal is multiplied by that of the impulse response. We do not, however, want to get too theoretical at this point.

    This “response sample” represents the individual, unmistakable spatial behaviour of this particular space, and with a corresponding plugin algorithm (the convolution reverb) it can be applied to any audio signal. The audible result is the same as if the dry source signal had actually decayed in this space – extremely realistic reverberation whose naturalness can hardly be surpassed.
    It is thus possible to place a dry signal in every conceivable real space: legendary concert halls or studio rooms, classic reverb devices, or unusual locations such as the inside of an oil tanker, a plastic bucket or a car boot.
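
    The core operation is a single convolution of the dry signal with the sampled impulse response. A minimal sketch (Python with scipy; “ir” is assumed to be a mono impulse response at the same sample rate as the dry signal):

        import numpy as np
        from scipy.signal import fftconvolve

        def convolution_reverb(dry, ir, wet_db=-12.0):
            """Convolve a dry signal with a room impulse response and blend it in."""
            wet = fftconvolve(dry, ir)[:len(dry)]  # the dry signal "decays" in the sampled space
            # Normalise the wet signal and set its level, then mix in parallel (like a send).
            wet *= 10.0 ** (wet_db / 20.0) / (np.max(np.abs(wet)) + 1e-12)
            return dry + wet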

    However, the reverberation behaviour of a convolution reverb often cannot be edited as flexibly as with the parameters of algorithmic reverb devices and plugins; in return, it is considerably more natural and realistic. For spatial editing you should therefore think carefully about what matters most for the signal: either the maximum naturalness and amazingly realistic reverb of a convolution reverb, or the flexible simulation of echo behaviour in an algorithmic reverb, which can be adjusted far more extensively to your own wishes and requirements. The decision should be made depending on the desired sound image for the particular song and the sound of the vocals.

    Convolution reverbs are not only useful in music production – practically every location can be “convolved”. Thus, film sound for a scene shot in front of a green screen can be brought to life acoustically using the convolution reverb of a real environment.

    Plate Reverb

    Over the decades, the particular sound character of the so-called plate reverb has established itself as a consistent stylistic device and a good choice for vocal reverb. The German company EMT Franz presented a monstrous reverberation device in the 1950s which creates its echoes via a freely swinging metal plate. The reverb of the EMT 140 and its successors is very compact, with a high-mid-heavy, metallic sound character, lending vocal signals in particular a pleasant and strikingly fresh artificial space. The original EMT 140 plate, more than two metres long, has been replicated by many manufacturers as virtual plugin emulations and is a good choice for a fine vocal reverb. The plate sound is popular both as a template for the behaviour of algorithmic reverbs and as a convolution reverb, and it has ensured an unmatched open and light vocal sound on countless releases.

    There have already been several models established in the plugin world in particular, which convey a very realistic or true-to-original sound impression. 

    Reverb Selection for Vocals

    What kind of reverb should one use for vocals? As so often, there is no general answer or recommendation, since the decision depends strongly on the sound objective and the requirements of the song. One will generally have the least parameter editing to do with a suitable convolution reverb, since these devices come with few settings. Even where parameters can be changed, in case of doubt it is better to choose a different, more suitable impulse response rather than manipulate the selected one and ultimately distort its sound. After all, that was not the idea behind convolution reverb: you are looking for precisely the unique form of reverberation of the space whose impulse response you have chosen. Changing or adapting this sample will typically lead to significantly worse results than simply selecting another impulse response.

    If, on the other hand, you decide on an algorithmic reverb, you have many more options for manipulation; however, you will never quite achieve the realism of a convolution reverb. Ultimately, outstanding results can be obtained with both concepts.

    How much reverb one should use on a vocal signal depends very strongly on the current zeitgeist. A reasonable amount of space is good for every vocal signal, since it ensures the basic level of naturalness with which a very dryly recorded vocal should be equipped. The length, sound and “colouring” of the reverb signal will vary: we are all familiar with pop productions which drown in mostly very clear and long reverb tails. Every style and every pop music epoch seems to handle reverb differently, no doubt to distinguish itself clearly from everything that has gone before. Thus, in the 1980s, long, bright, high-frequency-rich reverberant spaces were modern, whereas for the past several years the trend has been towards drier productions (among other things), with space often created only through early reflections and/or delays, and rooms with a very low reverb level. The Rock’n’Roll era of the 1950s used very short reverb times and the legendary slap delay – hardly any recording by Elvis Presley or other representatives of this genre got by without this punchy, in-your-face bathroom sound.

    Algorithmic Reverb:

    • Free and flexible processing of all relevant parameters.
    • Unreal and unusual spaces can be created through deliberate “abuse” of the parameters.
    • Less realistic results than with convolution reverb technology.
    • More resource-friendly than convolution reverb.
    • Fully suitable for most tasks in a mix, especially for editing less important signals.

    Convolution Reverb:

    • Fixed, characteristic echo patterns and reverberation behaviour of the convolved space.
    • Extremely realistic sound.
    • The opportunity to use the fantastic rooms of large concert halls, studios, etc.
    • Very unusual sound spaces available (cans, forest, shoebox, etc.).
    • Especially suitable for very high-value and important signals which sit far at the front of the mix.

    Tips for Processing Vocal Reverb

    No matter the current taste in music, when editing vocals with reverb it is, as we have seen, not just the purely realistic simulation of a space that matters. Just as important is an editing approach capable of making the main voice – ultimately the most important element of the song – radiant, assertive and unusually attractive. It is not uncommon to use multiple reverb devices for this, with different settings, each giving the voice different sound aspects (even if only in small proportions).

    A device with a rather short and compact plate emulation, rich in early reflections, ensures a full, voluminous and substantial basic sound. Another reverb with a slightly longer reverberation time helps to embed the voice in the accompanying arrangement and gives it depth and substance at the same time. The vocal reverb may also be rich in high-frequency components, resulting in a more radiant, shiny impression. Often one will also work with a rather long pre-delay (popularly around 100 ms), which decouples the direct sound of the voice from the reverberation, so that the voice is still perceived as very present and clear in the near foreground of the sound image. Naturally, very good results can also be achieved with just one reverb device or convolution reverb plugin.

    • Two different reverb devices can help you to achieve different spatial sound aspects.
       
    • A very compact, rather short plate reverb (EMT 140 simulation, plate etc.), rich in early reflections, for a voluminous and assertive voice
       
    • A longer, finer reverberation for embedding the voice in the playback, for depth and a refined appearance
       
    • High-frequency components in the reverb help promote the shine and radiance of the vocals.
       
    • If, on the other hand, you want to make the reverberation more natural and inconspicuous, dampen the high-frequency components somewhat (low-pass/high-cut filter). This corresponds to the normal decay behaviour of reflections in nature.
       
    • Often a longer pre-delay (not rarely up to 100 ms, depending on the rhythm of the song), to decouple the direct sound of the voice from the reverberation. The voice thus gains significance and clarity, and the impression of closeness – despite rich reverberation – is preserved (see the small sketch after this list).
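
    The pre-delay mentioned in the last point is, at its core, nothing more than a short delay applied to the wet signal; a minimal sketch (illustrative Python with numpy):

        import numpy as np

        def predelay(wet, fs, ms=100.0):
            """Shift the reverb signal back by 'ms' to decouple it from the direct sound."""
            n = int(fs * ms / 1000.0)
            return np.concatenate([np.zeros(n), wet])[:len(wet)]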
       

    Here is a sample representation of the parameter settings of two algorithmic reverb devices which illustrate what has just been described: 
