

Sound Reinforcement for Audio Engineers

Sound Reinforcement for Audio Engineers illustrates the current state of the art in sound
reinforcement.
Beginning with an outline of various fields of application, from sports venues to religious
venues, corporate environments and cinemas, the book is split into 11 chapters covering
room acoustics, loudspeakers, microphones and acoustic modelling, among many other
topics.
Comprehensive, packed with references and including a historical overview of sound
reinforcement design, it is an essential reference for students of acoustics and electrical
engineering, as well as for engineers looking to expand their knowledge of designing sound
reinforcement systems.

Wolfgang Ahnert is a sought-after author, contributor, educator and lecturer at professional
conferences and trade shows, and has authored numerous white papers on subjects
such as acoustical simulation processes, measurement technology, electro-acoustical theory
and applications.

Dirk Noy is Director of Applied Science and Engineering at WSDG. He frequently lectures
at SAE, TBZ and the ffakustik School of Acoustical Engineering, all in Zurich. He is also a
member of the Education Committee for the AES’s Swiss section.

Sound Reinforcement for Audio Engineers

Edited by Wolfgang Ahnert and Dirk Noy



Cover image: Claudia Höhne


First published 2023
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2023 selection and editorial matter, Wolfgang Ahnert and Dirk Noy; individual chapters,
the contributors
The right of Wolfgang Ahnert and Dirk Noy to be identified as the authors of the
editorial material, and of the authors for their individual chapters, has been asserted in
accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised
in any form or by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks,
and are used only for identification and explanation without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book
ISBN: 978-1-032-11518-4 (hbk)
ISBN: 978-1-032-11517-7 (pbk)
ISBN: 978-1-003-22026-8 (ebk)
DOI: 10.4324/9781003220268
Typeset in Goudy
by Newgen Publishing UK

Contents

List of Figures  vii


List of Tables  xxi
List of Contributors  xxii

1 Introduction to Considered Sound Systems  1


WOLFGANG AHNERT AND DIRK NOY

2 Room Acoustics and Sound System Design   20


WOLFGANG AHNERT AND DIRK NOY

3 Loudspeakers  68
GOTTFRIED K. BEHLER

4 Microphones  108
GOTTFRIED K. BEHLER

5 Design for Sound Reinforcement Systems  131


WOLFGANG AHNERT

6 System Design Approaches  173


WOLFGANG AHNERT

7 Speech Intelligibility of Sound Systems  215


PETER MAPP

8 Acoustic Modelling – Basics  251


STEFAN FEISTEL AND WOLFGANG AHNERT

9 Audio Networking  283


STEFAN LEDERGERBER
10 Commissioning, Calibration, Optimization  322
GABRIEL HAUSER AND WOLFGANG AHNERT

11 System Solutions / Case Studies  352


WOLFGANG AHNERT AND DIRK NOY

Index  417

Figures

1.1 Components of a simple sound system  2


1.2 Standard structure for fire detection regulations or voice alarms  4
1.3 Recommended reverberation RT at 500 Hz as a function of volume V  10
1.4 Frequency dependency of the recommended reverberation time RT  10
1.5a Reverberation time at 500 Hz vs. volume in studios  13
1.5b Tolerance range of the frequency response of the reverberation time  13
1.6 Dolby recommendation for the reverberation time  14
1.7 Dolby and THX recommendation for the reverberation time tolerance range  15
1.8 WFS loudspeaker arrangement in a cinema  16
1.9 WFS loudspeaker arrangement in a cinema  17
2.1 Sound pressure level chart in dB(A)  22
2.2 Wall structure reflection behaviour  24
2.3 Left figure: scattering coefficient in blue, important for simulation routines
of sound propagation. Right figure: reflection patterns for three selected
frequencies: reflection at 125 Hz (red), local reflections at 4000 Hz
(blue) and scattering at 800 Hz (green)  24
2.4 Sound level in a closed space as a function of the distance from the source  28
2.5 Recommended values of reverberation time at 500 Hz for different room
types as a function of volume  28
2.6 Optimum reverberation time at 500 to 1000 Hz (according to Russel and
Johnson)  29
2.7 Tolerance ranges for the recommended reverberation time values vs.
frequency  29
2.8 Schematic energy time curve ETC  30
2.9 Time behaviour of the sound pressure p(t), of the sound intensity Jτo(t)
integrated according to the inertia of the hearing system, and of the sum
energy E(t)  31
2.10 Additional propagation attenuation Dr with varying atmospheric situation
as a function of distance r  33
2.11 Atmospheric damping Dr as a function of distance r at 20°C, 20% relative
moisture and very good weather  34
2.12 Sound propagation influenced by temperature  34
2.13 Sound propagation against the wind (a) or with the wind (b); wind speed
increasing in both cases with height  35
2.14 Curves of equal loudness level for pure sounds  36
2.15 Frequency weighting curves recommended by the IEC 61672 for sound
level meters  36
2.16 Excitation level LE on the basilar membrane caused by narrow-​band noise
with a centre frequency of 1 kHz and a noise level of LG (indicated on the
x-axis between 8 and 9 Bark)  37
2.17 Excitation level LE over the subjective pitch z (position on the basilar
membrane), when excitation is done by narrow-band noise of
LG = 60 dB and a centre frequency of ƒc  38
2.18 Calculation of the overall level from two individual levels  38
2.19 Early-to-late ratio of the sound reaching the ears, as a function of the
incidence angle  40
2.20 Variation of the sound level difference with discrete frequencies and
horizontal motion of the sound source around the head  40
2.21 Critical level difference ΔL between reflection and undelayed sound
producing an apparently equal loudness impression of both (speech)
signals, as a function of the delay time Δt  41
2.22 Use of a Magnavox sound system in 1919 to target 75,000 people in
San Diego  42
2.23 Mass event in Germany in the 1930s with the then newly developed horn
loudspeakers and condenser microphones  42
2.24 First sound reinforcement guidelines for stadia and theatres in 1957  43
2.25 Schematic diagram of the magnetic tape delay system with perpetual tape
loop, record head to print the original signal and various reproduction
heads at varying distances down the tape loop to obtain a variation of
delay times  44
2.26 Basic measures for first sound reinforcement and first feedback
considerations  45
2.27 (left) 3 dB attenuation by line array principle (C. Heil and M. Urban),
(right) digitally controlled line column by Duran Audio  46
2.28 Multipurpose hall with line array clusters  48
2.29 Hidden loudspeakers in the portal area behind the blue fabric  48
2.30 Feedback circuit  51
2.31 Feedback curve in open air  52
2.32 Sound transmission paths in closed room  53
2.33 Fragment of a frequency-​dependent transmission curve  53
2.34 Difference of the peak and average value of the sound transmission curve  54
2.35 Relationship between the feedback threshold X and the required feedback
level LR  56
2.36 Block diagram of the ACS system  58
2.37 Block diagram of the AFC system  59
2.38 Basic routines of the Constellation system  61
2.39 Working schemata of the Vivace system  63
2.40 Work flow of the ASA system  64
2.41 Working schemata and block diagram of the Amadeus system  65
3.1 Sectional view of a magnetostatic ribbon tweeter of modern design.
The conducting wires are thin copper strips bonded directly to the thin
foil membrane  69
3.2 Cross-​sectional view of a typical cone type dynamic loudspeaker  70
3.3 Some typical loudspeaker types that can be described as point sources.
Typical compact PA systems in upper part (Klein+​Hummel). Ceiling
speaker horn systems in the lower part: left, a large cinema horn by JBL
and right a Tannoy speaker  72
3.4 Comparison of the so-​called ‘stacking’ array of speakers (left picture)
and the nowadays more common ‘line array’ concept (centre picture) to
cover large audience areas. Rightmost a typical active, digitally steerable
line array using multiple identical loudspeakers mounted in one line,
individually driven by a DSP amplifier allowing individual adjustment
of frequency, time and phase response of the 16 point sources to create a
coherent and nearly cylindrical wave front (Renkus-​Heinz)  73
3.5 Frequency response of a loudspeaker system plotted for a rated input
voltage of 2.83 V and a measurement distance of 1 m on axis. Sensitivity
(see equation (3.2)) and bandwidth with respect to upper and lower cut-​
off frequency is depicted with dashed lines  75
3.6 Frequency-​dependent input impedance of a loudspeaker. For this system
the nominal impedance Zn was defined by the manufacturer as 8 ohms.
Taking the tolerance limit into account allowing a −20% undercut this
loudspeaker does not fulfil ISO standards  76
3.7 Upper graph: phase response curves of the sound pressure transfer function
(shown in Figure 3.5); lower graph: the phase response curve of the
impedance transfer function (shown in Figure 3.6)  77
3.8 Group delay distortion of the phase response shown in Figure 3.7  78
3.9 Impulse response of the loudspeaker shown in Figure 3.15  79
3.10 Step response of the loudspeaker shown in Figure 3.15  80
3.11 Waterfall diagram of the loudspeaker from Figure 3.5. The magenta curve
shows the theoretical decay time for constant relative damping equivalent
to 1 period of the related frequency  81
3.12 Typical distortion plot measured by EASERA  82
3.13 Frequency-​dependent maximum achievable SPL for a defined limit of
the THD figure. The figure shows the sensitivity of the loudspeaker, the
theoretical SPL for the proclaimed power handling of 1000 W input
power (manufacturer rating) and the measured, achievable SPL for a
given THD of 3% and 10%  83
3.14 Cartesian system defining the angles for the measurements of directivities
with loudspeakers. Note that the distance to the point for the microphone
is not defined, but must be chosen with respect to the size of the
loudspeaker  84
3.15 Computer-​controlled robot for measuring directionality information of
loudspeaker systems. Note the tilting of the z-​axis, which in consequence
leads to an intersection of the x-​axis with the ground plane of the pictured
half-​anechoic room. At this point of intersection, the microphone is
placed so to ensure that there is only one signal path between source
and receiver. The distance in this set-​up is 8 m. (Picture courtesy AAC
Anselm Goertz.)  85
3.16 Horizontal and vertical polar plots of a two-​way PA loudspeaker system
with 12′′ woofer and 1′′ horn-​loaded tweeter for different 1/​3-​octave
bands. The frontal direction points to the 0°. For positive and negative
angles, the observation point – at a fixed distance – rotates around the
reference point, as shown in Figure 3.14  87
3.17 Isobar plots for a loudspeaker in 2D and 3D display for the horizontal and
vertical directivity. The frequency resolution is smoothed to 1/​12th of
an octave. The horizontal x-​axis shows the frequency. The vertical axis
shows the level relative to the frontal direction (0 degree) in different
colours, hence the 0° response is equal to 0 dB. The orange range covers
a deviation of ±3 dB around the 0 dB, whereas for all other colours the
range covers only 3 dB. The right axis shows the angle of rotation (either
horizontal or vertical) of the loudspeaker for a full 360° rotation  88
3.18 Balloon plot of a loudspeaker system for a given frequency  89
3.19 The relationship between free field sensitivity and diffuse field sensitivity.
The DI describes the difference between the two graphs. The diffuse field
sensitivity is typically measured in 1/​3-​octave bands; therefore, the free
field sensitivity needs to be averaged in the same bandwidth to evaluate
the DI  90
3.20 Sensitivity of a loudspeaker as a function of the Efficiency and Directivity
Index  92
3.21 Theoretical polar plots for a circular piston of 25 cm diameter (a typical
12′′ woofer) in an infinite baffle for frequencies from 500 Hz up to 2.5
kHz in steps of 1/​3 octave. The total range is 50 dB. The dotted lines
have a distance of 6 dB, so that the intersection points at the −6 dB line
denote the beam width of the directivity, leading to approximately 150°
at 1 kHz, 100° at 1.25 kHz, 75° at 1.6 kHz, 58° at 2 kHz and 45° at 2.5
kHz. Obviously, omnidirectional sound radiation can be assumed for
frequencies below 500 Hz  94
3.22 JBL 2360 Bi-​Radial® Constant Coverage (another name for CD) horn
with attached 2′′ compression driver (courtesy JBL). The left picture
shows the narrow slit in the neck of the waveguide, which continues into
the final horn and is intended to diffract the soundwave horizontally into
the final surface line of the horn, so to cover a wide horizontal angle of
90°. The vertical angle of 40° is maintained throughout the length of the
horn with a little bit of widening to the end of the horn mouth.  95
3.23 Horizontal directivity of the JBL 2360 with JBL 2445J 2′′ compression
driver. The aimed coverage of ±45° is met in the frequency range between
600 Hz and 10 kHz. To cover lower frequencies requires a larger horn and
for the higher frequencies the diffraction slit probably needs to be even
smaller (Measurement courtesy Anselm Goertz)  96
3.24 Electro Voice MH 6040AC stadium horn loudspeaker covering the full
frequency range from 100 Hz up to 20 kHz. The construction uses two
10′′ woofers to feed the large low-​frequency horn and one 2′′ compression
driver feeding into the small horn placed coaxially into the large mouth.
The dimensions: height 1.5 m, width 1 m, length 1.88 m, weight 75 kg  97
3.25 Two-​way PA loudspeaker system with 15′′ woofer and 2′′ compression
driver and CD horn (courtesy Klein+​Hummel)  98
3.26 Standard isobaric plots for the horizontal and vertical directivity of an
ordinary two-​way PA loudspeaker equipped with a 15′′ woofer and a CD
horn with 1.4′′ compression driver. While the horizontal directivity is
fairly symmetrical with a slight narrowing at 2 kHz and becomes narrower
at higher frequencies, the vertical isobaric plot shows a typical asymmetry
due to the placement of the two speakers side by side and a strong
constriction of the directivity at the crossover frequency (between 800 Hz
and 1600 Hz) due to interference. (Courtesy four audio.)  99
3.27 The directivity of a loudspeaker column (with N = 16 identical chassis,
membrane diameter a = 6 cm; equally spaced d = 8 cm). The figure shows
the resulting directivity (right picture) derived from the directivity of a
single driver (left picture), which also shows the horizontal directivity of
the column, and the directivity of equally spaced monopole sources (point
sources, in the middle). All directivities are calculated for the centre
frequencies (as listed in the plot) with an energetic averaging within
1/​3-​octave band  100
3.28 Same column as in Figure 3.27 except for the frequency-​dependent low-​
pass filtered loudspeaker arrangement to achieve a constant active length
of the column relative to wavelength. The width of the main lobe is
significantly greater for high frequencies though not smooth. The plot
shows a simulation with piston-​like membranes and theoretical radiation
pattern; it reveals the great potential for DSP-controlled loudspeaker arrays  101
3.29 Left picture: a DSP-​controlled loudspeaker column. By combining up
to nine elements a column with a length of almost 9 m can be realized.
(Courtesy Steffens.) Right picture: placement of DSP loudspeaker
columns in St. Paulus Cathedral in Münster (Germany). Photo taken by
the author during the celebration for the reopening of the cathedral after
renovation including the sound system. Note the installation above the
audience, allowing unobstructed sound propagation even to more distant
places in the audience. However, this requires a downward tilting of the
sound beam  102
3.30 A DSP-​controlled loudspeaker line array (length 3.6 m) is optimized
to deliver sound to two different audience areas. Each picture shows
the optimization within a bandwidth of one octave, from upper left to
lower right: 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz. As expected,
the suppression of side-​lobes at high frequencies is difficult. (Calculation
performed with the software dedicated to the Steffens Evolutone
loudspeaker.)  104
3.31 The transformation from circular input to rectangular output. The DOSC
geometry sets all possible sound path lengths to be identical from entrance
to exit, thus producing a flat source with equal phase and amplitude  104
3.32 Representation of the variation of the distance for cylindrical sound
radiation and far field divergence angle (spherical radiation) with
frequency for a straight-​line source array of height 5.4 meters  105
3.33 Two-​dimensional loudspeaker array using individual signal-​processing and
power-​amplifying for each driver. The software allows different types of
directional pattern and sound field applications  106
4.1 Basic design of a condenser microphone. To the left, a sectional
drawing of a classic measuring microphone is displayed; to the right,
the relationship between the components involved in its construction
(diaphragm mass, air volume behind the diaphragm as a spring, and
viscous friction of the air between the diaphragm and the back electrode)
is shown. ©B&K  109
4.2 Section through the capsule structure of the legendary Sennheiser
MD441U. Note the multi-​resonance construction with several chambers
and ports. Furthermore, a cone-​like sound guide is placed in front of the
diaphragm, which serves to optimize the directional characteristics  112
4.3 Basic construction of a ribbon microphone (the left figure shows the
Beyer M130). The magnetic flux of the high-​energy permanent magnets
is guided around the ribbon by the ring-​shaped yoke wires made of highly
permeable, soft magnetic material. The internal magnetic field should be
as homogeneous and tangential as possible through the ribbon  112
4.4 Comparison of the directivity of two pressure microphones: left side a ¼′′
capsule, right side a 1′′ capsule. The polar plots show a clear directionality
for the large membrane at high frequencies whereas the small membrane
shows almost perfect omnidirectional sensitivity. The frequency response
curve for the ¼′′ microphone is flat for free field sound incidence, the one
for the 1′′ microphone shows a distinct presence boost in free field whereas
for diffuse field the response is rather flat until a roll off above 10 kHz (DPA)  118
4.5 Basic construction of double-membrane microphones (left side: AKG
K4; middle: Neumann M49; right side: Neumann SM 2). The K4 is set
to figure of eight only, whereas the M 49’s directivity can be changed
remotely. The SM 2 forms a stereo-​microphone for coincidence
stereophony (either XY or MS). Both capsules are remotely adjustable for
the pattern (between sphere and figure of eight). By choice of the right
pattern and rotation of the upper capsule, the desired width for the
stereo-​panorama can be set  120
4.6 Two microphones with higher-​order directivities. Left: Sennheiser
Ambeo; right: Eigenmike. The capsules are individually connected to
allow any adjustment to the directivity by external signal-​processing.
Whereas the Eigenmike capsules are pressure type microphones, the
Ambeo capsules are cardioid microphones  120
4.7 Line-​array microphone with pronounced vertical (left panel) and wide
horizontal (cardioid, right panel) directivity (Microtech Gefell KEM 975).
The microphone is built with eight differently spaced cardioid capsules set
into a vertical line  121
4.8 The KEM 975 in use at the lectern at the German Bundestag. A diversity
switch ensures that only one of the two microphones (the one with higher
level) is in use at a time. (Courtesy Microtech Gefell)  123
4.9 Typical shotgun microphone (Sennheiser MKH 8070) and the
frequency-​dependent directivity  123
4.10 Head-​mounted microphone (left); Lavalier microphone (right)
(courtesy DPA). The placement of these microphones requires some
EQ to provide sound without colouration  124
4.11 Direct recording of the violin sound with a small condenser microphone,
which is mounted on the frame of the violin (courtesy DPA)  124
4.12 Microphone supply according to DIN EN IEC 61938  125
5.1 Audible level range for speech and music signals  132
5.2 Noise criteria (NC) curves as a function of frequency  133
5.3 Noise rating (NR) curves used in Europe  134
5.4 Relationship SPL values and noise rating curves  135
5.5 Computer-​based measurement system for different excitation signals
(schematic block diagram)  136
5.6 Overlay of excitation, raw data and impulse response files  137
5.7 Transfer function as Fourier transform of the impulse response  138
5.8 Phase response of the impulse response  139
5.9 Frequency response and spectrogram presentation  139
5.10 Loudspeaker data and polar diagrams  141
5.11 Different aiming diagrams of a typical point source loudspeaker  144
5.12 Sophisticated data presentation of a point source loudspeaker  146
5.13 Sophisticated data presentation of a modern line array  147
5.14 SPL coverage figures for a line array in part of a stadium  148
5.15 Control panel for coverage control  149
5.16 Controlled radiation avoiding sound coverage in unoccupied zone 2  150
5.17 Loudspeaker system for suppressing echoes  152
5.18 Proportion of listeners still perceiving an echo, as a function of echo delay  153
5.19 Limit curve to be complied with for suppressing echo disturbances  153
5.20 Tolerance curves for the reproduction frequency response in different
applications: (a) recommended curve for reproduction of speech;
(b) recommended curve for studios or monitoring; (c) international
standard for cinemas; (d) recommended curve for loud rock and pop music  155
5.21 Attenuation behaviour of filters of constant bandwidth (a) and of
constant quality (b)  156
5.22 Figures a–​c show the same view of a 3D computer model in AutoCAD,
SketchUp and in the simulation software EASE  158
5.23 Echogram in EASE-​AURA 4.4  160
5.24 3D aiming presentation in a wireframe model (EASE 4.4)  161
5.25 2D aiming mapping (EASE 4.4)  162
5.26 Delay presentations in simulation tools: (a) in ODEON 10.0 and (b) delay
pattern of first signal arrival in EASE  163
5.27 Echo detection in EASE: (a) initial time delay gap (ITD) mapping
to check echo occurrence in a stadium; (b) echogram in weighted
integration mode at 1 kHz; (c) echo detection curve for speech at 1 kHz  165
5.28 Waterfall presentation in EASE  166
5.29 Sound pressure level mapping in simulation tools: (a) 2​D presentation in
CATT acoustics; (b) narrow-band presentation in EASE; (c) broadband
presentation in ODEON  167
5.30 Speech transmission index (STI) presentation in EASE: Top: three-​dimensional
presentation in a hall and Bottom: STI presentation in a parliament  169
5.31 Parameter presentation in EASE: Top: Clarity C80 and Bottom: Sound
Strength G  170
5.32 Block diagram of an auralization routine  171
6.1 Radiation angle and loudspeaker height d above ear-​height  177
6.2 Installation grids of ceiling loudspeakers. (a) Centre to centre;
(b) minimum overlap; (c) rim to rim  178
6.3 Loudspeaker coverage (left, 32 loudspeakers; right, 123 loudspeakers)  178
6.4 STI coverage (left with 32, right with 123 loudspeakers)  179
6.5 Suspended directive loudspeakers for avoiding an excitation of the upper
reverberant space  180
6.6 Double loudspeakers arranged in a staggered pattern for covering a
large-​surface room  181
6.7 Sound reinforcement system for an outdoor swimming pool area  184
6.8 Simple sound system at a sports ground  184
6.9 Level relations between two loudspeakers arranged at distance a  185
6.10 Loudspeaker arrangement for decentralized coverage of a large square.
(a) Loudspeakers with bidirectional characteristics; (b) loudspeakers with
cardioid characteristics  186
6.11 Installation of passive directed sound columns on the platform of the
Munich main station. © Duran Audio  188
6.12 Radiator block to cover a platform in the main station in Frankfurt/​Main
with sound. © Holoplot  189
6.13 Decentralized coverage of a church nave by means of sound columns for
speech  190
6.14 Simple sound reinforcement channel position of source S, microphone
M, loudspeaker L and listener H as well as associated angles  191
6.15 Sound transmission index LXY as a function of the distance ratio rH/​rXY  193
6.16 Use of built-​in arrays for source localization  195
6.17 Geometric relations in the case of centralized coverage without delay
equipment  196
6.18 Sound-​field relations with different loudspeaker arrangements  196
6.19 L-​C-​R arrays in a larger conference hall (EASE simulation)  197
6.20 Use of supporting loudspeaker for coverage of a listener’s seat  198
6.21 Explanation of localization and phantom source formation as a
function of the time delay of one partial signal  200
6.22 Acoustical localization of a sound source (in the speaker’s desk) by means
of a delayed sound system (schematic)  200
6.23 Schematic layout of a delta-​stereophonic sound reinforcement system.
(a) Working principle. (b) Equipment structure of the DSS  202
6.24 Source-​oriented reinforcement system. (a) Tracking and delay localization
zones for mobile sound sources. (b) Placement of hidden positions of
installed trackers. (c) Visualization of 12 localization zones on a stage  203
6.25 Loudspeakers on stage for source support and tracking procedure with
corresponding software. (a) Stage area with hidden support loudspeakers
in a stage design in the Albert Hall performance. (b) Computer graphic
for visualization of the tracking procedure of a talker or singer on a stage.
(c) One tracker placement on stage for a performance in the Szeged
Drone Arena  205
6.26 d&b Soundscape 360° system  207
6.27 Use of induction loops for compensation of hearing loss in a theatre
main floor  209
6.28 Infrared field strength coverage simulation on listener areas of a lecture
hall with two SZI1015 radiators in blue  210
6.29 Sennheiser SZI1015 radiator/​modulator and infrared receiver  211
6.30 FM receiver EK 1039  212
7.1 Simplified sound system or audio channel intelligibility chain  216
7.2 Subjective effect of delayed reflections and later arriving sounds  219
7.3 Typical speech waveform  223
7.4 Diagrammatic view of speech events (syllables or words)  224
7.5 Diagrammatic view of the effect of noise on speech, for high, moderate
and low signal to noise ratios  224
7.6 Diagrammatic effect of noise on speech elements of varying level  225
7.7 Diagrammatic effect of reverberation on speech elements of the same and
varying levels  225
7.8 Speech waveform for the word ‘back’  226
7.9 Diagram showing the effect of reverberation times of 1.0 and 0.6 seconds
on the word ‘back’  226
7.10 Temporal variability of speech: LAeq = 73 dB, LAmax = 82 dB, LCeq = 78 dB,
LCpk = 98 dB and average LCpk = 89 dB  227
7.11 Typical speech spectrum  228
7.12 Speech and test signal spectra from IEC 60268-​16 2011 and 2020
(Editions 4 and 5)  229
7.13 Speech spectra of six different voices and comparison with IEC 60268-​16
2011 spectrum  229
7.14 Typical octave band contributions to speech intelligibility  230
7.15 Octave band analysis of speech and interfering noise – with good signal to
noise ratio  230
7.16 Octave band analysis of speech and interfering noise – with poor signal to
noise ratio  231
7.17 Energy time curve for sound arriving at listening position from distributed
sound system in 1.6 s RT space  231
7.18 Integrated energy plot for distributed sound system  232
7.19 Sound energy ratios – C7 is effectively the ‘direct sound’ alone. C50 and
C35 include early reflections that will integrate with and increase the
effective level of the direct sound  232
7.20 Example of strong echo occurring in circular reverberant space  233
7.21 MTF plot for high-​quality sound reinforcement system in 1.4 s RT space
(STI = 0.61)  237
7.22 Effect of speech level on STI for three reverberant conditions  238
7.23 STI qualification bands (categories)  240
7.24 MTI plots for two sound systems exhibiting STI values of 0.61 and 0.49
respectively  241
7.25 Theoretical effect of a delayed sound (echo) on STI  242
7.26 1/12 octave analysis of Edition 5 STIPA signal (centre frequencies at
125, 250, 500 Hz, 1, 2, 4 and 8 kHz)  243
7.27 Typical target speech response curve  246
7.28 Frequency response for cathedral sound system measured at typical listener
location  247
7.29 Set of frequency response curves for a concert hall high-quality sound system  248
8.1 Top: computer model of the main railway station of Berlin (EASE
software by AFMG). Bottom: Holoplot loudspeaker system installed at
Frankfurt Hauptbahnhof (main station)  252
8.2 Exemplary distribution of direct SPL across listening areas in a
medium-size church (EASE software by AFMG)  253
8.3 Ambisonics reproduction room  254
8.4 3D computer model of a German church (Frauenkirche Dresden) that
shows the level of geometrical details typically used for acoustic indoor
models (EASE software by AFMG)  255
8.5 Illustration of scattering effects: at low frequencies (left) the fine structure
of the surface is ignored. For wavelengths of the order of the structure’s
dimension, the incident sound wave is diffused. At shorter wave lengths
geometrical reflections dominate again (courtesy Vorländer 2006)  256
8.6 Directivity balloon for a line array (EASE SpeakerLab software by AFMG) 257
8.7 Material measurements in the reverberation chamber  258
8.8 Exemplary scattering and diffusion behaviour of a Schroeder diffuser
computed by AFMG Reflex  260
8.9 Schematic structure of a reflectogram  261
8.10 Exemplary room transfer function measured in a medium-​size room
(EASERA software by AFMG). Typical smooth, modal structure in the
frequency range 50 Hz to 300 Hz; typical dense, statistical structure for
frequencies above 1 kHz  262
8.11 Computed modal sound field of a studio room (courtesy AFMG) showing
the surfaces of equal pressure  264
8.12 Numerical optimization scheme for sound system configurations as used by
AFMG FIRmaker  265
8.13 Positional maps showing an example of improvement of SPL uniformity
when using FIR numerical optimization. Top: without FIR optimization.
Bottom: with FIR optimization  266
8.14a Image source method. Top: construction of image source S1 by mirroring
at wall W1. The connection from image source S1 to receiver E
determines intersection point R1. Bottom: construction of the (possible)
reflection using the point R1  267
8.14b Image source method. Construction of image source S2 by mirroring at
wall W2. Intersection point R2 is outside of the actual room surface.
The reflection is impossible  267
8.15 Ray tracing. Rays are stochastically radiated by a source S in random
directions. Some hit the detection sphere E after one or more reflections.
In this example, the rays representing the floor reflection RF and the
ceiling reflection RC are shown. The direct sound path indicated by D is
computed deterministically  268
8.16 Pyramid-​or cone tracing. Schematic illustration of a beam tracing
approach in two dimensions. Cones with a defined opening angle are used
to scan the room starting from the sound source S. Receivers E located
inside a cone are detected and validated  269
8.17 Radiosity method. Patch P3 is illuminated and excited by radiation from
patches P1 and P2. It may also radiate sound itself  270
8.18 Clarity C80 results shown as 3D mapping for theatre model
(courtesy EASE 5 AURA by AFMG)  273
8.19 Typical example for a result echogram generated by ray tracing simulation
methods (courtesy EASE 5 AURA by AFMG)  274
8.20 Binaural setup with HRTF selections by head tracker. Blue: Right ear
channel. Red: Left ear channel  274
8.21 Part of a typical project report (courtesy EASE Focus 3 by AFMG)  275
8.22 Computer model of a theatre with different acoustic materials assigned to
walls, ceiling and floor (courtesy EASE 5 by AFMG)  276
8.23 Eyring reverberation time calculated for a medium-​size church
(courtesy EASE 5 by AFMG)  279
8.24 Schematic overview of binaural auralization process  280
9.1 Link offset determines latency  286
9.2 Phase coherence by identical link offset  286
9.3 Unit types within a network  287
9.4 Router connecting subnet A to subnet C  288
9.5 Star topology  291
9.6 Spine/​leaf architecture  292
9.7 Ring topology  292
9.8 Audio to multiple destinations using unicast  293
9.9 Audio to multiple destinations via multicast  294
9.10 Querier activated in a switch with high-​bandwidth links  294
9.11 Quality of Service (QoS) concept  296
9.12 A PTP leader synchronizes the absolute time across all followers.
Each device then derives its own media clock from this.  297
9.13 Principles of the Precision Time Protocol (PTP)  298
9.14 Example of a PTP scenario with several devices that can be leaders  299
9.15 Concept of a boundary clock switch  301
9.16 Concept of a transparent clock switch  302
9.17 Example of an SDP file (relevant parameters in red)  304
9.18 Stream variants with identical packet size: number of channels versus
packet time  306
9.19 Elements of total latency  307
9.20 Principle of SMPTE ST 2110  309
9.21 Synchronizing older Dante devices in an AES67 environment  313
9.22 Loop detection by the Spanning Tree Protocol (STP)  314
9.23 Link aggregation as safety net for cabling issues  314
9.24 Maximum safety through double networks  315
9.25 Example of a successful (unicast) connection to a device  316
9.26 Matrix crosspoints in senders and receivers must be correctly set  317
9.27 Example of a well-​set link offset. All packets arrived within the
set latency  318
9.28 Example of a link offset that is too short. Not all packets arrived within
the set latency  318
10.1 Top view of a convention hall, showing measurement locations (R)
in the auditorium and source locations on stage  326
10.2 Example of an artificial human speech source for room acoustic
measurements  327
10.3 Tolerance curves for the reproduction frequency response in different
applications: (a) recommended curve for reproduction of speech;
(b) recommended curve for studios or monitoring; (c) international
standard for cinemas; (d) recommended curve for loud rock and pop music  328
10.4 Characteristic frequency spectra for white noise and pink noise. Graph
shows the power density spectrum in dB using an arbitrary normalization  334
10.5 Characteristic frequency spectra for white sweep, log sweep and
weighted sweep. Graph shows the power density spectrum with an
arbitrary normalization  336
10.6 Shift register for the construction of the maximal length sequence of
order N =​3  337
10.7 Section of the time function of an MLS of order N =​16. The sampling
rate is 24 kHz  338
10.8 TDS principle. (a) Measurement principle; (b) action of the
tracking filter  340
10.9 SysTune measurement system  341
10.10 Octave-​band display of the spectral shape of white noise and pink noise.
Graph shows the band-​related power sum spectrum with an arbitrary
normalization  342
10.11 Top view drawing of the Allianz arena in Munich, showing
measurement locations in the bleachers  343
10.12 Room-​acoustic measurement setup  344
10.13 Partial spectrogram  346
10.14 Wavelet type presentation  347
10.15 Exemplary section of a measured impulse response where the sound
reinforcement system (after 90 ms) provides a higher signal level than
the original sound source on the stage (at about 44 ms)  349
10.16 Polarity checker from MEGA Professional Audio  351
11.1 Computer model with Atlas Sound Ceiling speakers FA 136 and
Renkus-​Heinz Line arrays IC16 and listener areas  355
11.2 Calculated SPL distribution in the greeter area  355
11.3 Calculated intelligibility number STI from 0.5 to 0.75 in the
greeter area  356
11.4 Computer model of the main station in Berlin  356
11.5 Arrangement of the Duran Audio loudspeaker IntelliDiskDS-​90  356
11.6 Radiation pattern of nine IntelliDisk DS-​90 speakers along the upper
platform in the main station  357
11.7 Computer model of the complicated lobby structure  359
11.8 Overall RT, SPL and STI values in selected floor level  360
11.9 Olympia-​Stadium in Berlin 2002 with no roof and sound coverage from
the perimeter of the field of play  362
11.10 View into the Getec Arena during a handball game  365
11.11 Wireframe model of the Getec Arena Hall  366
11.12 Overall RT, SPL and STI values in the Hall  367
11.13 Wireframe model of the inside geometry of the stadium  368
11.14 Twelve line array positions with a total of 124 Electro-​Voice XLD 281
modules  369
11.15 Outstanding SPL and STI distributions in the bleachers  369
11.16 Location of the stadium in Capetown close to residential areas  370
11.17 Noise map during a night time game  370
11.18 NTi XL2 Handheld Acoustical Analyzer including measurement
microphone M2211  372
11.19 Wireframe model of the 250-​m-​wide entrance lobby  373
11.20 Rendered view of the Grand Foyer including ceiling detail  374
11.21 Designed sound system with line arrays type JBL VT4888DP in the
centre of the foyer  375
11.22 Room acoustic design of the main hall. Left above: View to the hotel
design. Right above: Computer model of the main hall. Below: Echo
simulations in computer model  376
11.23 Recommended secondary structure in the main hall. (Left)
Architectural design of the hall (only hard surfaces); (right) hall with
acoustical treatment at ceiling and back wall. Light red faces at front
and back wall: Broadband absorber (e.g. slotted panels). Orange face at
back wall: Additional broadband absorber (slotted panels or curtain).
Dark red faces at ceiling: Broadband absorber or sound transparent grid.
Dark blue: Glass facade with inclined lamella structure  377
11.24 RT values in the main hall (left) without treatment and (right) with treatment  377
11.25 Main loudspeaker groups in a theatre  382
11.26 Rendered section of the Musik theatre Linz. ©Landestheater-​Linz.at  385
11.27 Layout of the second floor  385
11.28 View of the stage opening with lowered line arrays  386
11.29 Positions of panorama loudspeakers along the railings of the three galleries  387
11.30 Loudspeaker layouts  387
11.31 View to the stage including loudspeaker systems  388
11.32 Overall sound level in the audience hall, broad band, A-​weighted  388
11.33 STI mapping  389
11.34 Distribution of the STI values by consideration of noise and masking  389
11.35 View of the large (so-​called Bruckner) rehearsal room  390
11.36 View of the studio theatre  390
11.37 Performance TRL stage manager system  391
11.38 Computer model and detail view of the wall structure  391
11.39 Modifying reverberation time by changing the ceiling height above the stage  392
11.40 Variable acoustics in the Concert Theatre Coesfeld, Germany  394
11.41 KKL Luzern Concert hall, rare view from within an echo chamber. © KKL Luzern 395
11.42 Variable low-​and mid-​frequency absorber aQflex by Flex Acoustics  396
11.43 Computer model showing all enhancement loudspeakers and a
rendered view of the hall  398
11.44 Opening concert (left) and hall with the new wall loudspeakers  399
11.45 The Anthem Hall Washington and view of the hall on the right  400
11.46 Computer model and calculated RT of the Anthem Hall  400
11.47 Mappings of SPL and STI  400
11.48 Buddhist temple inside and outside  401
11.49 Floor plan of a traditional synagogue  402
11.50 New Munich synagogue  403
11.51 Rendered computer model with calculation results in mapping and
probability distribution form  403
11.52 View into the Central Synagogue in New York  404
11.53 Ceiling detail of the Central Synagogue including small point-​source
loudspeakers in the corners  404
11.54 Sound columns in the gothic church Maria Himmelfahrt in Bozen, Italy  406
11.55 View of the iconostasis with installed directed sound columns in the
Christ the Saviour Cathedral in Moscow  406
11.56 Centrally arranged loudspeaker cluster in a church  407
11.57 Decentralized small line arrays in a church  408
11.58 Test setup for new sound system in the cathedral  409
11.59 New main sound system in the cathedral  410
11.60 St. Ursen Cathedral with two visible line arrays for sound coverage  411
11.61 St. Ursen Cathedral exterior  411
11.62 View into the Sheikh Zayed Mosque in Abu Dhabi, UAE  414
11.63 Line arrays close to the Mihrab prayer niche  415
11.64 View into the Al Eman Mosque in Jeddah KSA  415
Tables

4.1 Typical parameters for microphones with different directivity patterns  117
5.1 Noise criteria values in different room types  134
5.2 Noise rating (NR) values in different studio facilities  135
5.3 Relationship of distance in m and ft and run time in ms (path time
relation at 20°C)  151
6.1 Examples of achieved sound reinforcement  193
7.1 Reverberation time and sound reinforcement system design  218
7.2 STI matrix  235
7.3 STI matrix for sound system measurement –​STI =​0.611  236
7.4 STIPA matrix  237
7.5 STI qualification bands and typical applications  239
7.6 Speech and STI/​STIPA test signal characteristics  242
7.7 Minimum recommended number of spatially distributed STI measurements  244
7.8 Relationship between STI and % AlCons  245
9.1 Recommended QoS settings  296
9.2 Selection criteria for PTP leaders, sorted by BMCA rules  299
9.3 Recommended PTP parameter settings  300
9.4 Typical audio stream formats  306
9.5 Overview of the chosen approaches  311
Contributors

Wolfgang Ahnert is a sought-after author, contributor, educator and lecturer at professional conferences and tradeshows, and has authored countless white papers on subject matters such as acoustical simulation processes, measurement technology, electro-acoustical theory and applications. He is author or co-author of numerous books, such as “Fundamentals of Sound Reinforcement” (1981, 1984 in Russian) and “Sound Reinforcement – Basics and Practice” (1993, fully updated in English in 2000, in Chinese in 2002, in Russian in 2003 and in an updated Arabic version in 2016).
Gottfried K. Behler was the academic director at the Institute for Technical Acoustics
at RWTH Aachen University until the end of 2020. After graduating with a diploma
in electrical engineering from RWTH, he did his PhD on reverberation enhance-
ment in rooms with multi-​channel coupled loudspeaker systems. After a period in
the professional audio industry (as head of R&D at Klein+​Hummel), he returned
to RWTH. His special interests are professional methods for the engineering of
loudspeaker systems, room acoustics, acoustic measurement technology, and audio
recording technology.
Stefan Feistel founded SDA Software Design Ahnert GmbH with Wolfgang Ahnert. Stefan
is a member of the AES, the ASA, the DEGA, as well as several IEC standards groups.
He has authored or co-​authored more than 70 papers focusing on software projects and
the related mathematical, numerical and experimental background studies.
Gabriel Hauser graduated as an Electrical Engineer from the Swiss Federal Institute of
Technology in Zurich. The main subjects of his studies were analogue and digital signal
processing and acoustics. He is Senior Acoustical Engineer at WSDG.
Stefan Ledergerber holds master’s degrees in Electrical Engineering and in Management,
Technology and Economics from the Swiss Federal Institute of Technology Zurich (ETH).
He is a member of the AES67 and SMPTE2110 standardization groups and is heavily
involved in technologies related to audio-​and video-​over-​IP. He is a frequent lecturer in
related events and has written several articles for audio/​video magazines. Today, as the
Managing Director of Simplexity GmbH, he offers vendor-​independent consulting and
training services in the field of audio/​video-​over-​IP networks. In this function, Stefan
Ledergerber is actively involved in IP projects with broadcasters and proAV providers as
well as product development with manufacturers.
Peter Mapp is an independent acoustic consultant. He is currently a member of several
British and international standards committees concerning sound systems and
speech intelligibility and is vice chair of the AES standards committee on acoustics and
audio. In 2014 he was awarded the AES Bronze medal and in 2020 the UK Institute of
Acoustics’ Engineering Medal.
Dirk Noy has a Master of Science (MSc) Diploma in Experimental Solid-​State Physics
from the University of Basel, Switzerland, and is a graduate from Full Sail University,
Orlando, USA, where he was one of John Storyk’s students. He is Director of Applied
Science and Engineering at WSDG. He frequently lectures at SAE, TBZ and the
ffakustik School of Acoustical Engineering, all in Zurich. He is also a member of the
Education Committee for the AES’s Swiss section.
1 Introduction to Considered Sound Systems


Wolfgang Ahnert and Dirk Noy

1.1 Categories of Sound Systems


The technical design of a sound reinforcement system is essentially determined by the
functional requirements as well as the characteristics of the space wherein it will be
installed.
Any sound reinforcement system consists of three basic component groups and, optionally, a control system for user interfacing (see Figure 1.1):

A. Input –​wired and wireless microphones, CD and streaming media players, radio tuners,
instruments and any other signal sources
B. Processing –​signal routing, dynamics processing (gates, compressors), frequency
conditioning (equalizers, crossovers), mixing (gain setting and combining) and
distribution
C. Output – amplifiers and loudspeaker system(s)
(Optional) Control – system to adjust system parameters on all items mentioned above in real time, component switching, preset store and recall, algorithms, interfacing with third-party technical systems, etc.

According to the present state of the art, sound reinforcement systems can be systematized
as per the following list, taking into consideration the critical location relationship between
the signal source (e.g., a stadium announcer or a musician) and the signal receiver (e.g., the
listener in the audience).

A. Sound reinforcement systems where the signal source is remote, possibly a prerecorded message, mostly invisible and not particularly spatially related to the listener:
   Public buildings
      Paging systems
      Voice alarm systems
      Shopping malls
      Transportation hubs
      Hotels
      Museums and exhibition halls
   Sports venues
      Stadia
      Arenas
      Outdoor fields/campuses

DOI: 10.4324/9781003220268-1

Figure 1.1 Components of a simple sound system.

B. Sound reinforcement systems where the signal source is somewhat distanced (e.g., on a stage, on the cinema screen or in a recording room), but clearly visible and visually relevant to the listener:
   Performing arts centres
      Clubs/discotheques
      Music venues
      Theatres
      Opera houses
      Concert halls
      Multipurpose halls
   Religious venues
      Churches
      Synagogues
      Mosques
   Media production facilities
      Audio recording studios
      Broadcasting facilities
   Cinemas
      THX, Dolby, DTS
      Immersive acoustics
      Home cinemas
C. Sound reinforcement systems where the signal source and the listener are co-located and basically interchangeable, as any listener can become a sound source and vice versa:
   Corporate environments
      Meeting rooms
      Boardrooms
      Video conferencing
   Educational facilities

1.2 Public Buildings

1.2.1 Paging Systems


Paging systems are loudspeaker installations that serve the transmission of audible pieces
of information to an audience distributed over a large area, or over many rooms, or both.
The acoustic source of the information is a human speaker or an electronic playback device located in a separate technical control room and thus (mostly) not visible to the addressed audience. The transmitted information can be speech, music and in some cases also a masking noise.
Several special variations exist, such as stage-manager paging systems in cultural centres or dispatcher loudspeaker systems in transport facilities, installed in halls or outside. One common feature of these systems is the elimination of a positive acoustic feedback loop, so that the clearly audible, characteristic howling cannot develop. A positive acoustic feedback loop generally occurs when a loop exists between an audio input (for example, a microphone or guitar pickup) and an audio output (for example, a power-amplified loudspeaker).
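Whether such howling can develop depends on the loop gain between microphone and loudspeaker. A classic back-of-the-envelope check compares the potential acoustic gain (PAG) of the geometry against the needed acoustic gain (NAG); the system is feedback-safe while PAG ≥ NAG. A minimal sketch, with purely illustrative distances and function names:

```python
import math

def pag(d0, d1, ds, d2, nom=1, fsm_db=6.0):
    """Potential acoustic gain (dB) of a one-microphone/one-loudspeaker setup.
    d0: talker-to-listener, d1: mic-to-loudspeaker, ds: talker-to-mic,
    d2: loudspeaker-to-listener (all in metres); nom: number of open
    microphones; fsm_db: feedback stability margin in dB."""
    return (20 * math.log10(d0 * d1 / (ds * d2))
            - 10 * math.log10(nom) - fsm_db)

def nag(d0, ead):
    """Needed acoustic gain (dB): loudness as if the talker stood at EAD metres."""
    return 20 * math.log10(d0 / ead)

# Illustrative geometry: talker 20 m from the listener, mic at 0.5 m,
# loudspeaker 6 m from the mic and 15 m from the listener, target EAD of 3 m.
margin = pag(20, 6, 0.5, 15, nom=1) - nag(20, ead=3)
print(f"gain margin: {margin:.1f} dB")  # prints "gain margin: 1.6 dB"
```

A negative margin would indicate that the required loudness cannot be reached without howling; the usual remedies are a shorter talker-to-microphone distance, directional transducers or fewer open microphones.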
The required transmission properties, such as the frequency range, the balance of timbre, relative gain levels, the maximum achievable sound pressure level, volume control in defined areas and the availability of one or multiple microphone positions, depend on the requirements that the system needs to address. The room acoustical conditions are essential to take into consideration for the design and arrangement of the system; in most cases these conditions correlate with the defined functional requirements.
For more details, please refer to section 6.3.1.

1.2.2 Voice Alarm Systems


These sound systems are information systems too, with the particular specification that the
message emitted must –​as a requirement by law –​reach the target person by any means.
For these systems speech signals are used throughout. In most countries these kinds of
systems are linked to fire alarm systems. In addition, there is a split between system standards
and product standards; see Figure 1.2.
In most European countries it is national standards that prescribe the fire detection
regulations and codes in connection with speech transmission. The Speech Transmission
Index (STI) requirement is given as an average value over the considered area minus the
standard deviation of the single values, with STI ≥ 0.5.
Where national standards do not exist, the European Standard is deemed to be valid;
outside Europe the international system standard ISO 7240-​19 is often used, which can
also be referred to when a connection to the fire detection system does not exist. The
American standard NFPA 72 is used quite often outside the US, as some of the equipment
manufacturers are based in the US. In Germany the specific technical standard DIN 14675
requires that the installer of a voice alarm system connected to a fire detection system
needs a special qualification which will only be given after a particular test and approval
procedure [1].
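The STI acceptance rule quoted above (spatial average of the measured values minus their standard deviation, with the result required to reach 0.5) can be sketched in a few lines; the measurement values below are hypothetical:

```python
import statistics

def sti_compliant(sti_values, threshold=0.5):
    """Voice-alarm acceptance rule: spatial average of the measured STI
    values minus their standard deviation must reach the threshold."""
    score = statistics.mean(sti_values) - statistics.stdev(sti_values)
    return score, score >= threshold

# Hypothetical STI measurements at ten listener positions:
score, ok = sti_compliant(
    [0.62, 0.58, 0.55, 0.60, 0.64, 0.57, 0.53, 0.61, 0.59, 0.56])
print(f"rating: {score:.3f}, compliant: {ok}")  # 0.585 - 0.034 = 0.551 -> True
```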
Even where the installation of a voice alarm system is not required by law, it might in some particular cases still be advisable to install a loudspeaker system for simple alarm-tone emission, such as in a factory hall, even though test data show that only a few people actively react to such tones.
[Figure 1.2 organizes the standards for alarm systems using speech into two branches:

With connection to a fire detection system:
   International system standard ISO 7240-19; product standards ISO 7240-04, ISO 7240-16, ISO 7240-24
   CEN system standard TS EN 54-32; product standards EN 54-04, EN 54-16, EN 54-24
   National standards: BS 5839-8, NEN 2575, DIN VDE 0833-4 together with DIN 14675, NF S61-936, TRVB-158, Ö F3076, NFPA 72

Without connection to a fire detection system:
   International standard ISO 7240-19
   CEN standard EN 60849 (in future EN 50849)
   National standards: BS 7827, BS 6259, DIN VDE 0828, Ö F3074]

Figure 1.2 Standard structure for fire detection regulations or voice alarms.

1.2.3 Shopping Malls


In shopping malls, the sound reinforcement system provides a number of functions that can
best be illustrated by a range of priority levels. Emergency calls are on the highest priority
level –​the installed loudspeakers basically cover the voice alarm requirements in the case
of an emergency (refer to section 7.3). The second priority level serves operational messaging, such as an announcement that a car is wrongly parked or that a lost child can be picked up at a particular information booth. The third and least relevant priority level carries various audio content such as background music, advertisement announcements or the daily shopping centre closing-time jingle.
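The priority scheme above can be sketched as a simple arbiter; the source names and priority numbers are illustrative, not taken from any particular product:

```python
# Priority order as described above: voice alarm (1) beats operational
# paging (2), which beats the background programme (3).
PRIORITY = {"voice_alarm": 1, "paging": 2, "background": 3}

def select_source(active_sources):
    """Return the active source allowed onto the loudspeaker zone,
    i.e. the one with the highest priority (lowest number)."""
    if not active_sources:
        return None
    return min(active_sources, key=PRIORITY.__getitem__)

# A parking announcement interrupts the music; a voice alarm overrides both.
assert select_source(["background", "paging"]) == "paging"
assert select_source(["background", "paging", "voice_alarm"]) == "voice_alarm"
```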
A recent development for some shopping malls is the composition of a specifically
designed, often quite elaborate audio soundscape, meaning the creation of a certain atmos-
phere by using acoustic signals that can be either automatic or (partially) interactively
triggered by visitors’ actions, movements and presence. It is believed that an adequately
designed soundscape might increase the curiosity and comfort level of the guests and hence
increase the duration of stay in the stores which in turn might result in more sales.

1.2.4 Transportation Hubs


In transportation hubs the sound systems are mainly put in place for information trans-
mission, more specifically for general calls, for specific location calls and for voice alarm
calls. The general call is used to address all travellers simultaneously. The specific call can
be addressed to a particular platform or gate zone or a defined group thereof to carry infor-
mation that is relevant to selected travellers only, such as a change of departure time at a
particular gate.
Speech intelligibility is of high importance in these types of loudspeaker systems. Long
reverberation times and high background noise levels emitted by equipment and passengers
are detrimental to good speech intelligibility, so extra care is needed for the design of these
systems. As the background level varies widely (e.g., the quiet late-​arrivals evening hour,
the very loud rush hour) the use of automatic gain control (AGC) systems is recommended.
These systems must be implemented smartly: simply raising the output level with rising background level is rarely the correct approach. A maximum level must be set at around 90–95 dBA; otherwise health problems may occur and, more importantly, speech intelligibility will start to decrease again, caused by masking effects of human hearing.
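A hedged sketch of such a capped AGC rule, with illustrative parameter values (a 10 dB target above the ambient level and a 95 dBA ceiling):

```python
def agc_output_level(noise_dba, base_level=75.0, snr_target=10.0, ceiling=95.0):
    """Noise-dependent announcement level: keep the announcement
    snr_target dB above the measured ambient level, but never exceed the
    ceiling, above which masking and health risks negate any benefit."""
    wanted = max(base_level, noise_dba + snr_target)
    return min(wanted, ceiling)

print(agc_output_level(60))  # quiet evening hour: stays at the 75.0 dBA floor
print(agc_output_level(80))  # rush hour: raised to 90.0 dBA
print(agc_output_level(95))  # very loud: capped at 95.0 dBA, not 105
```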
Long reverberation times also considerably limit speech intelligibility. Railway stations
and airport passenger terminals are frequently constructed using massive areas of acoustically reflective materials such as concrete, stone and glass. On the other hand, the volume of these spaces is huge, so air attenuation helps reduce the reverberation times at high frequencies. For security reasons the glass panes are relatively thick, so they act as panel absorbers only at very low frequencies. The subjective impression in these rooms therefore is a boomy, low-frequency-heavy sound, and the intelligibility of voice alarms is strongly reduced. Hence the acoustician’s job is to cooperate with the architect
and to introduce suitable wall and ceiling areas with defined acoustic absorption proper-
ties. Another issue in these spaces is the high number of loudspeakers, be it conventional
loudspeakers or modern line arrays, that are required to cover all areas of the station or ter-
minal. While a loudspeaker will increase speech intelligibility in its direct radiation field,
it will also create reverberant sound in other loudspeakers’ areas that decreases the speech
intelligibility. The sound system design must account for this interaction; it can be addressed by use of appropriate computer simulation programs before the installation is started.
Details and case studies are shown in Chapters 6 and 11.

1.2.5 Hotels
Hotels are a collection of many specific types of rooms that can be found individually in
this chapter, such as a ballroom or a cinema. The main reason for a full-​facility sound
reinforcement system is again the emergency voice alarm functionality. Care must be taken
that guests are not suspicious that the loudspeaker system might be used in reverse mode
to listen in to certain areas. Often the loudspeaker system is solely installed in the hallways
and other public spaces.
Besides the alarm and emergency systems, conventional sound reinforcement systems are required for ballrooms and lobbies. Hotels quite often host conferences, meetings and exhibitions, where sound amplification is needed mainly for highly intelligible speech. Operational flexibility might call for mobile partition walls; the various configurations make dealing with sound systems a complex task unless ceiling speakers are used. These lower-quality
systems excite the space with reverberant sound and often result in poor speech intelligi-
bility. A mobile system installed on stands and controlled by technical personnel from a
control desk in the back of the ballroom etc. is a much more sensible approach, while the
video equipment can be managed from that location as well.

1.2.6 Museums and Exhibition Halls


Two particular types of systems are present in museums: The first system delivers specific
information about the displayed artefacts and exhibits. This sound system is part of the
exhibition and may also have an artistic function. The second system serves as a voice alarm
system for emergency calls and has to provide audio information with high intelligibility. As
such it is not much different from similar systems in other facilities.
In historic museums it is not just the artefacts and exhibits that are heritage-​protected
but the building itself might be listed as well. Hence it is difficult to modify the walls or the
ceiling structure by acoustic treatment in these spaces as even the plaster might be listed
as part of the building. Only highly directivity-controlled line arrays can then deliver clear and understandable information, especially in the case of an emergency.
Often though the client insists on an ‘invisible’ installation not interfering with the pres-
entation of the exhibits. In newly designed museums or in exhibition halls the acoustician
may work with the architect to design the required absorptive areas as acoustic treatment.
In flat-​roofed halls the ceiling can be clad with jointless and uniform absorber layers, with
ceiling speakers used for voice alarms in case of emergency.
The renovation of older museums or exhibition halls is a different case: It might offer
the acoustician and/or the sound system engineer the perfect opportunity to insert acoustic
treatment and to install the sound sources while not dominating the visual appeal of the
spaces. This approach is facilitated by close cooperation amongst architects and acousticians.
System solutions and case studies are discussed in Chapter 11.

1.3 Sports Venues

1.3.1 Stadia
These single-​purpose facilities are half-​open or fully open rectangular or oval bowls with a
level field of play and a slanted spectator seating area, sometimes distributed over one, two,
three or four rank levels.
The walls and floors are commonly of concrete construction; the roof is often a membrane material, sometimes visually transparent. Membranes act as low-frequency absorbers (in addition to the panel resonance, low frequencies also pass through the membranes more easily than high frequencies, which are reflected), so under such a roof only the audience remains as a broadband absorber. If no further room
acoustical treatment is provided, the reverberation times under the roof are very long, which in turn increases the design complexity of a sound system that is to deliver voice alarm calls with sufficiently high speech intelligibility.
To verify the sound quality and to study the acoustical, architectural and technical
parameters (locations, materialization and product specifications) during the design phase
the use of computer simulation is strongly recommended.
Target sound pressure levels vary greatly between technical specifications
and are often predetermined by a sport’s governing body such as FIFA, UEFA, IOC and
other authorities. Reasonable sound pressure levels for a smooth, clear coverage are in the
90–​95 dBA range, within a level tolerance of ±3 dB over all seating areas. In recent years
even higher SPL values have been specified; the latest FIFA target values for the champion-
ship in Russia are in the 110 dBA range, motivated by the need to surpass audience levels
at any cost. Health problems start to appear at these levels [2].
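A ±3 dB uniformity specification of this kind can be verified with a simple check over measured seat levels; the readings below are hypothetical:

```python
def meets_tolerance(spl_values, tol_db=3.0):
    """Check that all seat readings lie within +/- tol_db of their mean,
    i.e. the +/-3 dB uniformity window quoted in stadium specifications."""
    mean = sum(spl_values) / len(spl_values)
    worst = max(abs(v - mean) for v in spl_values)
    return worst <= tol_db, mean, worst

# Hypothetical dBA readings from a measurement walk across the bleachers:
ok, mean, worst = meets_tolerance([92.5, 94.0, 91.2, 93.8, 90.9, 92.1])
print(f"mean {mean:.1f} dBA, worst deviation {worst:.1f} dB, pass: {ok}")
```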
Various usage scenarios are implemented over a full match day. During a soccer match
the sound system for instance is only used for simple voice-​based messages like goal
announcements or player substitutions and of course for voice alarm calls, should they
be required. The pre-​and post-​game periods are in fact more demanding concerning the
perceived quality of a sound system as entertainment and advertisements will be presented.
The audio quality of the material is of critical importance and frequently gets overlooked
as the ads are generally not produced to be played in a stadium. As a result, they offer poor
sound quality and the speech is unintelligible. Care must be taken to properly delay the
audio signal path for lip-​synching to the video display.
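The delay arithmetic behind such time alignment is simple run-time geometry: at 20 °C sound travels roughly 1 m in 2.9 ms. A sketch with hypothetical distances (the same calculation applies whether aligning fill loudspeakers to a main cluster or audio to a video display):

```python
def acoustic_travel_ms(distance_m, temperature_c=20.0):
    """Propagation time of sound over distance_m at a given air temperature."""
    c = 331.4 + 0.6 * temperature_c  # approximate speed of sound in m/s
    return 1000.0 * distance_m / c

def align_delay_ms(far_path_m, near_path_m, extra_ms=0.0):
    """Electronic delay for the nearer source so both wavefronts arrive
    together (extra_ms can add a small Haas offset or video latency)."""
    return max(0.0, acoustic_travel_ms(far_path_m)
               - acoustic_travel_ms(near_path_m) + extra_ms)

# A seat 70 m from the main cluster but 10 m from a local fill loudspeaker:
print(f"{align_delay_ms(70, 10):.1f} ms")  # prints "174.7 ms"
```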
Stadia usually have access, audience flow and support areas that must also be provided
with audio systems for at least the voice alarm calls.
These aspects will be elaborated upon in Chapter 6 and within the case studies of
Chapter 11.

1.3.2 Arenas
In contrast to a stadium, an arena is a large-​volume closed venue. Hence acoustical energy
cannot escape and without proper design will tend to build up. Arenas are likely to be multi-
purpose, say for various sport disciplines and for entertainment alike, sometimes dividable
and reconfigurable (e.g., with mobile audience ranks), making the acoustical considerations
more complex; hence it is strongly recommended to perform studies by use of computer
simulation platforms.
When designing a sound system for such spaces, the architecture of the facility must be studied first. If the architectural design is still in development and the acoustician and sound system designer are already involved, both must encourage the architect to keep the reverberation time at a reasonably low level, say below 2 seconds in the occupied case. Quite often this cannot be achieved, and modern sound system design must
succeed in providing clear and intelligible messages. The configuration of modern sound
systems such as line arrays might conflict with video facilities such as a centrally located
video cube.

1.3.3 Outdoor Fields/​Campuses


These facilities are quite different from the ones mentioned so far. In all other places and
halls there is an interaction between the room acoustic properties and the kind of sound
system design. Without knowing the room acoustic properties, a sound system should not
be designed. This relationship is quite different in open-​air stages. Room acoustic properties
are less important, but other issues must be considered, like

a. the existing noise floor close to the facility


b. the shape, size and position of neighbouring buildings
c. the frequency and day times the stage or venue is used for performances
d. roof or tent construction for rain and sun shielding

8 Wolfgang Ahnert and Dirk Noy


In planning such an open-​air venue, the above-​mentioned issues must be considered. The
open-​air applications of sound reinforcement engineering reach from transmission of infor-
mation for large areas to artistic applications in open-​air theatres or sound amplification at
pop-​music events.
As compared with the situation in closed space, the most essential difference consists
in the lack of reverberation. Hence the direct sound component becomes more important,
but without the reverberant sound the noise levels in the neighbourhood are more audible.
In the open-​air case the number of reflections is greatly reduced. Often only some initial
reflections stemming from the floor or from the rear and side walls remain, as is the case in
ancient amphitheatres or in some open-air stages. While early reflections of this kind are
advantageous for loudness and definition of the transmitted signal, all reflections reaching
the listener later than 50 ms after the original signal are perceived as disturbing echoes. This
risk is especially great in the open where intermediate reflections are lacking. In addition,
short-time reflections having a high coherence with the direct sound may cause disturbing
comb-​filter effects.
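The 50 ms criterion above translates into a simple path-length check: at a speed of sound of about 343 m/s, a reflection becomes an echo when its path is roughly 17 m longer than the direct path. A minimal sketch (the helper names and example distances are illustrative, not from the text):

```python
# Check whether a reflection arrives late enough to be perceived as an echo.
# Uses the simplified 50 ms criterion quoted in the text.

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C
ECHO_THRESHOLD = 0.050  # s, perception limit for disturbing echoes

def arrival_delay(direct_path_m, reflected_path_m):
    """Delay of the reflection relative to the direct sound, in seconds."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND

def is_echo(direct_path_m, reflected_path_m):
    return arrival_delay(direct_path_m, reflected_path_m) > ECHO_THRESHOLD

# A rear wall adding 30 m of path: 30 / 343 s = 87 ms -> disturbing echo.
print(is_echo(40.0, 70.0))   # True
# A floor bounce adding only 3 m arrives after ~9 ms -> useful early reflection.
print(is_echo(40.0, 43.0))   # False
```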
Echo elimination is possible with sound absorbing or scattering surfaces. Absorbers are
only effective against echoes when they are put in front of the reflecting surfaces creating
echoes. But even while decreasing the level of short-time reflections, they may still have
an echo-​supporting effect. For mobile open-​air installations treatment with absorption
material comes into consideration only in exceptional cases.
If buildings exist around the area covered by the sound reinforcement system, they are
normally so distant from the listener that reflections coming from them are perceived as
echoes. To reduce these reflections, the loudspeakers should be oriented accordingly.
Echoes are sometimes generated by the sound system itself –​as the listener hears mul-
tiple, distributed sources each with a different distance, hence with a different arrival time.
The characteristic high-​frequency drop at a large distance between listener and source
does also occur in open-​air venues. This is on one hand due to the fact that the listener is
often not located in the main radiating direction of the sources which are very directive
in the upper frequency range, and on the other hand it is due to the strong air attenuation
caused by considerably large distances between source and listener. The frequency depend-
ency described in section 2.3.1 shows a noticeable effect.

1.4 Performing Arts Centres

1.4.1 Clubs /​Discotheques


Club sound reinforcement systems are often not engineered to give a perfectly linear fre-
quency response but tend to exaggerate the low-​frequency range. Bass loudspeakers are
often arrayed in particular patterns to optimize the low-​frequency radiation and coverage.
Club loudspeaker design should be undertaken with high sound pressure levels in mind
that will need to be reproduced over many hours of operation, often with increasing sound
levels over time. To retain an undisturbed and clean sound, the headroom before clipping
is of major concern.

1.4.2 Music Venues


Just as in clubs, in music venues the sound system can commercially make or break a venue.
Most permanent music venues for popular music with an audience of up to a couple of

Introduction to Considered Sound Systems 9


thousand will have a sound system that is permanently installed (in contrast to a rental or
touring system that is brought in for each occasion). Only a handful of rather well-​known
and high-​end sound systems can be considered for that purpose, as touring acts will usually
have high demands on the sound system, indicated on their ‘technical rider’, basically a
list of preferred equipment that the venue must provide. Exotic, little-​known or lower-​end
solutions will have low acceptability and will cause frequent discussions.

1.4.3 Theatres, Opera Houses and Concert Halls


Theatres as well as concert halls demand a well-​designed interaction between the archi-
tecture and the acoustic properties of the halls. The earliest theatres and concert halls
were small in dimension, mainly built for the ruler and his court. In the eighteenth century
larger theatre buildings were erected (Bayreuth, Germany, 1748; Teatro Alla Scala, Milan,
Italy, 1778; Teatro di San Carlo, Naples, Italy, 1737; first and second Royal Opera House,
London, UK, 1632/​1847; State Opera, Berlin, Germany, 1746; and others).
Based on the layout of the court theatre (the king is in the centre of the limited audi-
ence) most of the older theatres are ‘horseshoe’ shaped. The distance to the stage was similar
for a large part of the audience, as was the level of the direct sound. With well-​controlled,
low reverberation this resulted in acceptable speech intelligibility and good music clarity.
Modern theatres have different shapes with different acoustic properties.
Right up to the middle of the nineteenth century concert halls were small or medium in
size. Larger spaces were built after 1850 (Musikvereinssaal, Vienna, Austria, 1870; Boston
Symphony Hall, USA, 1900; first and second Gewandhaus, Leipzig, Germany, 1781/​1884;
Concertgebouw, Amsterdam, the Netherlands, 1888). In these and similar halls the acoustic
properties are good or very good, and those since destroyed are known to have been very
good as well. Older halls were designed empirically or were modelled on existing concert
halls with excellent acoustics. Quite a number of halls throughout the world have used
the Musikvereinssaal as a model, e.g. the Boston Symphony Hall, the second Gewandhaus
in Leipzig, the old Berlin Symphony Hall or the Konzerthaus in Berlin [3].
The shapes of the first concert halls were based on existing larger guild halls, mainly
built in ‘shoebox’ shape. The old ‘Gewandhaus’ name reflects this to the present day;
‘Gewandhaus’ may be translated as garment house, a hall for the tailors’ guild.
Subsequently, more complex shapes have been developed, like fan or amphitheatre shapes.
Churches are also used for concert performances.
All these facilities have different acoustic properties; the shoebox shape is often pre-
ferred because of the free volume above the seats and the close side walls which supply
important lateral reflections.
Figure 1.3 shows the recommended reverberation time as a function of the volume. It
can be seen that for most of the venues and for small and larger halls the recommended
reverberation time is between 0.6 and 2 s. Figure 1.4 shows the frequency dependence
of the reverberation time. For music use, a slight enhancement at low frequencies is
recommended. For pure speech reproduction the curve may be level at the low end or even
decrease slightly.
For about 80 years, measurements on physical scale models (1:10 to 1:20) of the hall
under consideration have been performed to verify its acoustic properties. Nowadays
computer simulations using mathematical models and algorithms are increasingly employed;
for high-end projects, though, the scale-model approach is still recommended and practised.

Figure 1.3 Recommended reverberation RT at 500 Hz as a function of volume V.

Figure 1.4 Frequency dependency of the recommended reverberation time RT.

The use of sound reinforcement systems in these facilities is often newer than the
building, which was not originally designed with sound systems in mind, but today a theatre
or concert hall without a perfect sound system is not imaginable. In a theatre we need
a sound system for the reproduction of all kind of sound effects and audio play-​ins. The
sound system signals are a part of the production, just like the theatrical lighting system.
The localization of the played-​in signals is important as well, so a considerable number of
loudspeakers are often arranged around the stage opening, distributed within the depth


of the stage and sometimes even in the audience area for sound effects that enhance the
production.
Sophisticated sound equipment is needed in the control booth and also in larger houses
in a separate audio production studio or suite for the creation of the required sound effects.
Live recordings can be undertaken there as well.
Similar technology is required in a concert hall. Most of these halls also serve as multi-
purpose halls, as not only traditional, acoustic concerts are performed throughout the years.
These representative halls are also used for conferences, presentations and for non-​acoustic
musical performances using electronic music instruments and amplification. In some cases,
these user profiles may be realized with temporarily rented equipment, but basic house
equipment is always demanded. Today modern line array systems are applied to handle the
high reverberation in concert halls. The arrangement of these loudspeakers is often an issue
because heritage protection may not permit preferred equipment locations. In larger con-
cert halls we have audio control booths and sometimes even audio production studios for
live recordings.
Finally, it should be mentioned that acoustic enhancement systems are gaining in popu-
larity for smaller theatres or halls with the goal to make them suitable for acoustic concerts.
These systems enhance the natural reverberation by use of pickup microphones, a digital
reverberation processor and an adequately distributed loudspeaker system with the goal to
envelop the listener with reverberation that would not exist without the enhancement
systems.
Please refer to Chapter 11 for details.

1.4.4 Multipurpose Halls


This category includes town halls, courts, government buildings and shopping centres,
malls and all types of facilities with ballrooms, restaurants and lobbies. These rooms can
be large (20,000 m3 and more) or small (down to about 1000 m3) and will be used for
assemblies, conferences, exhibitions and celebrations. Generally, in all these spaces music
or speech reproduction will happen, also live performances of music groups might take
place. Smaller spaces can get away without an installed sound system, but, as a rule of
thumb, if any dimension of the space exceeds 10 m a sound system should be used for speech
presentations.
In the case of government buildings or assembly halls where speech intelligibility is a
basic requirement the implementation of a sound system is obvious. An architect will prob-
ably readily agree to consult an acoustician or a sound system engineer to further optimize
the acoustic design of the space. Acoustic simulation programs then permit efficient studies
of the hall’s acoustic properties. The same computer model may be used to optimize the
sound system and to support the architect to optimize the location of the loudspeakers or
line arrays. This may need some coordination as most architects have a thorough knowledge
of design, shapes, colours, light and other visual effects but rarely of sound and acoustics and
thus prefer to hide the loudspeakers or want to limit their size and aesthetic impact.
Please refer to Chapter 11 for more information.

1.5 Religious Venues


Sound systems in sacral buildings are special and they have been used since loudspeakers
have existed. All houses of the different religions have long reverberation times in common.


The first large houses for worshipping are known from the late Roman or the beginning of
the Byzantine period. A perfect example is the Hagia Sophia, finished in 537, the world’s
largest building at that time. Hagia Sophia with its huge cupola (height 56 m, diameter
31 m) was the largest Christian church for almost 1000 years. After the conquest of
Constantinople in 1453 the church was converted to a mosque and has been a prototype for
mosque design until today.

1.5.1 Churches, Synagogues and Mosques


The churches in Europe have been influenced by the different construction styles like
Romanic, Gothic, etc. They have been very large in footprint and –​starting with the Gothic
style –​with considerable ceiling heights. Other shapes and styles have been introduced later
and modern churches of today are often similar to multipurpose halls, especially in the US.
The word synagogue means ‘assembly’, i.e., a hall to gather. The architectural geom-
etry and style as well as the interior design vary to a large extent and no single prototype is
known. Quite often the influence of other local religious buildings may be observed, e.g.,
synagogues in Gothic style in Europe. Later, especially in the nineteenth and twentieth cen-
turies, most synagogues, even the most magnificent ones, were not intended to be built in
a particular unified style, and are best described as individually eclectic. This is valid until
now when new synagogues are constructed [4].
The halls of these sacral buildings are quite often huge and feature long reverberation
times. In empty churches and synagogues all walls, floors and ceilings are acoustically hard
and thus reflect sound; in contrast the floor in mosques is always covered by a carpet. Even
in fully occupied churches and synagogues the absorption caused by the worshippers is low
compared to the uncovered wall and ceiling surfaces, so the reverberation remains high.
In all sacral buildings the spoken word of the priest or imam should be clearly understood.
In the old days this was only possible at distances up to about 20 m from the preacher;
worshippers further away could not understand the spoken words. We know from the
Hagia Sophia in Muslim times that some sub-​imams placed at regular distances repeated the
spoken words of the main imam. This could be understood as the first employment of ‘delay
lines’ in a sound reinforcement system.
Today sound systems are used everywhere; in churches and synagogues with distributed
columns modern line arrays are hidden in the architecture of the buildings and are some-
times difficult to localize. Two basic configurations are utilized, firstly a decentralized
arrangement with a number of loudspeaker locations or secondly a centralized position with
a cluster or a large line array.
In mosques the situation can get more complex as there is often an empty area below a
large cupola, where arrays cannot be installed in a straightforward setup. Depending on the
design of the mosque a special sound system concept is needed to obtain sufficiently good
intelligibility of the spoken words. It is very tough to engineer a proper solution without the
help of computer simulation.

1.6 Media Production Facilities

1.6.1 Audio Recording Studios and Broadcasting Facilities


Both these types of rooms have existed only since the last century and are created for sound
recording and sound reproduction. A studio in this context is a space for audio recording


with a volume between about 100 m3 and 3000 m3. The smaller ones are for instance for
recording purposes in concert halls or for voice-over in post-production, the larger ones
allow recordings of entire bands, orchestras or orchestra groups. The user profile predefines
the acoustic properties of the space. Usually low reverberation times are required: the
standards indicate values of 0.2 to 0.5 s (Figure 1.5).

Figure 1.5a Reverberation time at 500 Hz vs. volume in studios.

Figure 1.5b Tolerance range of the frequency response of the reverberation time.




The audio monitor speakers serve as the sound reinforcement system that reproduces
either the sound currently being recorded or the audio signals previously recorded and
replayed for editing and mixing.
In Figure 1.5a the blue curve corresponds to the following equation:

RTm = 0.25 s · (V / V0)^(1/3)

where V = volume of the room in m3

and V0 = reference volume of 100 m3
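As a quick numerical check, the recommended mid-band reverberation time of the blue curve can be evaluated directly (a sketch; the 0.25 s coefficient and the 100 m3 reference volume are those given above, the function name is illustrative):

```python
# Recommended studio reverberation time: RTm = 0.25 s * (V / V0)^(1/3),
# with reference volume V0 = 100 m3, as given in the text.

V0 = 100.0  # m3

def recommended_rt(volume_m3):
    """Mid-band reverberation time in seconds for a studio of given volume."""
    return 0.25 * (volume_m3 / V0) ** (1.0 / 3.0)

print(round(recommended_rt(100.0), 2))   # 0.25 (small voice-over room)
print(round(recommended_rt(3000.0), 2))  # 0.78 (large orchestral studio)
```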

When production facilities are producing content to be replayed in multichannel audio


environments, the facility itself needs to offer multichannel capabilities. The production
studio layouts can be a duplicate of the playout venue (such as a scoring stage that looks like
a cinema but is equipped with a large audio production mixing console and further hard-
ware) or alternatively a scaled-​down version, such as a multichannel post-​production studio
with reduced loudspeaker count and slightly modified acoustical specifications indicated by
the standard owner (such as Dolby).

1.7 Cinemas

1.7.1 THX, Dolby, DTS


In cinemas the recommended reverberation time varies between 0.6 and 1.2 s and is
volume-dependent. For instance, for a cinema with a volume of 6,000 m3 the Dolby standard
recommends an average reverberation time of 0.83 s; see Figure 1.6.
The THX standard asks for similar values.
For a large cinema with a volume of around 20,000 m3 the recommended reverberation
time of both standards is around 1.2 s. Higher values are not recommended. In cinemas

Figure 1.6 Dolby recommendation for the reverberation time.




Figure 1.7 Dolby and THX recommendation for the reverberation time tolerance range.

of 2500 m3 (fewer than 200 seats) the reverberation time should be decreased to values of
around 0.6 s in the midrange.
Background noise levels should be kept to a minimum to allow for silent parts of the
movie to really be silent and undisturbed. The three primary potential noise sources are: (A)
mechanical equipment (HVAC), (B) noise from adjacent theatres and lobby and (C) out-
door noise.
After the first demonstration of a movie with sound in New York in 1923, horn
loudspeakers became widely used in cinemas in the 1930s. Afterwards one-channel,
two-channel and four-channel systems were used until 1950. The CinemaScope format used
35 mm film. In the 1970s Dolby noise suppression was introduced, and in 1975
Dolby Stereo sound (left, centre, right and surround) came to the cinema with a simple
matrixing algorithm. Tomlinson Holman created the THX standard in 1982.
With the THX standard, not only did the front loudspeakers have to be installed behind
the acoustically transparent screen, but the screen also had to be curved. There followed
the 5.1 and 7.1 surround sound formats, and the digital ‘Cinema Digital Sound’ (CDS)
format was introduced in 1990. In 1992 the Dolby Digital format was used for the first time.
One year later the Digital Theater System (DTS) was introduced, with its sound on a sep-
arate CD-ROM. In parallel, Sony offered the digital optical sound format SDDS for large
cinemas with eight audio channels. In 1999 still higher audio channel counts were offered
by Dolby Surround EX. After 2000, spatial audio formats such as wave field synthesis, with
a large number of surround speakers along the walls, were introduced, mainly for tests; see
Figure 1.8.
These types of formats facilitate the creation of localization effects even inside the lis-
tener area. Compatibility with other systems has not yet been solved; therefore, practically
no movies are available in WFS sound formats. Research on this topic is being performed
to evaluate audience localization performance in the case of 3D video projection
combined with 3D audio reproduction.

Figure 1.8 WFS loudspeaker arrangement in a cinema.

1.7.2 Immersive Audio


Immersive audio is a summary term describing audio systems that are designed to recreate
a realistic, natural, spatial 3D sound field. Ideally and in theory, hundreds of very large
loudspeakers are to be distributed spherically surrounding the listener to cover each angular
possibility at high sound pressure levels; this approach of course is not practically possible.
As it turns out, it is also not really necessary, as the angular resolution of human hearing
is limited and the number of loudspeakers can be significantly reduced while still creating
a rather consistent 3D sound field. Several standards have been developed that define the
number of loudspeakers and their positioning inside a given space and also the distribution
of audio channels over the available loudspeakers. The most popular standards today are
Dolby Atmos [5], Auro3D and DTS.

1.7.3 Home Cinemas


This term relates to complex installations in specifically designed rooms in private homes.
Most common is the basic 5.1 surround format as shown in Figure 1.9.
The figure illustrates an L-​C-​R system, a left and right surround loudspeaker and a sub-
woofer in the recommended arrangement.

Figure 1.9 5.1 surround loudspeaker arrangement in a home cinema.

Usually, the furniture and furnishings in a living room would be sufficiently absorptive as
to control the reverberation time; for designated home theatres an acoustical study is to be
recommended, perhaps specifying absorbers like curtains or tapestries. Strong side and rear
wall reflections should be avoided at the listening position. The surround loudspeakers can
be installed higher than the listening zone and then be slightly tilted downwards to avoid
reflections as well [6].
Additional efforts must be undertaken in designing home cinema rooms for more than
two or three listeners.

1.8 Corporate Environments

1.8.1 Meeting Rooms /​Boardrooms


The advent of audio-​visual presentation, collaboration and communication techniques for
business use has inspired manufacturers to create an entire new universe of equipment.
Directional, programmable microphones, conferencing systems with a microphone/​loud-
speaker station per delegate (which might also be used for ID registration, various lan-
guage translations and voting) or sound bars with integrated cameras are only a couple of
examples.
Two basic approaches are used to design these spaces –​either a roll-​in, mobile solution
with a display unit and some electronics on some type of cart that can be plugged in in any
of the rooms that need the infrastructure at a given moment, or a room-​integrated approach


where the technical equipment is architecturally coordinated and permanently built into
the room in question. The latter approach –​being rather complex in engineering, inte-
gration and user interface –​is chosen for rooms that are purpose-​built and need to echo a
certain level of prestige, style and comfort, e.g., such as a boardroom. The room acoustics
aspects of meeting rooms are critical for both speech intelligibility and comfort.

1.8.2 Video Conferencing


Video conferencing rooms are similar to regular meeting rooms, with the added complexity
of remote parties being virtually present for meetings and presentations. One of the issues
that needs to be dealt with is the elimination of acoustic echoes over an AV conferencing
feed: A loudspeaker in Berlin reproduces the Amsterdam-​based chairwoman’s audio, which
is then picked up by the microphone in Berlin, in addition to the live audio spoken in the
Berlin room. The signal containing the Berlin live audio and the reproduced chairwoman is
then sent to Amsterdam and voilà –​the chairwoman hears herself again replayed through
her loudspeaker; this scenario is extremely irritating and unacceptable.
To limit the introduction of the chairwoman’s audio into the return chain, a specific
processor named AEC (acoustic echo cancelling) is ‘listening in’ to the conversation –​as
the processor records the chairwoman’s voice on the way to Berlin, it can filter out that par-
ticular content from the stream returning through it to Amsterdam. This algorithm works
really well if correctly implemented and allows echo-​less, naturally flowing conversations
(often a couple of seconds calibration time is needed when a connection is established to
train the system).
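The core of such an AEC processor is an adaptive filter that models the loudspeaker-to-microphone path and subtracts its echo estimate from the microphone signal. The sketch below uses a basic NLMS (normalized least mean squares) update on synthetic signals; real products add double-talk detection, frequency-domain processing and much longer filters, and all names and signal parameters here are illustrative:

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `far_end` from `mic`.

    far_end: signal sent to the local loudspeaker (the remote talker)
    mic:     local microphone signal containing an echo of `far_end`
    Returns the echo-reduced signal sent back to the remote side.
    """
    w = [0.0] * taps          # adaptive FIR estimate of the echo path
    out = []
    for n in range(len(mic)):
        # Reference vector: the most recent far-end samples.
        x = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_est                     # residual after subtraction
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]  # NLMS update
        out.append(e)
    return out

# Demo: the microphone hears only an attenuated, delayed copy of the far end.
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(4000)]
mic = [0.6 * (far[n - 3] if n >= 3 else 0.0) for n in range(len(far))]
residual = nlms_echo_cancel(far, mic)
# After convergence the residual (the echo returned to the far end) is
# strongly attenuated compared with the raw microphone signal.
print(sum(e * e for e in residual[-500:]) < 0.01 * sum(s * s for s in mic[-500:]))
```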

1.9 Educational Facilities

1.9.1 Schools, Campuses


In universities, schools and similar educational facilities classrooms, lecture halls, auditoria
and lobby areas need to be dealt with. These spaces require sound systems for music and
speech reproduction as well as voice alarms.
Regarding these topics, most aspects can be found in the sections above regarding audi-
toria and meeting rooms.
Legal requirements regarding the necessity of voice alarm systems in schools differ by
country, e.g., covering evacuation, fire and active-threat (amok) situations.
For large campuses, cross-​ building and wide-​area connectivity issues often become
important as events from one classroom might need to be streamed to another classroom
and vice versa. Digital networks with sufficient media data transmission capacities are to be
designed for these situations.
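The required capacity for such audio streams is easy to estimate: uncompressed PCM needs channels × sample rate × bit depth. A small sketch (the function name and the stream formats chosen are illustrative, not from the text):

```python
def pcm_bitrate_mbps(channels, sample_rate_hz, bit_depth):
    """Raw PCM bit rate in megabits per second (no protocol overhead)."""
    return channels * sample_rate_hz * bit_depth / 1e6

# A stereo 48 kHz / 24-bit stream between two classrooms:
print(pcm_bitrate_mbps(2, 48_000, 24))   # 2.304 Mbit/s
# A 64-channel audio-over-IP trunk at the same format:
print(pcm_bitrate_mbps(64, 48_000, 24))  # 73.728 Mbit/s
```

Real audio-over-IP protocols add packet headers and safety margins on top of these raw figures.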

1.9.2 Classrooms
Classrooms are comparable to larger meeting rooms or boardrooms; please refer to the
appropriate section. Remote teaching is more important than ever, and more often than not
the classroom may be equipped with video conferencing infrastructure, taking into account
the issues listed above.


References
1. DIN 14675, January 2020.
2. FIFA World Cup Stadium Requirements Handbook, 2018 FIFA World Cup, 01.11.2014.
3. Beranek, L. Concert Halls and Opera Houses: Music, Acoustics, and Architecture, second
edition. Springer, New York, 2004.
4. Kleiner, M., Klepper, D.L., Torres, R.R. Worship Space Acoustics. J. Ross Publishing, Fort
Lauderdale, 2010.
5. Dolby Atmos Cinema Technical Guidelines, white paper, Dolby Laboratories, Inc., San
Francisco, CA, www.dolby.com, 2020.
6. Surround Sound in Home Cinemas, NTi Audio Application Note, 2014.

2 Room Acoustics and Sound System Design


Wolfgang Ahnert and Dirk Noy

2.1 Developments in Acoustics


The oldest basic studies of acoustics date back to ancient Greece.
The first reported scientist was Pythagoras of Samos (580–​500 BC). He performed
experiments with a simple monochord and investigated its tone pitch as a function of
string length. He and his followers found that tones created by subdivision of the string
are in a direct proportion to the original tone of the string. The conclusion was that
all harmonic intervals may be expressed by number ratios. This did fit the slogan of the
Pythagoreans: ‘Everything is number’.
500 years later Marcus Vitruvius Pollio (often abbreviated as ‘Vitruv’) dedicated his ‘Ten
books about architecture’ to the Roman emperor Augustus [1]. These publications are the
only work about architecture and construction written around 30–​20 BC. Besides general
construction details of that time a chapter about sound propagation in ancient theatres
can be found therein. Vitruv compares sound propagation to waves on water and describes
reflections and echoes; additionally considerations regarding tone pitch and intervals are
explained. Translations of Vitruv’s books published in the sixteenth and seventeenth
centuries include drawings of recommended floor plans and corresponding sections for
clarity; well known is Leonardo da Vinci’s sketch of human proportions that were originally
introduced by Vitruv.
For over 1000 years no further books with acoustic topics were published; in fact only
during the Renaissance period did numerous authors publish books on this topic, but the
word acoustics was not yet in use for this science.
Athanasius Kircher (1602–​1680) must be mentioned, a Jesuit priest and scientist. After
working at different universities, he was called as a professor of mathematics and physics to
the Collegium Romanum in Rome. His principal oeuvre was ‘Phonurgia Nova’, a book just
about acoustics. Kircher describes sound effects like reflections and echoes, but also methods
to support sound radiation and sound amplification by use of tubes and ear trumpets. Also,
methods to secretly listen in to sound events are described. This book was translated from
Latin into other languages.

The French scientist Joseph Sauveur (1653–1716) introduced the word ‘acoustics’. He
derived it from the Greek word ‘ακουστός’, which means ‘to be heard’. Sauveur is
well known for his studies on acoustics: his work involved researching the correl-
ation between frequency and tone pitch, and he performed studies on tuning pitch,
harmonics and the tonal ranges of voices and musical instruments, among others.

DOI: 10.4324/9781003220268-2


In 1802 the German scientist Ernst F.F. Chladni (1756–​1827) published the first full-​
fledged textbook ‘Die Akustik’ (Acoustics). Numerous publications explained the well-​
known ‘Chladni figures’, i.e., by loosely putting sand on metal plates and by exciting these
plates with a violin bow distinctive sand figures became visible. He demonstrated that by
increasing the excitation frequency the sand figures become more complex. This behaviour
is well explained as two-​dimensional standing waves within the metal plates and now is a
popular experiment for young physicists.
A famous German scientist also working in acoustics was Hermann von Helmholtz
(1821–​1894). His acoustic-​physiological research was laid out in the well-​known book ‘Die
Lehre von den Tonempfindungen’ (On the Sensations of Tone) as a physiological basis for
music theory. Helmholtz equations in acoustics and in fluid mechanics are widely known
and used, as is the Helmholtz resonator in room acoustics. Furthermore, the Helmholtz
number is the foundation for physical model measurements and describes the similarity
between frequencies and dimensions of different scales. This led to the law of physical
similarity.
Also well-​known is the oeuvre of Lord Rayleigh (1842–​1919). His book ‘Theory of
Sound’ has been until today a standard textbook to understand the physical basics of wave
radiation. Rayleigh investigated the theory of acoustic elementary radiators and formulated
an inhomogeneous wave equation. He used expressions like monopole and dipole in his
scripts.
Wallace Clement Sabine (1868–​1919) is another significant contributor to the field of
room acoustics. Working at Harvard University in the 1890s he performed the first in-​depth
investigations concerning acoustic reverberation. He used a wind chest and organ pipes to
excite lecture rooms with sound. Just by using a stopwatch and his ears he measured the
apparent decay time of the sound. To obtain different decay times he added or removed seat
cushions to change the amount of absorption in the space. He found the decay time, now
commonly referred to as reverberation time, to depend on the room volume and the absorp-
tion present in the space. The well-​known Sabine reverberation equation was published
in 1900.
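Sabine’s relation states that the reverberation time grows with the room volume and shrinks with the total absorption: RT = 0.161 · V / A, with V in m3 and the equivalent absorption area A as the sum of each surface area times its absorption coefficient. A minimal sketch (the room data are invented for illustration):

```python
def sabine_rt(volume_m3, surfaces):
    """Sabine reverberation time RT = 0.161 * V / A (SI units).

    surfaces: list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Invented lecture room, 10 x 8 x 4 m, with plausible mid-band coefficients:
room = [
    (2 * (10 * 4) + 2 * (8 * 4), 0.05),  # walls, lightly absorptive plaster
    (10 * 8, 0.10),                      # ceiling
    (10 * 8, 0.30),                      # carpeted floor
]
print(round(sabine_rt(10 * 8 * 4, room), 2))  # 1.31 s
```

Adding Sabine’s seat cushions to such a room increases A and thus shortens the decay, exactly as he observed.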
Regarding the twentieth and twenty-​first centuries two scientists and acousticians must
be mentioned.
The first one is Leo Leroy Beranek (1914–​2016). After entering Harvard University, he
received his PhD in 1940 and became a professor at MIT Boston. Beranek’s publications
‘Acoustics’ (1954) and ‘Music, Acoustics, and Architecture’ (1962: analysis of numerous
opera houses and concert halls, updated 2004 and 2010) are considered the classic
textbooks in acoustics. Beranek participated in the design of numerous concert halls
and opera houses. All his life he was a frequent contributor at industry conferences and
meetings; his last publication, ‘Concert Hall Acoustics: Recent Findings’, was published
in 2016.
The second scientist who must be mentioned is Manfred Robert Schroeder (1926–​2009),
a German physicist, well known for his contributions to acoustics and computer graphics. In
1954 he earned a PhD in physics. In the same year he joined the technical staff at Bell Labs
in New Jersey. There and also later in Göttingen, Germany, as a full professor he published
numerous books and articles on acoustics. Some of his contributions include the Schroeder
frequency, the Schroeder backward integration to calculate the reverberation time and the
introduction of Schroeder diffusors only to mention a few.
22 Wolfgang Ahnert and Dirk Noy

2.2 Interaction of Space, Room Acoustics and Sound Systems

2.2.1 General Issues


The main differences between natural and electro-acoustically reinforced sound are based on the type of sound sources being used. Natural acoustics deals with natural, non-electro-acoustical sources like

• human speaker voices
• acoustical musical instruments
• technical non-​electro-​acoustical devices such as HVAC systems, generators etc.
• traffic noise such as cars, airplanes or trains

In contrast, electro-​acoustical sound systems employ

• loudspeakers
• microphones
• loudspeaker arrays
• vibration exciters

Sound sources have a certain range of acoustic power they may radiate. Figure 2.1 shows the
range of sound pressure values.
For thousands of years human beings have been familiar with natural sound sources, a portfolio that keeps growing as new sounds emerge. Sound systems, in contrast, have only been available since loudspeakers and microphones were developed, that is, for just about 100 years.
Both types of sound sources –​natural and electroacoustic –​may be controlled in level
and frequency range. Natural speech or acoustic music is controlled by the talker, singer

Figure 2.1 Sound pressure level chart in dB(A).


Room Acoustics and Sound System Design 23


or player, as applicable; sound systems are controlled by an audio engineer. For speech and
music, a number of acoustic parameters have been developed to quantitatively describe the
character of the sound transmission. Such parameters are e.g., definition and intelligibility
of speech or clarity and transparency of music reproduction. In both cases these parameters
are significantly impacted by the built environment surrounding it –​this is precisely the
interaction between room acoustics and electroacoustic sound reinforcement. Poor acoustic
properties in a hall cannot be completely compensated by an installed sound system. On the
other hand, acoustic properties perfect for symphonic orchestras or organ concerts are not a
preferred environment for presenting a lecture or conducting a conference. Within such spaces the perceived speech intelligibility will frequently be judged poor even when very sophisticated sound systems with digitally controlled line arrays are employed. And vice versa: in acoustically dry halls a symphonic concert will sound unsatisfactory; in such cases modern acoustic enhancement systems that inject electronically generated reverberation into the hall can be implemented; refer to section 2.7.2.
A different approach is realized in so-called multi-purpose halls, which combine room acoustic properties acceptable for speech and jazz performances (presented with sound systems) with properties acceptable for small opera or concert events (performed without supporting sound systems).
In general, it should be stated that sound reinforcement design has to consider the room
acoustic properties present in the space. And the room acoustic design has to be engineered
while respecting the user profile of the space. If electro-​acoustic sound systems are by
default in use for all the intended performances, then no traditional concert hall acoustics
is required.
Before we focus on the requirements that a sound system has to meet let us summarize
the general basics in acoustics that are of significant importance to a sound system designer.

2.2.2 Room Acoustics Fundamentals


Imagine a simple setup with a sound source radiating sound into the environment, and a
receiver being present in the vicinity as well. The strength or amplitude of the signal has
been mentioned earlier (see Figure 2.1). Obviously, a narrow cone of sound energy travels
directly from the source to the receiver (usually an ear or a microphone): this is called the
direct sound. However, the direct sound is not the only sound that can be heard by the
receiver: sound emitted by the source and bouncing off reflective surfaces will travel fur-
ther paths and therefore arrive later in time. More specifically, immediately after the direct
sound a number of discrete reflections will be received and after a short while these reflected
sounds become so dense that they are indiscernible –​this is the so-​called diffuse sound field.
Depending on the relationship between the linear structure dimensions of wall parts
and the incidence sound’s wavelength λ three types of reflections can be distinguished; see
Figures 2.2 and 2.3.
A geometrical reflection happens if the wall structure b < λ, α =​ β (specular reflection
according to the reflection law in the plane perpendicular to the carrier wall). A locally
directed reflection happens if b > λ, α =​ β (specular reflection according to the reflection law,
referred to the effective structural surface). And lastly a diffuse or scattered reflection will be
created if b ≈ λ (no specular reflection, no preferred direction).
In addition to the actual geometry of the reflector its sturdiness (usually a function of the
mass) has to be studied as well to obtain a reflection with as little energy loss as possible. As
a rule of thumb, the higher the mass the more low frequencies are reflected.

Figure 2.2 Wall structure reflection behaviour.

Figure 2.3 Left figure: scattering coefficient in blue, important for simulation routines of sound
propagation. Right figure: reflection patterns for three selected frequencies: reflection at 125 Hz (red), local reflections at 4000 Hz (blue) and scattering at 800 Hz (green).

2.2.3 Assessment of the Quality of Sound Events


A spectator in a concert or a visitor to a congress may judge the acoustic reproduction
quality of a signal emitted by a natural source or via electro-​acoustical devices. This
judgement is often rather unspecific, such as ‘very good acoustics’ or ‘poor intelligibility’.
Such assessments combine objective causes with subjective impressions acquired through
various listening experiences.
For speech, good intelligibility is desired in an acoustic atmosphere, which may also be
influenced by the room itself or by sound systems. Far more sophisticated criteria are applied
for assessment of music reproduction. Depending on genre, ‘good acoustics’ can mean suffi-
cient loudness, good sound clarity or a spatial impression that is appropriate to the piece of
music performed. Moreover, as far as reproduction of traditional music is concerned, only
the ‘natural’ timbre should be perceptible. ‘Natural’ timbre refers to the effect that, for reproduction in halls, high-frequency components are less dominant at a greater distance from the source than at close range.
The criteria governing subjective assessment of speech and music reproduction as well
as definitions of the terminology are illustrated in the literature [2, 3] as well as in national
and international standards [4, 5]. Originally, these terms mainly served to assess room-​
acoustical circumstances and are therefore of significance not just for the communication
between the electro-​acoustical and the room-​acoustical designer, but also for assessing the
electro-​acoustical reproduction itself. Some of the most important terms are explained as
follows:

Acoustic overall impression: Suitability of a room for the acoustic performances.


Reverberation: Sound decay after termination of sound excitation.
Reverberation duration: Duration of perceptibility of the reverberation.
The reverberation duration depends on the objective reverberation (as a parameter of the
room), the excitation level, the level of background noise or the threshold of hearing and
the ratio between direct sound signal and diffuse sound signal. The higher the absorption in the
room, the shorter is the reverberation time. This parameter is frequency-​dependent.
Clarity: Temporal and tonal differentiability of the individual sound sources within a
complex sound event.
Spatial impression: Perception of the interaction of sound sources (ensembles) within
the surrounding environment, including the listener.
The spatial impression arises from several individual parameters: among others, the room-​
size impression, the spaciousness, the reverberance and the spatially balanced distribution of
reverberant sound.
Room-​size impression: individually perceived and sound-​event-​dependent size of the
acoustically perceived room.
Spaciousness: Perception of the acoustic enlargement of a sound source beyond its visual extent, especially in the lateral plane of the listener.
The spaciousness depends on the sound level at the listener location and on the ratio of the
direct sound level versus the reflected sound level arriving up to 80 ms after the direct sound
from lateral directions.
Reverberance: Perception that apart from the direct sound reflected sound is audible
that is not perceived as a distinct repetition of the signal.
Echo: Reflected sound arriving with such an intensity and delay relative to the direct sound that it is perceived as a distinct repetition of the direct sound.


Flutter echo: Rapid and periodical sequence of echoes, occurring in between two par-
allel surfaces.
Diffusion: Evenness of sound field distribution regarding both intensity and direction
of incidence.

2.2.4 Objective Influencing Quantities, Criteria and Quality Parameters in Rooms

2.2.4.1 Reverberation Time, Critical Distance


When inquiring about the quality of acoustics within a room, the expert as well as the
acoustical layman will often employ the ‘hand-​clapping test’, enabling him to excite the
volume of a hall with acoustic energy and to listen to the reverberation decay process. This
is often followed by an estimate of the reverberation time [6], which is the oldest and prob-
ably the most familiar room-​acoustical measurable parameter, although it has been known
for a long time that by itself the reverberation time permits one to obtain no more than a
limited statement on the acoustic properties of a room. Nevertheless, it continues to be one
of the principal criteria and therefore it is the first quantity to be explained here in detail.
The reverberation time (RT) is defined as the time during which the mean steady-​state
energy density w(t) of a sound field in an enclosure will decrease by 60 dB after stopping
the energy supply. It can be shown that the reverberation time depends on the volume and
the total absorption (or sound attenuation capability) of the surfaces of the room, given for
metric units:

RT = 0.163 V / [4mV − S ln(1 − α)] ≈ 0.163 V/A (2.1)

V volume in m³
A equivalent absorption area in m²
α mean sound absorption coefficient (frequency-​dependent)
S total surface of the room in m²
m damping coefficient as a function of air absorption and frequency in m⁻¹ [1]

The equivalent sound absorption area (A) is calculated as

A = α S = Σi αi Si + Σn An + 4mV (2.2)

αi sound absorption coefficient of the partial areas Si


An equivalent absorption area of objects and bodies
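Equations (2.1) and (2.2) are easily evaluated numerically. The following sketch uses a hypothetical hall with purely illustrative dimensions and absorption coefficients, and neglects discrete absorbers An and the air damping term:

```python
def equivalent_absorption_area(surfaces, m=0.0, volume=0.0):
    """Equivalent absorption area A = sum(alpha_i * S_i) + 4*m*V (eq. 2.2),
    neglecting discrete objects A_n; m is the air damping coefficient."""
    return sum(alpha * s for alpha, s in surfaces) + 4.0 * m * volume

def sabine_rt(volume, area):
    """Sabine reverberation time RT ~ 0.163 * V / A (eq. 2.1, metric units)."""
    return 0.163 * volume / area

# Hypothetical 20 m x 15 m x 8 m hall; coefficients are illustrative only.
V = 20 * 15 * 8
surfaces = [
    (0.04, 20 * 15),            # ceiling, plaster
    (0.80, 20 * 15),            # floor, occupied audience
    (0.10, 2 * (20 + 15) * 8),  # walls
]
A = equivalent_absorption_area(surfaces)
print(f"A = {A:.1f} m^2, RT = {sabine_rt(V, A):.2f} s")  # A = 308.0 m^2, RT = 1.27 s
```

Note that the absorption coefficients are frequency-dependent, so in practice this calculation is repeated per octave band.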

The reverberation time cannot be decreased by electro-acoustic measures; however, it can be increased (refer to ‘Electronic procedures for enhancing reverberation and spaciousness’, section 2.7.2).
The sound power absorbed in a room, Pab, can be derived from the relation energy density
w =​sound energy W/​volume V, under consideration of the differential quotient Pab =​dW/​
dt, as a measure of the energy loss in the room from (2.1) and (2.2):

Pab =​ ¼ wr c A.


In steady-​state conditions the absorbed power is equal to the sound power P fed into the
room. Thus, one obtains the average sound energy density in the diffuse sound field of the
room as

wr = 4P / (cA) (2.3)

c velocity of sound

While the sound energy density wr is approximately constant in the diffuse sound field, the
direct sound energy and thus also its density decrease in the near field of the source with the
squared distance r from the source, hence given as

wd = P / (4πr²c) (2.4)

Strictly speaking, this is valid for spherical point sources only, but may at a sufficient dis-
tance from the source also be accepted for most of the practically available loudspeakers, in
which case the energy is considered by the directivity characteristics.
In this zone of dominating direct sound, the sound pressure loss results as p ~ 1/​r. By
doubling the distance r the sound level drops by 6 dB.
If the energy densities of the direct sound and the diffuse sound are equal (wd =​ wr), (2.3)
and (2.4) can be equated, i.e. one can derive a particular distance from the source, the rever-
beration radius rH. Therefore, for a spherical source follows

rH = √[A/(16π)] ≈ √(A/50) ≈ 0.141 √A ≈ 0.057 √(V/RT) (2.5)

For a directional source the eq. (2.5) must be changed to (2.5a) and we obtain the so-​called
critical distance Dc:

Dc = Γ(ϑ) · √Q · rH (2.5a)

Γ(ϑ) angle-dependent directional factor (eq. (3.7))


Q directivity factor (eq. (3.13))
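The reverberation radius and critical distance translate directly into a few lines of code. A sketch with hypothetical values (a 12,000 m³ hall with RT = 1.8 s and a loudspeaker of directivity factor Q = 10, evaluated on axis, Γ = 1); note that the directivity factor enters the distance under a square root, since Q scales energy while distance scales pressure:

```python
import math

def reverberation_radius(volume, rt):
    """Reverberation radius r_H ~ 0.057 * sqrt(V / RT) (eq. 2.5, metric units)."""
    return 0.057 * math.sqrt(volume / rt)

def critical_distance(volume, rt, q=1.0, gamma=1.0):
    """Critical distance D_c = Gamma(theta) * sqrt(Q) * r_H;
    gamma is the angle-dependent directional factor (1.0 on axis)."""
    return gamma * math.sqrt(q) * reverberation_radius(volume, rt)

rH = reverberation_radius(12000, 1.8)
Dc = critical_distance(12000, 1.8, q=10)
print(f"r_H = {rH:.2f} m, D_c = {Dc:.2f} m")  # r_H = 4.65 m, D_c = 14.72 m
```

A directional loudspeaker thus extends the region of dominant direct sound well beyond the reverberation radius of a spherical source.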

In Figure 2.4 the variation of the overall energy density level 10 lg w dB is plotted as a
function of the distance r from the source (w =​ wd +​ wr). In the direct field of the source,
one observes a decrease of 6 dB per doubling of distance. For a directional sound source
with the directivity factor Q this can be expressed as 10 lg wd dB ≈ 10 lg Q dB − 20 lg r dB.
Hence it follows that beyond the critical distance Dc there is an area wherein a constant diffuse-field level 10 lg wr dB ~ − 10 lg A dB prevails. In an absolute free field (A → ∞) the free-field behaviour (6 dB decrease per distance doubling) would continue (dashed line in Figure 2.4).
Figure 2.4 shows that the critical distance can also be derived graphically. The critical dis-
tance thus obtained is Dc =​10 m.
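The composite curve of Figure 2.4 can be reproduced numerically from eqs. (2.3) and (2.4): with the source power factored out, the relative energy-density level is 10 lg[Q/(4πr²) + 4/A]. A sketch with assumed values Q = 1 and A = 200 m²:

```python
import math

def total_level_db(r, q, area):
    """Relative energy-density level of the combined direct + diffuse field,
    10*lg(Q/(4*pi*r^2) + 4/A), following eqs. (2.3) and (2.4) with the
    source power P factored out."""
    w_direct = q / (4.0 * math.pi * r ** 2)
    w_diffuse = 4.0 / area
    return 10.0 * math.log10(w_direct + w_diffuse)

# Assumed values: spherical source (Q = 1) in a room with A = 200 m^2.
for r in (1, 2, 4, 8, 16, 32):
    print(f"r = {r:2d} m -> {total_level_db(r, 1, 200):6.1f} dB")
```

With these assumed values the critical distance lies near 2 m, so the printed levels flatten quickly toward the constant diffuse-field value of about −17 dB.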

Figure 2.4 Sound level in a closed space as a function of the distance from the source.

Figure 2.5 Recommended values of reverberation time at 500 Hz for different room types as a function
of volume.
1 Rooms for oratorios and organ music
2 Rooms for symphonic music
3 Rooms for solo and chamber music
4 Opera theatres, multi-​purpose halls for music and speech
5 Drama theatres, assembly rooms, sports halls

Figure 2.6 Optimum reverberation time at 500 to 1000 Hz (according to Russel and Johnson).

Figure 2.7 Tolerance ranges for the recommended reverberation time values vs. frequency.

For particular applications, objective reverberation time recommendations can be given (Figures 2.5 and 2.6). In addition to the absolute duration, the evenness over frequency is
also a relevant parameter. In a drama theatre the reverberation time RT should be between
1.0 and 1.3 s. In a concert hall of equal dimensions, the nominal values are to be significantly
higher, i.e. between 1.6 and 2.1 s. Figure 2.7 shows the recommended reverberation time as
a function of frequency. In rooms for music performances an increase of reverberation time


is often recommended for frequencies below 250 Hz, producing a warm, round sound pattern. For speech, on the other hand, a flat response or even a decrease of the reverberation time curve below 250 Hz is preferable.

2.2.4.2 Analysis of the Energy-Time Curve


In comparison to reverberation time, the room impulse response provides considerably more
detailed information regarding the energy time behaviour. After being emitted from a sound
source, the direct sound is the first component to reach the listener, followed by the first or
initial reflections, until in the course of increasing density and decreasing level over time
the remaining reflections become perceptible as a diffuse reverberation tail; see Figure 2.8
for a squared impulse response, i.e. an energy time curve.
Figure 2.9 illustrates the behaviour of sound pressure, weighted energy and integrated
energy over time. One can see that after a sufficiently long time the reflected energy arriving
at the listener’s location becomes so weak that for t → ∞ the final value E∞ no longer
increases by a relevant margin.
The energy time curve in Figure 2.9 allows a simple determination of the partial energy
values arriving at the listener’s location, e.g. at 50 ms or 80 ms after the direct sound.
Of the overall energy E∞ that arrives at the listener’s location, only 5 to 15% can be
attributed to the direct sound Ed and 10% to the reverberation tail. About 80% of the
energy is contained in the initial reflections. These reflections significantly shape the

Figure 2.8 Schematic energy time curve ETC.



Figure 2.9 Time behaviour of the sound pressure p(t), of the sound intensity Jτo(t) integrated according
to the inertia of the hearing system, and of the sum energy E(t).

subjective sound impressions, which change from audience area to audience area along
with the variation of these initial reflections. Sound reinforcement systems provide the option to improve unfavourable reflection behaviour, visible in reflectograms (e.g. low direct sound, or missing short-time reflections to support speech intelligibility), by electro-acoustically compensating for the missing energy in those time intervals wherein room-acoustical reflections do not occur.
The threshold between early and late energy portions depends on the genre of the
performance and on the build-​up time and lies at about 50 ms for speech and at about
80 ms for symphonic music, after the direct sound. Early energy enhances clarity, late energy
enhances the spatial impression. Lateral incident energy within a time range of 25 to 80 ms
may even enhance both clarity and spatial impression [7]. This is of crucial importance for
the planning of sound reinforcement systems.

2.2.4.3 Criteria for Speech Intelligibility and Clarity for Music


One of the principal tasks of sound reinforcement engineering is improving speech intel-
ligibility and enhancing clarity for music. Optimization of the systems used to this effect
requires observation of several criteria which will be explained.


2.2.4.3.1 ENERGY TIME MEASURES USED FOR ASSESSING DEFINITION AND CLARITY

The definition measure C50 was derived from speech clarity D [8] as defined by Thiele:
C50 = 10 lg [ ∫(0 → 50 ms) p²(t) dt / ∫(50 ms → ∞) p²(t) dt ] dB (2.6)

This means that the more sound energy arrives at the listener’s seat within the first 50 ms, the higher the speech intelligibility, i.e. the definition. Good speech clarity is generally
given when C50 ≥ 0 dB.
The frequency-​dependent definition measure C50 should increase by approx. 5 dB with
octave centre frequencies above 1 kHz (octave centre frequencies 2 kHz, 4 kHz and 8 kHz),
and decrease by this value with octave centre frequencies below 1 kHz (octave centre fre-
quencies 500 Hz, 250 Hz and 125 Hz).
Extensive investigations were carried out to establish a measure for the clarity of classical
music. It was found that with symphonic and choir music it is not necessary to distinguish
between temporal clarity and tonal clarity (the latter determines the distinction between
different timbres) [9]. Both are equally well described by the clarity measure C80:
C80 = 10 lg [ ∫(0 → 80 ms) p²(t) dt / ∫(80 ms → ∞) p²(t) dt ] dB (2.7)

The value for a good clarity measure C80 depends strongly on the musical genre. For romantic
and most classical music, a range of approximately −3 dB ≤ C80 ≤ +​4 dB is regarded as good,
whereas jazz and modern music will allow for values of up to +6 dB to +8 dB.
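Both measures can be computed directly from a measured room impulse response. A minimal sketch, assuming NumPy is available; the impulse response here is synthetic (exponentially decaying noise) and merely stands in for a measurement:

```python
import numpy as np

def clarity_db(ir, fs, t_ms):
    """Early-to-late energy ratio C_t = 10*lg(early / late) dB, where 'early'
    integrates p^2 from 0 to t_ms and 'late' from t_ms onwards
    (eq. 2.6 with t_ms = 50, eq. 2.7 with t_ms = 80).
    ir must start at the arrival of the direct sound; fs is the sample rate."""
    split = int(round(fs * t_ms / 1000.0))
    energy = np.asarray(ir, dtype=float) ** 2
    return 10.0 * np.log10(energy[:split].sum() / energy[split:].sum())

# Synthetic impulse response: noise with an exponential decay of RT ~ 1.8 s.
fs = 48000
t = np.arange(int(fs * 1.5)) / fs
rng = np.random.default_rng(0)
ir = rng.standard_normal(t.size) * np.exp(-6.91 * t / 1.8)
print(f"C50 = {clarity_db(ir, fs, 50):.1f} dB, C80 = {clarity_db(ir, fs, 80):.1f} dB")
```

In practice the impulse response is band-filtered first, since both measures are evaluated per octave band.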

2.2.4.3.2 MEASURES FOR SPEECH INTELLIGIBILITY

Three major methods to evaluate the quality of intelligibility can be noted:

1. Speech Transmission Index STI (developed by Houtgast and Steeneken in 1972 [10],
1985 [11])
2. Articulation Loss of Consonants Alcons (developed by Peutz and Klein in 1971 [12, 13])
3. Subjective intelligibility tests

As this topic is of major significance for the use of sound reinforcement systems, the details
will be laid out in a separate chapter (Chapter 7).

2.3 Basics in Sound Propagation

2.3.1 Sound Propagation in the Open Air


According to (2.4) the sound pressure level decreases by 6 dB per distance doubling. The
direct sound level of a spherical source is expressed by


Ld = LW − 20 lg r dB − 11 dB (2.4a)

At a distance of r =​0.28 m from the assumed point source, the sound pressure level Ld and
the sound power level LW are equal. At just 1 m distance (reference distance) both levels
already differ by 11 dB, i.e. with a sound power of 1 W (⇒ LW = 120 dB sound power level) the sound pressure level amounts to just Ld = 109 dB.
For open-​air installations where the distance between loudspeaker and listener may be
exceptionally large, an additional propagation attenuation Dr depending on temperature
and relative humidity must be considered. In this case the sound pressure level at distance r
is calculated for the assumed point source as

Ld = LW − 20 lg r dB − 11 dB − Dr dB (2.4b)

Curve 3 in Figure 2.10 illustrates the average value of the empirically derived curve
family that should be used in practice. It reveals that up to a distance of 40 m no add-
itional attenuation needs to be considered. This for instance applies for nearly all indoor
rooms. The additional attenuation Dr increases with increasing frequency (Figure 2.11).
This behaviour needs to be taken into account when designing a sound reinforcement
system.
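Equation (2.4b) is straightforward to evaluate. In this sketch the Dr value for the distant seat is purely illustrative; in practice it would be read, per frequency band, from curves such as those in Figures 2.10 and 2.11:

```python
import math

def direct_level_db(lw, r, dr=0.0):
    """Direct sound level of a point source, L_d = L_W - 20*lg(r) - 11 dB - D_r
    (eq. 2.4b); lw: sound power level in dB, r: distance in m,
    dr: additional propagation attenuation in dB (~0 up to about 40 m)."""
    return lw - 20.0 * math.log10(r) - 11.0 - dr

print(direct_level_db(120, 1))            # 109.0 -> 1 W source at the 1 m reference
print(direct_level_db(120, 100, dr=3.0))  # 66.0  -> distant seat, D_r assumed 3 dB
```

The 6 dB drop per distance doubling is built into the −20 lg r term.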
Owing to the heat expansion of the air, the speed of sound increases by about 0.6 m/​
s per degree Kelvin. This implies that in a layered atmosphere in which the individual
air layers are of different temperatures, the sound propagation is modified accordingly
(Figure 2.12) [14].
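The stated gradient of about 0.6 m/s per kelvin corresponds to the common linear approximation c(θ) ≈ 331.3 m/s + 0.6 θ, with θ in °C; a sketch:

```python
def speed_of_sound(theta_c):
    """Linear approximation c ~ 331.3 m/s + 0.6 m/s per degree Celsius."""
    return 331.3 + 0.6 * theta_c

# A 10 K difference between air layers changes c by about 6 m/s, enough to
# bend a wavefront as sketched in Figure 2.12.
print(f"{speed_of_sound(20) - speed_of_sound(10):.1f} m/s")  # 6.0 m/s
```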

Figure 2.10 Additional propagation attenuation Dr with varying atmospheric situation as a function of distance r.
Range 1–​2 very good (dusk)
Range 2–​3 good (overcast)
Range 3–​4 mediocre (mean solar radiation)
Range 4–​5 poor (heavy solar radiation)
Range 5–​6 very poor (desert heat)

Figure 2.11 Atmospheric damping Dr as a function of distance r at 20°C, 20% relative moisture and
very good weather.
Parameter: frequency ƒ

Figure 2.12 Sound propagation influenced by temperature. (a) Negative temperature gradient; sound
speed decreases with increasing height. (b) Positive temperature gradient; sound speed
increases with increasing height.

Where the air is warmer near the ground and colder in the upper layer, an upward diffraction
of sound takes place so that sound energy is withdrawn from the ground-​near transmission
path with increasing deterioration of the propagating conditions (Figure 2.12a). This case
occurs for example with strong sunlight on plain terrain as well as in the evening over water
surfaces which were warmed up during the day. Inverse conditions prevail with cool air
at ground level and warmer air in the upper layer, as is the case over snow areas or in the
morning over water surfaces. Under these conditions sound energy is diffracted from the
upper layers down to the lower layers (Figure 2.12b).
Given that wind speeds are relatively low compared to the speed of sound (wind speed
in a storm is approx. 25 m/​s, while the average speed of sound c =​340 m/​s), sound propaga-
tion normally is not significantly influenced by wind. However, due to the roughness of
the Earth’s surface the wind speed is lower at ground level than in higher layers, so sound

Figure 2.13 Sound propagation against the wind (a) or with the wind (b); wind speed increasing in
both cases with height.

propagation may indeed be modified by the wind gradient in a similar manner as by the tem-
perature gradient (Figure 2.13). Thus, speech intelligibility may be significantly decreased
by a very turbulent and gusty wind [14].

2.3.2 Loudness Perception and Masking Effect


Loudness perception is limited downwards by the threshold of hearing and upwards by the
threshold of pain.
The phenomenon of the threshold of hearing originates from the fact that a certain
minimum sound pressure is required for producing a hearing impression. At 1 kHz, this min-
imum sound pressure, averaged over a large number of people, amounts to

po = 2 · 10⁻⁵ Pa = 20 μPa

or a sound intensity of

Jo = 10⁻¹² W/m².

Following international standardization, these threshold values correspond to a sound pressure level of 0 dB. The hearing threshold is a function of frequency and rises significantly for lower frequencies, as can be seen in Figure 2.14. This mechanism prevents permanent audible disturbance by natural phenomena such as air turbulences and overreaching
of sound transmissions.
The upper limit of sound perception is given by pain caused by the ‘clipping protection’
of the auditory system (disengagement of the ossicles). This limit lies at 10⁶ times the sound pressure or 10¹² times the sound intensity of the audibility threshold value at 1 kHz. Even
before this level is reached non-​linear distortions occur which begin at a level of about 90
dB. Loudness perception widely follows a logarithmic law (the Fletcher-Munson graph).
One particular scale for loudness perception is the subjective ‘phon’ scale according
to Barkhausen [15]. This scale was established by comparing a 1 kHz tone with a tone
of another frequency and adjusting it to perceived equal loudness. Hence curves of equal
perceived loudness are obtained which are similar to the curves shown in Figure 2.14 with
some individual variations.
Note that the term ‘phon’ may be used only for such subjectively ascertained loudness
values. Due to the somewhat arbitrary standardization it follows for 1 kHz:

LN =​ L =​20 lg(pN/​po) phon =​20 lg(p/​po) dB =​10 lg(J/​Jo) dB. (2.8)



Figure 2.14 Curves of equal loudness level for pure sounds.

Figure 2.15 Frequency weighting curves recommended by the IEC 61672 for sound level meters.

The lower sensitivity of the auditory system to low and high frequencies at low sound levels is approximated, when determining loudness values, by weighting curves that act as filters in sound level meters, as standardized internationally (Figure 2.15). According to IEC
61672, the A-​weighted curve approximately corresponds to the sensitivity of the ear at 30
phons, whereas the B-​weighted curve and the C-​weighted curves correspond more or less to
the sensitivity curves of the ear at 60 phons and 90 phons, respectively [16].


The A curve, which is selectable in any sound level meter or app, is also of importance for sound reinforcement engineering, e.g. for measuring the sound level distribution and speech intelligibility of sound reinforcement systems in noisy environments, and
also in the vicinity of supply ducts of ventilation and air-​conditioning systems (including
e.g. air outlets in the backs of chairs). The A-​weighting curve is recommended in such
cases to avoid mismeasurement caused by air turbulences and thereby induced low-​
frequency sound.
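The weighting curves of Figure 2.15 are defined analytically in IEC 61672. As a sketch, the A-weighting gain can be computed from the standard's analogue transfer function (the +2.00 dB term normalizes the curve to 0 dB at 1 kHz):

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB from the analogue magnitude response defined
    in IEC 61672; +2.00 dB normalizes the curve to 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00

for f in (125, 1000, 4000):
    print(f"{f} Hz: {a_weighting_db(f):+.1f} dB")  # approx. -16.2, +0.0, +1.0 dB
```

The strong low-frequency attenuation (about −16 dB at 125 Hz) is what makes the A curve useful for suppressing turbulence-induced low-frequency noise in such measurements.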
If several tones or noises of different neighbouring frequencies and at different loudness
levels are simultaneously present, additional audio content may under certain conditions
remain inaudible, although its sound level lies above the threshold of hearing. This occurs
when the weaker noise is masked by the louder one [17, 18, 19]. The audibility threshold
of the weaker acoustical stimulus is determined by the masked threshold of audibility.
Figures. 2.16 and 2.17 illustrate the masking effect under different excitation conditions.
This effect is based on the fact that on the basilar membrane of the human ear not only
the narrow range corresponding to the stimulating frequency is excited, but, with greater
loudness levels and to an increasing degree, the neighbouring ranges as well. As can be seen
from the figures, this effect is significantly stronger towards higher frequencies than towards
the lower ones.
The masking effect is of great importance for sound reinforcement engineering. One
consequence of it is that narrow-​band frequency response dips and peaks which often occur
caused by loudspeaker interferences are usually inaudible, whereas higher-​frequency peaks
in the response may give rise to considerable timbre changes due to the masking of the

Figure 2.16 Excitation level LE on the basilar membrane caused by narrow-​band noise with a centre
frequency of 1 kHz and a noise level of LG (indicated on the x-axis between 8 and
9 Bark).
z subjective pitch

Figure 2.17 Excitation level LE over the subjective pitch z (position on the basilar membrane), when
excitation is done by narrow-​band noise of LG =​60 dB and a centre frequency of ƒc.
LT resting threshold

Figure 2.18 Calculation of the overall level from two individual levels.

neighbouring ranges. The inaudibility of weak background noises in the presence of much
louder audio signals may be attributed to the masking effect as well.
In this respect it is also interesting to know how to arithmetically summarize sound levels in a given spectrum. Since levels are logarithmic quantities, it is necessary to add the p²-proportional energy contents rather than the levels themselves. This is simple if n incoherent sound sources are of the same level and same spectrum, resulting in

Ltotal = L + 10 lg n dB. (2.9)

With sound stimuli arriving at the listener’s position in addition to the direct sound from
sound reinforcement systems (simultaneously or briefly delayed) the sound components to
be added are often of different levels. For ascertaining the overall level, one may use the
nomogram given in Figure 2.18.
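Both eq. (2.9) and the nomogram of Figure 2.18 follow from summing the p²-proportional energies; a sketch:

```python
import math

def sum_levels_db(levels):
    """Energetic sum of incoherent levels: L = 10*lg(sum(10^(L_i/10))) dB."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in levels))

print(f"{sum_levels_db([80, 80]):.1f} dB")  # 83.0 dB: two equal levels add 3 dB
print(f"{sum_levels_db([80, 70]):.1f} dB")  # 80.4 dB: a level 10 dB down adds ~0.4 dB
```

As the second line shows, a contribution more than about 10 dB below the dominant one changes the overall level only marginally.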

2.3.3 Echo Behaviour of the Auditory System


When examining the question of at which amplitude and pause duration two identical signals merge into one single auditory impression, or, conversely, are perceived as two individual signals, one comes to the conclusion that a relaxation period (time constant) of about 30 ms describes the energetic behaviour of the auditory system well [20]. Internationally, τ = 35 ms has been agreed upon.


Thus consideration of the relaxation time of 35 ms in an impulse sound level meter [19]
is a good compromise between merging (long time period of observation) and individual
rating (short pulses with intervals).
This also makes it possible to determine the perceptibility of echoes. If a new sound signal
of the same level as the previous excitation arrives after these 35 ms, it will be audible as a
separate sound. It will then no longer be integrated within the previous signal but it will be
perceived as a new one. Important in this respect are comparable levels of the reflections,
and the duration of the impulses combined with the intervals between them, as well as the
occurrence of other reflections in the interval between them (the filling-​up of the interval
with signal repetitions); refer to Dietsch [21].
Only a consciously audible signal repetition is referred to as an echo. An empirical value
in this regard is a delay of 100 ms. This time is required for recognizing a sound (such as a
syllable) as a repetition. But repetitions with shorter delays produce a disturbing effect as
well, as they reduce intelligibility and cause a timbre change. A limit of 50 ms is assumed
as a guide value for the blurring threshold in speech (100 ms correspond to a path length
difference of 34 m, 50 ms to one of 17 m).
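The delay-to-distance conversions quoted above follow directly from the speed of sound; a minimal sketch, assuming c ≈ 343 m/s:

```python
# Delay-to-path-length conversion behind the 100 ms / 34 m and 50 ms / 17 m
# guide values, assuming a speed of sound of about 343 m/s in air.
C_AIR_M_S = 343.0

def path_difference_m(delay_ms):
    """Path-length difference corresponding to a given delay in milliseconds."""
    return C_AIR_M_S * delay_ms / 1000.0

print(round(path_difference_m(100)))  # 34 m (echo threshold)
print(round(path_difference_m(50)))   # 17 m (blurring threshold for speech)
```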

2.3.4 Acoustic Localization


Acoustic localization is achieved mainly by the binaural structure of the auditory
system, or simply the presence of two ears. For sound sources placed in the median
plane at the listener position (vertical plane, perpendicular to the floor; front, above,
behind) the travel time differences between the two ears are identical and hence do
not contribute to localization, as both ears are equidistant. Monaural localization of
a sound event is nevertheless possible on account of so-​called directional frequency
bands. These result from the frequency-​dependent shadowing by the pinna (ear shell)
and allow assigning directionality (‘behind’, ‘above’ or ‘front’) to narrow-​band noises
(cf. Blauert [22, pp. 108 ff.]).
Binaural localization is simpler to explain. Lateral sound incidence from outside the
median plane produces a pressure increase in the directly impacted ear in the medium and
upper frequency ranges, whereas the shadowed ear is less impacted. The auditory system is
thus more sensitive for lateral (sideways) sound incidence than for frontal incidence. In
addition to the level difference, the ear closer to the sound source will receive the sound
sooner than the other ear. The direction of incidence therefore is determined by time and
level differences at the ears. Figure 2.19 shows the travel time differences and Figure 2.20
the level differences at the ears.
Investigations [23] have revealed that with frequencies below 300 Hz directionality is
mainly determined by travel time differences, and with frequencies above 1 kHz by level
differences.
In investigations on the ‘influence of a simple echo on the intelligibility of speech’,
Haas established that the ear assigns the origin of an auditory event to the direction from
which the very first sound waves arrive [24]. This applies for finite time and level
differences only though. The precedence effect derived from this and other investigations
establishes that the perception of the direction of incidence is determined by the very first
arriving signal. This is still the case if the level of the secondary signal (the repetition of the
first one) is increased up to 10 dB and arrives within 30 ms (Figure 2.21). Only at travel time
differences of t > 40 ms does the ear slowly begin to notice discrete repetitions or reflections,

40 Wolfgang Ahnert and Dirk Noy

Figure 2.19 Travel time differences of the sound reaching the ears, as a function of the incidence angle.

Figure 2.20 Variation of the sound level difference with discrete frequencies and horizontal motion of
the sound source around the head.

but continues localization by the primary event. The blurring threshold (see above) is at
about 50 ms. With longer delay times (30 ms and longer) timbre changes are noticeable and
at ≥ 50 ms distinct echoes (‘chattering effect’) are audible.
The precedence effect is frequently used for the localization of sources in sound reinforce-
ment systems. This will be covered in detail in Chapter 5.
If identical signals from two loudspeakers show a mutual delay of max. 3 ms at the point
of arrival they merge into one single signal as far as localization is concerned. The resulting
effect is a phantom source between the two directions of incidence, called a summing local-
ization effect. If neither a delay nor a level difference exists between the two signals, the
phantom source is located on the bisecting line between the directions towards the two
loudspeakers. Delay or attenuation of one of the two loudspeaker signals causes a shift of
the phantom source determined by the summing localization effect and leaning towards the
loudspeaker radiating either the earlier or the louder signal.

Figure 2.21 Critical level difference ΔL between reflection and undelayed sound producing an appar-
ently equal loudness impression of both (speech) signals, as a function of the delay time Δt.

2.4 Development of and Requirements for a Sound Reinforcement System


A modern sound reinforcement system is confronted by a great number of general
requirements and these may be different from application to application. Certain basic
requirements, however, have to be complied with in all applications.
Until the early twentieth century, it was known that under certain conditions (rever-
berant spaces, long distances, noisy environments) the human voice alone is unable to
cover the intended areas where a talker wants to give a speech. The first documented appli-
cation of a sound system with a Magnavox system (10 W power!) was a Christmas concert
for 100,000 people on 24 December 1915 in the San Francisco City Hall.
In 1919, Woodrow Wilson, the 28th President of the United States of America, used a
similar installation to address 75,000 people in San Diego (Figure 2.22).
These types of systems became possible after the invention of the components of a sound
reinforcement system:

• first loudspeaker with moving coil principle by Oliver Lodge, 1898


• invention of the electronic vacuum tube as a precondition for the construction of an
amplifier by Lee DeForest, 1906
• invention of the first condenser microphone with a built-​in pre-​amplifier by Edward
C. Wente in 1915, patented in 1920

Soon it became obvious that the sound level produced by the system within the audience
area of the hall or in the open air must be sufficiently high. The following quantities are
related herewith:

• loudness (expectation value, required sound level)


• performance of the sound reinforcement system
• adequate signal to-​noise ratio within the audience area (dynamic range of reproduction)

Figure 2.22 Use of a Magnavox sound system in 1919 to address 75,000 people in San Diego.

Figure 2.23 Mass event in Germany in the 1930s with the then newly developed horn loudspeakers
and condenser microphones.

From the 1920s engineers strived to improve the components of a sound system step by step. In
the US, the development of horn and column loudspeakers was mainly driven by the expanding
movie industry, while during the Nazi regime in Germany efforts were being undertaken to
engineer sound systems to cover large halls and fields for mass events. New microphone types
based on the dynamic and electrostatic principle were developed as well; refer to Figure 2.23.
A rapid development of new loudspeaker types took off, mainly horn loudspeakers in the
US and line columns in Europe.
In 1957 a comprehensive book on sound system design was published: Petzold [25]
indicated the first basic design guidelines for sound coverage in stadia and theatres
(Figure 2.24). The guidelines covered:

• the arrangement of the loudspeakers


• the directivity characteristic of the loudspeaker
• the reverberation of the room in which the reproduction takes place
Figure 2.24 First sound reinforcement guidelines for stadia and theatres in 1957.

Figure 2.25 Schematic diagram of the magnetic tape delay system with perpetual tape loop, record
head to print the original signal and various reproduction heads at varying distances
down the tape loop to obtain a variation of delay times.

In the 1960s Olson introduced the first simple analogue tape delay lines (Figure 2.25) in
sound systems, which subsequently were further developed with the introduction of digital
delay units as a critical component of sound systems to compensate for travel time [26].
In 1975 Don Davis published his book ‘Sound System Engineering’ [27], which introduced
calculations regarding required acoustic gain and potential acoustic gain (the latter in con-
sideration of the positive acoustic feedback; Figure 2.26).
In the early 1980s the book ‘Basics in Sound Reinforcement’ was published in
German [28]. Around that period now-​available computers were starting to be used to simu-
late acoustic and sound system relationships during the planning phase of a space, before the
installation was actually executed.
Acoustic calculation software packages like Odeon, CATT-​Acoustics and EASE became
available.
In sound systems design, the following requirements could be found:

• desired sound pressure level


• definition of speech, clarity of sound reproduction
• localization of sound sources, use of delay systems
• required reverberated to direct ratio of the perceived signal
• absence of echoes

It was now state of the art to obtain sufficient naturalness of a transmission, or in other
words a transmission in which only a desired timbre change is perceptible. In the early
1990s contributing factors in this respect were:

Figure 2.26 Basic measures for first sound reinforcement and first feedback considerations.

• the timbre depending on the transmission range and the frequency response of the
signal transmitted
• the required frequency response
• absence of distortions

Sound reinforcement systems should be sufficiently insensitive to positive acoustic feedback.


This includes the following investigations:

• the level of loop amplification


• the level conditions around the microphone
• the directivity characteristic of the microphones and loudspeakers

In the beginning of the 1990s the first line array loudspeaker became available (Heil and
Urban, 1992 [29]) and in 1998 the first electronically controlled line column loudspeakers
were introduced (Duran Audio, 1998 [30]); compare Figure 2.27.
After the events of 11 September 2001, the focus of new developments and designs was
directed to high-​performing emergency sound systems. The updated standard IEC 60268–​
16 was published to specify the requirement of high speech intelligibility in public spaces.
In the existing acoustical software simulation packages, the influence of noise impact and
masking on speech intelligibility had to be considered and included. New loudspeaker design
developments can be observed as well such as highly directional loudspeaker arrangements
based on wave field synthesis [31].
In 2015 complex manipulation of sound fields was introduced in modern sound system
installations. Sound sources on stage should be correctly localized throughout the area [32],
while the concept of ‘immersive’ sound reproduction includes not only loudspeakers for
sound level coverage and speech intelligibility but also the creation of a natural acoustic
soundscape surrounding a listener in rooms and halls as well as in open-​air installations [33].
Figure 2.27 (left) 3 dB attenuation by line array principle (C. Heil and M. Urban), (right) digitally controlled line column by Duran Audio.


Today, a sound reinforcement system is expected to meet the following requirements
while considering the room-​acoustic properties of the space:

• improvement of speech intelligibility and clarity


• extension of the dynamic range
• improvement of the acoustic balance between the different parts of a performance
(speech, vocal and instrumental music)
• consistency between visual and acoustical localization impression of stage events, also
if the action and reception areas are large-​sized and of complex geometry
• acoustic control of complex room geometries
• extensive inclusion of the audience area in the performance activity
• modification of the room-​acoustical parameters of the reproduction room

All these requirements are the basis for sound reinforcement design work and serve as an
intermediary between the requirements of the client and the boundaries of technical feasi-
bility. In this context it is required to be informed in as much detail as possible not only
about the technological requirements to be met by the system, but also about the room-​
acoustical conditions under which the sound system must work. An optimal solution can
be achieved by this process.

2.5 Integration of the Sound Reinforcement System in the


Architectural Design
The integration of a sound system in the architectural design of the room is one of the
first items to be considered when starting the design of a sound reinforcement system. For
a new building or a fundamental modernization of a hall or an open-​air venue, the sound
system designer determines the following parameters, ideally in cooperation with the room-​
acoustic consultant if available:

• the desirable reverberation time


• the spaciousness that can be expected
• the required clarity and speech intelligibility
• the measures required for avoiding echoes

These parameters are put together in a requirements document and are used as target
parameters for the new design.
Additionally, the design must consider all factors interfering with visibility conditions
for the audience. This is of particular importance as the loudspeakers are directed towards
the audience and thus are mostly arranged in an area susceptible to architectural or interior
design. Visual screening or ‘hiding’ of loudspeakers is possible only to a limited degree and
entails several acoustical problems like limited transmission range or poor sound coverage.
The requirements may be quite varying. For example, in sports facilities it is particularly
important to avoid impairing visibility onto the playing field. Requirements regarding field
clearance, as specified by international sport associations, must also be considered.
In modern multi-​purpose halls, a visible arrangement of the loudspeakers is quite accept-
able (Figure 2.28), but the overall architectural design should not be disturbed nor must
the lighting and video projection facilities be impaired. Complicated conditions may arise
in highly prestigious or historically protected spaces. The example in Figure 2.29 illustrates

Figure 2.28 Multipurpose hall with line array clusters.

Figure 2.29 Hidden loudspeakers in the portal area behind the blue fabric.


how loudspeakers were concealed in a reconstructed and historically protected theatre hall.
Foyers and restaurants of hotels or historical and museum buildings can be approached
similarly.
Acceptable solutions can only be achieved by close cooperation between the sound
reinforcement consultant and the architect. In certain cases, the client makes initial
decisions based on specific priorities. To avoid later installation deficiencies, the acoustical
expert should also be consulted in the case of possible architectural modifications. The
following parameters and issues are relevant:

• position and radiating direction of the loudspeakers


• the type of loudspeaker or arrays to be used (dimensions, mass, size of the radiating
surface)
• the potentially required aesthetic covering of the loudspeakers. The covering should be
as acoustically transparent as possible at all frequencies. In this respect it is often
necessary to perform measurements with samples of selected cover materials

The impact of microphones on the architectural design is of minor importance. Nevertheless,


they should be included at an early stage in the architectural conception of a room, espe-
cially if microphone hoists or stands are to be used.
Most problematic is the arrangement of a sound control facility in the auditorium. The
‘Front of House (FOH) mixing position’ must be located at an acoustically representative
point within the audience area to enable an assessment of the overall acoustic impression
for the audience. To facilitate assessment of the balance between the two sides of a hall, it
should moreover be placed on the acoustic symmetry axis of the audience area. Integrating
such a work position with a mixing desk and auxiliary equipment inconspicuously as com-
monly demanded is often rather challenging. Therefore, objections are often raised against
the installation of such a mixing console. Final mixing of the transmitted signal within the
audience area becomes, however, necessary in all cases where

• use of a large number of microphones is to be expected


• only short rehearsal times are available for optimizing the acoustic pattern and audio
mix, or
• the acoustic impression of the room is to be optimized during the performance

These conditions are nearly all relevant for medium and large multi-​purpose halls. The
exposed position of such an audio mixing console, the loss of good audience seats, the
necessity of providing easy accessibility without disturbing the audience or being disturbed
by them, as well as the avoidance of visibility impact involve a number of design challenges
which have to be solved in close cooperation between the architect, the manager of the
venue and the sound system designer.
With smaller halls and cultural centres, the situation may be different. Sound systems
therein are usually controlled from a separate audio control room. However, hook-​up points
for mobile mixing consoles should be available in the audience area when required.
In theatres the conditions are similar: hook-​up points for mobile or permanently installed
mixing consoles within the audience area are required for the adjustment of the loud-
speaker system during rehearsals and for the performance of musicals and other productions
requiring a high standard of audio engineering.


In concert halls it is usually sufficient to have connection points for mobile mixing
consoles, which are mainly required for the reproduction of electronic music.
Although the architectural implications of mobile mixing consoles are not as significant
as of stationary consoles, their location should nevertheless be specified in close cooperation
between the architect and the sound system designer.

2.6 Acoustic Feedback


As is well known, any electrical circuit which contains active elements can fall into self-​
excitation behaviour whenever the output signal is fed back into the input. This means
that undamped oscillations occur, and the amplitudes of these oscillations could –​given
ideal boundary conditions –​grow almost into infinity. In reality, however, the non-​linearity
of active and passive elements as well as the finite power of the voltage supply limit the
amplitudes to a threshold value. The self-​excited vibrations usually have a sinewave form,
and the proven term ‘feedback’ is used. When the so-called ‘howling’ or ‘whistling’ is
meant, the phenomenon is described as ‘positive feedback’.
Acoustic feedback is characterized by several specific features:

(a) The feedback loop of the electroacoustic amplification circuit does not only contain an
electrical part but also an acoustically audible part.
(b) It is practically impossible to separate the feedback path in the room by subdiv-
iding it into individual parts (e.g. electroacoustic installation, acoustical path in
the room).
(c) The feedback can occur via numerous loops and paths; the nature of acoustical feed-
back is more complex than that in purely electrical networks.

2.6.1 Mathematical and Physical Basics

2.6.1.1 General Information


An original sound source generates a sound signal with the spectrum A(ω) that is picked
up by a microphone (Figure 2.30). This signal A(ω) obviously goes straight to the listener
as a direct sound transmission through the room (sound transmission coefficient β0). In
addition, the signal A(ω) from the microphone (sound transmission coefficient β1) is also
amplified and reproduced by the loudspeaker, which then projects the signal to the listener
(sound transmission coefficient β2), and it is also fed back into the microphone (sound
transmission coefficient βR).
The sound signal at the listener position with spectrum B(ω) is calculated as follows:

B(ω) = A(ω) · (β0 − µβRβ0 + µβ1β2) / (1 − µβR) (2.10)

B(ω) = (β0 − µβRβ0 + µβ1β2) · A(ω) · Σi=0…∞ (µβR)^i (2.10a)

( µ includes all transducer constants).
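Equation (2.10a) expands the closed loop of (2.10) into a geometric series in µβR; numerically, the partial sums converge to the closed form 1/(1 − µβR) only while the loop is stable. A minimal sketch with a purely illustrative value of µβR:

```python
# Numerical sketch of Eq. (2.10a): the closed feedback loop expanded as a
# geometric series in mu*beta_R. The series only converges (stable system)
# while |mu*beta_R| < 1; the value below is purely illustrative.
mu_beta_r = 0.6 + 0.2j                      # open-loop gain at one frequency
closed_form = 1 / (1 - mu_beta_r)           # denominator of Eq. (2.10)
partial_sum = sum(mu_beta_r ** i for i in range(200))

print(abs(partial_sum - closed_form) < 1e-12)  # True: both forms agree
```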



Figure 2.30 Feedback circuit.

Whether the feedback system remains stable or becomes unstable depends on the term

Σi=0…∞ (µβR)^i

Because µ βR is frequency-​dependent, the condition applies for all frequencies and not
only for the average values of the so-​called open-​loop amplification. With growing µ βR
the value B(ω) becomes very large. As βR and µ are complex numbers, the term can be
expressed as follows:

µβR = G e^(jϕ)

In doing so, G and ϕ can be regarded as the amplification and phasing of the signal in a
closed loop. G and ϕ generally vary with the frequency. The feedback system is always stable
when the following two conditions are fulfilled:

Im{µβR} ≠ 0

Re{µβR} < 1, i.e. G < 1

2.6.1.2 Feedback in a free sound field


The first investigations of the acoustic feedback effect of a simple transmission system in
a free sound field were carried out by Bürck [34] in 1938. During his investigations Bürck
assumed that the distance between the microphone and the loudspeaker is large compared
to the dimensions of the transducers and that the wavelengths are larger than the diameter
of the microphone membrane (frequencies ≤ 10 kHz). Within these conditions the distance
d (from loudspeaker to microphone) can be expressed as

d = λ (n + ϕ/2π)


Figure 2.31 Feedback curve in open air.

λ – wavelength of sound in air
n – positive integer (1, 2, …)
φ – phase angle

Feedback may happen with ϕ =​0 at the following frequencies:

fn = n · c/d ; n = 1, 2, 3, …

Acoustic positive feedback occurs at these frequencies fn (Figure 2.31). As c is always con-
stant, and d is constant for a certain configuration, f1 represents the fundamental frequency
at which the positive feedback sets in. It will then reappear at all integer multiples of this
fundamental frequency. A formerly even frequency response of an installation hence will
be modified by the feedback in a comb filter type pattern of periodic reoccurrence within the
audio spectrum.
Feedback occurs when the so-called loop gain |µβR| = |vS(f)| → 1. By increasing or
reducing the distance d, positive feedback sets in periodically (loop amplification vS → 1 is
assumed).
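The comb of feedback-prone frequencies fn = n·c/d can be listed directly; c ≈ 343 m/s and the function name are illustrative assumptions:

```python
# Comb-filter feedback frequencies in the free field: with phase angle phi = 0,
# positive feedback can set in at f_n = n * c / d, i.e. the fundamental c/d
# and all its integer multiples. c = 343 m/s is assumed.
C_AIR_M_S = 343.0

def feedback_frequencies_hz(d_m, n_max):
    """First n_max feedback-prone frequencies for loudspeaker-microphone distance d."""
    return [n * C_AIR_M_S / d_m for n in range(1, n_max + 1)]

# Loudspeaker 2 m from the microphone:
print(feedback_frequencies_hz(2.0, 3))  # [171.5, 343.0, 514.5]
```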

2.6.1.3 Feedback in Closed Rooms


When installing sound reinforcement systems in rooms, the acoustic feedback is a result
of wall reflections of various intensity and the effect is more dominant than outdoors.
Figure 2.32 illustrates the conditions. As in Figure 2.30, the sound transmission
coefficients βi and the amplification µ have been used here as well. The sound signal at
the listener position H with the spectrum B(ω) is calculated again according to (2.10), if a
signal with the spectrum A(ω) is radiated at the speaker position. For feedback occurrence
the coefficient βR is determining (µ assumed to be frequency-​independent). It should be
taken into consideration that βi are only schematically indicated in their position; the
entire room contributes to the frequency-​dependent nature of βi (numerous reflections and
transmission paths).

Figure 2.32 Sound transmission paths in closed room.

Figure 2.33 Fragment of a frequency-​dependent transmission curve.

Whereas in a free sound field a comb filter curve characterizes the frequency behaviour
of the sound transmission from the loudspeaker to the microphone, many such comb filter
curves simultaneously act in the room due to reflections off the room’s limiting surfaces (a
sum of infinitely many curves) as a sound transmission curve between the loudspeaker and
the microphone. These statements, however, do not just apply to coefficient βR, but for all βi
in the room. Comprehensive investigations have been carried out regarding the frequency
dependence of these transmission curves (i.e. frequency curves, sound transmission curves
of the room, marked as coefficients βi in Figure 2.32). Schroeder, Kuttruff and Thiele [35,
36, 37] showed that the statistical parameters of frequency curves in different rooms above a
threshold frequency fL are equal and depend only on the reverberation time RT.

fL = 2000 · √(RT / V) (2.11)


If RT is specified in s and V in m3, fL results in Hz. For example: A volume of 22,000 m3 and
a reverberation time RT of 2 s result in fL ≈ 20 Hz.
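The worked example can be reproduced directly from equation (2.11); the function name is an illustrative choice:

```python
import math

def schroeder_frequency_hz(rt_s, volume_m3):
    """Threshold frequency f_L = 2000 * sqrt(RT / V)  (Eq. 2.11),
    RT in seconds, V in cubic metres, result in Hz."""
    return 2000 * math.sqrt(rt_s / volume_m3)

# Worked example from the text: V = 22,000 m^3, RT = 2 s
print(round(schroeder_frequency_hz(2.0, 22000), 1))  # 19.1, i.e. fL ≈ 20 Hz
```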
Figure 2.33 shows a fragment of such a frequency curve. For instance, a peak occurs at
1025 Hz, i.e. the amplification µ of the amplifier must be configured in such a way that
the loop amplification ∣ µ βR∣ =​ ∣vS(f)∣ does not reach or exceed the value 1 (or 0 dB full
scale).
It is an interesting question by how many dB the peak value of a frequency curve exceeds
the average value. As this nearly frequency-​independent behaviour of transmission curves
is crucial for the appearance of positive acoustic feedback, the calculation of this level diffe-
rence will be demonstrated in the following section.

2.6.1.3.1 DIFFERENCE BETWEEN THE AVERAGE VALUE AND THE PEAK VALUE OF A FREQUENCY CURVE

The peak values of the transmission curve shown in Figure 2.33 are the frequencies where
feedback will occur if the average amplitude of the transmission is appropriately enhanced.
The level difference between the peak and the average value of a transmission curve is of
interest: It has been demonstrated [37, 38] that –​depending on the reverberation time RT
and the bandwidth B of the signal that is transmitted –​feedback does occur when the peak
value exceeds the average value by a specific difference ∆L:

∆L = 10 log10 (ln N) dB

The number N of values into which the investigated frequency range of the bandwidth B is
subdivided results in N = 0.1 · B · RT. Calculations have found the relationship for ∆L in dB
shown in Figure 2.34.
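The expected excess of the peak over the average value follows directly from these two relations; a minimal sketch (the ε safety margin read from Figure 2.34 is not included):

```python
import math

def peak_above_average_db(bandwidth_hz, rt_s):
    """Expected excess of the peak over the average of a room transmission
    curve: dL = 10*lg(ln N) dB with N = 0.1 * B * RT."""
    n = 0.1 * bandwidth_hz * rt_s
    return 10 * math.log10(math.log(n))

# Example: B = 10 kHz, RT = 2 s -> N = 2000
print(round(peak_above_average_db(10000, 2.0), 1))  # 8.8 dB
```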

Figure 2.34 Difference of the peak and average value of the sound transmission curve, with
B – bandwidth in Hz, RT – reverberation time in s, ε – error probability.


The probability that the used maximum value of the transmission curve represents the
real peak value is expressed as 1−ε. The curve parameter is the product of the bandwidth B
and the reverberation time RT. Figure 2.34 shows that even with B·RT = 35,000 the peaks
of the transmission curve will not exceed the average value by more than 11–12 dB with a
probability of 1−ε = 99.5% (ε = 0.005). Operating with a feedback reserve of 3 dB to avoid
linear distortions, tone colouration etc., the average value of the sound transmission curve
should be kept 15 dB under the peak value.

2.6.1.3.2 DIFFERENCE BETWEEN THE AVERAGE VALUES OF THE FREQUENCY CURVE AND THE POSITIVE
FEEDBACK THRESHOLD LEVEL

For optimizing the achievable amplification, the determining factor is not the difference
between the average value and the peak value of the frequency curve, but the level diffe-
rence between the average value of the sound transmission curve and the positive feedback
threshold.
This difference X is identical to the so-​called loop amplification vS. As a result, we obtain

X = −20 log10 vS dB

To avoid positive feedback, it must be ensured at all times that X does not fall below a
certain value ∆L ≈ 6–12 dB (depending on the bandwidth). From experience it is known
that for voice transmissions one can operate with a feedback reserve of 3 dB, which results
in X =​9–​15 dB. Kuttruff used a feedback reserve of approx. 5 dB for speech, and 12 dB for
music [39]. Hence X results in 11–​17 dB or 18–​24 dB, respectively.
For computer simulations not just including loudspeakers but also microphones a feed-
back factor R(X) is introduced:

R(X) = vS² / (1 − vS²) = 10^(−X/10) / (1 − 10^(−X/10)), with X in dB

It is obvious that R(X) is between 0.01 and 0.1 for the practical values of X (10–​20 dB). This
order of magnitude may be used for practical calculations. By introducing the feedback level
LR =​10 log10(R(X)) the relationship between LR and X is shown in Figure 2.35.
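The relationship between X, R(X) and LR can be tabulated directly from the formula above; the function names are illustrative:

```python
import math

def feedback_factor(x_db):
    """R(X) = 10^(-X/10) / (1 - 10^(-X/10)) for a loop-gain margin X in dB."""
    v_s2 = 10 ** (-x_db / 10)
    return v_s2 / (1 - v_s2)

def feedback_level_db(x_db):
    """L_R = 10*lg(R(X)) dB, as plotted in Figure 2.35."""
    return 10 * math.log10(feedback_factor(x_db))

# For the practical range X = 10-20 dB, R(X) lies roughly between 0.01 and 0.11:
for x in (10, 15, 20):
    print(x, round(feedback_factor(x), 3), round(feedback_level_db(x), 1))
```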
Whereas in a free sound field the frequencies which lead to a positive feedback are deter-
minable, sound systems in rooms do not permit any statements about the absolute values of
positive feedback frequencies. It is only apparent that at the particular frequency with the
greatest peak the positive feedback occurs if the amplification µ is increased. If the location
of the loudspeaker or the microphone in the room is modified the positive feedback will
most probably occur at a different frequency as the greatest peak for this new geometrical
layout is located at a different frequency.

2.6.2 Feedback Calculation


The radiated sound power PL of one or more loudspeakers is calculated by multiplying the
sound energy density wM present at the microphone location by the amplification coeffi-
cient µ . Since this sound energy density consists of components from the original source as
well as those reproduced by the loudspeakers, one obtains

Figure 2.35 Relationship between the feedback threshold X and the required feedback level LR.

PL = µ (wMS + wML) = µ wM

The second term in the addition is determining the positive feedback sensitivity of the
system. If the loop amplification of the system

vS² = wML / wM → 1

the amplification circuit becomes unstable and after some modification results in

R(X) = wML / wMS = vS² / (1 − vS²), with vS² = 10^(−X/10) as shown above

By inserting LR =​10 log10 R(X) dB it can be stated that feedback can be avoided when using
a sound system where LMS –​ LML =​ LR, with:

LMS total sound level on the microphone produced by the original sound source
LML total sound level on the microphone produced by the loudspeaker system
LR feedback level

To securely avoid feedback in all instances LR should be 12–​15 dB, and a recommended
minimum value of LR should not fall below 6–​9 dB (use of directional microphones).
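The avoidance condition of this section reduces to a simple margin check; the function name and the 12 dB default are illustrative choices based on the recommended LR range above:

```python
def feedback_safe(l_ms_db, l_ml_db, required_margin_db=12.0):
    """True if the original-source level at the microphone (L_MS) exceeds the
    loudspeaker level at the microphone (L_ML) by at least the required
    feedback level L_R (12-15 dB for secure operation, per the text)."""
    return (l_ms_db - l_ml_db) >= required_margin_db

print(feedback_safe(90, 75))  # True: 15 dB margin
print(feedback_safe(90, 82))  # False: only 8 dB margin
```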

2.7 Sound System and Reverberation Enhancement Systems

2.7.1 Use of Sound Systems


The technical layout of a sound reinforcement system is essentially determined by its
intended application and by the characteristics of the room in which it is to be installed.
According to the present state of the art one can distinguish the following types of sound
reinforcement systems:

a) information and announcement systems


b) sound reinforcement systems with and without playback reproduction
c) sound reinforcement systems for improving sound coverage
d) sound reinforcement systems for ensuring acoustic localization of sound sources in the
action area
e) sound reinforcement systems serving as a means of artistic expression

The given sequence reflects the technical complexity and sophistication for the systems.
An important consideration for selection and arrangement of the sound reinforce-
ment devices and thus also for the selection of the technical solution to be installed is
the spatial relationship in which the listener is located regarding the original or supposed
(playback) source of the transmitted sound event. This original source may be completely
separated from the listener, as is the case with the aforementioned announcement systems
(for instance in department stores, transportation hubs etc.); it may be located in the same
room as the listener (in the action area) but separated from the listener (in the reception
area), or both areas may also overlap. Acoustical feedback may occur in the latter two cases.
Sound systems facilitate the transportation of information from a source to the listener
and operate in an environment (hall or open space) determined by the room acoustic prop-
erties of that space. These room acoustic properties are not greatly influenced by the sound
system but may be much influenced by a specifically installed reverberation enhancement
system. Establishing good audibility indoors as well as in the open air has been and remains
the subject of room and electro-​acoustics.
Integrating electro-​and room acoustics as well as architectural solutions is often challen-
ging. Some of the critical areas are:

• the sound source in question has only a limited sound power, so the dimensions of
the speakers may become an issue for the architectural design
• modifications of room acoustics may consequently lead to major updates to the archi-
tectural design and thus might not be optimally applied
• measures regarding room acoustics may cause a considerable number of constructional
changes and these can only be optimally integrated for one single intended purpose of
the room
• the constructional modification, despite its high costs, may result in only a limited effect

Because of these reasons sound systems are increasingly employed to adjust specific room
acoustic properties, thus improving audibility, intelligibility and spaciousness. At the lis-
tener position the level of direct sound is of great significance. Also ‘short time reflections’,
enhancing intelligibility of speech and clarity of music, can be provided by means of sound
systems.

58 Wolfgang Ahnert and Dirk Noy


The following sound-​field components can be manipulated or generated:

• direct sound
• initial reflections with direct-​sound effect
• reverberant initial reflections
• reverberation

For this reason, electronic techniques have been developed that introduce the possibility
of increasing the direct sound or reverberation time and energy in the hall, hence directly
modifying the acoustic room properties.
These methods of enhancing the room-​acoustic properties of spaces are the application
of so-​called ‘electronic architecture’. A good acoustic design is achieved when, while
listening to an event, it is indistinguishable whether the sound quality results solely from
the interaction of the original source with the space or from an electro-​acoustic
enhancement system.
Therefore, enhancement systems are normally operated by the stage manager of the hall. The
configuration of such a system is calibrated during installation and programmed for different
applications like concerts, operas or speeches and the parameters cannot be modified by
the sound engineer. Hence the sound reinforcement system must be considered completely
separated from the enhancement system, although certain components of both systems like
loudspeakers may be used by both systems. An enhancement system is a designated part of
the room-​acoustic properties of the hall which are not to be modified during a performance.

2.7.2 Electronic Procedures for Enhancing Reverberation and Spaciousness

2.7.2.1 Acoustic Control System (ACS)


This procedure was developed by Berkhout and de Vries at Delft University of Technology
[40]. Based on a wave-​field synthesis (WFS) approach, the authors speak of a ‘holographic’
attempt to enhance the reverberation in rooms. In essence it is the result of a
(mathematical-​physical) convolution of signals, captured by means of microphones in an
in-​line arrangement (as is the case with WFS), with room characteristics predetermined by
a processor, which in the end produces a new room characteristic with a new reverberation
time behaviour.
Figure 2.36 shows the complete block diagram of an ACS system. The acoustician
formulates the characteristics of a desired room, e.g. in a computer model, transfers these

‘Hall’ with the desired


acoustical properties Real hall

Sound
ACS Signal source
processor

Parameters Reflection
simulator Convolution

Reflections

Figure 2.36 Block diagram of the ACS system.




characteristics by means of suitable parameters to a reflection simulator and convolves
these reflection patterns with the real acoustical characteristics of the hall.
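The core operation here — convolving a captured signal with a simulated reflection pattern — can be illustrated with a minimal direct convolution (a sketch only; the function name is ours, and a real system would run efficient partitioned convolution on dedicated DSP hardware):

```python
def convolve(signal, reflections):
    """Direct (naive) discrete convolution of a dry input signal
    with a reflection pattern (impulse response)."""
    out = [0.0] * (len(signal) + len(reflections) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(reflections):
            out[i + j] += s * h
    return out

# A unit impulse convolved with three decaying reflections
# simply reproduces the reflection pattern:
print(convolve([1.0, 0.0], [1.0, 0.6, 0.3]))  # → [1.0, 0.6, 0.3, 0.0]
```

Every input sample thus launches its own scaled, delayed copy of the reflection pattern, which is exactly how the simulated room response is imposed on the live signal.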

2.7.2.2 Active Field Control AFC


The AFC system developed by Yamaha [41] makes active use of acoustic feedback for
enhancing the sound energy density and thereby also the reverberation time. When using
the acoustic feedback it is, however, important to avoid timbre changes and to ensure the
stability of the system. To this effect one uses a specific switching circuit, the so-​called time
varying control (TVC), which consists of two components:

• electronic microphone rotator (EMR) and


• fluctuating FIR (fluc-​FIR)

The EMR unit scans the boundary microphones in cycles while the FIR filters impede
feedback.
For enhancing the reverberation, the microphones are partially located in the diffuse
sound field and partially in the source area (green dots in Figure 2.37b).
The loudspeakers are located at the wall and ceiling areas of the room. For enhancing
the early reflections four to eight microphones are located in the ceiling area near the
sources. The signals picked up by these are passed through FIR filters and are reproduced
as lateral reflections by loudspeakers located in the wall and ceiling areas of the room. The
loudspeakers are arranged in such a way that they cannot be located, since their signals are
to be perceived as natural reflections.
Furthermore, the AFC system allows signals to be picked up, e.g. in the central region of
the audience area, and the reproduction of them via ceiling loudspeakers in the area below
the balcony for the sake of enhancing spaciousness.

Figure 2.37a Block diagram of the AFC system.



Figure 2.37b Block diagram of the AFC system (continued).

2.7.2.3 Virtual Room Acoustic System Constellation


Constellation, developed by Meyer Sound Inc., is a multi-​channel regenerative system for
reverberation enhancement. Its development is based on concepts considered in the 1960s
by Franssen [42]. Poletti [43] further developed the earlier MCR (multiple-​channel
reverberation) procedure.
In Constellation, modern electronic elements and DSPs have made it possible to design
circuits which largely avoid timbre changes. This is achieved by coupling a primary room

Figure 2.38 Basic routines of the Constellation system.

A (the theatre or concert hall) with a secondary room B (the ‘reverberant room processor’).
At the same time the number of reproduction channels is reduced, as is the timbre
change of sound events. An enhancement of the early reflections is obtained as well; see
Figure 2.38.
Within the Constellation system a multitude of small loudspeakers L1 to LN (N =​40 to
50) is distributed in the room, which, of course, may also be used for panorama and effect
purposes. Ten to 15 strategically located and visually inconspicuous microphones M1 to MN
pick up the sound and transmit it to the effect processor X(ω) in which the desired and
adjustable reverberation takes place. The output signals thus obtained are fed back into the


room. The advantage of this solution lies in the precise tuning of the reverberation pro-
cessor enabling well reproducible and thus also measurable results.

2.7.2.4 Vivace
Vivace, developed by Müller-​BBM, ensures a high degree of detail veracity, accurate
transient response and exceptional feedback stability. The result is a homogeneous and
entirely realistic three-​dimensional sound which meets individual on-​site acoustic
requirements with high flexibility and accuracy.
Vivace also enables sources and effects to be moved around, virtually, in the acoustic
environment.
A Vivace system consists of a few microphones picking up the performance on the
stage, the room-​enhancement mainframe, an audio I/​O matrix system, multichannel
digital amplifiers, monitored remotely, and loudspeakers. Vivace digitizes, analyses and
processes incoming stage-​microphone signals in real time and subsequently plays them
back over precisely positioned speakers; refer to Figure 2.39. Using an intelligent con-
volution algorithm Vivace can recreate almost any environment in a low-​reverberant
space [44].

2.7.2.5 Astro Spatial Audio (ASA)


The developer team recommends the system for a theatre or venue as a truly creative tool;
it builds realistic 3D sound ‘sceneries’ that enhance productions or events (Figure 2.40).
ASA’s 3D audio rendering engine creates sound scenes that remain accurate and dynamic
for an entire audience: it reproduces voices seemingly at the position of the performers,
it places sound effects in specific locations, and it makes it simple to create and select
presets for small and large spaces and outdoor venues. ASA is based on WFS algorithms
developed by Fraunhofer IDMT in a so-​called object-​based design [45].

2.7.2.6 Amadeus Active Acoustics


Using modern technologies, the existing acoustics can be enhanced, loudness increased
and reverberation prolonged [46]. Speech intelligibility for presentations or theatre can be
optimally adjusted and a musical envelopment can be achieved for a variety of performance
styles (Figure 2.41).
One author of this book has realized a project in a large multipurpose hall by using
the system; see section 11.5.3.1. In addition, a three-​dimensional sound experience can
be provided, as soundscapes (forest, urban or beach) can be played back to create a certain
mood and to give presentations or shows extra appeal.

2.8 Further Aspects Concerning the Use of Sound Reinforcement Systems


First, the listener’s expectation is to be mentioned. In open-​air applications, for instance,
the listener does not expect much reverberation, but even indoors it sounds unnatural if too
much reverberation is present close to a stage. It may also be contradictory to the expect-
ation if sonic definition and clarity in the rear seats of the hall are enhanced by excessive
sound amplification to such an extent that one gets the impression of sitting close to
the stage. This impression may also occur where coverage of the stage-​distant hall area is

Figure 2.39 Working schemata of the Vivace system.


Figure 2.40 Work flow of the ASA system.

Figure 2.41 Working schemata and block diagram of the Amadeus system.

obtained by decentralized loudspeakers which reproduce high frequencies with unnaturally
high energy due to the reduced loudspeaker-​listener distance. It might then be required to
provide these decentralized loudspeakers with equalization devices to calibrate their
reproduction characteristics to the natural room impression.
With large stages, monitoring between the artists is an essential challenge. In
certain circumstances soloist and orchestra may hear each other only through monitoring
loudspeakers. Monitoring is required especially where heavily delayed room signals in
the action area largely impair reciprocal audibility of the artists. Owing to the ‘Lee effect’
[47] the artist may even be impeded by the delayed room response of his own sound. The


monitoring signals must be transmitted undelayed or with just an insignificant delay. Thus
it is possible to achieve a ‘rounded’ acoustic ensemble performance preventing the ‘disinte-
gration’ of the sound pattern.
The use of stage monitors may cause problems if loudspeakers are already arranged on
stage (especially in large open-​air theatres) and interconnected by delay networks. In this
case it is possible that signals of the monitoring loudspeakers, if audible in the auditorium,
give rise to mislocalizations.

References
1. Vitruvius, The Ten Books on Architecture, edited by I.D. Rowland and T. N. Howe. Cambridge
University Press, March 2001.
2. Xiang, N., Architectural Acoustics Handbook, ­chapter 12. J. Ross Publishing, 2017.
3. Ballou, G., Handbook for Sound Engineers, 5th ed., c­hapter 9. New York and London:
Focal Press, 2015.
4. DIN 18041:2016–​03.
5. ISO 3382, Parts 1, 2 and 3.
6. Sabine, W.C. Collected papers on acoustics. Cambridge, MA: Harvard University Press, 1923.
7. Lehmann, U. Untersuchungen zur Bestimmung des Raumeindrucks bei Musikdarbietungen und
Grundlagen der Optimierung. Diss. Tech. Univ. Dresden, 1974.
8. Thiele, R. Richtungsverteilung und Zeitfolge der Schallrückwürfe in Räumen. Acustica (1953)
Beih. 2, p. 291.
9. Reichardt, W., Abdel Alim, O., and Schmidt, W. Definitionen und Meßgrundlage eines objektiven
Maßes zur Ermittlung der Grenze zwischen brauchbarer und unbrauchbarer Durchsichtigkeit bei
Musikdarbietungen. Acustica 32 (1975) 3, p. 126.
10. Houtgast, T., and Steeneken, H.J.M. Envelope spectrum and intelligibility of speech in enclosures,
presented at the IEEE-​AFCRL 1972 Speech Conference.
11. Houtgast, T., and Steeneken, H.J.M. A review of the MTF concept in room acoustics and its use
for estimating speech intelligibility in auditoria. J. Acoust. Soc. Amer. 77 (1985) pp. 1060–​1077.
12. Peutz, V.M.A. Articulation loss of consonants as a criterion for speech transmission in a room.
J. Audio Engng. Soc. 19 (1971) 11, pp. 915–​919.
13. Klein, W. Articulation loss of consonants as a basis for the design and judgement of sound
reinforcement systems. Journal of the AES 19 (1971) 11, pp. 920–​925.
14. Herrmann, U.F. Handbuch der Elektroakustik (Handbook of electroacoustics). Heidelberg:
Hüthig, 1983.
15. Barkhausen, H. Ein neuer Schallmesser für die Praxis. Z. tech. Physik (1926). Z. VDI (1927)
pp. 1471 ff.
16. Class 1 Sound Level Meter IEC 61672:2013.
17. Zwicker, E. Ein Verfahren zur Berechnung der Lautstärke. Acustica 10 (1960) p. 304.
18. ISO 532-​1, Ed. 2017, Method for calculation of loudness –​Part 1: Zwicker method.
19. Zwicker, E., and Feldtkeller, R. The Ear as a Communication Receiver. Acoustical Society of
America, 1999.
20. IEC 651 Ed. 1994. Precision sound level meters.
21. Dietsch, L. Objektive raumakustische Kriterien zur Erfassung von Echostörungen und Lautstärken
bei Sprach-​und Musikdarbietungen (Objective room-​acoustical criteria for registering echo
disturbances and loudnesses in speech and music performances). Diss. Tech. Univ. Dresden, 1983.
22. Blauert, J. Räumliches Hören (Stereophonic hearing). Stuttgart: Hirzel, 1974.
23. Jeffress, L.A., and McFadden, D. Differences of interaural phase and level in detection and
localization. J. Acoust. Soc. Amer. 49 (1971) pp. 1169–​1179.
24. Haas, H. Über den Einfluß eines Einfachechos auf die Hörsamkeit von Sprache (On the influ-
ence of a single echo on the audibility of speech). Acustica 1 (1951) 2, pp. 49 ff.
67

Room Acoustics and Sound System Design 67


25. Petzoldt, H. Elektroakustik, Band IV Grundlagen der Beschallungstechnik. Fachbuchverlag
Leipzig, 1957.
26. Olson, H.F. J. Acoust. Soc. Amer. 31 (1959) 7, p. 872.
27. Davis, D., and Davis, C. Sound System Engineering. Indianapolis: Howard W. Sams, 1975.
28. Ahnert, W., and Reichardt, W. Grundlagen der Beschallungstechnik (Basics of sound reinforce-
ment technology). Stuttgart: S. Hirzel Verlag, 1981.
29. Heil, C., and Urban, M. Sound Fields radiated by Multiple Sound Sources Arrays, 92nd AES
Convention, March 1992, Preprint No. 3269.
30. Duran-​Audio BV, White paper: Modelling the directivity of DSP controlled loudspeaker arrays.
Zaltbommel/​Holland, June 2000.
31. IEC 60268-​16: 2020-​09, Sound system equipment –​Part 16: Objective rating of speech intelligi-
bility by speech transmission index.
32. Holoplot.
33. Zahn, T. Immersive Sound, Professional System 02.2016.
34. Bürck, W. Akustische Rückkopplung und Rückwirkung (Acoustic feedback and repercussion).
Triltsch Verlag, 1938.
35. Schroeder, M.R. Die akustischen Parameter der Frequenzkurven von großen Räumen, Akust.
Beihefte 4 (1954), pp. 594–​600.
36. Kuttruff, H., and Thiele, R. Über die Frequenzabhängigkeit des Schalldruckes in Räumen, Akust.
Beihefte 4 (1954), pp. 614–​617.
37. Schroeder, M.R., and Kuttruff, H. On frequency response curves in rooms, J. Acoust. Soc.Amer.
34 (1962), p. 76.
38. Ahnert, W. Über die Bedeutung des absoluten Maximums der Frequenzkurve für die akustische
Rückkopplung (in russ.) Akust. Žurnal 19 (1973) 1, pp. 1–​8.
39. Kuttruff, H. Room Acoustics. London: Elsevier, 1973.
40. Berkhout, A.J. A holographic approach to acoustic control. J. Audio Engng. Soc. 36 (1988) 12,
pp. 977–​995.
41. Hideo Miyazaki, Takayuki Watanabe, Shinji Kishinaga and Fukushi Kawakami, Yamaha
Corporation, Advanced System Development Center, Active Field Control (AFC), Reverberation
Enhancement System Using Acoustical Feedback Control, 115th AES Convention, New York,
October 2003.
42. Franssen, N.V. Sur l’amplification des champs acoustiques. Acustica 20 (1968) pp. 315 ff.
43. Poletti, M.A. On controlling the apparent absorption and volume in assisted reverberation
systems, Acustica 78 (1993).
44. https://vivace.mbbm-aso.com/de/.
45. www.astroaudio.eu/.
46. www.amadeusacoustics.com/.
47. Lee, B.S. Effects of delayed speech feedback. J. Acoust. Soc. Amer. 22 (1950) p. 824.

Further Reading
Taschenbuch Akustik, Teil 2, VEB Verlag Technik Berlin, 1984.
Handbook of Acoustics, T.D. Rossing (Ed.), Chapter 18. Springer, 2007.
Everest, A.F. The Master Handbook of Acoustics, 4th edn. New York: McGraw-​Hill, 2001.

3 Loudspeakers
Gottfried K. Behler

3.1 Loudspeaker types and characteristics


In general, sound reinforcement systems require loudspeakers to convert the amplified elec-
trical current into acoustical energy and adequately radiate this sound into the audience.
Taking into account the demands for such parameters as sound pressure level, frequency
range, distortion, directivity and intelligibility (with speech) or transparency (with music)
the selection, placement and alignment of the loudspeaker systems probably become the
most challenging part of the planning project. Moreover, in many cases sound reinforce-
ment systems are installed inside closed spaces (concert halls, theatres, rock/​pop venues,
partly closed spaces such as stadiums etc.) which leads to an interaction of the loudspeakers
with the room acoustics and hence this must be considered as well.
The enormous variety of loudspeaker systems and concepts on the market demands a very
good knowledge of the properties of loudspeakers for sound reinforcement.
The following chapter introduces, explains and applies those technical and physical prop-
erties of loudspeaker systems which are relevant for the planning and optimization of sound
reinforcement systems of any kind and size. When using the term ‘loudspeaker’ or ‘loud-
speaker system’ we are not restricted to a single electro-​acoustical transducer as a device
that converts electrical properties (like current and voltage) into acoustical properties (like
sound pressure and velocity). In fact, we are talking about complex devices which might be
using multiple transducers of different type, with waveguides, cabinets and in addition with
active or passive electronic crossover circuits as a unit.

3.1.1 The principal operation and construction of the dynamic transducer


Before going into a more phenomenological description of loudspeaker systems, let’s have a
look at the most relevant operational principle (the dynamic transducer).
Most sound reinforcement loudspeaker systems employ the dynamic transducer principle
for the conversion of electrical energy into mechanical and finally acoustical energy. A strong
magnetic field created by a permanent magnet interacts with the alternating magnetic field
created by the current in a wire. The Lorenz force then causes the coil to move and then a
membrane attached to it will produce pressure waves in air –​hence radiate sound. Piezo-​
electric transducers created from material that changes its shape when applying an electric
field can be used for high-​frequency transducers, but are commonly only in use for cheap
and simple loudspeaker systems. Transducers using the capacitive or electrostatic principle
(vibrating foils of a capacitor) or the magnetic principle (using the force in the magnetic
field to move a ferromagnetic membrane) are rarely in use for sound reinforcement systems.

DOI: 10.4324/9781003220268-3

Loudspeakers 69

Figure 3.1 Sectional view of a magnetostatic ribbon tweeter of modern design. The conducting wires
are thin copper strips bonded directly to the thin foil membrane.

The more common magnetostatic loudspeaker is the modern version of the ribbon loud-
speaker and is thus a transducer working according to the dynamic principle. To achieve an
impedance between 4 and 16 ohms, which is typical for loudspeakers, the current-​carrying
conductor (which in the classic ribbon was the low-​impedance metallic ribbon itself) is
applied as a conductive flat voice coil on a thin plastic diaphragm, which is located within
a strong magnetic field.
The most common construction when high forces and large linear excursion of the mem-
brane are required (such as in a woofer for low-​frequency reproduction) uses a circular voice
coil in a radial magnetic field.
The ring magnet between the two pole plates (usually a ferrite or neodymium composition)
generates a constant magnetic field of strength B in the air gap. The cylindrical voice
coil is located in the centre of that gap; hence, the magnetic field crosses the wire section
of the voice coil. The voice coil is fixed to the membrane and is only allowed (by the spider
and the suspension) to move in an axial direction. 
The current (I) in the voice coil and the magnetic field (B) cross perpendicularly; hence,
the resulting force F (calculated from the vector formula F = I · (l × B), where l is the
effective length of the voice-​coil wire in the gap) is directed in the axial direction and
generates the desired excursion of the membrane. When the voice coil is moved
from its resting position, the spider is deformed, which produces the required restoring force
to take the membrane back to its resting position after switching off the current. The sus-
pension at the outer edge of the membrane guides the membrane without introducing much
force and is needed to seal the moveable parts from the fixed parts of the speaker.
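With the wire lying entirely in a homogeneous radial field, the axial force reduces to F = B·l·I. A minimal numerical sketch (the function name and the values are illustrative, not taken from any specific driver):

```python
def voice_coil_force(b_tesla, wire_length_m, current_a):
    """Axial Lorentz force F = B * l * I on a voice coil whose wire
    of total length l sits perpendicular to a radial field B."""
    return b_tesla * wire_length_m * current_a

# e.g. B = 1.2 T, 10 m of wire in the gap, 2 A drive current:
print(voice_coil_force(1.2, 10.0, 2.0))  # 24.0 N
```

The product B·l (the ‘force factor’, usually quoted in N/A or T·m) is the quantity found on driver datasheets.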
Typical materials for the magnet are ferrite and metallic alloys like neodymium (NdFeB).
For high flux density and low weight, the material neodymium is used, which allows small,
highly efficient and lightweight transducers to be built. One of the critical characteristics of
NdFeB, however, is its relatively low Curie temperature, where the material loses its ferro-
magnetic properties. Temperatures higher than 100°C are critical and careful cooling of the
loudspeaker motor unit is required.

70 Gottfried K. Behler

Figure 3.2 Cross-​sectional view of a typical cone type dynamic loudspeaker.

Most membranes for woofers today are still manufactured from paper pulp. High-​quality
synthetic membranes like cast polymer membranes or sandwich membranes are rare,
whereas membranes manufactured from foils using hot pressing techniques are more popular
because of the cheap manufacturing process. In compression drivers for high frequencies,
metallic membranes made of aluminium or titanium are often used due to their limited
weight and high stability.
The spider is made of synthetic fabric, which is pressed into folds to give a linearly increasing
force with excursion and is soaked with a special resin to reach the appropriate stiffness. The
outer suspension of high-​power transducers is often made of fabric whereas for higher excursions
a rolled rubber suspension is used. The outer suspension should not induce strong reaction forces
(as the spider does) but is intended to damp bending waves of the membrane to avoid standing
waves known as ‘breakup’ modes. These breakup modes are the cause of peaks and dips in small
frequency bands, which cause linear distortions of the transfer function and may become aud-
ible as resonances. The mechanical construction of the frame holding the heavy magnet as well
as the lightweight membrane must be sturdy and should be free from resonances. The selection
of moulded aluminium frames for high-​quality speakers is quite common. Nevertheless, a well-​
designed pressed-​steel frame is good as well and may be less costly.

3.1.2 Complex loudspeaker systems


In general, dynamic transducers, regardless of their application, are constructed as described
in the previous section. Nevertheless, one must distinguish between several very different
designs: on one hand because of the required audio frequency range, which can only be par-
tially reproduced with a single transducer, and on the other hand because the requirements
for the output volume can be very different. The directivity, which will be discussed later
(section 3.3), must be considered as well.
In addition to the features of the electroacoustic transducer as such, the overall con-
struction of a loudspeaker must be considered as a complex system consisting of one or
more transducers and the corresponding cabinet. In the following a distinction is made: the
loudspeaker as a transducer component, and the loudspeaker as an enclosure with an
arrangement of a certain number of spatially distributed point sources, which interact with
each other (e.g. line source, loudspeaker arrays, multi-​way loudspeakers).

3.1.2.1 Loudspeaker systems modelled as point source


The idea of the point source assumes that sound power and directional characteristics can
be described independently of each other. The prerequisite for this is that the sound power
of the source is generated at one point and –​following the distance law –​the sound pressure
decreases inversely proportional to the distance. In addition, the directional dependence
of sound pressure is described by the directivity pattern, which depends exclusively on the
radiation angle. This description is mostly used for loudspeaker systems that are intended to
operate as individual, standalone sources.
This assumption is only valid if either the source is small compared to the wavelength so
that we can speak of mainly non-​directional sound radiation or if the distance to the source
is so large that distance-​dependent interference is negligible. The latter also holds for large
loudspeakers at sufficiently large distances. So, any loudspeaker can be modelled as a point source,
as long as the distance for which this assumption is valid is specified. Examples are multiple
loudspeaker drivers mounted into one cabinet, two-​way or three-​way loudspeakers com-
prising several transducers for the different frequency bands, with or without horn loading
etc. Typical point source loudspeakers are shown in Figure 3.3. Note that the different
loudspeakers vary quite a bit in size.
Point sources are used where sound reinforcement must be achieved with low technical
effort and limited resources. Examples include lecture halls, small concert halls or tem-
porary event spaces outside. Also distributed systems, such as announcement systems in
department stores, administrative buildings or airports, are mostly implemented with full-​
range loudspeakers installed in the ceiling, which can be treated as point sources. One of
the advantages of the point source model is that it is comparatively easy to calculate and
simulate the sound pressure level distribution on a listening area. This is much more com-
plex with arrayed loudspeakers where the interference of individual point sources must be
considered.
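The distance law makes such point-source calculations trivial: the SPL falls by 20·log10(r) dB relative to the level at 1 m, i.e. 6 dB per doubling of distance. A short sketch (function name and sensitivity value are ours):

```python
import math

def spl_at_distance(spl_1m_db, distance_m):
    """SPL of an ideal point source at a given distance, from its
    level at 1 m: L(r) = L(1 m) - 20*log10(r)."""
    return spl_1m_db - 20.0 * math.log10(distance_m)

# A speaker producing 96 dB SPL at 1 m loses about 6 dB at 2 m
# and exactly 20 dB at 10 m:
print(spl_at_distance(96.0, 2.0))   # ~90 dB
print(spl_at_distance(96.0, 10.0))  # 76.0 dB
```

Mapping this over a grid of listener positions directly yields the level distribution on a listening area, which is why the point-source model is so convenient.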

3.1.2.2 Loudspeaker systems described as arrays of point sources


For larger sound reinforcement installations, a single loudspeaker will not be enough to
either reach the required sound power or cover the entire audience area. Therefore, many
identical systems might be used. By ‘stacking and splaying’ those loudspeakers (see left pic-
ture in Figure 3.4) the intended sound power and sound distribution can be achieved. This
technique –​quite common in earlier times –​is no longer used today but is replaced by using
‘line arrays’. Line arrays are loudspeakers in which several identical sources are arranged ver-
tically one above the other so that they form a joint, linear sound source. The point source
model can be used for each element in the line. For the sound pressure at one location in
the audience, however, the complex interaction of the individual point sources must be
considered with respect to distance, magnitude and phase.
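The complex summation can be sketched as follows: at the listening point each element contributes a phasor with magnitude 1/r and phase −kr (free field, identical omnidirectional elements assumed; all names are ours):

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s

def array_pressure(element_z, listener_xz, freq_hz):
    """Complex sound pressure (arbitrary units) of a vertical line of
    identical point sources, summed with distance-dependent
    magnitude (1/r) and phase (-k*r) at the listener position."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    x, z0 = listener_xz
    p = 0j
    for z in element_z:
        r = math.hypot(x, z - z0)
        p += cmath.exp(-1j * k * r) / r
    return p

# Eight elements spaced 10 cm apart, listener far away on axis:
elements = [i * 0.1 for i in range(8)]          # z = 0.0 ... 0.7 m
p = array_pressure(elements, (100.0, 0.35), 1000.0)
print(abs(p))  # close to 8/100: coherent summation in the far field
```

Off axis, or at short distances, the path-length differences are no longer negligible against the wavelength and the phasors partially cancel, which is precisely the interference effect discussed above.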
More complex line arrays use DSPs (digital signal processors) to adjust the response of
each individual point source within the line array. This can be used to equalize the fre-
quency response or to delay elements in such a way that the vertical directivity can be

Figure 3.3 Some typical loudspeaker types that can be described as point sources. Typical compact
PA systems in upper part (Klein+​Hummel). Ceiling speaker horn systems in the lower
part: left, a large cinema horn by JBL and right a Tannoy speaker.

changed and, for example, tilted upwards or downwards. These systems are very useful for
amplification in reverberant spaces, increasing speech intelligibility by directing the
amplified sound to the audience seats without exciting the room too much.
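For a straight line array, such tilting amounts to delaying each element in proportion to its height, τ = z·sin(θ)/c. A minimal sketch of these steering delays (an illustration of the principle, not any manufacturer's algorithm; all names are ours):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(element_z, tilt_deg):
    """Per-element delays (seconds) that tilt the main lobe of a
    vertical line array by tilt_deg (positive = downwards, with z
    increasing upwards): tau = z * sin(theta) / c, shifted so the
    smallest delay is zero (causal)."""
    theta = math.radians(tilt_deg)
    raw = [z * math.sin(theta) / SPEED_OF_SOUND for z in element_z]
    t0 = min(raw)
    return [t - t0 for t in raw]

# Four elements 0.25 m apart, beam tilted 10 degrees downwards:
elements = [i * 0.25 for i in range(4)]
delays = steering_delays(elements, 10.0)
print([t * 1e6 for t in delays])  # delays in microseconds
```

In a real DSP array these delays are combined with per-element equalization and level shading; the linear delay gradient alone already tilts the radiated wavefront.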
Since most arrays are actually constructed of individual point sources, the focus is put on
the measurement and description of the output of point source loudspeakers.

3.2 Output-​based characterization of point source loudspeaker systems


One of the most critical tasks when designing a sound reinforcement system is the selec-
tion of a suitable loudspeaker. The variety of devices on the market is massive and to keep
an overview a conclusive set of parameters is needed that best describe the properties (and
ideally quality) of the speaker.
Key features include, but are not limited to, items on the following list:

• frequency range (bandwidth) of the source to be reinforced


• target sound pressure level (SPL in dBA)

Figure 3.4 Comparison of the so-​called ‘stacking’ array of speakers (left picture) and the nowadays
more common ‘line array’ concept (centre picture) to cover large audience areas. Rightmost
a typical active, digitally steerable line array using multiple identical loudspeakers mounted
in one line, individually driven by a DSP amplifier allowing individual adjustment of fre-
quency, time and phase response of the 16 point sources to create a coherent and nearly
cylindrical wave front (Renkus-​Heinz).

• expectation for fidelity (linear and non-​ linear distortion) with respect to the
program
• budget limitations
• definition of target audience area and possibly areas with requirements for quietness
• architectural restraints (size, quantity and acceptable locations for loudspeakers)

Given that current signal processing can apply almost any required correction to the
signals fed to the loudspeakers (i.e., equalizing, delaying etc.), the main interest is in the
specific output properties that are inherently connected to the loudspeaker and cannot be
changed by processing. Moreover, ranking these properties by importance helps to find out
which is the economically most efficient system.
In the first place, the application of the loudspeaker requires consideration: it makes a
big difference whether an emergency call is to be announced or music is to be reproduced
at the highest sonic quality. This, for example, defines the lower and upper limits of the
frequency range and to some extent the smoothness of the response, the permissible distortion
and the maximum output capability. Further details will be discussed in the following
sections 3.2.1 to 3.2.4.
One of the most underestimated properties of loudspeakers is their directivity as it
cannot be changed by manipulation of the input signal since it is inherently connected
to the mechanical design of the cabinet and the arrangement of the different transducers.
An exception is digitally controlled (DSP) loudspeaker arrays, where each of the many
transducers (arranged either as a line source or as a two-​dimensional planar array) can be
individually driven by dedicated amplifiers with signals suitably filtered for magnitude and
phase so to create a specific directivity pattern; see section 3.3.5.
Other parameters which are difficult to correct by signal-​processing are:

• power handling
• power compression
• distortion
• resonance behavior

3.2.1 Linear dynamic behaviour


When investigating loudspeaker transfer functions, it is important to distinguish between
the linear behaviour and the non-linear behaviour. Generally, at high output levels a loudspeaker
will produce considerable non-linear distortion, and to some extent a time dependency
of the transfer behaviour can be observed due to the heating of the voice coil wires.
Since non-linear effects can be observed in loudspeakers even with small signal excitations,
these must be considered when measuring the linear dynamic behaviour; in any case, small-signal
excitation should be used for the measurements. In this chapter the linear behaviour of
sound reinforcement loudspeaker systems will be discussed. Linear dynamic behaviour
is used for the description of the most relevant performance data in the time and frequency
domains.

3.2.1.1 Frequency domain description: The frequency response curves (sensitivity, impedance, group delay)
Transfer functions are complex values involving both magnitude and phase for a complete
description of the system. The magnitude of the transfer function (more commonly
called frequency response) of a loudspeaker defines the frequency-dependent output sound
pressure p_TF of a loudspeaker measured at a well-defined point – in general perpendicular in
front of the speaker – with respect to a frequency-independent input voltage u for all audible
frequencies (i.e., 16 Hz to 22 kHz). In general, the frequency response is represented
as a double logarithmic plot with the frequency denoted in hertz [Hz] on the x-axis and the
output sound pressure level (SPL) of the loudspeaker in decibels [dB] with respect to the
reference sound pressure p_0 = 20 µPa on the y-axis:

L_measured = 20 log(p_TF / p_0)   dB (SPL)   (3.1)

Figure 3.5 Frequency response of a loudspeaker system plotted for a rated input voltage of 2.83 V and
a measurement distance of 1 m on axis. Sensitivity (see equation (3.2)) and bandwidth
with respect to upper and lower cut-off frequency are depicted with dashed lines.

The frequency response of a sound reinforcement loudspeaker is given in Figure 3.5. The
example shall be used to define some of the properties related to the frequency response as
stated in IEC 60268-5:

Frequency range according to DIN EN 60268-5 paragraph 21.2.1: … frequency response
defined by upper and lower cut-off, … where the on-axis frequency response does not
fall below the average level by more than 10 dB. The average level is defined by the one
octave band with the highest sensitivity or a wider frequency band as defined by the
manufacturer. Steep dips of less than 1/3 octave may be ignored [1].

In Figure 3.5 the sensitivity can be found to be 95 dB for a given input voltage of 2.83 volts
at a distance of 1 m. The lower cut-off frequency is at 38 Hz and the upper cut-off is at
18 kHz. The narrow dips in the frequency response above 12 kHz may be ignored because they
are less than 1/3 octave wide. The sensitivity definition takes the nominal input impedance
Z_n of the loudspeaker into account, which can be derived from the frequency-dependent
impedance plot as given for example in Figure 3.6. To assess the efficiency of loudspeakers,
an electrical power of 1 watt into the nominal impedance is required. Therefore, the 'nominal'
input voltage can be calculated as U_n = √(Z_n · 1 W). For an 8 ohm nominal impedance
the input voltage needs to be 2.83 V.
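This relationship, U_n = √(Z_n · 1 W), can be checked with a few lines (a minimal sketch; the impedance values are illustrative):

```python
import math

# 'Nominal' input voltage that delivers 1 W into the nominal impedance Z_n:
# U_n = sqrt(Z_n * 1 W)
for z_n in (4.0, 8.0, 16.0):
    u_n = math.sqrt(z_n * 1.0)
    print(f"{z_n:>4.0f} ohm -> {u_n:.2f} V")
```

For 8 ohms this reproduces the 2.83 V used throughout the data sheets; 4 ohm systems are rated at 2.00 V instead.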
For a correct measurement of the frequency response, the microphone should be placed
in the far field. This requirement leads to measurement distances rmeas of much more than
1 m, especially for large loudspeaker systems. The following equation refers to a nominal
power of 1 W during the measurement:

p TF r dB ( SPL )
L sens = 20 log + 20 log meas (3.2)
p 0 1m (1W / 1m )

Figure 3.6 Frequency-dependent input impedance of a loudspeaker. For this system the nominal
impedance Zn was defined by the manufacturer as 8 ohms. Taking into account the tolerance
limit, which allows a −20% undercut, this loudspeaker does not fulfil the ISO standard.

For both acoustical (sound pressure) and electrical (impedance) transfer functions,
in most cases the absolute value (magnitude) is shown while the imaginary part (phase
response) is often neglected. In Figure 3.7 the phase response plots for Figure 3.5 (acoustical
transfer function, upper graph) and Figure 3.6 (electrical transfer function, lower graph)
are depicted. The phase response of the acoustical output of a loudspeaker system should
be plotted without the excess phase shift introduced by the sound delay due to propagation
from the loudspeaker to the microphone. The phase of the electrical impedance clearly
shows the resonance of the system (bass reflex tuning frequency at about 43 Hz where the
phase response shows a zero-​crossing) and the transition from compliance to mass loading
above 120 Hz. For higher frequencies the impedance is dominated by the crossover network.
Due to its high sensitivity, attenuation for the horn loaded compression driver is needed,
which in passive loudspeaker systems is introduced by the crossover network. Therefore,
the input impedance of the high-​frequency unit is masked behind the higher impedance of
the network.
Another possible representation of the phase-related behaviour of a loudspeaker system
is given by the group delay, calculated as the first derivative of the phase response with
respect to angular frequency:

t_gr = −dφ/dω   s   (3.3)

Compared to the phase response, the group delay response is a linear distortion figure
showing the frequency-​dependent delay introduced by the system (‘energy storage’ in the

Figure 3.7 Upper graph: phase response curves of the sound pressure transfer function (shown in
Figure 3.5); lower graph: the phase response curve of the impedance transfer function
(shown in Figure 3.6).

system). The audibility of group delay distortion very much depends on the frequency and
the amount of variation [2]. Audible effects with loudspeakers are mainly found at low
frequencies where the group delay variation is the highest [3]. The group delay response
derived from the phase response in Figure 3.7 is shown in Figure 3.8.

Figure 3.8 Group delay distortion of the phase response shown in Figure 3.7.

3.2.1.2 Time domain description: Impulse response, step response, waterfalls


The time domain behaviour of a system is connected to the frequency domain by the
Fourier transform. Therefore, the following time domain description of a loudspeaker is just
a different aspect of the same data; also refer to section 5.3. Inputs to a system will always
take place in the time domain, and direct measurements cannot be performed in the fre-
quency domain. Hence, the frequency domain plots shown above have been obtained from
time domain data by using computerized algorithms like fast Fourier transform (FFT).
To measure the behaviour of a linear time invariant system (LTI system), a specific input
signal is required. The generic function in the time domain to perform this measurement
is the Dirac delta function, a theoretical pulse function that is zero at any point except for
time 0, where it is infinitely high. In a practical measurement application, the Dirac pulse
would be the signal with the shortest possible duration and the highest possible ampli-
tude. The digital equivalent would be one sample with full-​scale amplitude while all other
samples are 0. When driving a system with this input signal the measured output is called
the impulse response of that particular system, containing all required information to derive
all the above-mentioned parameters. However, in practical measurements the Dirac pulse
is replaced by signals like sweeps or pseudo-random noise that distribute their energy over
time and provide a much higher signal-to-noise (S/N) ratio. Typically, correlation methods
are used to calculate the impulse response from the specific excitation signal. A very comprehensive
description of the benefits of sine sweep measurements can be found in [4].
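The idea can be sketched in a few lines of numpy (a minimal sketch: the exponential sweep and a regularized spectral division stand in for the correlation methods mentioned above, and the simulated 'loudspeaker' is just a delay with gain, not a real device):

```python
import numpy as np

def exp_sweep(fs, duration, f1, f2):
    # Exponential (logarithmic) sine sweep from f1 to f2
    t = np.arange(int(fs * duration)) / fs
    L = duration / np.log(f2 / f1)
    return np.sin(2.0 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def impulse_response(excitation, response):
    # Deconvolution by regularized spectral division, H = R / E
    n = len(excitation) + len(response)
    N = 1 << (n - 1).bit_length()                   # next power of two
    E = np.fft.rfft(excitation, N)
    R = np.fft.rfft(response, N)
    H = R * np.conj(E) / (np.abs(E) ** 2 + 1e-12)   # regularized division
    return np.fft.irfft(H, N)

fs = 48000
sweep = exp_sweep(fs, duration=1.0, f1=20.0, f2=20000.0)
# Simulated measurement: a 'loudspeaker' that only delays the signal by
# 100 samples and attenuates it by a factor of 0.5
measured = 0.5 * np.concatenate([np.zeros(100), sweep])
ir = impulse_response(sweep, measured)
print(np.argmax(np.abs(ir)))   # peak of the recovered impulse response
```

For the simulated system the recovered impulse response peaks at sample 100, i.e. the deconvolution collapses the long sweep back into the (delayed) pulse.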

3.2.1.2.1 THE IMPULSE RESPONSE

The impulse response is said to be the only relevant measurement for a linear time invariant
system (LTI) containing all information about the system. Unfortunately, loudspeakers are
neither extremely linear nor extremely time invariant, mostly due to the mechanical nature

Figure 3.9 Impulse response of the loudspeaker shown in Figure 3.15.

of the transformation of electrical current into acoustic sound. At low power levels most
loudspeakers can be considered as having low distortion factors. However, care must be
taken that a certain distortion limit is kept during the measurements. For them to be meaningful,
the distortion should not exceed THD_max = 3% at low frequencies below 200 Hz and
THD_max = 1% in the mid and high frequencies (compare eq. (3.4)).

3.2.1.2.2 THE STEP RESPONSE

The step response is derived from the impulse response by integration over time. Likewise,
the impulse response can be calculated from the step response measurement by differentiation
over time. So, compared to the impulse response, no specific new information is
obtained from the step response function; it is just a different way of visualizing the same data.
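The relationship can be sketched in a couple of lines (a toy impulse response, not measured data):

```python
import numpy as np

fs = 48000
ir = np.zeros(64)
ir[3] = 1.0                                 # toy impulse response: 3-sample delay

step = np.cumsum(ir) / fs                   # integration over time (dt = 1/fs)
ir_back = np.diff(step, prepend=0.0) * fs   # differentiation recovers the IR

print(np.allclose(ir, ir_back))             # -> True
```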

3.2.1.2.3 WATERFALL DIAGRAM

The waterfall diagram combines the frequency response plot with time domain information:
it displays the frequency-dependent progression of the impulse response or, vice versa,
the time-dependent frequency response as a 3D plot (x-axis: frequency [Hz], y-axis: time
[s], z-axis: SPL [dB]). However, since the duration of a single period of a sinusoidal signal
depends on the frequency, identical damping at different frequencies results in different
decay times. Simply speaking: the decay for low frequencies takes longer than for high fre-
quencies though both may die out during only one period, meaning they are identically
damped. Though apparently the decay time at higher frequencies is shorter, the damping
might be less. Therefore, the time in the waterfall plot should be time-​compensated in a
way that one period independent of the frequency covers the same length on the time axis.
A plot like this can be achieved using a wavelet transform instead of the commonly used
sliding FFT-​window plot.

Figure 3.10 Step response of the loudspeaker shown in Figure 3.15.

It is possible to show the same data in a 2D coloured plot: in Figure 3.11 the most
common display of the decay time over frequency is shown for the sample loudspeaker
(frequency response shown in Figure 3.5). It displays a very well damped behaviour over the
entire frequency range. However, the above-mentioned problem (neglecting the frequency-dependent
duration of a period) is visible, and it appears that low frequencies are not treated
as well as the high frequencies. Therefore, the magenta curve shows the duration of exactly
one period at each frequency, which would be equivalent to a frequency-independent constant
damping factor.

3.2.2 Distortion
Often, loudspeakers are blamed as the weakest element in the audio reproduction chain.
With respect to frequency response and distortion value this certainly is true. Whereas
the frequency response can be linearized to some degree using EQs, distortion becomes a
severe problem the closer a loudspeaker is driven to its power limits, which is often the case
for installed sound systems. Hence, knowledge about these limitations is crucial to avoid
planning errors.
Several distortion factors must be considered with a dynamic loudspeaker driver under
load: linearity of the stiffness of the membrane suspension, linearity of the force factor, the
change in voice coil inductance, flow resistance of the port in vented cabinets, parasitic
resonators in the mechanical build of the driver, mechanical excursion limits at low fre-
quencies, thermal limits mainly at mid and high frequencies, and finally power compression
due to voice coil heat-​up. All these limiting factors are connected to each other but may be
evaluated separately.

Figure 3.11 Waterfall diagram of the loudspeaker from Figure 3.5. The magenta curve shows the the-
oretical decay time for constant relative damping equivalent to 1 period of the related
frequency.

Apart from the linear distortion that was discussed earlier, the frequency-​dependent non-​
linear distortion must be measured, usually the harmonics of the fundamental frequency.
The most important ones are the second-​(octave) and third-​order harmonics. These can
be plotted relative to the linear frequency response; see Figure 3.12.
Total harmonic distortion (THD) values are used to quantify summarized distortion values:

THD = 100 · √( Σ_{N=2} U_N² / U_1² )   %   (3.4)

with U_N = RMS voltage of the harmonic N.
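A direct numerical reading of eq. (3.4), assuming the usual RMS definition (the voltage values below are illustrative):

```python
import math

def thd_percent(u_rms):
    # u_rms[0] is the fundamental U_1, the rest are the harmonics U_2 .. U_N
    u1, harmonics = u_rms[0], u_rms[1:]
    return 100.0 * math.sqrt(sum(u * u for u in harmonics) / (u1 * u1))

# Fundamental of 1 V with a 2% second and a 1% third harmonic
print(round(thd_percent([1.0, 0.02, 0.01]), 2))   # -> 2.24
```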


Alternatively, the graph could show the maximum possible SPL at a given frequency
without exceeding a certain distortion level (say, 3%, which corresponds to −30 dB,
compared to the linear part).
Acceptable, practical limits for sound reinforcement applications are 3% THD when
very high sound quality is required, otherwise up to 10% THD, which already introduces
severe loss in quality and fidelity. Figure 3.13 shows a typical measurement result. In this
example, the theoretical maximum output level as rated by the manufacturer is calculated
from the sensitivity curve by adding 30 dB, representing 1 kW input power. Note that the
Figure 3.12 Typical distortion plot measured by EASERA.

Figure 3.13 Frequency-​dependent maximum achievable SPL for a defined limit of the THD figure.
The figure shows the sensitivity of the loudspeaker, the theoretical SPL for the proclaimed
power handling of 1000 W input power (manufacturer rating) and the measured, achiev-
able SPL for a given THD of 3% and 10%.

actual THD measurements show much lower levels at low and high frequencies before the
3% or 10% distortion limit is reached. It can also be seen that for a wide frequency range
(200 Hz to 3 kHz) the achievable level for 10% THD seems to be identical to the 3% THD
limit. Due to the fact that the measurement was limited to the specified 1 kW it is possible
that neither distortion limit has been reached and the achievable SPL at those frequencies
could be higher given more input power.

3.2.3 The directional characteristics


Loudspeaker measurements are typically performed in an anechoic space where one omni-
directional microphone is placed on-​axis, in front of the loudspeaker. These measurements
show the response of the loudspeaker for this single point in space only. While the response
of a well-designed loudspeaker does not change dramatically when the microphone is
re-positioned, this cannot be taken for granted.
loudspeaker, especially with regard to placing it inside a room, a full understanding of the
response of the loudspeaker unit is needed. This measurement can result in the power
response (averaged frequency response of a sphere around the loudspeaker) and in the
directivity plot. The directivity characteristics of a loudspeaker are even more important
than the frequency response: whereas the frequency response can be optimized in the signal
path by using analogue or digital equalizing devices, the directivity cannot be altered by
any signal-​processing device. It is mostly defined by the mechanical design: the shape of
the loudspeaker cabinet, the size and placement of the transducers, the use of wave guides
(horns etc.) and the crossover frequencies (the latter not being mechanical in nature but
electrical, but often not user-​accessible).
The information about the directivity is obtained by a set of frequency response
measurements carried out at different angles in space around the speaker. The most common
method is to define the on-​axis front direction to be the 0 degree of a spherical coordinate
system. With respect to this, the elevation angle is defined to be positive to the upper and
negative to the lower hemisphere. Accordingly, to establish a coordinate system that fits the
Cartesian system the azimuth angle counts positive anticlockwise.
In Figure 3.14 the typical placement of the coordinate system is displayed. It illustrates
that from the reference point –​which in most cases is coincident with the high-​frequency
transducer –​an axis (the x-​axis) perpendicular to the front panel defines the 0° direction.
The same orientation is then used to measure the frequency response curve, as shown in
Figure 3.5. The distance to the microphone is referenced to this point. For a large loud-
speaker and rather close microphone, the distance between each individual driver and
the microphone will vary when rotating the loudspeaker around its reference point. In
the ideal far field, the loudspeaker becomes an actual point source, and the measurement
of angle-​dependent transfer functions is perfect, whereas for shorter distances the error
due to the distance mismatch of each driver becomes more and more severe. As a rule
of thumb, the relative level error may be estimated by a simple formula considering the
distance variation given by the maximum offset of any of the loudspeakers in the front
plane ∆l to the reference point and the microphone distance D:

∆L = 20 · log(∆l / (2D) + 1)   dB   (3.5)

Figure 3.14 Cartesian system defining the angles for the measurements of directivities with
loudspeakers. Note that the distance to the point for the microphone is not defined, but
must be chosen with respect to the size of the loudspeaker.

Consequently, the minimum microphone distance Dmin can be calculated from the
dimensions of a loudspeaker and a given maximum level error ∆L max :

D_min = ∆l / (2 · (10^(∆L_max / 20) − 1))   m   (3.6)

As an example, a measurement of a large loudspeaker with a distance between high-frequency
unit (reference point) and woofer of ∆l = 0.5 m with a given level accuracy
of less than ∆L_max = 0.5 dB requires a minimum distance of D_min = 4.2 m. In Figure 3.15
a typical set-up for measurements of loudspeaker directivity is shown. The loudspeaker is
a typical set-​up for measurements of loudspeaker directivity is shown. The loudspeaker is
mounted on a device that rotates in any direction by means of two computer-​controlled
stepping motors responsible for the z-​axis and the x-​axis.
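The worked example can be verified numerically (a minimal sketch of eqs. (3.5) and (3.6)):

```python
import math

def level_error_db(dl, D):
    # Relative level error (eq. 3.5) for driver offset dl and mic distance D
    return 20.0 * math.log10(dl / (2.0 * D) + 1.0)

def min_distance(dl, dl_max_db):
    # Minimum microphone distance (eq. 3.6) for a given error budget
    return dl / (2.0 * (10.0 ** (dl_max_db / 20.0) - 1.0))

print(round(min_distance(0.5, 0.5), 1))        # -> 4.2 (m)
print(round(level_error_db(0.5, 4.2191), 2))   # -> 0.5 (dB), consistency check
```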

Figure 3.15 Computer-controlled robot for measuring directionality information of loudspeaker
systems. Note the tilting of the z-axis, which in consequence leads to an intersection
of the x-axis with the ground plane of the pictured half-anechoic room. At this point
of intersection, the microphone is placed so as to ensure that there is only one signal path
between source and receiver. The distance in this set-up is 8 m. (Picture courtesy AAC
Anselm Goertz.)
As a first result of this type of measurement, the directional factor Γ can be found, which
gives an angle-​dependent relative value of the sound pressure found at any direction relative
to the front direction:

p (φ,θ)
Γ ( φ, θ ) = 1
p (φ = 0,θ = 0) (3.7)

Plots of the directivity ratio itself are not very common, whereas the directivity gain
D(φ, θ), its dB equivalent, is a common description found in polar plots and other
figures of the directivity such as the different plots shown in section 3.2.3.1:

D(φ, θ) = 20 log Γ(φ, θ) = 20 log(p(φ, θ)/p_0) − 20 log(p(φ = 0, θ = 0)/p_0)   dB   (3.8)

The data acquisition for a full sphere may take several hours depending on the spatial
resolution; compare Figure 3.15. The minimum acceptable spherical resolution is 15°,
which already requires the measurement of 288 frequency response functions. For a resolution
of 5° the number of individual measurements is 2522 (considering that the measurements
at the poles are only taken once)! If each single response takes 5 seconds of measurement
time, the procedure lasts about 3.5 hours. It is obvious that the handling of these data may
be difficult and for a general visualization of the directivity a less detailed solution must
be found.
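The 5° count and the measurement time quoted above can be reproduced with a small counting sketch (one common grid convention, with the poles measured once):

```python
def sphere_points(res_deg):
    # Azimuth steps times elevation rings between the poles, plus the two poles
    azimuth = 360 // res_deg
    rings = 180 // res_deg - 1
    return azimuth * rings + 2

points = sphere_points(5)
print(points)                # -> 2522 individual measurements
print(points * 5 / 3600)     # hours of measurement at 5 s each (about 3.5 h)
```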

3.2.3.1 Display of loudspeaker directivities


From the above-​mentioned measurements, any required information can be taken and
visualized. The most common representation of the directivity is given by the polar plot
(Figure 3.16). Unfortunately, the plot can only show a limited number of frequencies since
otherwise it becomes quite confusing.
Three-​dimensional isobar plots are helpful for the illustration of the directionality in
one plane (either horizontal or vertical) and can reveal many details which are difficult
to see in polar graphs. Figure 3.17 shows the isobar plots of the loudspeaker in Figure 3.15.
In the range below 3 kHz the horizontal directivity behaves quite well and is symmetrical.
For the vertical directivity, the symmetry is less perfect, showing a downward inclination
of the lobe around 500 Hz. For both plots –​horizontal and vertical –​the directivity at high
frequencies becomes irregular, which is a result of the horn flare used. Even the crossover
frequency can be found in the vertical directivity plot as the beamwidth narrows around
2 kHz. This is a typical behaviour since for the frequency range where both transducers are
equally loud the source dimension becomes large. However, the phase shift between low-​
and high-​frequency transducer seems to be well controlled since there is no shift of the main
lobe from the 0° axis.
In the case of using a PC to display the data, interactive 3D polar plots (or balloon plots)
are useful; however, they are difficult to print to 2D without losing information and they can
only provide information at a single frequency. In Figure 3.18 a typical balloon plot of the
directivity at one single frequency is shown.
Figure 3.16 Horizontal and vertical polar plots of a two-way PA loudspeaker system with 12′′ woofer
and 1′′ horn-loaded tweeter for different 1/3-octave bands. The frontal direction points
to 0°. For positive and negative angles, the observation point – at a fixed distance – rotates
around the reference point, as shown in Figure 3.14.

Figure 3.17 Isobar plots for a loudspeaker in 2D and 3D display for the horizontal and vertical
directivity. The frequency resolution is smoothed to 1/​12th of an octave. The horizontal
x-​axis shows the frequency. The vertical axis shows the level relative to the frontal dir-
ection (0 degree) in different colours, hence the 0° response is equal to 0 dB. The orange
range covers a deviation of ±3 dB around the 0 dB, whereas for all other colours the range
covers only 3 dB. The right axis shows the angle of rotation (either horizontal or vertical)
of the loudspeaker for a full 360° rotation.

3.2.3.2 The directivity index and other measures related to directivity


In many applications, it is not necessary to deal with complicated directivities and their
detailed description in huge data files. Alternatively, a compact description may be given
by the directivity index, which is obtained from the ratio of the sound pressure level on axis
to that of an omnidirectional radiator of the same acoustic power. This index can be understood
as the gain of the directional system in comparison to an omnidirectional system having
the same output power. There are different methods to derive the directivity index. Measuring
sound source power in a reverberant chamber is an old-fashioned but nevertheless very
accurate method. The method is described in detail in DIN EN ISO 3741 [5]. The sound
power level is defined as:

Figure 3.18 Balloon plot of a loudspeaker system for a given frequency.

L_W = 10 · log10(P / P_0)   dB,   with P_0 = 10^−12 W   (3.9)

with P being the measured sound power of the source which is derived from sound pressure
level measurements performed in a reverberant chamber according to the regulations
stated in ISO 3741. In the case of loudspeaker measurements, the measurement conditions
shall be exactly the same as for the sensitivity measurements. Especially the input voltage
needs to be the same since in the end a comparison of the two measurements (eq. (3.12))
is done.
The calculation of the sound power is derived from the average level L calculated
from N pressure level measurements L_i taken at several positions in the reverberant
chamber:

L = 10 · log10( (1/N) · Σ_{i=1}^{N} 10^(L_i / 10) )   dB   (3.10)

The sound power level of the loudspeaker can then be calculated as:

L_W,LS = L − 10 · log10( (T_60 / s) / (V_RC / m³) ) − 14 dB   (3.11)
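Eqs. (3.10) and (3.11) as a short sketch (the levels, T60 and chamber volume below are made-up values, and the further ISO 3741 corrections are omitted):

```python
import math

def average_level(levels_db):
    # Energetic average of N sound pressure levels (eq. 3.10)
    return 10.0 * math.log10(sum(10.0 ** (L / 10.0) for L in levels_db) / len(levels_db))

def sound_power_level(levels_db, t60_s, v_rc_m3):
    # Sound power level of the loudspeaker in the reverberant chamber (eq. 3.11)
    return average_level(levels_db) - 10.0 * math.log10(t60_s / v_rc_m3) - 14.0

print(round(sound_power_level([92.0, 94.0, 93.0], t60_s=5.0, v_rc_m3=200.0), 1))  # -> 95.1
```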

Figure 3.19 The relationship between free field sensitivity and diffuse field sensitivity. The DI
describes the difference between the two graphs. The diffuse field sensitivity is typically
measured in 1/3-octave bands; therefore, the free field sensitivity needs to be averaged in
the same bandwidth to evaluate the DI.

The corrections in this formula are required due to the frequency-​dependent absorption
and the volume of the reverberant chamber, which both affect the sound pressure level
measured in the diffuse field. For a detailed derivation please refer to ISO 3741. Therefore,
the reverberation time T60 as well as the volume VRC of the reverberation chamber must
be known. Some further correction terms denoted in ISO (with respect to meteorological
conditions) are required for laboratory accuracy. The sound power level is equivalent to a
sound pressure level of an omnidirectional source measured at a distance of 28 cm. To com-
pare this with the typical free field measurements of loudspeakers taken at a distance of 1 m
the sound power level must be reduced by 11 dB. This finally leads to the calculation of the
directivity index (see Figure 3.19):

DI = L_sens − L_W,LS + 11 dB   (3.12)

This value is often called the direct or frontal to random index, which is the logarithmic dB
value derived from the frontal to random factor, also known as directivity factor Q:

Q = ∫_S p_0R² dS / ∫_S p_R²(φ, θ) dS = S / ∫_S Γ²(φ, θ) dS   (3.13)

DI = 10 log Q dB (3.14)
where
p_R is the sound pressure at a given distance R from the defined centre of radiation
p_0R is measured in the frontal direction at the distance R
S is the spherical surface around the speaker at the distance R
φ, θ are the angles according to Figure 3.14

Since all these numbers strongly depend on frequency, many manufacturers show either
tables or plots of this dependency in the data sheets of their products. For approximate
calculations or measurements of the directivity index at least the range of the 500 Hz and
1000 Hz octave band needs to be covered.
For combining the influence of the directional effect as well as that of the distribution
between directional and omnidirectional energy, the product of both quantities called the
directivity deviation factor g ( φ, θ ) is used:

g(φ, θ) = Q · Γ²(φ, θ)   (3.15)

Obviously, the directivity deviation factor is based on the previously introduced directional
factor Γ (eq. (3.7)) and is determined by multiplying by the constant value of the directivity
factor Q; only the range of values of the angle-dependent function changes. Since Γ²(φ, θ)
is 1 for the reference angle (φ, θ = 0) (cf. Figure 3.14), the 'directivity deviation factor' can
be understood as the angle-dependent decrease of the directivity factor (for an application
see also section 6.3.2.1).
The logarithmic expression of the directivity deviation factor g ( φ, θ ) is the directivity
deviation index G denoting the deviation in dB, which gives a better estimate for practical
application:

G(φ, θ) = 10 log g(φ, θ) = D(φ, θ) + DI   dB   (3.16)

The reader should be aware of the partially contradicting usage of the introduced values.
It is important to note that Q ( f ) and DI ( f ) are frequency-​dependent and present a single
number of the directional behaviour (integrated over the full sphere) at certain frequencies
or frequency bands. By definition, they need to be calculated with respect to the frontal dir-
ection as defined by the manufacturer. On the other hand, the two properties G ( φ, θ, f ) and
g ( φ, θ, f ) depend on frequency and spatial direction.
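A numerical sanity check of eqs. (3.13) and (3.14) on a discretized sphere (a minimal sketch: for an omnidirectional pattern, Γ = 1 everywhere, Q must come out as 1 and DI as 0 dB):

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 181)                 # polar angle grid
phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
TH, PH = np.meshgrid(theta, phi, indexing="ij")

gamma = np.ones_like(TH)                             # omnidirectional pattern

solid = np.sin(TH)                                   # solid-angle weight sin(theta)
Q = solid.sum() / (gamma ** 2 * solid).sum()         # eq. (3.13), discretized
DI = 10.0 * np.log10(Q)                              # eq. (3.14)
print(Q, DI)                                         # -> 1.0 0.0
```

Replacing `gamma` with a measured balloon of directional factors gives a practical estimate of Q from the full-sphere data described above.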

3.2.3.3 Loudspeaker efficiency


The efficiency η of a loudspeaker system is determined by the ratio between radiated
acoustic power and supplied electric power:

η = P_ac / P_el · 100% = E_LS² / (ρ_0 c) · 4π r_0² / Q_LS · 100%   (3.17)

where
P_ac total radiated acoustic sound power
P_el electric power fed into the loudspeaker

E_LS sensitivity of the speaker, defined as the sound pressure in the frontal direction per 1
W at 1 m distance
r_0 reference distance, 1 m
Q_LS directivity factor of the speaker
ρ_0 c characteristic acoustic impedance of air = 413.3 N·s/m³ (Pa·s/m) at 293.15 K (20 °C)
Combining all constants, one obtains the following approximation:

η ≈ 3 · E_LS² / Q_LS   %   (3.18)

This correlation can be seen in Figure 3.20. In practice, the efficiency of loudspeaker
systems is between 0.1 and 10%. Eq. (3.18) suggests that owing to the frequency depend-
ence of the directivity factor (cf. Figure 3.19 and eq. (3.14)) and the mostly insignificant
frequency dependence of the free-​field sensitivity, the loudspeaker system efficiency also
depends heavily on the frequency.
Loudspeaker efficiency has become less important with the availability of high-​power
amplifiers and high power-​handling capacities of the loudspeaker drive units. Nevertheless,
it should be kept in mind that an almost inaudible level increase of 3 dB requires double the
power. To achieve twice the perceived loudness (a level increase of about 10 dB) requires 10
times the power (i.e., from 500 W to 5 kW).
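Eq. (3.18) and the power/loudness remark can be sketched in a few lines (the 95 dB sensitivity and Q = 5 are illustrative values, not data from a specific product):

```python
import math

def efficiency_percent(sensitivity_db_1w_1m, q_ls):
    # Approximate efficiency from free-field sensitivity (eq. 3.18):
    # convert the dB sensitivity back to a pressure in Pa first
    e_ls = 20e-6 * 10.0 ** (sensitivity_db_1w_1m / 20.0)
    return 3.0 * e_ls ** 2 / q_ls

print(round(efficiency_percent(95.0, 5.0), 2))   # -> 0.76 (%)
print(round(10.0 ** (3.0 / 10.0), 1))            # power ratio for +3 dB -> 2.0
print(10.0 ** (10.0 / 10.0))                     # power ratio for +10 dB -> 10.0
```

The result falls inside the 0.1% to 10% range stated above, and the power ratios confirm the doubling for +3 dB and the tenfold increase for +10 dB.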

Figure 3.20 Sensitivity of a loudspeaker as a function of the efficiency and directivity index.

3.2.4 Power handling

The loudspeaker's voice coil is manufactured by winding an aluminium or copper wire
around a cylindrical tube made of aluminium, Kapton or any other heat-withstanding, strong
material. It should be lightweight and stable with respect to its shape and dimensions when
heated up to temperatures of up to 250°C. Temperatures that high are not common but
possible during operation at the limit of the power-​handling capability. At temperatures
above 200°C the risk of damaging the transducer is imminent. The following rules can be
followed to determine the maximum acceptable input power with respect to the voice coil
temperature.
According to IEC 60268-​5 [1], short-time power handling (paragraph 17.2) and long-
time power handling (paragraph 17.3) are defined as the maximum input voltage of typical
program material that does not damage the device. For the short-time limit, the input signal
duration is 1 second and the signal is repeated 60 times with a pause of 1 min in between.
For the long-time limit, the input signal duration is 1 minute and this signal is repeated 10
times with a 2-minute pause in between. Besides these two figures, a continuous sine-signal
power handling is defined in paragraph 17.4 of the standard. This is the sinusoidal input
voltage at a given frequency (within the range of application) that does not damage the
device when presented for at least 1 hour.
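Since the IEC ratings are specified as input voltages of a defined programme signal, manufacturers commonly quote the equivalent power into the rated impedance, P = U²/Z. A small illustrative helper (the function name and values are examples, not part of the standard):

```python
def rated_power_w(rated_voltage_v, rated_impedance_ohm):
    """Equivalent continuous power corresponding to a rated input voltage,
    P = U^2 / Z, computed with the loudspeaker's rated impedance."""
    return rated_voltage_v ** 2 / rated_impedance_ohm
```

For example, a rated noise voltage of 28.3 V into a nominal 8 Ω corresponds to about 100 W.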
It is important to understand that these ‘rated values’ are not results of measurements but
are stated by the manufacturer and can be verified or falsified with the above-​mentioned
measurement procedure stated in IEC 60268-​5. The Audio Engineering Society has defined
methods for the measurement of ‘drive units’ in the AES2-​2012 standard [6]. The differences
from IEC 60268-​5 are small and in most cases the AES2 standard refers to the IEC one.

3.3 Directional loudspeakers


As shown in the previous section, the directivity of a loudspeaker system is the one property
that cannot be changed by means of any signal-​processing device in the signal chain
in front of the loudspeaker. This is one reason why it should be considered one of the most
important properties when selecting an appropriate loudspeaker for a particular application.
The directivity defines where the sound is directed and how homogeneous the distribution
of the sound will be.
Furthermore, using dedicated signal-​processing in connection with multiple drivers in
one loudspeaker system allows the accurate steering of sound energy to where it is needed or
away from where it is not wanted. Therefore, this section deals with the creation, definition
and manipulation of the directivity of loudspeaker systems.
Directivity is a phenomenon based on coherence and interference and due to this fact
very much depends on wavelength. In acoustics, the wavelength of audible sound covers
17 m for the lowest frequency of 20 Hz and only 17 mm for the highest frequency of 20 kHz.
It is due to this wide range that a simple solution for a homogeneous directivity of a loud-
speaker is not feasible.
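The quoted wavelengths follow directly from λ = c/f; a one-line check, assuming c ≈ 343 m/s in air at room temperature:

```python
def wavelength_m(f_hz, c=343.0):
    """Acoustic wavelength in air for frequency f_hz (c ~ 343 m/s at room temperature)."""
    return c / f_hz
```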
The directional characteristic of a loudspeaker system, ranging from a simple two-​way
system to a complex line array, can be calculated by summing up the contributions of the
individual subsystems with respect to magnitude and phase. In the very early design phase
of a loudspeaker system this could be accomplished by using the boundary element method
(BEM), allowing the study of the interaction between surface velocity of each single trans-
ducer and the shape of the surrounding body (i.e., the cabinet, the horn flare etc.). However,
this method assumes exact knowledge of precisely those vibration amplitudes –​or even the
velocity distribution on the surface of each of the individual sound transducers. Since these
are often not known or difficult to measure, in practice one must be satisfied with the meas-
urement of the directional characteristic of the overall system.
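The complex summation mentioned above can be sketched numerically if each subsystem is reduced to a point contribution with a magnitude, a phase and a position on the baffle. This is a deliberately simplified far-field model, not the full BEM picture; names and tuple layout are assumptions:

```python
import cmath
import math

def combined_magnitude(subsystems, f_hz, theta_deg, c=343.0):
    """Coherent far-field sum of individual drive-unit contributions at angle theta.
    Each subsystem is a tuple (linear magnitude, phase in degrees, offset on the
    baffle in metres); the offsets create the angle-dependent path-length phases."""
    k = 2 * math.pi * f_hz / c            # wavenumber
    s = math.sin(math.radians(theta_deg))
    total = sum(m * cmath.exp(1j * (math.radians(ph) + k * y * s))
                for m, ph, y in subsystems)
    return abs(total)
```

Two equal, in-phase sources 0.3 m apart sum to double pressure on axis but cancel at the angle where their path difference reaches half a wavelength.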
Figure 3.21 Theoretical polar plots for a circular piston of 25 cm diameter (a typical 12′′ woofer) in an
infinite baffle for frequencies from 500 Hz up to 2.5 kHz in steps of 1/​3 octave. The total
range is 50 dB. The dotted lines have a distance of 6 dB, so that the intersection points
at the −6 dB line denote the beam width of the directivity, leading to approximately 150°
at 1 kHz, 100° at 1.25 kHz, 75° at 1.6 kHz, 58° at 2 kHz and 45° at 2.5 kHz. Obviously,
omnidirectional sound radiation can be assumed for frequencies below 500 Hz.

3.3.1 The directivity of the circular piston


To understand the origin of directivity we first need to understand the directivity of ‘simple
sources’ such as the circular vibrating piston. This is the model radiation pattern for almost
any loudspeaker with a circular membrane flush-​mounted into a cabinet wall.
Figure 3.21 shows polar plots for different frequencies of a 12′′ cone driver (the diameter
of the membrane approx. 25 cm). In many two-​way compact loudspeaker systems, the 12′′
cone needs to reproduce frequencies up to 1500 Hz or even higher, because the 1′′ compres-
sion driver in combination with a rather small horn mouth does not permit a lower cut-​off
frequency. Obviously, the directivity of such a construction varies significantly from almost
omnidirectional below 500 Hz to rather directional at 2 kHz.
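The polar plots in Figure 3.21 follow the classic piston formula D(θ) = 2·J1(ka·sin θ)/(ka·sin θ). A self-contained numerical sketch (the Bessel function is evaluated from its integral form so that no external library is needed; function names are illustrative):

```python
import math

def bessel_j1(x, n=2000):
    """J1(x) from the integral J1(x) = (1/pi) * int_0^pi cos(t - x*sin(t)) dt,
    evaluated with the trapezoidal rule (accurate enough for plotting)."""
    h = math.pi / n
    ends = math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi))
    s = 0.5 * ends + sum(math.cos(k * h - x * math.sin(k * h)) for k in range(1, n))
    return s * h / math.pi

def piston_level_db(theta_deg, f_hz, radius_m=0.125, c=343.0):
    """Off-axis level (dB re on-axis) of a rigid circular piston in an infinite baffle."""
    x = 2 * math.pi * f_hz / c * radius_m * math.sin(math.radians(theta_deg))
    if abs(x) < 1e-9:
        return 0.0
    return 20 * math.log10(abs(2 * bessel_j1(x) / x))
```

With a 25 cm piston this reproduces the beam widths quoted in the caption, e.g. the −6 dB points at ±75° (a 150° beam) at 1 kHz, and a far narrower beam at 2.5 kHz.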

3.3.2 Horn-​loaded loudspeaker systems


One of the oldest methods to achieve a directional sound source is using a waveguide like a
horn. The simplest design is the conical horn, which was used in ancient times as well. The
basic idea is to prevent the undirected propagation of the sound by creating a guide for the
sound waves in the direction of the horn axis by means of the horn walls. Unfortunately,
the huge bandwidth of sound (wavelength from 17 m at 20 Hz up to 1.7 cm at 20 kHz) leads
to quite bulky devices where low frequencies are included, since the horn size needs to be
in the range of the wavelength to become effective. Therefore, the application is restricted
mainly to mid and high frequencies.
Current horn constructions differ greatly from the conical shape and the first
improvements were made by the invention of the exponential horn, which was used for
the early gramophones. Webster’s horn theory [7] was issued in 1914 long before the first
electroacoustic systems were available. However, the pure exponential horn shape is not
often found, but the growth of the cross-​section in general follows an exponential law. The
main benefit of the exponential horn compared to the conical horn with identical low
cut-​off frequency is its significantly reduced length, which makes it applicable in standard
loudspeaker cabinets in combination with a 10 ′′ to 15′′ woofer. This very popular two-​way
loudspeaker system can be found in many different applications, from mobile DJ systems to
installed sound reinforcement systems in churches, theatres and other small venues. The
examples presented in section 3.2 illustrate such a loudspeaker.

3.3.2.1 Constant directivity (CD) horns


The idea of a constant directivity horn arose with the aim of covering larger audience areas
with many identical loudspeakers, each of them addressing only a certain patch of the audience
area, followed by a neighbouring one, which should seamlessly cover the next patch. To
avoid interference between neighbouring loudspeakers the overlapping area needs to be well
controlled, which is only possible with loudspeakers radiating the sound into a well-​defined
angle. This means constant directivity (CD). The first inventions and patents are from the
1950s; [8] and [9] are comprehensive and good scientific papers covering many aspects.

Figure 3.22 JBL 2360 Bi-​Radial® Constant Coverage (another name for CD) horn with attached 2′′
compression driver (courtesy JBL). The left picture shows the narrow slit in the neck
of the waveguide, which continues into the final horn and is intended to diffract the
soundwave horizontally into the final surface line of the horn, so to cover a wide hori-
zontal angle of 90°. The vertical angle of 40° is maintained throughout the length of the
horn with a little bit of widening to the end of the horn mouth.
A typical example of a CD horn is the well-​known theatre horn JBL 2360. Figure 3.22
shows the horn with attached compression driver. In Figure 3.23 the horizontal directivity
achieved by this horn is shown.

Figure 3.23 Horizontal directivity of the JBL 2360 with JBL 2445J 2′′ compression driver. The aimed
coverage of ±45° is met in the frequency range between 600 Hz and 10 kHz. To cover
lower frequencies requires a larger horn and for the higher frequencies the diffraction slit
probably needs to be even smaller. (Measurement courtesy Anselm Goertz.)
One of the disadvantages of this construction is the long horn neck with only a small
growth rate in cross section. Rather, the neck is supposed to produce a flat wavefront that
propagates to the diffraction slot. The sound intensity must be distributed evenly over the
wavefront to achieve a uniform sound level in the desired angular range. Although this goal
may be achieved, the result is paid for with higher distortion due to non-​linear wave propa-
gation at high levels. The longer the horn neck becomes, the more distortions are created.
For this reason, horns of this type are used less frequently.
Besides the JBL 2360A, which was somewhat typical, many other solutions from
different manufacturers were on the market. All followed the same idea and had similar
design principles. Whereas the ‘smaller’ horns like the JBL 2360 could only be used in two-​
way systems with an additional low-​frequency box, other manufacturers such as Electro
Voice offered full-​range horn-​loaded constant directivity designs. Figure 3.24 shows one
example of a large horn-​loaded stadium loudspeaker. The application is focused on speech
reproduction; the lower cut-​off frequency is at around 100 Hz. Though the loudspeaker is
a two-​way loudspeaker, it follows the principle of coincident placement of low-​and high-​
frequency transducer. The benefit of this arrangement is that the directivity at the crossover
frequency is not affected by interference due to the side-​by-​side placement. This phenom-
enon is discussed in the next section.
Modern horn design is still based on the CD horn principles. However, disadvantages
like the distortion-​producing horn throat shape are found less often. In today’s more
common line arrays, the horn designs must essentially have a uniform horizontal directivity,
while vertically they should primarily produce a flat wavefront with constant sound pressure
and phase. This allows the creation of an approximately homogeneous line source when
combining several sources, resulting in a controlled vertical directivity in combination with
a defined horizontal coverage (see section 3.3.5.2).

Figure 3.24 Electro Voice MH 6040AC stadium horn loudspeaker covering the full frequency range
from 100 Hz up to 20 kHz. The construction uses two 10′′ woofers to feed the large
low-​frequency horn and one 2′′ compression driver feeding into the small horn placed
coaxially into the large mouth. The dimensions: height 1.5 m, width 1 m, length 1.88 m,
weight 75 kg.

3.3.3 The directivity of multiple sources

3.3.3.1 The most common two-​way loudspeaker system


The combination of multiple transducers in two-​way (or multi-​way) loudspeaker systems leads to a
more complex directivity behaviour due to the interaction of the different membrane sizes,
the physical distance of the individual drivers and the crossover network. Specifically, the
transition from one transducer to the next at the crossover frequency causes the directivity
to change due to different factors. Firstly, at the crossover frequency, the sound pressure
of all transducers involved is equal (in most designs of the crossover network) whereas
below and above that frequency the contribution of the non-​primarily-​responsible driver
declines. This causes the total radiating surface to morph from one transducer to the other
with an in-​between combined radiating surface area at the crossover frequency, hence, the
directivity narrows. Secondly, as the two transducers may not have the same diameter, the
directivity for each one is different for the same frequency. This will affect the smoothness
of the directivity index (refer to section 3.2.3.2). Thirdly, when the two transducers are not
phase-​matched (n times 360°) near the crossover frequency, the main axis of radiation is
tilted away from the perpendicular on-​axis direction.
With respect to the aim of a smooth and monotonically increasing directivity index,
the beamwidth of the horn and the woofer should match at the crossover frequency. The
crossover frequency and the dispersion angle are related. The measured directivity of a com-
bination of a 15′′ low-​mid driver and a 1.4′′ compression driver attached to a CD horn (pic-
ture of a similar system shown in Figure 3.25) shows a smooth transition at the crossover
frequency at 1.5 kHz, as can be seen in Figure 3.26. The diameter of the 15′′ cone and the
horizontal width of the horn mouth (designed for this particular crossover frequency) are
almost identical and lead to the same dispersion angle at the crossover frequency. This is
valid for the horizontal plane since the lateral dimensions are identical and the superposition
of the two elements does not change this. Vertically, the two sources combine to more
than twice the size of a single unit at the crossover frequency. Consequently, the radiation
of sound narrows.

Figure 3.25 Two-​way PA loudspeaker system with 15′′ woofer and 2′′ compression driver and CD horn
(courtesy Klein+​Hummel).
Figure 3.26 shows this typical behaviour in the two directivity plots. It is clearly visible
that for the horizontal directivity the beamwidth does not change much at the crossover
point, whereas vertically a narrowing and asymmetry in the frequency range from 800 Hz
to 1.6 kHz is found. The crossover network design defines the frequency range and the
symmetry of the narrowing. The higher the order of the crossover filter design the smaller
the narrowing range in frequency will become due to the steep filter curves. If the phase
response deviates from the intended ‘in phase’ (n times 360° phase shift between the two
elements) design the main lobe of the sound beam will tilt away from the 0° direction.
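The lobe tilt described above can be demonstrated with a two-source model: two coherent sources stacked at spacing d, with an optional electrical phase offset between them. The spacing, frequency and names below are illustrative, not taken from the measured system:

```python
import cmath
import math

def pair_level_db(theta_deg, f_hz, d_m=0.30, phase_offset_deg=0.0, c=343.0):
    """Far-field level (dB re the in-phase on-axis sum) of two coherent sources
    stacked vertically with spacing d_m; phase_offset_deg models an electrical
    phase error between them (e.g. from the crossover network)."""
    k = 2 * math.pi * f_hz / c
    delta = k * d_m * math.sin(math.radians(theta_deg))  # acoustic phase difference
    p = 1 + cmath.exp(-1j * (delta + math.radians(phase_offset_deg)))
    return 20 * math.log10(max(abs(p) / 2, 1e-12))
```

In phase, the pair sits at 0 dB on axis and has a deep null where the path difference reaches half a wavelength; with a 90° phase error the on-axis level drops by about 3 dB and the maximum moves off axis, i.e., the main lobe tilts.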

3.3.4 Directivity control using line arrays


The goal of any array of loudspeakers is to coherently sum up the transmitted sound energy
from each individual cabinet in the array. Combining this with the benefits of a (nearly)
line source (radiation limited to two dimensions instead of three and therefore less attenu-
ation over distance) results in the so-​called ‘line array’ technology that is widely used now-
adays. It is important to understand the concept, the advantages and the limitations of such
systems. Figure 3.4 shows a typical large line array created of 16 identical elements. These
elements can be installed in a vertical array to form a theoretically ideal line source, but the
array can be curved to change and adapt the vertical directivity to match the venue.

Figure 3.26 Standard isobaric plots for the horizontal and vertical directivity of an ordinary two-​way
PA loudspeaker equipped with a 15′′ woofer and a CD horn with 1.4′′ compression driver.
While the horizontal directivity is fairly symmetrical with a slight narrowing at 2 kHz and
becomes narrower at higher frequencies, the vertical isobaric plot shows a typical asym-
metry due to the placement of the two speakers side by side and a strong constriction of
the directivity at the crossover frequency (between 800 Hz and 1600 Hz) due to interfer-
ence. (Courtesy four audio.)

Firstly, to understand the line array technique, the straight-​line array created with identical
sources shall be discussed.

3.3.4.1 Straight line array with identical sources


In section 3.1.2.1 the model of the point source with directivity was introduced. The
directivity of a point source describes the angular dependency of the far field frequency
response with reference to the main radiation axis when rotating the loudspeaker around
its reference point (in general a point on the front at the centre of the high-​frequency
unit; refer to Figure 3.14). To calculate the final directivity of the line array then becomes
relatively simple by superimposing the directivity of the directional point source with the
directivity of an idealized line array consisting of n omnidirectional point sources. The latter
is well described in [10, 11]. In theory, to form a straight-​line source, the individual point
sources need to be less than half a wavelength of the maximum desired frequency apart in
order to form a coherent wavefront. Figure 3.27 illustrates how the two directivities com-
bine for an example of a large column with 16 loudspeakers with a membrane diameter of
6 cm spaced at 8 cm from centre to centre.

Figure 3.27 The directivity of a loudspeaker column (with N =​16 identical chassis, membrane diameter
a =​6 cm; equally spaced d =​8 cm). The figure shows the resulting directivity (right picture)
derived from the directivity of a single driver (left picture), which also shows the horizontal
directivity of the column, and the directivity of equally spaced monopole sources (point
sources, in the middle). All directivities are calculated for the centre frequencies (as listed
in the plot) with an energetic averaging within a 1/​3-octave band.
As one can see in Figure 3.27, the resulting directivity is neither constant nor uni-
form: with increasing frequency, the directivity gets narrower and higher frequencies are
radiated into a very small angle, plus multiple side lobes (off-​axis SPL maxima) can be
observed.
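Both effects, the narrowing main lobe and the grating-type side lobes, are captured by the closed-form array factor of N equally spaced, in-phase monopoles, sin(Nψ/2)/(N·sin(ψ/2)) with ψ = kd·sin θ. A sketch using the column dimensions of Figure 3.27 (a simplified model that ignores the individual driver directivity):

```python
import math

def array_factor_db(theta_deg, f_hz, n=16, d_m=0.08, c=343.0):
    """Far-field level (dB re on-axis) of n in-phase monopoles spaced d_m apart."""
    psi = 2 * math.pi * f_hz / c * d_m * math.sin(math.radians(theta_deg))
    if abs(math.sin(psi / 2)) < 1e-12:
        return 0.0  # main lobe, or a grating lobe where all sources add coherently
    af = math.sin(n * psi / 2) / (n * math.sin(psi / 2))
    return 20 * math.log10(max(abs(af), 1e-12))
```

Once the spacing equals a full wavelength (about 4.3 kHz for d = 8 cm) a grating lobe at 90° returns to full level, which matches the strong side lobes above 4 kHz seen in Figure 3.27.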
To optimize the first issue of this array of drivers, the active length of the column should
be frequency-​dependent (longer at low frequencies, shorter towards higher frequencies),
so that a constant size ratio relative to the wavelength is obtained. This can be achieved
by a frequency-​dependent control of the individual loudspeaker drivers in such a way that
only very few loudspeakers (either at one end or in the centre of the column) radiate the
entire frequency range from lowest to highest frequencies. For all other loudspeakers in the
column, the signal is low-​pass filtered with decreasing cut-​off frequency for loudspeakers
further away (Figure 3.28). The second issue described above has its cause in the distance
between each loudspeaker driver in the column. Once the distance reaches half a wave-
length or more, side lobes with rather high intensity are created due to angle-​dependent
constructive and destructive interference. Figure 3.27 shows this behaviour in the rightmost
plot for frequencies above 4 kHz.
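The frequency-dependent shortening of the active length can be sketched as a simple cut-off schedule: each additional driver only plays up to the frequency at which the active length it completes equals a chosen number of wavelengths. The single-ended taper and all names are illustrative assumptions, not a specific product's filter design:

```python
def lowpass_cutoffs_hz(n=16, d_m=0.08, wavelengths=1.0, c=343.0):
    """Per-driver low-pass cut-offs for a column fed from one end: driver i extends
    the active length to (i+1)*d_m, which should stay near `wavelengths` * lambda."""
    return [c * wavelengths / ((i + 1) * d_m) for i in range(n)]
```

With these example values the first driver runs essentially full range while the sixteenth is limited to roughly 270 Hz, keeping the radiating length roughly proportional to wavelength.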

Figure 3.28 Same column as in Figure 3.27 except for the frequency-​dependent low-​pass filtered loud-
speaker arrangement to achieve a constant active length of the column relative to wave-
length. The width of the main lobe is significantly greater for high frequencies though
not smooth. The plot shows a simulation with piston-​like membranes and theoretical
radiation pattern; it reveals the great potential for DSP-​controlled loudspeaker arrays.


In many applications, especially in churches and for speech-​only reinforcement, such
straight, unfiltered column loudspeakers have been used for many years and are still in use
today. The narrow, tall columns fit well against the stone pillars of such buildings. Since
the main lobe of the directivity pattern radiates perpendicularly from the loudspeaker
body and the loudspeaker is normally installed above the listeners’ heads, it is necessary
to tilt the loudspeakers downwards. Acoustically the distribution of many rather small
loudspeakers in the venue leads to the perception of sound from many different directions
as well as a high excitation of room reverberation, which may lead to degradation of speech
intelligibility. Since the end of the last century, this deficiency and the rapid decline in
the price of signal processors as well as amplifiers have led to the development of active
loudspeaker arrays whose directional characteristic can be tilted electronically rather than
mechanically in the manner of a ‘phased array’ and therefore be optimized for a defined
audience area.
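The electronic tilting works by delaying the drivers progressively, exactly as in a phased array: a wavefront tilted by an angle α needs an extra travel path of d·sin α per driver. A minimal sketch with illustrative values:

```python
import math

def steering_delays_ms(n=16, d_m=0.08, tilt_deg=10.0, c=343.0):
    """Per-driver delays (in ms) that tilt the main lobe of a straight column by
    tilt_deg: each driver fires d_m*sin(tilt)/c later than its upper neighbour."""
    step_s = d_m * math.sin(math.radians(tilt_deg)) / c
    return [i * step_s * 1e3 for i in range(n)]
```

For an 8 cm spacing and a 10° downward tilt the increment is about 40 µs per driver, well within the resolution of any modern DSP.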

Figure 3.29 Left picture: a DSP-​controlled loudspeaker column. By combining up to nine
elements, a column with a length of almost 9 m can be realized. (Courtesy Steffens.) Right
picture: placement of DSP loudspeaker columns in St. Paulus Cathedral in Münster
(Germany). Photo taken by the author during the celebration for the reopening of the
cathedral after renovation including the sound system. Note the installation above
the audience, allowing unobstructed sound propagation even to more distant places in
the audience. However, this requires a downward tilting of the sound beam.

3.3.5 More complex control of the desired directivity


The main task for the sound reinforcement system design is to achieve good sound for the
audience. Good sound means equal loudness, a homogeneous frequency response, low dis-
tortion as well as good speech intelligibility everywhere in the audience.
To achieve those goals, modern loudspeaker concepts can make use of electronic-
ally adjustable directivities. The steering concepts in general are based on the line array
method in combination with adjustable frequency response, level and phase. To achieve
this, individual digital signal-​processing (DSP) and individual amplification for each
driver is used.

3.3.5.1 DSP-​controlled line array loudspeaker


The invention dates back to the 1990s [12] and has since then been continuously improved
by various companies. Today, several manufacturers offer a wide range of DSP-​controlled
steerable loudspeaker columns of different sizes and lengths to cover all sound reinforce-
ment requirements. Especially in acoustically difficult locations such as churches and
railway stations, the concept of these loudspeakers delivers results that clearly surpass con-
ventional loudspeaker system installations.
The programming of the DSP requires dedicated software that is issued by the manufac-
turer and allows the optimization of the filter parameters to cope with the desired directivity.
In general, the target is defined as a cross-​sectional plan of the audience area and the aiming
of the loudspeaker system then is set to deliver an equally loud distribution of the sound
within the defined area.
For large line arrays as depicted in Figure 3.4, similar solutions are available today [13]
though the demands for these types of loudspeakers are much more complex. The results are
quite new but promising.
Planning an installation with these DSP-​ controlled loudspeaker systems requires
computer-​aided tools that allow a detailed simulation of the behaviour of the loudspeakers
in context with the room. The procedure starts with the placement of loudspeakers in the
room. In the next step, the aiming of the sound beams for each loudspeaker is defined. This
results in an estimation for the tilting of the sound beam, the angular width of the beam and
the distance to the audience. The frequency-​dependent directivity of the loudspeaker then
is optimized using the specific loudspeaker filter optimization software. This result is used
for the final simulation of the sound distribution in the acoustics simulation software. The
procedure is described in more detail in Chapter 8.

3.3.5.2 Large line arrays


Large line arrays are the ultimate solution for large outdoor events and concert halls. The
idea goes back to the early 1990s as well [14]. Experience with sound columns had shown
that the distance between the individual sources, each radiating like a point source, leads to
the formation of side lobes, especially at high frequencies; this problem had to be solved.
However, knowing that a continuous line does not produce these unwanted side lobes the
solution was obvious: the sound energy had to be distributed continuously, with the trans-
ducer elements remaining discrete elements at a certain distance. To achieve this, Urban
and Heil invented a waveguide, which they placed in front of a compression driver unit. In
retrospect, the idea is simple: the waveguide had to redistribute the sound energy emerging
from the circular opening of the compression-​chamber driver to a linear, planar wavefront
at the output, with the amplitude and phase along the line being equal. However, to achieve
this the geometry of the waveguide became rather complicated [15].

Figure 3.30 A DSP-​controlled loudspeaker line array (length 3.6 m) is optimized to deliver sound
to two different audience areas. Each picture shows the optimization within a band-
width of one octave, from upper left to lower right: 250 Hz, 500 Hz, 1 kHz, 2 kHz,
4 kHz, 8 kHz. As expected, the suppression of side-​lobes at high frequencies is diffi-
cult. (Calculation performed with the software dedicated to the Steffens Evolutone
loudspeaker.)

Figure 3.31 The transformation from circular input to rectangular output. The DOSC geometry sets
all possible sound path lengths to be identical from entrance to exit, thus producing a flat
source with equal phase and amplitude.
By arranging several of these elements one above the other to obtain a straight line for
the high frequencies, and besides this keeping the distance of the low-​frequency transducers
small compared to ½ of a wavelength, an almost perfect continuous line with homogeneous
sound radiation is created. Still, the limitations of a line source of finite length apply, but the
disadvantage of the nasty side lobes is solved.

Figure 3.32 Representation of the variation of the distance for cylindrical sound radiation and far-​field
divergence angle (spherical radiation) with frequency for a straight-​line source array of
height 5.4 metres.
When looking at the limitations of a line source of finite length several findings are
relevant: the frequency-​dependent radiation pattern, the distance-​dependent frequency
response and level drop, and finally the impulse response. The discussion of these issues started
long before the first practical applications of large line source arrays took place [16].
The most relevant concern is how to keep the frequency response constant for different
distances. This relates to the situation depicted in Figure 3.27. As the dispersion of sound
from a linear array depends on the wavelength, it is obvious that for lower frequencies the
array radiates spherical waves, whereas at high frequencies it radiates cylindrical waves.
If a flat response is maintained very close to the line array, it will not persist at larger
distances, because the level drops by 6 dB per doubling of distance for spherical waves,
whereas for cylindrical waves the level drop is only 3 dB per doubling.
Moreover, even more complicated, the cylindrical wave front disappears for any frequency
at a certain distance that depends on the relationship between wavelength and line array
length. In [15] the authors show the frequency-​dependent behaviour, which is reproduced
in Figure 3.32. The calculation was carried out for a straight line array of 5.4 m height, with
equally distributed source strength.
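The curves in Figure 3.32 reflect two simple relations: the −3 dB (cylindrical) versus −6 dB (spherical) drop per doubling of distance, and an estimate of the border distance at which the near field of a straight line source of length L ends, r ≈ L²f/(2c) = L²/(2λ). A rough numerical sketch; the border formula is the simplified Fresnel-type textbook estimate, not the exact expression used in [15]:

```python
import math

def border_distance_m(length_m, f_hz, c=343.0):
    """Approximate transition distance from cylindrical (near-field) to spherical
    (far-field) radiation of a straight line source: r ~ L^2 * f / (2c)."""
    return length_m ** 2 * f_hz / (2 * c)

def level_drop_db(r1_m, r2_m, cylindrical=False):
    """Level change going from distance r1 to r2: -3 dB per doubling for an ideal
    cylindrical wave, -6 dB per doubling for a spherical wave."""
    slope = 10 if cylindrical else 20
    return -slope * math.log10(r2_m / r1_m)
```

For the 5.4 m array the cylindrical region reaches roughly 42 m at 1 kHz but only a few metres at 100 Hz, which is why the low frequencies run out of reach first.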
Figure 3.32 reveals the problem that the reach for low frequencies is rather short whereas
high frequencies are much less attenuated. This leads to difficulties concerning the equal-
ization of such loudspeaker arrays. To solve this problem, most line array loudspeakers are
not straight but have a slight or pronounced curvature. This leads to a more homogeneous
sound dispersion. The distance-​dependent level drop with curved arrays gets closer to the
−6 dB per distance doubling (because of the curved loudspeaker front) but the start of the
1/​r law lies behind the loudspeaker in the centre of the curvature. Therefore, the level drop
in front of the loudspeaker shows a more even declining characteristic. The next generation
of large line arrays will have digital signal-​processing (comparable to the line arrays
described in section 3.3.5.1) to supply each element with its individually corrected, delayed,
equalized and band-​limited signal.

Figure 3.33 Two-​dimensional loudspeaker array using individual signal-​processing and power-​
amplifying for each driver. The software allows different types of directional pattern and
sound field applications.

3.3.5.3 Two-​dimensional arrays


Much more complex directivities are possible if the placement of transducers is expanded
to a two-​dimensional array. Steering of the lobe towards any desired direction as well as the
creation of complex sound fields is possible, which may allow the addressing of different
listeners in front of the array with individual sound or information. Currently very few
systems are available on the market. However, falling prices for high-​performance DSP
hardware and the constantly increasing computing power may make this technology part
of the sound engineer’s ‘standard tool set’ in the foreseeable future.
First prototypes have been presented at relevant trade fairs for the last couple of
years. Figure 3.33 shows a system from the company Holoplot, which enables a variety of
applications [17].

References
1. IEC 60268-​5 Sound System Equipment -​Part 5: Loudspeakers, 2010.
2. W. Woszczyk and G. Soulodre. The audibility of spectral precedence, in Audio Engineering
Society Convention 93, 1992.
3. G.J. Krauss. On the audibility of group delay distortion at low frequencies, in 88th Convention
of the Audio Engineering Society, Montreux, 1990.
4. S. Müller and P. Massarani. Transfer-​function measurement with sweeps, J. Audio Eng. Soc.
49(6), pp. 443–​471, 2001.
5. DIN EN ISO 3741 -​Determination of sound power levels and sound energy levels of noise
sources using sound pressure -​Precision methods for reverberation test rooms, 2011.
6. Audio Engineering Society, AES2-​2012; AES standard for acoustics -​Methods of measuring
and specifying the performance of loudspeakers for professional applications -​Drive units.
New York: Audio Engineering Society, Inc., 2012.
7. A.G. Webster. Acoustical impedance, and the theory of horns and of the phonograph. Proc Natl
Acad Sci USA, pp. 275–​282, July 1919.
8. P.W. Klipsch. Loud-​Speaker Horn. USA Patent 2537141, 15 June 1951.
9. D.B. Keele Jr. What’s so sacred about exponential horns?, in 51st Convention of the Audio
Engineering Society, 1975.
10. H.F. Olson. Acoustical Engineering, 2nd edn. New York: D. van Nostrand, 1960.
11. L.L. Beranek. Acoustics. New York: McGraw-​Hill, 1954.
12. G. de Vries and G. van Beuningen. A digital control unit for loudspeaker arrays, in 96th
Convention of the Audio Engineering Society, Amsterdam, 1994.
13. F. Straube, F. Schultz, M. Makarski, S. Spors and S. Weinzierl. Evaluation strategies for the opti-
mization of line source arrays, in AES 59th International Conference, Montreal, Canada, 2015.
14. C. Heil and M. Urban. Sound fields radiated by multiple sound sources arrays, in 92nd Convention
of the Audio Engineering Society, Vienna, 1992.
15. M. Urban, C. Heil and P. Baumann. Wavefront sculpture technology, in 111th Convention of
the Audio Engineering Society, New York, 2001.
16. S.P. Lipshitz and J. Vanderkooy. The acoustic radiation of line sources of finite length, in Preprint
2417, 81st Convention of the Audio Engineering Society, Los Angeles, 1986.
17. www.holoplot.com.

4 Microphones
Gottfried K. Behler

This chapter gives a general overview on the topic of microphones and their applications.
There are two main types of microphones commonly used, the condenser and the dynamic
microphone. In the first section the principle of operation is discussed; in the second section
the characteristics that are important for the application, which are primarily determined
by the directionality characteristics and the intended use, are discussed. Please refer to the
additional literature [1] for more in-​depth information.

4.1 The different transducer types for microphones


A microphone is a device that converts the acoustical energy of a sound field into electrical
energy. The device can be sensitive to the sound pressure or to the sound pressure gradient,
which is proportional to the sound velocity (the combination of both leads to the sound
intensity). There are four different transducers that can be used as reciprocal transducers for
microphones (as well as for loudspeakers):

Capacitive transducer
Piezoelectric transducer
Dynamic transducer
Magnetic transducer

All of the aforementioned transducers convert mechanical vibrations into electrical
oscillations (voltage or current), but they follow fundamentally different principles. The
first two exploit the change of an electric field (in a capacitor or a piezo ceramic); the last
two rely on induction, either through the movement of a wire in a magnetic field (dynamic
transducer) or through the change of the magnetic flux in a coil (magnetic transducer). We
will consider the differences relevant for the application of the different transducers. Since
primarily two of the four transducers mentioned here –​ the capacitive and the dynamic
transducer –​ are of practical importance, this chapter will be limited to these two types.

4.1.1 The condenser microphone


In the condenser microphone, the oscillation of a very thin, metallized or metallic foil
in front of a fixed counter-​electrode is used to convert sound pressure into an electrical
voltage. This method makes use of the fact that the capacitance of a capacitor (with a
fixed area A of the capacitor plates) is inversely proportional to the distance d between the
plates.

DOI: 10.4324/9781003220268-4

Figure 4.1 Basic design of a condenser microphone. To the left, a sectional drawing of a classic
measuring microphone is displayed; to the right, the relationship between the components
involved in its construction (diaphragm mass, air volume behind the diaphragm as a
spring, and viscous friction of the air between the diaphragm and the back electrode) is
shown. ©B&K

This capacitor with the static capacitance C0 is now charged with a fixed voltage U0
in order to produce a defined charge Q0. It can now be assumed that this charge remains
unchanged during operation of the microphone, since it is brought to the capacitor via a
very high-​impedance resistor. In the currently widely used electret condenser microphone,
this charge is built into the capacitor as a permanently electrically charged polymer plastic
material and therefore does not have to be fed by an external voltage source.
To prevent the resting position of the diaphragm of a condenser microphone from being
modified by the static external atmospheric air pressure, the volume behind the diaphragm
has a defined ‘leak’ (capillary tube), which allows the air pressure in the volume to equalize
with the ambient air pressure. This mechanically determines the lower cut-​off frequency fu
of the microphone. The upper cut-​off frequency is determined by the resonance frequency
created by the mass of the membrane and the compliance of the air cavity behind the
membrane. Above this resonance frequency, the mechanical impedance has a mass-​spring
character and thus the excursion of the diaphragm declines by 12 dB per octave (filter of
second order).
Considering the relationship between the constant charge Q0 and the voltage applied
across the capacitor, the following formula can be found:

U = U_0 + U_\sim = \frac{Q_0}{C} = \frac{Q_0 \cdot (d_0 + d_\sim)}{\varepsilon_0 \cdot A} \qquad \text{V} \qquad (4.1)


The distance d~ assumed in this formula represents the change in the distance between the
plates as a result of the sound pressure acting on the moving diaphragm. It is easy to see
that an increase in pressure moves the membrane towards the back electrode (i.e., (d0 + d~)
becomes smaller), while a decrease in pressure increases the distance (i.e., (d0 + d~) becomes
larger). As a result, the voltage (U0 + U~) at the capacitor directly follows the distance
between the capacitor plates, but in inverse proportion to the external force acting on the
membrane (force increase ⇒ distance reduction ⇒ voltage decrease, and vice versa). The
time course of the signal voltage (U~) superimposed on the resting potential (U0) therefore
corresponds exactly to the time course of the sound pressure, but with a negative sign.
The convention that a positive sound pressure increase at the ‘+’ pin of a balanced
microphone output should produce a positive voltage increase is achieved by the circuit
technology in the microphone. Some measurement microphones which have unbalanced
outputs and only output the AC voltage at the condenser via an impedance converter
actually show an inversion of the sound pressure–time curve in the voltage curve at the output.
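To illustrate equation (4.1) numerically, the following sketch uses assumed (hypothetical) capsule dimensions, not data for any real microphone; it shows the order of magnitude of the AC voltage produced by a nanometre-scale diaphragm excursion:

```python
import math

# Hypothetical capsule values (illustration only, not any real data sheet)
EPS0 = 8.854e-12                 # vacuum permittivity, F/m
A    = math.pi * (8e-3) ** 2     # plate area of a ~16 mm diaphragm, m^2
D0   = 25e-6                     # resting plate distance d0, m
U0   = 60.0                      # polarization voltage, V

C0 = EPS0 * A / D0               # static capacitance C0 = eps0*A/d0
Q0 = C0 * U0                     # constant charge on the capacitor

def output_voltage(d_ac: float) -> float:
    """Total voltage across the capsule for a plate-distance change d_ac
    (equation 4.1): U = Q0*(d0 + d_ac) / (eps0*A)."""
    return Q0 * (D0 + d_ac) / (EPS0 * A)

# A 1 nm diaphragm excursion: pressure increase moves the membrane towards
# the back electrode (distance shrinks, voltage drops), and vice versa.
u_plus  = output_voltage(-1e-9)
u_minus = output_voltage(+1e-9)
u_ac = (u_minus - u_plus) / 2    # AC amplitude riding on U0

print(f"C0 = {C0 * 1e12:.1f} pF")
print(f"AC output for 1 nm excursion: {u_ac * 1e3:.2f} mV (inverted w.r.t. pressure)")
```

With these assumed numbers a 1 nm excursion yields an AC component of a few millivolts on top of the 60 V resting potential, which is why the subsequent impedance converter can work without additional gain.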
For a linear, distortion-​free conversion of sound into a proportional voltage, the
movement of the diaphragm (its excursion) must follow the sound pressure on the diaphragm
precisely (and thus the force acting on the diaphragm surface). This is achieved by making
the mechanical impedance of the diaphragm act as a spring within the frequency range that
the microphone is designed to cover. To achieve this, the resonant frequency of the oscil-
lating system formed by the mass of the membrane and the spring created by the defined
air volume behind the membrane must be tuned sufficiently high (typically between
15 and 25 kHz) so that, to a first approximation, the membrane behaves like a spring below
the resonance frequency. At the resonance frequency, the frequency response can
then be tailored by appropriate selection of the mechanical friction or damping (primarily
the viscous loss in the air between the membrane and perforated back electrode). With low
damping, the microphone will show an overshoot (gain in sensitivity) in the vicinity of the
resonance frequency, whereas with high damping the sensitivity attenuates. This mech-
anism is used for the tonal tuning of studio microphones or for correction in measuring
microphones, depending on the application.
The linear relationship between the force driving the diaphragm (sound pressure × dia-
phragm area) and the resulting output voltage is not achieved by any other transducer
principle in comparable quality. Condenser microphones are therefore ideally suited for
converting sound pressure into an electrically proportional quantity.
In addition to the described operating principle with a DC-​charged capacitor, known as
an LF (low frequency) circuit, an HF (high frequency) circuit is also possible for condenser
microphones. In this technique, the microphone’s capacitor is part of an oscillator circuit,
so that the frequency of the oscillator (in the range of MHz) is defined by its capacitance.
If the diaphragm moves, the capacitance changes (as already described) and the oscillator’s
frequency shifts, which finally leads to a frequency modulation of the oscillator’s frequency
(in the MHz range) according to the audio-​signal. In a subsequent FM (frequency modu-
lation) demodulator circuit, the audio-​signal is reconstructed and after low-​pass filtering
of the signal, the pure audio signal can be obtained. Microphones that work according to
this principle are much less sensitive to moisture and can also be used outdoors in rainy
conditions. Due to the low impedance of the transducer capacitance (at the high frequency
of the oscillator) compared to the very high impedance of the capacitance in the audio fre-
quency range, a leakage resistance parallel to the capacitor, formed, for example, by mois-
ture, has hardly any effect, whereas in the low-​frequency circuit this can lead to a loss of
sensitivity or even failure of the microphone.
It is obvious that condenser microphones cannot be operated as purely passive
transducers. They always need an electronic circuit that has to render different tasks. Even
the generation of the capsule bias voltage in classic LF condenser microphones requires a
special circuit: e.g., a DC-​DC converter that generates the required capsule bias voltage of
60–​100 volts from the low phantom voltage of 12–​48 volts. In particular, an impedance
converter with very high input resistance (in the range of 10⁹ ohms) and low output imped-
ance (typically < 200 ohms) is needed to drive a line between the microphone output and
the input on the mixing console or similar. The increasing miniaturization of electronics
makes it possible to accommodate these circuits even in very small microphones. For elec-
tret microphones, which do not require a bias voltage, a field-​effect transistor as impedance
converter and a battery in the microphone as voltage source are sufficient. Microphones
that operate in the high-​frequency range using the frequency modulation method are not
inherently conceivable without appropriate electronics. The required power to feed the
electronics normally comes from the mixer via the cable if there is no internal battery.
There are several standardized supply variants (refer to section 4.2.1.2).

4.1.1.1 The dynamic microphone


Even if the condenser microphone appears superior to all other microphone types in terms
of linearity and impulse response, there are still reasons for using dynamic microphones.
First and foremost, the robustness of these microphones should be mentioned. They work
well in rough live-​stage environments, in the rain on the football field or during outdoor
reporting, when the use of a condenser microphone, for example, could lead to difficulties.
The transducer principle for dynamic microphones is slightly more complex than for
condenser microphones. As with a dynamic loudspeaker, a cylindrical coil is located in a
magnetic field and is moved by the sound pressure acting on the membrane. Here too, the
force on the diaphragm meets the mechanical impedance of the vibrating system –​formed
by the diaphragm mass, the suspension spring and the friction elements present for damping.
The output voltage of a coil moving in a magnetic field follows the law of induction:
U = -Bl \cdot v \qquad \text{V} \qquad (4.2)

Here Bl is the transducer constant defined by the length of the voice coil wire and the mag-
netic field strength. In the equation, v is the velocity of the coil moving in the magnetic
field. It can therefore be stated that for a proportional conversion of the sound pressure
present in front of the diaphragm, the velocity of the diaphragm (and not the excursion as
with the condenser microphone) must follow the force acting on the diaphragm. Therefore,
the impedance of the mechanical system (formed by membrane mass, compliance of the
suspension spring and friction elements) must be friction-​like. In this case, the relationship
between force and velocity is linear and independent of frequency. The corresponding for-
mula then provides the relationship between force (sound pressure multiplied by the mem-
brane area) and mechanical frictional resistance:

v = \frac{F}{r_{\mathrm{mech}}} \qquad (4.3)
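Combining equations (4.2) and (4.3), the magnitude of the output voltage for a sound pressure p becomes U = Bl·p·A/r_mech. A minimal sketch with assumed (hypothetical) capsule values, for illustration only:

```python
# Hypothetical values for a moving-coil capsule (illustration only)
BL     = 1.5      # transducer constant Bl, T*m
AREA   = 3.0e-4   # effective diaphragm area, m^2
R_MECH = 0.225    # mechanical (frictional) resistance, N*s/m

def sensitivity(p_pa: float) -> float:
    """Output voltage magnitude for sound pressure p (Pa), combining
    v = F / r_mech (eq. 4.3) with U = Bl * v (eq. 4.2)."""
    force = p_pa * AREA        # driving force on the diaphragm
    v = force / R_MECH         # velocity in the friction-controlled band
    return BL * v              # induced voltage

u = sensitivity(1.0)           # 1 Pa corresponds to 94 dB SPL
print(f"open-circuit sensitivity: {u * 1e3:.1f} mV/Pa")  # order of 2 mV/Pa
```

The resulting sensitivity of roughly 2 mV/Pa is typical of the order of magnitude of moving-coil microphones, noticeably below that of condenser capsules.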

This correlation means that dynamic microphones must be operated as highly damped
resonance systems.

Figure 4.2 Section through the capsule structure of the legendary Sennheiser MD441U. Note the
multi-​resonance construction with several chambers and ports. Furthermore, a cone-​like
sound guide is placed in front of the diaphragm, which serves to optimize the directional
characteristics.

Figure 4.3 Basic construction of a ribbon microphone (the left figure shows the Beyer M130).
The magnetic flux of the high-​energy permanent magnets is guided around the ribbon
by the ring-​shaped yoke wires made of highly permeable, soft magnetic material. The
internal magnetic field should be as homogeneous and tangential as possible through
the ribbon.

The resonance frequency is somewhere in the centre of the frequency range
that a dynamic microphone is supposed to cover. Naturally, the frequency response of such
a construction can only be flat to a limited extent as there is a trade-​off between high sen-
sitivity (system weakly damped) and a resonance curve that is as flat as possible (system
strongly damped). In the first case the velocity at resonance is higher due to weak damping
and therefore the frequency response is boosted around the resonance; in the second case
the strong damping in the whole frequency range leads to a low but relatively frequency-​
independent (flat) output voltage. To increase the bandwidth and sensitivity of dynamic
microphones, they are designed as multi-​resonance systems. These are realized by coupling
resonance chambers in front of and behind the membrane. In this way microphones can be
designed which are almost equal to condenser microphones in terms of frequency range and
linearity. One example of a microphone that still meets the highest demands in this respect
is the Sennheiser MD 441 (see Figure 4.2).
A special dynamic microphone is the ribbon microphone (see Figure 4.3), in which not a
cylindrical coil but a metal ribbon, usually made of very thin aluminium, oscillates in the
magnetic field; the field runs transversely and tangentially to the ribbon. Both surfaces of
the ribbon are exposed to the sound field, so that the movement
of the ribbon is caused by the difference in force between the front and back of the ribbon.
With appropriate damping, the velocity of the ribbon follows the sound pressure gradient,
resulting in directivity with a figure of eight. If one side of the ribbon is covered by a housing
the microphone can be used as a pressure sensor as well.
The sensitivity of ribbon microphones is very low for two reasons: first, the length of the
conductor in the magnetic field is short, and second, due to the width of the ribbon the gap
in the magnetic circuit is relatively wide, resulting in a weak magnetic flux B. Therefore,
ribbon microphones very often use an output transformer to achieve a higher output voltage
and a higher output impedance.

4.1.2 Differentiation of the application


Even though the frequency range of dynamic microphones may be just as good as that
of condenser microphones, they are at a disadvantage when it comes to the correct
reproduction of percussive or impulsive sounds (piano, drums, guitar, etc.). Group delay
distortions caused by the design principle (multi-​resonating system) can cause the attacks
of the impulses to blur and thus spoil the sonic character of these instruments. For the
same reason, the decay times of dynamic microphones are longer than those of condenser
microphones; however, all musical instruments have a considerably longer decay time,
so that this effect is hardly audible. This disadvantage is perceived less disturbingly with
instruments that do not have pronounced transients (typically wind instruments, human
voice). Naturally, dynamic microphones like to find their place on stage in front of loud
instruments like trumpets and trombones, as well as with singers. Here the high-​level cap-
ability of dynamic microphones and their immunity to mechanical shocks is an advan-
tage. They are also relatively resistant to moisture, which can easily cause LF condenser
microphones to malfunction.
In contrast to the condenser microphone, which cannot be used as a reciprocal trans-
ducer due to the required electronics, dynamic microphones are purely passive and
can easily be used as a sound source, even if only for very low volumes. This prop-
erty can be used, for example, with intercoms, in order to eliminate the need for a
loudspeaker.

4.2 Microphones and application

4.2.1 Characterization of microphone parameters


More than one hundred manufacturers worldwide build microphones. It is therefore helpful
to define the properties in standards to allow comparisons and interchangeability between
microphones. In IEC 60268-​4 [2] the procedures and measurement methods to characterize
microphones are listed to help correctly publish all technical parameters of interest. The
manufacturer provides the technical characterizations in the form of a data sheet or tech-
nical description. The following information is crucial:

Comprehensive set of technical data for microphones:

The transducer principle


• Real condenser microphone
• Condenser electret microphone
• Dynamic microphone
• (magnetic microphone)
• (piezoelectric microphone)
Type of microphone
• Pressure, pressure gradient, combination of both, sound velocity sensor
Directional characteristic of the microphone
• Omnidirectional, unidirectional, bidirectional, (e.g., sphere, cardioid,
supercardioid, hypercardioid, figure of eight, hemisphere or spatially rotated
half cardioid)
• Related to the directivity the sound incidence point needs to be marked
(‘front’, ‘back’)
• To enable simple estimation of the ‘gain’ in a PA system the directivity index
of a microphone shall be given. D = 20 lg ( M0 / Mdiff ) with M0 the free field
sensitivity and Mdiff the diffuse field sensitivity
Intended use(s) of the microphone
• Preferred use such as far-​field, near-​field or close-​up
• Related to this different measurement procedures apply
Interconnection of the microphone
• Asymmetric or symmetric output, electronically or transformer balanced
• Type of connector (most microphones today use the three-​pin XLR type.
Recommendations for labelling the connections and settings are given in
IEC 60268-​1. Connectors and cables must comply with IEC 60268-​11 or IEC
60268-​12)
• Typical output impedance of the microphone and the recommended input
impedance of the microphone amplifier
Requirement of power supply
• Dynamic microphones usually do not need an external power supply
• Capacitive microphones need a power supply for the impedance converter
and for the bias voltage of the condenser (not in the case of an electret mic).
The most common power supply is the so-​called ‘phantom-​power’, which
feeds the microphone with 48 V via the symmetric signal cable. Some modern
microphones are able to operate on a wider range, i.e., from 12 V to 48 V (so-​
called universal phantom supply). The supply must comply with IEC 61938 [3]
• If a phantom power is needed, the required supply current shall be stated
Microphone sensitivity
• The output voltage relative to the input sound pressure. Most commonly used,
the output voltage in mV relative to 1 Pa
• The sensitivity needs to respect the application of the mic. The conditions for
the measurement are either free field or diffuse field for typical microphones
in PA systems
• Transfer function (frequency response curve)
• The frequency-​dependent sensitivity of the microphone. Unless otherwise
stated, measurements are made in free field conditions, with the frequency
response referring to plane waves whose wave front propagates in the direction
of the reference axis of the microphone
• If the microphone is intended for near field or other special use, the frequency
response shown must relate to that application
• The frequency curve shows the logarithmic sensitivity (in dB) over frequency
in compliance with IEC 60268-​1
• Upper and lower cut-​off frequencies (this is often misleading information
used to look better than the competition; in reality, it says almost nothing,
and the important information is already contained in the frequency
response curve)
Distortion figures and max. sound pressure level
• Total harmonic distortion (THD) for a given sound pressure level
• The maximum sound pressure that the microphone can convert linearly into
an output voltage for a given distortion limit (either 0.5% or 1% THD)
Equivalent input noise level
• An assumed sound pressure level that produces the same weighted output
voltage as that produced by the microphone’s self-​noise in the absence of
an external sound field. This quantity provides a good estimate of the lowest
signal the microphone can pick up
Environmental conditions for ±2 dB deviation from stated parameters
• Temperature range
• Range of static air pressure
• Range of relative humidity


Some more parameters can be of interest, such as

• The sensitivity to structure-​borne noise. Relevant information is how sensitive
the microphone is to handling or vibrations (either handheld or on
a stand)
• The sensitivity to wind (equivalent sound pressure level for wind)
• Electromagnetic sensitivity (EMC)
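Several of the data-sheet quantities above are related by simple arithmetic: sensitivity in mV/Pa can be restated in dB re 1 V/Pa, the expected output voltage follows from the SPL, and the span between maximum SPL and equivalent input noise level gives the usable dynamic range. A sketch with hypothetical data-sheet values, not taken from any specific product:

```python
import math

def spl_to_pressure(spl_db: float) -> float:
    """Sound pressure in Pa for an SPL re 20 uPa."""
    return 20e-6 * 10 ** (spl_db / 20)

def mic_output_mv(sens_mv_per_pa: float, spl_db: float) -> float:
    """Expected output voltage (mV) for a given SPL."""
    return sens_mv_per_pa * spl_to_pressure(spl_db)

# Hypothetical data-sheet values (illustration only)
sens    = 12.0    # sensitivity, mV/Pa
noise   = 16.0    # equivalent input noise level, dB(A)
max_spl = 132.0   # max SPL for 0.5 % THD

sens_db = 20 * math.log10(sens / 1000)            # dB re 1 V/Pa
print(f"sensitivity: {sens_db:.1f} dB re 1 V/Pa")
print(f"output at 94 dB SPL (1 Pa): {mic_output_mv(sens, 94):.1f} mV")
print(f"usable dynamic range: {max_spl - noise:.0f} dB")
```

With these assumed figures the microphone delivers about 12 mV at 94 dB SPL and spans a dynamic range of 116 dB between its self-noise and its distortion limit.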
In 4.1 we introduced the different transducer principles relevant for sound reinforcement
applications. In the following paragraphs, we will discuss some of the important parameters
listed above.

4.2.1.1 Type of microphone and related directivity pattern


Both the type of microphone and the directivity pattern are related. In principle, microphones
with a spherical directional characteristic are pressure receivers (pressure microphones),
whereas microphones with a pronounced directional characteristic are pressure gradient
(figure-​of-​eight) or combinations of pressure-​gradient and pressure receivers. Directivity
can also be achieved with so-​called interference receivers.

Pattern: The shape of the directivity.
Directivity function: The mathematical formula that creates the angle-​dependent
directivity. The two factors may be used to adjust the relative level of two
microphones (A for the sphere and B for the figure of eight) so as to create the
mentioned pattern by adding the two output signals, assuming that they are
identically sensitive on the main axis.
Beamwidth: The angle of coverage for −3, −6 or −9 dB attenuation.
Directivity index (DI, dB): The gain for direct incident sound relative to the diffuse
sound sensitivity; the DI can be used for the calculation of the stability in PA
systems (see 4.2.1.4).
Back-​attenuation and side attenuation: The attenuation for sound arriving from 180°
or 90° relative to the frontal direction.

To characterize microphones with respect to their directivity, Table 4.1 lists the basic
categories and shows how the combination of pressure receiver (sphere) and pressure gra-
dient receiver (figure of eight) leads to desired directivities for practical application. Since
idealized directional functions do not exist in the real world, it makes sense to present some
special cases for the microphone types discussed here.
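The pattern parameters of Table 4.1 follow directly from the first-order directivity function g(ϑ) = A + B·cos(ϑ). As a sketch, the closed-form spherical average of g² (which is A² + B²/3) gives the directivity index, and the front-to-back ratio gives the back-attenuation; small deviations from the rounded table values are to be expected:

```python
import math

def g(A: float, B: float, theta_deg: float) -> float:
    """First-order directivity function g(theta) = A + B*cos(theta), A + B = 1."""
    return A + B * math.cos(math.radians(theta_deg))

def directivity_index(A: float, B: float) -> float:
    """DI in dB: on-axis response over the spherical (diffuse-field) average,
    using the closed form <g^2> = A^2 + B^2/3."""
    return 10 * math.log10((A + B) ** 2 / (A ** 2 + B ** 2 / 3))

def back_attenuation(A: float, B: float) -> float:
    """20*lg(g(180deg)/g(0deg)) in dB (0 dB means no rear rejection)."""
    rear = abs(g(A, B, 180.0))
    return -math.inf if rear == 0 else 20 * math.log10(rear / (A + B))

for name, A, B in (("cardioid", 0.5, 0.5), ("supercardioid", 0.366, 0.634),
                   ("hypercardioid", 0.25, 0.75), ("figure of eight", 0.0, 1.0)):
    print(f"{name:16s} DI = {directivity_index(A, B):4.1f} dB, "
          f"back = {back_attenuation(A, B):6.1f} dB")
```

This reproduces, for instance, the 4.8 dB DI of the cardioid and the 6 dB DI of the hypercardioid, and confirms that the figure of eight offers no rear rejection at all despite its 4.8 dB DI.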

4.2.1.1.1 PRESSURE RECEIVER (PRESSURE MICROPHONE)

The membrane of a pressure receiver moves according to the ambient pressure change.
Since there is no directional dependence for the pressure in the sound field (it is a scalar
field), the physical orientation of a very small microphone is irrelevant. This leads to the
Table 4.1 Typical parameters for microphones with different directivity patterns

Type/pattern           Sphere  Wide cardioid  Cardioid  Super-cardioid  Hyper-cardioid  Figure of eight  Interference
Dir. function          A = 1   A = 0.63       A = 0.5   A = 0.366       A = 0.25        A = 0            no function
g(ϑ) = A + B·cos(ϑ)    B = 0   B = 0.37       B = 0.5   B = 0.634       B = 0.75        B = 1            defined
Beamwidth (−3 dB)      360°    90°            65°       57°             52°             45°              depends on length
DI (dB)                0       2.5            4.8       5.7             6               4.8              –
Back-attenuation       0       −8             −∞        −11.7           −6              0                –
  at 180° (dB)
Side-attenuation       0       −3             −6        −8.6            −12             −∞               –
  at 90° (dB)
Figure 4.4 Comparison of the directivity of two pressure microphones: left side a ¼′′ capsule, right side a 1′′ capsule. The polar plots
show a clear directionality for the large membrane at high frequencies whereas the small membrane shows almost perfect
omnidirectional sensitivity. The frequency response curve for the ¼′′ microphone is flat for free field sound incidence, the
one for the 1′′ microphone shows a distinct presence boost in free field whereas for diffuse field the response is rather flat
until a roll off above 10 kHz (DPA).
non-​directional sensitivity of omnidirectional microphones. However, for high frequencies
with wavelengths similar to or smaller than the circumference of the microphone capsule,
a directional effect occurs caused by diffraction. For large-​diaphragm microphones, even
if they are pressure receivers, this results in a distinct directional characteristic at high
frequencies. Figure 4.4 shows the comparison of two omnidirectional microphones with
different membrane diameters.

4.2.1.1.2 PRESSURE GRADIENT AND COMBINED WITH PRESSURE

The pressure gradient receiver is a microphone in which the diaphragm moves due to the
difference of the forces on the two sides of the diaphragm. If the membrane is sufficiently
small in relation to the wavelength, this force corresponds to the so-​called pressure gradient,
which is proportional to the sound velocity (the second, vectorial sound field property). This
results in a sensitivity that depends on the orientation of the membrane in the sound field. If
the sound wave propagates tangentially across the membrane, the resulting force difference
becomes zero and no membrane movement and therefore no output signal is created. When
turning the membrane with one side towards the incident wave the membrane movement
and therefore output signal increases until it reaches a maximum for perpendicular inci-
dence of the sound wave. The directivity of such a microphone looks like a figure of eight
(see Table 4.1). Figure 4.3 shows a typical microphone with a figure of eight directivity.
Besides the ribbon microphone –​a dynamic microphone –​condenser microphones with
figure of eight directivity are common.
Many modern condenser microphones do not only provide a figure of eight pattern but
allow a stepwise selection of directivities between spherical and figure of eight. This fea-
ture is achieved by combining two membranes placed in front of either one common or
two separated back electrodes. Typical constructions of such microphones are shown in
Figure 4.5. The wiring shows for both microphones three connectors, one for the common
back electrode and two for the membranes. Whereas one membrane (the one that provides
in-​phase polarity of the output signal relative to the sound) is in general set to fixed bias
voltage (typically between 60 and 120 V), the second membrane can be changed from
positive to negative bias voltage either continuously or in defined steps in order to achieve
dedicated directivities when adding the signals of both membranes. Each side of such a
capsule construction looks the same; therefore, the side providing in-​phase output needs to be
clearly marked.

4.2.1.1.3 HIGHER-​ORDER DIRECTIVITIES: ARRAY MICROPHONES AND SHOTGUN MICROPHONES

If a higher directionality is required, either for the pickup of faraway sources or interviews
in noisy environments, the aforementioned directivities may not be directional enough.
The DI listed in Table 4.1 clearly shows that there is a limit for the frontal to random gain
of about 6 dB. To improve this, microphones with higher-​order pattern are needed. To
create higher-​order directivities requires microphone capsules with more than two active
elements. Microphones for the recording of 3D sound are typical representatives of this
type. Recording for first-​order ambisonics (compare section 8.2) requires a setup with four
microphones of cardioid type (see left picture in Figure 4.6). For higher-​order ambisonics
(HOA) microphones with up to 32 capsules arranged on a spherical solid body are commer-
cially available (see right picture in Figure 4.6) and setups with up to 256 capsules can be
found in research facilities.

Figure 4.5 Basic construction of double-​membrane microphones (left side: AKG K4;
middle: Neumann M49; right side: Neumann SM 2). The K4 is set to figure of eight
only, whereas the M 49's directivity can be changed remotely. The SM 2 forms a stereo-​
microphone for coincidence stereophony (either XY or MS). Both capsules are remotely
adjustable for the pattern (between sphere and figure of eight). By choice of the right pattern
and rotation of the upper capsule, the desired width for the stereo-​panorama can be set.

Figure 4.6 Two microphones with higher-​order directivities. Left: Sennheiser Ambeo; right:
Eigenmike. The capsules are individually connected to allow any adjustment to the
directivity by external signal-​processing. Whereas the Eigenmike capsules are pressure
type microphones, the Ambeo capsules are cardioid microphones.

In typical sound reinforcement applications, these microphones are not very relevant, but
for applications such as teleconferences, discussion groups, lecterns, etc. array microphones
with a larger number of capsules in various arrangements can be beneficial. They all work
on the same principle, the phased array technology. A sound wave coming from a certain
direction arrives at different times at the microphone capsules. By applying appropriate
delay times (to compensate for the different arrival times), the signals are added up in phase,
leading to an amplification of the signal by +​6 dB each time the number of capsules doubles,
but only for the particular direction from which the wave comes. For any other direction,
the signals from the capsules add up randomly, and the amplification factor is only +​3 dB per
Figure 4.7 Line-​array microphone with pronounced vertical (left panel) and wide horizontal (cardioid, right panel) directivity (Microtech
Gefell KEM 975). The microphone is built with eight differently spaced cardioid capsules set into a vertical line.


doubling of the number of capsules. Consequently, the direction in which the microphone
is facing receives a +​3 dB improvement in S/​N ratio per doubling of the number of capsules.
The more capsules are used the narrower the beam can become. The beam in modern con-
ferencing microphones automatically follows the target speaker by continuous adaptation
of the delay times (compare for instance the Shure MXA910).
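The summation behaviour described here can be checked with a short simulation: after delay compensation, n in-phase unit signals add to amplitude n (+6 dB per doubling), whereas n independent noise contributions add in power (+3 dB per doubling), so the S/N ratio improves by roughly 3 dB per doubling. A minimal sketch, not any manufacturer's algorithm:

```python
import math
import random

def array_gains(n_caps: int, trials: int = 2000):
    """Return (coherent_db, incoherent_db): level gain of an n-capsule
    delay-and-sum array for the steered direction (voltages add in phase)
    and for uncorrelated noise (powers add)."""
    coherent = 20 * math.log10(n_caps)      # amplitude n -> +6 dB per doubling
    random.seed(1)                          # reproducible noise estimate
    power = 0.0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) for _ in range(n_caps))
        power += s * s
    incoherent = 10 * math.log10(power / trials)  # power ~ n -> +3 dB per doubling
    return coherent, incoherent

for n in (1, 2, 4, 8):
    c, i = array_gains(n)
    print(f"{n} capsules: signal {c:5.1f} dB, noise {i:5.1f} dB, S/N gain {c - i:4.1f} dB")
```

Going from one to eight capsules thus raises the on-axis signal by about 18 dB but the uncorrelated noise by only about 9 dB, the roughly 9 dB S/N advantage stated above.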
One microphone in particular (see Figure 4.7) has gained a firm place in applications
where acoustical feedback is a risk. The idea for this microphone is easily described: the
position of the speaker in front of a lectern is usually fixed in height (especially if the lec-
tern is adjustable in height). However, the speaker can move slightly or turn his head hori-
zontally. The microphone’s directional characteristic must therefore be narrow vertically
and wide horizontally to ensure that the speaker is picked up at a constant level while still
providing a high level of feedback immunity. Possible applications for such a type of micro-
phone include reinforcing the voice of a speaker in front of a lectern, recording a discussion
plenary or a choir, etc. Figure 4.8 shows the application in a parliament hall.
Another way to increase directivity is applied in so-​called ‘shot-​gun’ microphones. They
have a very high directionality which is achieved by interference. For this purpose, a tube
with a lateral slit is placed in front of the microphone capsule so that only waves that run
parallel to the tube arrive at the capsule with full amplitude. Sound waves that arrive under
an angle from the side at the microphone cause pressure oscillations in the tube that are not
in phase with the propagating wave. Therefore, they are attenuated when arriving at the
capsule. This effect is wavelength-​dependent, and since the tube cannot be arbitrarily long, the
effect is greater at higher frequencies. This leads to an angle-​dependent frequency response
and some colouration occurs for signals not arriving from the frontal direction, which
demands very good aiming when recording sound.
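A strongly simplified model treats the slit of the interference tube as a continuous line of entry points of length L; summing their contributions for a plane wave at angle ϑ yields a sin(u)/u response with u = (π·f·L/c)·(1 − cos ϑ). The tube length below is an assumed value, and the model ignores the capsule and the internal tube acoustics:

```python
import math

def tube_response_db(L: float, f: float, theta_deg: float, c: float = 343.0) -> float:
    """Level (dB re on-axis) of the summed slit contributions for a plane
    wave at angle theta: |sin(u)/u| with u = pi*f*L/c * (1 - cos(theta))."""
    u = math.pi * f * L / c * (1.0 - math.cos(math.radians(theta_deg)))
    if u == 0.0:
        return 0.0
    return 20 * math.log10(max(abs(math.sin(u) / u), 1e-12))

L = 0.25  # assumed 25 cm interference tube
for f in (500, 2000, 8000):
    print(f"{f:5d} Hz, 60 deg off-axis: {tube_response_db(L, f, 60):6.1f} dB")
```

Even this crude model shows the off-axis attenuation growing strongly with frequency, which is exactly the colouration of non-frontal signals mentioned above.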
Typical applications for shotgun microphones are the long-​distance pickup of speech
and sound. Very often in outdoors use, the ambient noise requires a very pronounced
directivity even with microphones quite close to the source. In this case, shorter versions
of shotgun microphones are used, often in combination with handheld booms. Very often,
the microphones must be invisible and therefore cannot be placed close to the source.
A common application is the pickup of sound from the stage in a theatre. Another appli-
cation is recording the sound of a soccer game: shotgun microphones are placed alongside
the field to capture the players’ and referee’s voices and the sound of the field of play. In
this case, the microphone has to tolerate moisture and rain and, therefore, HF condenser
microphones operating on the frequency modulation principle with a high-frequency circuit
are typically used.

4.2.1.1.4 SPECIAL MICROPHONES

Due to improvements in micro-electronics and MEMS design, the size of
microphones has been reduced very much, and applications that were not thought of
in the past are widely used today, perhaps most notably the common practice of
placing microphones in close proximity to the sound source. It is now widely agreed to use a
neck-mounted microphone for a speaker instead of a lectern-mounted microphone. The
advantage is obvious: the distance between speaker and microphone is fixed; even if the
speaker turns his head or moves, the sound will not change much. The issue with untrained
talkers, who often tend to neglect the need to place a microphone close to the mouth to
prevent feedback, is not a problem any more. These microphones are so small that they are
Microphones 123

Figure 4.8 The KEM 975 in use at the lectern at the German Bundestag. A diversity switch ensures
that only one of the two microphones (the one with higher level) is in use at a time.
(Courtesy Microtech Gefell.)

Figure 4.9 Typical shotgun microphone (Sennheiser MKH 8070) and the frequency-dependent
directivity.

almost invisible, and even in TV shows it is a solution often used today (refer to left picture
in Figure 4.10).
A tried and tested alternative is the so-​called Lavalier microphone. Whereas in former
times the microphone was carried by means of a neck cord in front of the talker’s chest,
today’s application is a clip that fixes the microphone to the garment of the talker (refer to
right picture in Figure 4.10). The benefit in comparison to the neck cord is clear: a more
stable placement in combination with a decoupling from the breastbone that creates a peak
in the frequency range at 700 Hz.
Another application is the pickup of musical instruments with close-​up microphones.
Many manufacturers provide a wide range of clamps and clips for different instruments to
meet all possible applications (Figure 4.11). The microphone for this purpose needs to be
small, lightweight and capable of withstanding very high sound pressure. A trumpet easily
124 Gottfried K. Behler

Figure 4.10 Head-mounted microphone (left); Lavalier microphone (right) (courtesy DPA).
The placement of these microphones requires some EQ to provide sound without
colouration.

Figure 4.11 Direct recording of the violin sound with a small condenser microphone, which is
mounted on the frame of the violin (courtesy DPA).

reaches more than 150 dB of sound pressure right in front of its bell. The advantage again
is obvious: the sound operator can adjust the level of each instrument independently from
other instruments since the close placement of the microphone reduces the crosstalk to
other instruments.

4.2.1.2 Interconnecting microphones and preamps


Professional microphones generally provide their output signal balanced on a three-pin
XLR connector. The XLR connector on the microphone is always male (the pins point
in the direction of signal transmission), and the pin assignment is 1 for ground, 2 for the
positive (hot) signal and 3 for the negative (cold) signal. While dynamic microphones achieve
the symmetry of their
output simply by connecting the voice coil ends directly to the two contacts, condenser
microphones that produce unbalanced signals in the capsule need an internal balancing
transformer or an electronic balancing circuit. Modern condenser microphones have elec-
tronically balanced output stages, which must be supplied with power in addition to the
required polarization voltage for the capsule. In order to avoid the need for additional wires
in the cable and to avoid having to design the inputs on mixing consoles differently for
dynamic and condenser microphones, the supply is fed to the microphone via the signal
lines. The most common power supply is the so-​called phantom power [3], where a DC
voltage of +​12 to +​48 V is applied equally to both signal lines of the balanced signal cable
via two paired resistors (refer to Figure 4.12). This means that phantom power does not

Figure 4.12 Microphone supply according to DIN EN IEC 61938 [3]. (a) Phantom power supply:
U = 48 V, R1 = R2 = 6800 Ω, Imax = 10 mA; U = 24 V, R1 = R2 = 1200 Ω, Imax = 10 mA;
U = 12 V, R1 = R2 = 680 Ω, Imax = 15 mA. (b) A-B power supply: U = 12 V, R1 = R2 = 180 Ω,
Imax = 15 mA.

have to be switched off when a dynamic microphone is connected, as no current flows
through the voice coil. However, connecting non-symmetrical sources may cause trouble,
since the negative signal input might be short-circuited and the internal resistor might be
overloaded and damaged.
The A-B power supply also uses the signal wires for the supply current, but the two lines
are connected to the plus and minus terminals of the supply. Connecting a dynamic
microphone to such an input can therefore be dangerous, as current flows through the
voice coil and forces the membrane out of its rest position. A-B powering of microphones
has fallen out of use and nowadays condenser microphones are generally phantom-powered. An
exception is the (again) increasingly popular condenser microphones with tube amplifiers,
which require their own power supplies due to their specific power requirements.
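The consequence of feeding the supply through paired resistors can be made concrete: seen from the microphone, the two feed resistors carry the current in parallel, so the voltage actually arriving at a phantom-powered microphone drops with its current draw. A minimal sketch in Python (not from the book; the function name is illustrative):

```python
def available_phantom_voltage(u_supply, r_feed, i_draw):
    """Voltage left at the microphone terminals for a current draw i_draw.

    Assumption: the supply current splits over the two feed resistors,
    so the effective source resistance is r_feed / 2.
    """
    return u_supply - i_draw * (r_feed / 2.0)

# P48 per Figure 4.12: 48 V fed via 2 x 6800 ohm, microphone drawing 10 mA
print(available_phantom_voltage(48.0, 6800.0, 0.010))  # 14.0 V remain
```

This is why a microphone drawing the full rated current sees far less than the nominal 48 V at its terminals.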

4.2.1.3 Microphone sensitivity, signal-to-noise ratio, maximum SPL


Microphone sensitivity is the one parameter that in combination with the input stage of
the mixer/​recording device very much defines the noise floor and achievable dynamic range.
Especially with microphones having a low sensitivity (ribbon and dynamic microphones)


the input stage needs to be of high quality with respect to self-​noise and impedance
matching. Most condenser microphones deliver a rather high output voltage and therefore
the sensitivity of the microphone preamp does not need to be very high.
The sensitivity is defined by the relative output voltage for a defined sound pressure. In
general, the reference sound pressure is 1 Pa, which relates to a sound pressure level of 94
dB. The sensitivity then can be calculated:

u V
ME = (4.4)
p Pa

M  1V
GE = 20 log  E  using M0 = dB (4.5)
 0
M 1Pa

Typical sensitivities for microphones are:

• Studio condenser microphones: M_E ≈ 10 to 100 mV/Pa or G_E ≈ −40 to −20 dB
• Lavalier-type condenser microphones: M_E ≈ 2 to 20 mV/Pa or G_E ≈ −54 to −34 dB
• Dynamic microphones: M_E ≈ 1 to 20 mV/Pa or G_E ≈ −60 to −34 dB
• Ribbon microphones: M_E ≈ 0.1 to 2 mV/Pa or G_E ≈ −80 to −54 dB
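The dB figures in this list follow directly from eq. (4.5); a short Python sketch (not from the book; function names are illustrative) converts between the two representations:

```python
import math

def sensitivity_to_db(m_e_mv_per_pa):
    """G_E = 20 log10(M_E / M_0), with M_0 = 1 V/Pa (eq. 4.5)."""
    return 20.0 * math.log10(m_e_mv_per_pa * 1e-3)

def db_to_sensitivity(g_e_db):
    """Inverse of eq. (4.5): sensitivity in mV/Pa for a given G_E."""
    return 10.0 ** (g_e_db / 20.0) * 1e3

print(round(sensitivity_to_db(10.0)))      # -40 (lower end of studio condensers)
print(round(sensitivity_to_db(1.0)))       # -60 (lower end of dynamics)
print(round(db_to_sensitivity(-34.0), 1))  # 20.0 mV/Pa
```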

In addition to this definition, the reference sound field needs to be specified. The following
definitions include a frequency-​dependent result for the sensitivity:

• The pressure field sensitivity: the ratio between the effective output voltage and the
effective sound pressure at the microphone created as a pressure field without propaga-
tion. The measurement conditions are created with the microphone in a chamber with
dimensions smaller than 1/​2 of a wavelength.
• The free field sensitivity: the ratio between the effective output voltage and the effective
sound pressure of a propagating planar sound wave in free field. The front of the micro-
phone points perpendicular to the incident wave front. The measurement takes into
account the diffraction effects due to the microphone size and body.
• The diffuse field sensitivity: the ratio between the effective output voltage and the
effective sound pressure in a diffuse sound field. A diffuse sound field creates a sort of
average for all possible incidence angles of sound waves with equal likeliness.

For all microphones, the different measurement conditions result in identical results for
the sensitivity at very low frequencies (i.e. < 250 Hz), because effects like diffraction or
directivity can be ignored.
Another important specification is the signal-to-​noise (S/​N) ratio, which defines the
self-​noise created by the microphone. There are different definitions used in datasheets, the
most common being:
• Signal-to-noise ratio, CCIR (re. 94 dB SPL). It states the level difference between the
noise floor and a signal of 94 dB. The frequencies of the noise are linearly weighted.
• Signal-to-noise ratio, A-weighted (re. 94 dB SPL). It states the level difference between
the noise floor and a signal of 94 dB. The frequencies of the noise are A-weighted.
• Equivalent input noise level (dB), either A-​weighted or CCIR (linear). The equivalent
noise level is calculated by subtracting the S/​N ratio from 94 dB (level for 1 Pa). It is
most informative, since it directly tells what signal level would create the same output
level as the internal noise of the microphone. This information helps in deciding which
microphone is suitable for the task at hand.
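The third definition is one line of arithmetic; a sketch (the function name is illustrative):

```python
def equivalent_noise_level(snr_db, reference_spl=94.0):
    """Equivalent input noise = reference level (94 dB for 1 Pa) minus S/N."""
    return reference_spl - snr_db

# A microphone specified with 80 dB(A) S/N (re. 94 dB SPL) is as noisy as
# a 14 dB(A) source placed in front of an ideal noiseless microphone.
print(equivalent_noise_level(80.0))  # 14.0
```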

Finally, the maximum output level of a microphone is important. Here, there are
dependencies between the sensitivity and the available dynamic range: the more sensitive a
microphone is, the less capable it tends to be of handling high sound pressure. In the microphone
specifications the maximum output voltage is rated with respect to a given distortion
number. Most common is a THD (eq. (3.4)) of 0.5%. The calculation for the maximum
output voltage is

maxOutputVoltage = sensitivity · 10^((maxSPL − 94)/20)  V  (4.6)


This formula ignores the potential output voltage limit. It simply calculates the required
RMS value for the given SPL limit stated by the manufacturer. The internal electronic
circuit determines whether this output voltage can be provided. It allows assessment
of whether the input available at the mixing console or microphone amplifier can pro-
cess the output level of the microphone without distortion even at the highest acoustic
level. Attenuation switches (typ. −10 dB) are available with many microphones and
can prevent the risk of overloading, but usually with the disadvantage of an increased
noise floor.
If the maximum output voltage of a microphone is known, the maximum possible sound
pressure level can be derived from the sensitivity. If the sensitivity is 50 mV/Pa and the
typical maximum output voltage is +​18 dBm (6.2 V) then the upper limit for the sound
pressure level is 136 dB (SPL). The calculation is as follows:

maxSPL = 94 + 20 log(output voltage limit / sensitivity)  dB  (4.7)

(Sensitivity and output voltage both need to be RMS values.)


Both calculations completely ignore the increasing distortion that rises with higher
sound pressure levels and it is therefore quite essential that in the datasheet the manufac-
turer specifies the maximum values with respect to a given distortion. This very often is a
THD limit of 0.5% or 1%.
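Eqs. (4.6) and (4.7) are inverses of each other and can be cross-checked numerically with the example values from the text; a sketch in Python (not from the book; function names are illustrative):

```python
import math

def max_spl(output_voltage_limit, sensitivity_v_per_pa):
    """maxSPL = 94 + 20 log10(V_limit / M_E) dB (eq. 4.7); RMS values."""
    return 94.0 + 20.0 * math.log10(output_voltage_limit / sensitivity_v_per_pa)

def max_output_voltage(max_spl_db, sensitivity_v_per_pa):
    """V_max = M_E * 10^((maxSPL - 94) / 20) (eq. 4.6); RMS values."""
    return sensitivity_v_per_pa * 10.0 ** ((max_spl_db - 94.0) / 20.0)

# 50 mV/Pa sensitivity and a 6.2 V output limit, as in the text
print(round(max_spl(6.2, 0.05)))  # 136 dB SPL
```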

4.2.1.4 Directional behaviour of microphones


The dependence of the microphone voltage on the direction of incidence of the exciting
sound is called the directional effect. The following quantities are used for describing this
effect:


• the directional factor Γ(ϑ) as the ratio between the (free-)field sensitivity M_Ed for a plane
sound wave arriving under the angle ϑ relative to the main microphone axis and the value
ascertained in the reference direction (incidence angle 0°):

  Γ(ϑ) = M_Ed(ϑ) / M_Ed(0) ≤ 1  (4.8)

• the directional gain D as the 20-​fold common logarithm of the directional factor Γ
• the coverage angle as the angular range within which the directional gain does not drop
by more than 3 dB (or 6 dB or 9 dB respectively) against the reference axis.
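For an ideal first-order cardioid these quantities take a simple closed form, Γ(ϑ) = (1 + cos ϑ)/2. A sketch in Python (not from the book; function names are illustrative):

```python
import math

def cardioid_gamma(theta_deg):
    """Directional factor of an ideal cardioid: (1 + cos(theta)) / 2."""
    return (1.0 + math.cos(math.radians(theta_deg))) / 2.0

def directional_gain_db(gamma):
    """Directional gain D = 20 log10(Gamma)."""
    return 20.0 * math.log10(gamma)

print(round(directional_gain_db(cardioid_gamma(90.0)), 1))  # -6.0 dB at the side
# About -3 dB at 65.5 degrees, so the -3 dB coverage angle of an ideal
# cardioid is roughly 2 x 65.5 = 131 degrees:
print(round(directional_gain_db(cardioid_gamma(65.5)), 1))
```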

The relationship between the sensitivities by reception of a plane wave and those with
diffuse excitation characterizes the suppression of the room-​sound components against the
direct sound of a source. This energy ratio is described by the following parameters:

• the directivity factor Q_M: if the sensitivity was measured in the direct field as M_Ed and in
the diffuse field as M_Er, the directivity factor is

  Q_M = M_Ed² / M_Er² ≥ 1  (4.9)

• the directivity index DI as the 10-​fold common logarithm of the directivity factor.

DI = 10 ⋅ log QM dB (4.10)

While the directivity factor of an ideal omnidirectional microphone is Q_M = 1, that of an
ideal cardioid microphone is Q_M = 3 or DI = 4.8 dB; compare Table 4.1. This means that a
cardioid microphone picks up only 1/3 of the sound power of a room compared to an omni-
directional microphone at the same distance from the source. This implies for instance that
for an identical sound power reading, the speaking distance for a cardioid microphone may
be √3 (about 1.73) times greater than that of an omnidirectional microphone (compare
analogously eq. (2.5a)).
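The distance advantage quoted above generalizes to any directivity factor: for equal pickup of diffuse room sound, the working distance may grow with the square root of Q_M. A one-line sketch (illustrative function name):

```python
import math

def distance_factor(q_m):
    """Permissible working-distance increase relative to an omnidirectional
    microphone, for equal diffuse-sound pickup: sqrt(Q_M)."""
    return math.sqrt(q_m)

print(round(distance_factor(3.0), 2))  # 1.73 for an ideal cardioid (Q_M = 3)
```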

4.2.1.5 Application-​related microphone characterization


Whereas the underlying technology of a microphone directly defines the available quality
and very much influences the pricing, the application of the microphone and its intended
use might vary greatly. On one hand, a distinction must be made between handheld and
stand-​mounted microphones; on the other hand, the distance to the sound source must
be considered as well as the question whether the microphone is to be used indoors or
outdoors. All these criteria lead to specific designs and should be taken into account when
selecting a microphone.
Some guidelines for the selection of microphones are given below.

4.2.1.5.1 MICROPHONES FOR HANDHELD USE

• They need to be robust against mechanical impacts and structure-​borne sound. This
leads to microphone designs with robust housing and lightweight and small membranes.
Quite often, they are used at a close distance to the mouth, which requires protection
against pop and wind noise as well as plosive sounds. To achieve this, foam covers are
often used, which also helps against spit and humidity. Though dynamic microphones
in general are more sensitive to structure-borne sound (due to the larger membrane
mass compared to condenser microphones), they are robust and not very sensitive to
humidity. However, there are dedicated condenser microphone designs for handheld
use as well.

4.2.1.5.2 MICROPHONES ON STANDS

• Any microphone can be used on a stand, of course; however, a typical microphone that
needs a stand is the opposite of a handheld microphone. In this case, we are talking
about microphones in studios, or on stage, and, in general, indoors. All condenser
microphones with large membranes belong to this group. They are most sensitive to
structure-​borne sound, pop sound and humidity. Many small membrane condenser
microphones are intended for use on stands as well. Mostly they offer little protec-
tion against structure-​borne sound and will transfer grip sounds quite well. All these
microphones are best used indoors and have only little protection against humidity
and wind noise. However, protection against wind noise can be achieved using a wind-
screen (foam or basket).

4.2.1.5.3 THE DISTANCE TO THE SOURCE

• Microphones for close-​up recording of sources require two main features: capability for
high sound pressure level and a correction for the proximity effect. The first is met with
small membrane condenser microphones that cover a range up to 150 dB, the latter
depends on the construction. A pressure microphone does not show any proximity
effect, whereas a cardioid microphone will boost the low frequencies more and more
when approaching the source at short distances. This needs to be compensated either
directly with an appropriate filter built into the microphone or in the signal-​processing
later on.
• Microphones for recording sound from a larger distance should have a very good S/N
ratio. The dynamic range should extend down to a very low level rather than allow a very
high maximum sound pressure level. Typical equivalent noise level ratings for good
microphones are in the range 10 to 20 dB(A). The upper limit is not that much of an
issue; even with fortissimo passages in a large symphonic orchestra the peak level at
some 10 m distance will not exceed 125 dB (SPL).

4.2.1.5.4 ENVIRONMENTAL CONSIDERATIONS

• Microphones are sensitive to environmental conditions. Wind, rain, temperature,


vibration, all these conditions will lead to different problems and need to be considered
when choosing the appropriate microphone. The manufacturer of the microphone
should therefore state the limits of the environmental conditions with respect to tem-
perature and humidity range in its data sheet. The decision to use wind screens, pop
screens, weather protective devices etc. has to be made before the performance takes
place.
• In general, small microphones are less sensitive to structure-​borne noise than large
diaphragm microphones. The reason for this is simple: since the mass of the diaphragm


depends on its size, the inertia increases with increasing diaphragm diameter, and the
deflection of the diaphragm reacts more sensitively to movements of the microphone
body. The microphone increasingly becomes a seismometer.

References
1. J. Eargle. The Microphone Book. London: Focal Press, 2012.
2. International Electrotechnical Commission. IEC 60268-4 Sound System Equipment –
Part 4: Microphones, 2019-08.
3. International Electrotechnical Commission. IEC 61938 Multimedia systems – Guide to the
recommended characteristics of analogue interfaces to achieve interoperability, 2018.
4. M. Zollner and E. Zwicker. Elektroakustik, 3rd edn. Berlin: Springer-Verlag, 1993.
5. W. Reichardt. Grundlagen der Elektroakustik. Leipzig: Akademische Verlagsgesellschaft Geest &
Portig K.G., 1960.

5 Design for Sound Reinforcement Systems


Wolfgang Ahnert

5.1 Introduction
This chapter deals with the most important design criteria for specifying a sound system
for the wide range of different applications. Many of the criteria are identical for different
applications but some of them are particular. Two major differentiations are the application
of the sound system either for speech or for music, which call for a different design approach
regarding, for example:

• sound amplification
• support of stage performances
• influence of acoustic impressions
• reproduction of sound events

For these design issues the following criteria have to be considered.

5.2 Sound Pressure Level and Signal-to-Noise Ratio


Sound pressure is measured in pascals and is audible without pain for a human being in the
range 20 μPa up to 20 Pa. Using the following equation

L = 20 log10(p / p0) dB  (with p0 = 20 µPa)

the corresponding range of 0–​120 dB is obtained.
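The mapping from pressure to level can be sketched in a few lines of Python (not from the book; the function name is illustrative):

```python
import math

def spl_db(p_pa, p0=20e-6):
    """Sound pressure level L = 20 log10(p / p0) dB, with p0 = 20 µPa."""
    return 20.0 * math.log10(p_pa / p0)

print(round(spl_db(20e-6)))  # 0   (hearing threshold)
print(round(spl_db(1.0)))    # 94  (the 1 Pa reference used for microphones)
print(round(spl_db(20.0)))   # 120 (threshold of pain)
```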


Figure 2.1 illustrates the scale of audible sound between the lowest level (the hearing
threshold) and the highest level (the threshold of pain). A typical dynamic range of
around 120 dB is observed. Figure 5.1 illustrates the typical audible level range and the
ranges of music and speech signals. The black curves show the so-​called Fletcher and
Munson contours, i.e., the typical frequency-​dependent range of audible sound. Values
higher than 120 dB are painful and may create health impairments. Over millions
of years, the sensitivity of the human auditory sense has fortunately been reduced
at low frequencies. Sounds of 100 Hz at 20 dB may be measured but are not
heard by humans.
So low-frequency sounds permanently present in the environment don't bother us
humans; we simply don't hear them. A further factor to consider is air attenuation,
which is minimal at low frequencies. For open-air installations where the distance

DOI: 10.4324/9781003220268-5

Figure 5.1 Audible level range for speech and music signals.

between loudspeaker and listener is large, an additional frequency-dependent propagation
attenuation Dr as a function of temperature and relative humidity must be considered.
In contrast, for the smaller distances found in rooms, up to about 50 m, this additional
attenuation can be ignored. However, the attenuation Dr increases with frequency
(Figure 2.11). Hence, an electro-acoustical system is required to compensate for high-
frequency loss, while low frequencies are clearly audible over large distances.
The reduced sensitivity of the auditory system for low and very high frequencies at low
sound pressure levels is approximately simulated for determined loudness values by means
of weighting curves in sound level meters (Figure 2.15). According to the IEC 61672-​
1: 2013, the A-​weighted curve corresponds approximately to the sensitivity of the human
ear at 40 phons, whereas the B-​weighted curve and the C-​weighted curves correspond
more or less to the sensitivity curves of the human ear at 60 phons and 90 phons, respect-
ively [1].
The A-​weighted curve selectable in any sound level meter is relevant for sound
reinforcement engineering. In addition, the use of the A-​weighting is recommended
for measuring the sound level distribution of speech and information systems in noisy
environments, and also the noise of exhaust openings of ventilating and air-​conditioning
systems, so as to not get wrong measurement results caused by air turbulences or other
low-​frequency sound.
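The A-weighting curve is defined analytically in IEC 61672-1; the sketch below implements that standard formula (constants from the standard, with the usual normalization to 0 dB at 1 kHz):

```python
import math

def a_weighting_db(f):
    """A-weighting in dB per the analytic definition in IEC 61672-1,
    with the +2.0 dB normalization so the weighting is 0 dB at 1 kHz."""
    f2 = f * f
    ra = (12194.0**2 * f2 * f2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.0

print(round(a_weighting_db(1000.0), 1))  # 0.0 at the reference frequency
print(round(a_weighting_db(100.0), 1))   # about -19.1: strong LF attenuation
```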
The range in Figure 5.1 marked red illustrates the range of typical speech levels and the
range marked blue that of music signals, for natural sources without sound reinforcement.
As the goal of this book is to deal with sound reinforcement issues for speech as well as
for music signals, the sound pressure levels in these ranges need to be met in all larger facil-
ities with sound reinforcement.

Figure 5.2 Noise criteria (NC) curves as a function of frequency.

One factor which has not yet been mentioned but that might significantly determine
the dimensioning of the amplification is the existing noise floor in the venue. Often, a
specific signal-to-noise ratio (S/N) is asked for in the specification for a sound system design.
Standards require an S/N ratio of 10–15 dB. Within rooms it is mostly the air-conditioning
and ventilation systems that produce disturbing noise, and the requirement can be achieved
relatively easily. To characterize background sound levels the so-called noise criteria (NC)
curves have been developed; see Figure 5.2.
Different NC values are required for different room types; refer to Table 5.1.
Slightly adapted noise rating (GK) curves are in use according to DIN 15996; see
Figure 5.3; refer also to Table 5.2. Above NR20 the so-​called ‘Grenzkurven’ GK are iden-
tical to the noise rating curves NR.
A relationship between the noise rating curves and the overall sound pressure values in
dB(A) is shown in Figure 5.4. One could ask the question why could not just dB(A) or any
other sound pressure level be used to determine background noise levels: the explanation is
that all SPL dB standards are averaged over certain frequency ranges –​hence the noise level
at one specific frequency could be very high while the average value is still very low. This
cannot happen with NC and NR curves.
The values in Table 5.2 are mainly in use for recording studios but sometimes also asked for
in performing arts facilities. Recording studios for symphonic music are similar to high-​end
concert halls; therefore, the noise floor should not exceed the NR5 curve (max. 18 dB(A));
this is only achievable with high demands and significant complexity of the HVAC system.
Achieving a noise floor between 18 and 20 dB(A) in a concert hall is very costly. All
technical equipment (HVAC, light fixtures, video projectors, loudspeaker systems on
standby etc.) must adhere to very stringent criteria regarding not creating any noise. This
will increase the expenses for such systems significantly. In most concert halls the NR15
noise rating curve should not be exceeded; this corresponds to 25 dB(A).


Table 5.1 Noise criteria values in different room types

Type of facility | Recommended NC curve | Max. SPL in dB(A)

Public lobbies, corridors, circulation spaces | 40 to 50 | 49 to 58
General classrooms, libraries | 30 to 40 | 40 to 49
Executive offices, conference spaces | 25 to 35 | 36 to 44
Small general-purpose auditoriums (fewer than about 500 seats), conference rooms, function rooms | max. 35 | max. 45
Small churches and synagogues | max. 25 | max. 36
Radio, TV, recording studios (close microphone pickup) | max. 25 | max. 36
Churches, synagogues (for serious liturgical music) | max. 25 | max. 36
Large auditoriums for unamplified music and drama | max. 25 | max. 36
Opera performance halls | max. 20 | max. 32
Music performance and recital halls | max. 20 | max. 32
Small auditoriums, theatres | 20 to 30 | 32 to 40
Music practice rooms, large meeting rooms, teleconference rooms, audiovisual facilities | 20 to 30 | 32 to 40
Large conference rooms, large auditoriums, executive offices | 20 to 30 | 32 to 40
Small churches, courtrooms, chapels (for very good listening conditions) | 20 to 30 | 32 to 40
Large churches, recital halls (for excellent listening conditions) | 15 to 20 | 28 to 32
Concert halls | 10 to 15 | 22 to 28

Figure 5.3 Noise rating (NR) curves used in Europe.

5.3 Transfer Function, Impulse Response, Frequency Response


An impulse response (time response) or a transfer function (frequency response) in a room
valid for a specific listener position will quite often not be measured directly by using
impulse source excitation (such as a balloon being punctured or an alarm pistol being fired),


Table 5.2 Noise rating (NR) values in different studio facilities

Radio drama | GK0
Classical music: chamber music | GK0
Classical music: symphonic music | GK5
Entertainment music | GK15
Rooms used mainly for speech recordings | GK5 to GK10
Rooms used mainly for assessing the sound quality and/or for sound processing | GK5 to GK15
TV production studios and processing rooms for TV and broadcasting | GK10 to GK20
Processing rooms of office-like character | GK20 to NR25
Technical rooms | NR30 to NR35

Figure 5.4 Relationship between SPL values and noise rating curves.

but rather by means of using predetermined measurement signals and fast post-​processing
algorithms in a computer. Figure 5.5 shows the schematic block diagram to acquire impulse
responses of a space by means of computer software such as SMAART, DIRAC, EASERA
or Room EQ Wizard REW.
These advanced measurement systems can measure the complex transfer function or
impulse response of the system under test. For this purpose, the system is excited with a
known test signal and its response is recorded.

(Signal flow: excitation e(t) → system under test (SUT) → response a(t).)



Figure 5.5 Computer-​based measurement system for different excitation signals (schematic block
diagram).

Assuming that the system is a linear, time-​invariant (LTI) system, the transfer behaviour
can be obtained from the deconvolution of the two data sets. That is because the response
function a(t) is the convolution product of the excitation signal e(t) and the transfer
function h(t):

a(t) = h(t) ⊗ e(t)  (5.1)

where ⊗ stands for linear convolution.


For periodic waveforms, this becomes a simple product in the frequency domain so that
one can solve for the transfer function H(ω):

H(ω) = A(ω) / E(ω)  (5.2)

An inverse Fourier transform will lead back to the time domain in the form of an impulse
response.

h(t) = ∫_{−∞}^{+∞} H(ω) e^{jωt} dω  (5.3)

The integration from negative to positive infinity can be simplified by introducing a lower
and a higher frequency threshold –​which is not really limiting in the case of the audio spec-
trum as human hearing in itself is limited (20 Hz –​20 kHz). In addition, the integration
process itself can be significantly accelerated (fast Fourier transformation) by introducing a
sample rate (basically dividing the curve into a series of discrete steps) –​again, for audio this

Figure 5.6 Overlay of excitation, raw data and impulse response files.

is no major limitation as the signal is already digitized (converted to steps) once it enters
the algorithm. For measurement algorithms utilizing this deconvolution method, either
pseudo-​random noise, swept-​sine, MLS signals or other well-​defined excitation signals are
used. Also, an impulse-​like test signal is possible in theory. However, this is not much used
in practice since the short duration of the signal requires a high amplitude to sufficiently
excite the system.
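The deconvolution described by eqs. (5.1)–(5.3) can be demonstrated on a toy system. The sketch below (pure Python, not from the book; a real measurement system would use FFTs, sweep or MLS excitation and windowing) recovers a known two-tap 'room' from a noise excitation:

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); fine for a small demo."""
    n = len(x)
    sign = 1j if inverse else -1j
    out = [sum(x[k] * cmath.exp(sign * 2 * cmath.pi * m * k / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out

def impulse_response(excitation, response):
    """Recover h by frequency-domain deconvolution, H = A / E (eq. 5.2),
    followed by an inverse transform (eq. 5.3). Assumes circular
    convolution, as with a periodic noise excitation."""
    E, A = dft(excitation), dft(response)
    return [v.real for v in dft([a / e for a, e in zip(A, E)], inverse=True)]

random.seed(1)
n = 64
e = [random.gauss(0.0, 1.0) for _ in range(n)]   # excitation e(t)
h_true = [0.0] * n
h_true[0], h_true[3] = 1.0, 0.5                  # direct sound + one 'reflection'
# Circular convolution a = h (x) e, i.e. the recorded response (eq. 5.1)
a = [sum(h_true[k] * e[(m - k) % n] for k in range(n)) for m in range(n)]
h = impulse_response(e, a)
print(all(abs(x - y) < 1e-6 for x, y in zip(h, h_true)))  # True
```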
Figure 5.6 shows an overlay of three signals: green is the excitation signal (an MLS
signal of 15th-​order) radiated by the loudspeaker (in Figure 5.5). This signal corresponds
to signal e(t) in eq. (5.1) and following. Blue is the so-​called raw data signal, recorded with
the microphone in Figure 5.5, and corresponds to signal a(t) in eq. (5.1) and following. By
post-processing according to eqs. (5.2) and (5.3) the red impulse response is obtained (the
amplitudes of the curves have been adjusted slightly to show all three signal parts in one
graph).
The transfer function H(ω) in eq. (5.2) is the Fourier transform of the impulse response
and characterizes the frequency dependence of the transfer behaviour of the system under
test; see Figure 5.7. This is not the actual frequency response measured for a loudspeaker or
natural sources.
With the Fourier transformation of an impulse response, not only is the magnitude
of the transfer function obtained, but also its phase response; see Figure 5.8. This figure
shows a wrapped presentation, i.e., the phase jumps between −180° and +​180°. For loud-
speaker measurements and other audio equipment (such as filters) the phase information
is very important, as it can point out frequency-​dependent timing issues; for room-​acoustic
measurements though the phase information can be ignored.
Only when the source that excites the space has an ideal, flat frequency response will the
transfer function and the frequency response measured at the same location correlate with
each other; their amplitudes may differ depending
on the excitation level of the source. Usually though the measured frequency response in
a room is determined not only by the room behaviour but also by the frequency behaviour
and the directivity of the source; see Figure 5.9.
Figure 5.7 Transfer function as Fourier transform of the impulse response.

Figure 5.8 Phase response of the impulse response.

Figure 5.9 Frequency response and spectrogram presentation.

The frequency response curve is usually shown on a logarithmic scale: the squared
magnitude is logarithmized, and the phase information is discarded.
Figure 5.9 shows not just the spectrum (here the frequency response) of the measurement
signal, but also its spectrogram, i.e., the frequency content of the measured signal over
time. Such presentations have been common in sound level meters [1] since the 1970s and
are found today in modern handheld sound level meters such as those manufactured by
B&K or Norsonics.

5.4 Important Design Criteria for Sound Systems

5.4.1 Loudspeaker Coverage and Off-​Axis Sound Radiation

5.4.1.1 Introduction
Covering a specific area with sound is the basic job for any sound system. To select the
correct loudspeaker its data must be studied, most specifically its radiation pattern, also
called coverage or balloon data due to the appearance of the 3D data representation. The
radiation pattern basically answers the question what sound pressure level is projected by
the loudspeaker in which direction. When installing a loudspeaker, the main radiation
pattern must be determined and off-axis sound radiation must be assessed, specifically to
reduce sound energy that is radiated towards undesirable areas of the room such as back to
the stage, to the ceiling or to the back wall. If such influences are not considered, feedback
on stage may occur.

5.4.1.2 Coverage Issues


Coverage analysis is based on available loudspeaker data. Figure 5.10 shows a typical data
sheet of a point source loudspeaker. Often the balloon data are indicated in an old format
and based on text information (compare section 8.2). Figure 5.10a shows specification data
such as directivity, max. power and max. SPL, all vs. frequency. The other two figures b and
c show balloon data used for coverage investigations in simulation programs. Some simu-
lation programs allow the projection of the coverage patterns (beamwidth presentation) of
Figure 5.10c directly onto audience areas.
Figure 5.11 shows the aiming process to properly define the correct point source
loudspeaker at the correct location. Figure 5.11a shows the directivity balloon of the
selected loudspeaker and Figure 5.11b the coverage contours for the mid-​frequency range.
Figure 5.11c shows the directivity cones (−3, −6 and −9 dB attenuation respectively) in
a room and Figure 5.11d the view out of the selected loudspeaker with visible directivity
cones (−9 dB cone outside the view window).
A more sophisticated data presentation including coverage figures was developed mainly
for two reasons:

• inclusion of phase and other physical and mechanical data
• prevention of unauthorized loudspeaker data manipulation

Figure 5.12 shows such a newer data presentation for a point source similar to that in
Figure 5.10. It may be seen that not only phase data but also filter settings are included,
to obtain the required frequency response. These filter settings may be used to correctly set
up the amplifiers and equalizers.
More complex loudspeaker data may be shown as well. Figure 5.13 shows the data for a
line array consisting of eight modules. Filter settings and array curving may be used to cover
the audience area with a flat frequency response.
Figure 5.10 Loudspeaker data and polar diagrams.

Figure 5.11 Different aiming diagrams of a typical point source loudspeaker.



By knowing the exact loudspeaker data most simulation programs are able to display the
coverage. Figure 5.14 shows the coverage pattern of a line array in a stadium segment.
The audience area in Figure 5.14 is not well covered by sound; the lower and upper parts
suffer from poor coverage. This could be improved with sound better directed to these areas.
Often digital signal processors (DSPs) are used to direct the sound radiation to the areas
that must be covered and to avoid sound in areas that should remain quiet. Hence not just
the loudspeakers or line arrays determine the radiation pattern, but also the zoning that
controls sound levels per area. As an example, assume the middle zone 2 should not be
covered with sound. Using the relevant control buttons in Figure 5.15, the DSP settings can
be controlled by data export and the appropriate coverage pattern shown in Figure 5.16
achieved.

Figure 5.12 Sophisticated data presentation of a point source loudspeaker.



Figure 5.13 Sophisticated data presentation of a modern line array.


Figure 5.14 SPL coverage figures for a line array in part of a stadium.

Figure 5.15 Control panel for coverage control.

Comparing Figure 5.14 with 5.16 makes clear that the physical configuration of the line
arrays has not been changed, just the control settings. In contrast to the poor coverage in
Figure 5.14 for the lower and upper zones, the sound energy in Figure 5.16 is now
significantly improved and the middle zone 2 receives almost no sound.

5.4.2 Delay Issues, Time Alignment, Equalization and Gain before Feedback

5.4.2.1 Travel Time Phenomena


In everyday life we are confronted with numerous travel time (delay) phenomena of sound.
Best known may be the fact that the distance to the centre of a thunderstorm can be
calculated by measuring the time between the visible lightning (speed of light
c0 =​ 300,000 km/​s) and the audible sound of the thunder (c =​ 343 m/​s at 20°C). If, for
example, 3 s pass after the lightning before the thunder is heard, the thunderstorm's centre
is about 1 km away (s =​ c·t =​ 343 m/​s ⋅ 3 s ≈ 1000 m).
At open-​air events where large projection screens are used for showing the artists the
optical and acoustical perceptions drift apart on the rear seats. At a distance of 300 m the
resulting time difference is up to 1s. These delay issues cannot be compensated by technical
means, since the synchronization delta is location-​dependent.
Sound speed is calculated by the following empirical equation:

c =​ 331.4 · √(1 +​ 0.00366 · ϑ) m/​s  (ϑ in °C)

For ϑ =​ 20°C, for instance: c =​ 343.3 m/​s.


For determining the travel times as a function of the travel paths under normal air
pressure and temperature conditions, Table 5.3 may be used. One sees that 58 ms are
Figure 5.16 Controlled radiation avoiding sound coverage in unoccupied zone 2.


Table 5.3 Relationship of distance in m and ft and run time in ms (path time relation at 20°C)

s (m)    t (ms)    s (ft)

1 2.92 3.28
2 5.83 6.56
3 8.75 9.84
4 11.66 13.12
5 14.58 16.40
6 17.49 19.68
7 20.41 22.96
8 23.32 26.24
9 26.24 29.52
10 29.15 32.81

s =​0.343 * t, t =​ s/​0.343; s in m, t in ms.

required for compensating a travel path difference of say 20 m. In addition to the metric
scale, the British foot is also indicated (1 ft =​0.3048 m). The foot scale offers a comfortable
simplification in that 1 ms travel time corresponds to about 1 ft travel path.
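The relations above are easy to express in a few lines of code. The following is a minimal sketch in Python; the function names are illustrative, not taken from any particular library:

```python
import math

def speed_of_sound(theta_c):
    """Empirical speed of sound in air (m/s) at temperature theta_c in deg C."""
    return 331.4 * math.sqrt(1.0 + 0.00366 * theta_c)

def travel_time_ms(distance_m, theta_c=20.0):
    """Travel time in milliseconds for a sound path of distance_m metres."""
    return 1000.0 * distance_m / speed_of_sound(theta_c)

def thunder_distance_m(delay_s, theta_c=20.0):
    """Distance to a thunderstorm from the lightning-to-thunder delay."""
    return speed_of_sound(theta_c) * delay_s

print(speed_of_sound(20.0))    # about 343.3 m/s
print(travel_time_ms(20.0))    # about 58 ms, as in the text
print(travel_time_ms(0.3048))  # about 0.9 ms: roughly 1 ms per foot
```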

5.4.2.2 Suppression of Echoes


Human hearing can distinctly separate two successive sound events (e.g. sound
impulses) above the so-called blurring threshold of Δt ≈ 50 ms. For instance, if two
non-delay-compensated loudspeakers are installed more than 17 m apart, an echo may be
audible, unless the signal of the near loudspeaker masks that of the far one.
Figure 5.17a illustrates a case often occurring in large halls where two loudspeakers L1
and L2 are installed, i.e. one of them, L1 (the main sound reinforcement), near the stage and
the second one, L2, under the balcony for improving the reproduction conditions in that
area. If this second loudspeaker is connected directly to the output of the mixing console
(such as the first one), echo phenomena will occur in the area under the balcony if the
main loudspeaker is also audible there (Figure 5.17b). The echo disappears, however, if the
signal of loudspeaker L2 is delayed by Δt (Figure 5.17c). If it is delayed even longer (by Δt2,
Figure 5.17d), again no echo occurs, and additionally the main loudspeaker is acoustically
localized as the primary source rather than the loudspeaker mounted under the balcony,
which in fact reproduces the majority of the audible sound. For large distances it is also
conceivable that the signal from the main loudspeaker L1 is masked by the near L2 signal,
so that echoes are not perceptible either with or without delay equipment.
Bolt and Doak [2] have investigated what proportion of listeners feel bothered by echoes
of varying intensity and delay (Figure 5.18). Figure 5.19 shows a threshold curve derived
from these results, indicating the level attenuation of the echo relative to the direct sound
that must be maintained for echo interference to remain imperceptible.
By means of acoustic delay, signals can be concentrated at audience groups in such a way
that good sound level coverage with a distributed loudspeaker arrangement does not
necessarily imply echo effects. Apart from preventing echoes, a suitable delay compensation
achieves an extensive coherence of the sound radiated by the various loudspeaker arrays.
This approach increases the initial energy and improves the definition.
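The delay setting for the under-balcony loudspeaker can be sketched as the path-difference compensation plus a small precedence offset. This is a minimal illustration; the 15 ms default offset is an assumed example value, not a figure from the text:

```python
import math

def delay_for_fill_ms(d_main_m, d_fill_m, theta_c=20.0, precedence_ms=15.0):
    """Delay (ms) for a fill loudspeaker (L2) so that its signal arrives
    shortly AFTER the wave front of the main loudspeaker (L1), keeping
    localization on the main system (precedence/Haas effect).

    d_main_m: listener distance to the main loudspeaker
    d_fill_m: listener distance to the fill loudspeaker
    precedence_ms: extra delay beyond pure path compensation (assumed value)
    """
    c = 331.4 * math.sqrt(1.0 + 0.00366 * theta_c)
    path_difference_ms = 1000.0 * (d_main_m - d_fill_m) / c
    return max(0.0, path_difference_ms + precedence_ms)

# Listener under the balcony: 30 m from the main system, 5 m from the fill.
print(round(delay_for_fill_ms(30.0, 5.0), 1))  # about 87.8 ms
```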
Figure 5.17 Loudspeaker system for suppressing echoes: (a) geometric conditions; (b) temporal sequence of the signals without delay; (c) tem-
poral sequence of the signals with delay; (d) temporal sequence of the signals with delay and localization change.

Figure 5.18 Proportion of listeners still perceiving an echo, as a function of echo delay.

Figure 5.19 Limit curve to be complied with for suppressing echo disturbances.

5.4.2.3 Feedback Suppressor


Every electrical channel or system containing active elements may exhibit feedback. It is
caused by an output signal of the system being looped back to its input. This loop-back
oscillation has a sinusoidal shape, and in the case of an acoustic system or channel the
feedback is perceived as ‘howling’ or ‘whistling’.
In the case of acoustic feedback, some specific observations can be made:

• The feedback loop of an electro-​acoustic amplifier channel consists not only of an elec-
trical, but also of an acoustic part
• It is practically impossible to separate the feedback path into its different components
(e.g., the electro-​acoustic and the room-​acoustic part)


• Acoustic feedback happens in a variety of different loops and paths; therefore, the
nature of acoustic feedback is more complex than that in pure electrical networks

To avoid acoustic feedback the operator or installer of sound systems needs to compre-
hend the physical background of acoustic feedback, including the basics of how to arrange
microphones and loudspeakers in relation to each other. In particular, the level of monitor
loudspeakers on stage, quite often very high, may result in problems which appear when
a singer with a wireless microphone is performing directly in front of these monitor
loudspeakers.
Another approach to reducing the probability of acoustic feedback is paying attention
to the so-​called secondary structure of the space, more specifically to the wall and ceiling
materials close to microphones or loudspeakers. Wall areas that are covered by absorbers
reduce the occurrence of acoustic feedback otherwise caused by strong reflections. In add-
ition, sound-​focusing effects in concave spaces or concert shells must be avoided as they
would support feedback.
Normally a sound engineer does not have much influence on the wall or ceiling design
of a space, but some knowledge regarding acoustic feedback will assist in reducing this
unpleasant behaviour.
If such relatively simple methods cannot be employed to avoid acoustic feedback some
technical procedures like filters, frequency and phase shifting as well as other feedback
suppressor devices may be used.

5.4.2.4 Use of Narrow Band Filters


The transmission curve (compare section 2.6.1.3.1 and Figure 2.33) of a space shows stat-
istical irregularities with dips and peaks (see [3]). Investigations over the years have shown
that such transmission curves display up to 70 ‘eigen oscillations’. Among these peaks about
3 to 40 (on average 12–​25) oscillations may in fact lead to acoustic feedback.
Automatic notch (very narrow band) filters have been used to identify the feedback fre-
quencies. Digital signal-​processing allows flexibility in terms of frequency detection as well
as frequency discrimination and the method of very rapidly applying notch filters. Auto-​
notching is probably the most popular algorithm because it is simpler to manage the distor-
tion. This way an ‘escalation’ of frequency peaks will be prevented, the frequency response
curve is smoothed and the overall gain of the sound system can be increased by the amount
of level reduction of the peak.
The bandwidth of such notch filters is around 5 Hz; the maximum attenuation is
frequency-​dependent, normally 3–​30 dB. A professional feedback suppressor may enhance
the loop gain by 5–​8 dB.
Under the name of ‘feedback controller’, microprocessor-controlled units have recently
become available that trigger frequency-​dependent attenuation in response to incipient
positive feedback phenomena such as timbre changes, reverberation effects and fluctuations
in level. These automatic filters are assigned to the individual microphone channels and are
capable of increasing the positive feedback limit of a system by up to 15 dB [5].
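A narrow notch of this kind can be sketched as a biquad filter. The following uses the well-known RBJ audio-EQ cookbook peaking-cut form; it is a sketch, not the algorithm of any specific feedback suppressor. A true notch filter would cut to zero, whereas the peaking form allows the limited 3–30 dB depth mentioned above:

```python
import cmath
import math

def peaking_cut(f0, fs, cut_db=-12.0, bw_hz=5.0):
    """Biquad coefficients (b, a) for a narrow cut at the detected feedback
    frequency f0, in the RBJ audio-EQ cookbook peaking form.

    cut_db: gain at f0, negative for attenuation (text: typically 3-30 dB cut)
    bw_hz:  notch bandwidth (text: around 5 Hz)
    """
    a_lin = 10.0 ** (cut_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * bw_hz / (2.0 * f0)   # alpha = sin(w0)/(2Q), Q = f0/bw
    a0 = 1.0 + alpha / a_lin
    b = [(1.0 + alpha * a_lin) / a0,
         -2.0 * math.cos(w0) / a0,
         (1.0 - alpha * a_lin) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / a_lin) / a0]
    return b, a

def magnitude_db(b, a, f, fs):
    """Magnitude response of a biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z**-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))

b, a = peaking_cut(1000.0, 48000.0)
print(round(magnitude_db(b, a, 1000.0, 48000.0), 1))  # -12.0 dB at the feedback frequency
```

Because the notch is only a few hertz wide, the response 100 Hz away from the feedback frequency is essentially flat, which is why the timbre change stays inaudible.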

5.4.2.5 Frequency Shifter


Above a certain frequency, which depends on the room volume, the transmission curves in
spaces show purely statistical properties [4]. Schroeder found that the average frequency
spacing between neighbouring peaks and dips of a transmission curve is proportional to 4/​RT


(RT =​reverberation time of the space). By means of a frequency shifter the peaks and dips
are shifted to overlapping positions and the response curve is smoothed this way. A loop
gain of 6–​8 dB can be reached by this approach.
Frequency shifters are mainly used for speech transmissions, as pitch shifting in the case
of music performances is not acceptable.
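Schroeder's spacing rule is easy to express numerically. This is a tiny sketch; the half-spacing shift shown is only an illustrative choice, as practical frequency shifters typically shift by a few hertz:

```python
def peak_spacing_hz(rt_seconds):
    """Schroeder: the average spacing between adjacent peaks of a room
    transmission curve is about 4 / RT (in Hz)."""
    return 4.0 / rt_seconds

# For RT = 2 s the peaks lie about 2 Hz apart; shifting the signal by
# roughly half that spacing moves peaks onto neighbouring dips
# (illustrative choice only):
print(peak_spacing_hz(2.0))        # 2.0 Hz spacing
print(0.5 * peak_spacing_hz(2.0))  # 1.0 Hz shift
```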

5.4.3 Equalization by Filtering


Filters for influencing the amplitude-​frequency characteristics of the transmitted sound
signals are among the classic means of sound processing used in sound reinforcement engin-
eering. Two main fields of application exist:

• optimization of timbre within the reception area concerned
• suppression of acoustic positive feedback frequencies; see above

The timbre optimization depends on the application of the system. A balanced frequency
response over the entire audio spectrum may for instance be desirable for high-​quality
systems designed for music transmissions. For improving intelligibility in mere speech trans-
mission systems, a reduction in the lower frequency range and an enhancement of certain
formants in the range of about 2 kHz is appropriate [6]. Figure 5.20 shows recommended
frequency response target curves for different speech or music applications [7].

Figure 5.20 Tolerance curves for the reproduction frequency response in different applications: (a)
recommended curve for reproduction of speech; (b) recommended curve for studios or
monitoring; (c) international standard for cinemas; (d) recommended curve for loud rock
and pop music.

Figure 5.21 Attenuation behaviour of filters of constant bandwidth (a) and of constant quality (b).

Another set of requirements may be appropriate for optimizing the timbre of stage
monitoring. In larger sound reinforcement systems filters are used at various points. For
influencing the microphone frequency response and in many cases also for suppressing the
most dominant positive feedback frequencies, these filters are normally located in the input
channel of the mixing console.
It has to be pointed out that the use of filters is nearly always at the expense of the
maximum attainable sound level. For this reason, it is necessary to consider corresponding
power reserves when designing the system.
One distinguishes between passive filters, which operate without an additional power supply
and thus offer neither amplification (level enhancement) nor any degradation of the
signal-to-​noise ratio, and active filters; thanks to their more universal applicability, their
smaller dimensions and their lower price, the latter are nowadays used almost exclusively in
professional audio equipment.
Another distinguishing characteristic is the influence of damping on the behaviour of
the filter curve. In this respect one distinguishes between filters of constant bandwidth and
filters of constant quality q (Figure 5.21).
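The difference between the two filter families can be expressed numerically. This is an illustrative sketch; the default values are arbitrary examples:

```python
def bandwidth_of_constant_q(f0_hz, q=1.4):
    """Constant-quality filter: the relative bandwidth is fixed, so the
    absolute bandwidth in Hz grows with centre frequency."""
    return f0_hz / q

def q_of_constant_bandwidth(f0_hz, bw_hz=100.0):
    """Constant-bandwidth filter: the absolute bandwidth is fixed, so the
    effective quality rises with centre frequency."""
    return f0_hz / bw_hz

print(bandwidth_of_constant_q(100.0))    # ~71 Hz wide at 100 Hz
print(bandwidth_of_constant_q(10000.0))  # ~7143 Hz wide at 10 kHz
print(q_of_constant_bandwidth(100.0))    # q = 1 at 100 Hz
print(q_of_constant_bandwidth(10000.0))  # q = 100 at 10 kHz
```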
For reasons of cost, equipment complexity and ease of operation a number of different
practical designs are used:

• treble and bass correctors (shelf filters)
• parametric filters (channel filters)
• multi-​bandpass filters (equalizers)

5.5 Basic Tools and Parameters for Computer-​Based Calculation


A modern and sophisticated approach to designing a sound system is based on a mathemat-
ical model that serves as a virtual prototype for computational prediction performed in an
acoustical simulation platform. Sound designers will have a certain initial idea and concept
of a particular sound system which is then to be tested within a simulation program. As
the programming of such a tool is very elaborate and requires a highly specialized skill set,
only a limited number of different options exist, such as Odeon [8], CATT Acoustic [9] and
EASE [10]. These packages are manufacturer-​independent platforms that allow simulation


of a large number of loudspeaker brands. In contrast, packages such as Bose Modeler [11],
Meyer Sound MAPP [12], L-Acoustics Soundvision [13], d&b ArrayCalc [14] and others
are limited to simulating the behaviour of specific loudspeaker brands, hence do not
permit comparisons of different systems, and are often limited in other respects as well (such
as proper analysis of the room acoustical parameters). Common to all these programs is the
necessity to create a full 3D model (based on the architectural geometry information) and,
for the non-proprietary ones, the availability of extensive databases of surface materials,
seats and loudspeaker systems; please also refer to Chapter 8.

5.5.1 Practical use of Computer-​Based Simulation


Building a model is time-​consuming and requires experience. Different approaches exist:

1. To use 2D drawings and create the model by entering planes based on x, y, z coordinates
2. To use pre-​programmed prototypes or sub-​modules and adapt the coordinates as required
3. To import from AutoCAD or SketchUp files
4. To import from other simulation programs

Most newcomers to acoustical simulation believe that importing from any well-​known
CAD platform would solve all problems as the architect can provide a 3D model that is
relatively simple to import into a simulation program. Most of the time though this does
not work without considerable additional effort as the architectural drawings show far too
many details that are irrelevant for the acoustical calculations and would lead to massively
increased calculation times.

5.5.1.1 Room Acoustic Simulation


To properly launch a room acoustic simulation, a corresponding computer model is to be
created; compare section 8.6. As a rule of thumb, the number of planes should be limited to
a maximum of 5000, preferably in the range 1000–​1500. The finer surface structures should
not be modelled in detail, but properly selected absorption and scattering coefficients
will define the acoustical behaviour of these surfaces much more accurately. Figure 5.22
illustrates an example of a model of the Berlin Philharmonic Hall (4700 surface planes). For
the room acoustical simulations, omnidirectional sources are normally selected.

5.5.1.1.1 STATISTICAL APPROACH

Reverberation time The simplest way to obtain results quickly is to use the direct sound
of one or more sources (loudspeakers) and to calculate the reverberation level of the room
by means of reverberation time equations, assuming the room follows a statistically evenly
distributed sound decay (homogeneous, isotropic diffuse sound field, that is, the rever-
beration time RT is constant over the room). Based on known room dimension data and
the associated surface absorption coefficients a computer program is able to very quickly
calculate the RT according to the Sabine and Norris-​Eyring equations (compare (2.1)).
Also, the volume of the enclosed space may be calculated, quite often an interesting number
for architects, because their common design tools might not deliver that information.
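Under the diffuse-field assumption these equations are straightforward to compute. The sketch below uses the metric Sabine constant 0.161 and ignores air absorption; the surface list and coefficients are example inputs:

```python
import math

def rt_sabine(volume_m3, areas_m2, alphas):
    """Sabine: RT = 0.161 * V / A with A = sum(S_i * alpha_i) (metric units)."""
    a_total = sum(s * a for s, a in zip(areas_m2, alphas))
    return 0.161 * volume_m3 / a_total

def rt_eyring(volume_m3, areas_m2, alphas):
    """Norris-Eyring: RT = 0.161 * V / (-S * ln(1 - mean alpha))."""
    s_total = sum(areas_m2)
    alpha_mean = sum(s * a for s, a in zip(areas_m2, alphas)) / s_total
    return 0.161 * volume_m3 / (-s_total * math.log(1.0 - alpha_mean))

# 10 m x 10 m x 5 m room: floor, ceiling and walls with example coefficients
areas = [100.0, 100.0, 200.0]
alphas = [0.30, 0.20, 0.10]
print(round(rt_sabine(500.0, areas, alphas), 2))  # 1.15 s
print(round(rt_eyring(500.0, areas, alphas), 2))  # slightly shorter, about 1.05 s
```

As expected, the Eyring result is always somewhat shorter than the Sabine result for the same room, the difference growing with the mean absorption coefficient.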

Figure 5.22 Figures a–​c show the same view of a 3D computer model in AutoCAD, SketchUp and in
the simulation software EASE.

Normally a set of frequency-​dependent ‘target’ reverberation times is available in the
simulation program, hence the calculated RT times can be compared to the ‘target’ values.
The program will then display the calculated RT time versus the ‘target’ RT time for each
selected frequency band and will list the deviation in the RT time for each band relative to
the ‘target’ values within a range of tolerance. Calculation of the early decay time (EDT) is
mostly possible as well.

5.5.1.1.2 RAY TRACING APPROACH

As an example, for the ray tracing approach, the EASE AURA algorithm [15] calculates
the transfer function of a room for a given receiver position using the active sound sources.

For this purpose, a hybrid model is employed that uses an exact image source model for
early specular reflections and an energy-​based ray tracing model for late and scattered
reflections. The transition between the two models is determined by a fixed reflection order;
see section 8.5.
For each receiver (and for all 1/​3 band octave frequencies), a so-​called echogram is
created which contains energy bins linearly spaced over time. When a receiver is hit, the
energy of the detected particle is added to the bin that corresponds to the time of flight.
Also, as a separate step, the contributions of the image source model are added to the bins.
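The binning step described above might look like this in code (a sketch; the 1 ms bin width is an illustrative assumption, not a value from the text):

```python
def add_to_echogram(echogram, time_of_flight_s, energy, bin_width_s=0.001):
    """Accumulate a detected particle's energy into the time bin matching
    its time of flight (one such echogram exists per receiver and band)."""
    index = int(time_of_flight_s / bin_width_s)
    if index >= len(echogram):
        # grow the list of linearly spaced bins as needed
        echogram.extend([0.0] * (index + 1 - len(echogram)))
    echogram[index] += energy

echogram = []
add_to_echogram(echogram, 0.0025, 0.5)   # particle arriving after 2.5 ms
add_to_echogram(echogram, 0.0025, 0.25)  # second hit lands in the same bin
print(echogram)  # [0.0, 0.0, 0.75]
```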
The particle model accounts for scattering in the following way: whenever a particle hits
a surface, its energy will be diminished as a function of the material’s sound absorption
characteristics. Then, a random number is generated and depending on the scattering factor
the particle is either reflected geometrically or it is scattered under a random angle based
on a Lambert distribution. Subsequently, the particle will continue to be traced until it hits
either a receiver or another wall.
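A single particle-surface interaction as described (attenuation by the absorption coefficient, then a stochastic choice between geometric reflection and Lambert scattering) can be sketched as follows; this is an illustration of the principle, not the actual EASE AURA code:

```python
import math
import random

def _random_unit(rng):
    """Uniformly distributed direction on the unit sphere (rejection sampling)."""
    while True:
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        n2 = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]
        if 1e-6 < n2 <= 1.0:
            n = math.sqrt(n2)
            return (v[0] / n, v[1] / n, v[2] / n)

def bounce(direction, normal, energy, absorption, scattering, rng=random):
    """One particle-surface interaction: attenuate by the absorption
    coefficient, then either reflect specularly (probability 1 - scattering)
    or scatter with a Lambert (cosine) distribution around the normal."""
    energy *= (1.0 - absorption)
    if rng.random() >= scattering:
        # specular reflection: d' = d - 2 (d . n) n
        d_dot_n = sum(d * n for d, n in zip(direction, normal))
        new_dir = tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))
    else:
        # cosine-weighted hemisphere sample: normalize(normal + random unit vector)
        r = _random_unit(rng)
        v = tuple(n + c for n, c in zip(normal, r))
        length = math.sqrt(sum(x * x for x in v))
        new_dir = tuple(x / length for x in v)
    return new_dir, energy

# Particle hitting a floor (normal +z) head-on, 20% absorption, no scattering:
print(bounce((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 1.0, 0.2, 0.0))
# ((0.0, 0.0, 1.0), 0.8)
```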
Modern multithread, multiprocessor, cloud-​based and other network calculations
significantly decrease the calculation times for complex situations from days down
to hours.
In the case of cone-tracing, a directional ray distribution is employed that starts at one
single point and fans out conically over a certain solid angle; a special form is pyramid-
tracing, where the cross section is not a circle but a square. The cone-tracing approach
allows for very fast ray calculations but the fact that the cones do not cover the entire
source ‘sphere’ (but only a single point) turns out to be a disadvantage. It is therefore neces-
sary to have adjacent cones overlap and employ an algorithm to avoid multiple detections.


Alternatively, the energy can be weighted so that the multiple contributions produce (on
average) the correct sound level. Conical beam tracers are widely accepted [16, 17].

5.5.1.2 Results of All these Calculations


Having obtained impulse responses at various listener positions in a space they now may
be used to derive a large number of acoustic quantifiers and measures characterizing this
environment from an acoustical point of view. By handclapping, by firing a pistol or by
using modern measurement techniques we obtain an impulse response of the space in just
a few seconds. This will take considerably more time in a simulation program. A statistical
approach may still take only a couple of minutes, but to calculate a full impulse response in
complex rooms can take hours, even days.
The entire ray-tracing method that calculates impulse responses has to take into account
the directivity of the sound sources and the absorptive and scattering characteristics of
the surfaces encountered en route from the source to the receiving position. Diffracted
sound components must be considered in the future as well. Additionally, the dissipation
of sound energy in air, i.e., the frequency-​dependent air attenuation, must be considered.
Image modelling may be used only for small models; today this method is used mainly for
tutorial purposes.
Using the simulated impulse responses, the designer is able to calculate all room acoustic
measures of the space such as (Figure 5.23):

Figure 5.23 Echogram in EASE-​AURA 4.4.




• reverberation time
• clarity
• speech intelligibility
• echo behaviour etc.

5.5.2 Sound System Design by Verification of Simulation Results


When the room acoustic properties of the corresponding space have been studied, evaluated
and possibly optimized, the sound system design may start for real. The loudspeakers are to
be selected and inserted into the model based on the knowledge and the experience of the
sound system designer. Subsequently the loudspeakers must be oriented to the correct direc-
tion to properly cover the existing audience areas, the ‘aiming’ of the loudspeaker.

5.5.2.1 Aiming
Aiming the individual loudspeakers is a critical step ensuring the proper spatial arrangement
and orientation of the sound reinforcement systems. Once the corresponding room or
open-​space model is available and the mechanical and acoustical data of the loudspeaker
systems are accurately known, these systems are approximately positioned and possibly
fine-tuned within the same step. This also includes the beam settings for digitally con-
trolled arrays and/​or their delays. Modern simulation programs employ an isobeam/​isobar
method to initially aim the loudspeakers, preferably utilizing the −3 dB, −6 dB or −9 dB
contours.
Figure 5.24 shows various types of projection of the −3 dB, −6 dB and −9 dB contours
into the room. The superimposed aiming curves for multiple speakers can be studied in audi-
ence areas (Figure 5.25).

Figure 5.24 3D aiming presentation in a wireframe model (EASE 4.4).



Figure 5.25 2D aiming mapping (EASE 4.4).

5.5.2.2 Time-​Arrivals, Delay, Alignment


A graph of time-​arrivals (direct, direct plus reflected, reflected only sound energy) allows
the user to see the arrival of the first wave front as required by the design, to adjust loud-
speaker delays, to bring the loudspeakers into coherence and to obtain an acoustic localiza-
tion of an amplified source (via distance and the Haas or precedence effect); see Figure 5.26.
The consideration of special effects such as localization, stereo imaging, etc. is more com-
plex. Simulation programs allow one to determine the first wave front as well as to calculate
initial time delay gaps or echo detections (c.f. in this respect Figure 5.27).
Array lobing patterns of ‘arrayable’ loudspeakers are displayed by simulation programs,
with the ability to provide signal delay and/​or move the appropriate loudspeakers in an
attempt to shape the array into acoustic alignment. Modern programs have the ability to
provide individual signal delays to each loudspeaker to align them in time.

5.5.2.3 SPL Calculations


After the loudspeakers have been correctly aimed and the delays are set correspondingly,
the achievable sound pressure levels can be calculated. Initial results are given for the direct
sound pressure level, which is the same for statistical and ray-tracing-​based calculation
Figure 5.26 Delay presentations in simulation tools: (a) in ODEON 10.0 and (b) delay pattern of first signal arrival in EASE.

routines. As long as a good direct sound coverage over the listener area is predicted, perfect
intelligibility indices are expected as well, under the condition that the reverberation is
well controlled.
The corresponding sound pressure calculations should take into account either measured
phase data for the individual loudspeakers or, if phase differences among the loudspeakers
can be ignored, the run-time phases resulting from their travel-time differences.
A complex summation (including phase conditions as well as travel-time differences)
has to be used as a standard method of calculating the direct SPL. In simulation algorithms
the complex sound pressure components of different coherent sources are added up and
afterwards squared to obtain SPL values. The so-​called DLL or GLL approaches (see section
8.6) calculate the complex sum of all sources in the array.
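The complex summation can be sketched for ideal point sources. This is a deliberately simplified model (free-field 1/r law, travel-time phase only, no measured loudspeaker phase data):

```python
import cmath
import math

def direct_spl_db(sources, receiver, frequency_hz, c=343.0, p_ref=2e-5):
    """Complex summation of coherent point sources at one receiver position.

    sources: list of (position_xyz, amplitude_pa_at_1m) tuples.
    Each contribution is scaled by the 1/r distance law and given its
    travel-time phase exp(-j k r); the magnitude of the complex sum is
    then converted to an SPL re 20 uPa."""
    k = 2.0 * math.pi * frequency_hz / c
    p = 0.0 + 0.0j
    for position, amplitude in sources:
        r = math.dist(position, receiver)
        p += (amplitude / r) * cmath.exp(-1j * k * r)
    return 20.0 * math.log10(abs(p) / p_ref)

one = direct_spl_db([((0.0, 0.0, 0.0), 1.0)], (1.0, 0.0, 0.0), 1000.0)
two = direct_spl_db([((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), 1.0)],
                    (1.0, 0.0, 0.0), 1000.0)
print(round(one, 1))        # 94.0 dB for 1 Pa at 1 m
print(round(two - one, 1))  # +6.0 dB: equal paths, fully coherent summation
```

Note the order of operations: the complex pressures are summed first and only the magnitude of the sum is squared, exactly as described above; summing squared magnitudes instead would miss all interference effects.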
Modern simulation programs are analysing programs, capable of calculating which
levels can be obtained by which loudspeakers and under which acoustical conditions. But
questions are often asked inversely: an advanced algorithm could query the user for a desired
average target SPL of the system, and subsequently adjust the power provided to each loud-
speaker (with a warning indication when the power required exceeds the loudspeaker’s cap-
abilities), taking into account target SPL, the sensitivity and directivity of the loudspeaker,
the distance of throw and the number of loudspeakers.
The goal of all these efforts is to evenly cover the entire audience area(s) with musically
pleasing and intelligible sound, while providing sound pressure levels suitable for the
intended purpose; compare Figure 5.28.

Figure 5.27 Echo detection in EASE: (a) initial time delay gap (ITD) mapping to check echo
occurrence in a stadium; (b) echogram in weighted integration mode at 1 kHz; (c) echo
detection curve for speech at 1 kHz.

Figure 5.28 Waterfall presentation in EASE.

5.5.2.4 Mapping and Single-​Point Investigations


Once the aiming, power setting and alignments are completed, the programs provide
a colour-​coded visual coverage map of the predicted sound system performance. This
coverage map considers the properties of the loudspeakers as well as the impact of
reflecting or shadowing planes or objects. Such maps provide as a minimum the following
displays:

(a) Predicted sound pressure level at 1 octave or 1/​3 octave band frequencies, and at an
average of these frequencies (Figure 5.29)
(b) Predicted speech intelligibility values, listed as STI values (see Figure 5.30)
(c) Predicted acoustic parameters (at octave or 1/​3 octave band frequencies), such as C80,
C50, Centre Time, Strength and other values according to ISO standard 3382 (com-
pare Figure 5.31).

5.5.3 Auralization
To make acoustical data more approachable and comprehensible for parties not necessarily
able to meaningfully analyse graphically displayed acoustic data, auralization was introduced
at the beginning of the 1990s. Auralization is a post-​processing routine which utilizes the
calculated impulse responses and through convolution transforms an anechoic pre-​recorded

Figure 5.29 Sound pressure level mapping in simulation tools: (a) 2D presentation in CATT
Acoustic; (b) narrow-band presentation in EASE; (c) broadband presentation in ODEON.


music or speech signal into a sound file that overlays the room’s acoustic signature onto the
material, hence giving the aural impression of being positioned right inside the simulated
space, which in reality might not even be built yet. Auralization routines generate binaural
data files in WAV or similar formats (Figure 5.32), but other more sophisticated sound file
formats such as a multichannel B-​format can be generated as well. By coupling these formats
to a head-tracking device, a full three-​dimensional sonic representation can be achieved
which properly changes its directionality as the head moves, e.g. a side-​wall reflection is
heard frontally when the head is rotated to face the wall surface.
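The convolution at the heart of auralization is conceptually simple. Below is a direct, unoptimized sketch; real auralization engines use FFT-based or partitioned convolution, and binaural output convolves with a pair of impulse responses (one per ear):

```python
def auralize(dry_signal, impulse_response):
    """Discrete convolution of an anechoic (dry) signal with a room
    impulse response: the core operation of auralization."""
    out = [0.0] * (len(dry_signal) + len(impulse_response) - 1)
    for n, x in enumerate(dry_signal):
        for m, h in enumerate(impulse_response):
            out[n + m] += x * h
    return out

# A unit impulse played through a toy two-tap 'room' returns the IR itself:
print(auralize([1.0, 0.0, 0.0], [0.5, 0.25]))  # [0.5, 0.25, 0.0, 0.0]
```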

Useful Applications of Auralization

• Subjective, aural comparison of acoustical conditions before and after application of
room acoustic treatment, and comparison of different treatment options
• Making audible the effect of a sound system or a natural sound source
• Simulating sound fields in rooms that no longer exist or which are not yet constructed

Limits and Abuse of Auralization

• Making final design decisions on the basis of just two auralizations of competing projects
• Exclusive use of auralization for acoustic design work

Figure 5.30 Speech transmission index (STI) presentation in EASE: top: three-dimensional presentation in a hall; bottom: STI presentation in a parliament.
Figure 5.31 Parameter presentation in EASE: top: Clarity C80; bottom: Sound Strength G.

Figure 5.32 Block diagram of an auralization routine.

5.6 Efficiency – Costs, Space, Weight


Modern loudspeaker development reflects the reality of today's productions insofar as the race for space between video and audio has clearly been won by the video department. Sightlines to large videowalls, used as backdrops, visual aids or effect displays, should therefore not be compromised by large ‘wall of sound’-style loudspeaker arrangements.
This trend is reinforced by other factors, such as reduced trucking space on tours (which also benefits sustainability) and lower loads on rigging points (capacity which, in practice, is then often claimed by additional video displays).
In theatres, concert halls and other prestigious buildings the architect usually demands that the loudspeakers be hidden, in contrast to lighting fixtures. Smaller but powerful units are needed here to achieve complete coverage over a wide frequency spectrum. Often it is troublesome to hide the larger subwoofers in the available space, and the small loudspeakers to be installed for panorama or, nowadays, immersive sound reproduction quite often pose a huge challenge as well.
This requirement frequently conflicts with the price of the units: high-quality units are not cheap, so the designer, forced to keep to the budget, inevitably selects less costly units.
The trend in loudspeaker development therefore points to highly efficient components with a small footprint. The efficiency of loudspeakers is a complex matter and starts at the driver unit itself, mainly its motor characteristics (magnetic field strength, length of coil wire, membrane area and mass). Some of these parameters conflict: increasing the coil size, for example, is great for efficiency but also increases the moving mass, which in turn is detrimental to efficiency. An additional challenge is the dissipation of heat. The innovative use of materials and construction techniques by the various manufacturers offers options for high efficiency.


A second step is then to increase the efficiency of the loudspeaker cabinet and the
method of coupling the loudspeaker membrane to the surrounding air. Over the years,
various approaches have been developed with this goal in mind, e.g., dozens of styles of
horn or waveguide assemblies, bass reflex ports and transmission lines, all in an attempt to
enhance efficiency.
Lastly, electronic monitoring and steering is employed, often by proprietary system controllers, processors or amplifiers, to properly distribute electrical power into the cabinets, maintaining the highest efficiency while protecting the loudspeakers from overheating and overpowering. Although analogue circuits have been used for this purpose (some of them involving a ‘sense’ loop back from the amplifier output to the processor), there is currently hardly a manufacturer that doesn't offer a digital implementation.

References
1. Standard IEC 651, 1979 – Sound Level Meters.
2. Bolt, R.H., and Doak, P.E. A tentative criterion for the short-term transient response of auditoriums. J. Acoust. Soc. Amer. 22 (1950) p. 507.
3. Schroeder, M.R. Frequency response in rooms. J. Acoust. Soc. Amer. 34 (1962) pp. 1819–1823.
4. Schroeder, M.R., and Kuttruff, H. On frequency response curves in rooms. J. Acoust. Soc. Amer. 34 (1962) pp. 76 ff.
5. Behringer | Product | FBQ2496.
6. Toole, F.E. Loudspeaker measurements and their relationship to listener preferences. JAES 34 (1986) 4, pp. 227–235, 5, pp. 323–348.
7. Mapp, P. Technical reference book ‘The Audio System Designer’, edited by Klark Teknik in 1995.
8. ODEON software, version 16, www.odeon.dk.
9. CATT-Acoustic software, version 9.1, www.catt.se.
10. EASE software, version 4.4, www.afmg.eu.
11. Bose Modeler, version 6.11, worldwide.bose.com.
12. https://meyersound.com/product/mapp-xt/.
13. www.l-acoustics.com/products/soundvision/.
14. www.dbaudio.com/global/de/produkte/software/arraycalc/.
15. Schmitz, O., Feistel, S., Ahnert, W., and Vorländer, M. Merging software for sound reinforcement systems and for room acoustics. Presented at the 110th AES Convention, May 12–15, 2001, Amsterdam, Preprint No. 5352.
16. Dalenbäck, B.-I. Verification of prediction based on randomized tail-corrected cone-tracing and array modeling. 137th ASA/2nd EAA, Berlin, March 1999.
17. Naylor, G.M. ODEON – Another hybrid room acoustical model. Applied Acoustics 38 (1993) 2–4, p. 131.

6 System Design Approaches


Wolfgang Ahnert

6.1 Introduction
As mentioned in Chapter 1, sound reinforcement systems can be designed using different
approaches. Certain basic requirements, however, are given in all cases.
The sound level produced by the system within the venue’s audience area must ensure:

• the required loudness (expectation value, adaptation to original sources)
• the appropriate frequency response (quality of the sound reinforcement system)
• the required ratio between foreground useful sound and background sound (the signal-to-noise ratio, dynamic range of reproduction)

The sound level distribution, which enables assessment of the spatial distribution and coverage
of sound, must be sufficiently uniform. It depends on

• the localization of the loudspeakers
• the directivity characteristics of the loudspeakers
• the reverberance of the room in which the reproduction takes place

The clarity of reproduction must be adequate for the intended use. It depends on

• definition, clarity (reflected/direct sound ratio), speech intelligibility
• masking of the reproduction signal
• absence of echoes

It is necessary to obtain sufficient naturalness of the transmission or, in other words, perceptibility of only a desired timbre change. Contributing factors in this respect are

• the timbre depending on the transmission range and the frequency response of the
signal transmitted
• the frequency response
• absence of distortions

Sound reinforcement systems must be sufficiently insensitive to positive acoustic feedback. This implies demands concerning

• the level of loop amplification

DOI: 10.4324/9781003220268-6


• the level conditions around the microphone
• the directivity characteristic of the microphones and loudspeakers

6.2 Important Acoustic Parameters for Sound Design

6.2.1 Sound Level Calculation

6.2.1.1 Free Field (Direct Field of the Loudspeaker)


For the calculation of the sound level in the free field one has to use the loudspeaker
parameters valid in the free field.
The sound pressure level Ld in the direct field of a loudspeaker at a certain location at a
distance of rLH from the loudspeaker (with r0 =​1 m) and at the angle ϑ from the reference
axis results in

Ld = LK + 10 lg Pel dB − 20 lg rLH dB + 20 lg ΓL(ϑH) dB (6.1)

where the 1 m, 1 W level is then Ld,1m,1W = LK (sensitivity) and ΓL(ϑ) is the directional factor according to (3.7).
For greater distances (above 40 m) the meteorological propagation loss Dr =​ DLH
according to Figure 2.10 has to be considered in (6.1).
The frequency range between 200 Hz and 4 kHz is generally recommended in the
standards (ISO, DIN) for ascertaining the characteristic sensitivity level of broadband
loudspeakers.
The above-​mentioned method for calculating the sound level in the free field is gener-
ally applicable to outdoor systems, but the free-​field propagation is also of interest for indoor
systems.
If in special cases only the sound power level LW of the loudspeaker and the directivity
index DI =​10 lg QL dB are known, it is possible to calculate the required characteristic
sound level LK according to:

LK = LW + DI − 10 lg Pel − 11 dB (6.2)
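As a numeric sketch, Eq. (6.1) can be evaluated in a few lines of Python; the function name and the example values (sensitivity, power, distance) are illustrative assumptions, not taken from the text.

```python
import math

def direct_field_level(L_K, P_el, r_LH, gamma=1.0, D_r=0.0):
    """Direct-field SPL after Eq. (6.1):
    Ld = LK + 10 lg Pel - 20 lg rLH + 20 lg Gamma(theta),
    optionally minus the propagation loss Dr for distances above ~40 m.
    L_K: sensitivity in dB (1 W, 1 m); P_el: electrical power in W;
    r_LH: distance in m; gamma: directional factor (1.0 on axis)."""
    return (L_K + 10.0 * math.log10(P_el)
            - 20.0 * math.log10(r_LH)
            + 20.0 * math.log10(gamma)
            - D_r)

# Illustrative values: LK = 96 dB, 100 W, 20 m on axis
print(round(direct_field_level(96.0, 100.0, 20.0), 1))  # 90.0 dB
```

At 1 W and 1 m on axis the function returns LK itself, as the definition of the sensitivity requires.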

6.2.1.2 Diffuse Field


If the loudspeaker is located in a room, a more or less homogeneous diffuse sound field
exists. The homogeneity of this sound field depends on the size and shape of the room. For
this reason, calculation of the sound level in this field is often carried out, since the result
provides an indication as to the value of the sound level minimally achievable in this room
under given conditions.
Contrary to the calculation of the sound level in the direct field, the acoustic power Pak of the loudspeaker(s) located in the room (and which emit the same signal) is of great importance. The sound power and the equivalent absorption area A of the room (see (2.3)) determine the sound energy density wr and, with pr² ~ wr, also the sound pressure level Lr.
The acoustic power Pak is unknown to the sound reinforcement engineer, but it can
be calculated via the efficiency (see (3.17)) and the electric power Pel with which the
loudspeakers are operated.
175

System Design Approaches 175


At the end the diffuse sound level results in:

Lr = LW − 10 lg A dB + 6 dB (6.3)

LW is the sound power level of the loudspeaker, as can be derived from (6.2).


By means of (6.2) and (6.3) it is possible to express the diffuse sound level in the
following form:

Lr = LK + 10 lg Pel dB − 10 lg A dB − DI + 17 dB (6.4)

Contrary to the direct sound level, all simultaneously active loudspeakers contribute to the
formation of the diffuse sound level. Instead of Pel (installed power of a loudspeaker), the
installed power of all loudspeakers simultaneously active in the room Pelsum is entered under
consideration of their possibly different characteristic sound levels (or sensitivities LKi) and
directivity indices. This assumes the formation of a sufficiently diffuse sound field and there-
fore applies neither to flat rooms nor to very large or heavily damped rooms. These types of
calculations are now performed utilizing computer programs and will not be elaborated on
in detail (see [1]‌).
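A minimal Python sketch of Eq. (6.4); the values for sensitivity, total power, absorption area and directivity index are illustrative assumptions.

```python
import math

def diffuse_field_level(L_K, P_el, A, DI):
    """Diffuse-field SPL after Eq. (6.4):
    Lr = LK + 10 lg Pel - 10 lg A - DI + 17 dB.
    P_el: total power of all simultaneously active loudspeakers in W;
    A: equivalent absorption area in m^2; DI: directivity index in dB."""
    return L_K + 10.0 * math.log10(P_el) - 10.0 * math.log10(A) - DI + 17.0

# Illustrative values: LK = 96 dB, 4 x 100 W, A = 800 m^2, DI = 6 dB
print(round(diffuse_field_level(96.0, 400.0, 800.0, 6.0), 1))  # 104.0 dB
```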

6.2.1.3 Real Rooms


In enclosed spaces the sound field originates from the diffuse sound as well as from the direct
sound of the loudspeakers. Putting both parts together, the sound level to be expected in the
room at the distance rLH is

L = LK + 10 lg Pel dB + 10 lg [Γ²L(ϑH)/r²LH + 16π/(QL max · A)] dB (6.5)

These equations can only be applied if one single loudspeaker is present or if the
loudspeakers are arranged in a very concentrated form, as is the case, for instance, with a
monocluster, where the directional factor Γges and the directivity factor Qges of the array
must be known.
The use of simulation programs is recommended for more complex loudspeaker
arrangements to achieve correct calculation results [1]‌.
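Eq. (6.5) can be sketched as follows; the example values are illustrative, and the on-axis case (Γ = 1) is chosen so that the direct term reduces to 1/r².

```python
import math

def total_level(L_K, P_el, r_LH, gamma, Q_L, A):
    """Direct plus diffuse SPL after Eq. (6.5):
    L = LK + 10 lg Pel + 10 lg(Gamma^2/r^2 + 16*pi/(Q*A)) dB."""
    direct = gamma**2 / r_LH**2
    diffuse = 16.0 * math.pi / (Q_L * A)
    return L_K + 10.0 * math.log10(P_el) + 10.0 * math.log10(direct + diffuse)

# On axis (gamma = 1) at 10 m; Q = 4, A = 800 m^2, 100 W (illustrative)
print(round(total_level(96.0, 100.0, 10.0, 1.0, 4.0, 800.0), 1))  # 100.1 dB
```

Close to the loudspeaker the result approaches the direct-field level of Eq. (6.1), far away it approaches the diffuse-field level of Eq. (6.4).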

6.3 Layout of Sound Reinforcement Systems


The technical layout of a sound reinforcement system is essentially determined by the pur-
pose envisaged and by the characteristics of the room in which it is to be installed.
One can distinguish:

• distributed systems, information systems, voice alarm systems
• straightforward amplification, simple sound systems
• multi-channel systems with and without delay units
• complex sound systems using delay units
• systems with different/switchable usage scenarios, multipurpose systems
• assistive listening systems (AFILS, IR, etc.) for hearing-impaired people
176

176 Wolfgang Ahnert


An important consideration for the selection and arrangement of the sound reinforcement
devices and thus also for the selection of the technical variant to be installed is the spatial
relationship in which the addressed listener is located with regard to the original or supposed
(playback) source of the transmitted sound event. This original source may be completely
separated from the listener, as is the case with the aforementioned information systems (for
instance in department stores, foyers, transportation hubs). It may be located in the same
room as the listener (in the action area), but separated from the listener (in the reception
area), or both areas may also overlap, as is the case with conference and discussion systems.

6.3.1 Information Systems with Distributed Loudspeakers


Information systems are loudspeaker installations which serve to transmit information to
an audience distributed over a large area or many rooms. Voice alarm systems are informa-
tion systems as well. The control room emitting the information is in a separate space and
normally invisible for the audience being addressed. The transmitted information may be
speech, music or in certain cases also masking noise. Special variants are paging systems in
medical or performing arts centres as well as dispatcher and loudspeaker systems in trans-
portation hubs which are installed in enclosed rooms or in the open. A common feature of
all of these is that positive acoustic feedback is hardly possible, if at all.
The required transmission properties, such as transmission range, balance of timbre and sound level as well as the maximum achievable sound level, possible volume controls in certain areas and the availability of a priority announcement override (‘compulsory speak-in’), depend on the functionality for which the system is provided. Essential for the design and arrangement of the system are
the room conditions. In most cases these conditions correlate with specific functional requirements, for which a range of typical variants of information sound reinforcement systems has been created.
The transmission range should cover 100 Hz to 6.3 kHz or –​in the case of higher quality
demands –​80 Hz to 8 kHz. If mere speech transmission systems are concerned, e.g. for an
internal operating area (command transmission), a transmission range of 200 Hz to 4 kHz
can be accepted. Leaving exceptional cases of extremely high environmental noise out of
consideration, the maximum sound level should be between 80 and 90 dB(A) and prefer-
ably around 85 dB(A).

6.3.1.1 Flat Rooms


It is often required to cover one-storey rooms with a ceiling height of 2.5 to 6 m and considerable horizontal extensions. Flat rooms of this kind are typical for

• foyers in airports, railway stations, congress and cultural centres, hotels, theatres
• restaurants
• shopping centres, sales floors
• open-​plan offices
• museums, galleries, exhibitions
• workshops, factories, storerooms

Various loudspeaker arrangements have been developed for these applications, the most
important of which are mentioned here.


6.3.1.1.1 GRID-LIKE LOUDSPEAKER ARRANGEMENT IN THE CEILING AREA

The ceiling loudspeakers used are uniformly distributed in the ceiling area and radiate
downwards. With this arrangement it is possible to obtain a uniform sound level distribution over a large area, also in the case of dense furnishing or partition walls put up between individual areas (e.g. exhibition booths or office cubicles). All loudspeakers work simultaneously without any electronic delays.
To avoid flutter echoes between room ceiling and floor, it is necessary that either

1. the ceiling is sound-​absorbing (given in most cases), or


2. the floor is covered with a sound-​absorbing (e.g. textile) material, or
3. the floor or/​and the ceiling surface is/​are heavily structured (e.g., by the furnishings or
ductwork etc. or by static construction elements standing out in relief in the ceiling area).

The reverberation time in the midfrequency range should be kept below 2–​2.5 s, specifically
for high ceiling heights.
Calculating the quantity of required speakers. The spacing between the loudspeakers and
thus the number of loudspeakers per surface area depends on the desired uniformity of sound
level distribution, the installation height of the loudspeakers and also the desired timbre of
the sound radiation. This number is furthermore affected by the radiation characteristics
of the loudspeakers, which in some cases are widened by means of additional components
arranged in front of the loudspeakers.
Figure 6.1 illustrates the relevant relationship between radiation angle and separation
of the loudspeaker from the average ear height above the floor. This relation between ear
height (1.2 m /​4 ft for seated audience, 1.7 m /​5.5 ft for standing audience), ceiling height
and directivity pattern of the loudspeakers may be optimized by different loudspeaker
arrangements:

• edge-to-edge arrangement
• minimum overlap
• centre-to-centre arrangement
• manually entered distances between the speakers
• distances needed for a minimal level fluctuation

Figure 6.1 Radiation angle and loudspeaker height d above ear height.
d = h − l; l = 1.2 m for seated audience, l = 1.7 m for standing audience
r: radius of the coverage area
α: radiation angle

In Figure 6.2 these arrangements are explained for hexagonal or square grid arrangements.
For comparison Figure 6.3 illustrates the sound coverage for a flat room (room height
4 m (13 ft), average reverberation time 1.2 s) with different distances between the speakers.
Left: the edge-to-edge arrangement needs up to 32 loudspeakers at an average distance of 3.2 m (10.5 ft). Right: the centre-to-centre arrangement needs 123 loudspeakers at a distance of 1.6 m (5.3 ft).
The solution on the right is rather costly; normally a distance between the loudspeakers of 4 m (13.2 ft) is sufficient for even coverage, in which case only 24 loudspeakers are needed.
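The geometry of Figures 6.1 and 6.2 can be sketched as follows. The 60° coverage angle is an assumption chosen so that the resulting square-grid spacings reproduce the 3.2 m / 1.6 m values quoted above, and the spacing rules are common geometric idealizations rather than values from the text.

```python
import math

def coverage_radius(h, l, alpha_deg):
    """Figure 6.1: d = h - l (loudspeaker height above ear height),
    coverage radius r = d * tan(alpha / 2)."""
    d = h - l
    return d * math.tan(math.radians(alpha_deg) / 2.0)

def grid_spacing(r, scheme):
    """Square-grid spacings for the arrangements of Figure 6.2
    (geometric idealizations assumed here):
    edge-to-edge: adjacent coverage circles just touch (spacing 2r);
    minimum overlap: listening plane fully covered (spacing r * sqrt(2));
    centre-to-centre: each circle reaches its neighbour's centre (spacing r)."""
    return {"edge-to-edge": 2.0 * r,
            "minimum-overlap": math.sqrt(2.0) * r,
            "centre-to-centre": r}[scheme]

# Room height 4 m, seated audience (l = 1.2 m), assumed 60 deg coverage angle
r = coverage_radius(4.0, 1.2, 60.0)
print(round(grid_spacing(r, "edge-to-edge"), 1))      # ~3.2 m
print(round(grid_spacing(r, "centre-to-centre"), 1))  # ~1.6 m
```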
Several simulation programs are available which calculate the level coverage as well
as the achieved intelligibility. In this case the reverberation time has to be known or may
be calculated with the software; compare the STI results for speech intelligibility with an
average reverberation time of 1.2 s in Figure 6.4.

Figure 6.2 Installation grids of ceiling loudspeakers. (a) Centre to centre; (b) minimum overlap;
(c) rim to rim.

Figure 6.3 Loudspeaker coverage (left, 32 loudspeakers; right, 123 loudspeakers).



Figure 6.4 STI coverage (left with 32, right with 123 loudspeakers).

The green colour in Figure 6.4 indicates STI values > 0.5, relevant for the installation of
voice alarm systems (compare standards CEN TS EN 54-​32 in Europe and NFPA 72 in the
US; see section 1.1). With 123 loudspeakers the same STI values are achieved as with 32, or even with just 24 loudspeakers (4 m average loudspeaker spacing). It therefore makes no sense to install an excessive number of loudspeakers, as this increases only the cost, not the quality, of the system.
To obtain high speech intelligibility for information systems the installation height
of the loudspeakers should not exceed 6 m (maximum of 8 m). Heights above 4 m are
acceptable only for very heavily damped rooms and/or spaces with very large volumes. With
greater installation heights one has to consider that several widely spaced loudspeakers are
perceived simultaneously and travel-time differences increase the spaciousness, which is
detrimental for definition and speech intelligibility.
Loudspeaker installation. Various installation techniques exist to mount the loudspeakers
into the ceiling. Aesthetically most pleasing is certainly the integration of the loudspeakers
within a closed or acoustically transparent false ceiling.
For installation in a closed false ceiling, it is possible to use open loudspeaker chassis which
are mounted directly above the acoustically transparent openings of the ceiling. The false
ceiling then serves as a practically ‘infinite’ baffle.
If the loudspeakers are installed above an acoustically transparent false ceiling, e.g.
a complete sound-​absorbent ceiling for room damping, they must be enclosed in a
backbox or arranged on baffle boards so as to avoid an ‘acoustic short-​circuit’ through
the ceiling perforation. The loudspeakers can then be installed without any particular
opening; hence the architectural concept is not impaired. A precondition for this kind
of installation is sufficient acoustic transparency of the perforated boards or baffled false
ceiling. This transparency depends on one hand on the degree of perforation of the
visual covering, i.e. on the ratio between the open and the closed portions of the surface beneath the radiation area of the loudspeaker. On the other hand, it also depends
on the actual perforation, on its depth and to a minor degree on the spacing of the
perforations. All these factors determine the upper frequency limit for which the ceiling
covers are transparent. A thin steel plate of 1 to 2 mm thickness, even if it has a rela-
tively low degree of perforation of about 15%, may be more favourable than a 20 mm
thick gypsum board with a considerably higher degree of perforation. For thicker plates, too, numerous small openings are more favourable than a couple of large ones, since the latter may additionally give rise to narrow-band blocking resonances (frequency-selective attenuations or notches).


The loudspeakers are generally mounted in compact backboxes. If the system is to be
used for music transmissions, these backboxes should have a volume of at least 4 to 6 l,
otherwise the low-​frequency range decreases unacceptably.

6.3.1.1.2 LOUDSPEAKER ARRANGEMENT BELOW THE CEILING

If ceiling installation of the loudspeakers cannot be realized, three alternatives are available:

• The loudspeakers are installed directly under the ceiling and radiate downwards.
• The loudspeakers are arranged on walls and supports and radiate horizontally or slightly
inclined.
• The loudspeakers are suspended from the ceiling by means of long cables and radiate
downwards. (This solution is also to be chosen when the room to be covered is higher
than 6 m or if the ceiling area is heavily occupied by ductwork, bridging joists, etc.).

Suspended loudspeakers. For loudspeakers installed directly below the ceiling or suspended
from short pendants the same conditions apply as for loudspeakers installed in the ceiling.
Other installation variants, however, require some additional observations.
In very reverberant, high rooms one can achieve, for instance, a significant improvement
of the signal’s definition by suspending the loudspeakers not too far above the ear-​height
level (Figure 6.5). This is very effective if the upward radiation of the loudspeakers is minimal and if the covered area is sound-absorbent (sound projection to the audience). Both
conditions contribute to minimizing the excitation of the upper part of the room.
The installation height of the loudspeakers has to be chosen in such a way that the audience is still within the critical distance Dc of the loudspeakers; the maximum installation height above ear-height level is approximately given by

dmax = 0.7 Dc (6.6)

Figure 6.5 Suspended directive loudspeakers for avoiding an excitation of the upper reverberant space.
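Eq. (6.6) can be evaluated as below. Note that the critical-distance formula Dc ≈ 0.141·√(Q·A) is a common approximation assumed here (it is not given in this section), and the example values are illustrative.

```python
import math

def max_height_above_ears(Q, A):
    """Eq. (6.6): d_max = 0.7 * Dc.
    Dc is estimated with the common approximation Dc = 0.141 * sqrt(Q * A)
    (assumed here; Q: directivity factor, A: absorption area in m^2)."""
    D_c = 0.141 * math.sqrt(Q * A)
    return 0.7 * D_c

# Directive loudspeaker (Q = 10) in a room with A = 250 m^2
print(round(max_height_above_ears(10.0, 250.0), 1))  # ~4.9 m above ear height
```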

Horizontally radiating loudspeakers. Thanks to the directional characteristics of the


loudspeakers this arrangement allows the choice of a wider loudspeaker spacing: this is for
instance a solution in listed buildings where ceiling installation is not possible. This option
also eliminates the risk of flutter echoes between ceiling and floor.
General installation guidelines:

• The loudspeakers must be installed above the head level of standing persons, to avoid
masking by the audience. The preferable installation height is between 2 and 3 m. If a
greater height is required the aiming should be inclined accordingly.
• The spacing between neighbouring loudspeakers should not exceed 17 to 20 m.
• The loudspeakers are to be aimed in such a way that no initial or phase-​coherent wave
front hits a planar reflecting surface.

A special form for covering extensive rooms or long aisles consists in using radiators with
bidirectional characteristics, consisting, for example, of two loudspeaker boxes joined
back-to-​back. As shown in Figure 6.6, these double loudspeakers are arranged in a staggered
pattern so that the widest coverage area of one loudspeaker fits into the narrowest coverage
area of the staggered next loudspeaker, to obtain a relatively uniform sound level and timbre
distribution.

6.3.1.2 Factory and Exhibition Halls


If the information is to be transmitted into large halls with a high background noise level,
relatively powerful loudspeakers are required. Owing to the large area and the relatively
high reverberation time and spaciousness of these halls and the impairment of sound propagation by built-in structures (screenings, crane ways, signboards and publicity boards), a

Figure 6.6 Double loudspeakers arranged in a staggered pattern for covering a large-​surface room.


decentralized coverage by means of suspended ceiling loudspeakers is often the only practicable solution.
In the case of elongated and relatively narrow halls, powerful loudspeaker arrangements may also be suitable, e.g. digitally controlled sound columns on one longitudinal wall or, in the case of a somewhat wider hall (maximum 20 m width), on both longitudinal walls. The installation height of the arrays depends on the conditions (built-in structures, etc. may have a masking effect) but should be between 3 m and 5 m.

6.3.1.3 Complexes of Individual Rooms


In larger building complexes it is often necessary to supply many individual rooms with a
central programme. Examples in this respect are hotels with several restaurants and lounges,
leisure and sports facilities with numerous training areas, administration and transportation hubs, and paging systems for individual and ensemble dressing-rooms; lounges and
workshops for artists and technicians in performing arts centres may also be included in this
category. Complex voice alarm systems must be mentioned here as well. In all these cases
it is required to provide good speech intelligibility and an adequate, often varying loudness
level for every room. In the coverage area it must be possible for the loudness level to be
frequently adjusted so as to adapt it to the varying conditions. The control room must, however, be enabled to override this adjustment for specific announcements (e.g., voice alarms
or stage manager emergency calls).
The type and arrangement of the loudspeakers are determined by the type and size of the
rooms to be covered. In larger rooms and aisles, the sound installation is done as in standard
single-​storey, i.e., flat rooms. In higher rooms one may also use centralized arrangements.
Small rooms are mostly provided with wall or ceiling loudspeakers. In particular cases, e.g.
in managers’ rooms, table loudspeakers are also used.
Although the quality requirements of these installations are as variable as their intended
uses, they all have in common that good intelligibility is required. For this reason, a limited frequency range is chosen, restricted specifically in the low-frequency region. With mere ‘command systems’ this is also permissible. For systems intended to provide the listeners with music reproduction, the lower limit of the transmission range should extend down to at least 150 Hz.

6.3.1.4 Sound Coverage of Outdoor and Traffic Areas


Typical examples for the use of information sound reinforcement outside of buildings are

• sound systems for large exhibition grounds, factory installations and other large open-​
air sites
• sound and information systems for sports centres like outdoor swimming pools, sports
grounds, stadiums
• information systems for station platforms, bus terminals, etc.

It is common to all these systems that

• weather-​dependent propagation conditions have to be expected because of the large


distances involved
• heavily fluctuating noise levels with partly high peak values may occur


• interference with neighbouring areas may occur, which has to be avoided for reasons of noise protection and of crosstalk into other areas of the system.

6.3.1.4.1 LARGE OPEN SPACES

The goal is to cover large areas economically with echo-​free sound at a sufficient signal-to-​
noise ratio (about 10 dB). Unlike indoor sound reinforcement, the diffuse sound caused
by reflections can be ignored. For covering large distances, as is rather frequently the case
with open-​air sound reinforcement, one has, however, to consider an additional weather-​
dependent attenuation (see Figures 2.10 to 2.13). Since in most cases the information
systems concerned are used for speech transmission, a treble drop above 10 kHz is usually
acceptable.
With sound transmissions over large distances, one has to take care that listeners close to
the loudspeakers are not exposed to excessive sound levels. This risk is particularly relevant
for centralized arrangement of the loudspeakers.
With a decentralized arrangement of the loudspeakers, however, ‘double hearing’ may
occur when two wave fronts from two separately located loudspeakers or loudspeaker arrays
arrive at the listener with a time difference of more than 50 ms so as to be perceived separately. This may also occur if the listener perceives the direct sound as well as a strong reflection. Additionally, it must be ensured that the sound levels arriving on adjacent areas that are not to be covered are kept within the legally permitted limits.

Centralized Arrangement of the Loudspeakers This is the common solution for the coverage
of smaller sports facilities like outdoor swimming pools and sports grounds. The loudspeakers
are installed at a central elevated position near the paging station. In an outdoor swimming
pool this may, for example, be the lifeguard’s cabin roof around which the pools and the
leisure areas are grouped (Figure 6.7). The individual loudspeakers and loudspeaker arrays
are dimensioned in such a way that the desired sound level L is achieved at the greatest distance to be covered in each individual case.
It is not always possible to realize a centralized radial coverage. On small sports grounds,
for instance, the loudspeakers have to be installed at the edge of the playground, mostly in the
vicinity of the grandstand (Figure 6.8), in which case the spacing of the loudspeakers should
not exceed 15 m. If the system is used mostly for speech announcements and only occasionally for music transmissions, it is possible to use compression driver horn loudspeakers.
With larger sports grounds, higher standards of reproduction quality, and particularly of the transmissible frequency range, are required, hence loudspeakers capable of producing a wider transmission range must be used.
attention to the weather-​dependent attenuation, as well as to an adequate electro-​acoustical
headroom.
Stadiums and sport grounds require special design solutions; refer to section 11.2.3.

Decentralized Arrangement of the Loudspeakers This configuration is preferred when large


and extensively scattered areas have to be uniformly covered. This applies to streets, large
squares, distributed exhibition grounds, industrial plants, etc.
An important criterion is the maximum spacing of the loudspeakers arranged in one
line. For economic reasons it should on one hand be chosen as wide as possible, whereas
on the other hand any double audibility has to be avoided in the interest of good intelligibility. In this respect one must remember that with spacings below 17 m there are no

Figure 6.7 Sound reinforcement system for an outdoor swimming pool area.

Figure 6.8 Simple sound system at a sports ground.



Figure 6.9 Level relations between two loudspeakers arranged at distance a.

echo disturbances to be expected, since then the wave fronts stemming from adjacent
loudspeakers arrive at the listener’s location within 50 ms. With spacings exceeding 17 m a
level difference of > 10 dB between the near and the remote loudspeakers has to be ensured.
Then the nearer and thus louder loudspeaker masks the remote one so as to eliminate the
risk of double audibility, i.e., of echoes.
Figure 6.9 shows the sound propagation of two loudspeakers arranged at a distance a from
each other. At a point between the loudspeakers, i.e., at a distance r1 from the loudspeaker
S1 and r2 from the loudspeaker S2, the sound pressure difference is

L2 − L1 =​ 20 log (r1/​r2) dB =​ ∆L.

If the radiated sound pressure levels are equal, no echo effects occur in the region of

r1 − r2 ≤ 17 m.

Since according to Figure 6.9 the distance between the loudspeakers is a =​ r1 +​ r2, one
obtains a maximum in-​line spacing of

a = 17 m ⋅ [2/(10^(∆L/20 dB) − 1) + 1]. (6.7)

With ∆L = 10 dB (as required by Blauert [2]) one obtains a = 33 m.


If the level of one of the loudspeakers is reduced, owing to lower sensitivity, different
radiation characteristic or any other reason, the in-​line spacing has to be reduced as well.
This can be quantitatively assessed by adding the level difference to the minimum level
difference of ∆L. If a radiation-​due level difference of 6 dB gets added to the level difference


of 10 dB required at the 17 m boundary (i.e. ∆L = 16 dB), the maximum loudspeaker spacing
is thus reduced to a value of a = 23 m.
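The spacing limit of equation (6.7) is easy to evaluate numerically. A minimal Python sketch (the function name and the 17 m default echo limit as a parameter are our own choices):

```python
def max_spacing_m(delta_l_db: float, echo_limit_m: float = 17.0) -> float:
    """Maximum in-line loudspeaker spacing per eq. (6.7):
    a = 17 m * (2 / (10**(dL/20) - 1) + 1)."""
    return echo_limit_m * (2.0 / (10.0 ** (delta_l_db / 20.0) - 1.0) + 1.0)

print(round(max_spacing_m(10.0)))  # ~33 m for dL = 10 dB
print(round(max_spacing_m(16.0)))  # ~23 m with an extra 6 dB radiation loss
```

Larger required level differences ∆L shrink the admissible spacing, as the text describes.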
Hence it follows that for making use of the maximum loudspeaker spacing –​without using
a delay system –​it is convenient to make adjacent loudspeakers radiate against each other
with the same directivity characteristics and under the same radiation angle. Therefore, for
covering a long, narrow area or street, for instance, it is recommendable to use loudspeakers
with symmetric bidirectional (figure-​eight) characteristics.
For covering larger squares with longitudinal dimensions similar to the transversal
dimensions it is, however, necessary to arrange various rows of loudspeakers staggered one
behind the other, arranged in such a way that the array following in the direction of radi-
ation is located where the level of the first array has diminished to such an extent that
double audibility no longer occurs (Figure 6.10).
A sure remedy for avoiding double hearing and which also enables an acoustic orienta-
tion on the action area, e.g., a field of play, consists in the use of delay equipment between
the individual loudspeakers or speaker line-​ups (Figure 6.10). The delay times have to be
chosen according to the sound travel times that are to be expected from the spacing. With
a speed of sound of c = 344 m/s at 20°C one may calculate the travel time ∆t by means of
equation (6.8):

∆t [ms] ≈ s [m] / 0.344 (6.8)
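Equation (6.8) simply divides the path length by the speed of sound expressed in m/ms. A minimal sketch (the function name is our own; c = 344 m/s as stated in the text):

```python
def travel_time_ms(distance_m: float, c_m_per_s: float = 344.0) -> float:
    """Travel time per eq. (6.8): dt [ms] = s [m] / 0.344 for c = 344 m/s."""
    return distance_m / (c_m_per_s / 1000.0)

print(round(travel_time_ms(17.0), 1))  # ~49.4 ms, just inside the 50 ms echo limit
```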

Delay systems enable large distances between the loudspeakers. It is important in this
respect, however, that the backward radiation, i.e. the radiation in the direction contrary
to that of the delay used, is suppressed, since otherwise the risk of echo formation owing to
the travel time plus additional delay is significantly increased. Therefore, loudspeakers used

Figure 6.10 Loudspeaker arrangement for decentralized coverage of a large square. (a) Loudspeakers
with bidirectional characteristics; (b) loudspeakers with cardioid characteristics.


under these conditions are exclusively of unilateral characteristics (e.g., modern digitally
steered line arrays or classic horn loudspeakers).

6.3.1.4.2 TRAFFIC SYSTEMS

On platforms, under protective roofs, and to a certain extent in large halls, room-​
acoustical conditions prevail which lie somewhere between those of indoor rooms and
open spaces. Owing to the magnitude of the resulting equivalent sound absorption areas
one may generally expect free-​field propagation conditions. On the other hand, it is neces-
sary to diminish the perceived reverberation, which may be excessive, especially with
large halls.
The sound coverage of platforms is one of the most frequent and also most complicated
sound reinforcement tasks for transportation hubs. This holds specifically true if the
platforms are covered by a large, mostly closed dome. Under these conditions the sound
reinforcement system has to meet the following requirements:

• realization of a largely uniform sound level and a consistent timbre along the total
length (150 to 350 m) and width (7.5 to 15 m) of the platform
• minimizing crosstalk to adjacent platforms (distance 8 to 20 m)
• adaptation of loudness to the constantly varying environmental noise levels (65 to
90 dB(A))
• high intelligibility according to the applicable standards

These requirements can in most cases only be realized by means of a decentralized


loudspeaker arrangement. Mostly very directive loudspeakers with a minimum trans-
mission range of 200 Hz to 4 kHz are used. Restricting the transmission range to
the speech range enables better optimization of the radiation characteristics of the
loudspeakers used.
In the past compression driver horn loudspeakers have proved to be optimally suited.
They are installed at a height of 3 to 5 m and at intervals of 7 to 15 m either in the centre
of the platform or, if the platform carries many superstructures, in two rows offset from
each other at the edges of the platform in such a way that they all radiate in the same
direction and that the main radiating direction is aimed at the platform floor just below
the next loudspeaker. With arrangement at the platform edge, the main radiating direc-
tion is to be swivelled by 5° to 10° towards the centre of the platform. The avoidance of
reflections is important for such loudspeaker line-​ups, especially from the end of the line-​
up. Loudspeakers arranged in lighting bands at about 3 m above the platform are directed
straight downwards.
Another possibility is the arrangement of steered line array columns at about 12–​15 m
distance from each other; see in Figure 6.11 the installation of numerous columns on the
platform of a station.
Delay devices are to be used in such a case if the loudspeakers are arranged in intervals of
more than 15 m, but a separation of 25 m should not be exceeded in this case. For economic
reasons, one uses mostly a symmetric layout in which the delay is zero in the centre of the
long-​stretched platform, whereas it reaches its maximum at both ends. But the zero delay
may also start close to a concourse and the maximum delay values are then at the other end
of the platforms.
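The symmetric layout described above (zero delay in the centre of the platform, maximum at both ends) can be sketched as follows; the platform length, loudspeaker spacing and reference point are illustrative assumptions, not values from the text:

```python
def platform_delays_ms(positions_m, reference_m, c_m_per_s=344.0):
    """Delay each loudspeaker by the travel time from the zero-delay
    reference point to its position, per eq. (6.8)."""
    return [abs(p - reference_m) / (c_m_per_s / 1000.0) for p in positions_m]

# 200 m platform, loudspeakers every 20 m, zero delay in the centre (100 m):
positions = list(range(0, 201, 20))
delays = platform_delays_ms(positions, reference_m=100.0)
print(round(max(delays), 1))  # largest delay at both platform ends
```

For the alternative layout, the reference point would simply be placed at the concourse end.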

Figure 6.11 Installation of passive directed sound columns on the platform of the Munich main
station. © Duran Audio.

An important problem which is difficult to handle is adjusting the optimum loudness.


Excessive loudness may produce disturbances on the neighbouring platforms, mask
announcements given there and irritate the passengers. Insufficient loudness reduces intel-
ligibility in the area to be covered. The noise level is often variable. Measurements taken in
a railway station hall revealed the following average values:

• empty platform: 65 dB(A)


• fully occupied platform (no arrival or departure of a train): 75 dB(A)
• arrival or departure of a train: 88 to 90 dB(A)

On a neighbouring platform (8 m distance), a crosstalk attenuation of 4–​5 dB may be


achieved. This was increased to 8 dB by the shading effect of a train standing on the track
between the two platforms. The loudspeakers were mounted at a height of 5.5 m above the
platform and with an in-​line spacing of 15 m.
These values indicate that a noise-​dependent volume regulation would be appropriate. This
can be realized by using automatic volume control (AVC), e.g., by a digital signal
processor (DSP) with an ambient noise compensation algorithm, where a microphone is used
to detect the noise at a representative spot, e.g. in the central area of the platform, and the
amplifier sound level is steered by the algorithm.
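Such a noise-dependent volume control can be sketched as a simple mapping from the measured ambient noise to the announcement level; all constants below are illustrative assumptions, not values from the text:

```python
def program_level_db(noise_db: float, base_db: float = 70.0,
                     noise_ref_db: float = 65.0, slope_db_per_db: float = 1.0,
                     max_db: float = 95.0) -> float:
    """Raise the announcement level above a base value as the measured
    ambient noise exceeds a reference level, clamped to a system maximum."""
    boost = max(0.0, slope_db_per_db * (noise_db - noise_ref_db))
    return min(base_db + boost, max_db)

print(program_level_db(65.0))  # empty platform: base level
print(program_level_db(90.0))  # arriving train: clamped at the maximum
```

Real DSP implementations additionally smooth the noise measurement over time and gate it during announcements so the system does not regulate on its own output.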
Another problem consists in obtaining an appropriate microphone levelling for acquiring
the speech signals since very differing sound pressure levels must be expected at the micro-
phone capsule. To avoid overmodulation of the amplification channel as well as insufficient
loudness levels, an automatic gain control (AGC) microphone amplifier is recommended


for the platform paging stations, all the more so because these stations are mostly operated
by poorly trained personnel and often in stress situations. Some railway administrations use
prefabricated records for routine announcements which may be recalled as required.
In addition to the usual platforms running along the tracks, terminal or single-​end stations
also have concourses which are mostly broader than 20 to 30 m. These border on one side to
the comb-​like adjoining longitudinal platforms, while on the other side they are limited by
the reflecting walls of the main station building. Coverage of these platforms should there-
fore preferably be achieved by digitally steered line arrays mounted laterally with a spacing
of 20 to 30 m to the walls of the adjoining buildings. Care has to be taken, however, that no
travel time interferences with the loudspeakers of the adjoining platforms occur.
Another solution is the use of highly directed loudspeaker systems, which work
based on wave-​field principles. In section 3.3.5.3 the mechanisms are explained; com-
pare Figure 3.33. Figure 8.1 shows the use of such radiators on a concourse in a main
station. Only one radiator block is needed to cover a 200 m long platform with the required
sound level; compare Figure 6.12.
Coverage of reverberant halls. Such halls as well as passages in stations and airport buildings
are covered similarly to foyers and depending on the ceiling height by a ceiling loudspeaker
installation or by laterally arranged column loudspeakers (see section 6.2.1.1).
The naves in churches may be covered with decentralized loudspeakers attached to the
existing columns and operate without any delay, an arrangement also satisfying aesthetic
considerations (Figure 6.13).

Figure 6.12 Radiator block to cover a platform in the main station in Frankfurt/​Main with sound.
© Holoplot.

Figure 6.13 Decentralized coverage of a church nave by means of sound columns for speech.

A centrally arranged array system will provide good sound intelligibility as no other
loudspeakers are installed which would produce reverberant sound in the listener areas
covered by the main system. In this case two or more loudspeakers in a staggered arrangement may
provide worse intelligibility values in comparison to one single array. Of course, the main
system may be substituted by two or three arrays in front of the audience.
On the other hand, an even closer path to the listeners in reverberant spaces can be
achieved by means of an individual radiation by loudspeakers integrated in the seat backs or
tables directly in front of the listeners. The loudspeakers are aimed directly at the listeners,
who largely absorb the sound.

6.3.2 Simple Sound Systems


These systems are characterized by the fact that the microphone and loudspeaker are in the
same room (or both are outdoors). These systems serve to amplify original sound sources on
action areas.

• Any positive feedback of the signal must be prevented.


• The listeners have to be oriented to the original source; in the ideal case it also has to
be acoustically localized. This makes the listener concentrate on this source and has a
positive effect on intelligibility.
• The frequency response of the amplified signal has to be adapted as far as possible to
that of the original source (unless other, more important considerations, like intelligi-
bility and absence of positive feedback, speak against it).

Many sound reinforcement systems fall into this category. A system must consist of at least
a microphone, an amplifier and a loudspeaker.


6.3.2.1 Determination of the Acoustic Gain of a Simple Sound System

6.3.2.1.1 PRELIMINARY REMARKS

The following requirements are to be met:

• obtaining the required sound levels


• covering the frequency response corresponding to the utilization profile of the system
and the expectations of the listener
• unconditional absence of positive acoustic feedback and its consequences (timbre
changes, reverberant impression, disturbing noises)

For illustration, a very simple sound reinforcement system consisting of a microphone, an


amplifier and a loudspeaker is shown in Figure 6.14. Here positive feedback may occur. The
direct sound from the original source (along path rSH) arrives at the listener’s location H,
as does the direct sound from the loudspeaker (along path rLH). The original sound arrives
at the microphone along path rSM.
If effective sound reinforcement is to take place, the sound of the loudspeaker arriving
at the listener must be louder than the original sound. In spite of being directional, a loud-
speaker nevertheless also radiates into the entire room, so the amplified sound radiated by
the loudspeaker may quickly return via the path rLM to the microphone, thus giving rise to
positive feedback.
A quantitative assessment of the interrelations between achievable sound reinforcement
and initial positive feedback is given in the next section. It turns out that the sound
energy density and the sound level derived from it are especially suited for deriving these
correlations.

Figure 6.14 Simple sound reinforcement channel position of source S, microphone M, loudspeaker L


and listener H as well as associated angles.


6.3.2.1.2 CALCULATION OF THE ACOUSTIC GAIN

Based on Figure 6.14 and by using the mathematics in Appendix 6.1 the maximum sound
gain vL is obtained as:

vL = R(X) ⋅ (qSM ⋅ qLH) / (qLM ⋅ qSH). (6.9)

with transmission factors:

• qSM between source S and microphone M


• qLH between loudspeaker L and listener H
• qLM between loudspeaker L and microphone M
• qSH between source S and listener H

The corresponding directivity and beaming properties are summarized in the directivity
factors: see Appendix 6.1.
In level notation the acoustic gain index is obtained

VE = 10 lg vL dB

or

VE = LR + LSM + LLH − LLM − LSH (6.10)

where LR is the feedback index and Lxy = 10 log qxy is the transmission measure between the
quantities X and Y.
In a room it is mostly possible to disregard the additional level attenuations LSH, LLM
(i.e., here Li = 0 dB). Since with a centralized sound reinforcement the distances listener-​
source rSH and loudspeaker-​microphone rLM are larger than the critical distance Dc = √Q ⋅ rH
prevailing in the room, (6.10) is simplified to

VE = LR + LSM + LLH. (6.11)

For the feedback index LR the values to be expected are between −6 and −15 dB, depending
on the degree of equalization of the sound reinforcement system (recommended LR = −9 dB).
The values for the sound transmission measures LSM and LLH can be gathered from
Figure 6.15, which shows the dependencies of the sound transmission measures LXY in gen-
eral terms as a function of the ratio rH/​rXY. Parameters are the respective directivity factors
or the coupling factors Q(ϑ).
Further examples are compiled in Table 6.1, in which the directivity factor of the source
is QS =​1 (which means a source with omnidirectional characteristics).
Since in practice it is often possible to approximate the distances rSH and rLH and since
the microphone is directed towards the source and the loudspeaker towards the listener,
(6.11) is simplified after some conversions so as to produce an acoustic gain index of

VE ≈ LR + 10 lg Q(ϑ) dB + 20 lg (rLM/rSM) dB (6.12)



Figure 6.15 Sound transmission index LXY as a function of the distance ratio rH/​rXY.
Parameter: directivity factor Q(ϑ) or coupling factor Q(ϑ,ϕ).

Table 6.1 Examples of achieved sound reinforcement

Case  QL  QM  rSM    rH/​rSM  rLH   rH/​rLH  VE (dB)
1     5   1   3 m    2       12 m  0.5     ≈0
2     5   1   0.5 m  12      12 m  0.5     7
3     5   3   3 m    2       12 m  0.5     ≈0
4     5   3   0.5 m  12      12 m  0.5     12
5     5   3   10 cm  60      12 m  0.5     26
6     10  3   10 cm  60      24 m  0.25    23
7     10  5   5 cm   120     24 m  0.25    31
8     10  5   5 cm   120     6 m   1       43

R(X) = 3 × 10⁻²; QS = 1; rH = 6 m.

The coupling factor is herewith


Q(ϑ) = 1 / [Γ²L(ϑL) ⋅ Γ²M(ϑM)]. (6.13)

For transducers with omnidirectional characteristics Q(ϑ) = 1; for directional transducers
the coupling factor can be Q(ϑ) = 50 to 150.

6.3.2.1.3 CONCLUSIONS

For rough calculations of the achievable sound reinforcement in rooms and in the open it is
sufficient to consider only one reinforcement channel: the one with the loop amplification


nearest to the feedback threshold. Below the feedback threshold of the critical channel,
sound level values do not normally get higher, not even with n reinforcement channels.
Sound level distribution will mostly be better than with only one channel or with a
centralized loudspeaker arrangement in the middle of the room or in the middle of an open-​
air auditorium.
Hence, the procedure for determining the sound level is:

1. determination of the distance relations to be taken into account rH/​rSM, rH/​rLH or rLM/​rSM,
etc. (the indexes mean: S source, M microphone, L loudspeaker, H listener)
2. determination of the actual directivity factor or the coupling factor Q(ϑ)
3. reading of the actual sound transmission measure LXY from Figure 6.15
4. application of (6.11), (6.12) or (6.13)

A more exact calculation of the sound level values can be realized only by means of a
computer simulation program that renounces approximations and takes into consideration
the exact interactions existing between the different operating quantities. For practical
purposes, however, the above algorithm will suffice.
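The rough estimate of equation (6.12) can be sketched directly; the example values for Q(ϑ), rLM and rSM below are illustrative and are not taken from Table 6.1:

```python
import math

def gain_index_db(l_r_db: float, q_coupling: float,
                  r_lm_m: float, r_sm_m: float) -> float:
    """Acoustic gain index per eq. (6.12):
    VE = LR + 10 lg Q(theta) + 20 lg (rLM / rSM)."""
    return (l_r_db + 10.0 * math.log10(q_coupling)
            + 20.0 * math.log10(r_lm_m / r_sm_m))

# Recommended feedback index LR = -9 dB, a directional pairing with
# Q = 50, loudspeaker-microphone distance 8 m, source-microphone 0.5 m:
print(round(gain_index_db(-9.0, 50.0, 8.0, 0.5), 1))  # ~32 dB
```

As in Table 6.1, close miking (small rSM) and directional transducers (large Q) dominate the achievable gain.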

6.3.2.2 Simple Centralized Reinforcement System


The characterizing feature of a centralized coverage is the concentrated arrangement of
the loudspeakers near the action area where the original (primary) source to be amplified
is located. The loudspeakers (secondary sources) are combined in clusters, or line arrays are
used. This arrangement has the following advantages:

• extensive coherence of the wave fronts of the loudspeaker sound (of the secondary
sources) and generally also of the sound emitted by the original sources
• the acoustic orientation of the listeners is largely directed towards the original source (a
mislocalization towards the top may occur near the action area)
• no delayed sound stemming from secondary sources can cause travel time interferences
in the action area

This loudspeaker arrangement offers an enlargement of the critical distance, owing to the
directional effect according to (2.5a):

Dc = √QL(ϑ) ⋅ rH

As an approximation, the directional factor ΓL(ϑ) in (2.5a) may be set to 1, since the
loudspeaker arrangement is directed towards the audience area. The reverberation radius rH
of one loudspeaker may thus be enlarged by the square root of the directivity factor.
This arrangement enables a relatively large critical distance and thus a large direct-​
sound-​determined (reverberation-​free) area to develop, which results in high speech intel-
ligibility in medium-​sized acoustically difficult rooms.
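The enlargement of the critical distance by the directivity factor per (2.5a) can be computed directly; the Q and rH values below are illustrative:

```python
import math

def critical_distance_m(q_directivity: float, r_h_m: float) -> float:
    """Critical distance per (2.5a): Dc = sqrt(Q) * rH."""
    return math.sqrt(q_directivity) * r_h_m

print(critical_distance_m(16.0, 4.0))  # Q = 16 enlarges a 4 m radius to 16 m
```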
If the action area is relatively small, e.g. a platform of less than 15 m × 15 m, a speaker’s
desk or a boxing ring, it is possible to ensure acoustic localization without delay equipment
by an appropriate arrangement of the loudspeakers. A precondition in this regard is that
the precedence effect is considered, i.e. that the direct sound from the original source must
reach the listener before the amplified signal of the central loudspeaker array and that the

Figure 6.16 Use of built-​in arrays for source localization.

level of the original direct sound must not be more than 6 to 10 dB lower than that of the
amplified signal. The level of the original source often does not suffice, therefore support
loudspeakers for boosting the original sound are required in the range of the source. A typ-
ical example is a loudspeaker built into the speaker’s desk; see Figure 6.16. Additionally, a
delay of more than 30 to 50 ms has to be avoided between the first wave front arriving at
the listener from the original source or its support loudspeaker and the wave fronts of the
amplifying sources. Figure 2.21 illustrates the time and level conditions which have to be
considered in this case.
For succeeding without delay equipment, the following additional requirements are
applicable:

• The distance loudspeaker–​listener rLH should be greater than the distance original source–​
listener rSH (Figure 6.17). This condition must be fulfilled for the greatest possible distance
between source and listener.
• A sufficient loudness of the original source has to be ensured (if needed by support
loudspeakers).
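The precedence conditions above can be collected into a simple check; the 50 ms and 10 dB limits follow the ranges named in the text, while the function itself and the example geometry are our own illustration:

```python
def localizes_to_source(r_sh_m: float, r_lh_m: float,
                        level_diff_db: float, c_m_per_s: float = 344.0) -> bool:
    """True if the original source should be localized: its wave front
    arrives first (rLH > rSH), no more than ~50 ms earlier, and the
    amplified signal is at most ~10 dB louder (level_diff_db)."""
    dt_ms = (r_lh_m - r_sh_m) / c_m_per_s * 1000.0
    return 0.0 < dt_ms <= 50.0 and level_diff_db <= 10.0

print(localizes_to_source(10.0, 14.0, 8.0))   # True: 11.6 ms, 8 dB
print(localizes_to_source(10.0, 30.0, 8.0))   # False: 58 ms exceeds 50 ms
```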

Apart from enabling a broad coincidence of the visual and acoustical source localization,
the centralized loudspeaker arrangement offers the advantage that unintentional enhance-
ment of spaciousness does not occur, as only slight travel-time differences exist for the
listener.
This condition becomes evident in Figure 6.18. One sees that with a large time diffe-
rence of the sound arriving at the lateral front seats (H1) the effective critical distance DC1
of the radiators becomes reduced by the parallel arrangement, since loudspeaker L2 increases
the reverberant component in the coverage area of the other loudspeaker L1. For a listener’s
location H2 in the central or rear area of the hall this is not the case, since the travel time
from the two loudspeakers L1 and L2 is almost equal.

Figure 6.17 Geometric relations in the case of centralized coverage without delay equipment.

Figure 6.18 Sound-​field relations with different loudspeaker arrangements. L1 and L2, loudspeakers
at the stage portal (left and right) with the critical distances Dc1 and Dc2; L3, supporting
loudspeaker above the balcony with critical distance Dc3.

The critical distance depends not only on the room-​acoustical and technical loudspeaker
data, but also on the arrangement of the loudspeakers. A delayed loudspeaker L3 in the
rear area of the hall, for instance, always has a clarity-​reducing effect for the front seats. Its
energy therefore must be radiated directly onto the target audience area so that it becomes
largely absorbed. At a rear seat H3, however, the front loudspeakers may also contribute
to enhancing the intelligibility of the signal, if the signal is fed to the local loudspeaker
with such a delay that it arrives at the listener slightly later than the signals from the front
loudspeakers.
This problem does not occur with installed L-​C-​R line arrays. Only the travel-time
differences original source–​arrays (small distance) and arrays–​listener (small to large
distance) as well as the uniformity of sound level distribution must be observed. All these
conditions may be verified in advance with modern simulation programs; compare
Figure 6.19.

Figure 6.19 L-​C-​R arrays in a larger conference hall (EASE simulation).

6.3.3 Multi-​Channel Systems with and without Delay Units


This refers to systems which present themselves as decentralized loudspeaker systems
without delay, but additionally may use delay systems for balconies or other areas. Strictly
speaking, two or more loudspeaker arrays cannot be considered as a centralized system. If,
however, these elements are all arranged in one plane (e.g. in the stage portal area on the
left and right and above the portal), we may still speak of a centralized system without delay
units, if an extensive coherence of the wave front of the radiated sound is given. In practice
it is often required to use additional delayed loudspeakers for supplying shadowed listeners’
locations, e.g. on or under balconies. This causes not just localization problems, but also
echo phenomena, if the travel-time difference between the signals from various delayed
loudspeakers and from the remote main loudspeakers is greater than 50 ms. Mislocalization
and reduction of definition may occur. Figure 6.20 illustrates these problems. In Figure 6.20a
the level of a ‘supporting’ loudspeaker L2 prevails at the listener’s seat.
This implies two problems.

(a) Although the source is located near a centralized loudspeaker, it is not the signal from
the source, but an amplified sound signal coming from the infill loudspeaker which the
listener perceives as the originating point (cause: level L2 > L1 and distance l2 < l1) (pre-
cedence effect; see section 2.3.4).
(b) With increasing distance between main and infill loudspeakers, definition decreases
and with distances above 17 m the risk increases that the signal arriving from the
loudspeakers is perceived separately by the ear and is considered as echo.

While case (a) concerns rather the sound impression and the incident direction
(mislocalization and thus confusion are the consequence of diverging acoustical and visual
impressions), case (b) has to be avoided, since the echoes lead to poor speech intelligibility

Figure 6.20 Use of supporting loudspeaker for coverage of a listener’s seat. L1, central loudspeaker;
L2, loudspeaker near the listener. (a) Increased level at the listener’s location because of
close supporting loudspeaker. (b) Echo elimination by travel time compensation: ∆t =​(l1
− l2)/​c. (c) Acoustical localization with a delay slightly longer than the transmission
path: ∆t =​(l1 − l2)/​c +​15 ms.

and non-​consistent sound images. It is required to operate the infill loudspeakers with
time delay to avoid this. The delay times required for travel-time compensation may be
calculated by using eq. (6.8) or Table 5.3.
If just the acoustical travel-time difference source–​listener and infill loudspeaker–​listener is
compensated, a phantom source is located somewhere between the two loudspeakers when
the levels and spectra of both arriving signals are approximately equal.
By introducing a further delay of 15 to 20 ms, localization clearly jumps over to loud-
speaker L1, provided the level of L2 is not more than 6 to 10 dB higher than that of L1
(Figure 6.20c).
The echo elimination is obtained in both cases, i.e., either by the travel path compensation
(Figure 6.20b) or by an additional delay for ensuring localization (Figure 6.20c).
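The two delay settings of Figures 6.20b and 6.20c can be computed from the path lengths; the distances l1 and l2 below are illustrative:

```python
def infill_delay_ms(l1_m: float, l2_m: float, c_m_per_s: float = 344.0,
                    localization_offset_ms: float = 15.0):
    """Delays for the infill loudspeaker L2 (Figure 6.20): pure travel-time
    compensation (b), and compensation plus ~15 ms so that the main
    loudspeaker L1 is localized (c)."""
    compensation = (l1_m - l2_m) / c_m_per_s * 1000.0
    return compensation, compensation + localization_offset_ms

comp_ms, localize_ms = infill_delay_ms(l1_m=40.0, l2_m=5.0)
print(round(comp_ms, 1), round(localize_ms, 1))  # ~101.7 ms and ~116.7 ms
```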
Currently, most multi-​channel systems prefer to use delay units for smooth coverage of
the audience areas.
Decentralized systems are characterized by the use of a sometimes large number of individual
loudspeakers at close range to the listeners, so as to achieve high speech intelligibility,
eliminating the reverberant sound through direct irradiation. A consistent product of this
decentralization is the loudspeaker built into the back of the seat. Such systems in which a loud-
speaker is assigned to every listener at close range have proved their outstanding efficiency
for congresses with large numbers of attendees. Although the front seat is mostly located
in the direction of the platform so that the visual contact to the speaker is not impaired,
the use of delay equipment is sometimes advisable, e.g. to avoid echo disturbances, if such a
system is to be operated together with a portal system.
With a decentralized arrangement of the loudspeakers, it is generally not possible for the
‘internal travelling time’ of the system, i.e. the travel-time difference between the nearest
and the furthest audible loudspeaker, to be kept below 50 ms. Therefore, either the infill


loudspeakers at the listeners’ seats have to produce so high a level that the sound from the
remote loudspeakers becomes negligibly small (so as to be drowned in the ambient sound)
or the remote loudspeakers, provided with a high backward damping, have to be operated
with delay units so as not to cause any disturbances in the range of the loudspeakers near
the source.
With special decentralized arrangements, like backrest loudspeakers or coverage of single-​
storey rooms by means of ceiling loudspeakers, one always reckons that the loudspeakers
near the listener are critical for the definition and that the remote ones are masked. Backrest
loudspeakers increase the diffuse ambient sound component significantly. A loudspeaker
radiating towards the listener is more favourable than one radiating upwards or towards
the front.

6.3.4 Improving the Naturalness of Sound Reproduction


Unless intended for producing special effects, a good sound reinforcement system should be
designed in such a way that its effect does not contradict the familiar acoustical perceptions.
This requires the following conditions:

• The reproduction loudness should be adapted to the environment (the noise level).
• No unusual non-​linear distortions are perceptible.
• No additional disturbing noises (noise, cracking, etc.) are produced.
• No feedback phenomena occur.
• The occurring linear distortions should correspond to the standard room-​acoustical
conditions.
• The visual and acoustical localization of the real or imaginary sources (primary sources)
should coincide.

Except for the last requirement, all problems can be solved by technical measures. Apart
from technical factors, psychoacoustical properties are of primary importance for localiza-
tion. The fundamentals hereof were dealt with earlier in Chapter 2, the application of the
precedence effect (see section 2.3.4) is discussed here.
Figure 6.21 explains the fundamental effect of time delay. Without time delay one
localizes a phantom source in the centre between the two loudspeakers. If the signal is
delayed in one channel, the loudspeaker of the non-​delayed channel is localized, and only
if the level of the delayed signal is increased by more than 6 to 10 dB (see section 2.3.4,
Figure 2.21) does the localization point (the phantom source) return to the centre between
the loudspeakers and finally move over to the delayed loudspeaker [3]‌.
For finding the required delays, eq. (6.8) can be used. Starting from one loudspeaker or
original source providing the reference level, the times and levels required by the individual
loudspeakers for ensuring localization of the source or the reference loudspeaker are to be
chosen with reference to the source location and according to the travel time required in
accordance with distance from the listener.
This is illustrated by Figure 6.22. The voice of a talker at the desk is radiated by a
loudspeaker integrated in the desk. The sound level of this loudspeaker does not have
to prevail over the sound reinforcement system of the hall, but only to supply the localiza-
tion reference. The loudspeakers integrated for instance into the seats and used as the
main sound reinforcement system are supplied via delay devices according to their distance
from the speaker’s desk and produce a sufficient sound level thanks to their proximity to

Figure 6.21 Explanation of localization and phantom source formation as a function of the time delay
of one partial signal.

Figure 6.22 Acoustical localization of a sound source (in the speaker’s desk) by means of a delayed
sound system (schematic).

the listeners. Acoustically localized, however, is the speaker’s desk, since its signal arrives
earlier, though lower in level at the listener’s location. In the rearmost rows it is possible
for the less delayed front loudspeakers to serve as a localization reference, so that also
here localization is towards the front, although the desk loudspeaker is perhaps no longer
audible.


6.3.4.1 Arrangement of the Loudspeakers
The acoustical localization of the reference sources (which may be not only original sources
like talkers, singers, orchestras, but also reference or support loudspeakers in the case of
reproduction of playbacks or weak-​sounding sources) depends decisively on the location of
the loudspeakers relative to the reference source and listener.
For small action zones, a centralized arrangement of the loudspeakers near this area may
be considered as a satisfactory solution. Because of the expected level attenuation versus
distance, this centralized arrangement can be used only in small halls or outdoor auditor-
iums. Although depending largely on the shape of the room and the arrangement of the
audience in front of the stage, the maximum hall volume for this kind of arrangement
should be in the region of 8000 m³.
Acoustical localization on a specific area of the action zone is often feasible for smaller
halls (e.g. theatres) by a multi-​channel intensity sound reinforcement system, which enables
localization in halls with a width of up to 15 m.
In larger rooms a sufficient sound level distribution and definition of reproduction can
be achieved by means of a centrally supported loudspeaker arrangement appropriately
distributed in both width and depth. This also holds true for installations used for pop and
rock music, although in this genre fully centralized sound reinforcement systems are commonly used. Since large-surface and thus heavily directive arrays are used here, excessively high sound levels occur in the near field, capable of causing permanent auditory damage among performers and listeners, while at greater distances the levels and the spectral composition of the signals no longer correspond to the desired values.
If a centrally supported or even decentralized loudspeaker arrangement is used and an
acoustical source localization is required, however, it is necessary for the loudspeakers, unless
they are in immediate proximity to the source, to be operated with delay. In the following
we are going to compile the particularities of sound reinforcement systems ensuring acous-
tical localization with or without delay equipment.

6.3.4.2 Single-​Channel Sound Delay


If the action area containing the sound sources is not wider than 12 to 15 m and not
deeper than 10 m, travel-time compensation can be adequately achieved by means of
a single-​channel delay system to which the staggered arranged loudspeaker arrays are
connected with different delays. If the powerful main loudspeaker arrays are at a certain
distance from the audience, e.g. above a stage portal, localization of the sources can be
accomplished under observance of the necessary level conditions without pre-​delay of the
main loudspeakers.
With larger action areas this is no longer possible. To enable acoustical localization
under these conditions, a pre-​delay of the main loudspeakers is required. The pre-​delay has
to be adjusted in such a way that the first wave front of the signals stemming from the ori-
ginal source (or its support loudspeaker) located on the stage arrives before those from the
loudspeakers.
With wider action areas, single-​channel sound reinforcement systems may give rise to
problems, since the travel time of the sound from one side of the stage to the other cannot
be compensated by a basic delay. This delay time would have to be so long that a noticeable
pause between the original signal and the amplified signal would occur. This would result in
an echo-​like double hearing.


6.3.4.3 Multi-​Channel Procedures with Delay Systems
To avoid the drawback of poor sound-​level balance increasing along with the stage width
and to provide the possibility of acoustical source localization on central parts of the stage
area, multi-​channel delay systems were developed more than 40 years ago. The action areas
are therefore subdivided and a delay channel is assigned to each of these ‘source areas’. Via
these delay channels the portal loudspeakers as well as the loudspeakers arranged in the
depth of the room are supplied with sound signals. At that time this particular method
to distribute sound signals was introduced under the name ‘Delta Stereophony System
(DSS)’ [4]‌.
With this procedure the stage area is subdivided into several source zones (Figure 6.23)
and a delay channel is assigned to each of them. The signals originating from the source
zones are fed via the corresponding delay channels to all the loudspeakers in the room
and some of the loudspeakers arranged on the stage. The delay times are appropriately set
considering the travel time of the sound between the individual loudspeaker arrays and
listener seats, so that the first wave front of each signal emitted by the reference source originates
from the assigned source zone. The calculation required to this effect has to consider not
only the critical border areas of the source zones, but also those of the audience areas [5]‌.
To enable the simultaneous operation of several source areas, a mixing matrix is
interconnected upstream of the power amplifiers assigned to the loudspeakers.
It has to be ensured that the precedence effect which determines the acoustical localiza-
tion operates throughout the transmitted frequency range. For moving sources over the
participating sub-​stages, a so-​called tracker may be used to blend the source signal from one
delay setting to the adjacent one.

[Figure 6.23 block diagram; labels include: performance area, sound-mixing console, DSS hardware, filters, simulation loudspeakers, near-field loudspeakers, proscenium system, reception-area system, tracking system, remote control/computer control.]

Figure 6.23 Schematic layout of a delta-stereophonic sound reinforcement system. (a) Working principle:
ϕ  angular deviation between optical and acoustical direction of perception without DSS
Q  original or simulation source
t0  acoustic travel time of the original sound to the listener
tn  acoustic travel time of the loudspeaker sound to the listener
Δtn  electrical delay time for the respective loudspeakers
H  listener
(b) Equipment structure of the DSS.


Under the name ‘source-​oriented reinforcement (SOR)’ a new vocal localization tech-
nique and automation system has been developed [6]‌. This system provides advanced
solutions in audio show control and sound effects management, amplified opera, theatre
sound automation, corporate event sound engineering, live surround sound, system design
consultancy and production support. A sophisticated tracker technique is used to localize
the talker or singer on stage.
Multiple 2D or 3D tracking zones are shown on the map as rectangular shapes, which can
be combined so that any complex-shaped zone on stage is covered. This is useful for extending
tracking zones onto thrusts and ramps, or up staircases and stage lifts.
Clear coloured icons show the location and movement of performers in 3D space; in the
photos the positions of the trackers are shown (circles); see Figure 6.24.
Small loudspeakers are sometimes located hidden in the sub-​stage areas to support
the level of the original source in this area. The spatial localization of moving and fixed
signal sources is realized by tracking devices with the corresponding software; compare
Figure 6.25.
These tracker-based delay-setting and localization infrastructures are used world-wide.
Based on the tracked location, the level and delay settings of all participating loudspeakers
or arrays may be calculated by computers instantaneously.
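The blending of a moving source between adjacent zones, handled by the tracker, can be sketched as a crossfade between the two zone delay channels. The equal-power law and the one-dimensional zone model below are illustrative choices, not taken from the text:

```python
import math

def zone_crossfade_gains(x, zone_a_x, zone_b_x):
    """Gains for the delay channels of two adjacent source zones while a
    tracked source moves from the centre of zone A to the centre of zone B.
    An equal-power law keeps the summed energy roughly constant."""
    t = min(1.0, max(0.0, (x - zone_a_x) / (zone_b_x - zone_a_x)))
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)

# Source exactly between two zone centres at x = 0 m and x = 6 m:
g_a, g_b = zone_crossfade_gains(3.0, 0.0, 6.0)
print(round(g_a, 3), round(g_b, 3))  # both ~0.707, i.e. -3 dB each
```

Crossfading the gains of two fixed delay channels, rather than sweeping a single delay value, avoids the audible pitch shift a changing delay would produce.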

Figure 6.24 Source-​oriented reinforcement system. (a) Tracking and delay localization zones for mobile
sound sources. (b) Placement of hidden positions of installed trackers. (c) Visualization of
12 localization zones on a stage.


Figure 6.25 Loudspeakers on stage for source support and tracking procedure with corresponding
software. (a) Stage area with hidden support loudspeakers in a stage design in the Albert
Hall performance. (b) Computer graphic for visualization of the tracking procedure of a
talker or singer on a stage. (c) One tracker placement on stage for a performance in the
Szeged Drone Arena.


Over the last few years the world of ‘immersive sound’ coming from studio and cinema
applications has found its way into the sound reinforcement business. With these systems the
listener gets the impression of being enveloped by sound. Loudspeakers not only
on or above stages but also around the listener areas are used to create a complex spatial
sound impression for the listeners. Tracker devices allow the simulation of moving sound
sources on stage. Figure 6.26 shows such a loudspeaker arrangement for a ‘Sound Scape’
sound reproduction.

6.3.5 Multipurpose Systems


Multipurpose halls place the most difficult demands on the acoustic layout of the hall. The
multipurpose use of a hall is quite often neglected. So, a pure concert hall is often used for
speech reproduction such as conferences or presentations. On the other hand, sometimes a
drama theatre or sports arena serves as a concert hall. In both cases the event suffers from
acoustic deficiencies. Therefore, we should categorize the halls as follows:

1. Pure classical concert halls with the highest acoustic criteria (less than 5% of existing halls)
Speech performances require sophisticated sound systems, and the quality of speech
reproduction remains limited, since concert acoustics have priority; reverberation time
up to 2.2 s, depending on volume.
Figure 6.26 d&b Soundscape 360° system.

2. Multipurpose halls with high acoustic quality suitable for classic concert performances
(around 20% of existing halls)
Classic concerts may be performed in high quality, and speech-​related events like con-
gress sessions or company meetings will happen quite often. Also chamber music events
or jazz performances might happen. Reverberation time should not exceed 1.7 to 1.9 s.
3. Pure multipurpose halls for speech and music performances with good acoustic quality
(around 50% of existing halls)
These are often existing city halls or similar assembly halls for every type of event and
gathering, from speech presentations to classic concerts, the latter with less acoustic success.
Average reverberation time does not exceed 1.6 s.
4. Assembly halls mainly used for speech performances (around 25% of existing halls)
Mainly used for speech, but chamber music performances are possible too. Most halls
of this type handle electroacoustic supported music reproduction well, including rock
and pop concerts. Reverberation time varies between 1.0 and 1.4 s depending on
volume.
5. Hall types 1 and 4 are not really multipurpose halls. In type 1 a sound system is to be
designed for announcements and speech reproduction and in type 4 a sound system
may or may not be required for speech; music performances are supported by sound
systems permanently or temporarily installed.

As 70% of all existing larger halls are either of type 2 or 3, these will be explained in more
detail.
Type 2 halls demand extra efforts in acoustics. Here the room-​acoustic design will
anyway be carefully checked by computer simulation and for larger halls even by scale-​
model measurements. The sound systems must be designed carefully as well, so the same
computer model should be used to verify the sound system design for good coverage and
high speech intelligibility.
The design of Type 3 halls is no less complicated if good acoustic properties are to
be achieved. Any standard solutions must be avoided. In close cooperation between the
architect and the acoustician the room-​acoustic properties of the hall must be designed.
Depending on the size and the shape of the hall computer simulation is recommended, espe-
cially if glass structures or rounded wall parts dominate. Also, a sound system should not be a
‘standard’ one. Depending on seat count, hall geometry and stage structure, different sound
systems may be advisable [7]‌. Quite often the architect wants to hide the loudspeakers; this
is often possible, but the sound system design must ensure that the covering surface in front
of the loudspeaker is acoustically entirely transparent. For rock and pop concerts in these
halls the artists might bring their own preferred equipment. In many of these cases the
sound quality is worse than with a permanently installed sound system tailored to the hall,
so this ‘house’ system is often needed in addition to support the rented system and achieve
good sound quality.

6.3.6 Assistive Listening Systems (ALS)


These systems support listening for people with full or partial hearing loss and are available
in two technological flavours:

• induction loops installed in the floor, used with visible personal induction receivers
• infrared transmission to visible infrared receivers


Modern in-​ear receivers are more or less invisible but a connection to induction loops or
infrared transmitters is not possible.

6.3.6.1 Induction Loops


Induction loops for hearing-​impaired persons have been in use since the 1930s, mainly used
for speech. The simple signal transfer works with induction loops which are installed in a
main floor, as shown in Figure 6.27 [8]‌.
The sound of a talker is picked up by a microphone and the output signal is led to
a constant-current amplifier. This amplifier drives a current through an induction loop, which
surrounds the listener area. This perimeter loop creates a magnetic field. For better field

Figure 6.27 Use of induction loops for compensation of hearing loss in a theatre main floor.


strength several loops should be installed. The magnetic field containing the acoustic signal
is picked up by an induction coil inside the hearing aids of the listeners and is demodulated
as an audio signal; equalization can be applied as required. This audio signal is then fed into
the ear canal.
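As a rough planning sketch, the field strength at the centre of a single rectangular perimeter loop can be estimated from the loop current using the standard static formula for a one-turn rectangular loop. The loop size and drive current below are illustrative, and the 0.4 A/m (400 mA/m) comparison value is the commonly cited IEC 60118-4 reference level, an assumption not stated in the text:

```python
import math

def centre_field_strength(current_a, width_m, depth_m):
    """Magnetic field strength H (A/m) at the centre of a single-turn
    rectangular perimeter loop of sides width x depth carrying the
    given current (static approximation)."""
    a, b = width_m, depth_m
    return (2.0 * current_a / math.pi) * math.sqrt(a ** 2 + b ** 2) / (a * b)

# Rough check of a 10 m x 8 m perimeter loop driven at 5 A:
h = centre_field_strength(5.0, 10.0, 8.0)
print(round(h, 2), "A/m")  # ~0.5 A/m at the centre
```

The field falls off away from the loop centre and over the loop wires themselves, which is why several loops, as noted above, are installed in practice.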
It is understandable that the inductive receiver and demodulation module cannot be
placed in an in-​ear receiver of less than about 10 mm size. Therefore, modern receivers for
induction transfer of audio signals are large (30–​40 mm long) compared to in-​ear hearing
aids and visible even when placed behind the ear pinna.
Induction loop systems are not suitable if:

• there is substantial background noise, which will reduce the effectiveness of any
assistive listening system
• there is no practical way to install the loop cable
• there is no sufficiently good-​quality audio source available
• electrical instruments such as electric guitars or dynamic microphones are used within
the area covered by the loop.

6.3.6.2 Infrared Transmission


A radiator of modulated infrared signals and mobile receivers are used to transmit audio.

Figure 6.28 Infrared field strength coverage simulation on listener areas of a lecture hall with two
SZI1015 radiators in blue.

Figure 6.29 Sennheiser SZI1015 radiator/​modulator and infrared receiver.

In Figure 6.28 a computer model of a theatre is shown, equipped with two SZI1015
radiators (Sennheiser) operating in the broadband range 30 kHz – 6 MHz; see Figure 6.29, left.
The simulation shows a smooth coverage of 25 dB and more at all listener seats in the theatre. If the
influence of noise is high, these strength values are insufficient; stronger infrared radiator
settings may then be selected or the positions of the radiators modified, which can again be
verified by means of computer routines.
Mobile infrared receivers are used to convert the modulated infrared signal into an audio
signal; see Figure 6.29 on the right. Languages may be switched; multi-​language support is
possible.

6.3.6.3 FM Transmission
Tour-​guide receivers are also in use as ALS systems. They are designed for applications such
as guided tours, multi-​language interpretation and command applications, for example in
the fields of sports, and also for assistive listening. As an example, the EK 1039 by Sennheiser
(see Figure 6.30) can be used with corresponding headphones or personally worn induction
slings. The handling and operation of the receiver are very simple and intuitive. Speech is
highly intelligible thanks to an audio bandwidth of 15 kHz.
Reliability:

• 75 MHz switching bandwidth –​the tour-​guide system with the widest range
• Adaptive diversity technology for improved RF reception quality
• Reliable operation with standard AA cells or rechargeable batteries

Figure 6.30 FM receiver EK 1039.

Appendix 6.1

Details of the Acoustic Gain Calculation


The acoustic gain is calculated by eq. (6.9):

$v_L = R(X) \cdot \frac{q_{SM}\, q_{LH}}{q_{LM}\, q_{SH}}$   (6.9)

with transmission factors:

• qSM between source S and microphone M


• qLH between loudspeaker L and listener H
• qLM between loudspeaker L and microphone M
• qSH between source S and listener H


The following transmission factors are used:

(a) between source S and listener H

$q_{SH} = Q_{S}(\vartheta_{SH}) \left( \frac{r_{H}}{r_{SH}} \right)^{2} \cdot 10^{-D_{SH}/10\,\mathrm{dB}} + 1$   (6A1)

(b) between source S and microphone M

$q_{SM} = Q_{SM}(\vartheta_{SM}, \vartheta_{S}) \left( \frac{r_{H}}{r_{SM}} \right)^{2} \cdot 10^{-D_{SM}/10\,\mathrm{dB}} + 1$   (6A2)

(because $D_{SM} \approx 0$ dB, the partial factor can mostly be ignored)

(c) between loudspeaker L and listener H

$q_{LH} = Q_{L}(\vartheta_{H}) \left( \frac{r_{H}}{r_{LH}} \right)^{2} \cdot 10^{-D_{LH}/10\,\mathrm{dB}} + 1$   (6A3)

(d) between loudspeaker L and microphone M

$q_{LM} = Q_{LM}(\vartheta_{L}, \vartheta_{M}) \left( \frac{r_{H}}{r_{LM}} \right)^{2} \cdot 10^{-D_{LM}/10\,\mathrm{dB}} + 1$   (6A4)

(For distances and angles see Figure 6.14.)


The corresponding directivity and beaming properties are summarized in the directivity
factors (see eqs. (3.15) and (2.5a)):

$Q_{S}(\vartheta_{SH}) = Q_{S}\,\Gamma_{S}^{2}(\vartheta_{SH})\,\{Q_{PL}\}$   factor of the source towards the listener
$Q_{L}(\vartheta_{H}) = Q_{L}\,\Gamma_{L}^{2}(\vartheta_{H})\,\{Q_{PL}\}$   factor of the loudspeaker towards the listener
$Q_{L}(\vartheta_{L}) = Q_{L}\,\Gamma_{L}^{2}(\vartheta_{L})\,\{Q_{PL}\}$   factor of the loudspeaker towards the microphone
$Q_{S}(\vartheta_{SM}) = Q_{S}\,\Gamma_{S}^{2}(\vartheta_{SM})$   factor of the source towards the microphone
$Q_{M}(\vartheta_{S}) = Q_{M}\,\Gamma_{M}^{2}(\vartheta_{S})$   factor of the microphone towards the source
$Q_{M}(\vartheta_{M}) = Q_{M}\,\Gamma_{M}^{2}(\vartheta_{M})$   factor of the microphone towards the loudspeaker

as well as in the coupling factors

$Q_{SM}(\vartheta_{SM}, \vartheta_{S}) = Q_{S}(\vartheta_{SM})\, Q_{M}(\vartheta_{S})$   factor between source and microphone
$Q_{LM}(\vartheta_{L}, \vartheta_{M}) = Q_{L}(\vartheta_{L})\, Q_{M}(\vartheta_{M})$   factor between loudspeaker and microphone

If the loudspeaker is mainly aimed at the audience area so as to provide the advantage of
exciting the internal reverberation of the room less, the angle-​dependent directivity factor
of the loudspeaker is increased by the equivalent beaming factor QPL.


In level notation one obtains the acoustic gain index

$V_E = 10 \lg v_L\ \mathrm{dB}$, or

$V_E = L_R + L_{SM} + L_{LH} - L_{LM} - L_{SH}$   (6.10)
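The gain calculation can also be evaluated numerically, as sketched below for eqs. (6A1) to (6A4) and (6.9). All attenuation terms D are assumed to be 0 dB; the reverberation radius, the distances, the directivity factors and the feedback margin R(X) are illustrative values only, not taken from the text:

```python
import math

def q_factor(Q_dir, r_hall, r, D_dB=0.0):
    """Transmission factor q = Q * (r_H / r)^2 * 10^(-D/10 dB) + 1,
    i.e. the direct component relative to the reverberant field
    (normalized to 1); r_hall stands for the reverberation radius r_H."""
    return Q_dir * (r_hall / r) ** 2 * 10.0 ** (-D_dB / 10.0) + 1.0

def acoustic_gain_index(R_X, q_SM, q_LH, q_LM, q_SH):
    """V_E = 10 lg [ R(X) * q_SM * q_LH / (q_LM * q_SH) ] dB, eqs. (6.9)/(6.10)."""
    return 10.0 * math.log10(R_X * q_SM * q_LH / (q_LM * q_SH))

# Illustrative values: reverberation radius 4 m, feedback margin R(X) = 0.5,
# omnidirectional talker (Q = 1), loudspeaker with Q = 10 aimed at the listener:
q_SH = q_factor(1.0, 4.0, 16.0)   # talker to distant listener
q_SM = q_factor(1.0, 4.0, 0.5)    # talker to close microphone
q_LH = q_factor(10.0, 4.0, 8.0)   # loudspeaker to listener
q_LM = q_factor(0.1, 4.0, 6.0)    # loudspeaker rear lobe to microphone
print(round(acoustic_gain_index(0.5, q_SM, q_LH, q_LM, q_SH), 1), "dB")  # ~20 dB
```

The example shows the expected trend: a close microphone (large $q_{SM}$) and a directive loudspeaker aimed away from the microphone (small $q_{LM}$) raise the achievable gain.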

References
1. https://ease.afmg.eu or www.catt.se or www.odeon.dk.
2. Blauert, J. Spatial Hearing. Cambridge, MA: MIT, 1983.
3. Haas, H. On the influence of a single echo on the audibility of speech (in German). Acustica 1 (1951) 2, 49–58.
4. Steinke, G., Ahnert, W., Fels, P., and Hoeg, W. True directional sound system orientated to the original sound and diffuse sound structures – new applications of the Delta Stereophony System (DSS). Presented at the 82nd AES Convention, March 10–13, 1987, London, Preprint No. 2427.
5. DP 394/584: Procedure and arrangement for a locally as well as temporally variable signal distribution over a large sound reinforcement system, especially for audiovisual performances in auditoriums, preferably dome-shaped rooms.
6. www.outboard.co.uk.
7. Adelman-Larsen, N.W. Rock and Pop Venues. Springer, 2014.
8. Induction and Hearing Loop System Design and Amplifiers (ampetronic.com).

7 Speech Intelligibility of Sound Systems


Peter Mapp

Intelligibility is the single most important factor when it comes to designing and operating
sound systems intended for amplification and distribution of speech. Indeed, if such a system
isn’t intelligible, then there is no point in having it. It should be realised that speech intelli-
gibility and sound quality are not the same thing, as it is quite possible to have a fairly poor-​
sounding system that is highly intelligible (consider a telephone for example); conversely,
an expensive Hi-​Fi system, whilst sounding wonderful in a domestic setting, is unlikely to
provide adequate intelligibility in a reverberant swimming pool, railway station concourse
or large reverberant cathedral. Intelligibility, however, is a not a binary condition flipping
between intelligible and unintelligible but instead a gradual change occurs and the level
of intelligibility required for one application may not be suitable for another. For example,
the degree of intelligibility required for passenger information announcements in a railway
station would not need to be as high as that needed when listening to a complex lecture,
theatrical drama or in a law court, where every fragment of speech needs to be easily and
fully heard.
The aim of this chapter is to explore the factors that affect speech intelligibility and
see how these affect a potential design and to then establish how intelligibility can be
measured and quantified. However, these processes are interactive and some knowledge
of how intelligibility is rated is required in order to understand certain design aspects and
implications. Several descriptors are frequently used to try and express what is meant by
intelligibility, such as ‘audibility’, ‘clarity’, ‘articulation’ and ‘understanding’. A particular
piece of speech, however inherently intelligible, may not be understood if spoken in a
language that the listener is not familiar or fluent in.
ISO 9921 [1] defines intelligibility as ‘a measure of the effectiveness of understanding
speech’.

7.1 Factors that Affect the Intelligibility of Sound Systems


Intelligibility is a complex, multidimensional parameter and by its nature is highly non-​linear;
this can make the design process quite exasperating and almost unfathomable. However, an
understanding of the nature of speech, how we perceive it and how buildings and spaces
with their associated acoustic environments can affect intelligibility provides a useful basis
for attaining an understanding. Whilst the discussion below is primarily concerned with
PA and sound systems, it is generic for most audio transmission channels, including video
conferencing or wireless transmission. Figure 7.1 diagrammatically summarises the intelli-
gibility transmission chain.

DOI: 10.4324/9781003220268-7


Primary Factors

• Sound system bandwidth and frequency response


• Absolute sound pressure level (SPL) /​loudness
• Signal-to-noise ratio (SNR)
• Reverberation time (RT)
• Volume, floor area and shape of the space
• Distance between listener and loudspeaker(s)
• Directivity of loudspeaker = Direct to Reverberant Ratio
• Number of loudspeakers simultaneously operating

Secondary Factors

• Uniformity of sound coverage1


• Sound focussing and discrete (late arriving) reflections
• System distortion (including headroom/​signal clipping)
• System equalisation
• Direction of primary sound arrivals (loudspeaker location): in front, above, to the side etc.
• Direction of interfering noise (particularly relative to source of sound)
• Talker type (male/​female/​accented)
• Vocabulary (complexity) and context of speech information
• Talker enunciation (articulation) and rate of delivery
• Talker /​listener first language
• Listener acuity (hearing ability)
• Talker microphone technique
• Signal-to-noise ratio at microphone / microphone noise rejection characteristics
• Distance between talker’s mouth and microphone
• Visual contact between the listener and talker

Figure 7.1 Simplified sound system or audio channel intelligibility chain.




• Signal-​processing effects (e.g. compression, automatic gain control (AGC), limiting,
echo cancellation (AEC), latency)
• Electronic interference of audio signal (hum, noise, signal continuity /​interruption)

As can be seen from the above list, there are a large number of factors that can potentially
affect the perceived intelligibility of a sound reinforcement system or an audio transmission
channel.

7.2 Discussion of Intelligibility Factors and Implications for Successful


System Design

7.2.1 Primary Factors

7.2.1.1 Sound System Bandwidth and Frequency Response


It is important that the bandwidth of the system is wide enough to ensure that the main
speech components can be appropriately transmitted. It should be noted that whilst often
desirable, a wide bandwidth is not always required with regard to intelligibility (consider for
example the nominal telephone bandwidth of 300 Hz –​3400 Hz, which provides accept-
able intelligibility but is of poor sound quality). As will be seen later, the speech spectrum
essentially covers the range from 100 Hz to approximately 8–​10 kHz but some frequencies
are very much more important than others with respect to intelligibility. The bandwidth is
defined by all the components in the chain as these are effectively in series and so the final
bandwidth is a composite of the microphone (or message store), electronic signal chain
(which may be limited if it contains wireless elements, VOIP or some forms of data com-
pression technique) and the frequency response of the loudspeaker. Each element may be
considered to act as a bandpass filter, albeit with some ripple in the passband with the
overall bandwidth being the sum of all the filters.

7.2.1.2 Absolute Sound Pressure Level (SPL) /​Loudness


For optimal intelligibility, the broadcast level (SPL) of an announcement should lie within
the range from approximately 55 to 80 dBA, assuming that an adequate signal-to-noise ratio is
maintained. In practice, such a range may not be realisable due to the level of the ambient
noise but at speech levels above about 80 dBA intelligibility will gradually decrease due
to the non-​linearity of our hearing system. When assessing intelligibility (as discussed in
section 7.3) it is important to take the absolute SPL into account as well as the overall
signal-to-​noise ratio.

7.2.1.3 Signal-to-Noise Ratio (SNR)


For speech to be intelligible, it is essential that the speech components (words, syllables and
phonemes) can be distinguished or extracted from the noise (see Figures 7.5, 7.15 and 7.16).
This can be expressed in terms of the signal-to-​noise ratio (speech to ambient noise ratio).
In general terms a minimum value of 6 dB (A weighted) SNR and preferably 10 dB SNR
should be aimed for –​though in practice it is rather more complex than this as it depends on
both the spectral and temporal characteristics of the noise. However, it should be noted that
a signal-to-​noise ratio of 0 dB (A weighted) can be surprisingly intelligible and adequate


Table 7.1 Reverberation time and sound reinforcement system design

RT        Implication for sound system design
<1 s      Good intelligibility should be achievable
1–1.5 s   Good intelligibility achievable – some care needed for high-quality systems
1.5–2 s   Careful design and directional loudspeakers required
2–2.5 s   Directional loudspeakers needed – general limit for distributed speaker systems unless highly directional and temporally aligned
>2.5 s    Directional speakers required – great care needed
>3 s      Intelligibility will be limited – even with highly directional (passive) loudspeakers. Reasonably good intelligibility should still be attainable with directional and steered line arrays
>4 s*     Restricted intelligibility. Intelligibility is still possible – especially in diffuse spaces and when using short listener–loudspeaker distances. Directional (preferably steered) line arrays essential in order to obtain the best possible result.
>6 s*     Restricted intelligibility. Adequate intelligibility still may be possible – especially in diffuse spaces and when using short listener–loudspeaker distances. Steered directional line arrays essential in order to obtain an optimal result.

* Highly dependent on the volume and diffusivity of the space.

enough to understand basic announcements and messages –​albeit with the need to listen
carefully with slightly increased effort. (This assumes that the received speech is free from
reverberation or other degradations.)
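The level and SNR guidance above can be combined into a simple rule of thumb for setting announcement levels. The function below encodes the 55 to 80 dBA optimum window and the preferred 10 dB SNR target from the text; the clipping logic itself is an illustrative simplification:

```python
def announcement_level(noise_dba, target_snr_db=10.0, floor_dba=55.0, ceiling_dba=80.0):
    """Broadcast level (dBA) that gives the target SNR over the ambient
    noise while staying inside the 55-80 dBA optimum window.
    Returns (level, achieved SNR)."""
    level = max(floor_dba, noise_dba + target_snr_db)
    level = min(level, ceiling_dba)  # above ~80 dBA intelligibility gradually drops
    return level, level - noise_dba

print(announcement_level(45.0))  # quiet foyer    -> (55.0, 10.0)
print(announcement_level(72.0))  # busy concourse -> (80.0, 8.0)
print(announcement_level(78.0))  # SNR shortfall  -> (80.0, 2.0)
```

The last case shows the practical conflict: in very noisy spaces the full SNR target cannot be met without exceeding the level above which our hearing becomes non-linear.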

7.2.1.4 Reverberation Time (RT)+​


Although intelligibility is dependent upon the reverberation time of a space, there is no
direct relationship connecting the two as it is also dependent upon many other factors such
as the volume of the space and the directivity of the loudspeaker. However, what can be
said is that the longer the reverberation time is, the greater the influence it has on intelligi-
bility and generally the more difficult it becomes to achieve good intelligibility. To put this
in perspective, Table 7.1, an empirical table derived by the author, summarises the effect of
RT on sound system design.

7.2.1.5 Volume, Floor Area and Shape of the Space+​


To a certain extent, the volume of the space can help counteract longer reverberation
times, as for a given RT the reverberant signal level (SPL) is reduced the greater the volume
involved.

7.2.1.6 Distance between Listener and Loudspeaker(s)+​


Not only is intelligibility related to the distance between a listener and loudspeaker but
in most cases this follows an inverse square law relationship (i.e. doubling the distance
between the listener and loudspeaker reduces the intelligibility by a factor of four, though
in practice a limiting distance occurs beyond which the intelligibility ceases to decrease further).


7.2.1.7 Directivity of Loudspeaker +​

Increasing the directivity of a loudspeaker (narrowing its coverage) reduces the level of
reverberant excitation within a space as the radiated sound is confined into a smaller area
and nominally away from reflective surfaces such as the ceiling and potentially the walls,
depending on how the loudspeaker system is set up and aimed.

7.2.1.8 Number of Loudspeakers Operating Simultaneously+​


Where more than one loudspeaker is used to cover a space/​listener area, such as in a
distributed sound system, effectively only one or two loudspeakers will actually provide
coverage to a given listener. The remaining loudspeakers will be covering other listeners/​
areas but a part of the sound that they radiate will be heard by all the listeners either dir-
ectly or indirectly as they excite the space. This unintended sound is effectively perceived
by listeners as ‘noise’ and acts to increase the reverberant level or noise floor (i.e. reduce the
direct-to-​reverberant ratio).

The factors highlighted+​ combine to produce the direct-to-​reverberant ratio (DRR), which
ideally needs to have a positive value in order to yield good intelligibility (though it may be as
low as −3 dB to −5 dB). However, it is RT-​dependent and can only be used as a rough guide
as early reflections (useful reflections arriving within approximately 35–​50 ms of the direct
sound) serve to increase the ‘direct’ component and increase the perceived intelligibility. This
concept is illustrated in Figure 7.2 and discussed later in the chapter when considering sound
clarity (C50) measures.
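For orientation, the direct-to-reverberant ratio can be estimated with the classical statistical-acoustics model, which deliberately ignores the beneficial early reflections discussed above. The formulas are standard diffuse-field relations, and the hall size, RT, Q and listener distance are illustrative values, not taken from the text:

```python
import math

def critical_distance(Q, volume_m3, rt_s):
    """Distance at which direct and reverberant levels are equal
    (~0.057 * sqrt(Q * V / RT))."""
    return math.sqrt(Q * 0.161 * volume_m3 / (rt_s * 16.0 * math.pi))

def direct_to_reverberant_db(Q, r, volume_m3, rt_s):
    """DRR = 10 lg [ Q * A / (16 * pi * r^2) ] with the equivalent
    absorption area A = 0.161 * V / RT (Sabine, diffuse field)."""
    A = 0.161 * volume_m3 / rt_s
    return 10.0 * math.log10(Q * A / (16.0 * math.pi * r ** 2))

# 8000 m^3 hall, RT = 1.8 s, loudspeaker with Q = 10, listener at 15 m:
print(round(critical_distance(10, 8000, 1.8), 1), "m")                # ~11.9 m
print(round(direct_to_reverberant_db(10, 15.0, 8000, 1.8), 1), "dB")  # ~-2 dB
```

The example illustrates both earlier points: a larger volume (larger A for the same RT) and a higher loudspeaker Q both push the critical distance outwards and improve the DRR at a given seat.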

Figure 7.2 Subjective effect of delayed reflections and later arriving sounds.



7.2.2 Secondary Factors

7.2.2.1 Uniformity of Sound Coverage


The aim of a sound system is to provide a uniform distribution of sound over the intended listening area. The purpose of this is not only to ensure that the variation in sound level is not noticeably different (i.e. louder in some areas than others) but also to help maintain a consistent direct to reverberant ratio and signal to noise ratio. Typically, uniformity of coverage should be within ±3 dB (i.e. 6 dB total variation). However, in highly reverberant, noisy or critical spaces this variation may need to be lower (e.g. ±2 dB or 4 dB total variation) in order to maintain adequate intelligibility. For a speech signal this variation can be determined as the A-weighted value or by measuring the spatial variation of the 2 kHz octave band using a pink noise test signal. An exception to this rule is where late reflections from the rear wall of a room (unavoidably excited in order to cover the rear rows of listeners) travel back to the front of the room. In this case it may be necessary to increase the sound level at the front to overcome the reverberant backwash.
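Checking a set of measured levels against the ±3 dB guideline is a simple spread calculation; a minimal sketch, with illustrative names, in which the tolerance defaults to the 6 dB total variation noted above:

```python
def coverage_uniformity(spl_readings_db, total_variation_db=6.0):
    """Return (spread_db, within_tolerance): the max-min spread of the
    sampled SPL readings and whether it meets the total-variation target."""
    spread = max(spl_readings_db) - min(spl_readings_db)
    return spread, spread <= total_variation_db
```

For a critical space, the tolerance argument would simply be tightened to 4.0 (±2 dB).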

7.2.2.2 Sound Focussing and Discrete (Late Arriving) Reflections


Curved surfaces, circular rooms and domes in particular can cause sound to focus and arrive
after the direct sound with significant amplitude. If these reflections arrive approximately
50–​60 milliseconds or more after the direct sound, they will generally be heard as echoes
and cause a reduction in intelligibility. Such reflections can be evaluated using the echo
perception curves derived by Haas, Nickson and Muncey, Lochner and Burger or Dietsch
and Kraak [2, 3, 4, 5].
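A first-order geometric check for such a reflection simply compares path lengths: the delay is the path-length difference divided by the speed of sound. The 50 ms threshold below reflects the rough guidance in the text; the precise audibility limit depends on level and programme, as the cited echo-perception curves show (names are illustrative):

```python
SPEED_OF_SOUND_M_S = 343.0

def reflection_delay_ms(direct_path_m, reflected_path_m):
    """Arrival-time difference (ms) between a reflection and the direct sound."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND_M_S * 1000.0

def likely_echo(direct_path_m, reflected_path_m, threshold_ms=50.0):
    """Flag reflections arriving later than ~50 ms after the direct sound."""
    return reflection_delay_ms(direct_path_m, reflected_path_m) > threshold_ms
```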

7.2.2.3 System Distortion (Including Headroom /​Signal Clipping)


Whilst subjectively annoying, distortion is not highly disturbing with respect to intelligibility until it becomes predominant or significant clipping occurs. However, in combination with other degradations, such as poor direct to reverberant ratio and signal to noise ratio, distortion becomes more disturbing, having a cumulative effect. Provided that electronic distortion (THD) is below approximately 1% (which any system should easily meet) and acoustic distortion produced by the loudspeakers is below 5%, then it is unlikely that distortion will have a detrimental effect on intelligibility.

7.2.2.4 System Equalisation


Subjectively, equalisation can have a very significant impact on the perceived intelligibility of a system, though this often does not translate into a measured difference, except when the signal to noise ratio is affected. Indeed, it is quite common practice to adjust the frequency response of a public address or voice alarm system in order to improve the signal to noise ratio of particular frequency bands in order to obtain a 'pass' value when auditing a system against a measurement metric such as the Speech Transmission Index (STI) – particularly where the ambient noise is high. However, note that under quiet, reverberant conditions, the subjective impression and objective electroacoustic measurement of potential intelligibility may disagree [6, 7].

Speech Intelligibility of Sound Systems 221


7.2.2.5 Direction of Primary Sound Arrival
It is a natural human reaction to try and face the source of a sound in an attempt to hear it better. Whilst not particularly well researched, many listeners find sound coming from in front of them to be subjectively more intelligible than sound coming from overhead or from behind (an omnidirectional measurement microphone provides no such discrimination).

7.2.2.6 Direction of Interfering Noise (Particularly Relative to Source of Sound)


Binaural hearing helps allow a listener to concentrate on a particular source of sound. By locating a loudspeaker away from a potential source of interfering noise, so that the wanted speech message and unwanted noise arrive from different directions, the potential intelligibility can be increased by taking advantage of the effect known as 'spatial release from masking'.

7.2.2.7 Talker Type –​Male, Female and Accented


The spectra of male and female voices differ, with females generally having less low-​
frequency and greater high-​frequency content. Under some reverberant or noisy conditions
this difference can provide greater intelligibility as masking of the wanted speech signal
is reduced. However, the natural difference between voices and articulation by the talker
means that no universal rule can be applied. Talkers with a strong regional accent or who
are not using their first language also tend to be less intelligible.

7.2.2.8 Vocabulary (Complexity) and Context of Speech Information


Simple messages, such as those relating to train arrival times, airport departure gate infor-
mation or venue closing times for example, can be successfully broadcast and understood
with a far lower degree of intelligibility than that required when listening to complex infor-
mation –​such as a technical lecture, an involved drama or court proceedings.

7.2.2.9 Talker Enunciation (Articulation) and Rate of Delivery


The clarity of natural speech spoken by normal (untrained) talkers varies enormously, with
some people speaking far more clearly than others. When the other factors such as reverber-
ation, noise and reduced bandwidth are also taken into account, the variation becomes even
greater. The rate at which people speak also varies considerably, with faster rates of speech
being more difficult to understand –​particularly under reverberant conditions. (Typically,
the rate of normal speech is around 4–​5 syllables per second /​120 words per minute.)

7.2.2.10 Talker /​Listener First Language


As noted above, strongly accented speech is generally more difficult to understand. Equally,
if a speech recipient is not listening in their first language, no matter how fluent they are,
they will require a greater level of intelligibility than someone listening in their natural
(first) language. Typically an improvement of around 4 dB in signal to noise ratio or direct
to reverberant ratio will be required to counteract the effect [8].


7.2.2.11 Listener Acuity (Hearing Ability)
Not all people hear as well as others. This particularly applies to older listeners where a
natural reduction in hearing acuity occurs as a part of the ageing process (presbycusis).
Typically around 14% of the population have some noticeable hearing loss and will there-
fore not hear PA announcements as well as others. This should be kept in mind when
designing systems where it is known that an older demographic might apply or where large
numbers of non-​native listeners might be involved, for example at tourist attractions or
international transportation facilities.

7.2.2.12 Talker Microphone Technique


Whilst often outside the influence of the system designer, the way in which talkers and
broadcasters use the microphone can have a significant impact on the intelligibility of the
speech received by the listener. However, the system designer can incorporate a number of
measures to help mitigate the effects of poor microphone technique. Aspects to consider
include:

• Variation in voice levels between announcers/users. [Consider using a leveller (automatic gain control) / compressor and visual level indicator.]
• Distance of microphone to the mouth (proximity effect, level variations). [Consider using a leveller (automatic gain control) and visual level indicator, and a capsule with less proximity bass rise.]
• Talking off mic (away or to the side of the microphone). [Consider using a leveller (automatic gain control) and visual level indicator.]

7.2.2.13 SNR at Microphone


Announcement and paging microphones are often located in busy control rooms or noisy locations such as information desks. The intelligibility of the broadcast speech can therefore be degraded by the ambient noise right at the input to the system. The effect is additive and often perceptibly worse, as the noise becomes embedded in the speech and cannot be ignored or mitigated by means of the binaural release from masking which would normally apply when listening in a noisy environment. Modern signal-processing and active noise-cancellation techniques can be incorporated into a system where it is known that the broadcast microphone must unavoidably be located in a noisy environment. Drive-through ordering systems, for example, regularly employ such techniques.

7.2.2.14 Visual Contact between the Listener and Talker


Most people automatically use visual cues and information when listening to a talker where there is visual contact, and this subconscious process can significantly add to the perceived intelligibility. Therefore, advantage should be taken of this effect where it is potentially available. However, it is important that latency and mis-synchronisation between the audio and visual information are controlled (typically to within ±40 ms) in order not to damage this potential advantage.


7.2.2.15 Signal-​Processing Effects (e.g. Compression, Automatic Gain Control (AGC),
Limiting, Echo Cancellation, Latency)
Modern public address and sound reinforcement systems generally employ considerable
signal-​processing capabilities. When used correctly they can provide significant advantages
to intelligibility, for example by maintaining a stable speech level over the ambient noise. It
is important though to ensure that a suitable gain structure is maintained to ensure that the
speech signal doesn’t distort or become clipped or conversely is not lost in the noise. Echo
cancellers in particular, unless set up correctly can reduce rather than enhance intelligibility
and all DSP processes will add unintended latency (delay) to the system.

7.2.2.16 Electronic Interference of Audio Signal (Hum, Noise, Signal Continuity /​


Interruption)
If designed and set up correctly, public address and speech reinforcement systems should
not suffer from spurious noise interference (including hum and buzz) or intermittently
lose the signal –​though these are common issues. Even low levels of system noise, whilst
being low enough not to affect the intelligibility of the speech directly, may be annoying
and detrimental to concentration and increase the cognitive load and so reduce perceived
intelligibility.

7.3 The Speech Signal and Implications for Intelligible Sound


System Design
Speech can be considered as a series of impulsive acoustic events that occur over time.
Figure 7.3 shows the time history of a typical spoken sentence that lasts approximately 10
seconds. A highly diagrammatic view of a speech sequence can be seen in Figure 7.4.

Figure 7.3 Typical speech waveform.



Figure 7.4 Diagrammatic view of speech events (syllables or words).

Figure 7.5 Diagrammatic view of the effect of noise on speech, for high, moderate and low signal to
noise ratios.

In Figure 7.5 (top), a low level of steady ambient noise has been added, but this would not affect the intelligibility of the speech as it is well below the level of the speech. However, in Figure 7.5 (middle) the level of the noise is greater and now obscures or 'masks' part of the speech signal (the lower-amplitude components). In the lower diagram, the level of noise has been increased so that it is masking most of the speech signal. However, parts of the signal will still be heard and potentially understood.
In practice, words and syllables will have different amplitudes and so will be affected
differently by a given level of noise, as depicted in Figure 7.6. Here certain elements (words
or syllables) are either not affected or only partially affected by the noise, whilst others are
masked completely.
It may well be that a listener is able to ‘piece together’ the meaning of the speech
by recognising some parts and inferring what the missing elements might be. The more
complex or unfamiliar the speech is, the greater the task and the greater the cogni-
tive load.

Figure 7.6 Diagrammatic effect of noise on speech elements of varying level.

Figure 7.7 Diagrammatic effect of reverberation on speech elements of the same and varying levels.

Reverberation affects speech in a slightly different way, whereby lower-amplitude components are masked by the reverberant decay of preceding sounds – as shown diagrammatically in Figure 7.7.
Again, some words or syllables are hardly affected whilst others are completely masked
by the reverberant decay. This is perhaps more easily seen in Figure 7.8, which shows the
speech waveform for the word ‘back’. The initial ‘ba’ sound is quite energetic and higher in
amplitude than the final ‘ck’ sound, which is approximately 20 dB lower.
Figure 7.9 shows the envelope of the same word and in the lower trace with the effect of
reverberation times of 1.0 and 0.6 seconds added. With a reverberation time of 1.0 second,
the ‘ck’ sound is completely masked by the reverberant decay of the ‘ba’. However, when
the reverberation time is 0.60 seconds (the recommended RT value for classrooms) the ‘ck’

Figure 7.8 Speech waveform for the word ‘back’.

Figure 7.9 Diagram showing the effect of reverberation times of 1.0 and 0.6 seconds on the word
‘back’.

survives and is audible. This enables the word ‘back’ to be distinguished from other words
beginning with ‘ba’ such as bag, ban, band or bath for example.
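The masking arithmetic can be sketched as follows. The 60 dB decay that defines RT gives a slope of −60/RT dB per second; the 0.3 s gap and −20 dB consonant level used in the examples below are purely illustrative values, not measurements taken from Figure 7.9:

```python
def decay_level_db(rt_seconds, elapsed_s):
    """Level of the reverberant tail relative to its start: RT is a 60 dB
    decay, so the slope is -60/RT dB per second."""
    return -60.0 * elapsed_s / rt_seconds

def consonant_audible(rt_seconds, gap_s, consonant_rel_db):
    """True if the decaying tail of the preceding sound has fallen below the
    consonant's level (relative dB) by the time the consonant arrives."""
    return decay_level_db(rt_seconds, gap_s) < consonant_rel_db
```

With a 0.3 s gap, an RT of 1.0 s leaves the tail at −18 dB, still above a −20 dB consonant (masked), whereas an RT of 0.6 s leaves it at −30 dB (consonant audible).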
In reality, the amplitude of a speech signal varies significantly from one second to the next, as illustrated by Figure 7.10, which shows a 30-second speech extract measured with a resolution of 100 milliseconds. The lower curve presents the amplitude in terms of an
r.m.s. measurement whilst the upper curve shows the corresponding true peak levels. The
long-term average peak level is 20 dB greater than the r.m.s., indicating this speech extract
to have a crest factor of 20 dB –​which is a very typical value. Clearly, the amplitude of
speech varies significantly over a given time period. In order to ascribe a level to speech,

Figure 7.10 Temporal variability of speech: LAeq = 73 dB, LAmax = 82 dB, LCeq = 78 dB, LCpk = 98 dB and average LCpk = 89 dB.

it is therefore customary to measure the long-term average or the average value over the
length of a message or the speech segment under consideration. This is most conveniently
carried out by measuring the equivalent energy level (Leq). In the example in Figure 7.10,
the average SPL (LAeq) is 73 dB, whilst the maximum r.m.s. level is 82 dBA and the peak
level is 98 dBC and the average peak level is 89 dBC.
It is important to appreciate that speech is not produced at a static level but typic-
ally may vary by about 10 dBA, though for some talkers this range may be greater.
Having a knowledge of not only the dynamic range of speech but its likely maximum
levels is extremely important when designing a PA or sound reinforcement system, as this
determines the necessary amplifier voltage (and power) headroom. However, for signal
to noise ratio calculations the LAeq is usually employed to set or determine the speech
level –​though it can be argued that a measure such as the LA10 (10% level exceedance) also
provides a realistic approach. A more detailed discussion of speech signal characteristics
can be found in [9].
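These level statistics are easy to derive from a calibrated recording; a minimal sketch, in which the calibration offset and names are illustrative assumptions:

```python
import math

def speech_level_stats(samples, full_scale_db=94.0):
    """Return (rms_db, peak_db, crest_factor_db) for a block of linear
    samples, referenced to an arbitrary calibration level."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    rms_db = full_scale_db + 20 * math.log10(rms)
    peak_db = full_scale_db + 20 * math.log10(peak)
    return rms_db, peak_db, peak_db - rms_db
```

For a pure sine wave this yields a crest factor of about 3 dB; real speech, as noted above, is typically nearer 20 dB, which is why amplifier headroom must be judged on peaks rather than average levels.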
The other primary factor of importance for speech is its spectral characteristic. Typically
speech sounds occur over the range 100 Hz to 8–​10 kHz. Individual voices vary enormously
and again in order to be able to predict or measure the potential intelligibility of a system,
an average spectrum needs to be used. Various studies have been carried out and a number
of standards have been developed. These all employ the long-term average spectrum and, whilst generally depicting the same trends, they vary considerably, primarily due to the way the measurement was performed and the size and composition of the talker sample employed. A typical speech spectrum is shown in Figure 7.11.

Figure 7.11 Typical speech spectrum.

As the figure shows, the maximum energy occurs around 250 Hz to 500 Hz and then
decreases with increasing frequency. The Speech Transmission Index (STI) standard (IEC
60268-​16 [10]) follows a similar approach but was updated in 2020 (Edition 5) to bring it
better into line with other standards and the latest research. Figure 7.12 compares the old
and new STI spectra.
The spectra are idealised, in that the high frequencies roll off at a constant 6 dB per
octave. In reality however, most real speech has an increased high-​frequency content as
compared to the standard, as shown for example by Figure 7.13, which compares the long-
term spectra of six different male voices and the standardised STI male spectrum.
In Western languages, the vowels carry most of the power (SPL) of the voice and
typically cover the range from 125 Hz to 1 kHz, whilst the consonants carry the infor-
mation and occur at frequencies above approximately 1 kHz (note that this is a gross
simplification).
In terms of intelligibility, the 2 kHz octave band information is the most important,
followed by the 4 kHz and 1 kHz bands, as depicted in Figure 7.14. Different standards apply
slightly different importance weightings to the speech frequency bands dependent on how
the background research and testing was conducted and the nature of the speech test signal
under consideration.
As noted above, although the low and mid frequencies carry the power of the voice, it is the significantly weaker higher frequencies that are mainly responsible for intelligibility. Therefore a single, wideband signal-to-noise measurement may not be particularly useful, but using an 'A'-weighted measurement, whereby the 1, 2 and 4 kHz bands are emphasised and the 500 Hz and 250 Hz bands are attenuated, can provide a reasonable single-figure approximation (hence the earlier empirical guidance of 6 and 10 dBA signal to noise ratios as noted in section 7.1).

Figure 7.12 Speech and test signal spectra from IEC 60268-​16 2011 and 2020 (Editions 4 and 5).

Figure 7.13 Speech spectra of six different voices and comparison with IEC 60268-​16 2011 spectrum.

Figure 7.14 Typical octave band contributions to speech intelligibility (after [9]).

Figure 7.15 Octave band analysis of speech and interfering noise –​with good signal to noise ratio.

By carrying out an octave band spectral analysis of the interfering noise and the speech
content, a more detailed evaluation of a given signal-to-​noise issue can be undertaken.
Furthermore, by considering the octave band intelligibility weightings, it is possible to see
not only where the problem lies but also to gain an insight into what may be done to
improve the intelligibility with respect to the noise. Figure 7.15 demonstrates the idea.
Here the speech signal is well above the ambient noise in each of the octave bands and
so it can be inferred that intelligibility should be good. In contrast the analysis shown in
Figure 7.16 indicates that the intelligibility will be poor as the higher (consonant) frequen-
cies are below the noise.
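This band-by-band reasoning can be automated directly; a sketch under illustrative assumptions (the band list and 0 dB threshold are choices made here, and a full assessment would also apply the standard's band-importance weightings):

```python
BANDS_HZ = [250, 500, 1000, 2000, 4000, 8000]

def band_snrs(speech_db, noise_db):
    """Per-octave-band signal to noise ratios (dB)."""
    return [s - n for s, n in zip(speech_db, noise_db)]

def masked_consonant_bands(speech_db, noise_db, min_snr_db=0.0):
    """Octave bands at and above 1 kHz (the consonant region) where the
    speech does not clear the noise floor by min_snr_db."""
    return [f for f, snr in zip(BANDS_HZ, band_snrs(speech_db, noise_db))
            if f >= 1000 and snr <= min_snr_db]
```

An empty result suggests the Figure 7.15 situation; any flagged bands indicate where, as in Figure 7.16, the information-carrying frequencies are buried in the noise.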

Figure 7.16 Octave band analysis of speech and interfering noise –​with poor signal to noise ratio.

Figure 7.17 Energy time curve for sound arriving at listening position from distributed sound system
in 1.6 s RT space.

In a similar manner to signal to noise ratio, the direct to reverberant ratio and temporal
aspects of a system can be assessed by evaluating the impulse response of the system in a
room or space. Figure 7.17 shows a typical example.
Figure 7.17 shows the energy time curve (ETC) of a distributed sound system in a
church having a mid-​frequency reverberation time of 1.6 seconds. The plot shows the
direct sound (first spike on the left of the graph) to be strong and well above the reverber-
ation (sloping decay towards the right). Immediately following the direct sound from the
nearest loudspeaker there are a number of other discrete (separately identifiable) sound
arrivals –​primarily from other loudspeakers in the system. These sounds all arrive within
50 ms of the primary arrival and so should integrate and enhance the intelligibility. This
is indicated by Figure 7.18, which is an integrated energy plot and implies that the direct
sound and early reflections will combine to enhance the subjectively perceived direct to
reverberant ratio.

Figure 7.18 Integrated energy plot for distributed sound system.

Figure 7.19 Sound energy ratios – C7 is effectively the 'direct sound' alone. C50 and C35 include early reflections that will integrate with and increase the effective level of the direct sound.

Using the information presented in Figure 7.17, the direct to reverberant ratio can be
quantified and as might be expected is frequency-​dependent. Figure 7.19 plots the Direct
to Reverberant sound energy ratio in decibels for three analysis settings C7, C35 and C50,
where the suffix denotes the length of the window set (in ms) to include the ‘direct energy’
or ‘useful’ sound component. C50 is defined as the ratio of the sound energy arriving within
the first 50 ms to the total sound energy arriving after 50 ms and is expressed in dB. It was
noted in 7.1 that 50 ms is often used as a measure of the useful sound which includes the
direct component and early reflections.

Figure 7.20 Example of strong echo occurring in circular reverberant space.

C50 = 10 × log10 [ Energy (0–50 ms) / Energy (50 ms–end) ]  dB
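Computed from a measured impulse response, the ratio follows directly from this definition; a minimal sketch, with illustrative names, in which the early window is a parameter so that C7, C35 or C80 can be obtained the same way:

```python
import math

def clarity_db(impulse_response, sample_rate_hz, early_ms=50.0):
    """Early-to-late energy ratio in dB (C50 for early_ms=50): the energy in
    the first early_ms of the impulse response over the remaining energy."""
    split = int(round(early_ms / 1000.0 * sample_rate_hz))
    energy = [h * h for h in impulse_response]
    return 10.0 * math.log10(sum(energy[:split]) / sum(energy[split:]))
```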

Although the C7 indicates that the direct to reverberant ratio is negative by up to 5 dB over the important frequency range from 1 to 4 kHz, the negative ratio does not exceed −5 dB, which, as noted earlier, means that intelligible speech should still be supported. Indeed, subjectively, speech heard from the sound system at this location was thought to be highly intelligible.
The C35 and C50 ratios are stronger, showing that the system creates significant useful
early energy.
Examination of the impulse response or more usefully the ETC and integrated energy ETC
(echogram) also enables potential echoes to be detected. A classic example of a strong echo,
created by sound focussing occurring in a reflective circular room, is shown in Figure 7.20.
Here the strong echo can be seen to occur 165 ms after the first (direct) sound arrival.
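Scanning an ETC for such late, strong arrivals can be sketched as follows. The 60 ms window and −10 dB threshold below are illustrative choices, not values taken from the echo-perception curves cited earlier:

```python
def find_echoes(etc_db, sample_rate_hz, window_ms=60.0, threshold_db=-10.0):
    """Return arrival times (ms after the strongest arrival) of discrete
    sounds later than window_ms that come within threshold_db of its level."""
    peak_idx = max(range(len(etc_db)), key=lambda i: etc_db[i])
    peak_db = etc_db[peak_idx]
    start = peak_idx + int(window_ms / 1000.0 * sample_rate_hz)
    return [(i - peak_idx) * 1000.0 / sample_rate_hz
            for i in range(start, len(etc_db))
            if etc_db[i] >= peak_db + threshold_db]
```

Applied to an ETC like that of Figure 7.20, such a scan would flag the focused arrival at 165 ms.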

7.4 Speech Intelligibility Measures and Measurement


From the foregoing discussion in section 7.3, it can be seen that several analytical procedures can be adopted to examine parameters such as signal to noise ratio and direct to reverberant ratio. From these, empirically derived criteria can be developed or, better still, the parameters can be directly related to speech intelligibility. So how do you measure speech intelligibility?
Essentially there are two techniques. The first is to measure the number of words or
sentences (or speech components) correctly identified within a given test using a real talker
(or preferably several different talkers) and a panel of listeners. This is a time-​consuming,
cumbersome and costly exercise to undertake, particularly outside the laboratory, and so is
rarely undertaken for PA and SR systems. Therefore, over the years, a number of techniques
have been developed whereby certain acoustic parameters, such as complex signal to noise
ratios for example, are measured and the potential intelligibility of a system is inferred by
correlating a range of test results to previous calibrated word/​sentence tests. This is effect-
ively an indirect method of speech intelligibility testing and enables an audio channel to
be evaluated without the need for a listening panel and selection of talkers. It is this latter
approach that is generally adopted with respect to PA and sound reinforcement systems.
Whilst often referred to as an ‘intelligibility test’, it must not be forgotten that such testing


merely infers the potential intelligibility of the system or condition under test and is not a
direct or true measure of speech intelligibility. Having said that, modern techniques can
provide some extremely good correlations with intelligibility.
It is beyond the scope of this chapter to discuss subject-​based intelligibility testing in
any detail. Essentially, a listening panel is either played a list of phonemically balanced
words which they have previously been exposed to or is asked to indicate words they think
they heard from a set of multiple-​choice answers. These are often in the form of CVC
(consonant –​vowel –​consonant) or rhyming words, such as ‘dent’, ‘bent’, ‘tent’, ‘went’ or
‘sent’ where the leading or final consonant is changed to see if it is masked by the ambient
noise or reverberation. The percentage of correct answers is calculated and after some stat-
istical adjustment, the intelligibility score is determined. Special test sentences may also
be used with the objective of the listeners identifying a number of target words. The test
sentences are constructed so that the target words cannot be inferred from the context if
not correctly heard. The way in which we perceive and extract intelligible meaning from
speech is remarkably complex and highly non-​linear. For example, in the case of sentence
intelligibility (which is how we would normally hear speech), if a word is missed or not completely heard, we may subconsciously 'hear' it and automatically fill in the meaning from the context of the sentence. However, it is interesting to note that the same target words, when embedded in an ungrammatical sentence, exhibit much lower intelligibility than when there is no context at all. Clearly there is something else about the construction and delivery of the sentence which gives experienced listeners additional information and clues about the words and so increases the perceived intelligibility.
Undoubtedly there is more to speech intelligibility than merely the ability to just identify
a series of given words. Furthermore, these tests are conducted in an artificial way whereby
the panel of listeners are actively awaiting the next word or sentence and so are applying
more attention and cognitive effort than might normally be the case. Such complexities
make finding or applying a simple acoustic measure of intelligibility a formidable task.
Electroacoustic testing (often mistermed objective testing), as previously indicated,
measures a particular parameter such as signal to noise ratio or direct to reverberant ratio (or
both) and then infers the likely or potential intelligibility of the system. The Articulation
Index (AI) was the first scheme to employ this technique and was based on the assessment of
the weighted signal to noise ratio measured either in octave or one-third-​octave bands. The
method and procedure were standardised in ANSI S3.5 1969 [11]. The Articulation Index
only assessed the effect of noise on intelligibility and, whilst useful for many applications, it suffered from not considering important time-domain factors such as reverberation. It was later developed into the SII (ANSI S3.5 1997 [12]), which employed the STI to include time-domain issues.
In a complementary approach to assessing signal to noise ratios, sound energy (clarity)
ratios were developed to evaluate the direct to reverberant ratios [13]. They were, how-
ever, developed and intended for the assessment of the natural acoustics of a room or
auditorium rather than an electroacoustics system. Furthermore, these energy ratios, such
as C50, were developed as single frequency band metrics, being primarily based on the
1 kHz octave band. An intelligibility weighted scale was never developed, as there was
little need for this for the originally intended purpose. However, energy ratios can provide
a useful diagnostic tool when assessing the potential cause and solution of intelligibility
problems due to reverberation. Figure 7.19 shows an energy ratio example and the fre-
quency dependence of C7, C35 and C50 measures for a very intelligible sound reinforce-
ment system.


Although a scale was not developed for C50, it is generally recommended that C50
should be at least 0 dB or greater. In practice, 0 dB corresponds to an STI value of approxi-
mately 0.50 to 0.60 depending on how C50 varies with frequency. (The C50 example of
Figure 7.19 corresponds to an STI 0.61 with the average of the 1, 2 and 4 kHz bands being
equal to 1.7 dB.)
C7 and C35 are defined in a corresponding manner for time intervals of 7 ms and 35 ms
respectively. C80, the energy ratio for sound arriving within the first 80 ms, is commonly
used in the evaluation of auditoria intended for live music performance.
Houtgast and Steeneken understood the need for an electroacoustically based test
method for their work of evaluating communication systems and in response developed the
Speech Transmission Index (STI), first publishing their approach in 1971 [14].
STI is the most researched and developed indirect intelligibility test method to date,
with the technique having undergone numerous revisions and enhancements. The STI
concept and implementation have been internationally standardised, with the latest revi-
sion of the standard being published in 2020 (Revision 5). It is now by far the most widely
used technique to assess the potential intelligibility of a public address or sound reinforce-
ment system (or almost any other audio communication system excluding the telephone
and VoIP).

7.4.1 Brief Description of STI


The Speech Transmission Index involves the measurement of the reduction in modulation
depth of a test signal to evaluate the effects of either noise and/​or reverberation on
a speech transmission channel. In this respect it is very similar to measuring the signal to
noise ratio and early energy ratio, except that both are inherently carried out simultan-
eously. The test signal is based on a pseudorandom noise signal, spectrally shaped to repli-
cate speech (see Figures 7.12, 7.13 and 7.27) and divided into seven octave bands. These
octave band ‘carriers’ are each modulated at 14 different frequencies over the frequency
range 0.63 to 12.5 Hz, thereby replicating the modulations most common in normal speech.
This results in a matrix of 7 × 14 =​98 data points, as shown in Table 7.2. The modulation
data of each octave band are averaged and a Transmission Index for each band is calculated.

Table 7.2 STI matrix

Octave band (Hz) 125 250 500 1k 2k 4k 8k

0.63 Hz x x x x x x x
0.80 Hz x x x x x x x
1.0 Hz x x x x x x x
1.25 Hz x x x x x x x
1.6 Hz x x x x x x x
2.0 Hz x x x x x x x
2.5 Hz x x x x x x x
3.15 Hz x x x x x x x
4.0 Hz x x x x x x x
5.0 Hz x x x x x x x
6.3 Hz x x x x x x x
8.0 Hz x x x x x x x
10.0 Hz x x x x x x x
12.5 Hz x x x x x x x


Table 7.3 STI matrix for sound system measurement – STI = 0.611

Octave band (Hz) 125 250 500 1k 2k 4k 8k

0.63 Hz 0.949 0.936 0.922 0.929 0.953 0.968 0.989


0.80 Hz 0.921 0.903 0.881 0.892 0.929 0.95 0.983
1.0 Hz 0.882 0.86 0.829 0.846 0.896 0.925 0.973
1.25 Hz 0.829 0.805 0.761 0.788 0.852 0.89 0.96
1.6 Hz 0.751 0.732 0.672 0.711 0.792 0.839 0.939
2.0 Hz 0.66 0.657 0.585 0.632 0.726 0.78 0.913
2.5 Hz 0.547 0.574 0.503 0.543 0.654 0.712 0.879
3.15 Hz 0.411 0.476 0.437 0.442 0.58 0.635 0.837
4.0 Hz 0.309 0.355 0.401 0.356 0.516 0.559 0.79
5.0 Hz 0.329 0.248 0.401 0.316 0.478 0.5 0.746
6.3 Hz 0.349 0.241 0.384 0.3 0.461 0.46 0.707
8.0 Hz 0.299 0.267 0.338 0.276 0.455 0.434 0.673
10.0 Hz 0.252 0.271 0.254 0.26 0.431 0.435 0.653
12.5 Hz 0.134 0.2 0.196 0.27 0.419 0.452 0.653
MTI 0.458 0.539 0.537 0.540 0.622 0.652 0.784
STI 0.611

These indexes are then weighted with respect to their contribution to intelligibility (see
Figure 7.14) and combined to produce a single, overall STI value. An example is shown in
Table 7.3.
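The conversion from modulation data to Transmission Index values can be sketched as follows. The m → TI mapping via an effective signal to noise ratio clipped to ±15 dB follows the standard; note that combining the band MTIs into a single STI additionally applies the band-importance and redundancy weightings of IEC 60268-16, which are omitted from this sketch:

```python
import math

def transmission_index(m):
    """Convert a modulation transfer value m (0 < m < 1) to a Transmission
    Index via the effective SNR, clipped to +/-15 dB."""
    snr = 10 * math.log10(m / (1.0 - m))
    snr = max(-15.0, min(15.0, snr))
    return (snr + 15.0) / 30.0

def band_mti(band_m_values):
    """Modulation Transfer Index of one octave band: the average of the
    Transmission Indices of its 14 modulation frequencies."""
    tis = [transmission_index(m) for m in band_m_values]
    return sum(tis) / len(tis)
```

A modulation value of 0.5 maps to a TI of exactly 0.5 (0 dB effective SNR), while values near 1.0 or 0.0 saturate at TI = 1.0 and 0.0 respectively because of the ±15 dB clipping.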
When first developed in the 1970s, a mainframe computer was required to calculate the STI. In the 1980s a method of deriving the STI from the system-room impulse response was found, based on the relationship established by Schroeder [15]. However, whilst appropriately taking account of the reverberant component, the background noise element has to be manually added, frequently resulting in inaccurate or erroneous assessments being made. The IR-based procedure still required a relatively powerful PC or laptop computer and so could not be easily used on site, though as laptop devices became more powerful and measurement software improved, on-site measurement became a reality. In 2001, a simplified method of obtaining the STI using a handheld portable device or modified sound level meter was introduced, specifically with the assessment of sound systems in mind. The new method was termed STIPA and used a sparse modulation matrix, as shown in Table 7.4.
Here just 14 rather than 98 modulation data points are employed. However, despite the
reduction in resolution, STIPA and STI measurements agree extremely well [16, 17] and
STIPA has become widely adopted for measuring (and predicting) public address (PA) and
voice alarm (VA) system performance.
Plotting the modulation data in a graphical format can provide a useful insight into what
is happening in each frequency band and by examining the decay it can be ascertained if
the modulation reductions are due to reverberation, noise or echo interference. Figure 7.21
shows the data presented in Table 7.3 graphically. The STI scale ranges from 0 (completely
unintelligible) to 1.0 (perfect intelligibility). The just noticeable difference (JND) is usually
taken to be 0.03 STI. As noted earlier, intelligibility is non-​linear and so STI follows this
trend. The difference between 0.40 and 0.45 or 0.45 and 0.55 is very much more notice-
able than a corresponding change say between 0.65 and 0.70 or 0.70 and 0.80 for normal
speech or a speech announcement. The relationships between STI, sentences, phonetically balanced (PB) words and consonant-vowel-consonant (CVC) words are given in annex E of the IEC 60268-16 standard.
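The shape of these modulation reductions follows from the modulation transfer function of Houtgast and Steeneken [14]: for a diffuse, exponentially decaying reverberant field with stationary noise, m(F) = [1 + (2πF·T/13.8)²]^(−1/2) · [1 + 10^(−SNR/10)]^(−1). A minimal sketch illustrates why reverberation produces a decay with increasing modulation frequency while noise reduces all modulation frequencies equally – the diagnostic behaviour described above.

```python
import math

def mtf(f_mod, rt60, snr_db):
    """Modulation transfer factor m(F) for a diffuse reverberant field
    with stationary noise (Houtgast/Steeneken model).

    f_mod  : modulation frequency in Hz
    rt60   : reverberation time in seconds
    snr_db : signal-to-noise ratio in dB
    """
    reverb = 1.0 / math.sqrt(1.0 + (2 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10 ** (-snr_db / 10.0))
    return reverb * noise

# Reverberation alone: m falls progressively with modulation frequency...
decay = [mtf(f, rt60=1.4, snr_db=100) for f in (0.63, 2.0, 12.5)]
# ...whereas noise alone reduces all modulation frequencies equally.
flat = [mtf(f, rt60=0.0, snr_db=10) for f in (0.63, 2.0, 12.5)]
```

The constant 13.8 arises from the 60 dB decay definition of reverberation time (ln 10⁶ ≈ 13.8).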

Speech Intelligibility of Sound Systems 237


Table 7.4 STIPA matrix (each octave band carries two of the 14 modulation frequencies; column alignment lost in reproduction)

Octave band (Hz) 125 250 500 1k 2k 4k 8k

0.63 Hz x
0.80 Hz x
1.0 Hz x
1.25 Hz x
1.6 Hz x
2.0 Hz x
2.5 Hz x
3.15 Hz x
4.0 Hz x
5.0 Hz x
6.3 Hz x
8.0 Hz x
10.0 Hz x
12.5 Hz x

Figure 7.21 MTF plot for high-​quality sound reinforcement system in 1.4 s RT space (STI =​0.61).

Apart from replicating the speech spectrum and speech modulations, later enhancements
to STI (and STIPA) also incorporated adjacent band redundancy factors, the effects of
frequency masking and a sound level dependency function. The first two functions take
account of how adjacent octave bands can contribute to speech intelligibility and also how
lower frequencies can mask higher-​frequency information. The latter effect is based on the
psychoacoustic phenomenon known as ‘the upward spread of masking’ whereby a lower
frequency or band of frequencies, if sufficiently higher in level than the adjacent band, will
cause the higher frequencies to become completely or partially masked, effectively making them inaudible. The effect is level-dependent with the masking slopes becoming steeper at higher sound levels and therefore increasing the masking effect as the sound level increases. Associated with, although separate from, the effect of frequency masking is the observation that higher-level speech (above approximately 80 dB) gradually becomes less intelligible as the level is further increased. Conversely, there is a corresponding effect at low levels below about 50 dB SPL. STI takes both of these effects into account. However, the reduction in STI due to the absolute SPL is not linear but is dependent on the pre-existing reduction in modulation caused, for example, by reverberation or other factors. Figure 7.22 illustrates this, demonstrating the reduction in STI as a function of sound level for three different reverberant conditions corresponding to reverberation times of 0, 1.0 and 2.0 seconds.

Figure 7.22 Effect of speech level on STI for three reverberant conditions.
Under anechoic conditions (0 seconds RT) or for an electronic system without echo or
distortion, the upper trace in Figure 7.22 clearly shows the decrease in STI due to the effect
of SPL and associated masking to be quite significant. However, the reduction in STI is not linear, as shown by the smaller reductions in STI for the same range of SPLs calculated for different reverberation conditions (equivalent to 1.0 and 2.0 seconds) presented in the
lower traces. Further information on how STI accounts for the effects of masking and abso-
lute SPL on STI can be found in Annex A of the STI standard [10].
The Speech Transmission Index is a sophisticated measurement and to a first-​order
approximation generally gives good agreement with perceived intelligibility. However, as
we have seen, the way that the ear and brain work together to extract meaning from the
acoustic signals that make up the speech sounds we use for communication is remarkably
complex. It is therefore not surprising that it is possible to fool the relatively simple STI
technique or for STI not to have the resolution or refinement to successfully predict the
potential intelligibility in some situations. STI knows nothing, for example, about clarity of
articulation of the talker, neither their rate of speech nor the potential binaural advantage
of the listener.


As noted above, the STI of a system or communication channel can be measured directly
using a modulated speech-​shaped signal or indirectly from the system impulse response.
IEC 60268-​16 makes a clear distinction between the two methods as the approach and
associated test/​excitation signals are quite different. STIPA generally refers to the direct
(modulated signal) method and where it is derived from the impulse response this must be
clearly stated and such a measurement should use the designation STIPA(IR).
Although several voice alarm and emergency sound system standards provide target STI values (for example 0.50 STI is a common requirement) there are few if any standards that require a given STI value or range for other public address or sound reinforcement system uses. For this reason, the 2011 version of the STI standard introduced some general guidance and also presented the concept of measurement tolerance. This guidance remains in the 2020 version and greater detail concerning measurement uncertainty is also provided. Originally the STI employed a standard psychometric five-point scale ranging from Bad to Excellent. However, in practice, this proved to be confusing when it came to the assessment of PA systems, as for example 0.50 STI might be considered to be 'Good' intelligibility for a noisy and reverberant railway station but this would be totally inadequate for a theatre or law court – yet the same criteria were often being applied. Intelligibility was being considered to be a binary, black and white issue.
Figure 7.23 presents a more detailed scale from the STI standard (IEC 60268-16).

Table 7.5 STI qualification bands and typical applications

Category | Nominal STI value | Type of message information | Examples of typical uses (for natural or reproduced voice) | Comment
A+ | >0.76 | – | Recording studios | Excellent intelligibility but rarely achievable in most environments
A | 0.74 | Complex messages, unfamiliar words | Theatres, speech auditoria, parliaments, courts, assistive hearing systems (AHS) | High speech intelligibility
B | 0.70 | Complex messages, unfamiliar words | Theatres, speech auditoria, parliaments, courts, assistive hearing systems (AHS) | High speech intelligibility
C | 0.66 | Complex messages, unfamiliar words | Theatres, speech auditoria, teleconferencing, parliaments, courts | High speech intelligibility
D | 0.62 | Complex messages, familiar words | Lecture theatres, classrooms, concert halls | Good speech intelligibility
E | 0.58 | Complex messages, familiar context | Concert halls, modern churches | High-quality PA systems
F | 0.54 | Complex messages, familiar context | PA systems in shopping malls, public buildings, offices, VA systems, cathedrals | Good-quality PA systems
G | 0.50 | Complex messages, familiar context | Shopping malls, public buildings, offices, VA systems | Target value for VA systems
H | 0.46 | Simple messages, familiar words | VA and PA systems in difficult acoustic environments | Normal lower limit for VA systems
I | 0.42 | Simple messages, familiar context | VA and PA systems in very difficult spaces | Poor intelligibility
J | 0.38 | – | – | Not suitable for PA systems
U | <0.36 | – | – | Not suitable for PA systems

Figure 7.23 STI qualification bands (categories).

7.4.2 STI Use and Limitations


Although the Speech Transmission Index is generally thought about in terms of the single
number rating that it provides, examination of the modulation transfer function (MTF)
data (as shown in Table 7.3 for example) does provide some diagnostic capability. The shape
of the MTF curves, for example, can indicate whether the modulation reduction is due to
reverberation (monotonic decay curve) or noise (flat curve) or echoes (decay with distinct
peaks and dips).
Reviewing the Modulation Transfer Index (MTI) data can be instructive, indicating
which frequency bands are subject to most degradation and where remedial action might be
best applied. The upper trace of Figure 7.24 shows the MTI plot for the sound reinforcement
system associated with the data presented in Figure 7.21 and in Table 7.3. The lower curve,
in contrast, shows the MTI for a voice alarm system located in a highly reverberant space
(RT =​5 seconds). The upper curve, relating to the 0.61 STI system, shows that there is
scope for improvement at 500 Hz and 1 kHz, particularly when the intelligibility weightings
(as per Figure 7.14) are considered. The lower MTI curve also shows that there is scope
to improve the overall STI/​intelligibility by increasing the values at 500 Hz and 1 kHz.
Although the MTI values at 125 Hz and 250 Hz are low, in this particular case, due to the
reverberation time characteristic of the space, the low-​frequency output capability of the
loudspeakers and the relative intelligibility contributions to the overall STI/​intelligibility,
this would not be a cost-​effective approach. MTI plots can therefore show where best to try
and improve the intelligibility or which frequencies are being affected the most.
Whilst there is no doubt that the introduction and standardisation of STIPA has had a significant impact on the performance, testing and assessment of public address, voice alarm and sound reinforcement systems, with its wide availability comes the
opportunity for misunderstanding and misuse. Few practitioners will probably have read
the STI standard (IEC 60268-​16), which contains much useful practical information. For
example, it is poorly understood that the pseudorandom nature of the signal will give rise
to natural variations in the results and so at least three measurements should be made at each measurement position and an average taken – provided that the difference between readings is 0.03 STI or less. Any reading outside this range should be ignored and the measurement should be repeated.

Figure 7.24 MTI plots for two sound systems exhibiting STI values of 0.61 and 0.49 respectively.
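This repeat-and-average rule is easy to automate. In the helper below, the function name and the None-to-signal-a-repeat convention are illustrative choices, not part of the standard; the averaged STI is returned only when the spread of readings is within tolerance.

```python
def average_stipa(readings, max_spread=0.03):
    """Average repeated STIPA readings taken at one measurement position.

    Returns the mean if the spread of the readings is within the allowed
    tolerance (0.03 STI by default), otherwise None to indicate that the
    measurement should be repeated."""
    if len(readings) < 3:
        raise ValueError("take at least three readings per position")
    if max(readings) - min(readings) > max_spread:
        return None  # out of tolerance: repeat the measurement
    return sum(readings) / len(readings)
```

For example, readings of 0.52, 0.53 and 0.54 average to 0.53, whereas a set spanning more than 0.03 STI is rejected.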
In many cases, readings cannot be taken under normal operating or occupied conditions
and so are made out of hours to avoid disturbance to the public or to building occupants.
However, the ambient noise at this time will not be representative of when the system is
normally used. This problem, however, can be readily overcome by separately measuring
the normal ambient noise and then correcting the STI readings to account for the reduced
signal to noise ratio that will occur during ‘out of hours’ testing. Indeed, many meters and
software provide this function or enable the measured data to be exported to a spreadsheet
for background noise correction.
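A simplified sketch of such a correction, assuming the out-of-hours measurement was effectively noise-free: each band's measured modulation values are rescaled by the modulation reduction factor corresponding to the operational signal-to-noise ratio – the same noise term that appears in the underlying MTF theory. The full per-band procedure is defined in IEC 60268-16.

```python
def correct_m_for_noise(m_measured, operational_snr_db):
    """Rescale a modulation value measured under quiet conditions by the
    modulation reduction factor of the operational signal-to-noise ratio
    (applied per octave band). Simplified: assumes the quiet measurement
    itself was effectively noise-free."""
    noise_factor = 1.0 / (1.0 + 10 ** (-operational_snr_db / 10.0))
    return m_measured * noise_factor

# A band measured at m = 0.9 out of hours, but only 10 dB SNR in service:
m_in_service = correct_m_for_noise(0.9, 10.0)  # reduced to about 0.82
```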
The sparse nature of the STIPA matrix makes it unsuitable for assessing the effects of
echoes, which requires the full STI matrix. Even then there is some difference of opinion
as to the suitability of STI for this task –​a view perhaps influenced by subjective impres-
sion which relates more to the ‘ease of listening’ than to the objective loss of intelligibility
[18]. Figure 7.25 shows the effect of a single echo (of similar amplitude to the direct sound)
on STI.
Some forms of signal processing such as AGC, echo and noise cancellation can also affect STI measurements – which are particularly dependent on the nature of the test signal employed. It should also be understood that the crest factor and dynamic characteristics of the STI/STIPA test signals are very different to real speech. Table 7.6 summarises a number of characteristic parameters. From the table it can be seen that the crest factors of speech and STIPA are noticeably different and so could potentially affect the transmission of the
signal through a sound system. However, there is little energy in the peaks. Of greater concern are the differences in the dynamic r.m.s. behaviours. Typically the r.m.s. speech signal maxima may be around 10 dB above the long-term average – and these maxima contain significant energy and will cause signal processors such as compressors and limiters to react. The A weighted r.m.s. maxima for the STIPA signal on the other hand are typically some 7 dB lower and the average maximum level is approximately 5 dB lower. Clearly the STIPA signal does not replicate the energetic and dynamic behaviour of typical speech. The discrepancy is significantly worse if a sine sweep is employed as the test signal, as indicated in Table 7.6.

Figure 7.25 Theoretical effect of a delayed sound (echo) on STI.

Table 7.6 Speech and STI/STIPA test signal characteristics

Parameter | Crest factor, A wtd (dB) | Crest factor, C wtd (dB) | LA1 − LAeq (dB) | LA10 − LAeq (dB) | LAmax > LAeq (dB) | LCmax > LAeq (dB)
Speech (typical) | 20 | 17 | 7 | 3 | 9 | 12
Pink noise | 12.0 | 11.2 | 2.0 | 0.1 | 0.4 | 2.6
STIPA | 12.4 | 11.6 | 1.8 | 1.0 | 1.8 | 9.2
Sinewave | 3 | 3 | 0 | 0 | – | –
A further common error that widely occurs in practice relates to the setting up of a
measurement and adjusting the equivalent speech and STIPA test signal levels. It is often
thought that this should be achieved by setting the STIPA signal to the same LAeq value as
that of a normal speech announcement or reinforcement level (that is, measure the LAeq of
the speech and set the STIPA signal to the same LAeq). However, this is incorrect. According
to IEC 60268-16, for equivalence, the STIPA signal must be set to an LAeq value that is 3 dB higher than the speech. This difference can have a significant impact when dealing with low signal to noise ratios and may also put a higher demand on system amplifiers. It should also be realised that the STI/STIPA signal does not have a continuous spectrum as perhaps suggested by Figure 7.12 but instead consists of a discrete series of separated ½-octave-wide elements, as illustrated by Figure 7.26.

Figure 7.26 1/12 octave analysis of Edition 5 STIPA signal (centre frequencies at 125, 250, 500 Hz, 1, 2, 4 and 8 kHz).
Whilst STI is the best metric that we have for assessing the potential intelligibility
of a sound system, it is far from infallible, but it should provide a reasonable indication.
Assuming that it has been measured correctly, it is unlikely that a system achieving a STI
score of ≥0.50 will be unintelligible. However, STI knows nothing about the clarity or spec-
tral makeup of the talker’s voice or their rate of speech –​all factors that will significantly
affect the perceived intelligibility. Equally, nothing is known about the listener’s hearing
acuity and language skills or the relative location of the source of sound and interfering
noise. One of the great advantages of STI, however, is that it can be predicted from a know-
ledge of the basic acoustic details of a room or space (for example, the volume, reverberation
time, surface treatments /​acoustic absorption coefficients). This enables sound systems to
be designed with a high degree of confidence that they will be intelligible or capable of
meeting a given STI target. However, this task requires the talents of a skilled computer
modeller, particularly when dealing with challenging acoustic environments. Acoustic com-
puter modelling programs are only as good as the data that are fed into them but at times
they can positively encourage the user to compute an incorrect result –​particularly with
respect to STI. A detailed understanding of room acoustics, the way in which loudspeakers
behave in a given space, how loudspeakers radiate sound and a good understanding of STI
and its underlying science are all required.


Table 7.7 Minimum recommended number of spatially distributed STI measurements

Area involved (m2) Minimum number of measurements

<25 2
25–​100 3
100–​500 6
500–​1500 10
1500–​2500 15
>2500 15 per 2500 m2

When assessing the STI/potential intelligibility performance of a sound system, it is important that an adequate number of spatially well-distributed measurements are made. Some, but certainly not all, standards requiring a given STI criterion to be met
made. Some, but certainly not all, standards requiring a given STI criterion to be met
provide guidance on this. In the absence of specific guidance, that provided in ISO 7240
[19] is probably a good place to start. As a guide, Table 7.7 also provides some nominal
assistance.
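Table 7.7 can be encoded directly. Note an assumption: the final row is read here as applying above 2500 m² ('15 per 2500 m²'), since the preceding row already covers 1500–2500 m².

```python
import math

def min_sti_positions(area_m2):
    """Minimum recommended number of spatially distributed STI measurement
    positions per Table 7.7 (the final row is interpreted as
    '>2500 m2: 15 per 2500 m2')."""
    if area_m2 < 25:
        return 2
    if area_m2 <= 100:
        return 3
    if area_m2 <= 500:
        return 6
    if area_m2 <= 1500:
        return 10
    if area_m2 <= 2500:
        return 15
    return 15 * math.ceil(area_m2 / 2500)
```

A 6000 m² concourse, for instance, would call for at least 45 positions under this reading.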
When computer modelling the potential STI performance of a space, a significantly
greater number of data points (higher resolution) would normally be employed. The statis-
tical distribution of the STI values over the required area is also normally computed as well
as the mean and standard deviations.
Further discussion and information concerning common STI measurement errors and
issues can be found in [20, 21].

7.4.3 Other and New Methods of Speech Intelligibility Assessment

7.4.3.1 Percentage Loss of Consonants (% ALcons)


Whereas STI has been around for 50 years and STIPA for over 20 years, during this time
few other intelligibility assessment techniques have been developed. This may be due to
the enormity and complexity of the task involved. One method, using a rather different
technique to STI for predicting potential intelligibility but also developed in the 1970s, is
that of the Articulation Loss of Consonants (% ALcons). Originally conceived for rating
speech intelligibility in classrooms [22], the method was extended to predict the perform-
ance of a sound system by considering the directivity of the source [23]. The method essen-
tially computes the direct to reverberant ratio but as a function of reverberation time. Later
additions to the technique included the addition of the signal to noise ratio.

% ALcons = 200 D² RT² (n + 1) / (V Q m)  [applicable where D < 3.16 Dc] (7.1)

where:
D =​distance to the loudspeaker (or talker) (m)
RT =​reverberation time (s)
n =​number of loudspeakers operating and contributing to the reverberant field
V =​volume of the space (m3)
Q =​directivity of the source
m =​acoustic modifier (usually set to 1)
Dc =​critical distance
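Equation (7.1) translates directly into a small calculator. The critical distance used for the validity check is not defined in the text above; the common statistical-acoustics approximation Dc ≈ 0.057·√(Q·V/RT) is assumed here.

```python
import math

def alcons_percent(distance, rt, volume, q, n=1, m=1):
    """Articulation Loss of Consonants per equation (7.1).

    distance : listener-to-loudspeaker distance D (m)
    rt       : reverberation time (s)
    volume   : room volume (m^3)
    q        : directivity of the source
    n        : number of loudspeakers feeding the reverberant field
    m        : acoustic modifier (usually 1)
    """
    # Assumed critical-distance approximation (not given in the text):
    dc = 0.057 * math.sqrt(q * volume / rt)
    if distance >= 3.16 * dc:
        raise ValueError("beyond 3.16 x critical distance; eq. (7.1) not applicable")
    return 200 * distance ** 2 * rt ** 2 * (n + 1) / (volume * q * m)
```

For example, a single loudspeaker with Q = 10 heard at 10 m in a 10 000 m³ hall with RT = 2 s yields % ALcons of 1.6%.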


Table 7.8 Relationship between STI and % ALcons

% ALcons STI (RaSTI)

15 0.45
10 0.52
5 0.65
≤3 ≥0.75

Note the equivalence only really works for the disused RaSTI method, which was limited to just the
500 Hz and 2 kHz octave bands.

Being simpler to calculate, % ALcons was for a while used as a quick method of estimating the likely STI value for a PA system, using the relationship between STI and % ALcons determined by Houtgast and Steeneken and by Farrell Becker. Typical equivalent values are shown in Table 7.8.
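The Farrell Becker relationship referred to above is usually quoted as % ALcons = 170.5405·e^(−5.419·STI) (strictly for RaSTI, per the note to Table 7.8); a two-line sketch confirms that it reproduces the tabulated pairs.

```python
import math

def sti_to_alcons(sti):
    """Farrell Becker approximation relating STI (strictly RaSTI)
    to % ALcons."""
    return 170.5405 * math.exp(-5.419 * sti)
```

Rounding the results for STI values of 0.45, 0.52 and 0.65 gives 15%, 10% and 5% respectively, matching Table 7.8.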
For many years, the acceptable maximum percentage loss of consonants (% ALcons)
was 15%, and was based on the requirement to have at least 25 dB of signal to noise ratio
together with a uniform frequency response in the 2–​4 kHz region –​the critical range for
speech intelligibility. In later years, the continuing study of speech intelligibility (in part
due to the increasing use and understanding of STI) led to the conclusion that 10% was
a more appropriate value (maximum loss) for most purposes. When the information being
delivered is familiar or expected, 10% is quite acceptable. In a learning environment, espe-
cially for people with hearing impairment, the target % ALcons should be closer to 5%.
Whereas % ALcons can still be useful for assessing the Direct to Reverberant ratio effects,
the method was not developed to the same extent as STI nor standardised. In practice, STI
or STIPA has become the preferred and internationally standardised method for rating the
potential intelligibility of a sound system.

7.4.3.2 Coherence
A new measurement technique that does show some promise is based on the measurement
of the coherence of the signal received by a listener [24]. This, however, does require know-
ledge of the original signal to act as the reference. Whilst in some applications this may be
possible, in many cases there is no way of obtaining this reference –​remote measurements
in an airport terminal or large industrial site for example. Here either an asynchronous tech-
nique or a stand-​alone signal such as STIPA is required. The use of coherence as a measure
does have the advantage that it provides a more stable result for reflections and may also be
viewed in real time.
At this stage, there is no proven relationship between coherence and speech intelli-
gibility. Considerably more research is required in order to explore and exploit the link
and derive a robust measurement scale. As with STI however, coherence is an indirect
method of assessment, relying on an acoustic parameter that correlates with intelligibility.
Furthermore, it is not clear how the parameter may be predicted from a knowledge of room
acoustic and loudspeaker parameters. The ultimate goal for the measurement and assessment
of speech intelligibility must be to use real speech and the transmitted speech signal itself –​
though this is a considerable way off being readily realised.

7.4.4 A Comment about System Frequency Response and Intelligibility


In terms of sound quality, the frequency response of a sound system is its most important
subjective attribute. At the higher end of the intelligibility scale, above approximately
0.65, sound quality and speech intelligibility are often confused or interposed when a
subjective judgement is made. As noted earlier, STI is not particularly sensitive to small
changes in spectral response under quiet or high signal-to-​noise conditions, although
these may be subjectively very noticeable [6, 7, 20, 21]. Equally, there is no standardised
method of measuring the frequency or spectral response of a sound system. In the past,
various house curves and recommended response characteristics have been proposed
and recommended for speech systems. Figure 7.27 shows a typical response. However,
good-​quality contemporary sound systems often exhibit an extended and often almost flat
response, ensuring that the frequencies important for good intelligibility are maintained.
Figures 7.28 and 7.29 are typical examples of this approach. Figure 7.28 shows the
response of a speech reinforcement system installed within a large cathedral having a
mid-​frequency reverberation time of 4.5 s. The response extends to beyond 10 kHz before
rolling off but is still contributing beyond 12–​15 kHz. The corresponding STI for the
unoccupied space was 0.55 –​a very satisfactory result. Figure 7.29 shows the frequency
response of a speech and music reinforcement system installed in concert hall designed
for classical music and having a reverberation time of 2.0 s. The frequency response is
well extended –​only beginning to drop above 16 kHz, again apparently breaking all
the 'rules'. Here the STI was typically 0.60 to 0.65 – again a very satisfactory result for the unoccupied venue. There is no doubt that system frequency response and perceived intelligibility are inextricably linked. Whereas we know how to measure intelligibility we still do not fully understand how to measure the perceived spectral response of a sound system in a large space. That the direct sound should be nominally flat and well extended is certainly true but how to take account of the sound of the early and late reflections and reverberation is not well understood.

Figure 7.27 Typical target speech response curve.

Figure 7.28 Frequency response for cathedral sound system measured at typical listener location.

Figure 7.29 Set of frequency response curves for a concert hall high-quality sound system.

7.5 Summary of Sound System Design Factors Required for Good Intelligibility
1. Adequate bandwidth and frequency response –​minimum 250 Hz –​6 kHz.
2. Signal to noise ratio > 6 dB, preferably > 10 dB ‘A’ weighted.
3. Direct to reverberant ratio > +​3 to 5 dB (1 kHz, 2 kHz and 4 kHz bands).
4. Even coverage within ±3 dB (6 dB variation) over range 800 Hz –​5 kHz ⅓-​octave
bands (1–​4 kHz octave bands) or ±2 dB (4 dB variation) in highly reverberant spaces.
Use 4 kHz band for quick coverage test.
5. Absolute SPL ideally in range 55 to 80 dBA but may need to be higher in noisy envir-
onments or to achieve signal to noise ratio requirements according to (2).
6. Minimise distance between listener and loudspeaker wherever possible.
7. Use directional loudspeakers in reverberant spaces.
8. Aim the loudspeakers at the listeners and keep sound from impacting walls and ceiling
where possible.
9. Equalise system to provide smooth frequency response free from peaks –​particularly in
reverberant spaces.
10. Optimise system gain structure to avoid noise and hum pickup, distortion or overloading/​
clipping of the signal.


11. Provide adequate amplifier headroom (minimum 6 dB r.m.s., ideally 10 dB+, depending on system type and quality).
12. Incorporate signal levelling/​compression to provide steady speech level to maintain
stable signal-to-​noise level.
13. Minimise noise /​other interference at broadcast microphone –​equalise microphone
signal if necessary for correct microphone position/​response.
14. Minimise latency where there is visual contact between the talker and listener.
15. In noisy and challenging acoustic environments equalise the system response for intel-
ligibility rather than naturalness (ensure good signal to noise ratio at 1 kHz, 2 kHz and
4 kHz).
16. STI targets should be ≥0.50 for emergency sound and information systems and ≥0.65 for
entertainment systems, classrooms and court rooms (0.60 minimum).
17. When carrying out STIPA tests, ensure that the test signal is 3 dB greater than the
normal speech (announcement) level.
18. Wherever possible separate the direction of noise sources and system loudspeakers to
aid spatial release from masking.
19. Microphone users should be trained in the use of the microphone and to speak clearly.
20. Consider likely listener hearing acuity (hearing loss /​language) –​greater intelligibility
may be required.

Notes
1 Uniformity of coverage affects both the direct to reverberant ratio and signal to noise ratio.
2 IEC 60268-​16 makes a clear distinction between the two methods of making an STI measurement,
referring to them as the ‘direct’ and ‘indirect methods’.

References
1. ISO 9921-​Ergonomics –​Assessment of Speech Communication (2003).
2. Haas, H. The influence of a single echo on the audibility of speech. J. Audio Eng. Soc. 20(2)
(1972).
3. Muncey, R.W., Nickson, A.F.B., and Dubout, P. The acceptability of artificial echoes with reverberant speech and music. Acustica 4 (1954).
4. Lochner, J.P.A., and Burger, J.F. The subjective masking of short time delayed echoes by their
primary sounds and their contribution to the intelligibility of speech. Acustica 8 (1958).
5. Dietsch, L., and Kraak, W. Ein objektives Kriterium zur Erfassung von Echostörungen bei Musik- und Sprachdarbietungen. Acustica 60 (1986).
6. Mapp, P. Frequency response and systematic errors in STI measurements. Proc IOA Vol 27 Pt 8,
Reproduced Sound 19 (2003).
7. Leembruggen, G., Hippler, M., and Mapp, P. Further investigations into improving STI’s rec-
ognition of the effects of poor frequency response on subjective intelligibility. AES 128th
Convention, London (2010).
8. Wijngaarden, S., Steeneken, H., and Houtgast, T. Quantifying the intelligibility of speech in
noise for non-​native listeners. JASA 111(4) (2002).
9. Mapp, P. Some effects of speech signal characteristics on PA system performance. AES 139th
Convention, New York (2015).
10. IEC 60268-​ 16: 2011 and 2020, Objective Rating of Speech Intelligibility by Speech
Transmission Index.
11. ANSI standard S3.5 1969 Methods for calculation of the Articulation Index.
12. ANSI S3.5 1997 (R 2017) Methods for Calculation of the Speech Intelligibility Index.


13. Reichardt, W., Alim, O., and Schmidt, W. Applied Acoustics 7 (1974) and Acustica 32 (1975).
14. Houtgast, T., and Steeneken, H.J.M. The modulation transfer function in room acoustics as a
predictor of speech intelligibility. Acustica 28 (1973).
15. Schroeder, M. Modulation transfer functions: definition and measurement. Acustica 49 (1981).
16. Mapp, P. Is STIPA a robust measure of speech intelligibility performance? AES 118th Convention,
Barcelona (2005).
17. Steeneken, H.J.M., Verhave, J.A., McManus, S., and Jacob, K.D. Development of an accurate,
handheld, simple-to-​use meter for the prediction of speech intelligibility. Proceedings IoA 2001,
Reproduced sound (17) UK (2001).
18. Hammond, R., Mapp, P., and Hill, A. The influence of discrete arriving reflections on
perceived intelligibility and Speech Transmission Index measurements. AES 141st Convention,
Los Angeles (2016).
19. ISO 7240 –​Fire detection and alarm systems.
20. Mapp, P. Some practical aspects of STI measurement and prediction. AES 134th Convention,
Rome (2013).
21. Mapp, P. Speech Transmission Index (STI) measurement and prediction uncertainty. In
R. Peters (Ed.), Uncertainty in Acoustics. CRC Press, London (2020).
22. Peutz, V.M.A. Articulation Loss of Consonants as a criterion for speech transmission in a room.
JAES 19(11) (1971).
23. Klein, W. Articulation Loss of Consonants as a basis for the design and judgement of sound
reinforcement systems. JAES 19(11) (1971).
24. Szuts, T., and Schwenke, R. Speech Coherence Index: An intrinsically source independent
method to estimate intelligibility. Proceedings of 22nd ICA (2016).

8 Acoustic Modelling –​Basics


Stefan Feistel and Wolfgang Ahnert

8.1 Why to Model and What Modelling Can Do


Over the course of the last few decades, computer modelling has become a standard practice
for acoustic consulting offices as well as for application engineers at installation companies
and loudspeaker manufacturers. There are a number of reasons for this development. Around
the world the demand for more and better entertainment venues is growing. Similarly, more
focus is being put on reliable and high-​quality evacuation systems as regulations in this field
are tightening up. On the architectural side, the application of concrete, glass and steel as
the main elements of modern building designs is creating substantial acoustic challenges as
well. In response to these advancing requirements manufacturers of loudspeakers and sound
systems have developed –​and continue to do so –​new and innovative solutions, such as
line array systems with configurable sound radiation. New absorber products are entering
the market as well.
The complexity of the situation, i.e., customer requirements and unforgiving regulations
on the one hand and a diversity of fairly complicated technical solutions on the other hand,
makes detailed planning inevitable. Computer modelling of sound systems and room acous-
tics has therefore become the preferred process to mitigate the technical and economic risks
of a typical project; see Figure 8.1.
Speaking more practically, the goal of most projects is either to improve an existing
venue or facility, or to design a new one. Depending on the purpose of the space, aspects
relating to speech or music performances may be dominant or aspects relating to the sound
reinforcement of speech for announcements and emergency situations. Besides theatres,
churches, airports and stadiums, for example, acoustic planning is also often required in
the case of very specific applications such as drilling platforms, factory halls or cruise ships.
For all these cases, acoustic modelling software is used to validate designs, detect problems
early and compare scenarios. In particular, challenges can arise based on the geometry of
the planned or existing room, the choice and configuration of the sound system, and the
type, the amount and the location of acoustic absorption materials. For example, disturbing
reflections or echoes, insufficient signal levels, low speech intelligibility, limited transmission bandwidth or an erratic frequency response can occur. These issues could obviously prevent the room or loudspeaker system from being used as intended.
Consequently, acoustic simulations are commonly employed to develop a proof of concept
or to verify and troubleshoot a design. Last but not least the use of modelling tools also leads
to a generally better understanding of the space in question and its acoustic characteristics.
Looking beyond analysis and design, simulations are also performed to generate output
data required by other parties involved in the project. For most projects it is necessary to

DOI: 10.4324/9781003220268-8

252 Stefan Feistel and Wolfgang Ahnert

Figure 8.1 Top: computer model of the main railway station of Berlin (EASE software by AFMG).
Bottom: Holoplot loudspeaker system installed at Frankfurt Hauptbahnhof (main station).
Figure 8.2 Exemplary distribution of direct SPL across listening areas in a medium-​size church (EASE software by AFMG).

Figure 8.3 Ambisonics reproduction room.

adequately document possible problems in reports as well as proposed design solutions. More
often than not the client or the contracting authority has to be convinced of the conceptual
approach. The problem and the solution approach must be presented in a way that is easily
understood by people who are not experts in the acoustic field (Figure 8.2). While objective
quantities in the form of tables and graphics will help with that, an actual demonstration of
the modelled performance of a sound system or the acoustics of a room will often be most
helpful. The acoustic characteristics of the space can be easily evaluated subjectively by
means of auralization, which is the process of making a sound in the room audible through
computational means, and without the need to actually construct the facility. For this pur-
pose, headphones (binaural reproduction) or specific loudspeaker setups (e.g., Ambisonics)
can be used; compare Figure 8.3.
Acoustic modelling is used in other specific applications as well. For example, many AR/​
VR applications rely to some degree on simulated acoustics in order to realistically repro-
duce certain sounds in the virtual environment. Acoustic simulations are also often used
for educational purposes to illustrate or teach basic concepts, e.g., to university students.
A field that is remotely related but also gaining traction is the creation of acoustic effects
for movies, games or AR/​VR scenes by using basic simulation models.

8.2 Required Data for Simulation Models


When performing acoustic simulations, the input data as well as the parameter settings
are critical. Erroneous input data will lead to erroneous results. Choosing the simulation
parameters in a way that is not adequate for the venue or computational method will also


lead to questionable –​if not unusable –​results. Therefore, care needs to be taken throughout
the process.
The input data for room models can be divided into three categories:

1. Model geometry
2. Acoustic materials
3. Loudspeaker data

An overview of each category will be given in the following.


It should be emphasized that any simulation can only be as accurate as the input data
feeding it. When the geometrical information is wrong, when loudspeaker data are too
rough or when absorption coefficients are only roughly estimated instead of measured, the
simulation results cannot be precise. Similarly, the accuracy of the modelling results will
always depend on the least precise parts of the model. It is useless to enter the room geom-
etry in very fine detail if the loudspeaker data, or more generally the sound source data, are
very coarse.

8.2.1 Input Data for the Model


For room acoustic investigations a geometrical model of the room is required. It must be
sufficiently accurate for the frequency range of interest. However, in practice construction
information is rarely more precise than about ±5 cm (Figure 8.4). This means that usually
reflection studies will lose precision above 4 kHz when looking at the time arrival of a single
reflection or the coherent combination of individual reflections. For the high-​frequency

Figure 8.4 3D computer model of a German church (Frauenkirche Dresden) that shows the level of
geometrical details typically used for acoustic indoor models (EASE software by AFMG).

Figure 8.5 Illustration of scattering effects: at low frequencies (left) the fine structure of the surface
is ignored. For wavelengths of the order of the structure’s dimension, the incident sound
wave is diffused. At shorter wavelengths, geometrical reflections dominate again (courtesy
Vorländer 2006).

range, a statistical set of reflections is therefore a more reliable source of information. On the other hand, for frequencies below 250 Hz spatial fine structures of 10 or 20 cm size will not matter as the wavelengths involved are much larger.
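The orders of magnitude quoted above can be checked with a minimal Python sketch. The half-wavelength criterion used here is an assumption chosen for illustration, not a rule from the text:

```python
# Illustrative sketch: relate a geometric construction tolerance to the
# frequency above which single-reflection accuracy degrades. The
# half-wavelength criterion is a plausible rule of thumb, assumed here.
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def wavelength(frequency_hz: float) -> float:
    """Acoustic wavelength in metres."""
    return SPEED_OF_SOUND / frequency_hz

def critical_frequency(tolerance_m: float) -> float:
    """Frequency at which the tolerance equals half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * tolerance_m)

if __name__ == "__main__":
    tol = 0.05  # +/-5 cm construction tolerance
    print(f"wavelength at 4 kHz: {wavelength(4000.0) * 100:.1f} cm")
    print(f"half-wave limit for {tol * 100:.0f} cm tolerance: "
          f"{critical_frequency(tol):.0f} Hz")
```

With a 5 cm tolerance this yields about 3.4 kHz, consistent with the statement above that reflection studies lose precision above roughly 4 kHz.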
Some ray-tracing calculation methods account for geometrical details by means of a
scattering factor. When employing this approach, the complicated high-​frequency reflec-
tion behaviour of a detailed surface that is planar at low frequencies is converted into a
single numerical coefficient (Figure 8.5). This number describes in a statistical sense how
much of the incident sound is reflected in random directions rather than according to the
geometrical orientation of the surface element. Even though the choice of the scattering
coefficient requires some care, this approach has proved to be a reasonable solution for phys-
ically and numerically describing non-​smooth surfaces.
Similarly, some ray-tracing approaches implement an approximation of edge diffraction
effects. By means of a first-​order correction to the ray-tracing procedure, these methods
simulate related wave-​based effects by accounting for edges and corners along the way when
computing the propagation path of the ray.

8.2.2 Loudspeaker Data


For sound system designs the modelling data representing the actual loudspeakers in the
room are critical. The sound radiation pattern as well as the sensitivity and the max-
imum SPL capabilities of a loudspeaker are usually measured in dedicated environments or
laboratories in order to achieve the required degree of accuracy. For modern sound systems
that usually consist of many arrayable elements it is particularly important that both mag-
nitude as well as phase data are measured at a high angular resolution. In addition, detailed
mechanical and electronic information must be included, for example, describing the
mounting and aiming options of a line array and its filter settings. If all data are acquired
and combined carefully, the modelling accuracy even for large loudspeaker systems can be
quite high, up to ±1 dB regarding the simulated SPL across the relevant frequency range.
This precision is approximately equivalent to the limit of human audio perception.
In the last couple of decades, the representation of loudspeaker data in the acoustic
model has undergone a number of evolutionary steps. While initially the loudspeaker
data sets describing directional transfer functions have often consisted of only 1/​1 octave

Figure 8.6 Directivity balloon for a line array (EASE SpeakerLab software by AFMG).

magnitude data at a 10° angular resolution, modern approaches are based on 1/​24th octave
data at up to 1° or 2° angular spacing (Figure 8.6). In addition, low-​frequency models such
as BEM are used in an effort to better describe the baffle effects between neighbour cabinets
of an array which cannot be measured easily because of the size and the weight of a large
system. For some specific applications loudspeaker directivity data are also represented in
spherical harmonics instead of directional transfer functions.
A related area of research has moved into focus over the last few years. It deals with the question of how natural sources such as human speakers, singers or musical instruments can be modelled with reasonable accuracy. In this respect the frequency response as well as the directional radiation patterns are of great interest, as they vary over time and frequency and also depend on the individual talker or musician. For reliable simulation results these characteristics must be quantified in a reproducible manner. It is of similar interest which
parameters can be used to describe and configure such types of sources in the simulation


software [1]. A third, very interesting aspect in this regard is how to account for the effect
that musicians adapt to their acoustic environment, i.e., the room that they are playing in.

8.2.3 Wall Materials


By creating the geometrical model, the primary structure of the space of interest (shape,
size, dimensions etc.) is realized. Additionally, the so-​called secondary structure must be
defined, that is, the boundaries of the room model, such as floor, ceiling and side walls, need
to be assigned corresponding acoustic properties.

8.2.3.1 Absorber Data


For room acoustic analysis it is most important to have accurate information about the
absorption properties of the surface elements. In the simplest case, the sound absorp-
tion coefficient of each particular material is known. The standards ISO 354 and ASTM
C423 have established basic measurement procedures for these data. However, these
measurements are made in a reverberation chamber and are based on random-​incidence
assumptions (Figure 8.7). They are also less precise for sample sizes or mounting types
different from the ones used in the measurement setup. This means one must be careful
when drawing conclusions about single reflections when using such data for the computer
model. In consequence, interpretations derived from modelling results will be more reliable
when many reflections are considered as a whole. Obviously, angle-​dependent data would
be more accurate for single reflections. However, they are rarely measured and are available
only for special applications.
Typically, absorption data sets are available for the frequency range from 63 Hz or 125 Hz up to 4 kHz or 8 kHz at resolutions of 1/1 octave or 1/3rd octave frequency bands.

Figure 8.7 Material measurements in the reverberation chamber.




They are published by laboratories or manufacturers in a tabular format and can therefore
be imported easily. Usually, simulation programs also provide a database of materials, some
of them with more than 2000 different data sets.

8.2.3.2 Scattering Data


As stated earlier some calculation models allow the use of scattering data according to
ISO 17497 in order to describe the high-​frequency reflection characteristics of non-​smooth
surfaces. Most often such data are available for specific commercial products, such as diffusor
elements. They can also be derived using dedicated modelling software, e.g., BEM-​based
AFMG Reflex (Figure 8.8). However, in most cases scattering coefficients will have to be
estimated by rules of thumb. On the one hand, this is simply caused by the lack of avail-
able data. On the other hand, the scattering characteristics of a surface element cannot
be considered in isolation from the surrounding elements in the room. They have to be
estimated in the context of the nearby geometry and will therefore be model-​specific.
The scattering coefficient ranges between values of 0 (fully specular) and 1 (fully
scattering), where typically all surfaces have some amount of specular and some amount of
scattering behaviour.

8.2.3.3 Diffraction, Low-​Frequency Absorption


For low-​frequency modelling it is critical to have not only absorption coefficient data for
the surfaces but rather complex impedance data because the modal behaviour of a room also
depends on the reactive properties of its boundaries. In other words, below the Schroeder
frequency (eq. (2.11)), wave-​based calculation methods must be applied and provided
with adequate input data. Since an analytical solution is impossible for the complicated
boundary conditions of real-​world venues, numerical routines have been developed. The
finite element method (FEM), the finite difference time domain method (FDTD) and
the boundary element method (BEM) are commonly used; more details are given in the
following section.
For these methods the acoustic impedance of each individual surface element must be
known. As a first approximation, the impedance of the wall material can be derived from
the known absorption coefficient. Of course, measurements can also be conducted to deter-
mine the complex impedance of the surface, for example, using an impedance tube or a
sound intensity PU probe. However, as of today such measurements have to be taken and
applied very carefully as they are subject to many uncertainties.
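The Schroeder frequency referenced above (eq. (2.11)) is commonly written as f_S = 2000·√(T/V), with reverberation time T in seconds and room volume V in cubic metres. A short Python sketch with illustrative room values:

```python
import math

# Schroeder frequency in its commonly used form: f_S = 2000 * sqrt(T / V).
# Below f_S the room response is dominated by individual modes (wave-based
# methods apply); above it, statistical methods apply.
def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    return 2000.0 * math.sqrt(rt60_s / volume_m3)

if __name__ == "__main__":
    # Illustrative cases: a small studio vs a large church
    print(f"studio (60 m^3, 0.3 s): {schroeder_frequency(0.3, 60.0):.0f} Hz")     # ~141 Hz
    print(f"church (8000 m^3, 4 s): {schroeder_frequency(4.0, 8000.0):.0f} Hz")   # ~45 Hz
```

The example shows why wave-based methods matter mostly for small rooms: the smaller and less reverberant the space, the higher the frequency up to which modal behaviour dominates.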

8.3 Simulation Methods


When modelling the acoustic behaviour of a room or venue it is important to understand
that a space is characterized basically by its impulse response or transfer function. The room
impulse response (RIR) describes how sound radiated by one or multiple sources is received
at a defined location. This time-​based function and its Fourier transform, the frequency-​
based transfer function, can be considered the unique ‘fingerprint’ of the room. This finger-
print contains all of the information that is required to derive acoustic quality criteria as
well as to experience the room by auralization.
For this reason, all simulation methods aim at calculating or estimating the RIR. They
differ in their computational performance, memory requirements and result accuracy. The
Figure 8.8 Exemplary scattering and diffusion behaviour of a Schroeder diffuser computed by AFMG Reflex.

Figure 8.9 Schematic structure of a reflectogram.

tradeoff between these aspects depends strongly on the type of room. For simpler rooms faster
and less accurate methods may be employed whereas for complex spaces time-​consuming,
more precise modelling approaches must be used. It is often left to the experience of the user to decide which approach is appropriate in which case.
In the time domain the RIR can be characterized by dividing it into three distinct parts
(Figure 8.9). The first part is the direct sound arrival from the source or sources at the
receiver. The second part consists of early, discrete reflections that have a defined level and
arrival time. The third part is the diffuse reverberation which consists of many late specular
reflections that overlap as well as scattered reflections from random directions. The point in time at which a significant number of reflections are received and subjectively no longer perceived as individual reflections is called the reverberation onset.
In the frequency domain the room transfer function consists of two different parts
(Figure 8.10). For low frequencies the room response is normally dominated by modal
behaviour. In this region the transfer function therefore shows the peaks and dips of the
room modes that are excited by the source. These are usually few, which is why the response
shows a rather smooth contour. For high frequencies the density of modes is very high and
modes overlap strongly so that the course of the response function represents the statistical
average across many modes. The transition region between these two regimes is located
around the Schroeder frequency, which again is a function of reverberation time and room
volume; compare eq. (2.11).
Most simulation methods have been developed exploiting these characteristics. In the
time domain, different methods are used to determine the direct sound, the early reflections
as well as the reverberation. In the frequency domain, wave-​based methods are typic-
ally used for the low-​frequency region and particle-​based methods are used for the high-​
frequency range. An overview over the most common approaches is given in the following.

8.3.1 Direct Field


Accurately modelling the direct field of a number of loudspeakers requires high-​resolution
data for the radiation characteristics of the loudspeakers involved. It was shown in [2] that
in order to properly image physical superposition results in the computer model, for example
in the case of the elements of a line array, a point source model based on complex transfer
functions for each angular direction can be used if the angular and frequency resolution

Figure 8.10 Exemplary room transfer function measured in a medium-​size room (EASERA software
by AFMG). Typical smooth, modal structure in the frequency range 50 Hz to 300 Hz;
typical dense, statistical structure for frequencies above 1 kHz.

are high enough. Using such data, simulation results can be accurate up to about ±1 dB for
typical sound systems. However, this so-​called CDPS (complex directivity point source)
model only works in the far field of the single element or measured transducer. It also
assumes that diffraction or boundary effects by the loudspeaker cabinets are included within
the measurements. For that reason, the detailed low-​frequency analysis of a loudspeaker
arrangement is often conducted during the product design phase using BEM (boundary
elements method), which uses a wave-​based approach in the frequency domain and can
account for any baffle or edge diffraction effects by the loudspeaker case or its surroundings.
However, it is not computationally affordable at high frequencies. Worth mentioning is
also another specialized method, which is the measurement and storage of loudspeaker
radiation data in the form of spherical harmonics coefficients. Since fairly high orders are
needed to reproduce the magnitude and phase information at high frequencies this format
has advantages primarily for single transducers and simpler loudspeakers with less complex
directional characteristics.
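The core of the CDPS model, the complex (magnitude and phase) summation of point-source contributions, can be illustrated with a deliberately simplified Python sketch. The elements here are idealized as omnidirectional with unit amplitude; an actual CDPS computation would additionally multiply each term by the measured complex directional transfer function of the element:

```python
import cmath
import math

SPEED_OF_SOUND = 343.0

def pressure_at(receiver, sources, frequency_hz):
    """Complex sound pressure (arbitrary units) at the receiver, summing
    idealized spherical waves exp(-jkr)/r from unit-amplitude point sources."""
    k = 2.0 * math.pi * frequency_hz / SPEED_OF_SOUND  # wave number
    total = 0j
    for src in sources:
        r = math.dist(receiver, src)
        total += cmath.exp(-1j * k * r) / r
    return total

if __name__ == "__main__":
    # Vertical line of 8 elements spaced 0.3 m apart (illustrative geometry)
    array = [(0.0, 0.0, 0.3 * i) for i in range(8)]
    on_axis = (20.0, 0.0, 1.05)   # in front of the array centre
    off_axis = (20.0, 0.0, 12.0)  # far above the array axis
    f = 1000.0
    # Constructive summation on-axis, destructive interference off-axis:
    print(abs(pressure_at(on_axis, array, f)),
          abs(pressure_at(off_axis, array, f)))
```

Even this toy model reproduces the essential behaviour of a line array: the coherent on-axis sum is roughly an order of magnitude larger than the off-axis sum, where the element contributions largely cancel.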

8.3.2 Early Reflections


Besides the direct sound, the early, specular reflections contained in the room response are
important for subjectively localizing sound sources. They also contribute to colourations
in the tonal characteristics of the source as well as to any perceived echoes. Because the
human ear is quite sensitive with respect to delay, angle of incidence and level of these
reflections, simulation models try to determine them very precisely. This is often done using
systematic or quasi-​systematic searches based on the image source model. Sometimes the


image source model is used directly. But as it is computationally intensive for higher orders, a ray-tracing, cone-tracing or pyramid-tracing approach is often used instead to detect possible or likely reflection paths. In this case the image source method is only used in post-processing to refine the results.
Typically, a cut-​off order with respect to the considered number of reflections or a cut-​
off time with respect to the time length of the room response is to be defined as otherwise
the computational effort becomes too high. For determining objective quantities, such as
speech intelligibility STI, the accuracy requirements are lower than for sensitive listening
tests using auralization. Therefore, the former may use reflection orders up to 3 or 5, whereas calculations for auralization purposes may extend to orders of 10 or 15.

8.3.3 Reverberation
The reverberation part of the room response is usually determined differently from the
direct field and the early reflection’s part. That is because the precise knowledge of indi-
vidual sound arrivals is not as important for late or scattered (non-​specular) reflections.
The density of reflections grows over time and their respective level declines. Therefore,
often Monte Carlo-​based statistical analysis is used to estimate the reverberant field. For
example, in AFMG’s EASE AURA software, particles are emitted by the sound source in
random directions, traced through the room and detected at receiver locations. If enough
particles are used and the randomization is statistically correct, the result will converge
quickly even though only a small fraction of all possible reflection paths are actually
included.
Sometimes the reverberant tail of the room response is artificially generated based on
results from statistical room acoustics, e.g., using the Eyring RT equation. This approach
will, however, only work in rooms that have an approximately homogeneous and isotropic
diffuse field, which is not true for most acoustically challenging environments.
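The Eyring equation mentioned above can be sketched as follows. In its common metric form it reads T = 0.161·V / (−S·ln(1 − ᾱ)), with volume V in m³, total surface area S in m² and mean absorption coefficient ᾱ; the room values below are illustrative:

```python
import math

# Eyring reverberation time: T = 0.161 * V / (-S * ln(1 - a_mean)).
# Only valid under the diffuse-field assumption discussed above.
def eyring_rt(volume_m3: float, surface_m2: float, alpha_mean: float) -> float:
    return 0.161 * volume_m3 / (-surface_m2 * math.log(1.0 - alpha_mean))

if __name__ == "__main__":
    # Hypothetical shoebox hall: 30 x 20 x 10 m, mean absorption 0.25
    V = 30.0 * 20.0 * 10.0
    S = 2.0 * (30.0 * 20.0 + 30.0 * 10.0 + 20.0 * 10.0)
    print(f"Eyring RT60: {eyring_rt(V, S, 0.25):.2f} s")  # ~1.53 s
```

Such a statistically generated tail inherits all limitations of the diffuse-field assumption, which is why it should be applied with care in acoustically complex rooms.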

8.3.4 Room Modes


The above-​mentioned methods for determining the reflective and reverberant characteristics
of a space are normally only valid in the high-​frequency limit, i.e. where the wavelength
is small compared to the characteristic dimensions of the room and its surfaces. For large
wavelengths, such as at 30 to 80 Hz, or for small rooms, such as recording studios, this
assumption typically no longer applies.
In this case, different approaches must be used in order to account for the wave-​based
nature of sound. Established methods include:

1. Finite element method (FEM): This approach is based on numerically solving the wave
equation in the frequency domain. It is mostly used for closed spaces, from loudspeaker
cabinets to small rooms. Because the space has to be meshed at a resolution finer than
the wavelength it is computationally very expensive and not feasible to use it for large
venues or high frequencies (Figure 8.11).
2. Boundary element method (BEM): Similar to FEM this approach also solves the wave
equation numerically in the frequency domain. However, it uses points only on the
modelled surface. It is often used for modelling the acoustic properties of loudspeaker
cones, baffles, diffusers and other structured surfaces. It is limited in a way similar to FEM
as the surface of interest also has to be resolved into a grid finer than the wavelength.

Figure 8.11 Computed modal sound field of a studio room (courtesy AFMG) showing the surfaces of
equal pressure.

3. Finite difference time domain method (FDTD): Due to the obstacles faced by using
FEM and BEM, FDTD has received more attention in the past years. This approach
is applied in the time domain. It models the sound wave numerically as it propagates
through the room. While FDTD still requires a spatial grid the calculations can be
conducted in a computationally more efficient way and a consistent, broad-​band result
is obtained. However, in practice this approach is primarily employed for academic
research due to the necessary calculation times, which are still long.

Using such methods, low-​frequency room modes, pressure distributions as well as transfer
functions at receiver positions may be calculated. As a practical example, even if the
receiver has no direct line of sight to the position of the source, e.g., if it is located behind
a pillar, the diffracted direct sound is computed at the receiver. As an approximation of
this wave-​based approach, some particle-​or ray-​based simulation programs have introduced
edge diffraction models. These allow accounting for diffracted sound, for example, from the
orchestra in the pit to the visually shadowed stalls (see references 8 and 9 in Chapter 5).
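While the numerical methods above handle arbitrary geometries, the mode frequencies of an idealized rigid-walled rectangular room have a closed-form solution, f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²), which is useful for sanity-checking wave-based results in simple cases. A short Python sketch with illustrative room dimensions:

```python
import math
from itertools import product

SPEED_OF_SOUND = 343.0

# Closed-form mode frequencies of an ideal rigid-walled rectangular room.
# Useful as a reference case when validating FEM/BEM/FDTD implementations.
def room_modes(lx, ly, lz, f_max, n_max=10):
    modes = []
    for nx, ny, nz in product(range(n_max + 1), repeat=3):
        if nx == ny == nz == 0:
            continue  # skip the trivial (0,0,0) case
        f = (SPEED_OF_SOUND / 2.0) * math.sqrt(
            (nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)

if __name__ == "__main__":
    # Hypothetical 6 x 4 x 3 m room, all modes below 80 Hz
    for f, idx in room_modes(6.0, 4.0, 3.0, 80.0):
        print(f"{f:6.1f} Hz  mode {idx}")
```

For this room the lowest (axial) mode falls at about 28.6 Hz; the growing density of entries towards 80 Hz already hints at the transition to statistical behaviour above the Schroeder frequency.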

8.4 Numerical Optimization


Over the past few years, the widespread use of loudspeaker arrays and the increased
accuracy of simulating their radiation pattern have led to another technological

Figure 8.12 Numerical optimization scheme for sound system configurations as used by AFMG
FIRmaker.

innovation: beam-​steering based on FIR filters. This approach uses modelling results for
the unprocessed loudspeaker systems in order to calculate FIR processing configurations
specifically for a given venue or audience layout. In this manner the sound radiation of loud-
speaker arrays can be optimized to match the geometry of each individual space as well as
possible. In addition, it is also possible to avoid the radiation of unwanted sound into other
selected parts of the room.
The FIR filters are typically derived using numerical optimization methods. The room
geometry, the location of the sound sources and the design goals are given as input values.
The algorithm then evaluates a large number of possible sound system configurations, i.e.
FIR filters, and tries to find those configurations that yield the best results with respect to
SPL, coverage, power efficiency and other criteria (Figure 8.12).
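The simplest special case of such a configuration is pure delay steering, where each FIR filter reduces to a single delay that time-aligns all boxes towards one target point. The following Python sketch uses illustrative geometry; a real optimizer, such as the scheme shown in Figure 8.12, searches full FIR filter sets against SPL, coverage and efficiency criteria instead:

```python
import math

SPEED_OF_SOUND = 343.0

# Delay steering as the simplest 'FIR' configuration: each element of a
# line array is delayed so that all wavefronts arrive at the target
# point simultaneously (a sketch, not a production algorithm).
def steering_delays(element_positions, target):
    distances = [math.dist(p, target) for p in element_positions]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

if __name__ == "__main__":
    # Hypothetical hung line array: 8 boxes, 0.35 m spacing, 2 m trim height
    array = [(0.0, 0.0, 2.0 + 0.35 * i) for i in range(8)]
    target = (25.0, 0.0, 1.7)  # listener position at ear height
    for i, tau in enumerate(steering_delays(array, target)):
        print(f"box {i}: {tau * 1000:.3f} ms")
```

Full FIR optimization generalizes this idea from one delay per box to a complete filter (magnitude and phase per frequency) per box, which is what allows the simultaneous control of coverage and tonal balance described above.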
The improvements that can be achieved by this kind of optimization can be substan-
tial, for example, with respect to SPL uniformity. The diagrams of Figure 8.13 show the
measured frequency response on-​axis of a line array of 16 boxes covering a distance of
about 70 m. These so-​called positional maps show the colour-​coded level as a function
of frequency and distance from the array. The map on the top depicts the unprocessed
array whereas the map on the bottom displays the optimized results using one FIR filter
per box.
Obviously, the level distribution across the hall has become much smoother for the
entire frequency range. The standard deviation dropped from about 2–​3 dB between 200 Hz
and 6 kHz to about 1–​1.5 dB. Generally, pattern control is improved up to 13 kHz.
Given today’s abundance of computing power even in conventional desktop PCs as well
as the increasing capabilities of modern loudspeaker systems with respect to mechanical
control and signal-​processing, numerical optimization solutions will become a standard tool
in the future. Already now, so-​called Auto-​Splay features are widely provided by software
that is used for mechanical aiming of line arrays, such as AFMG’s EASE Focus. At this
point in time the first mass-​production 2D loudspeaker arrays, such as those offered by the
manufacturer Holoplot [3], are also entering the market and provide beam-​steering and

Figure 8.13 Positional maps showing an example of improvement of SPL uniformity when using FIR
numerical optimization. Top: without FIR optimization. Bottom: with FIR optimization.

beam-​shaping functions in the horizontal as well as vertical planes instead of being limited
to just the vertical plane.

8.5 Numerical Methods of Geometrical Acoustics

8.5.1 Image Source Method


The most prominent example of the image source method is the direct computation of
reflection paths using mirrored sources. This algorithm represents a systematic search for
possible reflections. It can have different termination criteria, such as reflection order or propagation time.
The essential steps of the algorithm consist of

1. Selecting a source and a receiver location.


2. Mirroring the selected source geometrically at the plane of every surface element of
the boundary. This yields N image sources when N is the number of surface elements.
3. For each image source the connecting line to the receiver location is determined.
⇨ If the intersection point of this line with the mirror plane is inside the corresponding
surface element, there is a reflection path from the original source to the receiver
(Figure 8.14a).

Figure 8.14a Image source method. Top: construction of image source S1 by mirroring at wall W1.
The connection from image source S1 to receiver E determines intersection point R1.
Bottom: construction of the (possible) reflection using the point R1.

Figure 8.14b Image source method. Construction of image source S2 by mirroring at wall W2.
Intersection point R2 is outside of the actual room surface. The reflection is impossible.

⇨ If the connecting line bypasses the surface element there is no geometrically pos-
sible reflection (Figure 8.14b).

Obviously, this procedure can be applied to determine first-​order reflections. It can also
be used recursively in order to calculate higher-​order reflections. Additionally, due to
the reciprocity of the algorithm the receiver could be mirrored instead of the source
as well.
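The steps above can be sketched in Python for a single rectangular surface element; the function and variable names are illustrative:

```python
# Sketch of the first-order image source test described above. The wall is a
# rectangle given by a corner point and two edge vectors.
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def first_order_reflection(source, receiver, corner, edge_u, edge_v):
    """Return the reflection point on the wall, or None if the reflection
    is geometrically impossible (step 3 of the algorithm above)."""
    n = cross(edge_u, edge_v)                  # wall normal (not normalized)
    # Step 2: mirror the source at the wall plane.
    d = dot(sub(source, corner), n) / dot(n, n)
    image = sub(source, scale(n, 2.0 * d))
    # Step 3: intersect the line image -> receiver with the wall plane.
    direction = sub(receiver, image)
    denom = dot(direction, n)
    if abs(denom) < 1e-12:
        return None                            # line parallel to the wall
    t = dot(sub(corner, image), n) / denom
    if not 0.0 <= t <= 1.0:
        return None
    hit = add(image, scale(direction, t))
    # Is the intersection point inside the rectangular surface element?
    local = sub(hit, corner)
    u = dot(local, edge_u) / dot(edge_u, edge_u)
    v = dot(local, edge_v) / dot(edge_v, edge_v)
    return hit if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 else None

if __name__ == "__main__":
    # 10 x 8 m floor at z = 0; source and receiver 2 m above it
    floor = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 8.0, 0.0))
    # Prints the floor reflection point, approximately (5, 4, 0)
    print(first_order_reflection((2, 4, 2), (8, 4, 2), *floor))
```

Applying the same mirroring recursively to each returned image source yields the higher-order reflections mentioned above.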

Figure 8.15 Ray tracing. Rays are stochastically radiated by a source S in random directions. Some hit
the detection sphere E after one or more reflections. In this example, the rays representing
the floor reflection RF and the ceiling reflection RC are shown. The direct sound path
indicated by D is computed deterministically.

While this method provides the complete set of possible reflections, it is at the same time computationally very expensive (the number of image sources grows roughly as N^k with reflection order k). Therefore, it is mostly used
for academic or theoretical investigations or for determining first-​order reflections only. In
practice different methods are required to compute reflectograms in complicated rooms and
at acoustically relevant reverberation times. It also has to be noted that this direct image
source algorithm cannot account for scattering effects.

8.5.2 Ray Tracing Based on Monte Carlo Methods


Commonly used ray-tracing methods often combine the image source method with the
use of a Monte Carlo approach. Typically, such a simulation models rays that are ran-
domly emitted by the sound source of interest. Each ray’s propagation through the room is
calculated by determining the next intersection point with a boundary or surface element
of the room. Then the direction of the ray is reflected at the plane of intersection so that it
faces the room again (Figure 8.15). Subsequent intersection points along the geometrical
reflection path are calculated as well.
Whenever the algorithm checks for impact at the boundaries, it also checks if the ray
hits a receiver in the room. For this purpose, receivers are modelled by spherical volume
objects: detection spheres or counting balloons. Whenever such a sphere is hit by a ray
the related reflectogram is updated with the ray’s information about impact time, SPL and
direction. At this point, many algorithms use the image source method to verify whether
the determined reflection path in fact is valid. This check is a must for highly accurate
calculation because the detection sphere may be relatively large. Some of the rays hitting
the boundary of the sphere may not be valid for the centre point of the sphere, i.e., the
actual receiver location. Also, multiple detections of the same path can be determined and
discarded at this point.
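The basic building blocks of such a ray tracer, namely random emission directions, specular reflection and the detection-sphere test, can be sketched in Python as follows; the traversal of the actual room geometry is omitted here:

```python
import math
import random

# Sketch of Monte Carlo ray-tracing building blocks (illustrative only).
def random_direction(rng):
    """Uniformly distributed unit direction on the sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)

def reflect(d, n):
    """Specular reflection of direction d at a surface with unit normal n."""
    k = 2.0 * (d[0]*n[0] + d[1]*n[1] + d[2]*n[2])
    return (d[0] - k*n[0], d[1] - k*n[1], d[2] - k*n[2])

def ray_hits_sphere(origin, direction, centre, radius):
    """True if the ray origin + t*direction (t >= 0, unit direction)
    passes through the detection sphere."""
    oc = tuple(c - o for o, c in zip(origin, centre))
    t = sum(a * b for a, b in zip(oc, direction))   # closest approach
    if t < 0.0:
        return False
    closest_sq = sum(c * c for c in oc) - t * t
    return closest_sq <= radius * radius

if __name__ == "__main__":
    rng = random.Random(1)
    # Fraction of randomly emitted rays hitting a 1 m sphere 10 m away:
    hits = sum(ray_hits_sphere((0.0, 0.0, 0.0), random_direction(rng),
                               (10.0, 0.0, 0.0), 1.0)
               for _ in range(100000))
    print(hits / 100000)
```

For a 1 m detection sphere at 10 m distance the detected fraction converges to the solid-angle fraction of roughly 0.25%, which illustrates why a large number of rays is needed for statistically stable reflectograms.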
It is an obvious advantage of the Monte Carlo-​based method that it converges more
quickly to a practically useful result than the direct image source method. Due to its statis-
tical, non-​systematic characteristics the most prominent reflection paths carrying most of
the reverberation energy are found. Reflections by a series of small, possibly insignificant
faces are largely ignored.
Acoustic Modelling – Basics 269


It is also advantageous that the Monte Carlo-​based method allows the inclusion of
scattering effects in a very natural way. One common implementation is to decide at each
reflection point based on the scattering coefficient of the surface whether the propagation
happens specularly (angle of incidence equals angle of exit) or in a random direction.
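This per-reflection decision can be sketched as follows. The diffuse branch here simply draws a uniformly random direction in the hemisphere above the surface; a physically stricter implementation would use cosine-weighted (Lambert) sampling, so this is an assumption of the sketch:

```python
import math
import random

def reflect_dir(d, n, scattering):
    """Decide at a reflection point whether the ray continues specularly or
    diffusely, based on the surface's scattering coefficient (0..1)."""
    if random.random() >= scattering:                # specular branch
        dot = sum(di * ni for di, ni in zip(d, n))
        return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))
    while True:                                      # diffuse branch
        v = [random.uniform(-1.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if 0.0 < norm <= 1.0:
            v = [c / norm for c in v]
            if sum(vi * ni for vi, ni in zip(v, n)) > 0.0:   # back into the room
                return tuple(v)
```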

8.5.3 Derivatives and Tail Extensions


There are various approaches that try to further optimize the above concepts with respect to
certain aspects, e.g., the performance for certain types of geometries, the precision of early
and later arrivals, etc.
For example, the above algorithm using a Monte Carlo simulation does not necessarily
have to be random-​based. A systematic or quasi-​random search for reflections using ‘for-
ward’ ray-tracing can still be much more efficient in practice than the image source method.
Similarly, the so-​called cone-tracing approach does not use rays that are sent into the room
but determines the intersections of a cone originating from the source (Figure 8.16). This
method can be used for systematic scans of the room while ignoring surface elements or
reflection paths with low probability. In a similar way pyramid-tracing approaches are used.
All of these algorithms basically differ primarily with respect to the geometrical form of
the traced ‘beam’ and the way of splitting it when the beam hits multiple surface elements
at once.
In complex or large rooms, a significant quantity of rays may be required to model
the response of the room up to a detection time that allows the prediction of acoustic
parameters such as the reverberation time. This can be both computationally expensive as
well as memory-​consuming. However, for most rooms the late part of the room response is
largely diffuse and contains little information about discrete reflections. That is why there
are numerous concepts that use estimation methods that try to derive the late diffuse field
from the early part of the room response as well as from geometrical information.
For example, tail estimation methods try to extrapolate the early decay of the rever-
berant energy to later times. They may also generate concrete late reflections derived from
reflection patterns detected in the early part. Overall, the goal of this class of methods is to
artificially generate a full impulse response or reflectogram based on limited data obtained
from one of the ray-tracing approaches introduced earlier.
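The simplest form of such tail estimation is a least-squares fit of an exponential decay to the early reverberant energy, extrapolated to later times. A sketch (the decay data are invented for illustration):

```python
import math

def extrapolate_tail(times, energies, t_late):
    """Fit a straight line to log10(energy) of the early reverberant decay and
    extrapolate the resulting exponential decay to later times."""
    logs = [math.log10(e) for e in energies]
    n = len(times)
    tm, lm = sum(times) / n, sum(logs) / n
    slope = (sum((t - tm) * (l - lm) for t, l in zip(times, logs)) /
             sum((t - tm) ** 2 for t in times))
    intercept = lm - slope * tm
    return [10.0 ** (intercept + slope * t) for t in t_late]

# Early decay of about 3 dB per 100 ms (corresponding to RT60 = 2 s):
early_t = [0.0, 0.1, 0.2, 0.3]
early_e = [1.0, 0.5, 0.25, 0.125]
tail = extrapolate_tail(early_t, early_e, [1.0])   # expected energy at t = 1 s
```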

Figure 8.16 Pyramid- or cone tracing. Schematic illustration of a beam tracing approach in two
dimensions. Cones with a defined opening angle are used to scan the room starting from
the sound source S. Receivers E located inside a cone are detected and validated.
270 Stefan Feistel and Wolfgang Ahnert

Figure 8.17 Radiosity method. Patch P3 is illuminated and excited by radiation from patches P1 and
P2. It may also radiate sound itself.

The radiosity method is another concept that tries to model the diffuse field itself. In
this case the original sound sources are not considered, but the surface elements or surface
patches are regarded as diffusely radiating boundaries instead. Each surface element is
considered as an emitter of sound energy that is sent into the room and onto other surface
elements (Figure 8.17).
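The core of a radiosity iteration is an energy-exchange step between patches, governed by form factors F(j→i) that describe how much of patch j's radiation reaches patch i. A deliberately tiny sketch (the form factors and reflectances are toy values):

```python
def radiosity_step(radiated, form_factors, reflectance):
    """One energy-exchange step: every patch collects the energy radiated by the
    other patches (weighted by form factors) and re-radiates the non-absorbed part."""
    n = len(radiated)
    received = [sum(radiated[j] * form_factors[j][i] for j in range(n))
                for i in range(n)]
    return [reflectance[i] * received[i] for i in range(n)]

# Two patches facing each other; each reflects half of the incident energy:
ff = [[0.0, 1.0],
      [1.0, 0.0]]
step1 = radiosity_step([1.0, 0.0], ff, [0.5, 0.5])   # energy moves to patch 2
step2 = radiosity_step(step1, ff, [0.5, 0.5])        # and halves on the way back
```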
Another way to improve performance and memory requirements of the Monte Carlo
approach above is to use particles instead of rays. The basic idea is that sound particles are
emitted by the sound sources and detected by receivers. However, their propagation path is
not stored and duplicates are considered in a statistical way. In other words, the particle does
not carry any information except for its energy, propagation time and travelling direction.

8.5.4 Wave-​Based Extensions


It is noteworthy that the methods explained above are based on geometrical models. It was
outlined earlier that these models are only valid for the frequency range roughly above the
Schroeder frequency (refer to Chapter 2). Their accuracy is low when applied to lower fre-
quencies. On the other hand, wave-​based models have been introduced as well (see overview
in section 3.2) and have been shown to be quite accurate for low frequencies but computa-
tionally still infeasible for higher frequencies. For this reason, several proposals have been
published for a combined model based on both geometrical and wave-​based acoustics. One
approach demonstrates that the different simulation methods can be executed separately
and their results can subsequently be joined using an adequate crossover function.
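One plausible form of such a crossover function is a smooth weight in log-frequency around the crossover point: below it only the wave-based result is used, above it only the geometrical one. The raised-sine shape and the one-octave transition width below are assumptions of this sketch, not a prescribed standard:

```python
import math

def crossover_blend(f, wave_val, geo_val, fc, width_oct=1.0):
    """Blend a wave-based result (trusted below fc) with a geometrical-acoustics
    result (trusted above fc) using a raised-sine weight over the transition band."""
    x = math.log2(f / fc) / width_oct        # -0.5..+0.5 spans the transition band
    if x <= -0.5:
        w = 0.0
    elif x >= 0.5:
        w = 1.0
    else:
        w = 0.5 * (1.0 + math.sin(math.pi * x))
    return (1.0 - w) * wave_val + w * geo_val

# Below the crossover only the wave-based value (-3 dB) is returned; at fc = 400 Hz
# both results contribute equally:
low = crossover_blend(100.0, -3.0, -6.0, fc=400.0)   # -3.0
mid = crossover_blend(400.0, -3.0, -6.0, fc=400.0)   # -4.5
```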

8.5.5 Performance Considerations


One of the biggest concerns when using numerical methods based on geometrical acous-
tics or wave-​based acoustics is computation time and required memory. While geometrical
methods are often faster and more efficient memory-​wise than wave-​based methods, quite
often they are still practically infeasible. Large stadiums, railway stations or airports with
high volumes and comparatively high reverberation times can take several hours up to days
of calculation time. Depending on the number of receivers and the result resolution, calcu-
lation runs may also require a high amount of computer memory.

Therefore, from a performance perspective a range of improvements have been applied.
For example, on the one hand multi-threading can be used for many algorithms as the
number of available threads can be spread easily across sound particles, rays or sound sources
if they can be computed relatively independently. On the other hand, appropriate space
partitioning approaches and other log-​based search methods can be used for efficiently
detecting intersection points of rays with surface elements or receiver spheres. Specific
shader codes running on GPUs [4] also seem to offer speed improvements for certain kinds
of ray-tracing calculations.

8.5.6 Limitations
When considering the numerical methods mentioned above, one must be aware that
any approach has its shortfalls, inaccuracies and limitations. One of the most important
limitations of the geometrical approach has already been stated: typically, these methods
cannot provide accurate information in the low-​frequency range. It is also a limitation of
typical ray-tracing approaches that if they are supposed to provide highly accurate results,
e.g., for listening purposes, they require large amounts of memory in order to store time,
level, directional and spectral information. Therefore, these can only be run in parallel for
a limited number of receiver locations.
Still, much of the result data may have to be compacted to some degree. For example,
reflectograms or impulse responses may be using a reduced time resolution of 1 or 5 ms, at
least for the late part. The computed time length of the response may have to be limited as
well. Such adaptations will obviously affect the result accuracy.
In many cases it is also important to consider how the algorithm treats phase relationships
between sources or reflections. The coherent radiation of sound, for example with respect
to the elements of a line array, requires that phase relationships are maintained. Discarding
phase information originating from the source radiation characteristics and the propaga-
tion path, as is often done for example when using particle models, will only work if the
phase response of one signal arrival (either direct sound or reflection) can be considered as
random (incoherent) relative to any other arrival. Obviously, there are other limitations as
well that stem from the underlying physical model for the simulation process. For example,
many algorithms assume that any reflections happen locally, i.e., that the surface absorbs
and reflects sound only at the point of intersection with the ray. It is also often assumed that
the propagation medium is homogeneous and isotropic, which is not true, for example, if
there are temperature gradients or air flow.
Another limitation is possibly imposed by the number of receivers, sources and surface
elements that are supported by the modelling algorithm. Calculation times may be very
long or the calculation may not be possible at all due to memory limitations. Similarly,
the number of particles or rays that are used for the simulation may be limited. This could
lead to results that are not as accurate as required because the number of rays is not suffi-
ciently high.

8.6 Model Data and Output


The modelling process yields a range of results. On the one hand, simulation results are used
for objective evaluation as well as for subjective validation such as listening to the room
being built. On the other hand, the model of the room and of the sound system provides
useful documentation for the project itself.

With respect to objective quantities the results of the simulation are often presented
as mappings, where location-​dependent acoustic parameters, such as SPL, clarity or
speech intelligibility, are shown as colour or contour plots in three dimensions relative
to the room geometry. The data for the mapping points are computed and displayed
directly on top of the surface of the room, on virtual mapping planes representing
audience areas, or at representative locations, i.e., well-​defined receiver points. Room
acoustic parameters such as reverberation time and others defined by ISO 3382 or STI
defined by IEC 60268-​16 are typically derived from impulse response data or echograms/​
reflectograms determined at the mapping locations (Figures 8.18 and 8.19). However,
these result data sets are usually not as detailed as those data sets that are generated for
auralization purposes.
Based on the visualization of mapping results, a more detailed analysis is often
performed when locations are recognized where important parameters such as signal level
or speech intelligibility are not within the acceptable range and must be improved. For
these spots, arrival times and levels are studied, reflection patterns are analysed, and pos-
sibly necessary wall treatments are investigated. Such a study is usually the result of a
thorough analysis that cannot be conducted for all mapping locations but only for a few
selected positions.
Similarly, high-​end auralization results are often derived only for a few representative
receiver locations. These results are typically available either as binaural filter files for repro-
duction via a headphone or stereo loudspeaker setup or as B-​format files for reproduction
using an Ambisonics-​based loudspeaker arrangement (Figures 8.3 and 8.20). More details
are discussed further below.
Another important set of modelling results is related to the configuration of the room
and of the sound system. This is important for reporting and documentation purposes and
also for presentations to a client or for use on site.

1. For example, the modelling process may yield a fairly detailed table about the proposed
sound system components, which brands and models are used, where loudspeakers
should be positioned and aimed and how they should be configured with respect to
gain, delay and filter settings.
Especially for line array systems it can be very helpful for system engineers to receive
the planned configuration of the system ahead of the event since they need to consider
the number of boxes required as well as the splay angles between individual elements
(Figure 8.21).
2. In a similar way, the acoustic treatment of the room can be documented in order to
generate reports concerning the acoustic materials that have to be used, for example,
to achieve a certain target reverberation time (Figure 8.22). Such a list may also be part
of a set of documents supplied for a tender.
3. Last but not least, the room geometry itself can be documented in order to either show
the underlying data and assumptions made or the proposed changes in order to improve
the acoustic performance. This could include, for example, the positions of absorbers or
the shapes of mounting niches for loudspeakers.
Obviously, the room geometry can also be exported in a CAD file format such as DWG or SKP.
4. Most importantly, graphics of relevant calculation results and modelling data are
created in order to be made available to other parties working on the project, to be
included in reports or in documents for a tender.
Figure 8.18 Clarity C80 results shown as 3D mapping for theatre model (courtesy EASE 5 AURA by AFMG).

Figure 8.19 Typical example for a result echogram generated by ray tracing simulation methods (cour-
tesy EASE 5 AURA by AFMG).

Figure 8.20 Binaural setup with HRTF selections by head tracker. Blue: Right ear channel. Red: Left
ear channel.

Figure 8.21 Part of a typical project report (courtesy EASE Focus 3 by AFMG).


Figure 8.22 Computer model of a theatre with different acoustic materials assigned to walls, ceiling and floor (cour-
tesy EASE 5 by AFMG).

8.7 Modelling Reliability


When using simulation software, it is important to recognize that the results of the model-
ling process depend strongly on the quality of the input data. If the input data are rough, i.e.,
if there are high uncertainties, the output data will have high uncertainties as well. In add-
ition, assumptions made by the calculation engine and by the algorithms contribute to the
uncertainty budget as well. The user must therefore be aware that the quantitative results
achieved are only precise to the degree of the uncertainty. Unfortunately, modern simula-
tion software does not report prediction uncertainties yet. That is why the interpretation of
results requires not only expert knowledge but also practical experience.

8.7.1 Input Data


All of the input data for the acoustic simulation are connected to uncertainties.

1. The room geometry can only be entered or modelled as accurately as the dimensions
that are given by architectural drawings or on-​site dimension measurements. For
new rooms or venues, it is also foreseeable that the final space may not be built pre-
cisely according to the blueprint. In either case, dimensions may simply be wrong
by a few centimetres. Additionally, details may be forgotten or omitted on purpose
in the simulated room. It makes little sense to model a power plug or the knob of
a door for acoustic purposes because their effect is negligible at low frequencies
and at high frequencies calculation results (and uncertainties) are dominated by
other contributions. However, some other geometrical elements of the room may or
may not affect the end result significantly, e.g., small windows, door steps, ceiling
grids etc.
2. The acoustic materials used for the simulation contribute to the overall uncertainty
as well. First of all, in many cases there are no accurate absorption data available for a
modelled wall, floor or ceiling. In this case assumptions have to be made or dedicated
absorption measurements have to be conducted. Secondly, even if there are reasonable
measurement data available they are limited to the assumptions and conditions made by
the measurement standard, such as ISO 354 or ASTM C423. Depending on mounting,
size and circumference, the actual absorption characteristics of the material in the room
will be somewhat different from the data acquired for the standardized sample in the
reverberation room. Last but not least, most absorption data sets are published only at
one-​octave resolution and only for the range 125 Hz to 4 kHz. Therefore, extrapolation
and interpolation steps are required for many calculation approaches, e.g., if they work
with ⅓ octave data from 50 Hz to 20 kHz.
It is even more difficult to acquire additionally required data such as the scattering
coefficient of a surface material or structure. Often these have to be estimated based on
experience.
3. The data available for the sound sources used in the model play a significant role as
well. For most applications it is critical that the simulated loudspeakers have been
measured at high resolution, i.e., using impulse response or equivalent data, to deter-
mine the directional characteristics used in the simulation. These directional data sets
must include phase information and have a frequency resolution that is high enough
to avoid interpolation artefacts. Also, the angular resolution must be high enough to
avoid any sampling problems.


For loudspeakers with multiple drivers, and particularly for loudspeaker arrays, it is
important to measure and model the individual sound sources, such as the transducers,
rather than the far-field characteristics of the entire system. This is particularly
considered by the CDPS model [5] mentioned earlier and the practical implementa-
tion of it called the Generic Loudspeaker Library (GLL), which has found widespread
adoption in the audio industry. Additionally, for line arrays the accuracy of the box
drawings and box splay angles is critical as well. Small deviations, e.g., of 0.1 degree,
easily accumulate over the length of the array and may change the overall coverage
pattern of the system measurably, especially at high frequencies.

It has been shown in various publications that if the quality of all input data is sufficiently
high the direct field predictions are accurate within 1 dB compared to measurements. Room
acoustic results such as for the speech transmission index (STI) can be within the just-​
noticeable-​difference (JND), as well. However, if some of the input quantities are less pre-​
cise, their effect on the result accuracy will depend on how much they contribute to the
overall result. This is not always easy to estimate.

8.7.2 Model Calibration


To reduce the apparent uncertainty of the modelling results, in practice the simulation
model is often calibrated. This means first of all that some acoustical measurements of
the room are taken, provided the room already exists. After that, the input data with the highest estimated
uncertainty are varied in a way that the results computed for the existing room match the
measurements. As a next step, modifications to the room or sound system can be modelled
easily and more accurately. A very typical kind of calibration is the tuning of the room’s
RT by varying the absorption data of materials that are not very well known (Figure 8.23).
While obviously this approach allows adapting the absolute simulation results to be closer
to the absolute results from real-​world measurements, still no statement can be made about
the uncertainty.
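For the RT-based calibration shown in Figure 8.23, the adjustment can even be done in closed form: inverting the Eyring equation RT = 0.161·V / (−S·ln(1 − ᾱ)) yields the mean absorption coefficient required to match the measurement, from which the coefficient of the poorly known material follows. The venue values below are hypothetical and only illustrate the procedure:

```python
import math

def calibrate_alpha(rt_measured, volume, s_total, a_known, s_unknown):
    """Invert the Eyring equation RT = 0.161*V / (-S*ln(1 - a_mean)) to find the
    absorption coefficient of the one material whose data are poorly known."""
    a_mean = 1.0 - math.exp(-0.161 * volume / (s_total * rt_measured))
    return (a_mean * s_total - a_known) / s_unknown

# Hypothetical medium-size church: V = 8000 m^3, S = 2400 m^2, measured RT = 2.4 s;
# well-known materials absorb 300 m^2-sabins, 500 m^2 of wall finish is unknown:
alpha = calibrate_alpha(2.4, 8000.0, 2400.0, 300.0, 500.0)   # about 0.36
```

In practice the same calculation would be repeated per frequency band, since the measured RT and the absorption data are band-dependent.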

8.7.3 Modelling Engine


It is similarly important to consider the uncertainty contributed by the simulation
algorithms and their underlying assumptions.
For example, if the algorithm does not handle phase information for loudspeakers, it will
be mostly inaccurate with respect to predicting the performance characteristics of groups of
coherent, interacting sound sources, such as the elements of a line array system. As stated
before, geometrically based ray-tracing algorithms will not be able to simulate the low-​
frequency behaviour of a small room very well.
In fact, many computational approaches ignore effects that are of secondary practical
relevance in most cases but may be dominant in some. For example, usually prediction
algorithms assume that the propagation medium (air) is homogeneous and isotropic, which
means that they do not consider the effect of temperature gradients on the propagation
of sound or the effect of light winds or air movements on the coherence of signals from
different loudspeakers. Most ray-tracing algorithms also assume that the surface materials
are locally reacting, meaning that the surface area of a reflecting wall contributing to the
reflected wave is very small. In reality this is not true for some materials especially at low

Figure 8.23 Eyring reverberation time calculated for a medium-size church (courtesy EASE 5 by AFMG).

frequencies where, for example, a large part of a light-​construction wall can resonate when
a smaller part is excited by an incident sound wave.
It is equally important to precisely understand the settings and parameters for the calcu-
lation. In the simplest case, choosing a resolution setting for mapping on surfaces that is too
rough may easily lead to errors. Similarly, choosing too few particles may lead to erroneously
detected echoes or missing major reflections that will be established properly only when
using sufficiently high particle quantities. Sometimes there are other complex settings that
can be modified by the user and that may have a less clear effect on the end results.

8.7.4 Interpretation and Validation of Results


Assuming that the calculation results are reasonably accurate it is still important to consider
the actual definitions of the various acoustic parameters and quantities. Figures like rever-
beration time RT, clarity C50 and C80, or speech intelligibility STI are computed according
to standardized definitions. These definitions are subject to assumptions made concerning
the measurement signal, the frequency resolution, time cut-​offs, certain characteristics of


the human hearing and signal-​processing aspects, for example. Correctly interpreting the
calculation results requires some background knowledge about the acoustic quantities, their
applicability and limitations.
Obviously, calculation results can be validated by comparing quantities from the simula-
tion with the same quantities acquired through measurements in the actual room. If these
quantities do not match within their respective range of uncertainty the results can be
considered erroneous. If the uncertainty is unknown, however, a comparison and validation
is much more difficult. In this case estimates can be helpful. Calculation results can also be
validated with respect to the subjective impression. If the simulated parameters are within
the just-​noticeable-​difference (JND) of the measured parameters, the difference will not be
noticeable for human hearing. For most practical applications this is accurate enough.

8.8 Auralization
The goal and purpose of auralization are to make the properties of the room and sound
system audible. This requires several steps (Figure 8.24):

1. The detailed response function for the listening point(s) has to be computed. This typ-
ically includes information about the direct sound arrivals as well as reflections with
respect to arrival time, level, direction and frequency response. This data set is not yet
specific for any type of receiver or listening setup.
2. For binaural reproduction, e.g., via a headphone, the response function is convolved
with the head-​related transfer function (HRTF) selected for the modelled listener. For
each ear, the individual arrivals are weighted based on the HRTF according to their
direction of incidence. This yields a set of two monaural impulse responses, one for
each ear. It is a disadvantage of binaural setups that in-head localization may occur if
no head-tracking is used. When using head-trackers, the orientation of the head of the
human listener can be detected while listening, and the HRTF can be adjusted accordingly.
This tool, typically in combination with a rough visualization of the venue, gives
a fairly plausible impression of the acoustic conditions at the chosen listening location.

Figure 8.24 Schematic overview of binaural auralization process.

For spatial reproduction using a loudspeaker setup, such as Ambisonics, B-​format
filter files are generated based on the response function. This yields a set of filter
impulse responses, e.g. four filters for B-​format (first order) or nine filters for B-​format
(second order). It is a disadvantage of this technique that higher orders provide a more
accurate listening experience but significantly reduce the size of the sweet spot where
the listener must be located.
3. In the final step the filter data set can be convolved with any dry signal, such as recorded
music or speech, to listen to the signal as it would sound in the simulated room at the
selected listening location.

Auralization is a very powerful and effective tool to present acoustic problems and solutions
to laymen. It is also a good tool to obtain a general acoustic impression of a space at various
locations in the room.
For practitioners of auralization a number of aspects are important to consider.
Auralization in general cannot and must not be the only basis for acoustic design
decisions or even commercial tenders. Auralizations are always subjective as they depend
on the perception of the listener and therefore can only complement objective, quanti-
tative results. When working with laymen in the field of acoustics auralization is a good
tool to create a general awareness of different acoustic effects and more clearly illustrate
the difference between different design options. In contrast, professional listeners, such
as musicians, may often be distracted by the perceived artificiality of the auralization
as a simulation result cannot accurately reproduce all details of a real-​world listening
experience.
It should be clear that auralization results are limited by the same factors that were
discussed before for general simulation results. For example, to this day, no available simu-
lation program can adequately model the low-​frequency behaviour of rooms, particularly
also diffraction effects. Aspects that are not considered by the simulation engine will be
missing in the auralization result as well. Comparably, if the input data into the simu-
lation are low-​quality the auralization results will suffer as well. A typical example is
insufficient information about the absorption characteristics of the wall materials. As
a consequence, simulations of venues with high-​quality demands on acoustics, such as
concert halls, should at present be generally complemented by scale-​model measurements
and studies.
Finally, auralization is a tool that builds on top of modelling results. The methods and
circumstances used for the auralization will also affect the accuracy of the reproduction.
Both binaural and spatially based auralization will always add uncertainties and detrimental
effects as indicated above. That means that even if the simulation results are highly accurate
a poor reproduction setup may still cause unconvincing auralization results.

References
1. SEACEN project, Weinzierl. 2017. A database of anechoic microphone array measurements of
musical instruments.


2. Feistel, S. Modeling the Radiation of Modern Sound Reinforcement Systems in High Resolution
(Berlin: Logos Verlag, 2014).
3. Holoplot GmbH, Germany, https://​holop​lot.com/​.
4. Shader programming, https://​en.wikipe​dia.org/​wiki/​Sha​der.
5. Feistel, S., and Ahnert, W. Modeling of loudspeaker systems using high-​resolution data. J. Audio
Eng. Soc., vol. 55, no. 7 (2007).
9 Audio Networking
Stefan Ledergerber

9.1 Introduction
Since the late 1990s, the professional audio industry has been shifting from point-to-​point
digital transmission formats (such as AES/​EBU or MADI) to IP-​based standards (such as
AES67). This packet-​based networking has brought massive flexibility, as well as enhanced
control and monitoring capabilities, to audio systems. It offers the flexibility of a physically
fixed installation becoming adaptable and expandable at a later stage through software con-
figuration and updates. Signal paths are no longer tied to physical cables but can be changed
at any time with the click of a mouse –​without the need for dedicated audio routing hard-
ware. It is in the nature of packet-​oriented transmission that audio signals automatically
reach the desired destination via the IT network.
Launched in 1996 by Peak Audio (later acquired by Cirrus Logic), CobraNet is widely regarded as the first successful
audio-​over-​ethernet network implementation and has become the backbone of many audio
installations, such as convention centres, theatres, concert halls, airports and theme parks.
There are still plenty of CobraNet installations, but issues with a relatively high latency
and limited scalability restricted its suitability in latency-​sensitive applications such as live
sound, recording studios and broadcast facilities.
Dante, developed by Australian company Audinate [1] and introduced about 10 years
after CobraNet, stands for ‘Digital Audio Network Through Ethernet’. Dante offers sev-
eral major benefits over the first generation of audio-​over-​IP technologies, including
better usability and higher compatibility with standard network infrastructure. Dante
benefits from a huge equipment ecosystem with thousands of devices by hundreds of
manufacturers.
Before Dante reached its current position of dominance, there was considerable excite-
ment around a technology called AVB (Audio Video Bridging) due to its robust nature and
high level of automatic configuration of AVB-​capable network hardware. Other industries
such as automotive and industrial automation adopted AVB and gave it a more general
name, as it no longer relates just to audio and video applications: AVB was renamed as TSN
(Time-​Sensitive Networking) by the developing group of cross-​industry manufacturers, the
AVnu Alliance [2]. Subsequently the Milan working group, a consortium of audio/​video
manufacturers, decided to develop a more refined specification for use in professional audio/​
video systems, called Milan. It is a specific version of TSN focusing on providing interoper-​
ability amongst audio/​video vendors, which the basic TSN specifications alone did not ensure.
However, TSN requires special IT hardware to take care of audio requirements and there
is only a limited number of switch models available that support TSN. Furthermore, it has
severe limitations in terms of size and scalability of installations. For these reasons this

DOI: 10.4324/9781003220268-9


chapter does not focus on TSN/​Milan but instead elaborates on how to use standard IT
hardware for real-time audio applications.
The transition to IP networks may be compared to the transition from analogue to digital
audio: some initial installations using the new technology may exhibit shortcomings in
handling or reliability compared to the traditional approach, but these will disappear over
time. Hence, connecting audio signals will soon no longer be possible without the use of IT
network infrastructure. But there are areas where IT networks work fundamentally differ-
ently from traditional audio routing. Firstly, a standard IT network is not designed to meet
strict timing requirements as are common in audio. In a network environment, data packets
may get held up on their way by other packets on the same path, showing a significant
variation in time of arrival. With traditional audio cables, the timing of data transmitted
was not altered by the cable. Secondly, losing a packet may seem acceptable for regular IT
applications, as they get re-​sent automatically if lost. But for audio applications, for reasons
of minimizing latency, it is imperative that packets arrive the first time, since there is
not enough time for re-​sending. If some get lost, they instantly cause audible interruptions
in the signal. The common cause of packet loss is overload of links.
Therefore, just as roads get optimized to avoid traffic congestion, audio networks need
to be designed in such a way that there is enough bandwidth for all their users. If this
is the case, packet delivery will be on time, as required by audio applications. Ensuring
this is the motivation for an audio user to understand in some depth how IT networks
work and how their behaviour can be influenced by the configuration of their components,
e.g., switches and routers. But instead of a deep dive into understanding all details of IT
networks, it is often easier to over-​provision the network to such extent that all packets
arrive on time without further refinement of the switch configuration. In concrete terms,
this means building IT networks with sufficient bandwidth and using them exclusively for
audio applications, hence not mixing them with common office applications.
Most audio-​over-​IP technologies are developed under the assumption that the under-
lying network will perform as required: no packet loss and no severe collisions with other
packets should occur within the switch hardware. In some networks, especially if audio and
other traffic are mixed, it is important to give audio and synchronization packets priority
over others, e.g., office applications such as internet browsing or file-​copying. This can be
achieved with most commercially available switches today.
The focus of this chapter is low-​latency applications using uncompressed audio signals,
such as high-​quality audio productions. These basically have the same requirements on an
IP network as with AES/​EBU or MADI. All explanations given here therefore apply to all
audio network technologies supporting the AES67 standard, including

• Dante by Audinate [1]
• Q-LAN by QSC [3]
• RAVENNA by Lawo and Partners [4]

9.1.1 Advantages and Disadvantages of using Audio-​over-​IP


As an overview, the use of IT networks for audio connections offers the following advantages:

• Flexibility to add or modify audio connections without changing cables


• IT hardware offers a wide range of functionality at a very low price, thanks to
economies of scale

Audio Networking 285


• Adaptation and integration into the IT network infrastructure without the need to
install specific audio or video cabling
• Video signals and control data can be transmitted in the same infrastructure without
additional effort.

On the other hand, audio-​over-​IP networks may confront the user with the following
disadvantages:

• Since several audio samples of a channel are usually put into one packet to improve
overall efficiency, there is a minimum latency: the sender must first wait for the audio
samples to become available before sending them over the network. This latency is
normally higher than with point-to-point digital audio standards but can be minimized
by using optimal packet formats and network setups.
• Since IT networks are not deterministic in terms of travel time of packets, a safety
margin in the form of an audio buffer must be inserted on the receiver end. This buffer
results in added latency. The fewer packet collisions are present in the network, the
more this safety margin (and therefore the latency) can be reduced.
• Added complexity due to the variety of audio packet formats, which requires receivers
and senders to be aligned to the same settings. The complexity of audio-over-IP technology
is significantly higher than with previous technologies. The industry still has significant
work to do to reduce this complexity for the user by introducing intelligent and user-​
friendly software solutions for managing audio networks.
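The first point can be quantified: the minimum packetization latency is simply one packet's
worth of samples divided by the sample rate. A small sketch (48 samples per packet at 48 kHz
is a common AES67 default; the function name is ours):

```python
def packet_time_ms(samples_per_packet, sample_rate_hz):
    """Minimum latency caused by packetization: the sender must wait this long
    before one packet's worth of samples is available."""
    return 1000.0 * samples_per_packet / sample_rate_hz

# 48 samples per packet at 48 kHz is a common AES67 default:
print(packet_time_ms(48, 48_000))   # 1.0 (ms)
# Low-latency formats put fewer samples into each packet:
print(packet_time_ms(6, 48_000))    # 0.125 (ms)
```

Fewer samples per packet reduce this latency but increase packet rate and overhead, which is
exactly the trade-off mentioned above.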

9.1.2 Phase Accuracy


In most audio applications, synchronized behaviour of multiple devices is critical. In
stereophonic recording situations or sound reinforcement with multiple speakers, phase
accuracy between microphones or speakers is an absolute requirement. When multiple
loudspeakers are connected to an amplifier and all channels arrive in one audio packet,
there is no risk of the channels being out of phase with each other, as the audio samples
within a packet cannot shift relative to each other during transmission through the network.
However, in an increasing number of applications, multiple amplifiers and processors
receive audio packets independently, while still being required to reproduce audio signals
in a phase-​accurate way. Hence, a specific audio packet may be received by multiple net-
work devices, buffered, and then needs to be played out at precisely the same time. Since
there are no tight timing specifications in IT networks as to when packets are forwarded
and arrive, audio networks require a means of synchronization between these receivers.
This is an important yet new issue in audio networks. Fortunately, the concept described
hereafter has caught on and is currently used in all audio network standards as well as
for video.
All devices within an audio network are synchronized with an absolute time according
to the Precision Time Protocol (PTP). This means that their internal clocks (PTP follower)
are derived from a reference clock device (PTP leader). This device can be any audio device
that provides this function or even a product specifically designed to generate accurate PTP
clocks. The leader is selected by a user setting or alternatively by a standardized automatic
selection process (described in 9.3.1).
The ultimate requirement for all devices, whether audio transmitters or receivers,
is accurate synchronization to this given time. At the exact moment an audio packet is
sent, it is provided with a time stamp indicating when it was sent. The user sets a constant
time offset at all receivers, the so-called link offset. When a packet arrives at a receiver, it
remains in its buffer until it is time to play it out. Hence, the moment the audio gets played
out equals the sending time plus the link offset.

Figure 9.1 Link offset determines latency.

Figure 9.2 Phase coherence by identical link offset.
All receivers (e.g., loudspeakers with built-​in audio-​over-​IP connectors) can achieve
phase accuracy amongst each other under two conditions:

1. Accurate time synchronization to the PTP clock leader (identical time base)
2. Identical link offset value set by the user in all receiving devices

Consequently, the link offset must be chosen based on the worst-​case delay of all connections
in question. It is recommended to include a certain allowance in case of unforeseen
deviations of the packet delivery times.
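The mechanism described above can be sketched in a few lines (all names and numbers are
illustrative, not taken from any standard):

```python
# Illustrative sketch of the playout rule: playout time = send timestamp + link offset.
# All values are in microseconds of PTP time; the numbers are made up for the example.
LINK_OFFSET_US = 1000   # identical, user-set link offset in every receiver

def playout_time(send_timestamp_us):
    """A receiver buffers a packet until send time + link offset has passed."""
    return send_timestamp_us + LINK_OFFSET_US

send_ts = 500_000
arrival_a = send_ts + 120    # fast path to receiver A
arrival_b = send_ts + 480    # slower path to receiver B
# Both receivers play the packet at the same absolute PTP time:
print(playout_time(send_ts))                               # 501000 for both receivers
# The link offset must exceed the worst-case delay, plus an allowance:
print(playout_time(send_ts) > max(arrival_a, arrival_b))   # True
```

Despite the different travel times, both receivers play the packet at the same PTP instant,
which is the whole point of the link offset.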

9.1.3 General Requirements for Audio-​over-​IP Connections


As a summary on a general level, the following points are to be considered to enable an IT
network for high-​quality audio connections:

1. Connectivity
IP packets must arrive at the receiver on time and must not be lost on their way. Packets
must never arrive later than indicated by the link offset.
2. Synchronization
To achieve phase-​accurate recording and playout of audio data, all devices must have a
clock which is precisely synchronized with the PTP leader.
3. Connection management
In audio networks, receivers need to know which combination of samples and channels
is used by the sender to form the packets, the stream format. In multicast mode (see
below) the receiver also needs to know about the multicast address used since it needs
to subscribe to it. This type of information is part of the connection management. All
this information can either be registered in senders and receivers manually by the user
or be set in the sender only and then submitted automatically to all receivers interested
in getting this stream. This mechanism of finding senders and their streams on the net-
work is called device discovery, followed by stream discovery.

In the following text, these three topics will serve as a guideline for illustrating all relevant
aspects of audio networks and comparing the technologies available today.

9.2 Connectivity
This section covers some of the aspects that are essential for understanding the requirements
of audio networking. It also introduces terminology that will be referred to in the following
sections.

9.2.1 Network Terminology


A computer network consists of nodes, for example physical devices connected to each
other via links that provide a certain communication bandwidth (bits per second). Nodes
can be redistribution devices such as switches or routers as well as end points (hosts).
Nodes contain connectors (interfaces), which are sometimes also referred to as (hard-
ware) ports.
Note: The term port in the context of physical networks can lead to misunderstandings
since the same term is used in the context of software to designate network services. It is
therefore recommended to use the term interface when referring to physical connectivity.
Modern switches operate non-blocking, meaning they can process packets on all
interfaces at full link speed and forward them without internal bandwidth limitation.
Therefore, a switch with for example eight interfaces offering 1 gigabit per second (Gb/s
or Gbps) must be able to handle the forwarding of 8 Gbps internally. Since links operate
bidirectionally, this number is commonly indicated in data sheets of switches as 16 Gbps,
corresponding to 8 Gbps in each direction.

Figure 9.3 Unit types within a network.
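The data-sheet arithmetic from the paragraph above, written out as a minimal helper:

```python
def switching_capacity_gbps(interfaces, link_speed_gbps):
    """Internal forwarding capacity a non-blocking switch must provide,
    counted in both directions as in typical data sheets."""
    return interfaces * link_speed_gbps * 2

print(switching_capacity_gbps(8, 1))   # 16 (Gbps, for eight 1 Gbps interfaces)
```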
A subnet is a logical segment of a network. Such segmentation is often established
for various reasons, including administrative and security aspects. The network adminis-
trator can apply a set of rules for one subnet while choosing different rules for another one.
Routers can interconnect subnets so that hosts can exchange packets across subnet bound-
aries. A router must be configured correctly by creating routes between these subnets. A typ-
ical switch, on the other hand, cannot connect subnets. It is a simpler device compared to
a router.
A virtual LAN (VLAN) is another method to segment a network. Unlike subnets, it is a
secure way to separate hosts from each other. A VLAN is set up within a switch or router so
that only authorized personnel can change it. Configuration of a subnet, on the other hand,
may be accessible to anyone who has access to a host. This includes users who may connect
their own device to a network and configure it into any subnet by themselves. In practice,
all interfaces connecting hosts within a common subnet also get placed in a common VLAN
within the switch. This is necessary to fulfil all security aspects of modern IT networks. The
fact that devices within a subnet are usually also assigned to their specific VLAN often leads
to misunderstandings in communication. Some experts use the term VLAN when they
actually mean subnet – and vice versa.
A trunk connection is a link transporting multiple VLANs. The switch interfaces on
both sides must be configured accordingly. By doing this a VLAN becomes available across
multiple switches.

9.2.2 IP Addresses and Subnet Masks


Each device within a network must have a unique address so that the packets can reach
their destination. Such an address can be hardware-​related (MAC address) or configured in
the hosts (IP address).

Figure 9.4 Router connecting subnet A to subnet C.


Note: Whether a particular product is considered a router or a switch is sometimes difficult to distin-
guish. Most devices have become a mixture of both, but often with only a limited set of functionalities.


An IP address can be assigned in three ways:

1. Manually by the user


This requires documentation and discipline by the user to ensure that a particular
IP address is only used once within the same network. This might be the preferred
approach for permanent installations, as it allows following a certain structure in the
assignment of IP addresses.
2. Assigned to a device by a DHCP server
This is a flexible, yet structured way to distribute IP addresses within a network. A host
in ‘DHCP mode’ tries to find the appropriate DHCP server and obtains all the neces-
sary IP configurations in a well-​standardized way. A user can look up detected devices
and their IP addresses in the DHCP server itself. An administrator can configure it in a
way that only a certain range of IP addresses gets distributed, while others are reserved
for manual assignment.
3. Self-​assignment by the host
This mechanism is also known as Zeroconfig and is recommended for small installations
only, due to the limitation that all devices must be in one subnet and cannot be connected
to other subnets. It can be somewhat difficult to find out the IP address of a specific
device if it is not indicated on a display.
of IP addresses for the presence of devices, but IT departments often prohibit their use.
If the device supports automatic device discovery such as mDNS or Bonjour, discovery
software can be used to display all found devices with their IP address. An example of
such a free tool is MTDiscovery by Merging Technologies [5].

Deciding whether two IP addresses belong to the same subnet is impossible without veri-
fying their corresponding subnet masks. If the destination IP address of a packet is not in
the same subnet, the sending device must direct it to the IP address of the router instead of
sending it straight to the receiving device.
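The routing decision just described can be sketched with Python's standard ipaddress module
(the function and the gateway address 192.168.20.1 are hypothetical examples):

```python
import ipaddress

def next_hop(own_interface_cidr, destination_ip, gateway_ip):
    """Decide where a host directs a packet: straight to the destination if it is
    inside the local subnet, otherwise to the router (default gateway)."""
    local_net = ipaddress.ip_interface(own_interface_cidr).network
    if ipaddress.ip_address(destination_ip) in local_net:
        return destination_ip
    return gateway_ip

# Addresses taken from the host examples below; the gateway is hypothetical:
print(next_hop("192.168.20.182/24", "192.168.20.79", "192.168.20.1"))    # direct
print(next_hop("192.168.20.182/24", "192.168.134.61", "192.168.20.1"))   # via router
```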
Sending packets within a subnet has many similarities with company-​internal telephone
calls. All company numbers start with the same digits and differ only in the last part or in the
extensions. Likewise, two hosts within the same subnet have similar IP addresses, differing
only in the last digits. The first part is called network address, the second, individual to a
device, is called host address. The split between the two is indicated in the subnet mask by
the position of the digits ‘0’: The network address gets marked in the subnet mask by a value
greater than ‘0’, while the host address is the remaining right part, where the subnet mask
indicates ‘0’. This can best be understood by means of examples:

Host A: IP address 192.168.020.182, subnet mask 255.255.255.0

Host B: IP address 192.168.020.079, subnet mask 255.255.255.0


➔ Host B is in the same subnet as host A because both have the identical network address
(192.168.020). The network part of the IP address is the part where the subnet mask
equals 255. A router between these two is not necessary.

Host C: IP address 192.168.134.061, subnet mask 255.255.255.0

➔ Host C is in a different subnet than host A and B because it differs in its network
address (192.168.134 instead of 192.168.020). Host C cannot exchange packets with
hosts A and B without a router. In order to allow this host to communicate with A and
B, it must get a different IP address, starting with 192.168.134… . Alternatively, a
different subnet mask may be chosen for the entire setup, such as 255.255.0.0.

The decimal notation of a subnet mask used above is called dot-​decimal notation. To indi-
cate the same information in shorter writing there is an alternative method commonly used
in IT departments. It is called CIDR notation or slash notation: right after the IP address
followed by slash, it specifies the subnet mask by indicating the number of digits greater
than ‘0’. But this notation refers to the binary form of the subnet mask, hence ‘255’ corres-
ponds to ‘11111111’. The subnet masks in the examples above therefore contain 24 ‘1’s in
their binary form.
Hosts from the above example in CIDR notation:

Host A: 192.168.020.182/24
Host B: 192.168.020.079/24
Host C: 192.168.134.061/24
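Python's standard ipaddress module can translate between the two notations; a short check of
the examples (note the module rejects the leading zeros used in the octets above, so they are
written without them):

```python
import ipaddress

# /24 in CIDR notation corresponds to 24 binary '1's, i.e. 255.255.255.0:
net = ipaddress.ip_network("192.168.20.0/24")
print(net.netmask)      # 255.255.255.0
print(net.prefixlen)    # 24

# Hosts A and B share the network address, host C does not:
a = ipaddress.ip_interface("192.168.20.182/24")
b = ipaddress.ip_interface("192.168.20.79/24")
c = ipaddress.ip_interface("192.168.134.61/24")
print(a.network == b.network)   # True: no router needed
print(a.network == c.network)   # False: a router is needed
```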

Installations using routers and connecting multiple subnets operate on layer 3 of the so-​
called OSI model. This model divides the generic functionality of a network into seven
layers, each describing a set of functionalities networking devices must provide in order to
guarantee correct transmission of information, such as forwarding packets to the correct
recipient. All current IT devices follow this well-​defined abstraction layer concept for facili-
tating interoperability between manufacturers. Installations operating on layer 3 can inter-
pret IP addresses, subnet masks etc. and therefore may forward packets across subnets. All
the technologies discussed here can operate in such scenarios. In contrast, some technolo-
gies are restricted to layer 2. This means their packets are delivered exclusively based on
MAC addresses and do not contain subnet information. Consequently, layer 2 networks
cannot be split into multiple subnets, their packets cannot be forwarded via routers and
their scalability is therefore somewhat limited. Popular examples of layer 2 networks are the
already mentioned TSN/Milan and, further back in history, CobraNet.

9.2.3 Network Topologies


Nodes can be connected in different ways. Defining a topology is one of the most important
decisions to be made while designing a network.
9.2.3.1 Star
The star is in many ways the preferred topology. Multiple hosts are connected to a
redirection device such as a switch or router.

Figure 9.5 Star topology.
Today, networks often combine two levels of stars to form a ‘star of stars’ in a hierarchical
sense. This is called spine/​leaf architecture. The central switch/​router (spine) must usually
forward more traffic than the peripheral switch (leaf), as most traffic between the segments
may pass through it. If the high-​bandwidth link between spine and leaf is unable to forward
the traffic of all hosts simultaneously, this design is blocking. The opposite is a non-​blocking
network design, where the high-​bandwidth links are capable of transmitting the total traffic
of all hosts connected to the respective leaf switch.
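Whether a given spine/leaf design is blocking reduces to a bandwidth comparison; a trivial
sketch:

```python
def is_non_blocking(hosts_per_leaf, host_link_gbps, uplink_gbps):
    """A spine/leaf design is non-blocking if the uplink can carry the combined
    traffic of every host on the leaf at the same time."""
    return uplink_gbps >= hosts_per_leaf * host_link_gbps

print(is_non_blocking(8, 1, 10))    # True: 10 Gbps uplink for eight 1 Gbps hosts
print(is_non_blocking(24, 1, 10))   # False: the 10 Gbps uplink is oversubscribed
```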

9.2.3.2 Ring
Nodes can have multiple interfaces: at least two are required to realize a ring topology. Each
link between two nodes offers full bandwidth and it is up to the nodes to forward packets within
the ring. In this sense, each node acts as a switch, forwarding packets between its two interfaces.
Choosing a ring topology often makes sense when large distances need to be bridged and
the connections are costly. Practical examples are a network between several locations, but
rings are also formed for connection of devices within a rack where there is no space for an
additional switch. Ring topologies offer a certain built-in redundancy: all devices can be
reached even if one interconnection is broken.

9.2.4 Unicast and Multicast


When a host sends a packet to another host, this mechanism is called unicast. Such a
connection has exactly one sender and one receiver.

Figure 9.6 Spine/​leaf architecture.

Figure 9.7 Ring topology.

Unicast often uses the Transmission Control Protocol (TCP), whereby the receiver
confirms successful reception of each packet back to the sender. If the confirmation is not
received, the sender automatically re-​sends the packet. An alternative to TCP is UDP
(User Datagram Protocol). In this case, the sender trusts the network that the packets will
successfully arrive at the receiver. There is no acknowledgement of receipt, and if the packet
is lost, the content will be lost. Although perhaps unexpected, this is in fact the preferred
transmission mode for professional audio networks. As latency must be low, retransmission
of packets cannot be afforded as this would cost time and therefore increase the overall
latency of the audio transmission. In the case of live audio packet loss, it seems best to con-
tinue playing the next audio samples instead of trying to recover the previous one.
In audio applications, there is often a requirement to receive an audio signal at multiple
destinations in parallel, e.g. a microphone signal routed in parallel to the mixing consoles
for front-​of-​house and monitoring. Even a third destination could exist such as an audio
recording device. When the sender transmits audio packets in unicast, the audio signal
comes as three packets with identical content but different destination addresses. This
results in an unnecessary processor load on the sending host, but also takes up bandwidth to
each of the three destinations. This can be optimized using multicast.

Figure 9.8 Audio to multiple destinations using unicast.
The use of multicast has numerous advantages, including less processor load on the
sender and less overall traffic on the network. The sender addresses its packets to multi-
cast addresses and not to host addresses. It does not know which recipient(s) the packets
will arrive at. Multicast addresses are comparable to frequencies on a radio: anyone who is
interested can tune in and receive the content. The sender puts the audio data in a packet
once, sends it to a multicast address and the receivers need to know which multicast address
they want to listen to. Hence, this is more of a ‘pull’ than a ‘push’ principle.
Note that multicast addresses are inherently unrelated to subnets, as they are not related
to nodes and their IP addresses. Therefore, multicast packets are received by devices across
subnets unless they are separated by VLANs (which they usually are).
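The multicast address range discussed here can be checked with the standard ipaddress
module; a small sketch (the 239.x.x.x example address is illustrative):

```python
import ipaddress

def is_multicast_destination(dst):
    """True for destination addresses in the 224.0.0.0-239.255.255.255 range."""
    return ipaddress.ip_address(dst).is_multicast

print(is_multicast_destination("239.1.2.3"))       # True: treated as multicast
print(is_multicast_destination("224.0.1.129"))     # True: the well-known PTP address
print(is_multicast_destination("192.168.20.79"))   # False: ordinary unicast
```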
The only difference between a UDP unicast packet and a multicast packet is its destin-
ation address. By definition, any packet with a destination address in the range 224.x.x.x–​
239.x.x.x is a multicast packet and gets treated accordingly by the network. Several
subsequent packets containing a particular audio signal are called an audio stream, or a
multicast stream, to specifically describe multicast operation. The switches involved must
have multicast forwarding enabled. This is a device-​internal setting. If the destination
address of a packet is in the aforementioned range, a multicast-​capable switch will forward
these packets to multiple interfaces. This bears the risk of unnecessary (over)load of
the network, as not all connected hosts are interested in a certain audio stream. Some of
them may be, for example, printers that have nothing to do with audio at all. Therefore, it
is important that the multicast traffic only reaches hosts asking for it. The solution for this
is IGMP snooping (Internet Group Management Protocol). All discussed audio network
technologies support IGMP snooping by default. If enabled within the switch, multicast
packets only get sent through those interfaces where periodic IGMP requests arrive from
the connected host. If no requests are received, the corresponding multicast is stopped so
that no unnecessary traffic gets on that link. IGMP snooping can be considered a kind of
floodgate that is closed by default and only opened on request. It is strongly recommended
to enable IGMP snooping in all switches of a multicast network. Otherwise, the risk of
overloading the network and non-participating hosts is unnecessarily high.

Figure 9.9 Audio to multiple destinations via multicast.

Figure 9.10 Querier activated in a switch with high-bandwidth links.
The function of an IGMP querier should also be mentioned. It triggers the regular
requests of all hosts. Normally this function is activated in one particular switch. Although


only one is necessary per network, it does no harm to have multiple enabled, since all but
one will get de-​activated automatically.
An important aspect to be considered when planning the network: all multicast traffic
is also forwarded to the querier, regardless of whether it needs it or not. This can lead to
overload situations despite correct switch configuration. In practice it is recommended
to activate the querier in the switch that offers high bandwidth interfaces and is cen-
trally located in the network. It will most likely be involved in forwarding the multicast
traffic anyway by the nature of its location. In a spine/​leaf network, it is usually the spine
switch.

9.2.5 Quality of Service (QoS)


In a well-​designed network, there is always enough bandwidth for all packets to pass
through the desired route. However, to cater for exceptions, switches and routers can give
priority to certain packet types. This is especially important if audio networks are combined
with office traffic such as email or internet surfing. In such installations the use of QoS is
highly recommended. In particular, copying files over the audio network can cause conges-
tion and therefore must be considered by prioritizing audio data over the file packets. The
mechanism used for this is called DiffServ (Differentiated Services). The type of packets is
thereby identified with a tag numbered between 0 and 63, the DSCP value (Differentiated
Services Code Point). Most audio technologies mark their packets with a tag numbered
34 or 46. It is therefore a task for the switch administrator to assign a strict priority to all
packets carrying tags with those numbers.
As a background for understanding QoS, the concept of queues must be explained: each
QoS-​enabled switch has multiple queues feeding its interfaces. The queues have assigned
priorities, for example queue 1 has the lowest priority and queue 4 the highest. The queues
and their priority are fixed by the switch manufacturer or configurable by the administrator,
depending on the product. In audio networks the recommended policy is strict priority,
meaning that whenever a packet occurs in a queue with higher priority, other transmissions
are held back until this high-​priority queue is emptied. In the example above with four
queues and fixed priorities, the switch administrator must map DSCP values 34 and 46 to
queue 2, 3 or 4, while all other packets end up in queue 1. It does not matter in which one
of the higher prioritized queues the audio data end up, as long as it has a higher priority
than queue 1.
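The queueing behaviour described above can be illustrated with a toy model (this is a sketch
of the concept, not how a real switch is implemented; the DSCP-to-queue mapping follows the
recommendation in Table 9.1):

```python
from collections import deque

# Assumption for this toy model: four queues, 4 = highest priority.
QUEUE_OF_DSCP = {46: 4, 56: 4, 34: 3}

queues = {q: deque() for q in (1, 2, 3, 4)}

def enqueue(dscp, packet):
    queues[QUEUE_OF_DSCP.get(dscp, 1)].append(packet)

def transmit_next():
    # Strict priority: always serve the highest non-empty queue first.
    for q in (4, 3, 2, 1):
        if queues[q]:
            return queues[q].popleft()
    return None

enqueue(0, "file copy")    # untagged best-effort traffic -> queue 1
enqueue(34, "audio")       # audio packet -> queue 3
enqueue(46, "PTP sync")    # synchronization packet -> queue 4
print(transmit_next())     # PTP sync
print(transmit_next())     # audio
print(transmit_next())     # file copy
```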
Note that packets in transmission are never interrupted under normal circumstances
and the packet with higher priority must still wait until the packet in transmission has
been sent. This is the reason why packets with a reasonable maximum size are suitable for
audio networks, so that others are not held up for too long. In most networks, this max-
imum –​Maximum Transmission Unit (MTU) –​is 1500 bytes. Administrators may allow
the use of jumbo frames (up to 9000 bytes and more) in some networks. But since such
packets may hold up audio packets in the queue for too long, the use of jumbo frames is not
recommended in audio networks.
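The argument against jumbo frames becomes concrete when computing how long a single frame
occupies the link while being transmitted; a quick calculation for a 1 Gbps link:

```python
def serialization_delay_us(frame_bytes, link_gbps):
    """How long one frame occupies the link while being sent (microseconds)."""
    return frame_bytes * 8 / (link_gbps * 1000)   # 1 Gbps = 1000 bits per microsecond

print(serialization_delay_us(1500, 1))   # 12.0 us: a standard MTU frame
print(serialization_delay_us(9000, 1))   # 72.0 us: a jumbo frame blocks six times longer
```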

Recommended QoS Settings


The audio-over-IP standards define the DSCP values 34 for audio and 46 for synchronization.
However, Dante technology currently does not comply with this specification. In Dante
networks, audio gets tagged with number 46 while synchronization gets the number 56. Hence,
unfortunately, number 46 indicates two traffic types: Dante uses it for its audio packets
while all others use it for synchronization. Therefore, some non-Dante devices allow the
user to alter this value manually, so that they can be adjusted to the Dante policy. If this
is not possible, the set-up in Table 9.1 is a good compromise and likely to work best in
practice.

Figure 9.11 Quality of Service (QoS) concept.

Note: The next section explains how audio devices get synchronized phase-accurately using
specialized packets (PTP). Those packets are small and do not hold up traffic for long, but
it is important to forward them in switches with the highest possible priority. This will
eventually increase the accuracy of the synchronization. In relation to QoS this means that
PTP packets should be prioritized even higher than audio. In the example above, they would
therefore be assigned to queue 4, while audio is in either queue 2 or 3.

Table 9.1 Recommended QoS settings

Priority                                     DSCP values
Highest priority (for example, queue 4)      46 and 56 (synchronization and Dante audio)
Medium priority (for example, queue 2 or 3)  34 (audio, all with the exception of Dante)
Lowest priority (for example, queue 1)       All others

9.3 Synchronization
To operate synchronously and with low latency, all IP senders and receivers must be
synchronized to the same clock. With traditional audio technologies, the devices were
synchronized either by a separate word clock connection or by using synchronous audio
formats such as AES/​EBU or MADI. The receivers were able to directly derive their fre-
quency and phase from these formats, as they provide some kind of ‘pulse’ to indicate
the moment when an audio sample is being generated or played back, for example in
analogue-to-​digital converters.

Figure 9.12 A PTP leader synchronizes the absolute time across all followers. Each device then derives
its own media clock from this.

Audio-​over-​IP no longer relies on traditional clocks in the form of ‘pulses’, but on abso-
lute time information instead. All devices on the network get synchronized to the same
time of day using the Precision Time Protocol (PTP). The origin of this time is in a device
called the clock leader (also called clock master) while the devices adjusting to it are clock
followers (also called clock slaves). Each device must then generate the desired traditional
clock internally (for example, 48 kHz), derived from the absolute time received via PTP.
This resulting internal clock is the media clock. If properly implemented by the manufac-
turer, the media clocks of each device show the same frequency and phase. High accuracy
can be achieved technically but represents a challenge for audio manufacturers. Therefore,
the phase accuracy between PTP-synchronized devices may vary depending on the implementation
quality.
An acceptable variance is <1 µs.
Since IT networks are not sufficiently deterministic as to when a packet gets delivered,
accurate synchronization of the devices requires a sophisticated approach. It is the principal
job of each PTP follower to compensate for two effects occurring in any network:

1. JITTER COMPENSATION

The current time is indicated in sync messages by the PTP leader to all followers using a well-​
known multicast address (224.0.1.129). By the nature of the network and queues in switches
being also used by other traffic, this information does not always arrive with a constant delay
at the follower end. This variance is called packet jitter or packet delay variation (PDV) and
must be smoothed out by every PTP follower. Typically, audio networks use a sync rate of
1–​8 messages per second with 8 being the recommended value for maximum compatibility.
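Jitter compensation is in essence a low-pass filtering problem; a minimal sketch using
exponential averaging (the filter choice and the constant alpha are illustrative assumptions,
as the standard does not prescribe a particular method):

```python
def smooth_offsets(raw_offsets_us, alpha=0.1):
    """Exponentially weighted averaging of measured offsets to suppress
    packet delay variation (PDV). Real implementations differ per manufacturer."""
    estimate = raw_offsets_us[0]
    history = []
    for measurement in raw_offsets_us:
        estimate = (1 - alpha) * estimate + alpha * measurement
        history.append(estimate)
    return history

# Jittery sync-message offsets around a true value of 100 us:
measurements = [100, 130, 80, 110, 95, 105, 90, 100]
smoothed = smooth_offsets(measurements)
print(round(smoothed[-1], 1))   # close to 100, with the jitter largely removed
```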

2. DELAY MEASUREMENT

The second crucial task for a follower is the measurement of the packet delay between
leader and follower. This is necessary to correct the time received in the sync messages by
the time they took to travel through the network.

Figure 9.13 Principles of the Precision Time Protocol (PTP).

This measurement includes delays of all
components between them, including cables and switches. In other words, the cable length
and the number of switches inserted between leader and follower does not matter. The PTP
time amongst all followers is the same with an accuracy down to nanoseconds. The only
condition for PTP to work accurately is for the delay to stay constant and symmetrical in
both directions, from leader to follower and vice versa. But this condition is normally met
in local network installations. The delay gets measured by an exchange of two messages,
the delay request from the follower and subsequently the delay response from the leader.
Measuring this delay is often carried out at the same rate as the sync rate, anywhere between
1 and 8 times per second.
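The offset and delay fall out of the four timestamps of this exchange with the standard
IEEE 1588 arithmetic; a sketch under the symmetry condition stated above:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard IEEE 1588 arithmetic, valid if the path delay is symmetrical:
    t1 = sync sent (leader time), t2 = sync received (follower time),
    t3 = delay request sent (follower time), t4 = delay request received (leader time)."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    one_way_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, one_way_delay

# Example: the follower clock runs 100 us ahead, the true one-way delay is 250 us.
t1, t3 = 0, 1000
t2 = t1 + 250 + 100   # follower receives the sync message
t4 = t3 + 250 - 100   # leader receives the delay request
print(ptp_offset_and_delay(t1, t2, t3, t4))   # (100.0, 250.0)
```

The follower then subtracts the computed offset from its own clock, which is why cable length
and switch count between leader and follower do not matter.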
Since a leader must be able to exchange messages with every single follower, there is a
limit to the maximum number of followers a leader can handle. Unfortunately, this is not
a clearly specified limit as it depends on the message rates used. In typical audio setups
without a dedicated PTP leader device, master-​capable audio devices may typically serve
anywhere from 25 to 250 followers.

9.3.1 Selection of the PTP Leader


Since more than one device on the network may be able to act as the PTP leader and distribute its time to all followers, the standard has established a set of rules for selecting the leader, called the Best Master Clock Algorithm (BMCA).
Each leader-​capable device can send out announce messages with information about its priorities (set by the user) as well as its oscillator accuracy. In turn, it must also listen to other devices that may send their respective announce messages. If other incoming messages report better quality, the device stops announcing itself as a leader. Otherwise, it continues to send its own announce messages regularly; the interval at which they are sent is called the announce interval. These messages also serve as a 'heartbeat', telling all other devices that the current leader is still operational. If the connection is interrupted, all units wait a certain time (the announce timeout) before sending their own announce messages and repeating the selection process. Until a new leader is identified and sends its sync messages, the followers are required to keep running on their own internal oscillators; audio must not be interrupted during a change of leader.
The quality specified in the announce messages includes two values that are set by the user: priority 1 and priority 2. Each has a value between 0 and 255, where 0 is the best and wins over all others.

Audio Networking 299

Figure 9.14 Example of a PTP scenario with several devices that can be leaders.

Table 9.2 Selection criteria for PTP leaders, sorted by BMCA rules

Criteria for the selection Name Comment

1 Priority 1 Set by user
2 Clock class Automatic (e.g., GPS locked is better than freewheeling)
3 Clock accuracy Defined by the manufacturer
4 Clock variance Defined by the manufacturer
5 Priority 2 Set by user
6 Source port ID Normally equals the MAC address

If a priority 1 value is lower on one unit than on the others, that unit becomes
the leader. The value under priority 2 is only relevant if all others before it –​including pri-
ority 1 –​are the same in multiple units. This can occur in installations with two identical
types of devices whose priority 1 has been set to the same value by the user. In this case, the
priority 2 determines which of the two units acts as the main leader and which one acts as
a backup.
Note that some devices do not offer the possibility of entering a numerical value for pri-
ority 1 and 2 to the user. Instead, they simply show an option to select a device as ‘Preferred
Leader’. Technically these products use a fixed value in their announce messages, defined by
the manufacturer. It will therefore still be possible to win over such a device by entering an
even lower value in another PTP leader. Some devices also support a setting called ‘Slave
only’ or ‘Follower only’. When enabled, the device never tries to take over and become the
PTP leader.
On top of the two user-​definable values priority 1 and 2, there are further criteria that
influence the leader selection. But those are set by the manufacturer and cannot normally
be changed by the user. They specify the accuracy of the oscillators and whether the leader
follows an external time source such as GPS. The complete list of these criteria by which
the BMCA process selects the leader is given in Table 9.2.
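Because the criteria in Table 9.2 are compared in strict order and a lower value always wins, the BMCA selection can be modelled as a simple tuple comparison. A sketch under that reading of the table (dictionary keys and example values are ours):

```python
def bmca_key(clock):
    """Comparison key ordered per Table 9.2; tuples compare left to right, lower wins."""
    return (clock["priority1"], clock["clock_class"], clock["clock_accuracy"],
            clock["clock_variance"], clock["priority2"], clock["port_id"])

def select_leader(clocks):
    """Return the device that the BMCA would elect as leader."""
    return min(clocks, key=bmca_key)

# Two identical devices, distinguished only by priority 2 (main/backup scenario).
main = {"name": "main", "priority1": 10, "clock_class": 248, "clock_accuracy": 33,
        "clock_variance": 65535, "priority2": 10, "port_id": "00-1D-C1-00-00-01"}
backup = dict(main, name="backup", priority2=20, port_id="00-1D-C1-00-00-02")

leader = select_leader([backup, main])  # "main" wins on priority 2
```

The main/backup behaviour described above falls out naturally: with everything else equal, the lower priority 2 value decides.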

9.3.2 PTP Profiles – Recommended Settings


PTP is configurable in many ways, with different message rates as well as priority settings for leaders. To reduce the number of choices and standardize the values within PTP devices for audio, standardization organizations such as AES and SMPTE have defined recommended sets of values referred to as PTP profiles. The most important ones are currently

• Standard [6]
• AES67 Media [6]
• SMPTE ST 2059-​2 [7]

To meet all these requirements and operate audio equipment as reliably as possible, the settings in Table 9.3 are recommended in practice (see also [8]).

Table 9.3 Recommended PTP parameter settings

Parameter Value

Announce interval 1 per second
Announce timeout for upcoming leaders 3 seconds
Synchronization rate 8 per second
Delay requests 8 per second (or 1–​4, to reduce CPU load)
PTP domain 0

Note: The PTP domain may allow the use of multiple time bases on the same network. For audio applications, it is not beneficial to use more than one PTP domain within the same installation. Unfortunately, some products do not allow the changing of this value by the user and are fixed at 0. Therefore, operating on domain 0 throughout all installations is recommended.

9.3.3 PTP Support in Switches


As explained above, PTP messages should have minimal jitter for the follower to achieve
accurate synchronization with the leader. Several approaches minimize PTP packet jitter:

1. Prioritization of PTP
The minimum requirement is to prioritize all PTP messages over others. As described
earlier, the corresponding functionality is available in most switches today and is called
Quality of Service (QoS). If the switch is correctly set up, it forwards PTP packets
immediately whenever there is one waiting in the queue. All other packets are tempor-
arily held back to prioritize PTP. In networks where PTP and audio share the network
with other traffic like office-type applications, enabling QoS is essential. In separated
networks though, it may not make a significant difference –​provided there is sufficient
bandwidth for PTP and audio.
2. Boundary clock switch
Some newer switch models actively participate in PTP synchronization rather than just
forwarding its packets. These boundary clock switches synchronize themselves to the
leader in the same way as any other follower and become themselves the leader for all
subsequent devices, using the identical time base. The unit that synchronizes the entire
setup centrally is now referred to as the grandmaster or primary leader.
Communicating with followers (for example, individually answering delay requests) requires processing power in the leader device. With boundary clock switches, this load is distributed amongst all switches. Offloading synchronization tasks to the switches saves processing resources on the grandmaster, and it can also keep the PTP packet jitter low throughout the network, since every switch individually regenerates the sync messages.

Figure 9.15 Concept of a boundary clock switch.
Boundary clock switches get their own PTP priority settings. If the grandmaster is lost, these settings are disseminated via announce messages as with any other leader, and the switches may even be able to run freely for a while, temporarily synchronizing the entire network themselves. Because of all these advantages, using boundary clocks makes PTP scalable up to very large systems with virtually no size limitations.
3. Transparent clock switch
This type of switch actively participates in PTP synchronization as well, although
not to the same degree as boundary clocks. All PTP messages pass through the
switch, forwarded from the grandmaster to its followers. Hence, the grandmaster’s
message load is not reduced in any way. But transparent clock switches measure the
variable delay of each packet as it passes through them. This delay is entered into a designated correction field in the sync and delay request messages. Packet jitter occurring within the switch can therefore easily be compensated in the follower, which simply takes the correction value into account during the synchronization process. In practice this means transparent clock switches do not introduce any relevant packet jitter to PTP but leave the processing load of the grandmaster unaltered.
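The prioritization in the first approach relies on senders tagging their packets with a DSCP value that the switch's QoS rules can match (DiffServ). As an illustration, this is how an application could tag an outgoing UDP socket in Python; the values shown are, to our knowledge, the AES67 defaults (EF for PTP event messages, AF41 for media), so verify them against the profile actually in use:

```python
import socket

DSCP_EF = 46    # Expedited Forwarding: PTP event messages (AES67 default)
DSCP_AF41 = 34  # Assured Forwarding 41: audio media packets (AES67 default)

def set_dscp(sock, dscp):
    """The DSCP value occupies the upper six bits of the IP TOS byte."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

# Tag a socket intended for PTP event messages with Expedited Forwarding.
ptp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
set_dscp(ptp_sock, DSCP_EF)
```

The switch's QoS configuration must then be set to map these DSCP classes to its high-priority queues; tagging alone has no effect if the switch ignores the field.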

9.3.3.1 Recommendation
PTP is the foundation for any audio-​over-​IP technology discussed here. Proper PTP syn-
chronization has a major impact on system reliability. Shortcomings such as high jitter or
loss of synchronization cause all sorts of undesirable effects on audio transmission, including
dropouts or total loss of signal.

Figure 9.16 Concept of a transparent clock switch.

A clean PTP concept is comparable to a solid grounding concept during analogue audio
days. If it is done properly, the likelihood of problems is low. If not carefully set up, intermit-
tent and seemingly random issues may occur that lead to long troubleshooting sessions.
For small systems, using non-​PTP-​aware switches is not a problem, possibly with QoS
enabled. However, when it comes to scalable or larger systems, using boundary clock
switches throughout the network is the recommended approach to achieve full stability. It
is difficult to define a limit in system size beyond which the use of boundary clock switches
is necessary, since it depends on how well the chosen products can cope with gradually
degraded PTP. Recommendations range from 30 to 250 nodes.

9.4 Connection Management


After explaining all necessary principles of how packets can be transmitted over a network
(connectivity) and the principles of PTP (synchronization), the third major topic is how
the sender and receiver of IP audio on a network know about each other and how a receiver
identifies the settings used to create the audio packet on the sender side.

9.4.1 Device and Stream Discovery


The AES67 audio standard does not specify how networked devices can detect each other,
nor determine which streams are available on the network. Theoretically the user is required
to type in the information manually, but in practice all manufacturers have consolidated
their implementation to the following methods.

9.4.1.1 Device Discovery


All described technologies use the mechanism known as Bonjour or mDNS for devices to inform each other about their existence. Each device sends out messages via a fixed, well-known multicast address (224.0.0.251). All devices subscribe to this multicast address and are thus informed about all others.
One limitation of this mechanism is that it does not work in large installations with multiple subnets/​VLANs. For these use cases, manufacturers have developed their own solutions (e.g., Audinate with the Dante Domain Manager [1]) or follow the audio/​video standard NMOS for the purpose of discovery and connection management (for NMOS refer to section 9.6.3).

9.4.1.2 Stream Discovery


Audio streams are discovered via one of two mechanisms, depending on the manufacturer.
As with device discovery, both use a predefined multicast address to distribute the stream
information so that the recipients find the available streams and their parameters:

• Session Announcement Protocol (SAP) –​used by all Dante products (multicast address 239.255.255.255)
• Bonjour /​mDNS –​used by all other technologies discussed (multicast address 224.0.0.251)

Fortunately, many current products allow both protocols to be activated simultaneously so that a particular audio stream is announced in parallel via both mechanisms; however, this may lead to the somewhat inelegant situation that some streams are displayed twice in a user interface. In practice this is not a problem, as selecting either of the two entries subscribes to the same stream anyway. If a particular product does not offer the option to activate both, a free software tool called Rav2Sap [4] is available that translates between the two 'dialects'. This tool may run on any suitable PC with access to both aforementioned discovery multicast addresses. Although it is helpful to keep it running constantly, it is only actually needed when new connections are made between senders and receivers. Audio is not routed through this application and is therefore not interrupted if the application goes offline for some reason.
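Listening for these announcements only requires joining the respective multicast group on its well-known port. A sketch in Python of how a discovery tool might subscribe to SAP traffic (SAP uses UDP port 9875, mDNS port 5353; the helper names are ours):

```python
import socket
import struct

SAP_GROUP, SAP_PORT = "239.255.255.255", 9875   # SAP announcements (Dante)
MDNS_GROUP, MDNS_PORT = "224.0.0.251", 5353     # Bonjour/mDNS announcements

def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure used to join a multicast group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def open_listener(group, port):
    """Open a UDP socket joined to the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

# sap = open_listener(SAP_GROUP, SAP_PORT)
# packet, sender = sap.recvfrom(2048)  # SAP header followed by an SDP payload
```

Every received SAP packet carries an SDP payload describing one stream, which is exactly the information a receiver needs (see section 9.4.2).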

9.4.2 Setup of Senders and Receivers


Once a receiver identifies an audio stream on the network, it needs the following information to be able to subscribe to the packets and correctly unpack the audio:

• Multicast address and port of the multicast packets containing the desired audio
• Settings used by the sender:
• Sampling frequency (e.g., 48 kHz)
• Audio resolution (e.g., 24 bit)
• Number of channels combined in one packet
• Number of audio samples of each channel (packet time, e.g., 1 ms)

Variations in these values result in different stream formats, and not all devices are capable
of generating or receiving every combination. For this reason, it makes sense to reduce these
variations, as given in the AES67 standard [6].
A management system is needed to oversee the entire system and to set up senders and receivers. Several manufacturers have started to develop their own connection management tools. One of these is the Dante Controller software by Audinate [1], which is used to establish audio connections between Dante devices. Another tool is ANEMAN by Merging Technologies [5], which can be used for a variety of RAVENNA-​based products.

Figure 9.17 Example of an SDP file (relevant parameters in red).
To describe a stream, virtually all technologies make use of a standardized method: the
SDP file. The file is generated by the sender and contains all the information required by
the receiver to properly retrieve the respective audio stream. It is the control software’s task
to copy this file from the sender to the receiver.
An example of an SDP file is given in Figure 9.17, just to illustrate how all relevant infor-
mation for the receiver is contained therein.
The SDP file is usually hidden from the user, but some products show it in their extended
user interface. The SDP file is an IT industry standard that has been used for many years in
applications such as video conferencing or IP telephony. Even the popular free VLC Media
Player software [10] can interpret its content, subsequently receive an audio stream and play
it back via the PC’s internal speakers.
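To illustrate how a receiver extracts these parameters, the sketch below parses a small, made-up SDP description of an eight-channel, 24-bit, 48 kHz stream (the file in Figure 9.17 is not reproduced here; the line syntax follows the SDP conventions used by these technologies):

```python
import re

EXAMPLE_SDP = """\
v=0
o=- 1311738121 1311738121 IN IP4 192.168.1.20
s=StageBox 8ch
c=IN IP4 239.69.11.23/32
t=0 0
m=audio 5004 RTP/AVP 98
a=rtpmap:98 L24/48000/8
a=ptime:1
"""

def parse_sdp(sdp):
    """Extract the parameters a receiver needs (see bullet list above)."""
    info = {}
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            info["multicast_address"] = line.split()[2].split("/")[0]
        elif line.startswith("m=audio "):
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):
            bits, rate, ch = re.match(r"a=rtpmap:\d+ L(\d+)/(\d+)/(\d+)", line).groups()
            info.update(bits=int(bits), sample_rate=int(rate), channels=int(ch))
        elif line.startswith("a=ptime:"):
            info["packet_time_ms"] = float(line.split(":", 1)[1])
    return info

stream = parse_sdp(EXAMPLE_SDP)
```

The rtpmap line alone carries resolution (L24), sampling frequency and channel count, while ptime gives the packet time in milliseconds.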

9.4.3 Stream Formats


It may seem that packing individual audio channels with only one sample each would be
the fastest and most flexible way of transmitting audio over a network. While this is true
in principle, it leads to a high overhead and a loss of bandwidth efficiency in the network.
There are two reasons for this:


1. High consumption of processing power in senders and receivers
Creating and receiving a packet requires a certain amount of processing power on
both the sender and receiver ends. Assuming an audio signal sampled at 48 kHz, trans-
mitting just one sample in each packet requires the handling of 48,000 packets per
second. Even with high-​performance processors, this requires a considerable amount
of computing power. By buffering, for example, 48 audio samples and putting them all into one single packet, the packet rate is reduced to just 1,000 packets per second, enabling a sender or receiver to handle a larger number of streams in parallel.
2. High data overhead within packet
Each packet must contain a specific amount of information, defined in the IP standards, which is known as the packet header. It includes the addresses of sender and receiver and various kinds of additional information about the packet, including, for example, the DSCP value for QoS operation. The technologies described here are based on the Real-time Transport Protocol (RTP), which, together with the underlying Ethernet, IP and UDP headers, results in a fixed overhead of 54 bytes per packet. In the case of transmitting only one audio sample (24 bits =​3 bytes), this overhead would take up a disproportionate amount of bandwidth
able to sending a letter in an envelope with only one word in it, followed by a second
letter with the second word and so forth. The overall effort it would take to label
the envelopes with addresses, put a stamp on them and drop each one in a mailbox
separately would not justify the amount of information they contain. The better solu-
tion is to write a reasonable amount of text first and then mail it afterwards. The
highest efficiency is reached when the weight of the envelope gets close to the max-
imum allowed by the post office. In network terms the Maximum Transmission Unit
(MTU) dictates the maximum size of a packet while the fixed overhead for packets
is given by the RTP protocol. Maximum efficiency is therefore reached when a high
number of audio samples are put into a packet and the overall packet size gets close
to the MTU.
Waiting for samples before creating a packet increases the latency of the connection, which is not always acceptable in audio. Alternatively, multiple channels can be combined into one packet. This achieves a large packet size as well and may not reduce the flexibility much. After all, many traditional audio connections consist of multiple channels, so these contextually connected channels may need to reach the same receiver anyway.

In practice, audio packets contain a specific number of samples per channel and mul-
tiple channels on top of that to achieve a reasonable packet size and thus maximize
the use of bandwidth in the network as well as the processing power of the connected
nodes.
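The trade-off can be quantified with the figures given above: 54 bytes of fixed overhead, 3 bytes per 24-bit sample and a typical Ethernet MTU of 1500 bytes:

```python
OVERHEAD_BYTES = 54       # Ethernet + IP + UDP + RTP headers, as stated above
BYTES_PER_SAMPLE = 3      # 24-bit audio
MTU = 1500                # typical Ethernet Maximum Transmission Unit

def packet_size(channels, samples_per_channel):
    """Total packet size in bytes: fixed header overhead plus audio payload."""
    return OVERHEAD_BYTES + channels * samples_per_channel * BYTES_PER_SAMPLE

def payload_efficiency(channels, samples_per_channel):
    """Fraction of the packet actually carrying audio."""
    payload = channels * samples_per_channel * BYTES_PER_SAMPLE
    return payload / packet_size(channels, samples_per_channel)

worst = payload_efficiency(1, 1)    # one sample per packet: ~5% payload
good = payload_efficiency(8, 48)    # 8 channels, 1 ms packet time: ~95% payload
oversized = packet_size(16, 48)     # 2358 bytes: exceeds the MTU
```

A one-sample packet spends roughly 95% of its bandwidth on headers, while 8 channels at 1 ms packet time reach about 95% payload efficiency; 16 channels with 48 samples each would exceed the MTU, which is why Table 9.4 lists such streams only with the shorter packet time.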
The most common stream formats are shown in Table 9.4 (48 kHz, 24-​bit resolution).

Figure 9.18 Stream variants with identical packet size: number of channels versus packet time.

Table 9.4 Typical audio stream formats

Number of channels Number of samples per channel (packet time)

2 48 samples (1 ms)
2 6 samples (0.125 ms)
8 48 samples (1 ms)
8 6 samples (0.125 ms)
16 6 samples (0.125 ms)
64 6 samples (0.125 ms)

Note: 16-​or 64-​channel streams are not feasible with 1 ms packet time, as packing 16 or more channels with 48 samples each into one packet would exceed the MTU.

9.5 Latency

If a packet contains, for example, 1 ms of audio, the latency of this connection will always be >1 ms. This is obvious because the sender must buffer 1 ms of audio before putting it into a packet and sending it over the network. This delay is followed by the travel time through the network with all its switches and queues before the packet finally reaches the buffer in the receiving device. The total latency of an IP audio connection, the link offset, is therefore the sum of these three factors:

1. Packet time
2. Travel time on the network
3. Receive buffer

In practice the technical term link offset carries many alternative names, depending on
the manufacturer, for example delay or latency. It is the user’s responsibility to choose a
link offset that is sufficiently long so the receive buffer never runs empty and hence audio
interruptions never occur. Despite the widespread perception that networks are slow, in a
typical situation the network is not the most dominant factor to overall latency. In a net-
work with very little traffic, it is in fact most likely the sender’s packet time. Therefore,
307

Audio Networking 307

Figure 9.19 Elements of total latency.

some users may decide to select a shorter packet time despite the increased processing power
requirements and packet overhead. But if there is significant competing traffic in the net-
work, the packet jitter of the audio stream increases (more queuing-​up in the switches) and
the receive buffer will be the dominant contributor to the total latency as it must compen-
sate for the packet jitter.
Reducing the competing traffic in a network directly leads to a reduction in packet jitter,
which subsequently reduces the need for its compensation in the receive buffer and then
ultimately allows for a minimum latency setting. It is generally worthwhile to reduce the
packet time only when the jitter effects have been minimized.
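The sum of the three factors can be written down directly; the numbers in this sketch are invented purely for illustration:

```python
def link_offset_ms(packet_time_ms, network_travel_ms, receive_buffer_ms):
    """Total latency of an IP audio connection: sender buffering, network, receive buffer."""
    return packet_time_ms + network_travel_ms + receive_buffer_ms

# Quiet network: the 1 ms packet time dominates the total.
quiet = link_offset_ms(packet_time_ms=1.0, network_travel_ms=0.1, receive_buffer_ms=0.3)

# Loaded network: the receive buffer, sized to absorb packet jitter, dominates.
loaded = link_offset_ms(packet_time_ms=1.0, network_travel_ms=0.3, receive_buffer_ms=2.7)
```

In the quiet case the packet time is the largest contributor; in the loaded case it is the receive buffer, which is why reducing competing traffic (and hence jitter) must come before reducing the packet time.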

9.6 Standards
Looking at the previous explanations it becomes apparent that a great deal of parameters
can be user-​defined within an audio-​over-​IP installation. In order to limit the number of
variations, the availability of open standards is a good thing. In many ways, all manufacturers
have attempted to achieve the same thing, often without any substantial technical reason to do it differently from the others. Fortunately, though, after some manufacturers had
developed their own technology, they agreed on a common denominator that all would
use as a basis to go forward. This was the initiation of the AES67 standard in 2013. This
standard defines a minimum set of parameters that must be supported by all manufacturers
that adhere to AES67. Hence, certain manufacturers may support more parameter settings
than the AES67 minimum; in this case, it is the user’s responsibility to verify that all devices
on the network support those.

9.6.1 AES67
According to the AES67 standard [6], all devices must meet the following minimum
specifications (excerpt):


• Connectivity:
• Unicast and multicast (in practice, most manufacturers currently only support
multicast, although the standard additionally asks for unicast support)
• Transport protocol UDP/​RTP
• Setting DSCP tags to specified values so that Quality of Service (QoS) can be
enabled (DiffServ).
• No automatic discovery of devices and streams specified
• Synchronization:
• Use of PTPv2 (IEEE 1588-​2008)
• PTP profile Standard (in practice, Dante currently requires a higher synchroniza-
tion rate, therefore the Media profile is recommended)
• Connection management
• Senders must issue an SDP file
• Recipients must be able to interpret an SDP file
• The receive buffer must be able to store at least 3 ms of audio
• Stream formats
• One to eight channels (the sender can choose a fixed number, but receivers
must be able to flexibly receive any of these options).
• 24-​bit and 16-​bit resolution (the sender can choose one, but the receiver must
be able to read both)
• 48 kHz sampling rate
• 1 ms packet time (48 samples)
• Multicast addresses are between 239.0.0.0 and 239.255.255.255

Note: Many additional parameters and values are stated in the standard, but they are not a
minimum requirement as listed above.
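A receiver implementation could condense this minimum set into a sanity check before subscribing to a stream. A sketch (the function is ours and covers only the excerpt above):

```python
def is_aes67_baseline_stream(sample_rate, bits, channels, packet_time_ms,
                             multicast_address):
    """Check stream parameters against the AES67 minimum interoperability set."""
    in_239_range = multicast_address.split(".")[0] == "239"
    return (sample_rate == 48000
            and bits in (16, 24)
            and 1 <= channels <= 8
            and packet_time_ms == 1.0
            and in_239_range)

ok = is_aes67_baseline_stream(48000, 24, 8, 1.0, "239.69.11.23")
not_ok = is_aes67_baseline_stream(96000, 24, 2, 1.0, "239.69.11.24")  # 96 kHz exceeds the baseline
```

A device may of course support more than this baseline; the check only tells whether any AES67-compliant device is guaranteed to accept the stream.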

9.6.2 SMPTE ST 2110


The Society of Motion Picture and Television Engineers (SMPTE) [25] has developed a
standard that specifies the transport of video and audio over an IP network. They have
decided to split audio and video into two separate streams and a third one for additional
data such as subtitles, time code information, etc., called ancillary data. These three sep-
arate streams are called essence streams. They have been given their own standard number
as follows (‘…’ stands for further versions with incremental refinements):

• SMPTE ST 2110-​2… for video streams [12]


• SMPTE ST 2110-​3… for audio streams [13]
• SMPTE ST 2110-​4… for ancillary data streams [14]

To keep all these streams in sync and to avoid the problem of audio being out of sync with
the video (bad lip sync), all essence streams must include a PTP timestamp in each packet.
A receiver can then easily reconcile them and ensure that there are no lip sync issues,
also known as AV delay. This overall concept is described in a separate standard with the
number SMPTE ST 2110-​10 [11].
Fortunately, the audio part of this concept is almost identical to the specifications of
AES67 [6], hence any AES67-​compatible device can participate as an audio device in an
SMPTE ST 2110 network.

Figure 9.20 Principle of SMPTE ST 2110.

9.6.3 NMOS
While the previously mentioned standards focus on the interoperability of audio and video
streams between manufacturers, the Networked Media Open Specifications (NMOS) purely
specify the control aspects of the application which are in fact fully independent of audio or
video. Therefore, NMOS is a well-​suited complement to the above standards, completing
them by specifying topics such as device/​stream discovery and connection management.
NMOS is a series of constantly evolving specifications attempting to standardize more and
more subtopics. The most important of these are:

• IS-​04: Device and stream discovery (initially using Bonjour/​mDNS, but also for larger
facilities with multiple subnets).
• IS-​05: Connection management (e.g., how multicast addresses are assigned to senders
and how SDP files are transferred from senders to receivers).
• IS-​08: Control of audio crosspoint matrices in senders and receivers (which channels
are fed into/​from a stream).

Further documents are under development, specifying additional topics, e.g., net-
work security, management of devices etc. The NMOS specifications are developed
by a group of industry representatives called Advanced Media Workflow Association
(AMWA) [9].

9.6.4 Internet Protocol Media Experience (IPMX)


By the time this chapter is published, IPMX is just about to get issued as an official standard
[15]. It is based on SMPTE ST 2110 and NMOS but contains adaptations to the needs of
sound reinforcement applications. While SMPTE ST 2110 is geared towards demanding
requirements such as broadcast applications, IPMX has been derived from it, with some-
what looser timing requirements and features as required in fixed-​install professional AV
applications. For example, IPMX supports compressed video right out of the box, demanding
less bandwidth compared to the uncompressed versions of SMPTE ST 2110. However, a
future goal is to bring these industries closer together and ensure interoperability between
different vendors and applications. IMPX furthermore includes control aspects of NMOS,
again with some extensions. These include standardized signalling of HDCP content copy
protection as used in HDMI interfaces or connecting general-​purpose interfaces like IR
remotes, RS232 or USB over IP.
The proliferation of the open standards SMPTE ST 2110, NMOS and IPMX is com-
mercially supported by a group of industry-​leading companies, called Alliance for IP Media
Solutions (AIMS) [16].

9.6.5 AES70
Even before the development of the above standards, a group of companies known as the OCA Alliance (Open Control Architecture) [17] developed a protocol in 2011 with the goal of making devices from different manufacturers interoperable in terms of device control and monitoring, such as changing processing parameters in an audio mixing console or setting a microphone gain.
In addition, the protocol contains a specification of how audio stream-​ related IP
parameters are set. Although AES70 is sufficiently generic to serve audio and video
applications in general, its primary focus was audio. Its acceptance in the market remained
somewhat limited [18, 19, 20].

9.6.6 SNMP
The IT industry has been using the Simple Network Management Protocol (SNMP) [21]
for many years to monitor the health of hardware components. SNMP traps contain an
individual message ID and are sent spontaneously by each monitored device. Monitoring
software must have access to the device-​specific MIB file to translate the received ID into a human-​readable error message. One of the shortcomings of SNMP is the fact that
messages are transmitted using the UDP protocol. Therefore, if the message is lost on the
network, the missing information about the error state of a device is not immediately
retransmitted.

9.6.7 Milan (AVB/​TSN)


Milan is an alternative to AES67-​based technologies in spatially limited networks and is used in live sound /​PA systems [2]. It is based on AVB/​TSN, a real-​time-​capable extension of Ethernet (layer 2) for audio and video transmission. In contrast to regular Ethernet, this implementation guarantees certain properties that are helpful for audio transmission:

• No oversubscription
An audio stream does not get established unless there is sufficient available bandwidth
throughout the network. Other packets cannot overload the link since priority is given
to audio by the nature of this technology. Due to this automatic bandwidth reservation,
there is no need for manual configuration of prioritization.


Table 9.5 Overview of the chosen approaches

 Dante Q-​LAN RAVENNA AES67 Milan (AVB/​TSN)

Discovery Proprietary Proprietary Bonjour not specified ATDECC (IEEE 1722.1)
Connection management Proprietary, IGMP Proprietary, IGMP RTSP, SIP, IGMP SIP, IGMP ATDECC (IEEE 1722.1)
Session description Proprietary Proprietary SDP SDP ATDECC (IEEE 1722.1)
Transport Proprietary RTP RTP RTP AVTP (IEEE 1722)
Quality of Service DiffServ DiffServ DiffServ DiffServ Stream Reservation, Prioritization, Shaper (IEEE 802.1Q)
Stream formats ≤ 8CH ≤ 16CH ≤ 128CH ≤ 8CH ≤ 64CH
Synchronization PTP V1 PTP V2 PTP V2 PTP V2 gPTP (802.1AS =​PTP V2 on layer 2)
Media clock specified 44.1–​192 kHz 48 kHz 44.1–​384 kHz 48 kHz 48–​192 kHz

• Traffic shaping
Packet jitter is reduced by each switch as packets exit its queues. Each switch takes care not to transmit audio at irregular intervals.
• Transparent clock by default
AVB/​TSN-​capable switches operate in a mode similar to a PTP transparent clock and
therefore provide stable PTP synchronization without further considerations.

As explained earlier, such conditions are not widespread in common IT applications. In fact, these specific requirements are somewhat special to industries such as professional audio, automotive and industrial automation. Acceptance in the audio industry has been slow due to concerns about the availability of AVB/​TSN-​enabled hardware and limitations in scalability (only one subnet). The future will show where this development goes. However, Milan has found some powerful supporters, particularly in the sound reinforcement field.

9.7 Proprietary Technologies


While standards enable interoperability between manufacturers and simultaneously some
healthy competition amongst them, several proprietary technologies in the audio market
have become established over the years. Regarding audio streaming technologies, most of
them date back to the early days of audio-​over-​IP, before AES67 was defined. Since then,
many of them have evolved and more and more adopted AES67, with Milan being the only
standard-​based alternative.

9.7.1 Ember+​
Talking about control protocols, another variant is Ember+​. The specification as well as the source code was made openly available by the company Lawo [22]. The Ember+​ philosophy is to enable manufacturers to support third-​party products with little effort, but not in a
‘plug and play’ approach. The specifications therefore only describe the generic exchange of
parameters and deliberately do not mention specifics such as values of a microphone gain.
By its nature Ember+​is generic enough to serve a variety of applications and has a growing
community of supporters, mainly in the broadcast domain.

9.7.2 Dante and AES67


Dante is the best-​known proprietary technology currently available, therefore it is worth
describing in more detail how it compares to other technologies, how it can be integrated
into AES67 networks and what its special features are, as applicable at the time of writing.
First, it is important to verify whether a particular Dante product is already AES67-​compatible or not. All modern Dante products are, with the exception of the 'Dante Virtual Soundcard', a piece of software that allows computers to become audio nodes in Dante networks. To be operational, however, the AES67 mode must be manually activated on Dante products; only then will all corresponding settings become available.

9.7.2.1 Connectivity
When it comes to multicast addresses, Dante products allow specification of a ‘prefix’ for
the addresses used by the system, narrowing down the range of multicast addresses within an
installation. This means that this product can then only work within this range of multicast
addresses and other AES67-​compatible products must operate within the same range as well.

9.7.2.2 Synchronization
Dante products use PTP version 1 (PTPv1) by default, while AES67 requires at least PTP
version 2 (PTPv2). It is important to note that these two versions are not compatible, although
they can run in parallel on a given network. When setting up an AES67 network, it is
recommended to switch all Dante products to AES67 mode and thus enable PTPv2 synchron-
ization. Subsequently the device with the highest clock quality should be selected as the PTP
leader or grandmaster. AES67-​capable Dante devices will then automatically send out PTPv1
messages just in case some older Dante products are also present on the network. Therefore,
care must be taken to ensure that in every VLAN containing older Dante products (including
the Dante Virtual Soundcard software) at least one AES67-​enabled Dante device is present
for the purpose of PTP translation. However, if multiple AES67-​enabled Dante devices are
present, all of them will synchronize to PTPv2 directly and only legacy devices will use PTPv1.
Dante products seem to require at least 4 sync messages per second. Therefore, the use
of the profile AES67 Media [6] or SMPTE ST 2059-​2 [7] which use 8 sync messages per
second is mandatory, while the default profile (1 sync message per second) does not meet
the synchronization requirements of Dante products. It should also be noted that Dante
devices will operate in PTP domain 0. This can only be modified by using Audinate’s Dante
Domain Manager software.
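The sync-rate condition above can be written down as a simple check. The message rates (1 per second for the default profile, 8 for AES67 Media and SMPTE ST 2059-2) and the Dante minimum of 4 per second are the figures quoted in the text:

```python
# Sync messages per second for common PTPv2 profiles (values as quoted in the text)
SYNC_RATES = {
    "default": 1,        # PTP default profile
    "AES67 Media": 8,
    "SMPTE ST 2059-2": 8,
}

DANTE_MIN_SYNC_RATE = 4  # observed minimum for Dante devices

def dante_compatible(profile: str) -> bool:
    """True if the profile's sync rate is high enough for Dante devices."""
    return SYNC_RATES[profile] >= DANTE_MIN_SYNC_RATE

for name in SYNC_RATES:
    print(name, "->", "ok" if dante_compatible(name) else "too slow for Dante")
```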

9.7.2.3 Connection Management


Audio Networking 313

Figure 9.21 Synchronizing older Dante devices in an AES67 environment.

Dante units can create AES67-compatible streams, but they must be manually configured. When creating a new multicast stream and defining the channels contained within, the AES67 option must be selected. In this case the packet time for Dante senders is fixed to 1 ms, but receivers can accept shorter packets as well.
Dante sends the SDP file to all interested devices using the Session Announcement
Protocol (SAP) on multicast address 239.255.255.255. Since no discovery protocol is speci-
fied in the AES67 standard, the user should check if any of the other AES67 devices have
the option of SAP support. If yes, discovery will work automatically across technologies. If
not, conversion software tools are available, as mentioned in section 9.4.1.
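What a receiver does with such an announcement can be sketched by extracting the connection data from the SDP body of a SAP packet. The session description below is a hand-written illustration in AES67 style, not a capture from a real device:

```python
def parse_sdp(sdp: str) -> dict:
    """Extract multicast address, port and payload format from an SDP body."""
    info = {}
    for line in sdp.strip().splitlines():
        if line.startswith("c="):           # connection data: c=IN IP4 <addr>/<ttl>
            info["address"] = line.split()[-1].split("/")[0]
        elif line.startswith("m=audio"):    # media line: m=audio <port> RTP/AVP <pt>
            info["port"] = int(line.split()[1])
        elif line.startswith("a=rtpmap:"):  # payload format, e.g. L24/48000/8
            info["format"] = line.split(" ", 1)[1]
    return info

# Hand-written, AES67-style session description (illustrative only)
sample = """v=0
o=- 1311738121 1311738121 IN IP4 192.168.1.20
s=Stage box channels 1-8
c=IN IP4 239.69.12.5/32
t=0 0
m=audio 5004 RTP/AVP 98
a=rtpmap:98 L24/48000/8"""

print(parse_sdp(sample))  # address, port and format of the announced stream
```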

9.8 Redundancy
In the early days of audio networking, some users were sceptical about the reliability of
IT hardware. Due to its widespread use though, IT equipment is well proven and often
more reliable than traditional audio equipment. In addition, most IT network components
offer several mechanisms for diagnosis and quick problem-​solving in the case of equipment
failure, some of them described here. Note that most likely these processes must be manu-
ally activated in the network switches.

9.8.1 Spanning Tree Protocol (STP)


If switches are connected to form a loop, there is a danger that packets are endlessly trans-
mitted in that loop. In contrast to the audio world, such ‘feedback loops’ are automatically
detected by the network hardware with the help of STP. If a loop is detected, one of the
links gets automatically inactivated by the switch.
Additionally, STP can be used to secure a system against accidental link disconnections,
including cable cuts. The approach is to intentionally create loops and let the system deacti-
vate one of them. Then, if one of the cables gets accidentally disconnected, the system
detects this within seconds and reactivates the passive link. During this phase, audio is
interrupted for a few seconds, but it is still much faster than manual troubleshooting and
installing a new cable. In some cases, the spanning tree protocol may even help with broken
switches. In this case only nodes connected directly to the failed switch are lost, while
others still get their connections back using an alternative path.


In most systems, STP is enabled by default and loops can be established without further
considerations.

9.8.2 Link Aggregation


If a particular link is exceptionally relevant in an installation, it is possible to connect two
or more cables in parallel for safety. While the main purpose of such a link aggregation is
gaining more bandwidth between two switches, it can also be used as a cost-​effective way to
secure a link against accidental disconnections or cable cuts. Typical cases are, for example,
links between two buildings, or a stage connected to a mixing console. The switches at
both ends must be configured in the same way: two or more interfaces are declared as a Link
Aggregation Group (LAG) and will appear as one interface to the switch. In practice, the
use of link aggregation to reduce cabling issues can be very useful due to its simplicity, as the
user simply needs to provide an additional cable and adjust the configuration of the switches
at both ends. However, when disconnecting a cable, it might happen that the audio trans-
mission is interrupted for typically a fraction of a second before the switch has activated the
alternative link.

Figure 9.22 Loop detection by the Spanning Tree Protocol (STP).

Figure 9.23 Link aggregation as safety net for cabling issues.

9.8.3 Stream Redundancy

The safest – but also the most expensive – way to realize redundancy in a network is setting up two separate audio networks, providing two independent paths between sender and receiver. In such a setup, each node needs to provide two network interfaces connected to both networks. The sender creates two packets with identical (audio) content, stamps the identical PTP time on both and then sends them over both networks. On the receiving end, both packets are received and unpacked. Even if one of the packets is lost, the remaining packet contains all the information and ensures that the audio continues without interruption. In fact, this mechanism is the only approach in a network that compensates for occasional packet loss without having to resend packets from the sender and thereby add latency. In other words, stream redundancy protects against anything from the loss of single packets up to the complete failure of one half of the network.
Fortunately, a standard called seamless protection switching or hitless merge describes this mechanism: SMPTE ST 2022-7 [23] specifies how content is duplicated and how the packets are handled on the receiving end to ensure uninterrupted signal flow. It is formulated to be independent of the actual data type contained in the packets and is therefore applicable to audio and video.

Figure 9.24 Maximum safety through double networks.

Note: At the time of writing, some Dante devices provide an equivalent mechanism for duplicating and merging streams. Due to the proprietary nature of Dante, it is difficult to identify the exact differences from SMPTE ST 2022-7. However, reception of SMPTE ST 2022-7 streams is supported by Dante devices providing two physical interfaces.
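The receiving half of this duplication scheme boils down to keeping the first copy of every sequence number and discarding the rest. The sketch below illustrates that idea only; it is not the exact algorithm of SMPTE ST 2022-7, and the packet tuples are hypothetical:

```python
def hitless_merge(primary, secondary):
    """Merge two redundant packet streams: keep the first copy of each
    sequence number, drop the duplicate. Packets are (seq_no, payload) tuples;
    arrival order is approximated here by sorting on the sequence number."""
    seen = set()
    merged = []
    for seq, payload in sorted(primary + secondary, key=lambda p: p[0]):
        if seq not in seen:
            seen.add(seq)
            merged.append((seq, payload))
    return merged

# Network A lost packet 2, network B lost packet 3 -- the merged stream is complete
net_a = [(1, "a1"), (3, "a3"), (4, "a4")]
net_b = [(1, "b1"), (2, "b2"), (4, "b4")]
print(hitless_merge(net_a, net_b))
```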

9.9 Common Mistakes in Audio Networks


The following list summarizes the most common errors found in audio networks, drawing on the issues covered so far. Most problems are caused by user error; familiarity with the underlying mechanisms helps to avoid them. Technically, networks can be just as reliable as any traditional technology.

9.9.1 Incorrect Device IP Settings


If addresses in devices and their subnet masks are not correctly set, the packets will not find their
path from sender to receiver. Verify that the subnet mask is set to the same value in all devices and that the IP addresses are part of the correct subnet. As a
simple test, connect a computer to the same subnet as the sender and check whether you can
reach the receiver with the ping command using the command line of your operating system.
Note: ping sends unicast packets back and forth and therefore tests for a successful unicast
connection. To check for a multicast connection, more advanced tools such as WireShark
are required [24].
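Before reaching for ping, the address plan itself can be sanity-checked on paper or in a few lines of Python; the addresses below are illustrative:

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """True if both hosts fall into the same subnet for the given mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b

# A sender at .1.20 and a receiver at .2.30 only reach each other with a /16 mask
print(same_subnet("192.168.1.20", "192.168.2.30", "255.255.255.0"))  # False
print(same_subnet("192.168.1.20", "192.168.2.30", "255.255.0.0"))    # True
```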

Figure 9.25 Example of a successful (unicast) connection to a device.

9.9.2 Incorrect Matrix Settings in Audio Devices


It often happens that users set up senders and receivers correctly but forget to set the
crosspoints of the audio matrix in the sending and receiving device accordingly. The
matrix must be set up so the correct audio channels are fed into a particular sender and
the audio channels within a stream reach their desired output interface in the receiving
device.

9.9.3 Occasional Audio Dropouts or No Audio at All


The user is responsible for specifying an overall latency (link offset) for a connection
allowing all packets to arrive within that timeframe. If the latency is set too short, occa-
sional packet loss or even complete silence may occur.
Not all products indicate lost packets in an intuitive way. One good example is how
Dante products display the status of their receivers in the Dante Controller software
(Figure 9.27).
A histogram illustrates how many packets have arrived at a particular time. In this
example, a link offset of 1 ms is applied and so far, all packets have arrived on time. In case
packets have arrived later than the link offset, a counter displays the number of late packets
and audio dropouts may occur, as seen in Figure 9.28.
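The same bookkeeping can be reproduced with a few lines: given measured arrival delays and a configured link offset, count the packets that arrived too late. The delay figures are invented for illustration:

```python
def late_packets(arrival_delays_ms, link_offset_ms):
    """Count packets that arrived later than the configured link offset."""
    return sum(1 for d in arrival_delays_ms if d > link_offset_ms)

# Hypothetical measured delays (network transit plus jitter), in milliseconds
delays = [0.31, 0.42, 0.38, 0.97, 1.24, 0.45, 1.05, 0.40]

print(late_packets(delays, 1.0))  # 2 packets exceed a 1 ms link offset
print(late_packets(delays, 1.5))  # 0 -- a longer offset absorbs the jitter
```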

9.9.4 Unexpected Traffic on the Network


Sometimes users unpack a new switch, connect their audio devices and are happy that every-
thing works fine. When gradually adding more devices, they enable multicast on senders for
load reduction at one point. Most likely everything continues to work fine because most
switches have multicast support enabled by default. However, IGMP snooping is usually not
active, so all packets get broadcasted across all links. This soon adds up to high total traffic,
overloading some links and subsequently leading to packet loss. The rule of thumb is as
follows: do not use multicast without having IGMP snooping (multicast filtering) enabled
on all switches!
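A rough calculation shows why this matters. Without IGMP snooping, every multicast stream is flooded onto every link; the sketch below assumes 8-channel, 24-bit, 48 kHz streams and ignores IP/UDP/RTP header overhead:

```python
def stream_bandwidth_mbps(channels: int = 8, bits: int = 24,
                          sample_rate: int = 48000) -> float:
    """Approximate audio payload bandwidth of one stream in Mbit/s
    (IP/UDP/RTP header overhead ignored)."""
    return channels * bits * sample_rate / 1e6

def flooded_traffic_mbps(num_streams: int) -> float:
    """Traffic EVERY link carries when all streams are flooded
    (no IGMP snooping -- multicast behaves like broadcast)."""
    return num_streams * stream_bandwidth_mbps()

print(stream_bandwidth_mbps())   # about 9.2 Mbit/s per 8-channel stream
print(flooded_traffic_mbps(80))  # about 737 Mbit/s on every link -- near gigabit
```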

9.9.5 Network Congestion due to Lack of QoS


Particularly when audio networks are mixed with, for example, office applications it can
happen that large files are copied over the network and a data peak occurs at this point
in time. In this case it is important that PTP and audio packets get prioritized over
other traffic. Otherwise, distorted audio or occasional audio dropouts may be observed. Another rule of thumb is: do not build mixed networks without active Quality of Service!

Figure 9.26 Matrix crosspoints in senders and receivers must be correctly set.

Figure 9.27 Example of a well-set link offset. All packets arrived within the set latency.

Figure 9.28 Example of a link offset that is too short. Not all packets arrived within the set latency.

9.9.6 Unstable PTP Synchronization


PTP requires a range of settings, all of which must be correctly set. For example, a chosen
sync rate may turn out to be too low for some devices, causing that particular product
to occasionally interrupt its audio. It may also happen that the announce timeout in


leader-​capable devices is set too short and therefore a device tries to become the PTP leader
while the previous one is still alive and was just about to send its next announce message.
Usually, selecting one of the pre-​defined profiles such as AES67 Media or SMPTE ST 2059-​
2 easily solves this confusion. Alternatively, some devices offer a PTP setting ‘Follower only’
or ‘Slave only’. It is recommended to activate this option on all devices within an installa-
tion which should never become the PTP leader.

9.9.7 Audio Problems on a Computer with a Virtual Sound Card


By installing a virtual sound card on a computer, it becomes an audio node and can send
and receive audio streams. This functionality requires proper access to computing power.
If a computer is running other demanding software, it may occasionally miss an audio
packet coming through the virtual sound card. If audio dropouts are observed, it is advisable
to either

(a) Extend the link offset of the connections to/​from the computer
(b) Close some applications running on this computer
(c) Change the respective sender setup to use a longer packet time and thus generate fewer
packets per second, or
(d) Over-​provision the computer hardware in terms of computing power, since common
operating systems do not guarantee real-time execution of critical applications.

Since conditions on a particular PC may change over time and with any newly installed software package, it is advisable to keep a device in the same software state for as long as possible, unless audio can be re-tested for an extended period before the device is used productively again.
Last, but not least: it should not be forgotten to disable any standby or power-​saving
modes on a PC during audio use. Not doing so may unexpectedly interrupt audio playout
or recording.

9.9.8 Stream Format not Supported


If the sender sets a packet time or a number of channels that is not supported by all receivers,
they will not be able to play the audio. It is best to stick to the AES67 specifications of 1 ms
packet time and one to eight audio channels within a stream. This format must be supported
by all devices that claim to be AES67-​compatible.
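This safe format translates into concrete numbers. The sketch below assumes 48 kHz sampling and 24-bit (L24) samples, the common AES67 case:

```python
def aes67_packet(channels: int, packet_time_ms: float = 1.0,
                 sample_rate: int = 48000, bytes_per_sample: int = 3):
    """Samples per channel, audio payload size (bytes) and packet rate
    for one AES67 stream (header overhead not included)."""
    samples = int(sample_rate * packet_time_ms / 1000)
    payload = samples * channels * bytes_per_sample
    packets_per_second = int(1000 / packet_time_ms)
    return samples, payload, packets_per_second

# The safe AES67 format: 1 ms packets, up to 8 channels of L24/48 kHz audio
print(aes67_packet(channels=8))  # (48, 1152, 1000)
print(aes67_packet(channels=2))  # (48, 288, 1000)
```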

9.9.9 The Switch is not Suitable for Audio Networks


Ideally, a switch supports all the functions described so far:

• Multicast
• IGMP snooping, including querier
• Quality of Service (QoS)

In addition, there is one more thing to consider: some switches save energy by grouping
packets in their queues and then sending them all together at once. With audio, such


behaviour naturally leads to high packet jitter. It is therefore advisable to use a switch on which Energy-Efficient Ethernet (EEE) can be deactivated.
Finally, it is always safe to specify switches with PTP boundary clock functionality, espe-
cially if it is a large network or is assumed to grow over time.

9.9.10 Devices and Streams not Visible


Since device and stream discovery are not specified within the AES67 standard, some
devices may operate in one mode (e.g., SAP) while others detect streams in a different one
(e.g., Bonjour). It is advisable to enable both modes on all devices if possible. Alternatively,
free conversion software such as ‘Rav2Sap converter’ can be run on a computer on the same
network.
Remember that automatic discovery is based on using multicast. It may be the case that
multicast or IGMP filtering is not correctly configured and therefore devices or streams do
not become visible across the entire network. In this case, it may be advisable to statically
forward the known multicast address of Bonjour or SAP to all devices instead of just relying
on IGMP for discovery. This can be defined in the switch setup.

References
1. www.audinate.com.
2. https://avnu.org.
3. www.qsc.com.
4. www.ravenna-network.com.
5. www.merging.com.
6. AES67-2018: AES standard for audio applications of networks - High-performance streaming audio-over-IP interoperability, available on https://aes.org.
7. SMPTE ST 2059-2:2021: SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in Professional Broadcast Applications, available on www.smpte.org.
8. AES-R16-2016: AES project report - PTP parameters for AES67 and SMPTE ST 2059-2 interoperability, available on https://aes.org.
9. www.amwa.tv.
10. www.videolan.org.
11. SMPTE ST 2110-10:2017: Professional Media Over Managed IP Networks: System Timing and Definitions, available on www.smpte.org.
12. SMPTE ST 2110-20:2017: Professional Media Over Managed IP Networks: Uncompressed Active Video, available on www.smpte.org.
13. SMPTE ST 2110-30:2017: Professional Media Over Managed IP Networks: PCM Digital Audio, available on www.smpte.org.
14. SMPTE ST 2110-40:2018: Professional Media Over Managed IP Networks: SMPTE ST 291-1 Ancillary Data, available on www.smpte.org.
15. https://ipmx.io.
16. https://aimsalliance.org.
17. www.ocaalliance.com.
18. AES70-1-2018: AES standard for audio applications of networks - Open Control Architecture - Part 1: Framework, available on https://aes.org.
19. AES70-2-2018: AES standard for audio applications of networks - Open Control Architecture - Part 2: Class structure, available on https://aes.org.
20. AES70-3-2018: AES standard for audio applications of networks - Open Control Architecture - Part 3: OCP.1: Protocol for IP Networks, available on https://aes.org.
21. www.rfc-editor.org/info/std62.
22. https://github.com/Lawo/ember-plus.
23. SMPTE ST 2022-7:2019: Seamless Protection Switching of RTP Datagrams, available on www.smpte.org.
24. www.wireshark.org.
25. www.smpte.org.

10 Commissioning, Calibration, Optimization


Gabriel Hauser and Wolfgang Ahnert

10.1 Introduction
Prior to putting a sound reinforcement system into operation, the installer and/​or the con-
sultant need to perform tests regarding its electrical, mechanical as well as its acoustical
functionality. These tests ensure that the system is ready for operation according to the
specifications and include checking for electrical and mechanical defects, polarity of the
loudspeakers, sound pressure level calibration (including amplifier gain, SPL for different
zones, SPL limiters), time delay settings as well as adaptation of the speakers to the existing
room acoustical conditions, using frequency response optimizations (equalization), minim-
izing room excitation and boundary reflections, and maximizing uniformity of SPL distribu-
tion and speech intelligibility for the entire audience area.
The following sections will help in getting an overview on the topics that need careful
consideration during setup and optimization of new installations as well as maintaining and
improving existing systems.

10.2 Functional Testing and Installation Verification

10.2.1 Electrical
Before powering up the system, some investigations concerning the electrical connectivity
and setup are required. First item would be electrical power: how are the various electronic
devices powered, what power phase do they use, what is the configuration of the fuses. It is
recommended to use a sequenced power-​on schedule that powers the amplifiers (or active
loudspeakers) last when turning on the system and makes sure to power them down first
when switching off. Is electrical power redundancy available in the system (uninterruptible power supply, backup generator or similar), and is it operational?
Secondly, it is beneficial to verify whether the signal interconnectivity between devices
is correct –​balanced vs. unbalanced cables and connectors, line level vs. microphone
level, digital or analogue signals, etc. In modern systems, this task involves substantial IT
knowhow, as signal linking increasingly takes place in a digital matrix or in a network
switch (see Chapter 9 for details). It is advisable to have the person responsible for the
installation at hand for this type of in-​depth testing.
After the signal routing test is completed, it is advisable to test the impedance of the load
on each amplifier channel (assuming the amplifiers are not built into the loudspeakers). For
this, the loudspeakers must be disconnected from the amplifier output and a classic resist-
ance test needs to be performed. This procedure enables identification of short circuits,

open circuits or misaligned connections of 100 V loudspeakers. Also, correct polarity of the cabling needs to be checked.

DOI: 10.4324/9781003220268-10
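For 100 V lines, the impedance to expect follows directly from the summed tap powers via Z = U²/P, so the measured value can be compared against a quick calculation. The tap configuration below is an example:

```python
def expected_line_impedance(tap_powers_w, line_voltage: float = 100.0) -> float:
    """Nominal impedance of a 100 V loudspeaker line: Z = U^2 / P_total."""
    return line_voltage ** 2 / sum(tap_powers_w)

# Example line: ten loudspeakers tapped at 10 W plus five tapped at 20 W = 200 W
taps = [10.0] * 10 + [20.0] * 5
print(expected_line_impedance(taps))  # 50.0 ohms nominal
```

A measured impedance far below the expected value points to a short circuit; a value far above it to an open circuit.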
Next, the correct setup (or preset) of the loudspeaker management or crossover devices
must be determined to prevent damage to tweeters or general overload of drivers: are all
crossover frequencies set correctly, including slope, gain and limiter settings?
Now test signals from a known source (such as pink noise, sine tones, speech or music)
can be carefully fed to the input of the system with low gain and the signal flow is followed
through the devices. If not already done and documented by the installer, it is advisable to
review the gain staging of the installation and make sure all inputs/​outputs are correctly
aligned. For certain installations such as speech evacuation systems, tests of redundancy,
impedance monitoring as well as real-time failure reporting are required.
As a last step, the control surface needs to be verified to make sure that all the required
functionality is mapped and working correctly.

10.2.2 Mechanical
Verifying the correct mechanical installation includes checking the installation racks
for sufficient airflow and/​or cooling to and from the heat-​producing equipment (such
as amplifiers), making sure no vents are blocked by wiring harnesses etc.; all cables
need to be securely fixed to the rack in order to minimize stress on the connectors.
When it comes to the mechanical side of the loudspeaker installation, the correct
positioning and aiming of each device in the room needs to be inspected, and the
choice of mounting hardware and whether the loudspeakers are properly secured, espe-
cially if they are mounted above audience. While it might not be the responsibility of
the consultant to install the loudspeakers correctly and according to local code, it is the
planner’s responsibility to draw the attention of the appropriate installer if something
needs to be improved or fixed.

10.2.3 Acoustical
Finally, after testing electrical and mechanical matters, the system can be switched on and
initial acoustical tests can be performed. It will be immediately evident if there are any
undesirable noises when turning on the system. There should not be any loud plops or
clicks, nor any constantly audible noise, buzz, hum etc. after turning on the system.
Testing should start with low gain (low sound pressure levels) at first to make sure that no
components are harmed if there are still some faults in the installation. It is also advisable
to test one loudspeaker or, if possible, even just one driver at a time. Pink noise test signals
can quickly help in identifying the frequency content of a device under test but are not
very helpful in determining if the loudspeaker is causing distortion or if mechanical noise or
rattling occurs. These issues become more obvious when using a sine sweep: a sinusoidal test signal that sweeps the frequency range from very low (usually 20 Hz) to very high frequencies (usually 20 kHz) (see 10.6.2).
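Such a sweep can be generated in a few lines; the code below implements the common exponential (logarithmic) sweep formula using only the Python standard library:

```python
import math

def log_sweep(f1: float = 20.0, f2: float = 20000.0,
              duration: float = 3.0, sample_rate: int = 48000) -> list:
    """Exponential (logarithmic) sine sweep from f1 to f2 Hz as float samples."""
    n = int(duration * sample_rate)
    k = math.log(f2 / f1)
    return [math.sin(2 * math.pi * f1 * duration / k * (math.exp(k * i / n) - 1))
            for i in range(n)]

sweep = log_sweep(duration=0.5)
print(len(sweep))  # 24000 samples for half a second at 48 kHz
```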

10.3 Troubleshooting
• Humming
This effect usually occurs at the electrical mains frequency (50/​60 Hz and harmonics).
The cause could be:


• ground loops, i.e., the system has several grounding points with a (small) level
gradient so that a compensating current flow will develop, causing the humming
voltage through an intermediate resistance
• open high-​impedance input with subsequent amplification
• reverse polarity of mains supply between different parts of the system
• Feedback
This effect is the typical ‘howling’ sound of PA systems, and means that single or mul-
tiple frequencies across the audio band feed back into the system and tend to quickly
grow louder and overload the system. Possible causes include:
• insufficient separation between loudspeaker and microphone
• open channel input with subsequent strong amplification
• insufficient gain before feedback with open microphones
• Dropouts at high volumes
If with increasing input level to the system the sound pressure level does not rise accord-
ingly, starts to alter the dynamic content of the source material or even drops out, some
possible causes are:
• overloading one or several power amplifiers
• overloading of a compressor or limiter
• excessively high-​impedance termination of a power amplifier
• Bass-​emphasized, ‘boomy’ or periodic drop-​out reproduction
• low-​resistance mismatch of the power amplifier output (e.g., a short-​circuit in the
loudspeaker line or an overload due to incorrect matching transformers)
• Treble-​emphasized, ‘sharp’ reproduction
• interruption of the program line on the input of the power amplifiers (capacitive
coupling of the signal)
• unipolar (one-​legged) coupling of the power amplifiers
• Reduced low-​ frequency reproduction in between two loudspeakers when using a
coherent signal
• polarity mismatch of a loudspeaker connection
• High-​frequency interference –​buzzing
• insufficient separation between the cables of the lighting system and the program
lines applied to the input of the sound reinforcement system (a minimum separ-
ation of 0.5 m has to be ensured)
• missing interference suppression of the lighting control system (dimmers), may be
adjusted by inductive blocking
• Insufficient high-​frequency coverage in the audience
• faulty mechanical alignment of speakers
• HF horn driver not correctly rotated inside the box
• adapt coverage by changing angles in line arrays or by changing directivity
characteristics in digitally steered arrays
• Cancellation and interference of frequencies
• when multiple loudspeakers are used, verify that the polarity of each loudspeaker
and each driver is correct. A swapped polarity will cause low-​frequency cancella-
tion and mid-​high-​frequency interference patterns
• Uneven SPL distribution in the audience
• make sure all gain settings are correct and all loudspeakers are working (especially
in a line array)
• ensure that all loudspeakers are correctly aimed

10.4 Calibration and Optimization

10.4.1 Electrical
Electrical optimization and commissioning of larger sound reinforcement systems is an
important part of the overall tuning process and typically represents a precondition for the
subsequent acoustic tuning of the system.
Firstly, the gain structure throughout the signal chain needs to be optimal: all devices
within the system need to operate within their nominal input and output voltage range,
otherwise signal degradation will occur. All source devices that will be used in the system
need to be tested for their actual output level and signal type (balanced, unbalanced, digital
with all its sub-types). Unnecessary signal conversions need to be avoided, be it from
balanced to unbalanced or from digital to analogue and back.
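Verifying gain structure often comes down to converting between dB levels and voltages. The sketch below uses the standard reference 0 dBu = 0.775 V RMS; the level figures are examples:

```python
def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to volts RMS (0 dBu = 0.775 V RMS)."""
    return 0.775 * 10 ** (dbu / 20)

def headroom_db(nominal_dbu: float, clip_dbu: float) -> float:
    """Headroom between nominal operating level and the clipping point."""
    return clip_dbu - nominal_dbu

# Example: professional line level vs. a hypothetical +24 dBu clipping point
print(round(dbu_to_volts(4.0), 3))  # +4 dBu is about 1.228 V RMS
print(headroom_db(4.0, 24.0))       # 20 dB of headroom
```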
If an analogue input is available for public use (for example a line input to plug in a guest
device for rehearsals etc.), it needs to be correctly limited in level to make sure that the
input to the system cannot be overloaded. Microphone gain settings also must be set some-
times for a multitude of possible users; this requires some careful setting of gain, enhancer,
compressor and limiter parameters.
If gain before feedback is an issue in the installation, compression on microphones needs
to be set with caution. The use of a dedicated narrow notch filter bank, an automatic feed-
back suppressor or careful equalization is recommended.
In addition to the above, crosstalk between channels as well as signal-to-​noise ratios
(S/​N ratios) can be measured.

10.4.2 Mechanical
It has to be verified that the loudspeakers are firmly mounted in their brackets, the truss
or the bumper and that nothing is rattling. It is very important that the loudspeakers can
radiate freely, with no obstruction of the dispersion through installations such as light
fixtures, trusses, beams, curtains or pillars. Sometimes not all of these elements are known
in the planning phase and might have been added just prior to commissioning.
It is often possible to significantly optimize the coverage of the audience area by
adjusting the alignment (orientation or aim) of loudspeakers, the inter-​box angles in line
arrays or the directivity pattern of digitally steered line sources. This includes directing
sound energy away from open microphones (increasing gain-​before-​feedback) and hard
reflecting surfaces.

10.4.3 Acoustical
The acoustic optimization and tuning of a sound reinforcement system should always be
preceded by the evaluation of the room acoustical properties of the space. This is necessary
to determine to which degree the quality of the sound transmission will be affected by the
room acoustic environment. In a professional design of a sound system this factor will have
been considered prior to the installation, maybe in computer simulations, and can now be
compared to the actual measurements of the finished room.
Room acoustical data can sometimes be obtained from the measurements performed by
the responsible acoustic consultant.


10.4.3.1 Room Acoustical Properties
• Reverberation time (RT60, T30, T20, EDT, best in octave or ⅓-​octave bands).
Measurements allow calculating the equivalent sound absorption area required for esti-
mating the diffuse sound level as well as the critical distance (see section 2.2.4.1). For
this reason, reverberation time measurements for all ⅓-​octave bands from 100 Hz to
4 kHz should be carried out for all room configurations in which the sound reinforce-
ment system operates, if possible, for the empty as well as the occupied room, at
various measuring positions; see Figure 10.1. In theatres one may also measure different
conditions of the stage house as well as variable stage settings. Moreover, this may
include different audience hall and stage configurations as may specifically be found in
multi-​purpose halls.
• Sound pressure level distribution without the sound system (‘natural’ speaker/​talker)
can be measured using a dedicated source on stage, for example an artificial human
speaker (Head and Torso Simulator, or a special active loudspeaker device, e.g. NTi-​
Audio-TalkBox; see Figure 10.2).
• Speech intelligibility (STI –​Speech Transmission Index or STIPa –​Speech
Transmission Index for Public Address Systems) at different locations representative
for the room. Intelligibility measurements without the sound reinforcement system but
by using the artificial human speaker are achieved by means of a dedicated signal gen-
erator and a mobile (real time) analyser (see section 7.4).
• Measurements of the background noise level in the audience area of the venue. These
include noise sources such as HVAC systems, lighting and video devices, outside noise
(traffic, machinery, people) and sound transmissions from other rooms adjacent to the
venue. Measurements should be performed using calibrated Class 1 devices that can not

only display single-number SPL values but can also analyse the spectral frequency response of the noise. This allows for later analysis of the respective noise criteria (NC – Noise Criterion, NR – Noise Rating, GK – GrenzKurve).

Figure 10.1 Top view of a convention hall, showing measurement locations (R) in the auditorium and source locations on stage.

Figure 10.2 Example of an artificial human speech source for room acoustic measurements.
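The first bullet's link between reverberation time, equivalent absorption area and critical distance (see section 2.2.4.1) can be sketched with Sabine's formula A = 0.163·V/T and r_c ≈ 0.057·√(γ·V/T) for a source with directivity factor γ. The hall figures below are illustrative:

```python
import math

def absorption_area(volume_m3: float, rt60_s: float) -> float:
    """Equivalent sound absorption area via Sabine: A = 0.163 * V / T."""
    return 0.163 * volume_m3 / rt60_s

def critical_distance(volume_m3: float, rt60_s: float,
                      directivity: float = 1.0) -> float:
    """Distance at which direct and diffuse field levels are equal:
    r_c = 0.057 * sqrt(gamma * V / T)."""
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)

# Illustrative hall: 6000 m^3, measured RT60 of 1.6 s, loudspeaker with gamma = 10
print(absorption_area(6000, 1.6))        # equivalent absorption area in m^2
print(critical_distance(6000, 1.6, 10))  # critical distance in m
```

A directional loudspeaker (larger γ) pushes the critical distance outward, which is exactly why directivity matters in reverberant venues.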

10.4.3.2 Electro-​Acoustical Properties


After acquiring the room acoustical parameters, the performance of the sound reinforce-
ment system can be evaluated (see sections 10.6 and 10.7). Due to the current lack of
standardized binaural quality criteria, omnidirectional measurement microphones are
used. To average the results over the audience area, multiple receiver locations must be
measured, either sequentially (with only one microphone) or simultaneously (using mul-
tiple microphones and multichannel measurement devices). The measurements include:

• Sound level distribution in the auditorium and on the stage area. The level diffe-
rence between the highest and the lowest sound level measured over the relevant
audience area should not exceed ±3 dB, but in any case should stay within ±5 dB
(SPL, dBA and dBZ).
• The frequency response in the audience and stage areas. Geometry, room acoustics
and the choice of loudspeakers usually do not result in a flat frequency response in
the audience areas throughout the venue. Measuring the frequency response allows
one to optimize it using equalizers. The tolerance ranges of reproduction curves for
different applications are given by Mapp [1] (Figure 10.3). Not all acoustic phenomena
that cause deviations in the frequency response can be compensated by equalization,
though: destructive interference from boundary reflections or loudspeaker placement
for example will remain unmodified after an equalizer is applied since the direct sound
Gabriel Hauser and Wolfgang Ahnert
Figure 10.3 Tolerance curves for the reproduction frequency response in different applications: (a) recommended curve for
reproduction of speech; (b) recommended curve for studios or monitoring; (c) international standard for cinemas;
(d) recommended curve for loud rock and pop music.


as well as the cancelling reflection are increased in level by the same amount. Averaging
multiple receiver locations tends to balance out these localized anomalies and gives a
better picture of what to actually equalize (see 10.7.7).
• Determination of the correct arrival times for each individual loudspeaker as a
coherent wavefront as compared to the arrival time of the original (natural) sound, at
locations of interest in the audience area.
• The noise produced by the system itself (hum, buzz, hiss etc.) in the audience area,
measured in dBA as well as in ⅓ octave for later analysis of the respective noise criteria
(NC –​Noise Criterion, NR –​Noise Rating, GK –​GrenzKurve; see section 5.2).
• Speech intelligibility and further acoustical parameters derived from the captured
impulse responses, such as C50, C80, echogram etc.
• Determination of maximum SPL (dBA, dBZ) of the system in its stable condition
(no severe distortion, no clipping amplifiers etc.) and before limiting. In order to
achieve good speech intelligibility in voice alarm systems, the SPL has to be roughly
6–10 dB higher than the background noise but not more in order to prevent detri-
mental masking effects that occur with higher SPL (see Figure 7.22 in Chapter 7). The
most demanding venues in this respect are sports arenas, where a maximum SPL of 105
dB(A) might have to be achieved (please note the necessary crest factor of the speech
signal).
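The coverage evenness check in the first bullet above can be sketched in a few lines of Python. The measured levels below are hypothetical example values, not data from any real venue:

```python
# Hypothetical A-weighted levels (dB) measured at eight receiver positions R1..R8
levels = [92.1, 93.4, 91.0, 94.2, 92.8, 90.6, 93.9, 92.2]

mean = sum(levels) / len(levels)
spread = max(levels) - min(levels)

# Target: every position within +/-3 dB of the average; hard limit: +/-5 dB
within_target = all(abs(lvl - mean) <= 3.0 for lvl in levels)
within_limit = all(abs(lvl - mean) <= 5.0 for lvl in levels)
```

The same check would be repeated for the dBA and dBZ figures and, if desired, per frequency band.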

10.4.4 Subjective Evaluation


Apart from the objective measuring procedures discussed in section 10.4.3, subjective
procedures are also required for a holistic assessment of the system, for example if very sub-
jective, complex qualities (such as specialized applications of the system or the employment
of certain sound-​effect devices) have to be included.
The evaluation of a system’s intelligibility using nonsensical ‘words’, or rather sequences of
consecutive syllables, is a proven method. The complete ‘word’ needs to be well transmitted in order to
be correctly perceived because the human brain will not be able to complement the word
from context. The ratio of syllables understood (received) compared to syllables read (sent)
is a measure of the subjective intelligibility; refer also to ‘CVC words’ in section 7.4.
Far more complex is the subjective assessment and classification of a sound system by
a team of listeners. Such an evaluation might be required when tuning and optimizing
enhancement systems that are used to change room-​acoustical parameters in a venue. The
listening team would consist of roughly 5 to 20 people who test in various audience areas
throughout the auditorium simultaneously. Various different program examples should be
used as the source for testing and the people attending should change their location within
the room after every test session. The subjective test results should be entered into carefully
prepared questionnaires which will be statistically evaluated afterwards.

10.5 Documentation
All too often this part of the commissioning is neglected since it has no immediate effect on
the outcome of the process. But in later phases of the project, maybe after some years when
the first update or replacement of equipment occurs, the quality of the documentation will
determine the quality of the work that was originally done. It is quite possible that another
party will be responsible for the update, and if they can rely on proper documentation, it will be
appreciated.


Modifications that were made during the commissioning are very relevant: this informa-
tion needs to be shared with the parties responsible for the installation and later the oper-
ation to be included in their documentation.
As a general rule, notes taken during the commissioning process, along with meaningful
photographs of the installation, form a good basis for authoring the documentation.

10.5.1 Electrical
All settings and presets need to be exported, saved and backed up, and written notes on
firmware versions, routing and settings that cannot be saved (such as analogue equalizer and
gain settings, microphone types used etc.) need to be taken and archived. Choosing logical,
unique names for the files, including the date, will prove helpful.

10.5.2 Mechanical
All changes that were made in the process need to be documented, such as loudspeaker
orientation and positioning. If anything needs to be modified or added (safety mounting,
doors or hoods for racks and mixing desks etc.), this should be documented as well.

10.5.3 Acoustical
Document the state of the venue during the measurements: was an audience present? What
was the seating configuration, the position of acoustic banners and the iron curtain, and the
state of dividing walls, doors and windows?
All measurements that were taken should be saved and named appropriately. It might be
required to measure the system in multiple setups (i.e., all loudspeakers active, only indi-
vidual loudspeakers or zones active, each with different presets etc.).

10.6 Acoustical Measurements

10.6.1 Measurement Methods

10.6.1.1 Traditional Sound Level Measurements


The volume level of sound events has been measured since the existence of the very
first microphones. The common logarithm of the sound pressure is normally used and is
expressed in ‘bel’ or more commonly ‘decibel’ (dB) units. In 1926 Barkhausen proposed the
subjective scale ‘phon’, which is identical to the decibel scale at 1 kHz but accounts for the
frequency-​dependent sensitivity of the human hearing system (Fletcher-Munson Curves).
The weighting curves commonly used with decibels (A, B and C) are an effort to approxi-
mate this behaviour. In particular, the A-​weighted decibel level as displayed by a broadband
measurement device (conventional, simplistic sound level meters) describes the subjective
loudness of sound at average levels as it is perceived by a listener. Such a conventional
sound level meter works as a standalone all-​in-​one handheld unit that consists of a linear,
omnidirectional microphone, the necessary amplification and an analyser with a display.
This unit is calibrated to accurately show the measured sound pressure level. Depending
on the complexity of the device, it can analyse very low-​level noise, ⅓-​octave band or
even narrowband frequency responses and display as well as store multiple parameters
simultaneously.


10.6.1.2 Current Sound Level Measurements
In the 1970s objective measurements started to benefit from the first advanced acoustic
measurement devices for room acoustics and sound reinforcement as well as from computer-​
based processing.
Around that time the TEF (Time Energy Frequency) analyser was developed in the
USA. Using the so-​called time-​delay spectrometry (TDS) these devices could measure
energy-time curves by means of a swept sine wave excitation signal. This method will be
discussed in section 10.6.2.6.
Several years earlier, in the 1960s, Schroeder had introduced another concept which
was based on using pseudo-​random noise composed as maximum-​length sequences (MLS)
to determine the impulse response of the system under test. But it took until 1988 for the
first implementation when a device called MLSSA (Maximum Length Sequence System
Analyzer) became available worldwide. Shortly after that MLS became an accepted standard
for room acoustic and loudspeaker measurements. This was also facilitated by the fact that
various room acoustic parameters had been established in the meantime and an extensive
set of post-​processing functions available in MLSSA allowed the display of these parameters
immediately after the measurement (refer to section 10.6.2.5).
After the introduction of modern personal computers and especially due to the broad
availability of laptop computers, several software-​based measurement packages (e.g. Clio,
Dirac, EASERA, Smaart, SysTune and WinMLS) were developed which all allow for a
variety of excitation signals to determine the impulse response or transfer function. They
distinguish themselves primarily with respect to the available post-​processing.

10.6.2 Measurement Techniques Based on Fourier Analysis

10.6.2.1 Fundamentals
Acoustic measurement methods are generally based on the recording of sound signals and
their evaluation. Since the very beginning, frequency analysis was found to be an important
tool for the assessment of recorded data, as it enabled investigations comparable to the
human perception of sound.
In general, Fourier analysis (Fourier, French mathematician, 1768–​1830) is understood
as the spectral decomposition of a time signal with respect to the contributing harmonic
frequencies. A simple continuous sine wave will look like a narrow line at a particular fre-
quency in the spectrum, representing the single frequency that is present. Mathematically,
the signal amplitude at a given frequency is determined by the scalar product of the time
signal a(t) and the harmonic function at frequency ω.
The resulting complex frequency spectrum Ã(ω) provides insight into the contributions
of individual frequencies or tones to the sum signal. Therefore, Fourier analysis is on one
hand useful for comparing subjectively perceived spectra of tones with objectively measured
ones, but also to draw conclusions from the objective measurement regarding the subjective
impression. Furthermore, this method facilitates the identification of resonances in com-
plex processes which cannot be observed by human hearing when masked by the overall
signal. Therefore, evaluation in the frequency domain has become a common measurement
method and it complements investigations in the time domain.
The development of digital signal-​processing and computer-​based measurement systems
along with the use of analogue to digital (A/​D) converters made it necessary to manage


signals that were discrete (‘stepped’) in time and treat them properly [2]. The sampling of the
incoming time signal in fixed intervals (‘sample rate’) and the evaluation of data frames of
finite length limit the temporal and spectral resolution of digital recordings compared to the
actual, analogue signal. This is well defined by the Shannon theorem, which states that the
sampling rate fs determines the highest frequency resolved, the so-​called Nyquist frequency:

fmax = fs / 2 (10.1)

The duration T of the evaluated data frame determines the spacing ∆f of the discrete frequency spectrum:

∆f ≈ 1 / T (10.2)

Accordingly, all digital audio and measurement systems are subject to these basic constraints.
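As a small numerical illustration of these two constraints, using an arbitrary but typical sampling rate and frame length:

```python
fs = 48000      # sampling rate in Hz
N = 65536       # length of the evaluated data frame in samples

f_max = fs / 2      # Nyquist frequency (equation 10.1): 24 kHz
T = N / fs          # frame duration in seconds (~1.37 s)
delta_f = 1 / T     # spectral resolution (equation 10.2): ~0.73 Hz
```

A longer frame therefore buys finer frequency resolution at the cost of temporal resolution.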
In practice a variety of such measurement systems are being utilized. Typical
representatives of simple measurement systems, which only display and analyse the fre-
quency spectrum at the input, are complex handheld sound level meters (e.g. B&K
2250, Norsonic Nor145, NTi XL2) and mobile analysers and mobile phone apps (Faber
Acoustical, Ivie IE-​45, decibel app etc.). They provide broadband figures such as the overall
sound pressure level as well as results based on a ⅓-​octave or octave band resolution. This
method also allows for the determination of the frequency response of the system under test
(SUT), but only with respect to its magnitude. For this purpose, a broadband noise signal
is employed with a spectral profile that is pink or white. In a display of sum levels based on
fractional octave frequency bands the pink noise turns into a constant function over fre-
quency. In consequence, the frequency response of the system under test can be assessed
immediately by the deviation of the curve when the signal is fed into the system under test.
These devices are unaware of, and operate independently of, the signal that is fed into the system
under test.
Advanced measurement systems can measure the complex transfer function or impulse
response of the system of interest (real and imaginary part). For this purpose, the system is
excited with a known test signal and its response is recorded; compare section 5.3.
In practice this technique, also known as inverse filtering, is not well suited for low
signal-to-​noise ratios and excitation signals e(t) of insufficient density or limited spectral
coverage, as frequencies not contained in the input signal cannot be accounted for.
Therefore, for measurements utilizing the deconvolution method, pseudo-​random noise,
swept sine and other well-​defined excitation signals are commonly used as they cover the
entire spectrum.
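The deconvolution step itself reduces to a division of spectra. The following sketch uses only the standard library, a tiny hand-made excitation signal and a two-spike ‘room’; the regularization constant eps is an assumption that guards against near-zero spectral bins:

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform (O(N^2), fine for this tiny example)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

# Known excitation e(t) and a 'room' h(t): direct sound plus one reflection
e = [1.0, 0.6, -0.2, 0.3, 0.0, 0.0, 0.0, 0.0]
h_true = [0.0, 1.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0]

# Simulate the recording as the circular convolution r = e * h
N = len(e)
r = [sum(e[m] * h_true[(n - m) % N] for m in range(N)) for n in range(N)]

# Deconvolution: H = R * conj(E) / (|E|^2 + eps), then back to the time domain
eps = 1e-12
E, R = dft(e), dft(r)
h_est = idft([Rk * Ek.conjugate() / (abs(Ek) ** 2 + eps) for Rk, Ek in zip(R, E)])
```

The division is only reliable at frequencies where the excitation actually contains energy, which is exactly the limitation described above.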
Real-​world systems are only approximately linear. In this respect, several aspects must
be considered. On one hand, the noise floor which always exists and re-​appears in the
impulse response must be accounted for. On the other hand, higher non-​linear terms
also come into play; they may be caused by the loudspeaker system (distortion), by other
parts of the measurement chain or by inhomogeneities of air as the transfer medium,
such as air movements and temperature gradients. Often, time variance of the system
under test is a cause of measurement errors, especially when measuring outside (stadia,
open-​air venues).
Therefore, the best-​suited measurement signal should be chosen depending on the
nature of the disruptions. Swept sine signals are advantageous because the deconvolution


places non-​linearities (harmonics of higher order) at the end of the impulse response,
which then allows for simple removal by windowing. The duration of the measurement
period and the number of averages determines the reduction of the random part of the
background noise (the more, the better). If only short measurements are possible, exci-
tation signals with a high power density should be used to maximize the signal-to-​noise
ratio. Under certain circumstances, it may be possible to exclusively use excitation signals
that are insensitive to small time variances in the system under test (e.g., sine sweep
signals).
Signals from external sources can also be used for measurements if the signal is known
to the measurement system for the deconvolution. This method is mainly used at concerts
and rehearsals in the presence of musicians or audience when measurements with dedicated
but disturbing test signals are not possible, and the music itself is used as the measurement
signal. The quality of the data obtained by this method significantly depends on the fre-
quency content of the supplied signal as well as its sound pressure level relative to the noise
floor. In most cases, longer measurement times and filtering are necessary to achieve results
comparable to those of the measurement procedures described before, as this increases the
probability that all frequencies are eventually included.
The measurement process is often supplemented by using multiple evaluation locations.
This is not only a necessity for the tuning of modern sound systems, but also allows for spatial
averaging and thus additional means to suppress the background noise in the measurements.
For this purpose, measurements are performed at several positions of the room and the
individual frequency responses are averaged. From this a correction or equalization can be
applied to the sound reinforcement system.
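When averaging frequency responses from several positions, the averaging should be done in the power domain rather than directly on the dB values. A minimal sketch, with hypothetical band readings at three positions:

```python
import math

# Hypothetical band magnitudes (dB) measured at three receiver positions
resp_db = [
    [-2.0, 0.5, 1.0, -1.5],   # position R1
    [-1.0, 1.5, 0.0, -0.5],   # position R2
    [-3.0, -0.5, 2.0, -1.0],  # position R3
]

# Convert each band to power, average across positions, convert back to dB
avg_db = [10 * math.log10(sum(10 ** (v / 10) for v in band) / len(band))
          for band in zip(*resp_db)]
```

The deviation of this averaged curve from the target curve is then what the equalization should address.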

10.6.2.2 Conventional Excitation Signals


One of the most common excitation methods is feeding the system under test with random
noise. This signal can have various frequency weightings, mostly white (i.e., equal amount
of energy per hertz) or pink (i.e., equal amount of energy per octave band). In the loga-
rithmic world of frequency perception, the latter is closer to human hearing and the energy
content of music and is therefore mostly used. As it sounds subjectively more pleasant,
especially over a longer period of time, it is more widely accepted, also by third parties pre-
sent during measurements. Loudspeaker systems usually have a lower sensitivity and higher
power capacity at low frequencies and pink noise supports this by putting more energy
into the lower-frequency bands compared to white noise, which in turn has a higher
probability of damaging the high-frequency transducer of the sound system or of causing
distortions in the measurement chain.
The energy content of a pink noise signal decreases by 3 dB per octave; the power density
spectrum can be defined as

S(ω) ∝ 1 / ω (10.3)

This correlation is also shown in Figure 10.4 in comparison to white noise.


But ‘pink noise’ is not completely determined, since it is a random signal. Both the exact
sequence of time samples and the crest factor can vary from measurement to measurement
or from application to application. When comparing or reproducing measurement results
these limitations have to be taken into account.
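The defining property of pink noise — equal energy per octave — can be verified numerically from the power density S(f) ∝ 1/f; the band limits below are arbitrary example values:

```python
def band_power(f_lo, f_hi, steps=10000):
    """Numerically integrate the pink power density S(f) = 1/f over a band."""
    df = (f_hi - f_lo) / steps
    return sum(1.0 / (f_lo + (i + 0.5) * df) * df for i in range(steps))

# Any octave band of a pink spectrum carries the same power, namely ln 2
oct_low = band_power(125.0, 250.0)
oct_high = band_power(1000.0, 2000.0)
```

Per hertz, however, the density falls with frequency — which is exactly the 3 dB per octave slope stated above.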

Figure 10.4 Characteristic frequency spectra for white noise and pink noise. Graph shows the power
density spectrum in dB using an arbitrary normalization.

To determine the complex transfer function of the system under test by means of decon-
volution, exact knowledge of the excitation signal’s time function is required. Therefore,
many computer-​based measurement systems use pseudo-​random noise which is precisely
determined in advance with regard to its amplitude function over time while mimicking the
properties of true random noise.
Another classic excitation technique is the impulse test. For this purpose, typically an
alarm pistol, clapping sticks or a balloon burst are used. The loud impulse sound is recorded
and subsequently analysed in post-​processing. Mainly time parameters such as RT60, EDT
etc. can be analysed since the frequency response highly depends on the stimulus used
and its capability to excite very low or high frequencies. This method is limited to certain
investigations in room acoustics, but is not applicable for the tuning and alignment of sound
reinforcement systems.

10.6.2.3 Measurements with Frequency Sweeps


Over the last decades the swept sine signal (also known as sweep, chirp) combined with the
deconvolution in the frequency domain has become the dominant measurement method,
since this technique benefits from a set of measurement advantages (like low sensitivity to
ambient variance and system non-​linearities) and computer-​based measurement systems are
currently readily available along with sufficient computational performance. In contrast,
during the late 1990s and early 2000s, limited computational resources required the use of
specific technologies, such as MLSSA [3] or TEF [4], in order to comply with the computer


hardware available. One downside of this approach was the limited flexibility regarding the
choice of the excitation signal.
The sweep is a continuous sinusoidal signal s(t) the frequency of which changes
over time:

s(t) ∝ sin(φ(t)) (10.4)

In this definition, the phase φ(t) can depend on time t in different ways. Usually, this rela-
tionship is given by the instantaneous frequency

Ω(t) = dφ(t) / dt (10.5)

If the instantaneous frequency changes linearly over time, the sweep is a simple sweep, a
so-​called ‘white sweep’.

Ω(t) = α × t + ω0 (10.6)

In this case, the sweep rate α in Hz/​s is constant since the signal covers the same frequency
range in the same period of time.
If the dependency is exponential, the sweep is called ‘pink sweep’ or ‘log sweep’, which
has a pink-​noise-type energy distribution.

φ(t) = exp(β1 × t + β0) + φ0 (10.7)

The pink sweep has a constant sweep rate of β = β1 / ln(2) in octaves/s, as the same number
of fractional octave bands is covered in the same period of time. In addition to the sweep
rate, also the start and stop frequencies are important parameters. These should be defined
so that they include the entire frequency range of interest [5].
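A log (pink) sweep can be synthesized directly from its phase function. The sketch below uses only the standard library; the start/stop frequencies and duration are arbitrary example values:

```python
import math

def log_sweep(f0, f1, duration, fs):
    """Exponential sweep whose instantaneous frequency rises from f0 to f1 Hz.
    Phase: phi(t) = 2*pi*f0*L*(exp(t/L) - 1) with L = duration / ln(f1/f0)."""
    L = duration / math.log(f1 / f0)
    n = int(duration * fs)
    return [math.sin(2 * math.pi * f0 * L * (math.exp((i / fs) / L) - 1))
            for i in range(n)]

sig = log_sweep(20.0, 20000.0, 3.0, 48000)

# Sweep rate in octaves per second (constant for the log sweep)
rate = math.log2(20000.0 / 20.0) / 3.0
```

In practice a short fade-in/fade-out would be applied to the ends of the sweep to avoid clicks in the playback chain.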
Comparing the swept sine with other signal types, such as pink noise or maximum length
sequences (MLS), it represents a continuous function over frequency. This is advantageous
for example for digital to analogue (D/​A) converters: compared to stepped or discontinuous
signals the probability of overshoot in the anti-​aliasing filters of the D/​A converters is lower.
In addition, depending on the exact type and length of the sweep, non-​linearities caused by
distortions in the measurement chain can be removed fairly easily from the measured impulse
response. In particular the log sweep allows precise identification, isolation and analysis of
all higher-​order harmonics separate from the fundamental. Furthermore, the sweep is also
less vulnerable to small time variances of the system under test [6]. Finally, another sig-
nificant advantage is that sweep measurements allow the engineer to subjectively identify
distortions and perturbations during the measurement itself, which is more difficult with
noise signals.
Figure 10.5 shows three fundamental sweep signals, a white sweep, a log sweep and a
weighted sweep. The latter provides an adapted spectral shape which is especially useful
for loudspeaker measurements (higher signal excitation at low frequencies). In contrast to
the log sweep its level reduction in the high-​frequency range is smaller and therefore the
signal-to-​noise ratio is usually higher.

Figure 10.5 Characteristic frequency spectra for white sweep, log sweep and weighted sweep. Graph
shows the power density spectrum with an arbitrary normalization.

10.6.2.4 Measurements with Other Noise


Several other noise signals are employed for acoustic measurements. Like the sine sweep,
noise signals can be coloured, that is, frequency bands can be weighted relative to others
in order to achieve an optimal adaption to the measurement conditions and thus to maxi-
mize the S/​N ratio. For example, a noise signal could be created with a frequency response
equivalent to the weighted sweep, to optimize loudspeaker measurements performed with
noise excitation.
In the early 2000s, another type of dedicated noise signal emerged for the assessment of
the quality of speech transmission of announcement and voice alarm systems. The so called
STIPa (Speech Transmission Index for Public address systems) measurement method [7]
uses an excitation signal similar to noise which is subjected to amplitude modulation and a
specific octave band filter set.

10.6.2.5 Measurements with Maximum Length Sequences


Acoustic measurements using MLS (Maximum Length Sequences) are based on deterministic
pseudo-random number sequences and their correlation with the response of
the system under test. Maximum length sequences are characterized by their order N, a
positive integer, and by definition they have 2^N − 1 values (or samples). The individual
samples represent a binary sequence made up of 1s and 0s (measurement systems
usually rescale this to the symmetrical range of +1 and −1). The construction of the
sequence is explained by the use of a shift register algorithm (see Figure 10.6 for N =​3)
which combines selected bits (taps) recursively with the register state so that all possible

Figure 10.6 Shift register for the construction of the maximal length sequence of order N =​3.

combinations of N bits are counted through except for the zero vector. Creating a sequence
of 2^N − 1 samples requires only a minimal amount of memory, namely the N register bits.
Depending on the algorithm several different MLS for the same order N may result; see
Figure 10.7. It can be advantageous to choose among these different versions, for example if
the first one proves to be inappropriate due to small non-​linearities [8].
The MLS measurement itself is a two-​step process. First the generated maximum length
sequence is sent into the system under test and the response is recorded. Then, the correl-
ation of the two sequences (original and recorded) is computed, which results in the impulse
response of the system under test. Due to the nature of the MLS the numerically expen-
sive computation of the correlation function can be dramatically simplified. This specific
transformation which exploits the particular properties of the MLS is called the Hadamard
transform.
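The shift register construction can be written out in a few lines. The tap positions below are the standard maximal taps for N = 3 (an assumption to be adapted for other orders), and the familiar MLS autocorrelation property — a peak of 2^N − 1 at lag zero and exactly −1 at every other lag — follows directly:

```python
def mls(order, taps):
    """Generate a maximum length sequence with a binary shift register (LFSR)."""
    state = [1] * order                    # any non-zero start state works
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(state[-1])              # output the last register bit
        fb = 0
        for t in taps:                     # feedback: XOR of the tapped bits
            fb ^= state[t - 1]
        state = [fb] + state[:-1]          # shift the register
    return [1 if b else -1 for b in seq]   # rescale to the symmetric +/-1 range

s = mls(3, (1, 3))                         # period 2^3 - 1 = 7 samples

# Periodic autocorrelation: 2^N - 1 at lag 0, exactly -1 at all other lags
n = len(s)
ac = [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(n)]
```

It is this two-valued autocorrelation that makes the correlation-based deconvolution of the MLS method work.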
The spectrum of the MLS signal is constant over frequency (white). Its crest factor is min-
imal (in theory 0 dB) and allows for a high signal-to-​noise ratio. The strongly discontinuous
Figure 10.7 Section of the time function of an MLS of order N =​16. The sampling rate is 24 kHz.


course of the MLS sequence over time is a disadvantage, since the rapid oscillation between
the extreme positive and negative state may be problematic as it may lead to distortion and
clipping in measurement devices. Therefore, in practice MLS test signals are often applied
at a lower sound pressure level compared to sweep signals, which in turn do not have the
advantage of the low crest factor (theoretically 3 dB). It should also be mentioned that MLS
signals are quite sensitive to time variances in the system under test [8, 9], so, for example,
the measurement of decay curves of time-​variant enhancement systems is not possible.
One relevant implementation of the MLS technique in acoustics is found in the computer-​
based measurement system MLSSA [3], which has been available since the late 1980s.
Of course, an MLS signal can be weighted [10], for example to achieve a sound pressure
level reduction of the high-​frequency component (pink MLS, e.g. with EASERA).

10.6.2.6 TDS Method and Technique


As early as the late 1960s Richard Heyser developed the so-​called time-​delay spectrometry
(TDS) measurement method for applications in the audio field [4]. The development even-
tually led to the first TEF analysers (TEF-​10, TEF-​12), which were based on PCM tech-
nology and utilized a tracking filter; the currently still used TEF-​20 and TEF-​25 analysers
are computer-​based.
Time-​delay spectrometry (TDS) can be understood as a measurement where the swept
sine excitation signal is created by the measurement system. After sending this signal
to the system under test, the measured response is fed back to a variable band-​pass filter
whose centre frequency increases in sync with the frequency of the excitation signal. By
changing the delay offset between excitation signal and tracking filter as well as the band-
width of the filter, selected sections of the time response can be evaluated. This can also
be looked at as a frequency-dependent form of windowing. Therefore, the signal-to-noise
ratio of such measurements can be high as the measurement system ‘listens’ to the
appropriate frequency only and disregards any other audio events. The tracking filter hence also
facilitates the exclusion of acoustical room reflections, which allows for quasi-​anechoic
measurements in rooms. Figure 10.8 demonstrates the principles of this measurement
method.

10.6.2.7 Measurements Using Arbitrary Excitation Signals


The use of excitation signals such as sweeps or noise is usually not overly disruptive when
performing purely electrical measurements. Acoustical measurements in rooms or outdoors
though often have to be specifically scheduled, as they may conflict with other activities
and can either for example interfere with rehearsals or be disturbed by excessive background
signals such as traffic noise. The use of acoustic measurement signals is even more tricky
when measurements have to be performed in the presence of an audience. Feeding loud
noise or sweep signals into the sound system is a bothersome process and therefore often only
possible under special conditions such as during the night or in an empty venue. But the
evaluation of measurement parameters such as speech intelligibility becomes less reliable,
since the presence of the audience significantly modifies the acoustical properties of a room.
If measurements were performed in an empty venue, the original measurements must be
corrected using predictions of the influence of the additional sound-​absorbing audience.
It would therefore be advantageous if the measurements could be carried out in the
presence of the listeners without their noticing that acoustic measurements are being

Figure 10.8 TDS principle. (a) Measurement principle; (b) action of the tracking filter.


made. In 1986 Meyer Sound Lab., Inc. introduced such a system, called SIM, which
stands for ‘Source-​Independent Measurements’, now available in its third iteration [11].
It utilizes fairly complex measurement hardware and an algorithm based on Fourier ana-
lysis with the program material as the test signal. Its primary goal is the acquisition of
the complex transfer function of loudspeaker systems in real time and it is limited to cap-
turing the direct sound and early reflections. It does not derive the full-​length impulse
response of the room, which corresponds approximately to the length of the reverber-
ation time in the venue. The software SMAART (Sound Measurement and Acoustical
Analysis in Real Time), introduced around 1995 by Rational Acoustics [12], facilitates
a two-​channel measurement on a standard computer platform to obtain an impulse
response by calculating the complex transfer function of the two channels and applying
the inverse Fourier transform, with the possibility of utilizing the program material as
test signal.
The method commonly used to acquire room acoustic impulse responses is based on two
distinct steps: firstly the recording of both the original signal and the response of the system
under test, followed by appropriate post-​processing and evaluation. In 2008, SysTune,
employing a new measurement method, was introduced by AFMG, which combines these
two steps and performs them simultaneously and continuously. Hereby the room acoustic
impulse response of full length can be determined in real time by so-​called real time decon-
volution; see Figure 10.9.
This continuously derived impulse response is equivalent to the static computation in a
post-​processing algorithm, but requires a number of optimization steps. It can assume typical
lengths of the order of, for example, 4 to 8 seconds. The transform from the time to the

Figure 10.9 SysTune measurement system.




frequency domain is performed linearly and with full length, that is, without windowing
or loss of data, and thus provides results identical to the static setup; also, electro-​acoustic
and room acoustic parameters can be derived. Furthermore, averaging and advanced filter
techniques can be applied to improve the signal-to-​noise ratio.
The real-time capability of SysTune [13] is accomplished by providing very high
refresh rates for the computation of the impulse response as well as for its display and fur-
ther analysis results. Simply put, this measurement system can be understood as an ‘oscil-
loscope for room impulse responses’, allowing for a new and improved way of acoustic
evaluation.

10.7 Performing Acoustical Measurements

10.7.1 General Comments


In the past, acoustic measurements in rooms have primarily been performed using stationary random noise signals, in most cases pink noise. The disadvantage of this method
is that with a non-​deterministic excitation signal one can only obtain information about
average amplitudes, but phase relationships that exist between individual sources as well as
reflections from the ceiling, floor or side walls of the room are ignored.
For this reason, modern computer-​ based measurement systems determine the full
impulse response, from which all relevant energy, time and frequency information can be
obtained by means of post-​processing. However, real-time analysers still have an important
supportive role.

Figure 10.10 Octave-​band display of the spectral shape of white noise and pink noise. Graph shows
the band-​related power sum spectrum with an arbitrary normalization.
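The behaviour shown in Figure 10.10 follows directly from the idealized power spectral densities of the two signals. A short numerical sketch (the base-2 band-edge convention is assumed for illustration):

```python
import numpy as np

# Octave-band centre frequencies (Hz) and band edges fc/sqrt(2)..fc*sqrt(2)
centres = np.array([125, 250, 500, 1000, 2000, 4000, 8000], dtype=float)
lo, hi = centres / np.sqrt(2), centres * np.sqrt(2)

# White noise: constant PSD -> band power proportional to absolute bandwidth
white_db = 10 * np.log10(hi - lo)
# Pink noise: PSD ~ 1/f -> band power proportional to ln(hi/lo), equal per octave
pink_db = 10 * np.log10(np.log(hi / lo))

print(np.round(np.diff(white_db), 2))          # → [3.01 3.01 3.01 3.01 3.01 3.01]
print(np.round(np.abs(np.diff(pink_db)), 2))   # → [0. 0. 0. 0. 0. 0.]
```

This is why an octave-band analyser shows white noise rising by about 3 dB per band while pink noise appears flat.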

Commissioning, Calibration, Optimization 343

Figure 10.11 Top view drawing of the Allianz arena in Munich, showing measurement locations in
the bleachers.

10.7.2 Selection of Measurement Locations


To determine the measurement locations usually a grid of positions is defined in the per-
formance and audience areas. The granularity and size of the grid depend on the degree
of complexity with respect to achieving even signal coverage over the areas of interest.
Important and critical locations, such as seats under the balcony or at the edge of a seating
zone, must be accounted for (Figure 10.11). Receiver locations should not be chosen too
close to boundaries since this will result in measurement errors, especially since measure-
ment microphones are usually omnidirectional.
The example in Figure 10.11 shows source locations distributed symmetrically over all seating areas, as well as receiver locations on the pitch. It is recommended in theatres or other facilities to limit the receiver positions R to only one half of the venue.
This choice is advantageous in symmetrical rooms as it saves measurement time; compare
Figure 10.1. However, for asymmetrical rooms measurement locations should be distributed
over the entire auditorium.

10.7.3 Measurement of Room Acoustic Properties


To determine the acoustic properties of a room, measurement routines are utilized which
provide the impulse response of the system under test (in this case the room). These methods
are essentially identical to those used for sound reinforcement system measurements.

344 Gabriel Hauser and Wolfgang Ahnert

Figure 10.12 Room-​acoustic measurement setup.

Room-acoustic measurements typically use omnidirectional loudspeakers (usually a dodecahedron) as source. To determine the most relevant room-acoustic parameters, such as those defined
by ISO 3382, a four-​channel measurement configuration should be used. Figure 10.12 shows
a typical setup. It allows standard measures to be obtained with an omnidirectional micro-
phone (channel 1), lateral measures like LE (Lateral Efficiency) and others using an add-
itional figure-​of-​eight microphone (channel 2) and binaural measures based on the left
and right ear microphone of the dummy head (channels 3 and 4). It has become quite
common to mount all of these receivers on a single stand. Sometimes this configuration is
complemented by 3D arrayed microphone arrangements to allow the resolving of the sound
field with respect to the angle of incidence [14].
Computer-based measurement platforms facilitate the use of excitation signals such as sine sweeps, maximum length sequences or noise, as well as specific signals such as speech or music samples. External signals, such as impulse-type excitations generated with starter pistols or other impulse-based methods, can be recorded for analysis purposes as well.
As a result, a single-​or a multi-​channel impulse response is acquired and provides
the starting point for further post-​processing, leading to the room-​acoustic parameters of
interest.

Commissioning, Calibration, Optimization 345

10.7.4 Time Domain


Each measured impulse response (monaural, binaural or bidirectional) is a function of sound
pressure over time and can be used to determine time domain quantities and energy ratios
in a post-​processing step.
The room-acoustic quality criteria [15] for speech performances (classroom, auditorium, convention hall, church) differ from those for music performances (concert hall, opera).
They are also distinguished from general time-​based parameters and quality criteria related
to the specific location of the listener. The latter are usually described by the combination
of a receiver location and the location of one or multiple sources. Specifically, for music
performances these would be the locations of the conductor and the musicians.
The most important time-​based parameters are explained in section 2.2.4.

10.7.5 Frequency Domain


The measured impulse response can be converted to a complex transfer function by the
Fourier transform. This includes spectral functions of both signal amplitude and phase.
The frequency-​dependent amplitude function is also called the frequency response and
serves as a highly important criterion of quality when evaluating sound reinforcement
systems as it ensures that no frequencies are overly exaggerated or underrepresented.
In room acoustics, the frequency dependence of the reverberation time is relevant.
Its quality is determined based on tolerance curves indicating upper and lower limits,
derived from the specific usage and volume of the room in question and covered in
various standards.
In electro-​acoustic installations signal-​processing (such as equalization) is often used to
linearize the frequency response of the sound pressure level. Frequency areas with exces-
sive sound pressure levels or dips caused by the reproduction system may be corrected; see
section 10.7.7. It is a commonly accepted rule of practice that one should primarily focus on
a ⅓-​octave smoothed frequency response, as discontinuities of smaller bandwidth, such as
1/​24th octave or even narrower, are not readily perceived. Also, it is important to average
the frequency response over the entire seating area covered by a particular loudspeaker
before setting the filters.
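A minimal sketch of such energetic ⅓-octave smoothing of a measured magnitude response (the boxcar averaging window and the function name are illustrative assumptions, not a standardized algorithm):

```python
import numpy as np

def smooth_fractional_octave(freqs, mag_db, fraction=3):
    """Energetically smooth a magnitude response over 1/fraction-octave windows:
    dB -> power, boxcar average over the window, back to dB."""
    half = 2 ** (1.0 / (2 * fraction))      # half-window as a frequency ratio
    power = 10 ** (mag_db / 10)
    out = np.empty_like(mag_db)
    for i, f in enumerate(freqs):
        band = (freqs >= f / half) & (freqs <= f * half)
        out[i] = 10 * np.log10(power[band].mean())
    return out

freqs = np.linspace(40, 18000, 2000)
# Flat response with a very narrow 10 dB notch (comb-filter-like artefact)
mag = np.zeros_like(freqs)
mag[(freqs > 995) & (freqs < 1005)] = -10.0
smoothed = smooth_fractional_octave(freqs, mag, fraction=3)
print(smoothed.min() > -1.0)  # → True
```

The narrow notch all but disappears after smoothing, mirroring the perceptual argument above: such fine-grained discontinuities need not be equalized.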
The complex transfer function can also be used to investigate the frequency-​dependent
real and imaginary part along with the group delay. Filters with linear phase (such as FIR filters) have a constant group delay over frequency.
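The constant group delay of a linear-phase FIR filter can be verified numerically by differentiating the unwrapped phase (a sketch; the 63-tap windowed-sinc design is an arbitrary example):

```python
import numpy as np

def group_delay(h, n_fft=1024):
    """Group delay in samples: negative derivative of the unwrapped phase."""
    H = np.fft.rfft(h, n_fft)
    phase = np.unwrap(np.angle(H))
    omega = np.linspace(0.0, np.pi, len(H))
    return -np.gradient(phase, omega)

# Linear-phase (symmetric) low-pass FIR, 63 taps, cutoff at 0.25*pi rad/sample
n = 63
m = np.arange(n) - (n - 1) / 2
taps = np.hanning(n) * 0.25 * np.sinc(0.25 * m)

gd = group_delay(taps)
# In the passband the group delay equals (n - 1) / 2 = 31 samples
print(np.round(gd[5:120].mean(), 2))  # → 31.0
```

The stopband bins are excluded from the printout because the phase jumps by π at each magnitude zero, which makes the numerical derivative spike there.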

10.7.6 Waterfall Diagram


When combining both representations of the transfer function in time and frequency, a
three-​dimensional illustration can be obtained, the so-​called waterfall diagram, which
shows the sound pressure level on the z-​axis along with time and frequency information
on the x-​ and y-​axis respectively. This method is particularly useful to investigate the decay
behaviour of a loudspeaker and/​or room system based on a choice of parameters such as
the time and frequency resolution and the type of windowing applied. Specific parameter
settings allow the resolving of individual reflections with respect to their spectral content.
This type of presentation can be further enhanced using wavelet analysis instead of har-
monic decomposition (Figures 10.13 and 10.14).
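A waterfall-style decomposition can be sketched with a plain short-time Fourier transform; the window length, hop size and the synthetic impulse response below are illustrative assumptions:

```python
import numpy as np

def waterfall(ir, fs, win_len=256, hop=64):
    """Each row: dB magnitude spectrum of one Hann-windowed time slice."""
    window = np.hanning(win_len)
    starts = range(0, len(ir) - win_len + 1, hop)
    frames = np.array([ir[s:s + win_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    times = (np.arange(len(frames)) * hop + win_len / 2) / fs
    freqs = np.fft.rfftfreq(win_len, 1.0 / fs)
    return times, freqs, 20 * np.log10(spec + 1e-12)

fs = 48000
ir = np.zeros(4096)
ir[100] = 1.0                                    # direct sound
t = np.arange(400) / fs                          # 2 kHz burst as a 'reflection'
ir[2000:2400] = 0.3 * np.sin(2 * np.pi * 2000 * t) * np.hanning(400)

times, freqs, level = waterfall(ir, fs)
print(level.shape)                               # → (61, 129)
slice_idx = np.argmin(np.abs(times - 2200 / fs))
print(freqs[np.argmax(level[slice_idx])])        # close to 2000 Hz
```

Picking out the slice around the reflection reveals its spectral content, which is exactly the kind of per-reflection analysis described above.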
Figure 10.13 Partial spectrogram.
Figure 10.14 Wavelet type presentation.

10.7.7 Special Applications


Additional tools compared to standard measurement or processing techniques include the
application of filters, windows and averaging as well as arithmetic combination of multiple
channels.

10.7.7.1 Filtering and Averaging


Filtering a recorded wideband signal is a good way to suppress noise in a measurement (see
also section 5.4.3). A band-​pass filter is particularly well-​suited to remove disruptions out-
side the upper and lower frequencies of interest. A typical measurement bandwidth usually
lies between 40 Hz and 18 kHz, therefore noise below and above these frequencies can be
filtered out. Some noise within the frequency range of interest can also be removed if it is
very narrow-​band (sine tones). In this case a steep band-​stop filter can be applied to remove
disturbing sinusoidal signals. This has no effect on most acoustic parameters derived from
the measurement since these are mostly defined on the basis of 1/​1 octave band or ⅓-​octave
bands between 100 Hz and 16 kHz.
The octave or ⅓-​ octave band filters applied to obtain such band-​ related impulse
responses have been standardized [16], so that measurement devices provide comparable
results. Analogue meters contain filters that are implemented directly as part of their hard-
ware circuits whereas computer-​based measurement platforms implement filters in soft-
ware algorithms. This in particular includes the weighting functions such as A-, B- and
C-​weighting [17]. Digital measurement systems usually mimic this behaviour in the post-​
processing of the acquired data. This allows, for example, the reproduction of integration
times and transient responses of analogue measurement systems with computer-​based digital
measurement systems for comparable results. A typical example would be to reproduce the
use of the time constants ‘Slow’ and ‘Fast’ in real-time analysers [17].
Spatial averaging is another form of post-​processing often employed in acoustics. In this
case location-​related quantities (see 2.2.4) are combined into a single result that is repre-
sentative for the entire room, or part of it. To be precise, this resulting average needs to be
provided with information about the variance of the different measurements leading to the
average.
Averaging raw impulse response measurements requires some attention. Straightforward
averaging leads to issues due to the complex phase information embedded in the impulse
response and the according rapid variation of the amplitude response over space. However,
time responses (envelope) and frequency responses (magnitude) can be averaged energet-
ically, as the energy distribution in the room is mostly a continuous function, that is, it
changes only by small amounts throughout the space.
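Spatial (energetic) averaging of band levels in dB can be sketched as follows: dB values are converted to power, averaged arithmetically, and converted back.

```python
import numpy as np

def energetic_average(levels_db, axis=0):
    """Average several magnitude levels (in dB) energetically."""
    power = 10.0 ** (np.asarray(levels_db) / 10.0)
    return 10.0 * np.log10(power.mean(axis=axis))

# Three seat positions measuring the same band at 88, 90 and 94 dB
levels = np.array([88.0, 90.0, 94.0])
print(round(energetic_average(levels), 1))  # → 91.4
```

Note that a plain arithmetic mean of the dB values would give 90.7 dB; the energetic average weights the louder position more strongly, as the physical energy distribution demands.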

10.7.7.2 Determining Timing of Sources


As soon as multiple loudspeakers are installed in a room, the time information of the
arriving wavefronts has to be analysed and often optimized. Even with only one single
loudspeaker and a natural source, such as a presenter on stage, positioning the loudspeaker
and setting the delay can result in different localisation of the apparent source (compare
section 2.3.4): the source that first arrives at the listener position will be perceived as being
the origin of the sound. If the time difference between the initial wavefront and the next
exceeds about 30 ms the definition and clarity of the perceived sound will be reduced. For

Figure 10.15 Exemplary section of a measured impulse response where the sound reinforcement
system (after 90 ms) provides a higher signal level than the original sound source on the
stage (at about 44 ms).

time differences of more than 50 ms, the second wave front might be perceived as a discrete echo.
By analysing the measured impulse responses, the arrival times of the individual sources can be determined. For this purpose, sequential measurements are made: initially just a single loudspeaker on stage is active (possibly even a speaker simulator
on the lectern in front of the microphone) and then additional loudspeaker groups
are switched on step by step. With this procedure the influence of each group can be
individually studied. Figure 10.15 shows the impulse response of a situation where the
direct sound will be disturbed by the late and loud amplified sound arriving at the lis-
tener position.
Localization errors will occur if the initial impulse of the source that should be localized
arrives later than the initial impulse of the loudspeaker, or if the initial impulse is exceeded
by more than 10 dB within the first 30 ms and by more than 6 dB within 30 to 60 ms (rule
of the first wave front; compare Figure 2.21).
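A simple sketch of reading arrival times and relative levels from a measured impulse response; the sample rate, threshold and the synthetic response (mirroring the situation of Figure 10.15) are assumptions:

```python
import numpy as np

fs = 48000  # Hz (assumed measurement sample rate)

def arrivals(ir, threshold_db=-30):
    """(time_ms, level_dB) of local envelope maxima that stick out above
    a threshold relative to the strongest peak."""
    env = np.abs(ir)
    ref = env.max()
    peaks = [i for i in range(1, len(env) - 1)
             if env[i] > env[i - 1] and env[i] >= env[i + 1]
             and 20 * np.log10(env[i] / ref) > threshold_db]
    return [(1000.0 * i / fs, 20 * np.log10(env[i] / ref)) for i in peaks]

# Direct sound at 44 ms, louder PA arrival at 90 ms
ir = np.zeros(8000)
ir[int(0.044 * fs)] = 0.4     # natural source on stage
ir[int(0.090 * fs)] = 1.0     # sound system, ~8 dB louder
for t_ms, level in arrivals(ir):
    print(f"{t_ms:.1f} ms  {level:+.1f} dB")
```

Applied to the two measured arrivals, such a listing immediately shows both the 46 ms gap and the level difference that together cause the localization problem described above.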

10.7.7.3 Signal Alignment


A coherent wavefront can be achieved by alignment measurements in the microsecond range. Especially in the case of loudspeaker clusters or arrays it is important that the
wave fronts of the individual systems arrive at the audience at the same time. If the cluster
is spatially distributed this is generally not the case without further adjustments. Electronic
delays have to be applied in order to compensate for different propagation times among
the individual components of the array. When tuning the system, usually the arrival time
of the latest signal is used as a reference. Relative to that, all elements of the cluster are
delayed, until a coherent wave front is accomplished. These time adjustments are typically
made on the basis of impulse response measurements for each system or group. As a result,


establishing the alignment and thus the coherence of arriving wave fronts will also reduce
or eliminate comb filter effects.
A variety of possible measurement methods exists to determine the correct alignment
settings for a loudspeaker system. An efficient approach is the use of a measurement system
capable of storing and graphically displaying the impulse response of the reference system
while showing additional systems as overlays. By adding an electronic delay to a particular
system according to its overlay measurement the direct sound of this path or channel can
be aligned to the reference channel. After that, the next subsystem can be switched on and
the procedure is repeated.
As with equalizing the frequency response of systems, tuning speaker delays for larger
listening zones is of course more complex. A compromise has to be established for all
receiving locations because the exact alignment of all sources is only possible for one pos-
ition in the room. Often several iterations of the tuning process are necessary to obtain a
satisfying result.
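The relative delay between a reference impulse response and an overlay can also be read off automatically from the peak of their cross-correlation. A sketch under assumed conditions (sample rate and toy loudspeaker responses are illustrative):

```python
import numpy as np

fs = 96000  # assumed measurement sample rate

def relative_delay(ir_ref, ir_other):
    """Delay of ir_other relative to ir_ref, in samples, from the peak of
    their cross-correlation (computed via FFT)."""
    n = len(ir_ref) + len(ir_other) - 1
    xcorr = np.fft.irfft(
        np.fft.rfft(ir_other, n) * np.conj(np.fft.rfft(ir_ref, n)), n)
    lag = np.argmax(xcorr)
    if lag > n // 2:          # negative lags wrap to the end of the array
        lag -= n
    return lag

# Two array elements measured separately; element B arrives 37 samples early
h = np.exp(-np.arange(64) / 8.0)          # toy loudspeaker response
ir_a = np.concatenate([np.zeros(300), h, np.zeros(660)])
ir_b = np.concatenate([np.zeros(263), h, np.zeros(697)])
lag = relative_delay(ir_a, ir_b)
delay_ms = -lag / fs * 1000               # delay element B by this amount
print(lag, round(delay_ms, 3))            # → -37 0.385
```

In practice the latest-arriving element serves as the reference, and each earlier element receives the computed electronic delay.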

10.7.7.4 Acoustic Feedback


When a microphone signal is amplified and reproduced through loudspeakers located within the same room as the microphone, a so-called feedback loop is generated: amplified
sound from the loudspeakers is fed back into the microphone, re-​amplified and reproduced
through the loudspeaker again. This results in a howling, ringing, sometimes excessively
reverberant sound sensation, and is detrimental to fidelity and intelligibility. If the amp-
lified signal re-​entering the microphone exceeds a certain threshold, the feedback loop
becomes unstable and single frequencies will be amplified with maximum power, with the
possible result of damaging the loudspeaker. Hence this effect needs to be avoided; see
section 10.4.
In order to measure the possible gain before feedback and to identify the feedback fre-
quency a real-time analyser (RTA) can be employed while carefully increasing the amp-
lification of the loop. As soon as an individual tone tends to oscillate and rise in level, its
frequency can be determined in the analyser. Consequently, a narrow-​band filter (notch)
can be applied to reduce the appropriate peak in the spectrum and thus increase the
potential gain before feedback. RTAs are often replaced by computer-​based measurement
systems which can provide additional indicators for a feedback situation, such as a spec-
trogram view (similar to Figures 10.13 and 10.14). Modern measurement methods often
utilize the speech and music signals available during the rehearsal to derive corrections
with respect to the reproduction frequency response without disturbing musicians or audi-
ence [10, 12, 13].
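A spectral indicator of this kind can be sketched very simply: any bin that sticks out far above the median spectrum level is a feedback candidate. The prominence threshold and the synthetic test signal are assumptions for illustration:

```python
import numpy as np

def feedback_candidates(signal, fs, prominence_db=15.0):
    """Frequencies whose spectral level sticks out more than prominence_db
    above the median level -- a crude indicator of a ringing feedback tone."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    level = 20 * np.log10(spec + 1e-12)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    return freqs[level > np.median(level) + prominence_db]

fs = 48000
t = np.arange(fs) / fs                        # one second of signal
rng = np.random.default_rng(1)
programme = 0.05 * rng.standard_normal(fs)    # stand-in for speech/music
ringing = 0.5 * np.sin(2 * np.pi * 2150 * t)  # tone starting to feed back
cands = feedback_candidates(programme + ringing, fs)
print(cands.min(), cands.max())               # narrow cluster around 2150 Hz
```

Once the offending frequency is identified, a narrow notch filter centred there raises the available gain before feedback, exactly as described for the RTA-based procedure above.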

10.7.7.5 Polarity Test


The polarity is correct if the plus output from the amplifier is connected to the plus input of
the loudspeaker. Although it is frequently mislabelled, polarity should not be mistaken for phase, which is frequency-dependent. If two adjacent loudspeakers are not connected with
the same polarity, cancellation effects will occur, especially at low frequencies. To determine
the correct connection, polarity checkers can be used (see Figure 10.16) which display the
polarity of one individual loudspeaker by means of a visual signal after the loudspeaker has
been excited with an impulse-type signal. It is not possible to test multiple loudspeakers at
the same time.

Figure 10.16 Polarity checker from MEGA Professional Audio.
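In post-processing, a comparable check can be approximated from a measured impulse response; this is a crude sketch of the idea (dedicated polarity checkers, as in Figure 10.16, work on the acoustic impulse directly):

```python
import numpy as np

def polarity(ir):
    """Crude polarity estimate from an impulse response: sign of the sample
    with the largest magnitude ('+' = positive-going wavefront)."""
    peak = np.argmax(np.abs(ir))
    return '+' if ir[peak] > 0 else '-'

# Toy responses of two adjacent loudspeakers, one wired with swapped leads
ir_good = np.concatenate([np.zeros(50), [0.9, -0.3, 0.1], np.zeros(47)])
ir_flipped = -ir_good
print(polarity(ir_good), polarity(ir_flipped))  # → + -
```

If two adjacent units report opposite signs, their low-frequency output will partially cancel, which is the symptom the hardware tester is designed to catch.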

References
1. Mapp, P. First published in Audio System Design & Engineering. Klark Teknik, 1985.
2. Oppenheim, A.V., and Schafer, R.W. Zeitdiskrete Signalverarbeitung. München: R. Oldenbourg
Verlag, 1992.
3. Rife, D.D., and Vanderkooy, J. Transfer function measurement with maximum-​length sequences.
J. Audio Eng. Soc., vol. 37, no 6 (1989), pp. 419–​444.
4. AES / Heyser, R.C. Time Delay Spectrometry – An Anthology of the Works of Richard C. Heyser. New York: AES, 1988.
5. Farina, A. Simultaneous measurement of impulse response and distortion with a swept-sine technique. Presented at the AES 108th Convention, Paris (19–22 February 2000).
6. Müller, S., and Massarani, P. Transfer-​function measurement with sweeps. J. Audio Eng. Soc.,
vol. 49, no 6 (2001), pp. 443–​471.
7. Jacob, K., Steeneken, H., Verhave, J., and McManus, S. Development of an accurate, handheld, simple-to-use meter for the prediction of speech intelligibility. Proc. IOA, vol. 23, Pt 8.
8. Vanderkooy, J. Aspects of MLS measuring systems. J. Audio Eng. Soc., vol. 42 (1994), p. 219.
9. Vorländer, M., and Bietz, H. Der Einfluss von Zeitvarianzen bei Maximalfolgenmessungen.
DAGA (1995), p. 675.
10. http://easera.afmg.eu.
11. SIM Audio Analyzer, www.meyersound.com/products/#sim, Meyer Sound, Berkeley, CA, USA; www.meyersound.com.
12. Smaart Software, www.rationalacoustics.com, Rational Acoustics, Putnam, CT, USA.
13. SysTune, http://systune.afmg.eu, AFMG Technologies GmbH, Berlin, Germany, http://afmg.eu.
14. www.iris.co.nz.
15. Standard DIN EN ISO 3382: 2008-09: Acoustics – Measurement of room acoustic parameters.
16. IEC 61260: 1995: Octave-​Band and Fractional-​Octave-​Band Filters.
17. IEC 61672-​1:2013: Sound Level Meters-​Part 1: Specifications.

11 System Solutions /​Case Studies


Wolfgang Ahnert and Dirk Noy

This chapter illustrates a number of real-world applications and case studies, characterized by their functionality and particular type of venue.

11.1 Paging and Voice Alarm Systems


These types of systems are required for public spaces where people are gathering, such as:

• Transportation hubs (airports, railway and bus stations etc.)


• Shopping malls
• Public foyer areas in concert halls, opera houses or other cultural venues
• Office buildings
• Hospitals and health care centres
• Senior residences
• Schools, universities, day care centres etc.
• Open air spaces, such as swimming pools or parks

Different types of messages with varying priorities can be distinguished:

Priority 3 – Background music and advertisement information
Priority 2 – Paging calls for individuals, search messages, wrongly parked cars and similar information
Priority 1 – Emergency calls in the case of fire, major infrastructural damage, violent activities etc.

What are the common features and where are the differences?

11.1.1 Common Features and Differences


The information distributed by these systems must reach the covered area at the required
loudness level and with clear speech intelligibility.
According to the international standard ISO 7240-​19, the speech intelligibility index
STI must be equal to or exceed 0.5 on 90% of the covered areas, and the average value over all areas must be equal to or exceed STI = 0.45.
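The two-part criterion quoted above is easy to verify against a set of measured STI values; the function name below is an illustrative choice:

```python
import numpy as np

def sti_compliant(sti_values):
    """Check measured STI values against ISO 7240-19: at least 90 % of the
    positions must reach STI >= 0.5 and the average over all positions
    must reach STI >= 0.45."""
    sti = np.asarray(sti_values, dtype=float)
    fraction_ok = np.mean(sti >= 0.5)
    return fraction_ok >= 0.9 and sti.mean() >= 0.45

print(sti_compliant([0.55, 0.52, 0.58, 0.51, 0.50,
                     0.53, 0.56, 0.54, 0.52, 0.48]))  # → True
print(sti_compliant([0.55, 0.52, 0.44, 0.51, 0.46,
                     0.53, 0.56, 0.54, 0.52, 0.48]))  # → False
```

In the first set only one position out of ten falls below 0.5, so both conditions hold; in the second, three positions fail and the 90% criterion is violated.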
Ceiling-mounted loudspeakers are commonly used within these areas; this makes sense as flat ceilings predominate in corridors and other pathways. The
arrangement of ceiling loudspeakers is explained in Chapter 5. Depending on the height of

DOI: 10.4324/9781003220268-11


the flat ceilings the distance between the loudspeakers may vary, an average distance being
about 3–​5 m from one to the next.
In spaces higher than 6 m modern digitally controlled or passive line array loudspeakers
are used. The use of computer simulation and prediction is recommended and should also
take into consideration the room acoustical parameters of the space in question.

11.1.2 Target Room Acoustic Measures


To achieve high speech intelligibility, a noise floor as low as possible and reverberation time
values recommended for rooms with speech transmission are required.
Generally, the noise floor in public spaces should not exceed 50 dBA (NR40). With
higher levels, for instance in large malls, the sound system requires an automatic gain
control to automatically adapt the level of all paging and emergency calls. With a 70 dBA
noise floor the signal level should be at least 10 dB higher; unfortunately though, a stronger
masking effect starts to establish itself at around 80 dBA and the intelligibility decreases
again (refer to Figure 7.22). Hence, high speech intelligibility may not be achieved just by
increasing the signal level. It is important to establish a corresponding relationship between
the prevailing noise floor and the signal level required. To secure good intelligibility it may
be necessary to reduce the noise floor by technical means, e.g., by inserting silencers into
the air conditioning system.
Another option to achieve good intelligibility is the control of reverberation in all closed
public spaces. As a very general rule of thumb the reverberation time is to be strictly kept
below 3 s, better below 2 s. In modern architecture, the following surface materials are fre-
quently used:

• Glass walls
• Concrete or gypsum board surfaces
• Natural stone floors
• Metal structures

All these materials reflect sound almost completely, which increases the reverberation time. If an acoustician is involved in the project, the sound system designer should motivate them to implement sound-absorbing areas. The necessity for absorption should be
discussed in the early design phase and corresponding treatment is to be implemented in the
agreed design. The situation should be avoided where the architect finishes his or her design
work without any consultation of the paging and emergency system designer.

11.1.3 Different System Layout Approaches as a Base for Computer Simulation


The electroacoustic system layout depends on the geometry of the space in question. In
rooms with limited height ceiling loudspeakers are commonly used. By knowing the room
height and the expected reverberation time simple tools allow the arrangement of the
ceiling loudspeakers; refer to section 6.3.1.1. An important condition for the sound system
design is the fact that all selected ceiling loudspeakers must comply with ISO standard
7240-​24 [1]. This standard specifies the requirements, the test methods and the perform-
ance criteria for loudspeakers intended to be part of a fire detection and alarm system that
emits a warning signal to the occupants of a building. These criteria limit the selection of
applicable loudspeakers.


For digitally controlled line arrays a general certification is still missing in the standard
as it is focused on passive systems. Exceptional permits are often granted for highly directed
loudspeakers where no other technical solution is available.
To secure the functionality of the sound system for paging and especially for alarm calls
the use of computer simulation is strongly recommended. In cooperation with the architect
a computer model is created in software tools such as EASE, ODEON or CATT Acoustic.
The acoustic parameters of the wall, floor and ceiling parts are inserted and the intended
sound system design is implemented in the model. The calculation results will confirm the
successful design or indicate which changes must be made. Good results are achieved if the
demands laid out in section 2.2.4.3 are met.

11.1.4 Verification with Different Measurement Tools


After the installation of the electroacoustic paging and alarm systems and during the
commissioning phase (compare Chapter 10) commissioning measurements must be
performed to confirm the functionality of the installation.
The measurements must record the average noise floor and the achieved sound pressure
level of the installed system. Modern software platforms (such as EASERA by AFMG
Technologies GmbH, Dirac by Bruel & Kjaer or Clio 11 by Audiomatica [2, 3, 4]) as well as
certain professional handheld devices (such as Nor145 by Norsonic or SM50 by Bedrock [5,
6]) measure levels and STI speech intelligibility (compare Figure 11.18).

11.1.5 Case Studies

11.1.5.1 Airport –​Hamad International Airport, Doha, Qatar


The new Doha airport was opened in 2014. Electroacoustic studies were performed to deter-
mine the optimum solution for the paging and information system, which also serves as
the voice alarm system. The system is based on Renkus-​Heinz FA136 6′′ full-​range ceiling
loudspeakers and IC16 digitally controlled line arrays. Figure 11.1 shows the simulation
model of the arrivals area, the baggage claim area and an adjacent low ceiling area on the
ground floor.
Further, the next two figures show the mapping files for total SPL and STI speech intel-
ligibility. The simulations match the required target values.
The loudspeaker types, their location and orientation as well as delays and other settings
were fully studied in the simulation to ensure the functionality of the voice alarm and infor-
mation applications.

11.1.5.2 Railway Station –​Main Station, Berlin, Germany


The new Berlin Main Station opened in 2006. The geometrical complexity of the space
and the necessity to shield neighbouring train platforms from unwanted audio messages and
noise required the implementation of highly directional loudspeakers. Digitally controlled
2D beam shaping loudspeakers are accompanied by disc-​shaped 3D beam shaping sources,
specifically developed by Duran Audio, mainly applied for the upper platforms in the curved
area; refer to Figures 11.4 to 11.6.
The disc-​shaped loudspeaker uses 41 individual drivers and complex DSP algorithms to
shape the beam in the horizontal and vertical planes.

Figure 11.1 Computer model with Atlas Sound Ceiling speakers FA 136 and Renkus-​Heinz Line
arrays IC16 and listener areas.

Figure 11.2 Calculated SPL distribution in the greeter area.

11.1.5.3 Multi-​Storey Lobby –​Elbphilharmonic Hall, Hamburg, Germany


The world-​famous Elbphilharmonic hall in Hamburg was opened in 2017. The massive
lobby, consisting of ten levels, was fully modelled and studied using an acoustical simulation
platform.
The overall ten-​storey lobby volume is over 30,000 m³. Four hundred and seventeen
ceiling loudspeakers (Bosch LBC 3086/​41 and Innovox FRC-​2-​EH) have been installed.
The average reverberation time at individual lobby levels has been optimized to values
<1.5 s. SPL values and STI speech intelligibility have been calculated for all ten floor levels.

Figure 11.3 Calculated intelligibility number STI from 0.5 to 0.75 in the greeter area.

Figure 11.4 Computer model of the main station in Berlin.

Figure 11.5 Arrangement of the Duran Audio loudspeaker IntelliDiskDS-​90.


Figure 11.6 Radiation pattern of nine IntelliDisk DS-​90 speakers along the upper platform in the main station.


Figures 11.7 and 11.8 show an average STI value of 0.547. With the calculation specified
in DIN VDE 0833-4 [7], STIaverage − STIdeviation equals 0.511, which complies with the min-
imum required value of 0.5. This was measured and confirmed for all lobby levels.

11.2 Sports Facilities


Such facilities are, for example:

• Halls and playgrounds in schools


• Multipurpose halls used for sport events
• Large arenas
• Stadia
• Large training centres

Sound systems again serve two purposes: the support of matchday operations
(announcements, information, paging, missing persons, advertisements) and also their
use as voice alarm systems, as emergency instructions must be understood clearly and loud
enough. Standards such as ISO 7240-​19 [8] or US NFPA 72 [9] provide design guidelines and
functional requirements. In Europe the Standard EN 54-​32 [10] plus national regulations
are applicable. Figure 1.2 gives an overview of the current standard situation. All selected
loudspeaker types must be certified according to ISO 7240-​24 or EN 54-​24 [1]. Additionally,
sports facilities designed for international big ticket events (such as football world cups or
Olympic competitions) have to comply with regulations by the appropriate sports authority
such as UEFA, FIFA, IOC or other specifications.
Each of the facility types mentioned above has different acoustic and sound reinforce-
ment requirements. In school and university environments the acoustic measures
are usually rather basic. Most countries have standards and guidelines concerning the
recommended reverberation time in sport halls and the implementation of emergency
call systems.
In larger halls and arenas used for sport events a specific acoustical design is needed,
focussing on the geometry of the halls, the acoustic quality of wall, floor and ceiling surface
materials and also on issues such as exterior noise intrusion into the hall or internal noise
from HVAC systems.
While decentralized systems are well suited for the radiation of information signals, sound coverage that offers acoustical localization is desirable in certain cases, for
instance for:

• Victory ceremonies, where a directional reference to the scene should be established


• Program or advertisement projections on video screens, where the visual presentation
must coincide with the acoustical perception

This is enabled by loudspeaker arrangements capable not only of ensuring speech intelligi-
bility and music clarity, but also of localization of the sound sources.

11.2.1 Large Meeting Rooms Used for Sport Events and Smaller Sports Halls
In smaller sports halls or gyms of universities and schools (volumes up to 20,000 m³) basic
announcement systems are needed for paging, information and emergency calls. Quite often

Figure 11.7 Computer model of the complicated lobby structure.

ceiling or wall loudspeakers are employed, as well as centrally arranged loudspeaker systems
installed on one end of the hall close to a platform in operation. Just as with centralized
systems, the required speech intelligibility is achieved by a staggered installation of sound
reinforcement systems, often of single loudspeakers or clusters. Acoustic localization towards
the action area can be achieved by varying the installation height or appropriately delaying
certain loudspeakers. Appropriate staggering of the loudspeakers (spacing ≤15 m) in the
depth of the hall prevents travel-time differences that could cause echoes.
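The ≤15 m spacing rule can be checked against the roughly 50 ms echo threshold using the speed of sound (c = 343 m/s at room temperature is an assumption):

```python
# Travel-time difference corresponding to the spacing between staggered
# loudspeakers; differences well above ~50 ms risk being heard as echoes.
C = 343.0  # speed of sound in m/s (approx., at ~20 degrees C)

def path_delay_ms(spacing_m):
    return 1000.0 * spacing_m / C

for spacing in (10, 15, 20):
    print(f"{spacing} m -> {path_delay_ms(spacing):.1f} ms")
# → 10 m -> 29.2 ms, 15 m -> 43.7 ms, 20 m -> 58.3 ms
```

At 15 m the worst-case path difference stays just under the echo threshold, which is why larger spacings call for electronically delayed loudspeaker groups.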

11.2.2 Sports and Multipurpose Halls


The sound-​radiating devices in large sport halls or multipurpose halls used for sport events
are often built as clusters consisting of a number of individual loudspeakers or by using
modern line arrays (Figure 2.28). If delay equipment is not used in rooms having a volume
of more than 50,000 to 70,000 m³ and a width of more than about 80 m, a localiza-
tion reference to the action area (stage platform or field of play) should nevertheless be
established, although travel-time interferences are not necessarily to be expected. This
directional effect, as for most loudspeaker arrays, is less pronounced for low frequencies
than for middle-​and high-​frequency ranges. Additionally, as the radiation impedance
increases in the low-​frequency range, a significant bass emphasis in the diffuse field is
likely, especially outside the main radiating direction. This particularly affects the area
directly below the loudspeakers and must be considered when specifying and arranging
the loudspeakers.
For halls having a stage at one long side, additional loudspeakers are arranged in front of
the stage which are used as the main sound reinforcement system. A subdivision of all array
units into various amplifier channels is suitable. Thus, separate control and possible filtering
of individual arrays becomes feasible, for example, enabling the loudspeakers aimed at the
platform to be used for monitoring, the downwards radiating ones for optimizing the timbre
Wolfgang Ahnert and Dirk Noy
Figure 11.8 Overall RT, SPL and STI values in selected floor level.


and the sound in the area near the platform, and the ones radiating away from the platform
to be determinative for the sound level in the main coverage area.
In very large halls it is not feasible to supply the entire hall with a sufficiently high and
uniform sound level from one single loudspeaker array or cluster of loudspeakers. In these
cases, the principle of centralized coverage has to be abandoned.

11.2.3 Sports Stadia

11.2.3.1 Basic Requirements for Loudness and Sound Reinforcement in Stadia


(a) High quality sound coverage of the bleachers in halls and stadia specifically during
emergency situations
• Achieved by studying the stadium geometry and specifying the appropriate acoustic
treatment
• Selection of the appropriate arrays and the design of the appropriate arrangement
(placement and radiation direction)
• Stadia with permanently present or retractable roofs imply more reverberation,
hence additional absorptive treatment is needed
• Calculation of the expected sound pressure level and speech intelligibility values
by computer simulation
(b) Avoidance of high sound level radiation to the exterior environment
• Directing the loudspeaker arrays to the bleachers and to the field of play
• Considering the sound isolation of the outer stadium walls
• Correctly specifying the roof materials, e.g., membrane roofs do not attenuate the
low frequencies of the stadium sound, which are audible at distances up to 5 km
and more
• Stadia with permanently present or retractable roofs block the signal radiation to
the exterior
(c) Prevention of impact of noise signals from outside into halls and stadia
• Must be taken into account when selecting a new stadium site, which should be
located away from heavy industrial plants or airport approach flightpaths
• In noisy environments closed stadium roofs are preferred

11.2.3.2 Target Groups for Sound Radiation


In the first place, these are the audience zones in the bleacher areas where smooth coverage
and superb speech intelligibility are required.
Artists, athletes, officials, referees and others in the field of play or stage area may require
special information. This information may be supplied by mobile sound systems to support
the clarity of the general stadium sound and to avoid disturbances that can result from
delayed double hearing on or close to the field of play. Types of information may be:

• Schedule guidance
• Opening and closing rituals
• Team and athlete introductions
• Goal announcements
• Winner honouring
• Announcements of special sport events


• Greeting of VIPs and guests
• Paging and search messages
• General traffic and crowd flow information

For advanced music and speech transmission in a stadium additional technical efforts
are required, such as an extended frequency band to be transmitted by use of subwoofers
(total minimum frequency range 60 Hz to 12 kHz). Typical programming with higher
requirements, e.g.:

• Pre- and post-event: general sound coverage for background music or information
• Advertisement clips with acoustic localization to the active video wall, hence acoustic
and visual impressions are identically localized for a stronger impression
• Music reproduction including dynamic popular tracks including anthems and other
content

11.2.3.3 Sound Coverage in Sports Stadia


The action area of a sports stadium is the field of play or pitch. This is the spectators’ focus
area, and should also be the origin of the acoustical signals. The source to be amplified, in most
cases an announcer, perhaps in some cases a music band, may be located within the action
area or at its perimeter. During athletics events, for instance, the stadium announcer is located
at the edge of the tracks, whereas during football games the announcer is located in a booth.
In smaller and older stadia, the loudspeakers are often installed along the perimeter of
the pitch, spaced about 15 to 20 m apart. The loudspeakers feature a cardioid radiation pattern
and are aimed upwards at the stands (see Figure 11.9). Because of the sound propagation
over the absorptive spectators, the distribution loss for full attendance is often above 6 dB
per doubling of distance. Specifically in the case of high stands it is often necessary to install
additional loudspeakers for the upper sections.
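The quoted distribution loss can be illustrated numerically. The function below is a sketch assuming a constant attenuation rate per doubling of distance (6 dB corresponds to free-field spreading; the 8 dB figure and the distances are invented for illustration):

```python
import math

def level_drop_db(d_ref_m: float, d_m: float, db_per_doubling: float = 6.0) -> float:
    """Level drop relative to a reference distance, assuming a constant
    attenuation per doubling of distance (6 dB = free-field inverse square
    law; grazing propagation over absorptive spectators can exceed this)."""
    return db_per_doubling * math.log2(d_m / d_ref_m)

# First row at 5 m, last row at 40 m from a pitch-edge loudspeaker:
print(level_drop_db(5.0, 40.0))       # free field: 18 dB over three doublings
print(level_drop_db(5.0, 40.0, 8.0))  # over a full, absorptive audience: 24 dB
```

The extra loss over an occupied, absorptive audience plane is what drives the need for supplementary loudspeakers in the upper sections.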

Figure 11.9 Olympia-​Stadium in Berlin 2002 with no roof and sound coverage from the perimeter of
the field of play.


In the case of a roof being available, these loudspeakers are often installed below its front
edge and aimed at the rear zone of the stands.
When no roof is present, the challenge is greater: in this case it may be necessary to
erect installation poles in the area of the stands, at the end of which loudspeakers with
high rearward attenuation are installed and aimed at the upper third of the stands. The
height of the poles depends both on the travel-time difference of the sound coming from the
nearest loudspeaker, which should not exceed 50 ms, and on the requirement to obstruct
the view as little as possible. The loudspeakers at the edge of the field –​serving as reference
loudspeakers –​should be capable of transmitting the entire frequency range, whereas the
additional loudspeakers may perhaps have a narrower bandwidth. Since music transmission
is generally required, the transmission range should be at least 70 Hz to 10 kHz. Subwoofers
may be used in smaller stadia as well.
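The 50 ms criterion translates directly into a maximum path-length difference, which in turn constrains pole height and placement. A sketch follows; all coordinates are invented for illustration, not taken from a real stadium:

```python
import math

C = 343.0  # speed of sound in m/s

def path_difference_ms(listener, src_a, src_b):
    """Travel-time difference in ms at a listener between two sources,
    each given as (x, y, z) coordinates in metres."""
    d_a = math.dist(listener, src_a)
    d_b = math.dist(listener, src_b)
    return abs(d_a - d_b) / C * 1000.0

# A 50 ms limit corresponds to about 17 m of allowed path difference:
print(f"max path difference: {C * 0.050:.1f} m")

# Invented geometry: seat high in the stands, loudspeaker on a nearby pole,
# reference loudspeaker at the field edge.
seat = (0.0, 30.0, 15.0)
pole_speaker = (0.0, 20.0, 12.0)
edge_speaker = (0.0, 8.0, 2.0)
print(f"{path_difference_ms(seat, pole_speaker, edge_speaker):.1f} ms")
```

In practice such checks are run over the whole seating area, not a single seat, and electronic delays are added where the geometry alone cannot satisfy the criterion.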
For covering the pitch itself, additional loudspeakers are arranged in smaller stadia
without a roof at the rear of the loudspeakers located at the edge of the field. Often it is
sufficient to arrange these loudspeakers on one side of the field. In this case a sound level
difference may occur between the two sides of the field, which is undesirable,
but on the other hand there are no travel-time interferences to be expected on the entire
field, which is often the case with dual-sided coverage. In larger stadia with a roof the
coverage of the pitch should be achieved from one side only.
Special requirements apply for sound reinforcement systems in stadia for synchronized
gymnastics. In this case no delay is permitted, as it would impede the synchronicity of
the exercises. In some cases, decentralized loudspeakers embedded in the ground have
been used.
Room acoustic treatment in stadia depends on the geometry of the stadium or arena:

• Fully closed arena (see also section on Multipurpose halls)


• Stadium without a roof
• Stadium with retractable roof
• Stadium with small roof on one side
• Stadium with two roof parts
• Stadium with roof covering all bleachers
• Fan-​shaped roof
• Curved roof with and without absorption
• Roof open or closed at the end of the bleachers

Depending on these architectural solutions different acoustic measures are called for.

11.2.3.3.1 ROOM ACOUSTICS IN A STADIUM WITH ROOF OR PARTIAL ROOFS

The shape of the roof must be studied and the roof may be acoustically treated, taking into
account the following issues:

• A curved roof will quite often create higher reverberation with longer reverberation
time in comparison to a flat one
• The roof underside should be absorptive if possible
• Different acoustic behaviour below the roof; i.e., pure steel construction, absorptive
layer or membrane material produces more or less reverberation; this must be considered
during computer simulation


• Expected acoustic behaviour at the edge area of the roof (almost free field): the closer
to the roof edge, the less reverberation is produced

Further acoustic measures in the stadium:

• Use of tilted walls and windows (e.g., at skyboxes) to avoid direct reflections and
echoes
• Reverberation time under the roof should never exceed 3 s when occupied

11.2.3.3.2 ROOM ACOUSTICS IN A STADIUM WITH NO OR ONLY SMALL ROOF PARTS

Acoustic treatment is mainly relevant to avoid echoes and signal colouration:

• Use of tilted walls and window parts to deflect sound from the PA system
• Good sight lines promise good direct sound coverage (important if loudspeakers are
mounted at the pitch edge)
• No large building walls outside the stadium so as to avoid echoes

11.2.4 Case Studies

11.2.4.1 Getec-​Arena, Magdeburg, Germany


In 2018 and 2019 a new sound system was installed at the Getec-Arena with the goal of
improving speech and music reproduction, specifically during handball games; refer
to Figure 11.10. The previously installed system consisted of three cluster arrangements
of individual loudspeakers with two additional staggered delay lines for the sides of the
bleacher areas.
Figure 11.11 shows the computer model of the hall including the newly specified sound
system, made up of 15 Kling & Freitag Sequenza 5 digital line arrays consisting of
75 modules in total (6 × 6, 6 × 5 and 3 × 3 line arrays), plus nine Kling &
Freitag Gravis 15 W point-source loudspeakers.
The system settings are designed for covering the bleachers during handball games
and for cultural performances with a stage on one short side of the hall. Figure 11.12
illustrates the reverberation time, the SPL and the STI mapping results, with
an average STI value of 0.559. With the calculation specified in DIN VDE 088-4 [7],
STIaverage − STIdeviation equals 0.523, which complies with the minimum required value.
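The acceptance figure quoted here combines the mapped average with a deviation term. A minimal sketch of such a check is shown below; interpreting the deviation as the sample standard deviation of the mapped values is an assumption for illustration, and the mapping values are invented, not the arena's actual data:

```python
import statistics

def sti_mean_minus_deviation(sti_values, minimum=0.5):
    """Return (mean, mean - deviation, pass/fail) for a set of mapped STI
    values; the deviation is taken as the sample standard deviation."""
    mean = statistics.mean(sti_values)
    dev = statistics.stdev(sti_values)
    return mean, mean - dev, (mean - dev) >= minimum

mapped = [0.52, 0.55, 0.58, 0.56, 0.54, 0.60]  # invented mapping values
mean, adjusted, ok = sti_mean_minus_deviation(mapped)
print(f"mean {mean:.3f}, mean - deviation {adjusted:.3f}, pass: {ok}")
```

Subtracting the deviation penalizes designs that reach a good average only through a few very good spots while leaving weak zones elsewhere.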

11.2.4.2 Stadium –​Green Point Stadium, Cape Town, South Africa


Multiple stadium sound systems were designed for the 2010 FIFA World Cup in South
Africa. Figures 11.13 to 11.15 show the computer model including the loud-
speaker systems of the Green Point Cape Town stadium.
Additionally, expected sound level emissions from the stadium sound systems to the
exterior environment were simulated. The predicted results of such calculations even
had an influence on the time schedule of the games during the championship: compare
Figures 11.16 and 11.17.

Figure 11.10 View into the Getec Arena during a handball game.

11.3 Hotels, Museums, Exhibition Halls and Convention Centres


In these facilities mainly information, paging and voice alarm systems are of import-
ance. In hotels the public areas such as lobbies, bars and ball rooms must be reached by
paging and emergency calls. The same applies to museums, exhibition halls and conven-
tion centres.
Within all paging and voice alarm systems, a distinction can be made between paging calls
for search messages, background and advertisement information, and emergency calls in
cases of fire, violent incidents or other emergency events.

11.3.1 Common Features and Differences


All information emitted by these systems must reach the covered area at the required
loudness and with adequate speech intelligibility.
According to the international standard ISO 7240-​19 the STI value for speech intelli-
gibility must be greater than or equal to 0.5 in 90% of the covered areas, while the average
value over all areas must exceed 0.45.
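Expressed over a set of discrete measurement points, this two-part criterion can be checked as follows. This is a sketch: reducing "covered areas" to equally weighted points is a simplification, and the measurement grid values are invented:

```python
def sti_coverage_check(sti_values, point_minimum=0.5, coverage=0.9,
                       average_minimum=0.45):
    """Check that at least `coverage` of the points reach `point_minimum`
    STI and that the overall average exceeds `average_minimum`."""
    fraction_ok = sum(1 for s in sti_values if s >= point_minimum) / len(sti_values)
    average = sum(sti_values) / len(sti_values)
    return fraction_ok >= coverage and average > average_minimum

# Invented measurement grid: one weak point out of ten still passes.
print(sti_coverage_check([0.55, 0.52, 0.58, 0.50, 0.47, 0.56,
                          0.53, 0.51, 0.54, 0.57]))
```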
Ceiling loudspeakers are often used in hotels and museums; refer to Chapter 6. Depending
on the height of the flat ceiling the distance between the loudspeakers may vary. Usually,
the ceiling loudspeakers are arranged at 3–​5 m distance from each other. Software solutions
exist to find the optimal arrangement as a function of the space height; compare section
6.3.1.1.
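The dependence of loudspeaker spacing on mounting height can be sketched geometrically: adjacent coverage cones should roughly meet at the listening plane. The 90° nominal coverage angle and the 1.2 m seated ear height below are assumptions for illustration, not fixed design rules:

```python
import math

def ceiling_spacing_m(ceiling_height_m: float, coverage_angle_deg: float = 90.0,
                      ear_height_m: float = 1.2) -> float:
    """Edge-to-edge spacing at which adjacent coverage circles just touch
    at the listening plane."""
    drop = ceiling_height_m - ear_height_m
    radius = drop * math.tan(math.radians(coverage_angle_deg / 2.0))
    return 2.0 * radius

for h in (2.7, 3.5, 4.5):
    print(f"{h} m ceiling -> spacing {ceiling_spacing_m(h):.1f} m")
```

For typical ceiling heights this simple model reproduces the 3–5 m rule of thumb; tighter grids (overlapping circles) are used where higher uniformity is required.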
Figure 11.11 Wireframe model of the Getec Arena Hall.
Figure 11.12 Overall RT, SPL and STI values in the Hall.
Figure 11.13 Wireframe model of the inside geometry of the stadium.

Figure 11.14 Twelve line array positions with a total of 124 Electro-​Voice XLD 281 modules.

Figure 11.15 Outstanding SPL and STI distributions in the bleachers.

In spaces such as ballrooms, exhibition halls and convention centres, modern digitally
controlled as well as passive line arrays are used. In the design phase computer simulation
is highly recommended. For this purpose, the room acoustic data of the appropriate surfaces
must be known.

11.3.2 Target Room Acoustic Measures


To achieve high speech intelligibility, the noise floor must be as low as possible, with rever-
beration times within the applicable recommended range.
Generally, the noise floor in public spaces of hotels and museums should not exceed 40
dBA (NR30). With higher levels, for instance in large exhibition halls, the sound systems
require a gain control to automatically modify the level of all paging and emergency calls.
For example, with a 70 dBA noise floor the signal level should be at least 10 dB higher, but
it must be noted that signal levels of 80 dBA and higher will start to show a masking effect
(compare section 7.4.1) and the speech intelligibility decreases again. As mentioned before,
high intelligibility cannot be achieved just with increased signal level alone, but rather by a
balance between the prevailing noise floor and the required signal level. To obtain adequate
intelligibility it might be required to reduce the noise floor by technical means, e.g., by
improvements in the air conditioning system.
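The level logic described above can be sketched as a simple noise-dependent gain rule. The +10 dB offset and the roughly 80 dBA masking region are taken from the text; the 70 dBA base level for quiet conditions is an assumption for illustration:

```python
def announcement_level_dba(noise_floor_dba: float, snr_db: float = 10.0,
                           base_dba: float = 70.0, cap_dba: float = 80.0) -> float:
    """Target announcement level: at least `snr_db` above the measured noise
    floor, never below a quiet-condition base level, and capped where
    masking would start to reduce intelligibility again."""
    target = max(base_dba, noise_floor_dba + snr_db)
    return min(target, cap_dba)

for noise in (40.0, 55.0, 65.0, 70.0):
    print(f"noise {noise:.0f} dBA -> announcement {announcement_level_dba(noise):.0f} dBA")
```

The cap makes the limitation visible: once the noise floor itself approaches 70 dBA, raising the signal level no longer buys intelligibility, and the noise source must be treated instead.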

Figure 11.16 Location of the stadium in Cape Town close to residential areas.

Figure 11.17 Noise map during a night-time game.




Another parameter to secure good intelligibility is reverberation control in all closed
spaces. A rule of thumb is to keep the reverberation time in hotel public areas and museums
below 2 s, or better below 1.5 s. Achieving this is often quite challenging, as modern architec-
ture is dominated by acoustically hard wall and ceiling materials such as:

• Glass panes
• Concrete or gypsum board surfaces
• Natural stone floors
• Metal structures

All these materials more or less completely reflect sound and this will increase the reverber-
ation time, hence it is important that the need for implementing acoustical absorption is
discussed in the early design phase and corresponding treatment is integrated in the space
design.

11.3.2.1 Different System Layout Approaches as a Base for Computer Simulation


The system layout depends on the type of the facility. In rectangular rooms of limited height
ceiling loudspeakers are often used. If the room height and the expected reverberation
time are known, simple tools allow the ceiling loudspeaker arrangement to be computed;
see above.
As numerous types and manufacturers of professional loudspeakers are available that
need to be studied, preselected and ultimately specified, it is highly recommended to employ
computer simulation to assist in the engineering. In cooperation with the architect a com-
puter model is created in software tools such as EASE, ODEON or CATT Acoustic. The
acoustic parameters of the walls, floor and ceiling sections are indicated and the intended
sound system design is realized in the model. The calculation results will either confirm
the successful design or illustrate that modifications need to be done before another simu-
lation run. Positive results are achieved if the requirements laid out in section 2.2.4.3 and
Chapter 7 are met. This procedure is specifically recommended for large halls where the
sound system not only covers speech messages but is also used for music reproduction.

11.3.3 Verification with Different Measurement Tools


After the installation of the audio systems and during the commissioning phase (com-
pare Chapter 10) acoustical measurements are performed to confirm the successful
completion.
The measurements must register the average noise floor and the achieved sound pressure
level of the installed system. In addition, measurement tools allow STI speech intelligibility
values to be measured (compare Figure 11.18).
Handheld sound level meters such as the Bruel & Kjaer 2250 [11], the Nor145 by
Norsonic [5], NTI XL2 [12] or the SM50 by Bedrock [6] may be used. A number of software
tools are available for acoustical measurements, such as EASERA by AFMG Technologies
GmbH [2], Dirac by Bruel & Kjaer [3], Clio 11 by Audiomatica [4] or REW Room EQ
Wizard [13] for measuring levels and impulse responses. By using a transformation developed
by Schroeder (compare reference 15 in Chapter 7), the results of an impulse response
measurement allow sound pressure levels (SPL) and STI intelligibility values to be obtained.
The influence of the noise floor must be determined by an additional calculation.
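The Schroeder method integrates the squared impulse response backwards in time to obtain a smooth energy decay curve, from which the reverberation time is read. Below is a minimal sketch on a synthetic exponential decay; the sampling rate, the −5 to −25 dB evaluation range and the threshold-crossing shortcut (instead of a regression fit) are illustrative choices:

```python
import math

def schroeder_decay_db(impulse_response):
    """Backward-integrated energy decay curve in dB re the total energy."""
    energy = [s * s for s in impulse_response]
    total = sum(energy)
    decay, running = [], 0.0
    for e in reversed(energy):
        running += e
        decay.append(running)
    decay.reverse()
    return [10.0 * math.log10(d / total) for d in decay]

def rt60_from_decay(decay_db, fs, lo_db=-5.0, hi_db=-25.0):
    """Extrapolate RT60 from the -5 to -25 dB portion of the decay
    (T20-style, simplified to two threshold crossings)."""
    n_lo = next(i for i, d in enumerate(decay_db) if d <= lo_db)
    n_hi = next(i for i, d in enumerate(decay_db) if d <= hi_db)
    return (n_hi - n_lo) / fs * 60.0 / (lo_db - hi_db)

# Synthetic impulse response with a known reverberation time of 1.5 s:
fs, rt_true = 1000, 1.5
ir = [10.0 ** (-3.0 * (n / fs) / rt_true) for n in range(2 * fs)]
print(round(rt60_from_decay(schroeder_decay_db(ir), fs), 2))  # prints 1.5
```

Real measurements additionally require band filtering and truncation of the impulse response above the noise floor before integration, which is why the noise floor must be assessed separately.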

Figure 11.18 NTi XL2 Handheld Acoustical Analyzer including measurement microphone M2211.

11.3.4 Case Studies

11.3.4.1 Museum – National Museum, Beijing, China


The museum, originally established in 2003, was renovated and reopened in March 2011,
based on a design by Germany-​based gmp Architects International. The room and building
acoustics as well as the sound systems for various halls and lobbies were studied and optimized.
The largest space is the Grand Foyer at the entrance of the museum (Figure 11.19).
The first simulations resulted in an excessive reverberation time of 10 s for the
Grand Foyer, hence special acoustic treatment was specified. Using perforated ceiling
layers with mineral wool underlays, the reverberation time could be reduced to about
2 s in the mid-frequency range. Compare the simulation and rendering figures for details in
Figures 11.20 and 11.21.
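The scale of such a treatment can be estimated with Sabine's formula, RT = 0.161·V/A. The foyer volume used below is an invented placeholder for illustration, not the museum's actual figure:

```python
def sabine_rt_s(volume_m3: float, absorption_m2: float) -> float:
    """Sabine reverberation time RT60 = 0.161 * V / A, with the volume V
    in m^3 and A in m^2 of equivalent absorption area."""
    return 0.161 * volume_m3 / absorption_m2

V = 100_000.0                # assumed foyer volume in m^3 (illustrative only)
a_before = 0.161 * V / 10.0  # absorption area implied by RT = 10 s
a_after = 0.161 * V / 2.0    # absorption area required for RT = 2 s
print(f"additional absorption needed: {a_after - a_before:.0f} m^2")
```

The fivefold reduction in reverberation time requires roughly five times the equivalent absorption area, which explains why large continuous surfaces such as the ceiling are the natural place for the treatment.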

11.3.4.2 Hotel –​Lusail Hotel, Doha, Qatar


This prestigious hotel project was designed from 2012 to 2016 by the Dubai office of the
Germany-based architectural firm Kling Consult GmbH.
Specific ceiling and wall designs in the main hall may be understood by considering
Figure 11.22.
Figure 11.19 Wireframe model of the 250-​m-​wide entrance lobby.
Figure 11.20 Rendered view of the Grand Foyer including ceiling detail.
Figure 11.21 Designed sound system with line arrays type JBL VT4888DP in the centre of the foyer.
Figure 11.22 Room acoustic design of the main hall. Left above: View to the hotel design. Right above: Computer model of the main hall.
Below: Echo simulations in computer model.

Figure 11.23 Recommended secondary structure in the main hall. (Left) Architectural design of the
hall (only hard surfaces); (right) hall with acoustical treatment at ceiling and back wall.
Light red faces at front and back wall: Broadband absorber (e.g. slotted panels). Orange
face at back wall: Additional broadband absorber (slotted panels or curtain). Dark red
faces at ceiling: Broadband absorber or sound transparent grid. Dark blue: Glass facade
with inclined lamella structure.

Figure 11.24 RT values in the main hall (left) without treatment and (right) with treatment.

This design resulted in an acceptable reverberation time and absence of echoes (see
Figures 11.23 and 11.24).

11.4 Theatres and Concert Halls


Theatres saw the light of day over 2000 years ago. The name originates from a Greek
expression and stands for 'place to see' (a performance). In 15 BC Vitruvius reported on the
construction of Greek theatres, the conditions for good sightlines and the tools to improve the
acoustic properties (use of short-time reflections on the so-​called scena (a structure behind
the playing area) and of actor masks with a horn-​shaped mouth design). Hence, from the
very beginning speech intelligibility had a high priority and this is still the case today. In
court theatres of the sixteenth and seventeenth centuries the ruler sat in the centre of the
theatre to have close contact with the actors. In the nineteenth century horseshoe-​shaped
theatres were built with a first-​floor centre box for the king or duke. It has always been
understood that excellent sight conditions coincide with good speech intelligibility.
Several setups and configurations are currently distinguished:

(a) Opera house


(b) Classical concert hall
(c) Dramatic theatre
(d) Repertory theatre
(e) Concert venue for pop/​rock and jazz
(f) Concerts in a multipurpose hall


Types (a) to (c) are performance spaces designed to facilitate one single performance genre,
while types (d) to (f) are more suited for a mixed use of music and speech performances.

11.4.1 Common and Special Features of these Facilities

11.4.1.1 Single-​Purpose Facilities


Classical opera houses and concert halls serve one performance genre, and so here the most
specific and critical demands in room and building acoustics are to be found.
For opera houses the hall acoustics show a trend towards higher spaciousness and liveliness.
In the nineteenth century an acoustically dry atmosphere dominated; the reverberation time
in the halls was around 1 s or even lower. Typical examples of this kind are the famous
opera house in Milan (1778), the Bolshoi Theatre in Moscow (1776) or the State Opera
in Berlin (1743), mostly with horseshoe geometry, so good sight lines are provided at
least for the first rows in the galleries, and also for the stalls thanks to the inclination of
the floor. Correct design of the forestage (almost parallel wall parts and ceiling not overly
high) ensures good sound radiation from the stage into the auditorium. An important part
of ‘good acoustics’ in an opera house is the structure of the orchestra pit. The pit must not be
covered too dominantly by the stage floor; in addition, the walls, floor and ceiling claddings
must follow correct room acoustic design criteria.
Over the last 50 years opera houses have been built with higher spaciousness and
reverberance. The trend is towards reverberation time values known from classical concert halls:
not halls with 1 s reverberation time but with 1.5 s or even higher. Examples are the Theatre
Wolfsburg (1973, Architect Hans Scharoun, the architect of the Berlin Philharmonic
Hall), the Semper Opera in Dresden (reopened 1985), the 2017 reconstructed State
Opera in Berlin, all in Germany, or the Seattle Opera House (1963) and the Metropolitan
Opera House in New York (1966), both in the United States. The acoustics of the Oslo Opera
House (2008) are very similar to those of a concert hall (reverberation time around 2 s).
This makes it easier for a singer to obtain acoustic feedback and support from the audi-
torium. The audience also reports better involvement with the sound. Most building
renovations currently follow the trend to remove excessive absorption and to enhance the
reverberation.
The next single-​purpose hall is the classic concert hall. These halls have been built for
over 200 years. Three basic types of hall shapes predominate:

(a) Concert hall similar to an opera house, but employing an orchestra shell in the rear
part of the stage in lieu of a stage house.
• The most famous hall of this kind is the Carnegie Hall in New York, which is a
horseshoe theatre with the favourable properties of such a shape. As in a theatre all
seats have approximately equal distance to the podium, good sight lines and hence
good direct sound coverage. The structured proscenium and the balustrades of the
galleries supply enough short time reflections. Multiple reflections are rare because
of the absence of larger flat wall parts. Therefore, high clarity values combined
with lower spaciousness and reverberance are observed in these halls.
(b) Shoebox-​shaped halls, such as medieval guild halls –​these have also been used for
music performances since their construction. An example is the Guerzenich Hall in
Cologne, Germany, built in the fifteenth century, in use for musical performances for
over 200 years. Another example is the ‘Gewandhaus’ in Leipzig, now with the third


building on this site. The old Gewandhaus was built at the end of the fifteenth cen-
tury by the Tailor Guild (‘Gewand’ stands for a garb made by tailors), and concerts
have been performed since the mid-​eighteenth century. This building was torn down
at the end of the nineteenth century, and in 1884 the Old Gewandhaus was opened
on another site. Like the first one it was shoebox-​shaped as was typical for such halls;
compare it also to the famous Musikvereinssaal in Vienna, opened in 1870. The advan-
tage of these shoebox halls is the primary structure with a balanced ratio between side
wall and ceiling areas. The relatively large volume and the shape of the hall create high
spaciousness with acceptable clarity. The secondary structure of the hall (side walls finely
structured by balustrades, sculptures and décor) supplies early and multiple reflections.
(c) Concert hall in amphitheatre or vineyard geometry: this type was first opened in 1963
in Berlin. The Philharmonic Hall is the second building with that name, since the pre-
decessor building in shoebox shape was destroyed during the Second World War. This
new Philharmonic Hall then became the prototype for similar halls worldwide, like the
Suntory Hall in Tokyo (opened 1986) or the Disney Hall in Los Angeles (opened 2003).
In Germany the New Gewandhaus in Leipzig (opened 1981), the Elbe Philharmonic
Hall in Hamburg and the Concert Hall in Dresden (both opened 2017) have an amphi-
theatre shape. The good sight lines in such halls ensure perfect direct sound coverage.
As the audience is sitting around the orchestra, ceiling panels (or an adjustable canopy)
ensure good mutual hearing of the musicians and specifically designed balustrades and
hallway areas supply early reflections. Large side walls supply multiple reflections to
enhance the spaciousness and liveliness. This hall type is currently very popular.

The last type of a single-​purpose hall is the drama theatre. The original dramatic theatre
type goes back to Greek and Roman times. The Renaissance period saw a significant
increase in performing actor groups playing their shows at fairs or in the courts of nobles,
with the prominent example of the actor group surrounding William Shakespeare in the
late sixteenth and early seventeenth century. In this period the Globe Theatre in London,
one of the first permanent playhouses, was built. During the seventeenth and eighteenth centuries
more and more playhouses were built, first only for kings or dukes and their entourages,
but later also for the general public. In 1821 in Berlin a famous playhouse was designed
by Karl Friedrich Schinkel, which was rebuilt as a concert hall in 1984 after destruction
during the Second World War. Further theatre buildings mainly in horseshoe shape are the
Burg Theatre in Vienna (1888), the Royal Dramatic Theatre Stockholm (1908) and the
Comédie-​Française Paris (1900). Today many different geometries are being selected, and
most of them are not just used for speech performances but also for concert shows and opera
presentations.

11.4.1.2 Repertoire and All-​Purpose Theatre


Most modern theatres are repertoire theatres, in which a resident company presents works
from a specified repertoire, usually in alternation. Not just dramatic plays but also operas
and concerts are performed. The basic geometry of these repertory theatres is derived
from the single-​purpose ones explained above, either a classic opera house in horseshoe
shape or a drama theatre with or without balconies. Some have orchestra pits to perform
operas, other theatres prefer speech performances and cooperate with other venues to per-
form operas. Germany counts approx. 300 repertoire theatres with their own ensemble or
for guest performances. Similar quantities per capita can be expected in other European


countries, Japan or North America. Emerging countries such as China and some nations in
South America are busy constructing new theatres.
Driven by the broad repertoire, different acoustic requirements must be considered in these
theatres. For classical concerts higher reverberation is desirable, so in a theatre a removable
concert shell is erected on stage and methods of variable acoustics are used to change
the acoustic properties. Banners, curtains or sometimes operable wall parts are used to
obtain the variability. Electroacoustic methods are employed more and more to enhance
the acoustic properties; refer to section 2.7.2 regarding ‘electronic architecture’.
Either of two design approaches can be chosen:

(a) The traditional approach is to build a hall with a rather long reverberation time of
1.5–​1.8 s and then to use banners, curtains or other architectural measures to reduce
the natural reverberation time down to 1.1–​1.4 s. This approach is well suited for music
and opera performances.
(b) If the new build or existing theatre is being designed mainly for drama theatre an
electroacoustic enhancement system may be used to adapt the theatre hall for music
performances. These systems offer an acoustic quality that for a layman is not distin-
guishable from the natural acoustic properties of the hall.

11.4.1.3 Concerts in Multi-​Purpose Halls for Modern Music


Modern music such as rock, pop and jazz should be performed in spaces with low
reverberation or in outdoor venues. Electronically amplified instruments are played back in
a way that already includes the required acoustical effects such as reverberation, hence any
hall spaciousness and reverberation is in fact in conflict with the sonic intent of the
performance. It is not an accident that such concerts are often performed in sport
arenas, stadia or outdoor venues. Modern sport arenas are designed with the acoustical
target of keeping the reverberation time in the occupied case below 2 s. Modern concert
performances and also classical concerts are well feasible within these spaces. Examples
for such halls are the Staples Centre in Los Angeles (1999) and the Mercedes Benz Arena
in Berlin (2006), both erected primarily for ice hockey matches. Jazz concerts mostly take
place in smaller halls where the natural reverberation is lower compared to large arenas.

11.4.2 Task of Sound Systems in Theatres and Concert Halls


It is sometimes assumed that single-purpose halls do not require a sound system. While
the musicians and singers indeed do not use microphones, the production may
still call for announcements, commentary, reproduction of electronic instruments and, in
theatres, playback of sound effects, etc.

11.4.2.1 Tasks in General


Sound systems in theatres, concert halls or similar venues cover the following functions:

• Providing an adequate sound pressure level with the required speech intelligibility and
clarity for the audience for quiet sources (singer or talker) or for sources which are
to be reproduced only by means of electroacoustic amplification (e.g., electroacoustic
instruments)

System Solutions / Case Studies 381


• Clear reproduction of sound effects from different directions of the stage or the audi-
ence area for supporting the scenic actions and increasing the impression of the stage
performance
• Enhancement of the room acoustic impression and the involvement of the audience, creation of immersive sound impressions; possible electronic enhancement of the reverberation time or the reproduction of moving sound sources by a multitude of loudspeakers within the audience area
• Signal playback for the performers as monitoring to enable coordinated, synchronous
teamwork of all involved actors and singers
• Enabling production directions for the actors, singers or musicians during rehearsals
• A sound system providing adequate sound quality for spectators with hearing issues

In addition to providing these functions, the systems should ideally be installed unobtru-
sively into the structure of the building.

11.4.2.2 Systems in Detail Explained in a Theatre

11.4.2.2.1 MAIN SOUND SYSTEM

This system mainly serves to provide sufficient sound pressure level in the audience area.
In a theatre such as that shown in Figure 11.25 the loudspeakers or often line arrays are
installed left and right in the side walls of the forestage and above the stage proscenium.
Additionally, delay systems cover the back of the stalls or any balcony areas with sound (see
main groups 1 to 3).
If the main system does not produce sufficient low-frequency energy, corresponding subwoofers are required.

11.4.2.2.1.1 Effect Signal Sound System Powerful sound systems are installed left and right, in front of and above a backstage opening to play back effect signals out of the depth of the stage; see main groups 4 and 6 in Figure 11.25. As these systems are often covered by scenery
or curtains, they must radiate signals with high acoustic power. Furthermore, loudspeaker
groups can be installed on side galleries (main group 7) and on the rear wall of an existing
back stage (main group 5).
All systems used consist of mid/​ high speaker arrangements with additional
powerful subs.
Highly directive sound systems are used for these long-throw applications.
To play back effect signals or for moving sound signals additional loudspeakers are installed
in the audience hall. Altogether around 40 to 70 speakers can be evenly distributed on side
and back walls and on the ceiling, to better integrate the audience into the events on stage,
sometimes even providing 3D immersive sound impressions. These same loudspeakers may
also be used as an enhancement system to radiate simulated early and late reflections out of
the different room directions with the aim of increasing reverberation, i.e., the prolongation
of the reverberation time.
Sometimes loudspeakers are installed in the centre of the ceiling or hidden in the lighting installation for directional playback of signals from above the audience.

11.4.2.2.1.2 Stage Monitors Often the performance area itself needs a sound system to
supply audio monitoring for the actors to facilitate acoustic control of the play, mutual
Wolfgang Ahnert and Dirk Noy
Figure 11.25 Main loudspeaker groups in a theatre.


hearing of other actors, singers or musicians or any playback. The same system can provide
for stage calls during rehearsals or intermissions.
Stage monitoring is achieved by the following systems:

• Proscenium right and left (main group 8 in Figure 11.25)


• Proscenium bridge (main group 9 in Figure 11.25)
• Forestage: small monitor loudspeakers along the stage edge
• Forestage left and right (use of the lower systems of main group 2)
• Mobile systems

In some theatres the performers use wireless in-​ear monitoring systems.

11.4.2.2.1.3 Mobile Systems A number of stage boxes are equipped with outputs for
connecting mobile loudspeakers.
The routing of amplifiers to the outputs is done via patch panels or electronic matrix
systems.

11.4.2.2.1.4 Microphone Selection Any theatre or concert hall needs a basic complement of microphones of various types, directivities and sensitivities, some of them wireless.

11.4.3 Target Criteria for Sound Systems

11.4.3.1 Intelligibility, Sound Level Coverage


The following criteria need to be considered:

• Adequate sound coverage, i.e., the sound pressure level should be evenly distributed
over the entire audience zone
• The sound level difference from the first to the last row should not exceed 10 dB. This may also be expressed via the strength measure G, which should be higher than 0 dB and not exceed 10 dB
• Based on ISO 7240-​19 or NFPA 72 the speech transmission index STI must exceed 0.5 in 90% of the audience areas; compare section 7.4.2. This must be the case under the given room acoustic conditions, independent of the reverberation time [8, 9]
• Echoes from walls or ceilings or slap-back reflections from a stage house caused by the sound system are not acceptable
• The frequency range of the sound system must be adequate for the content of the signal
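The coverage criteria above can be checked with elementary arithmetic: the direct sound of a point source falls by 6 dB per doubling of distance, and the STI requirement is a percentile condition over the mapped audience positions. A minimal sketch in Python (all distances and STI values are hypothetical, for illustration only):

```python
import math

def spl_at(distance_m, spl_at_1m):
    """Inverse-square law: direct-sound level of a point source at a distance."""
    return spl_at_1m - 20 * math.log10(distance_m)

# Hypothetical hall: first row 5 m, last row 30 m from a point-source loudspeaker
drop = spl_at(5, 100) - spl_at(30, 100)   # level difference, first to last row
meets_10db_criterion = drop <= 10.0        # target from the list above

# STI criterion: at least 90% of audience positions must exceed 0.5 (ISO 7240-19)
sti_map = [0.62, 0.55, 0.58, 0.51, 0.49, 0.60, 0.57, 0.53, 0.56, 0.61]
share_ok = sum(s > 0.5 for s in sti_map) / len(sti_map)
meets_sti_criterion = share_ok >= 0.9
```

With these example distances the first-to-last difference is about 15.6 dB, which illustrates why a single point source rarely satisfies the 10 dB criterion and why delay loudspeakers are added for rear seating areas.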

11.4.3.2 Clarity and Spaciousness


A musical theatre, opera or concert hall requires some natural reverberation, and the audience expects higher spaciousness. Regardless, the clarity measure C80 in the audience hall without the use of a sound system should not fall below −2 to −4 dB. On the other hand, the sound system, important for announcements or background information, should deliver only positive clarity values. Hence, in concert halls or opera houses mainly directional loudspeakers or arrays should be used. Loudspeaker positions deep in the stage house or mounted backstage require highly directional loudspeakers.

11.4.4 Brief Summary of the Two Different System Layouts

11.4.4.1 Pure Sound System Design


The functions of an installed sound system are:

• High-​quality, intelligible reproduction of live speech, commentary during concert


performances, presentations and galas in local or foreign languages, cabarets, recitations etc.
• High-​quality transmission of live music, amplification of rock, pop, jazz or other genres of
music, sound reproduction in high quality for musicals and immersive sound reproduction
• High-​quality reproduction of recorded audio signals (speech, music or sound effects)
out of the stage or by using so-​called panorama loudspeakers along the side and back
walls of the hall

These subsystems describe the electroacoustic system in a theatre or concert hall and result
in different acoustic perceptions in the context of the existing room acoustic properties. It
is essential to establish congruence between room and electroacoustic properties.

11.4.4.2 Use of Additional Enhancement Systems


If the room acoustic properties don’t correspond with the expectations of the visitors during
concerts or opera performances a refurbishment of the facility should be undertaken before
a new sound system is installed. In some cases, especially in historic buildings under monu-
ment protection when the modification of the so-​called primary and/​or secondary structure of
the hall is not feasible, electroacoustic enhancement systems can be a sensible solution (see
section 2.7.2). Microphones mounted at the ceiling or in the forestage area pick up the natural
sound, which is being processed to create appropriate additional early and late reflections that
are subsequently delivered by numerous small loudspeakers installed on all surfaces (walls,
ceiling) of the hall. Hence these systems permit an increase of reverberation. As a rule of
thumb, the enhanced reverberation time should never exceed the natural reverberation time by more than a factor of 2; otherwise the result sounds unnatural and the artificial enhancement is noticed by the audience.

11.4.4.3 Measurement Approaches


After the system is installed, it must be fine-tuned and adjusted. During the objective tests,
the following parameters, among others, are measured and suitably optimized if neces-
sary: sound level distribution, frequency response, speech intelligibility and, in the case of
enhancement systems, the tuning of the reverberation steps. This is followed by subjective
alignment and acoustic calibration. See Chapter 10 for a more detailed description of the
processes.

11.4.5 Case Studies

11.4.5.1 Musical Theater –​Musiktheater Linz, Austria


Finally, after 20 years of discussions, decisions and rejections, in 2006 the city of Linz confirmed the winner of a second architectural competition for the Musiktheater: the British architect Terry Pawson. The house opened in April 2013.

Figure 11.26 Rendered section of the Musiktheater Linz. ©Landestheater-​Linz.at.

Figure 11.27 Layout of the second floor.

11.4.5.1.1 LOUDSPEAKER CONCEPTS

Motorized line array loudspeakers were specified in accordance with the planned use,
suspended on the left and the right as well along the central axis of the proscenium. The
following components are installed:

• 2× d&b Q arrays, seven elements at the sides of the proscenium


• 1× d&b Q array, three elements in the centre of the proscenium

The accompanying subwoofer loudspeakers are installed above the right and left arrays.
For nearfield coverage on the stage and for source location on stage, side fills and small
loudspeaker systems are installed (16× d&b E0 at the front of the stage).
Each rigging point consists of two motor-​driven pulling ropes per loudspeaker location.
Compact loudspeaker systems are installed on three levels at the right and left side of the
proscenium for cases when the line arrays are not present (6× d&b Ci90-​90×40).
The above-​mentioned sub-​loudspeakers and the installed localization systems will be
used in parallel in this case.
Additionally, a panorama sound system is installed on three levels to produce moving
sound images throughout the hall, amended by small ceiling loudspeakers. These
approximately 50 loudspeakers on the ceiling and the sidewalls may also be used for
electro-​acoustic enhancement.

Figure 11.28 View of the stage opening with lowered line arrays.

Six d&b Ci80s and two d&b C4 tops are installed on stage, in both portal towers and
above the entrance to the rear stage. Further mobile loudspeakers may be installed ad hoc
as required in the stage area and connected to floor and wall boxes.
The installed Salzbrenner Aurus consoles can limit the frequency response for speech to the range 200 Hz (with 6 dB rolloff towards lower frequencies) up to 4 kHz (3 dB rolloff towards higher frequencies). Variations of the response level are below ±3 dB.
For musical performances subwoofers will be used for the lower-​frequency range with a
resulting frequency response after appropriate equalization of 50 Hz to 16 kHz ±3dB.
The power amplifiers of the loudspeaker system, installed at three locations, offer the
possibility of digital signal-​processing, thus enabling equalization, delay and volume regula-
tion for each loudspeaker group. The configuration of the system can be done by means of
a network, in order to enable setup, monitoring and calibration from distributed locations
such as the sound control booth or via a wireless tablet PC in the audience hall.
The digital signal distribution and matrix routing to the amplifiers of the loudspeaker system
are realised via the digital audio network Nexus by Stagetec. Fifteen base devices are installed
at various locations in the hall, on stage and on rehearsal stages. The Aurus mixing consoles
can be connected here: two are permanently installed and two are for mobile use in the house.

11.4.5.1.2 SOUND LEVEL AND RUN-TIME DELAYS

The system is designed to provide a continuous minimum sound level of 94 dB plus an add-
itional level reserve of 12 dB at all seats. In the case of higher sound pressure levels, it is
possible to reduce the general loudness level of the system to avoid producing high distortions

Figure 11.29 Positions of panorama loudspeakers along the railings of the three galleries.

Figure 11.30 Loudspeaker layouts.

in the transmitted frequency response. For short-time peak levels the system is able to reach
values of 120 dB without producing distortions.
All loudspeaker units are operated so that the origin of the sound is localized on stage.
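The level figures above translate directly into amplifier power requirements, since a level increase of ΔL dB demands 10^(ΔL/10) times the electrical power. A small sketch (the dB values are those quoted in the text; the conversion itself is standard):

```python
def db_to_power_ratio(delta_db):
    """A level increase of delta_db dB requires 10**(delta_db/10) times the power."""
    return 10 ** (delta_db / 10)

continuous_spl = 94.0   # minimum continuous level at all seats
reserve_db = 12.0       # additional level reserve
peak_spl = 120.0        # short-time peak capability

power_for_reserve = db_to_power_ratio(reserve_db)                        # ~15.8x
power_peak_vs_continuous = db_to_power_ratio(peak_spl - continuous_spl)  # ~398x
```

The 12 dB reserve alone calls for roughly 16 times the continuous amplifier power, which is one reason such systems are dimensioned generously.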

11.4.5.1.3 SPEECH INTELLIGIBILITY

According to ISO 7240 speech intelligibility of STI > 0.5 is required for 90% of the sound
system’s coverage area.

11.4.5.1.4 NOISE FLOOR

The equivalent noise level in the hall with and without the new sound system is rated at
NR-​10 and does not exceed 23 dB(A). These values are valid for the entire hall.

Figure 11.31 View to the stage including loudspeaker systems.

Figure 11.32 Overall sound level in the audience hall, broad band, A-​weighted.

To limit the background sound levels in the hall, special fanless components have
been specified.

11.4.5.1.5 OTHER SPACES AND INSTALLATIONS

The large orchestra rehearsal stage may also be used for chamber music concerts and other performances; compare Figure 11.35.

Figure 11.33 STI mapping.

Figure 11.34 Distribution of the STI values by consideration of noise and masking.

An audience of up to 150 may be present in this room. Three load bars are installed at
the ceiling to connect audio and lighting components. Further rehearsal spaces are present
for the chorus and the ballet.
The studio theatre is equipped with fixed and mobile loudspeakers, loudspeaker suspen-
sion devices, cabling and transport cases, so that a flexible system is at hand, to be installed
and modified for diverse performance requirements. The amplifiers are installed in a tech-
nical room nearby and their outputs are supplied via permanently installed cables to floor
and wall boxes.
Finally, the comprehensive intercom and stage manager systems are based on Delec
Oratis and Mediacontrol components.
Figure 11.37 shows the Performance TRL stage manager console with five displays for
different views. The console may be put on either side of the stage.

Figure 11.35 View of the large (so-​called Bruckner) rehearsal room.

Figure 11.36 View of the studio theatre.



Figure 11.37 Performance TRL stage manager system.

Figure 11.38 Computer model and detail view of the wall structure.

11.4.5.2 Concert Hall –​ Organ and Concert Hall, Kharkov, Ukraine


The hall was opened in 2016 after seven years of reconstruction. The work included room acoustic design (together with gmp Architects, Berlin) and some basic guidance for the sound system design. Figure 11.38 shows a coloured computer model with an organ at the back of the stage. The hall had to provide high reverberation for organ music, which is not common in Orthodox churches in Ukraine; hence the design avoided absorbing material on all walls and the ceiling. The volume could not be enlarged; the design had to accept the existing base construction of the building.
The wall structures scatter the sound and a high reverberation time could be achieved,
which is preferred for organ music. For concerts the ceiling above the stage may be lowered
and serves as a canopy for better mutual hearing of the musicians. Additional absorbing
areas are created above the canopy and on the ceiling, reducing the reverberation time to
the targeted 2 s.
As illustrated in Figure 11.39, a sound system was finally designed and installed by a local company. Adequate speech intelligibility could be achieved despite the long reverberation time.
Figure 11.39 Modifying reverberation time by changing the ceiling height above the stage.

11.5 Multipurpose Halls

11.5.1 Common and Special Features


Concert halls and opera houses mainly serve one single purpose: being a space wherein to
perform concerts, operas, ballets and other classical events. In all these spaces the room
acoustics are to be closely considered, hence architects collaborate with acousticians from
the first moment to create a space of exquisite acoustical quality.
In multipurpose halls all performances must usually be supported by a sound reinforce-
ment system. As these spaces are often of spectacular architecture and prestige, they may
also be used for business-type presentations, gatherings and conferences for firms and
organizations. However, the high reverberation of such a hall has a counterproductive effect
on speech intelligibility; well-designed sound systems using modern loudspeaker setups, including line arrays, will nevertheless ensure communication. For theatre performances sophisticated sound
systems are required anyway. Any effect and playback signals are reproduced by a system
of distributed loudspeakers in the proscenium and stage area and these sources may also be
used to carry out meetings and conferences.

11.5.2 Acoustics and Sound Systems


Sound system design in single-​purpose spaces is not trivial but follows known rules, with
audio equipment being continuously developed and optimized.
The situation is different though in multipurpose halls, where it is important to under-
stand the main purpose and usage priorities of the hall, in coordination with the client and
the architect. Shall it be

• Case 1: Mainly a conference hall with some concerts or musical events during the
year or
• Case 2: Mainly a hall for musical presentations with some conferences or congresses

In the first case the hall is laid out for speech performances, so depending on the size of the hall the reverberation time should vary between 1 and 1.4 s. But what measures can be taken when more reverberation is desired for the musical events? Two approaches are feasible:

• Mechanically variable room acoustics


• Electro-​acoustic enhancement systems

In the case of variable room acoustics, the secondary and/​or even the primary structure of
the hall will be modified either manually or by motorized installations such as curtains or
draperies that are removed and thereby expose reflecting or absorptive surfaces. This may
happen by vertical or horizontal movements of wall parts or by turning wall and ceiling
parts back and forth (Figure 11.40).
Another option is the modification of the hall’s volume. By opening hinged or sliding
doors or other wall parts the volume of the hall may be increased to create higher reverber-
ation times with added sub-​volumes; see for example the echo chamber solution in the KKL
Luzern concert hall (Figure 11.41).

Figure 11.40 Variable acoustics in the Concert Theatre Coesfeld, Germany.

Another option to modify the room acoustic layout and to enhance the reverberation, including early reflections, is the application of methods known as ‘electronic
architecture’. This term was introduced by C. Jaffe to describe such a system [14]. Around
ten different approaches are known to increase the spaciousness and reverberation: some
have common features, and others are unique. All of them pick up the source signal above
the proscenium or in the diffuse sound field, subsequently post-​process it in different ways
and finally distribute the audio signal over an arrangement of numerous loudspeakers hidden
in walls and ceiling. By selection of stored presets different spaciousness and reverberation
values may be obtained in the hall. Because they use microphones and loudspeakers these
systems are quite often called sound systems as well. This may lead to rejection of these
systems especially by musicians; they don’t appreciate being supported by sound systems.
Neverthless, modern enhancement systems are of such high sonic accuracy and quality that
world-​renowned conductors recommend the use of such systems in halls with poor room
acoustic conditions. A detailed explanation of the most important enhancement systems
can be found in section 2.7.2.
After it has been determined that the spaciousness of a multipurpose hall needs
to be enhanced, it must be ensured that spoken words will be intelligible under all
acoustic conditions of the hall. Without the enhancement system (or with exposed
absorbers in the case of mechanically variable acoustics), i.e., with low reverberation
times, the sound system design is straightforward. The situation is more complex in
concert mode, i.e. with higher reverberation times. Highly directional sound systems
such as line arrays or electronically steered sound columns will ensure that spoken
words are intelligible.

Figure 11.41 KKL Luzern Concert hall, rare view from within an echo chamber. © KKL Luzern.

Since such halls are often rather wide with a relatively low ceiling, uniform coverage calls for a sophisticated loudspeaker arrangement. Owing to the strong level decrease over the depth of such halls and the directivity pattern of the main front arrays, it is often not possible for these arrays alone to provide uniform coverage of all audience areas. The main loudspeaker arrays should operate with a default delay setting so that the original sources on stage are correctly localized throughout the venue. Additional balcony or under-balcony loudspeakers must take this basic delay into account. At a greater distance from the action area, where the angles between the original sources on stage and the front loudspeaker arrays are small and the sources on stage are no longer perceived because of their low acoustic power, the front loudspeaker arrays can be used as reference sources.
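The default delay mentioned above follows from the travel-time difference between the stage source and the supporting loudspeaker, usually plus a small precedence (‘Haas’) offset so that the stage remains the perceived origin. A sketch with hypothetical geometry (the distances and the 15 ms offset are illustrative assumptions):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def fill_delay_ms(dist_stage_to_listener_m, dist_fill_to_listener_m, haas_ms=15.0):
    """Delay for a fill loudspeaker so the stage signal arrives first.

    The fill is delayed by the path-length difference plus a precedence
    offset, keeping localization on stage (hypothetical example values).
    """
    path_diff_ms = (dist_stage_to_listener_m - dist_fill_to_listener_m) \
        / SPEED_OF_SOUND * 1000.0
    return path_diff_ms + haas_ms

# Under-balcony seat: 28 m from the stage, 4 m from the fill loudspeaker
delay = fill_delay_ms(28.0, 4.0)   # roughly 85 ms
```

In practice such delays are entered per loudspeaker group in the system processor and verified during commissioning.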
In the case of a multipurpose hall designed for mainly natural music use and amplified
events only every now and then (case 2) the target reverberation times are in the range
of 1.6 to 1.8 s, with higher values applicable for pure classical concert halls or spaces to perform organ concerts. Variable acoustics may be used to decrease the spaciousness and reverberation for speech events, with the variable absorption resulting in reverberation times

of 1.3–​1.4 s. Inflatable membrane absorbers may be used specifically for low frequencies, simply by turning the air pump for the membranes on and off; see Flex Acoustics (Figure 11.42).

Figure 11.42 Variable low-​and mid-​frequency absorber aQflex by Flex Acoustics.
An enhancement system may be used to increase the reverberation time even more to
values over 2 s. The electronic reduction of reverberation time is not feasible.
The sophisticated sound system is similar to the system used in case 1; see above.

11.5.2.1 Target Measures


A new multipurpose hall should have a low reverberation value of around 1.3 to 1.6 s. This
range is adequate for most performance types (estimated 80% of the user profile) in halls
used for:

• Conferences
• Meetings
• Exhibitions
• Rock and pop concerts
• Jazz concerts
• Theatre performances
• Presentations
• Athletic events

For other performances (estimated 20% of the user profile), such as:

• Symphonic concerts
• Musicals
• Folk music performances
• Brass music concerts

an enhancement system will increase the reverberation time to 1.8–​2 s.
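The target reverberation times listed above can be related to hall volume and absorption with Sabine’s formula, RT ≈ 0.161·V/A. A minimal sketch with hypothetical hall data (volume and absorption areas are invented for illustration only):

```python
def sabine_rt(volume_m3, absorption_m2_sabins):
    """Sabine reverberation time RT = 0.161 * V / A (SI units)."""
    return 0.161 * volume_m3 / absorption_m2_sabins

V = 12000.0          # hypothetical multipurpose hall volume in m^3
A_speech = 1400.0    # equivalent absorption area with variable absorbers exposed
A_music = 1000.0     # absorption area with variable absorbers retracted

rt_speech = sabine_rt(V, A_speech)   # about 1.38 s, within the 1.3-1.6 s range
rt_music = sabine_rt(V, A_music)     # about 1.93 s, close to the 1.8-2 s target
```

The sketch shows how a few hundred square metres of variable absorption can shift a hall of this size between the speech and music settings discussed above.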




After the room acoustic design is completed, a sophisticated sound system must be
introduced. A permanently installed system is recommended, which is optimized for the
room acoustic properties of the hall. Where rental equipment is used the sound coverage is
often not ideal.
For the first 80% of the performance types mentioned above, speech intelligibility is very
relevant, with target values STI ≥ 0.6.
These values are only achieved by highly directional sound sources such as digitally
controlled line arrays or beam steering loudspeakers. When the stage is reconfigured, the
loudspeaker setup might need to be modified as well. Therefore, after any new loudspeaker
installation the fine tuning must confirm the required target intelligibility value.
For music performances the clarity C80 should be between 0 and +​3 dB. Higher values
must be avoided, lower values are preferred just for symphonic concerts.
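The clarity measure C80 used above compares the impulse-response energy arriving before and after 80 ms: C80 = 10·log10(E_early/E_late). A sketch that evaluates it on a synthetic exponentially decaying impulse response (the sampling rate and decay are illustrative assumptions, not measured data):

```python
import math

def clarity_c80(ir, fs):
    """C80 = 10*log10(energy before 80 ms / energy from 80 ms onwards)."""
    split = int(0.080 * fs)
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10 * math.log10(early / late)

# Synthetic impulse response: pure exponential decay corresponding to RT = 1.5 s
fs = 8000
rt = 1.5
ir = [math.exp(-6.91 * n / fs / rt) for n in range(int(fs * rt))]
c80 = clarity_c80(ir, fs)
```

For this idealized 1.5 s decay the result is roughly +0.4 dB, i.e., within the 0 to +3 dB window quoted above; real measured impulse responses additionally contain discrete early reflections that raise or lower C80.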

11.5.2.2 Different System Layout Approaches as a Base for Simulation


Multipurpose halls often have shoebox-type geometries, with a stage on the short wall of the hall and the audience sitting on bleachers or on the ground floor. The stage may also be arranged on a long wall or in the centre of the hall, e.g., for boxing matches.
Depending on the position of the action area different loudspeaker design solutions can be
proposed. In some cases, and depending on the available budget, loudspeaker arrangements
are proposed that can be used for a variety of seating configurations in the hall. Only those loudspeakers that cover the active seating area are then switched on, while the loudspeakers installed for other seating setups are switched off. This
allows an efficient turnaround from one performance type to another.
The computer simulation of the new sound system should be performed taking into
account the basic room acoustic conditions of the hall, i.e., normally with lower reverber-
ation times in the hall. Depending on the installation effort a front system close to the stage
front is needed as well as distributed loudspeaker systems in the hall to radiate effect signals
or to function as loudspeakers of an enhancement system.

11.5.2.3 Measurement Approaches


In a first step, the room acoustic parameters of the hall will be measured. The results will
hopefully confirm that the simulation was based on the correct room acoustical properties.
If large deviations from the model are observed it will not be surprising that, for example,
the measurements for sound coverage or speech intelligibility are unsatisfactory. When the
installation of acoustical surfaces was well engineered and supervised this is unlikely to
happen. Subsequently, the performance of the sound system is verified by measuring the
direct sound coverage, the total sound pressure level and the achieved speech intelligi-
bility. The background noise floor must be determined as well. Further details regarding the
commissioning of a sound system may be found in Chapter 10.

11.5.3 Case Studies

11.5.3.1 Convention Centre –​Congress Centrum Suhl, Germany


The Congress Centrum was originally opened in 1972 as a large sports arena including a
boxing ring for over 3000 visitors. During the 1990s the sports usage was reduced; shows
Figure 11.43 Computer model showing all enhancement loudspeakers and a rendered view of the hall.

Figure 11.44 Opening concert (left) and hall with the new wall loudspeakers.

and presentations have since demanded the extension of the complex with new restaurants
and a covered lobby area. Various types of cultural performances are now possible, including
symphonic concerts by using a mobile concert shell on stage. As during the 1990s the acous-
tics of the hall were optimized for shows and pop music, the sonic quality for symphonic
concerts was unsatisfactory. In 2018/​19 the implementation of an enhancement system
increased the reverberation time and supplied early reflections.
Seventy-​four loudspeakers (JBL Control 28) in the ceiling (originally from an old rever-
beration system) and 35 newly installed wall loudspeakers (Renkus-​Heinz CX61) have been
used to radiate the stage and ceiling microphone signals. These signals are processed by the
‘Amadeus Concert Hall Processor’ and hence allow the following settings:

• System off: Reverberation time RT around 1.3 s


• Theatre setting: RT around 1.4–​1.5 s
• Chamber music setting: RT 1.7–​1.8 s
• Setting for symphonic concerts: RT 2–​2.2 s
• For demonstrations and effects, the reverberation time can be increased to over 3 s

11.5.3.2 Pop/​Rock Venue –​The Anthem Hall, Washington DC, USA


The Anthem is a concert hall scaled to host up to 6000 fans on its expansive dance floor
and two tiers of balcony seating. It provides an epic setting for extraordinary concert
experiences.
To support the owner’s commitment to providing fans with flawless acoustics, extensive
room modelling programs were employed, and a variety of acoustical measurements and
instrumentation tests were begun in 2014 at the project’s earliest design stage. Exhaustive
research and studies were also devoted to ensuring sound isolation to maintain quietude
throughout the complex’s residential sector.
The Anthem features a 2125 m2 open seating (or dancing) main floor. With a 14 m
ceiling height, the concert hall, redeveloped from the ground up, required extensive acous-
tical treatments and a highly flexible electroacoustic system.
To fine tune the hall, the installation of strategically positioned Helmholtz absorbers and
a selection of medium-​density rear wall broadband absorbers calibrated to enhance a wide
range of performance styles was specified. The electroacoustic system employed flown left

Figure 11.45 The Anthem Hall Washington and view of the hall on the right.

Figure 11.46 Computer model and calculated RT of the Anthem Hall.

Figure 11.47 Mappings of SPL and STI.



Figure 11.48 Buddhist temple, inside and outside.

and right arrays (14 d&b J-​series boxes per array), with the option to use centre and front
fill loudspeakers and a directional subwoofer array (d&b J-​Subs and J-​Infra Subs) for even
low-​frequency distribution.

11.6 Sacral Buildings

11.6.1 Buddhism
Siddharta Gautama was the founder of the Buddhist teachings, living in India from about
560 to 480 BC. He gave up his royal life as an adult and moved through the country
as an ascetic and found ‘the highest salvation and unparalleled peace’ in April/​May of 528
BC. Since then, he has been considered a Buddha, an ‘enlightened one’, because he has
recognized the four ‘truths’.
The Buddhist place of worship is the temple, initially referred to as a stupa, later as a
pagoda. The stupa is on one hand a relics place, but also a monument, and is understood as
a place of pilgrimage. The Buddhist temples and monasteries hosting the Buddhist rituals
have a hall with one or more Buddhas in the centre. The meeting and meditation rooms are
mostly large and carpeted.
Some monks hold sermons inside the temples on holidays such as the birth of Buddha or
the first sermon of Buddha. Social ceremonies are celebrated in the temples such as births,
becoming an adult, weddings or memorial services.
All spoken words must be clearly understood and are partly repeated, hence sound
systems are currently standard equipment in temples. Small point sources or line arrays
cover congregations ranging from 20 to more than 3000 worshippers.

11.6.2 Judaism

11.6.2.1 Architectural, Acoustic and Sound Design


Judaism originated about 2000 BC. More than 14 million people worldwide have a Jewish
affiliation. Among the world religions which worship only one God (‘monotheistic’ religions),
Judaism is the oldest. Christianity and Islam were developed largely based on Judaism.
Synagogues do not have a unified floor plan; the architectural geometries and styles vary
widely. As a rule, synagogues are built in the prevailing architectural style of their time and location; for example, a synagogue in Asia sometimes resembles a Chinese temple,

Figure 11.49 Floor plan of a traditional synagogue.

synagogues from medieval Prague or Budapest are built in Gothic style. In the nineteenth
century, after the synagogue had been authorized as a representative building, oriental his-
toricism prevailed for several decades, e.g., the new Berlin Synagogue. Modern architectural
designs are also very common today, such as the synagogues in Munich and Dresden. Most
synagogues have a rectangular floorplan with a second floor for female worshippers, and nor-
mally use sound systems for preaching and praying.
Depending on the size and the shape of the halls simple or more complex sound systems
are specified; the room acoustical design of the synagogues therefore is to be considered and
excessive reverberation must be avoided. A small group of traditional Jewish worshippers
reject any electricity in their houses of worship, so their synagogues are not equipped with sound
systems. In these cases, the room acoustic design is especially important to ensure perfect
intelligibility of the spoken words of the rabbi.

11.6.2.2 Case Studies

11.6.2.2.1 SYNAGOGUE –​OHEL JAKOB SYNAGOGUE, MUNICH, GERMANY

The old main synagogue in Munich was destroyed in 1938 by the Nazis; the new syna-
gogue was built between 2004 and 2006. Room- and electro-acoustic studies were conducted.
All surfaces in the synagogue are acoustically hard (stone and glass), just the worshippers
absorb the sound. Long reverberation times result, and highly directional, digitally con-
trolled line arrays have been installed left and right of the podium to secure high speech
intelligibility for the worshippers.
System Solutions / Case Studies 403

Figure 11.50 New Munich synagogue.

Figure 11.51 Rendered computer model with calculation results in mapping and probability distri-
bution form.

Figure 11.52 View into the Central Synagogue in New York.

Figure 11.53 Ceiling detail of the Central Synagogue including small point-​source loudspeakers in
the corners.


11.6.2.2.2 SYNAGOGUE –​CENTRAL SYNAGOGUE, NEW YORK CITY, USA

Devastated by a catastrophic fire in August 1998, the Central Synagogue in New York
City was reduced to a burned-​out shell. Following this disaster, the treasured syna-
gogue was rebuilt. To augment the natural acoustics, and to provide additional support
for organ performances and other musical events, the synagogue is equipped with a
LARES electronic reverberation enhancement system. Instead of the traditional speaker
cluster, the system includes 48 smaller loudspeakers positioned discreetly around the
temple.
In-​house audio and video recording and playback capabilities are an equally integral and
multi-​purpose element of the Central Synagogue A/​V installation. In addition to serving
to document weddings, bar mitzvahs, lectures, memorial services and religious services, the
system was designed to facilitate remote broadcast to enable people outside the facility to
participate in services and events.

11.6.3 Christianity

11.6.3.1 Architectural Design


The Christian church layout according to the early and medieval ideal follows four basic floor plans. A large part of existing church buildings corresponds to one of these basic schemes, although since the modern age they are no longer consistently implemented.

• The basilica is the most important basic type of an early and medieval church building,
the interior of which is separated into several longitudinal naves by rows of columns.
• The single-​room church is a single-​nave church building, which consists of a single,
hall-​like room, usually with elevated choir.
• The hall church is similar to the basilica, but its longitudinal naves are of the same or
approximately the same height and usually united under a common pitched roof.
• In centralized construction types, the major axes are of the same length, resulting in
circular, oval, square, cross-​shaped or otherwise centralized floor plans. The central
building is widespread in Western Europe, especially in Italy and is frequently applied
to Eastern Orthodox churches.

(a) Catholic and protestant churches

The main architectural elements of a traditional European church are the choir (altar
house), the transept and the nave. The facade often has one or two bell towers. The nave section is usually multi-aisled, i.e., it has a central nave and two or four side aisles. The crossing is located
between the transept and the nave.
The principal musical instrument in these churches is the organ; the worshippers sit in
pews on all accessible floor levels.
Since the reformation, Protestant churches have increasingly become multipurpose
buildings. Protestant meeting houses are thus not only used for worship, but also for various
other gatherings.
In the last 30 years the sound coverage in reverberant spaces has been achieved with
distributed sound systems installed on the pillars (Figure 11.54), currently also by more cen-
trally arranged line arrays.

Figure 11.54 Sound columns in the gothic church Maria Himmelfahrt in Bozen, Italy.

Figure 11.55 View of the iconostasis with installed directed sound columns in the Christ the Saviour
Cathedral in Moscow.


(b) Orthodox churches

The interior of Orthodox church buildings is designed according to the requirements of the
Eastern Church rite, in Europe mostly the Byzantine rite:

• The sanctuary is visually separated from the communal room by a partition wall covered
with icons, the iconostasis. This wall is designed to be sufficiently acoustically open so that, despite the division of space, the liturgy spoken and sung behind the partition can be well understood in the communal room.
• Orthodox churches do not have organs because Orthodox Christianity considers the
human voice as the only acceptable instrument for praising God.
• Orthodox churches usually have no pews; the worshippers remain standing during the
liturgy. A handful of chairs may be available, mainly used for elderly people who are
unable to stand for a prolonged time.

11.6.3.2 Acoustic Properties


To obtain satisfactory speech intelligibility in often very reverberant spaces a sound
reinforcement system is required that is aimed towards the worshippers. Modern churches
may be well served with a centralized radiation by one single array. A sound reinforcement
system consisting of various loudspeakers at different locations would unnecessarily reduce
the coherence and the effective critical distances.
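The benefit of a highly directional source in a reverberant space can be sketched with the standard diffuse-field estimate of the critical distance, r_H ≈ 0.057·√(Q·V/T). The following minimal Python sketch uses assumed example values for volume, reverberation time and directivity factor; they are for illustration only and do not describe a specific church.

```python
import math

def critical_distance(volume_m3, rt60_s, directivity_q=1.0):
    """Diffuse-field estimate r_H = 0.057 * sqrt(Q * V / T) in metres,
    where Q is the directivity factor of the source."""
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)

# Assumed example: reverberant church, V = 12000 m^3, RT60 = 5 s
print(round(critical_distance(12000, 5.0), 1))                    # → 2.8 (omnidirectional source)
print(round(critical_distance(12000, 5.0, directivity_q=20), 1))  # → 12.5 (directive array)
```

A directivity factor of 20 (a directivity index of about 13 dB) extends the critical distance by √20 ≈ 4.5, which is why a single directive array can keep much more of the congregation within the direct field than several distributed, less directive sources.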
Figure 11.56 shows a central array arrangement in a modern church. The individual
horn loudspeaker systems aiming into the room in different directions are arranged behind
a concealing screen.

Figure 11.56 Centrally arranged loudspeaker cluster in a church.



Figure 11.57 Decentralized small line arrays in a church.

In large churches (especially in gothic ones) with sometimes excessively long reverber-
ation times and a large number of columns shadowing the sound and screening the side
aisles, a centralized loudspeaker arrangement is not a successful approach. Decentralized
arrangements with highly directive digitally controlled line arrays installed at the columns
in close range to the audience are then preferred.

11.6.3.3 Case Studies

11.6.3.3.1 CATHEDRAL –​BERLIN CATHEDRAL, GERMANY

The Berlin Cathedral in its current form is based on the construction of the new cath-
edral that began in 1895 under the direction of builder Julius Carl Raschdorff. The church
was consecrated in 1905. It was heavily damaged during the Second World War. It was
decided to rebuild it in 1973 and, in a first phase, completed in 1980 with the completion
of the outer skin and the consecration of the baptistery and wedding chapel to the south for
worship purposes. The reopening of the Sermon Church subsequently took place on 3 June
1993 after extensive interior restoration work.

Figure 11.58 Test setup for new sound system in the cathedral.

The main room of the Oberpfarr-​und Domkirche zu Berlin is the so-​called Sermon
Church. This centrally located room has a quasi-circular base area of around 40 m diameter, spanned by a dome around 60 m high, and is oriented in a west–east direction.
The main and diagonal axes hold two-​storey apsidal areas which, as boxes, provide add-
itional seating next to the main room on the ground floor. A total of 1586 people can be
seated in the church.
In 1953 a special cluster consisting of Klein & Hummel loudspeakers was installed
(still visible in the background in Figure 11.58 during a test setup with line arrays in
2001). This cluster was then replaced in 2002 by Duran Audio line arrays; compare also Figure 11.59.

11.6.3.3.2 CATHEDRAL –​ST. URSEN KATHEDRALE, SOLOTHURN, SWITZERLAND

Solothurn is recognized as Switzerland’s most significant baroque town. Its major hallmark
and tourist attraction is the St. Ursen Cathedral. In January 2011, a fire set by a men-
tally disturbed person massively damaged the Cathedral’s 60 m × 30 m centre congrega-
tion area and side aisles. A careful assessment determined that a full cleaning and repair
of all surfaces could restore the damaged room to its former glory. The project included
all aspects of the building: surfaces, art, lighting, heating, electrical and electro-​acoustic
infrastructure.

Figure 11.59 New main sound system in the cathedral.

Early in the project, extensive acoustical measurements were conducted to both obtain a ‘status quo’ documentation and to serve as the basis for the predictive simulation
software employed. Although RT60 reverberation times exceed 6 s at 500 Hz and
a reduction would have been helpful to achieve improved speech intelligibility, changing
the material structure of the building was not an option. Moreover, new measurements
completed following the restoration revealed that the RT60 reverberation times were even
slightly higher after the accumulated dirt and grey burn residue were removed. To resolve
these issues, a number of CVS Clearvoice Systems Evolutone 3000, Evolutone 2000
and Evolutone 1000 steerable array loudspeakers were specified based on their inherent
long-​range throw, highly sophisticated steering algorithms and high speech intelligibility
characteristics.

11.6.4 Islam
The geometry of a mosque is different from any of the spaces discussed so far. Larger mosques
mostly have a substantial central space, topped by one or more domes. Smaller mosques are
not domed and have a rectangular floorplan.
A mosque has the following basic interior elements: a carpet covering the floor; a niche, the mihrab, in the Qibla wall indicating the direction to Mecca; and a minbar, from which the imam holds the Friday sermon. Women
have a separate space so that the strict gender segregation can also be observed during
prayer.

Figure 11.60 St. Ursen Cathedral with two visible line arrays for sound coverage.

Figure 11.61 St. Ursen Cathedral exterior.




11.6.4.1 ACOUSTICS IN MOSQUES, INTRODUCTION

Mosques are multi-​functional public spaces where various worship activities are performed
through various modes of use. Three distinct activities take place in the mosque: the first is praying, individually or in a group led by an imam; the second is preaching, delivered separately or in conjunction with Friday prayers; the third is listening to or reciting verses
from the Quran. While conducting these activities in the mosque, two general modes of use
may be identified:

1. Prayer mode: All worshippers are either standing, bowing or prostrating, always on the same floor level, aligned in rows parallel to the Qibla wall (the front wall), with the imam or talker facing away from the listeners.
2. Preaching mode: The listeners sit on the floor in rows parallel to the Qibla wall while the imam stands on a high platform (minbar) facing the listeners.

The optimum acoustical environments in the mosque may be expressed in terms of some
basic aural requirements such as:

• Speech intelligibility, where all speech should be comprehensible irrespective of the position of the listener.
• Full perception of the particular emphasis laid on some consonants and vowels when reciting Quranic verses.
• Naturalness of the talker’s voice, arising from the ability of the listener to localize the de facto source, thus maintaining a feeling of realism and purity.

Most of the existing mosques have sound-​reflecting finish materials on all surfaces, except
the floor area, which is usually carpeted. They have wooden doors and large single-​glazed
windows.
Other factors emphasize the importance of the acoustical environment in mosques: Arabic is used exclusively during prayers, even though many Muslims worldwide are not native Arabic speakers, which makes intelligibility particularly demanding. A low background noise floor is therefore one of the most important qualities for Muslims during prayers. Sound absorption in mosques
is very limited; it is mainly provided through the carpeted floor as well as the worshippers.
The different modes of use as well as the variation in the number of worshippers attending
daily prayer and Friday prayer greatly affect the total sound absorption in the mosque
and therefore make control over the actual reverberation time quite complex. All these
factors illustrate the importance and requirements for special acoustical environments in
mosques.

11.6.4.2 ROOM ACOUSTICAL CONSIDERATIONS IN MOSQUES

Modifying the primary structure of the mosque (height, length and width of the mosque)
is not feasible in most cases, so the secondary structure has to be considered: the material
and geometry of the walls, ceiling and floor of the mosque. As mentioned above most of
the walls are covered with ornamented marble and gypsum and are therefore only slightly


absorptive. The floor is thus the main surface of acoustical interest: the thickest available carpet should be chosen to obtain the highest possible absorption coefficient.
All walls should be reviewed regarding the integration of absorption. Low-frequency absorbers are more important than high-frequency ones, as the carpet and the worshippers already provide some high-frequency absorption.
Suitable low-frequency absorbers include panel (membrane) absorbers of thin, flexurally vibrating boards (gypsum board or wood), as well as special constructions such as acoustically transparent wall sections with low-frequency absorber layers behind them, mounted at a distance from the solid wall. The reverberation time in the mid-frequency range should be kept below 2.5 s.
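The effect of such absorption measures on reverberation can be estimated with the classical Sabine relation RT60 = 0.161·V/A. The surface areas and absorption coefficients in the following minimal sketch are assumed round numbers for illustration only, not data from a specific mosque.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine estimate RT60 = 0.161 * V / A, where A is the total
    equivalent absorption area sum(S_i * alpha_i) in m^2."""
    a_total = sum(area_m2 * alpha for area_m2, alpha in surfaces)
    return 0.161 * volume_m3 / a_total

# Assumed mid-band (500 Hz) values for a small mosque, V = 2000 m^3:
surfaces = [
    (400, 0.30),  # thick carpet on the floor
    (900, 0.03),  # marble/gypsum walls and ceiling
]
print(round(sabine_rt60(2000, surfaces), 2))  # → 2.19, i.e. below the 2.5 s target
```

The sketch makes the point of the paragraph above quantitative: with hard walls contributing almost nothing, the carpet alone carries most of the mid/high-frequency absorption budget.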

11.6.4.3 SOUND SYSTEM DESIGN CRITERIA

Sufficient sound pressure levels can be achieved by sound systems covering the prayer area

(a) from the front,
(b) from the front with delayed loudspeakers, or
(c) from the ceiling.

If feasible, ceiling loudspeaker solutions are often useful, as shadowing caused by columns is avoided.

The most important task of the sound system is to achieve high speech intelligibility. As
always, good intelligibility goes with highly directional sound and a high level of early
reflections. Because of columns the coverage from the front will quite often not be adequate,
and high ceilings and large domes additionally reduce the direct sound, hence a system with
coverage from the front while additionally using delayed loudspeakers is recommended. This
can be achieved by modern digitally controlled line arrays. Speech transmission indices STI
> 0.5 must be reached.
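The electronic delay for such supplementary loudspeakers follows directly from the propagation time of sound, usually plus a small offset so that the precedence (Haas) effect keeps localization at the main system. A minimal sketch follows; the 15 ms offset and the 30 m distance are assumed example values.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def delay_ms(distance_m, haas_offset_ms=15.0):
    """Delay so that the supplementary loudspeaker's wavefront arrives
    just after the main system's, preserving localization at the front."""
    return distance_m / SPEED_OF_SOUND * 1000.0 + haas_offset_ms

# Delayed column 30 m further from the main array than the area it serves
print(round(delay_ms(30.0), 1))  # → 102.5
```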
The last design criterion, localization of the imam, can only be achieved with sound systems that place the perceived source in the direction of the Qibla wall. Ceiling loudspeaker systems mostly fail in this regard.

11.6.4.4 CASE STUDIES

11.6.4.4.1 Mosque – Sheikh Zayed Bin Sultan Al Nahyan Mosque, Abu Dhabi, UAE

The sound system design for the Sheikh Zayed Bin Sultan Al Nahyan Mosque is based on optimized room-acoustic measures; see a view into the main hall in Figure 11.62.
Regarding the room acoustic design, the carpeted floor acts as a mid-​high-​frequency
absorber and the originally planned letter wall as a low-​frequency absorber. The letter
wall was to be a huge marble wall with large perforated Arabic letters embedded. These
hollow letters were to be approx. 5 cm wide and would have helped to achieve low-​
frequency absorption due to the Helmholtz effect. It was then decided to cover the back
side of the wall with a golden fleece with a high specific flow resistance. Furthermore, an
airgap of approx. 80 cm to the concrete wall behind it was planned, thereby implementing
a large 135 × 22 m low-​frequency absorber, with an estimated absorption maximum at 250
Hz. In the course of the construction, however, this letter wall was not implemented and
was replaced by a closed, letter-​bearing marble wall. The absence of the low-​frequency
absorber has led to higher reverberation times in the low frequencies, but nevertheless for low

Figure 11.62 View into the Sheikh Zayed Mosque in Abu Dhabi, UAE.

frequencies the RT will not exceed 6 s and at 500 to 2000 Hz (speech domain) it will not
exceed 4 s.
The sound design is based on digitally controlled line arrays; compare Figure 11.63.
Sound columns of different sizes and power ratings are used, installed behind the existing sound-transparent fleece or integrated into the structural columns. Each array receives an individual input signal (via a distribution matrix) so that line arrays can be switched on or off as a function of the mosque’s occupancy.

11.6.4.4.2 Mosque – Daily Used Al Eman Mosque, Jeddah, Saudi Arabia

Most of the daily used mosques do not have domes or other sophisticated architectural features. Quite often
these are rectangular rooms, rarely with columns. One wall is the Qibla wall with a simple
praying niche. Figure 11.64 shows the Al Eman mosque in Jeddah in Saudi Arabia. The
mosque is small (just 555 m3); the Qibla wall with the Mihrab niche is visible on the left.
The lines on the carpet, very common in such mosques, organize the prayer rows for a maximum of 300 worshippers.
The reverberation time in the unoccupied mosque (including the carpet) is relatively
high at 2.5 s in the mid-frequency range and increases to 3.5 s at low frequencies. The simple arrangement of small point-source Tannoy CPA5 loudspeakers along the walls supports the voice of the imam insufficiently; speech intelligibility could be improved with modern line arrays.
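How much absorption would have to be added to tame such a reverberation time can again be estimated from Sabine's relation; in this sketch the 1.5 s target is an assumed design goal for illustration, not a figure from the project.

```python
def added_absorption_m2(volume_m3, rt_current_s, rt_target_s):
    """Extra equivalent absorption area (m^2) needed to bring a Sabine
    reverberation time down from rt_current_s to rt_target_s."""
    sabine_const = 0.161  # s/m
    return sabine_const * volume_m3 * (1.0 / rt_target_s - 1.0 / rt_current_s)

# Al Eman mosque: V = 555 m^3, measured mid-band RT60 = 2.5 s
print(round(added_absorption_m2(555, 2.5, 1.5), 1))  # → 23.8
```

Roughly 24 m² of additional equivalent absorption area would suffice in a room this small, which is why modest wall treatment can make a noticeable difference in daily-use mosques.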

Figure 11.63 Line arrays close to the Mihrab prayer niche.

Figure 11.64 View into the Al Eman Mosque in Jeddah KSA.





Index

Note: Page numbers in italic denote figures and in bold denote tables.

Acoustic Control System (ACS) 58–​59, 58 160, 256, 263, 268–​269, 268, 271, 274, 278;
acoustic enhancement see electroacoustic reliability 277–​280, 279; results and output
enhancement systems 160–​161, 160, 271–​272, 273, 274, 275, 276;
acoustic feedback 50–​56, 51; calculation of reverberation 157–​158, 263; room impulse
55–​56; in closed rooms 52–​55, 53, 54, 56; response (RIR) 259–​261, 261; of room modes
measurements 350; in open air 51–​52, 52; 263–​264, 264; room transfer function 261,
positive 3, 45, 50, 52, 55, 56, 173–​174, 262; simulation methods 259–​264, 261; sound
176, 190, 191; suppression 153–​154; pressure levels (SPL) 162–​164, 166, 166, 167,
troubleshooting 324 168; speech intelligibility 166, 169; surface
acoustic gain calculation 192–​193, 193, 193, material data 258–​259, 258, 260, 277; tail
212–​214 estimation methods 269; time-​arrivals,
acoustic gain index 192, 214 delay and alignment 162, 163, 164, 165;
acoustic localization 39–​40, 40, 41; multi-​ validation of results 280; wave-​based models
channel systems 197–​198, 198, 202–​203, 259, 261, 270
202, 203, 204, 205, 206; naturalness of sound acoustic optimization 325–​329, 326, 327,
reproduction and 199–​203, 200, 202, 203, 328
204, 205, 206; simple sound systems 190, acoustic overall impression 25
194–​195, 195, 196; single-​channel delay acoustical measurements 323, 330–​350; acoustic
system 201 feedback 350; alignment 349–​350; with
acoustic modelling 156–​168, 158, 159, 251–​281, arbitrary excitation signals 331, 339–​342, 341;
252, 253; aiming loudspeakers 161, 161, 162; averaging 348; with conventional excitation
auralization 166–​168, 171, 254, 254, 263, 272, signals 333–​334, 334; determining timing
274, 280–​281, 280; boundary element method of sources 348–​349, 349; electroacoustic
(BEM) 259, 262, 263; cone-​tracing methods properties 327–​329, 328; filtering 348;
159–​160, 263, 269, 269; of direct field frequency domain 345; with frequency sweeps
261–​262; of early reflections 262–​263; finite 323, 332–​333, 334–​335, 336; fundamentals
difference time domain method (FDTD) 259, 331–​333; with maximum length sequences
264; finite element method (FEM) 259, 263, (MLS) 331, 336–​339, 337, 338; measurement
264; geometrical data 255–​256, 255, 256, location selection 343, 343; with other noise
277; image source method 262–​263, 266–​268, signals 336; performing 342–​350, 343, 344,
267; input data 254–​259, 255, 256, 257, 346, 347, 349, 351; polarity testing 350,
258, 260, 277–​278; interpretation of results 351; room acoustic properties 325–​327, 326,
279–​280; limitations 271; loudspeaker data 327, 343–​344, 344; time-​delay spectrometry
256–​258, 257, 277–​278; model calibration (TDS) 331, 339, 340; time domain 345;
278, 279; modelling engine 278–​279; Monte waterfall diagrams 345, 346, 347
Carlo approach 263, 268–​270, 268, 269, 270; acoustics 20–​32; critical distance 27, 28; energy-​
numerical optimization 264–​266, 265, 266; time curve 30–​31, 30, 31; fundamentals
parameter presentation 166, 170; performance 23, 24; general issues 22–​23, 22; historical
considerations 270–​271; pyramid-​tracing overview 20–​21; reverberation time 26–​30,
methods 159, 263, 269, 269; radiosity method 28, 29; speech intelligibility and music clarity
270, 270; ray-​tracing methods 158–​159, criteria 31–​32; subjective assessment of sound
quality 25–​26; see also acoustic feedback; proprietary technologies 311–​313, 311, 313;
electroacoustic enhancement systems Quality of Service (QoS) 295–​296, 296, 296,
ACS see Acoustic Control System (ACS) 300, 305, 308, 316–​318; redundancy
Active Field Control (AFC) 59, 59, 60 313–​315, 314, 315; setup of senders and
active filters 156 receivers 303–​304, 304; spanning tree
AES standards: AES2-​2012 93; AES67 283, protocol (STP) 313–​314, 314; standards
300, 302, 307–​308, 311, 311, 312–​313, 313, 307–​311, 309; stream discovery 303, 320;
319; AES70 310 stream formats 304–​305, 306, 306, 308,
AFC see Active Field Control (AFC) 319; subnet masks 289–​290, 315; switches
AGC see automatic gain control (AGC) 287–​288, 288, 293–​294, 294, 300–​302, 301,
airport buildings 5, 189, 354, 355, 356 302, 311, 313, 314, 319–​320; synchronization
Al Eman Mosque, Jeddah, Saudi Arabia 414, 285–​286, 287, 296–​302, 296, 297, 298, 299,
415 299, 300, 301, 302, 308, 312, 313, 318–​319;
alignment measurements 349–​350 transparent clock switches 301, 302, 311;
ALS see assistive listening systems (ALS) unicast and multicast 291–​295, 293, 294, 308,
Amadeus Active Acoustics 62, 65 315; virtual sound cards 319
Ambisonics 254, 254, 272, 281 audio recording studios 12–​14, 13
ANEMAN 304 audio video bridging (AVB) 283, 310–​311,
ANSI standard S3.5 234 311
Anthem Hall, Washington DC, USA 399–​401, auralization 166–​168, 171, 254, 254, 263, 272,
400 274, 280–​281, 280
arenas, sports 7, 358, 359–​361, 364, 365, 366, automatic gain control (AGC) 5, 188–​189, 222,
367 223, 241
array loudspeakers: line arrays 45, 46, 48, 71–​72, automatic volume control 188
73, 98–​106, 100, 101, 102, 104, 105; two-​ AVB see audio video bridging (AVB)
dimensional arrays 106, 106 averaging acoustical measurements 348
array microphones 120–​121, 121, 123
Articulation Index (AI) 234 background noise measurements 326–​327
Articulation Loss of Consonants 32, 244–​245, balloon plots 86, 89
245 basilicas 405
artificial human speakers 326, 327 BEM see boundary element method (BEM)
ASA see Astro Spatial Audio (ASA) Beranek, Leo Leroy 21
assembly halls 11, 208 Berlin Cathedral, Germany 408–​409, 409, 410
assistive listening systems (ALS) 208–​211, 209, Berlin Main Station, Germany 354, 356, 357
210, 211, 212 Best Master Clock Algorithm (BMCA)
ASTM C423 standard 258, 277 298–​299, 299, 299
Astro Spatial Audio (ASA) 62, 64 binaural auralization 168, 171, 254, 272, 274,
audio networking 283–​320; advantages and 280–​281, 280
disadvantages 284–​285; Best Master Clock binaural localization 39–​40, 40, 41
Algorithm (BMCA) 298–​299, 299, 299; BMCA see Best Master Clock Algorithm
boundary clock switches 300–​301, 301, 302, (BMCA)
311, 320; common mistakes 315–​320, 316, boardrooms 17–​18
317, 318; connection management 287, Bonjour 289, 302–​303, 320
302–​305, 308; connectivity 287–​296, 287, boundary clock switches 300–​301, 301, 302,
288, 291, 292, 293, 294, 296, 296, 308; 311, 320
device discovery 289, 302–​303, 320; IGMP boundary element method (BEM) 93, 259, 262,
snooping 293–​295, 294, 316, 320; 263
IP addresses 288–​290, 315; latency 286, broadcasting facilities 12–​14, 13
305–​307, 307, 316, 318; link aggregation 314, Buddhist temples 401, 401
314; link offset 286, 286, 306–​307, 307, 316,
318; network topologies 290–​291, 291, 292; calibration and optimization 325–​329, 326, 327,
packet delay measurement 297–​298, 298; 328
packet headers 305; packet jitter 297, capacitive transducers 108–​111, 109
300–​301, 307, 311, 320; phase accuracy case studies: churches 408–​410, 409, 410, 411;
285–​286, 286; Precision Time Protocol (PTP) hotels 372, 376, 377, 377; mosques 413–​414,
285–​286, 287, 296, 297–​302, 297, 298, 299, 414, 415; multipurpose halls 397–​401, 398,
299, 300, 301, 302, 308, 312, 313, 318–​319; 399, 400; museums 372, 373, 374, 375; paging
and voice alarm systems 354–​358, 355, 356, Dante 283, 295–​296, 303–​304, 311, 312–​313,
357, 359, 360; sports venues 364, 365, 366, 313, 316
367, 368, 369, 370; synagogues 402–​405, 403, DRR see direct to reverberant ratio (DRR)
404; theatres, opera houses and concert halls deconvolution method 136–​137, 332–​333, 334,
355–​358, 359, 360, 384–​391, 385, 386, 387, 341, 341
388, 389, 390, 391, 392; transportation hubs delay issues 149–​151, 151
354, 355, 356, 357 delay systems 186, 187, 198; acoustic
cathedrals see churches localization without 194–​195, 196; multi-​
Catholic churches 405 channel 202–​203, 202, 203, 204, 205, 206;
CD horns see constant directivity (CD) horns single-​channel 201
CDPS (complex directivity point source) model Delta Stereophony System (DSS) 202, 202
261–​262, 278 device discovery 289, 302–​303, 320
CDS see Cinema Digital Sound (CDS) format Differentiated Services Code Point (DSCP)
Central Synagogue, New York City 404, 405 value 295–​296, 296, 296, 305, 308
Chladni, Ernst F.F. 21 DiffServ (Differentiated Services) 295
Christianity see churches diffuse field sensitivity, defined 126
churches 11–​12, 102, 102, 103, 134, 189–​190, diffuse reflections 23, 24
190, 405–​410, 406, 407, 408, 409, 410, 411 diffusion, defined 26
CIDR notation 290 Digital Theater System (DTS) 15
Cinema Digital Sound (CDS) format 15 digitally controlled (DSP) loudspeaker arrays
cinemas 14–​17, 14, 15, 16, 17 71–​72, 73, 74, 102, 103, 104, 145
circular pistons 94, 94 Dirac delta function 78
clarity 25, 31–​32, 232–​233, 232, direct to reverberant ratio (DRR) 219, 219, 220,
234–​235, 383 231–​233, 231, 232, 234, 244
classrooms 18 directed reflections 23, 24
clipping protection 35 directional factor: loudspeakers 91; microphones
clubs 8 128
CobraNet 283, 290 directional gain: loudspeakers 86; microphones
coherence, speech intelligibility and 245 128
comb filter effects 8, 52, 53 directivity deviation factor, loudspeakers 91
commissioning: calibration and optimization directivity factor: acoustic gain calculation
325–​329, 326, 327, 328; documentation 192–​193, 193, 193, 213; loudspeakers 90–​91,
329–​330; functional testing and installation 92; microphones 128
verification 322–​323; subjective evaluation directivity index: loudspeakers 88–​91, 90;
329; troubleshooting 323–​324; see also microphones 116, 128
acoustical measurements directivity of loudspeakers 71, 74, 83–​92,
computer modelling see acoustic modelling 93–​106; circular pistons 94, 94; digitally
concert halls see theatres, opera houses and controlled (DSP) arrays 71–​72, 74, 102,
concert halls 103, 104; directional factor 91; directivity
condenser microphones 108–​111, 109, 113, 115, deviation factor 91; directivity factor 90–​91,
119, 124–​125, 124, 126, 129 92; directivity index 88–​91, 90; display of
cone-​tracing methods 159–​160, 263, 269, 86, 87, 88, 89; efficiency and 91–​92, 92;
269 horn-​loaded systems 94–​97, 95, 96, 97; line
Congress Centrum Suhl, Germany 397–​399, arrays 71–​72, 98–​106, 100, 101, 102, 104,
398, 399 105; measurements 83–​86, 84, 85; speech
consonants, percentage loss of 244–​245, 245 intelligibility and 219; two-​dimensional arrays
constant directivity (CD) horns 95–​97, 95, 96, 106, 106; two-​way systems 94, 97–​98, 98, 99
97 directivity of microphones 113, 114, 116–​122,
Constellation system 60–​62, 61 117, 118, 120, 121, 123, 127–​128
convention centres 365, 369–​372, 397–​399, discotheques 8
398, 399 distortion: loudspeakers 80–​83, 82, 83;
corporate environments 17–​18 microphones 115, 127; speech intelligibility
coverage angle, microphones 128 and 220
coverage issues 140–​149, 141, 142, 143, 144, documentation 329–​330
145, 146, 147, 148, 149, 150 Dolby standard 14–​15, 14, 15
coverage, uniformity of 220 dot-​decimal notation 290
critical distance 27, 28 double hearing 183, 186
DSCP (Differentiated Services Code Point) value 295–296, 296, 296, 305, 308
DSP (digital signal processors) see digitally controlled (DSP) loudspeaker arrays
DSS see Delta Stereophony System (DSS)
DTS see Digital Theater System (DTS)
dynamic microphones 111–113, 112, 115, 119, 124–126, 129
dynamic transducers: loudspeakers 68–70, 69, 70; microphones 108, 111–113, 112

echoes: behaviour 38–39; defined 25; flutter 26, 177, 181; perceptibility of 39; suppression/elimination 8, 151, 152, 153, 198, 198, 223, 241
educational facilities 18
effect signal sound systems 381, 382
Elbphilharmonic Hall, Hamburg, Germany 355–358, 359, 360
electrical functional testing 322–323
electrical optimization 325
Electro Voice horn loudspeakers 96, 97
electroacoustic enhancement systems 11, 23, 57–62; Acoustic Control System (ACS) 58–59, 58; Active Field Control (AFC) 59, 59, 60; Amadeus Active Acoustics 62, 65; Astro Spatial Audio (ASA) 62, 64; Constellation system 60–62, 61; multipurpose halls 394–396, 396, 399; theatres, opera houses and concert halls 11, 380, 381, 384; Vivace system 62, 63
electroacoustic measurements 233–235, 327–329, 328
electronic interference 223, 324
electronic microphone rotators (EMRs) 59, 59
Ember+ 311–312
energy-time curves (ETC) 30–31, 30, 31, 231, 231, 233, 233
equalization 155–156, 155, 156, 220
equivalent sound absorption area 26, 174, 187
exhibition halls 6, 181–182, 365, 369–372

false ceilings 179
fast Fourier transform (FFT) 78, 136–137
FDTD see finite difference time domain method (FDTD)
feedback see acoustic feedback
FEM see finite element method (FEM)
figure of eight directivity, microphones 113, 116, 117, 119, 120
filters: acoustical measurement and 348; active and passive 156; equalization by filtering 155–156, 155, 156; FIR 59, 59, 265, 265; narrow band 154; notch 154
finite difference time domain method (FDTD) 259, 264
finite element method (FEM) 259, 263, 264
FIR filters 59, 59, 265, 265
fire detection regulations 3, 4
fluctuating FIR (fluc-FIR) 59, 59
flutter echoes 26, 177, 181
FM transmission 211, 212
Fourier analysis 331, 341
Fourier transform 78, 136–137, 138, 259, 345
free field sensitivity, defined 126
frequency response 134–139, 136, 137, 138, 139, 155–156, 155, 173; loudspeakers 74, 75, 76, 77, 78; measurements 327–329, 328; microphones 113, 115, 118, 156; simple sound systems 190; speech intelligibility and 217, 246–248, 246, 247, 248
frequency shifters 154–155
frequency sweeps 323, 332–333, 334–335, 336
frequency weighting curves 36–37, 36, 330
functional testing 322–323
fundamental frequency 52, 81

geometrical reflections 23, 24
Getec-Arena, Magdeburg, Germany 364, 365, 366, 367
Green Point Stadium, Cape Town, South Africa 364, 368, 369, 370
group delay distortion: loudspeakers 76–77, 78; microphones 113

Hagia Sophia, Istanbul, Turkey 12
Hamad International Airport, Doha, Qatar 354, 355, 356
handheld sound level meters 330, 332, 354, 371, 372
hearing impairment 222, 245; assistive listening systems (ALS) 208–211, 209, 210, 211, 212
hearing threshold and range 35–38, 36, 37, 38, 131–132, 132
Helmholtz, Hermann von 21
hitless merge 315
home cinemas 16–17, 17
horn-loaded loudspeaker systems 94–97, 95, 96, 97
hotels 5–6, 365, 369–372, 376, 377, 377
howling see acoustic feedback
humming 323–324

iconostasis 406, 407
IEC standards: IEC 60268-1 114, 115; IEC 60268-4 114; IEC 60268-5 93; IEC 60268-16 45, 228, 229, 236, 239, 240, 243, 272; IEC 61672 36–37, 36; IEC 61672-1 132; IEC 61938 115, 125
IGMP snooping 293–295, 294, 316, 320
image source method 262–263, 266–268, 267
immersive sound systems 16, 206, 207
impedance, loudspeakers 69, 75–76, 76, 77
impulse response 134–139, 136, 137, 138, 139; loudspeakers 78–79, 79; room impulse response (RIR) modelling 259–261, 261
impulse tests 334
induction loops 209–210, 209
information system layouts 176–190, 353–354, 371; ceiling loudspeaker grids 176–181, 177, 178, 179; complexes of individual rooms 182; factory and exhibition halls 181–182; flat rooms 176–181, 177, 178, 179, 180, 181; horizontally radiating loudspeakers 181, 181; outdoor areas 182–187, 184, 185, 186; reverberant halls 189–190, 190; suspended loudspeakers 180–181, 180; transportation hubs 182–183, 187–190, 188, 189
information systems 3, 4, 5, 6, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; standards 3, 4; see also information system layouts
infrared transmission 210–211, 210, 211
installation verification 322–323
intelligibility see speech intelligibility
interference, electronic 223, 324
Internet Group Management Protocol see IGMP snooping
inverse filtering 332
IP addresses 288–290, 315
IP networks see audio networking
IPMX standard 309–310
Islam see mosques
ISO standards: ISO 354 258, 277; ISO 3382 272; ISO 3741 88–89, 90; ISO 7240 244; ISO 17497 259; ISO 60268-5 75; for voice alarm systems 3, 4
isobar plots 86, 88, 98, 99

JBL 2360 horn loudspeaker 95, 96, 96
Judaism see synagogues

Kircher, Athanasius 20

Lavalier microphones 123, 124, 126
layouts see information system layouts; system layouts
Lee effect 65
line array loudspeakers 45, 46, 48, 71–72, 73, 98–106, 100, 101, 102, 104, 105
linear dynamic behaviour, loudspeakers 74–80, 75, 76, 77, 78, 79, 80, 81
linear time invariant (LTI) systems 78, 136
link aggregation 314, 314
link offset 286, 286, 306–307, 307, 316, 318
localization, acoustic 39–40, 40, 41
localization errors 349
locally directed reflections 23, 24
log sweep 335, 336
loop amplification 45, 51, 52, 54, 55, 56
loudness perception 35–38, 36, 37, 38
loudspeakers 68–106; aiming 161, 161, 162; alignment measurements 349–350; artificial human speakers 326, 327; ceiling grids 176–181, 177, 178, 179; as complex systems 70–72, 72, 73; coverage issues 140–149, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150; digitally controlled (DSP) arrays 71–72, 73, 74, 102, 103, 104, 145; directional factor 91; directivity deviation factor 91; directivity display 86, 87, 88, 89; directivity factor 90–91, 92; directivity index 88–91, 90; directivity measurements 83–86, 84, 85; directivity of circular pistons 94, 94; distortion 80–83, 82, 83; dynamic transducer principle 68–70, 69, 70; efficiency 91–92, 92, 171–172; frequency response 74–77, 75, 76, 77, 78, 83–84; group delay distortion 76–77, 78; horizontally radiating 181, 181; horn-loaded systems 94–97, 95, 96, 97; impedance 69, 75–76, 76, 77; impulse response 78–79, 79; line arrays 45, 46, 48, 71–72, 73, 98–106, 100, 101, 102, 104, 105; linear dynamic behaviour 74–80, 75, 76, 77, 78, 79, 80, 81; parameter overview 72–74; point source model 71, 72; power handling 92–93; sensitivity 75, 75, 83, 90, 92, 92; sound level calculation 174–175; sound power levels 71, 88–90, 90, 174–175; step response 79, 80; suspended 180–181, 180; time domain behaviour 78–80, 79, 80, 81; two-dimensional arrays 106, 106; two-way systems 94, 97–98, 98, 99; waterfall diagrams 79–80, 81; see also speech intelligibility; system layouts
LTI see linear time invariant (LTI) systems
Lusail Hotel, Doha, Qatar 372, 376, 377, 377

magnetic transducers 108
masking effect 37–38, 37, 38; speech intelligibility and 237–238
maximum length sequences (MLS) 331, 336–339, 337, 338
maximum transmission unit (MTU) 295, 305
mDNS 289, 302–303
mechanical installation verification 323
mechanical optimization 325
media production facilities 12–14, 13
meeting rooms, corporate 17–18
microphones 108–130; array microphones 120–121, 121, 123; condenser microphones 108–111, 109, 113, 115, 119, 124–125, 124, 126, 129; directivity 113, 114, 116–122, 117, 118, 120, 121, 123, 127–128; distance to sound source 129; distortion 115, 127; dynamic microphones 111–113, 112, 115, 119, 124–126, 129; environmental conditions 115, 129–130; equivalent input noise level
115, 127; handheld 128–129; interconnection of 114, 124; Lavalier microphones 123, 124, 126; maximum sound pressure level 115, 127; musical instrument-mounted 123–124, 124; parameter overview 114–116; power supply 115, 124–125, 125; pressure gradient receivers 119, 120; pressure receivers 116–119, 118; ribbon microphones 112, 113, 119, 126; selection guidelines 128–130, 383; sensitivity 110, 113, 115, 116, 125–126; shotgun microphones 122, 123; signal-to-noise (S/N) ratio 126–127, 222; speaker-worn 122–123, 124; stand-mounted 129; transducer principles 108–113, 109, 112, 114; for transportation hubs 188–189
Milan 283–284, 290, 310–311, 311
mixing consoles, architectural implications 49–50
MLS see maximum length sequences (MLS)
MLSSA (Maximum Length Sequence System Analyzer) 331, 339
modelling see acoustic modelling
Modulation Transfer Index (MTI) 236, 240, 241
monaural localization 39
Monte Carlo approach 263, 268–270, 268, 269, 270
mosques 11–12, 410–414, 414, 415
MTI see Modulation Transfer Index (MTI)
MTU see maximum transmission unit (MTU)
multicast and unicast 291–295, 293, 294, 308, 315
multipurpose halls 11, 380, 393–401; acoustics and system design 393–396, 394, 395, 396; case studies 397–401, 398, 399, 400; electroacoustic enhancement systems 394–396, 396, 399; integration of sound system 49; integration of sound systems 47, 48; measurement approaches 397; mixing consoles 49; reverberation times 29, 393, 395–396, 399; sport events 359–361; system layouts 206–208, 397; target measures 396–397
museums 6, 365, 369–372, 373, 374, 375
music clarity 31–32
music venues 8–9; see also multipurpose halls; theatres, opera houses and concert halls
musical instrument-mounted microphones 123–124, 124
Musiktheater Linz, Austria 384–389, 385, 386, 387, 388, 389, 390, 391

narrow band filters 154
National Museum, Beijing, China 372, 373, 374, 375
naturalness of sound reproduction 173, 199–206, 200, 202, 203, 204, 205, 206, 207
networking see audio networking
NMOS (Networked Media Open Specifications) 309, 310
noise cancellation techniques 222, 241
noise criteria 133, 133, 134
noise-dependent volume regulation 188
noise measurements 326–327
noise rating 133, 134, 135, 135
noise-to-signal ratio see signal-to-noise (S/N) ratio
notch filters 154
Nyquist frequency 332

octave band spectral analysis 230, 230, 231
Ohel Jakob Synagogue, Munich, Germany 402, 403
open-air sports venues 7–8; information system layouts 182–187, 184, 185, 186
Open Control Alliance (OCA) 310
opera houses see theatres, opera houses and concert halls
optimization 325–329, 326, 327, 328
Organ and Concert Hall Kharkov, Ukraine 391, 391, 392
Orthodox churches 406, 407
OSI model 290

packet delay measurement 297–298, 298
packet headers 305
packet jitter 297, 300–301, 307, 311, 320
paging systems 3, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; see also information system layouts
pain threshold 35, 131
passive filters 156
percentage loss of consonants 244–245, 245
performing arts centres: clubs/discotheques 8; music venues 8–9; see also multipurpose halls; theatres, opera houses and concert halls
phantom power 115, 124–125, 125
phantom sources 40, 198, 199, 200
phase accuracy 285–286, 286
'phon' scale 35, 330
piezoelectric transducers 108
ping command 315, 316
pink noise signals 220, 242, 323, 332, 333, 334, 335, 342, 342
pink sweep 335
polar plots 86, 87, 94, 94
polarity testing 350, 351
positive acoustic feedback 3, 45, 50, 52, 55, 56, 173–174, 176, 190, 191
power handling, loudspeakers 92–93
power supply, microphones 115, 124–125, 125
precedence effect 39–40, 40, 194–195, 197, 199, 202
Precision Time Protocol (PTP) 285–286, 287, 296, 297–302, 297, 298, 299, 299, 300, 301, 302, 308, 312, 313, 318–319
pressure field sensitivity, defined 126
pressure gradient microphones 119, 120
pressure microphones 116–119, 118
Protestant churches 405
pseudo-random noise 78, 137, 331, 332, 334
PTP see Precision Time Protocol (PTP)
public buildings 3–6; convention centres 365, 369–372, 397–399, 398, 399; exhibition halls 6, 181–182, 365, 369–372; hotels 5–6, 365, 369–372, 376, 377, 377; museums 6, 365, 369–372, 373, 374, 375; shopping malls 4; see also information systems; multipurpose halls; transportation hubs
pyramid-tracing methods 159, 263, 269, 269
Pythagoras of Samos 20

Q-LAN 284, 311
Quality of Service (QoS) 295–296, 296, 296, 300, 305, 308, 316–318

radiosity method 270, 270
railway stations 5, 103; case study 354, 356, 357; information system layouts 182–183, 187–190, 188, 189
Rav2Sap software 303, 320
RAVENNA 284, 304, 311
ray-tracing methods 158–159, 160, 256, 263, 268–269, 268, 271, 274, 278
Rayleigh, Lord 21
real-time analysers (RTAs) 350
real time deconvolution 341, 341
Real-Time Protocol (RTP) 305
relaxation period 38–39
religious buildings 11–12, 29, 401–414; Buddhist temples 401, 401; churches 11–12, 102, 102, 103, 134, 189–190, 190, 405–410, 406, 407, 408, 409, 410, 411; mosques 11–12, 410–414, 414, 415; synagogues 11–12, 134, 401–405, 402, 403, 404
reverberance 25
reverberation, defined 25
reverberation duration 25
reverberation enhancement see electroacoustic enhancement systems
reverberation radius 27, 194
reverberation times 26–30, 28, 29; cinemas 14–15, 14, 15, 29; defined 26; measurements 326, 326; media production facilities 13, 13; modelling 157–158; multipurpose halls 29, 393, 395–396, 399; religious buildings 12, 29; speech intelligibility and 218, 218, 225–226, 226; sports venues 6, 7; theatres, opera houses and concert halls 9, 10, 29, 29, 326, 378, 380, 391, 392; transportation hubs 5
ribbon microphones 112, 113, 119, 126
ring topology 291, 292
RIR see room impulse response (RIR)
room acoustics 22–32; critical distance 27, 28; energy-time curve 30–31, 30, 31; fundamentals 23, 24; general issues 22–23, 22; measurements 325–327, 326, 327, 343–344, 344; reverberation time 26–30, 28, 29; speech intelligibility and music clarity criteria 31–32; subjective assessment of sound quality 25–26; see also acoustic feedback; electroacoustic enhancement systems
room impulse response (RIR) 259–261, 261
room-size impression 25
room transfer function 261, 262
RTAs see real-time analysers (RTAs)
RTP (Real-Time Protocol) 305

S/N ratio see signal-to-noise (S/N) ratio
Sabine, Wallace Clement 21
sacral buildings 11–12, 29, 401–414; Buddhist temples 401, 401; churches 11–12, 102, 102, 103, 134, 189–190, 190, 405–410, 406, 407, 408, 409, 410, 411; mosques 11–12, 410–414, 414, 415; synagogues 11–12, 134, 401–405, 402, 403, 404
St. Ursen Cathedral, Solothurn, Switzerland 409–410, 411
SAP see Session Announcement Protocol (SAP)
Saveur, Joseph 20
scattered reflections 23, 24
schools 18
Schroeder diffuser 21, 260
Schroeder frequency 21, 155–156, 259, 261, 270
Schroeder, Manfred Robert 21
SDP files 304, 304, 308, 313
seamless protection switching 315
Sennheiser MD 441 microphone 112, 113
sensitivity: loudspeakers 75, 75, 83, 90, 92, 92; microphones 110, 113, 115, 116, 125–126
Session Announcement Protocol (SAP) 303, 313, 320
Sheikh Zayed Mosque, Abu Dhabi, UAE 413–414, 414, 415
shopping malls 4
shotgun microphones 122, 123
signal-processing: automatic gain control (AGC) 5, 188–189, 222, 223, 241; echo suppression/elimination 8, 151, 152, 153, 198, 198, 223, 241; loudspeakers and 73, 74, 93; noise cancellation 222, 241; noise-dependent volume regulation 188; speech intelligibility and 222, 223, 241; see also digitally controlled (DSP) loudspeaker arrays
signal-to-noise (S/N) ratio 133; microphones 126–127, 222; speech intelligibility and 217–218, 220, 222, 228, 230, 230, 231, 234
Simple Network Management Protocol (SNMP) 310
simulations see acoustic modelling
sine sweep signals 137, 242, 242, 323, 331, 332–333, 334–335, 336, 339
slash notation 290
SMAART (Sound Measurement and Acoustical Analysis in Real Time) 341
SMPTE standards: SMPTE ST 2022-7 315, 315; SMPTE ST 2059-2 300, 312, 319; SMPTE ST 2110 308, 309–310, 309
SNMP see Simple Network Management Protocol (SNMP)
SOR see source-oriented reinforcement (SOR)
sound energy density 27, 55–56, 59, 174
sound energy ratios 32, 232–233, 232, 234–235
sound focussing 220, 233, 233
sound level calculation 174–175
sound level distribution 173, 177, 194, 196, 201
sound power absorbed 26–27
sound power levels 33; loudspeakers 71, 88–90, 90, 174–175
sound pressure levels (SPL) 22, 32–33, 131–133, 132; hearing threshold 35, 131–132, 132; loudspeakers 71, 74, 75, 88–90, 90, 174–175, 185–186, 185; maximum 115, 127, 256, 329; measurements 326, 327, 329; microphones 115, 127, 188–189; modelling 162–164, 166, 166, 167, 168; speech intelligibility and 217; sports stadia 6–7; troubleshooting 324
sound propagation 32–40; acoustic localization 39–40, 40, 41; echo behaviour 38–39; loudness perception and masking effect 35–38, 36, 37, 38; in open air 32–35, 33, 34, 35, 185, 185; precedence effect 39–40, 40, 194–195, 197, 199, 202
sound reinforcement systems: basic requirements 47, 173–174; categories 1–2, 57; components 1, 2; historical overview 41–45, 42, 43, 44, 45, 46; integration into architectural design 47–50, 48; use of 57–58, 62–66; see also electroacoustic enhancement systems; system layouts
source-independent measurements (SIM) 341
source-oriented reinforcement (SOR) 203, 203, 204, 205, 206
spaciousness 25, 179, 181, 378, 379, 383; see also electroacoustic enhancement systems
spanning tree protocol (STP) 313–314, 314
spatial impression 25, 31
speech intelligibility 5, 6, 31–32, 215–249, 216; Articulation Loss of Consonants 32, 244–245, 245; ceiling loudspeaker grids 178–179, 179; coherence 245; direct to reverberant ratio (DRR) and 219, 219, 220, 231–233, 231, 232, 234, 244; electroacoustic measurements 233–235, 329; frequency response and 217, 246–248, 246, 247, 248; hotels and museums 369–371; measurement 233–235, 326; modelling 166, 169; primary factors affecting 216, 217–219, 218, 219; secondary factors affecting 216–217, 220–223; signal-processing and 222, 223, 241; signal-to-noise (S/N) ratio and 217–218, 220, 222, 228, 230, 230, 231, 234; sound energy ratios 32, 232–233, 232, 234–235; speech signal and system design 223–233, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233; subjective intelligibility tests 32, 233, 234, 329; summary of design factors 248–249; see also Speech Transmission Index (STI)
Speech Transmission Index (STI) 3, 32, 228, 235–244; description 235–240, 235, 236, 237, 238; modelling 166, 169; qualification bands 239, 240; relationship with % loss of consonants 245, 245; STIPA 236, 237, 239, 240–243, 242, 243, 245, 326, 336; typical applications 239; use and limitations 240–244, 241, 242, 242, 243, 244
Speech Transmission Index for Public Address Systems (STIPA) 236, 237, 239, 240–243, 242, 243, 245, 326, 336
spine/leaf architecture 291, 292
SPL see sound pressure levels (SPL)
sports venues 6–8, 358–364; arenas and large sport halls 7, 358, 359–361, 364, 365, 366, 367; case studies 364, 365, 366, 367, 368, 369, 370; first guidelines for 42, 43; information system layouts 182–187, 184, 185, 186; integration of sound systems 47; multipurpose halls 359–361; small sport halls 358–359; stadia 6–7, 42, 43, 361–364, 362, 368, 369, 370
stadia 6–7, 42, 43, 361–364, 362, 368, 369, 370
stage monitoring 65–66, 156, 381–383, 382
standards: for audio networking 307–311, 309; for voice alarm systems 3, 4; see also AES standards; IEC standards; ISO standards; SMPTE standards
star topology 291, 291
step response, loudspeakers 79, 80
STI see Speech Transmission Index (STI)
STIPA see Speech Transmission Index for Public Address Systems (STIPA)
STP see spanning tree protocol (STP)
stream discovery 303, 320
stream formats 304–305, 306, 306, 308, 319
stream redundancy 314–315, 315
subjective evaluation/intelligibility tests 32, 233, 234, 329
subnet masks 289–290, 315
summing localization effect 40
swept sine signals 137, 242, 242, 323, 331, 332–333, 334–335, 336, 339
synagogues 11–12, 134, 401–405, 402, 403, 404
system layouts 175–176; assistive listening systems (ALS) 208–211, 209, 210, 211, 212; immersive sound systems 206, 207; multi-channel systems 197–199, 198, 202–203, 202, 203, 204, 205, 206; multipurpose systems 206–208, 397; naturalness of sound reproduction 173, 199–206, 200, 202, 203, 204, 205, 206, 207; simple sound systems 190–196, 191, 193, 193, 195, 196, 197; single-channel delay system 201; see also information system layouts
SysTune measurement system 341–342, 341

tail estimation methods 269
TCP see Transmission Control Protocol (TCP)
TDS see time-delay spectrometry (TDS)
TEF see Time Energy Frequency (TEF) analyser
temperature, sound propagation and 33–34, 34
THD see total harmonic distortion (THD)
theatres, opera houses and concert halls 9–11, 206, 377–391; assistive listening systems (ALS) 208–211, 209, 210, 211; case studies 355–358, 359, 360, 384–391, 385, 386, 387, 388, 389, 390, 391, 392; effect signal sound systems 381, 382; electroacoustic enhancement systems 11, 380, 381, 384; first guidelines for 42, 43; functions of sound systems in 380–383, 382; integration of sound systems 47–49, 48; measurement approaches 384; microphone selection 381; mixing consoles 49–50; mobile systems 381; repertoire and all-purpose theatre 379–380; reverberation times 9, 10, 29, 29, 326, 378, 380, 391, 392; single-purpose facilities 378–379; stage monitoring 65–66, 156, 381–383, 382; system layouts 384; target criteria 383
THX standard 14–15, 15
time-delay spectrometry (TDS) 331, 339, 340
time domain behaviour, loudspeakers 78–80, 79, 80, 81
Time Energy Frequency (TEF) analyser 331
time-sensitive networking (TSN) 283–284, 290, 310–311, 311
time varying control (TVC) 59, 59
total harmonic distortion (THD): loudspeakers 81–83, 83; microphones 115, 127; speech intelligibility and 220
tour-guide receivers 211, 212
transducer principles: loudspeakers 68–70, 69, 70; microphones 108–113, 109, 112, 114
Transmission Control Protocol (TCP) 292
transparent clock switches 301, 302, 311
transportation hubs 5; case studies 354, 355, 356, 357; information system layouts 182–183, 187–190, 188, 189
travel time phenomena 149–151, 151
troubleshooting 323–324
TSN see time-sensitive networking (TSN)
TVC see time varying control (TVC)
two-dimensional loudspeaker arrays 106, 106
two-way loudspeaker systems 94, 97–98, 98, 99

UDP see User Datagram Protocol (UDP)
unicast and multicast 291–295, 293, 294, 308, 315
uniformity of coverage 220
universities 18
User Datagram Protocol (UDP) 292, 293, 310

video conferencing rooms 18
virtual LANs (VLANs) 287–288, 293, 303
virtual room acoustic systems 60–62, 61
virtual sound cards 319
Vitruv 20, 377
Vivace system 62, 63
voice alarm systems 3, 4, 5, 6, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; standards 3, 4; see also information system layouts
volume control 188

waterfall diagrams 79–80, 81, 345, 346, 347
wave-based models 259, 261, 270
wave-field synthesis (WFS) 15, 16, 58
weather, sound propagation and 32–35, 33, 34
weighted sweep 335, 336
weighting curves 36–37, 36, 330
WFS see wave-field synthesis (WFS)
whistling see acoustic feedback
white sweep 335, 336
Wilson, Woodrow 41
wind speeds, sound propagation and 34–35, 35