Sound Reinforcement for Audio Engineers
Sound Reinforcement for Audio Engineers illustrates the current state of the art in sound
reinforcement.
Beginning with an outline of various fields of applications, from sports venues to religious
venues, corporate environments and cinemas, this book is split into 11 chapters covering
room acoustics, loudspeakers, microphones and acoustic modelling among many other
topics.
This comprehensive book packed with references and a historical overview of sound
reinforcement design is an essential reference book for students of acoustics and electrical
engineering, but also for engineers looking to expand their knowledge of designing sound
reinforcement systems.
Dirk Noy is Director of Applied Science and Engineering at WSDG. He frequently lectures
at SAE, TBZ and the ffakustik School of Acoustical Engineering, all in Zurich. He is also a
member of the Education Committee for the AES’s Swiss section.
Contents
3 Loudspeakers 68
Gottfried K. Behler
4 Microphones 108
Gottfried K. Behler
10 Commissioning, Calibration, Optimization 322
Gabriel Hauser and Wolfgang Ahnert
Index 417
Figures
2.15 Frequency weighting curves recommended by the IEC 61672 for sound
level meters 36
2.16 Excitation level LE on the basilar membrane caused by narrow-band noise
with a centre frequency of 1 kHz and a noise level of LG (indicated on the
x-axis between 8 and 9 Bark) 37
2.17 Excitation level LE over the subjective pitch z (position on the basilar
membrane), when excitation is done by narrow-band noise of
LG = 60 dB and a centre frequency of ƒc 38
2.18 Calculation of the overall level from two individual levels 38
2.19 Early-to-late ratio of the sound reaching the ears, as a function of the
incidence angle 40
2.20 Variation of the sound level difference with discrete frequencies and
horizontal motion of the sound source around the head 40
2.21 Critical level difference ΔL between reflection and undelayed sound
producing an apparently equal loudness impression of both (speech)
signals, as a function of the delay time Δt 41
2.22 Use of a Magnavox sound system in 1919 to address 75,000 people in
San Diego 42
2.23 Mass event in Germany in the 1930s with the then newly developed horn
loudspeakers and condenser microphones 42
2.24 First sound reinforcement guidelines for stadia and theatres in 1957 43
2.25 Schematic diagram of the magnetic tape delay system with perpetual tape
loop, record head to print the original signal and various reproduction
heads at varying distances down the tape loop to obtain a variation of
delay times 44
2.26 Basic measures for first sound reinforcement and first feedback
considerations 45
2.27 (left) 3 dB attenuation by line array principle (C. Heil and M. Urban),
(right) digitally controlled line column by Duran Audio 46
2.28 Multipurpose hall with line array clusters 48
2.29 Hidden loudspeakers in the portal area behind the blue fabric 48
2.30 Feedback circuit 51
2.31 Feedback curve in open air 52
2.32 Sound transmission paths in closed room 53
2.33 Fragment of a frequency-dependent transmission curve 53
2.34 Difference of the peak and average value of the sound transmission curve 54
2.35 Relationship between the feedback threshold X and the required feedback
level LR 56
2.36 Block diagram of the ACS system 58
2.37 Block diagram of the AFC system 59
2.38 Basic routines of the Constellation system 61
2.39 Working schemata of the Vivace system 63
2.40 Work flow of the ASA system 64
2.41 Working schemata and block diagram of the Amadeus system 65
3.1 Sectional view of a magnetostatic ribbon tweeter of modern design.
The conducting wires are thin copper strips bonded directly to the thin
foil membrane 69
3.2 Cross-sectional view of a typical cone type dynamic loudspeaker 70
3.3 Some typical loudspeaker types that can be described as point sources.
Typical compact PA systems in upper part (Klein+Hummel). Ceiling
speaker horn systems in the lower part: left, a large cinema horn by JBL
and right a Tannoy speaker 72
3.4 Comparison of the so-called ‘stacking’ array of speakers (left picture)
and the nowadays more common ‘line array’ concept (centre picture) to
cover large audience areas. Rightmost a typical active, digitally steerable
line array using multiple identical loudspeakers mounted in one line,
individually driven by a DSP amplifier allowing individual adjustment
of frequency, time and phase response of the 16 point sources to create a
coherent and nearly cylindrical wave front (Renkus-Heinz) 73
3.5 Frequency response of a loudspeaker system plotted for a rated input
voltage of 2.83 V and a measurement distance of 1 m on axis. Sensitivity
(see equation (3.2)) and bandwidth with respect to upper and lower cut-
off frequency is depicted with dashed lines 75
3.6 Frequency-dependent input impedance of a loudspeaker. For this system
the nominal impedance Zn was defined by the manufacturer as 8 ohms.
Taking the tolerance limit into account, which allows a −20% undercut, this
loudspeaker does not fulfil ISO standards 76
3.7 Upper graph: phase response curves of the sound pressure transfer function
(shown in Figure 3.5); lower graph: the phase response curve of the
impedance transfer function (shown in Figure 3.6) 77
3.8 Group delay distortion of the phase response shown in Figure 3.7 78
3.9 Impulse response of the loudspeaker shown in Figure 3.15 79
3.10 Step response of the loudspeaker shown in Figure 3.15 80
3.11 Waterfall diagram of the loudspeaker from Figure 3.5. The magenta curve
shows the theoretical decay time for constant relative damping equivalent
to 1 period of the related frequency 81
3.12 Typical distortion plot measured by EASERA 82
3.13 Frequency-dependent maximum achievable SPL for a defined limit of
the THD figure. The figure shows the sensitivity of the loudspeaker, the
theoretical SPL for the proclaimed power handling of 1000 W input
power (manufacturer rating) and the measured, achievable SPL for a
given THD of 3% and 10% 83
3.14 Cartesian system defining the angles for the measurements of directivities
with loudspeakers. Note that the distance to the point for the microphone
is not defined, but must be chosen with respect to the size of the
loudspeaker 84
3.15 Computer-controlled robot for measuring directionality information of
loudspeaker systems. Note the tilting of the z-axis, which in consequence
leads to an intersection of the x-axis with the ground plane of the pictured
half-anechoic room. At this point of intersection, the microphone is
placed so as to ensure that there is only one signal path between source
and receiver. The distance in this set-up is 8 m. (Picture courtesy AAC
Anselm Goertz.) 85
3.16 Horizontal and vertical polar plots of a two-way PA loudspeaker system
with 12′′ woofer and 1′′ horn-loaded tweeter for different 1/3-octave
bands. The frontal direction corresponds to 0°. For positive and negative
angles, the observation point, at a fixed distance, rotates around the
reference point, as shown in Figure 3.14 87
3.17 Isobar plots for a loudspeaker in 2D and 3D display for the horizontal and
vertical directivity. The frequency resolution is smoothed to 1/12th of
an octave. The horizontal x-axis shows the frequency. The vertical axis
shows the level relative to the frontal direction (0 degree) in different
colours, hence the 0° response is equal to 0 dB. The orange range covers
a deviation of ±3 dB around the 0 dB, whereas for all other colours the
range covers only 3 dB. The right axis shows the angle of rotation (either
horizontal or vertical) of the loudspeaker for a full 360° rotation 88
3.18 Balloon plot of a loudspeaker system for a given frequency 89
3.19 The relationship between free field sensitivity and diffuse field sensitivity.
The DI describes the difference between the two graphs. The diffuse field
sensitivity is typically measured in 1/3-octave bands; therefore, the free
field sensitivity needs to be averaged in the same bandwidth to evaluate
the DI 90
3.20 Sensitivity of a loudspeaker as a function of the Efficiency and Directivity
Index 92
3.21 Theoretical polar plots for a circular piston of 25 cm diameter (a typical
12′′ woofer) in an infinite baffle for frequencies from 500 Hz up to 2.5
kHz in steps of 1/3 octave. The total range is 50 dB. The dotted lines
have a distance of 6 dB, so that the intersection points at the −6 dB line
denote the beam width of the directivity, leading to approximately 150°
at 1 kHz, 100° at 1.25 kHz, 75° at 1.6 kHz, 58° at 2 kHz and 45° at 2.5
kHz. Obviously, omnidirectional sound radiation can be assumed for
frequencies below 500 Hz 94
3.22 JBL 2360 Bi-Radial® Constant Coverage (another name for CD) horn
with attached 2′′ compression driver (courtesy JBL). The left picture
shows the narrow slit in the neck of the waveguide, which continues into
the final horn and is intended to diffract the soundwave horizontally into
the final surface line of the horn, so as to cover a wide horizontal angle of
90°. The vertical angle of 40° is maintained throughout the length of the
horn with a little bit of widening to the end of the horn mouth. 95
3.23 Horizontal directivity of the JBL 2360 with JBL 2445J 2′′ compression
driver. The aimed coverage of ±45° is met in the frequency range between
600 Hz and 10 kHz. To cover lower frequencies requires a larger horn and
for the higher frequencies the diffraction slit probably needs to be even
smaller (Measurement courtesy Anselm Goertz) 96
3.24 Electro Voice MH 6040AC stadium horn loudspeaker covering the full
frequency range from 100 Hz up to 20 kHz. The construction uses two
10′′ woofers to feed the large low-frequency horn and one 2′′ compression
driver feeding into the small horn placed coaxially into the large mouth.
The dimensions: height 1.5 m, width 1 m, length 1.88 m, weight 75 kg 97
3.25 Two-way PA loudspeaker system with 15′′ woofer and 2′′ compression
driver and CD horn (courtesy Klein+Hummel) 98
3.26 Standard isobaric plots for the horizontal and vertical directivity of an
ordinary two-way PA loudspeaker equipped with a 15′′ woofer and a CD
horn with 1.4′′ compression driver. While the horizontal directivity is
fairly symmetrical with a slight narrowing at 2 kHz and becomes narrower
at higher frequencies, the vertical isobaric plot shows a typical asymmetry
due to the placement of the two speakers side by side and a strong
constriction of the directivity at the crossover frequency (between 800 Hz
and 1600 Hz) due to interference. (Courtesy four audio.) 99
3.27 The directivity of a loudspeaker column (with N = 16 identical chassis,
membrane diameter a = 6 cm, equally spaced at d = 8 cm). The figure shows
the resulting directivity (right picture) derived from the directivity of a
single driver (left picture), which also shows the horizontal directivity of
the column, and the directivity of equally spaced monopole sources (point
sources, in the middle). All directivities are calculated for the centre
frequencies (as listed in the plot) with an energetic averaging within
1/3-octave band 100
3.28 Same column as in Figure 3.27 except for the frequency-dependent low-
pass filtered loudspeaker arrangement to achieve a constant active length
of the column relative to wavelength. The width of the main lobe is
significantly greater for high frequencies though not smooth. The plot
shows a simulation with piston-like membranes and theoretical radiation
pattern; it reveals the great potential for DSP-controlled loudspeaker arrays 101
3.29 Left picture: a DSP-controlled loudspeaker column. By combining up
to nine elements a column with a length of almost 9 m can be realized.
(Courtesy Steffens.) Right picture: placement of DSP loudspeaker
columns in St. Paulus Cathedral in Münster (Germany). Photo taken by
the author during the celebration for the reopening of the cathedral after
renovation including the sound system. Note the installation above the
audience, allowing unobstructed sound propagation even to more distant
places in the audience. However, this requires a downward tilting of the
sound beam 102
3.30 A DSP-controlled loudspeaker line array (length 3.6 m) is optimized
to deliver sound to two different audience areas. Each picture shows
the optimization within a bandwidth of one octave, from upper left to
lower right: 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz. As expected,
the suppression of side-lobes at high frequencies is difficult. (Calculation
performed with the software dedicated to the Steffens Evolutone
loudspeaker.) 104
3.31 The transformation from circular input to rectangular output. The DOSC
geometry sets all possible sound path lengths to be identical from entrance
to exit, thus producing a flat source with equal phase and amplitude 104
3.32 Representation of the variation of the distance for cylindrical sound
radiation and far field divergence angle (spherical radiation) with
frequency for a straight-line source array of height 5.4 meters 105
3.33 Two-dimensional loudspeaker array using individual signal-processing and
power-amplifying for each driver. The software allows different types of
directional pattern and sound field applications 106
4.1 Basic design of a condenser microphone. To the left, a sectional
drawing of a classic measuring microphone is displayed; to the right,
the relationship between the components involved in its construction
(diaphragm mass, air volume behind the diaphragm as a spring, and
viscous friction of the air between the diaphragm and the back electrode)
is shown. ©B&K 109
4.2 Section through the capsule structure of the legendary Sennheiser
MD441U. Note the multi-resonance construction with several chambers
and ports. Furthermore, a cone-like sound guide is placed in front of the
diaphragm, which serves to optimize the directional characteristics 112
4.3 Basic construction of a ribbon microphone (the left figure shows the
Beyer M130). The magnetic flux of the high-energy permanent magnets
is guided around the ribbon by the ring-shaped yoke wires made of highly
permeable, soft magnetic material. The internal magnetic field should be
as homogeneous and tangential as possible through the ribbon 112
4.4 Comparison of the directivity of two pressure microphones: left side a ¼′′
capsule, right side a 1′′ capsule. The polar plots show a clear directionality
for the large membrane at high frequencies whereas the small membrane
shows almost perfect omnidirectional sensitivity. The frequency response
curve for the ¼′′ microphone is flat for free field sound incidence, the one
for the 1′′ microphone shows a distinct presence boost in free field whereas
for diffuse field the response is rather flat until a roll-off above 10 kHz (DPA) 118
4.5 Basic construction of double-membrane microphones (left side: AKG
K4; middle: Neumann M49; right side: Neumann SM 2). The K4 is set
to figure of eight only, whereas the M 49’s directivity can be changed
remotely. The SM 2 forms a stereo-microphone for coincidence
stereophony (either XY or MS). Both capsules are remotely adjustable for
the pattern (between sphere and figure of eight). By choice of the right
pattern and rotation of the upper capsule, the desired width for the
stereo-panorama can be set 120
4.6 Two microphones with higher-order directivities. Left: Sennheiser
Ambeo; right: Eigenmike. The capsules are individually connected to
allow any adjustment to the directivity by external signal-processing.
Whereas the Eigenmike capsules are pressure type microphones, the
Ambeo capsules are cardioid microphones 120
4.7 Line-array microphone with pronounced vertical (left panel) and wide
horizontal (cardioid, right panel) directivity (Microtech Gefell KEM 975).
The microphone is built with eight differently spaced cardioid capsules set
into a vertical line 121
4.8 The KEM 975 in use at the lectern at the German Bundestag. A diversity
switch ensures that only one of the two microphones (the one with higher
level) is in use at a time. (Courtesy Microtech Gefell) 123
4.9 Typical shotgun microphone (Sennheiser MKH 8070) and the
frequency-dependent directivity 123
4.10 Head-mounted microphone (left); Lavalier microphone (right)
(courtesy DPA). The placement of these microphones requires some
EQ to provide sound without colouration 124
4.11 Direct recording of the violin sound with a small condenser microphone,
which is mounted on the frame of the violin (courtesy DPA) 124
4.12 Microphone supply according to DIN EN IEC 61938 125
5.1 Audible level range for speech and music signals 132
5.2 Noise criteria (NC) curves as a function of frequency 133
5.3 Noise rating (NR) curves used in Europe 134
5.4 Relationship SPL values and noise rating curves 135
5.5 Computer-based measurement system for different excitation signals
(schematic block diagram) 136
5.6 Overlay of excitation, raw data and impulse response files 137
5.7 Transfer function as Fourier transform of the impulse response 138
5.8 Phase response of the impulse response 139
5.9 Frequency response and spectrogram presentation 139
5.10 Loudspeaker data and polar diagrams 141
5.11 Different aiming diagrams of a typical point source loudspeaker 144
5.12 Sophisticated data presentation of a point source loudspeaker 146
5.13 Sophisticated data presentation of a modern line array 147
5.14 SPL coverage figures for a line array in part of a stadium 148
5.15 Control panel for coverage control 149
5.16 Controlled radiation avoiding sound coverage in unoccupied zone 2 150
5.17 Loudspeaker system for suppressing echoes 152
5.18 Proportion of listeners still perceiving an echo, as a function of echo delay 153
5.19 Limit curve to be complied with for suppressing echo disturbances 153
5.20 Tolerance curves for the reproduction frequency response in different
applications: (a) recommended curve for reproduction of speech;
(b) recommended curve for studios or monitoring; (c) international
standard for cinemas; (d) recommended curve for loud rock and pop music 155
5.21 Attenuation behaviour of filters of constant bandwidth (a) and of
constant quality (b) 156
5.22 Figures a–c show the same view of a 3D computer model in AutoCAD,
SketchUp and in the simulation software EASE 158
5.23 Echogram in EASE-AURA 4.4 160
5.24 3D aiming presentation in a wireframe model (EASE 4.4) 161
5.25 2D aiming mapping (EASE 4.4) 162
5.26 Delay presentations in simulation tools: (a) in ODEON 10.0 and (b) delay
pattern of first signal arrival in EASE 163
5.27 Echo detection in EASE: (a) initial time delay gap (ITD) mapping
to check echo occurrence in a stadium; (b) echogram in weighted
integration mode at 1 kHz; (c) echo detection curve for speech at 1 kHz 165
5.28 Waterfall presentation in EASE 166
5.29 Sound pressure level mapping in simulation tools: (a) 2D presentation in
CATT acoustics; (b) narrow-band presentation in EASE; (c) broadband
presentation in ODEON 167
5.30 Speech transmission index (STI) presentation in EASE: Top: three-dimensional
presentation in a hall and Bottom: STI presentation in a parliament 169
5.31 Parameter presentation in EASE: Top: Clarity C80 and Bottom: Sound
Strength G 170
5.32 Block diagram of an auralization routine 171
6.1 Radiation angle and loudspeaker height d above ear-height 177
6.2 Installation grids of ceiling loudspeakers. (a) Centre to centre;
(b) minimum overlap; (c) rim to rim 178
6.3 Loudspeaker coverage (left, 32 loudspeakers; right, 123 loudspeakers) 178
6.4 STI coverage (left with 32, right with 123 loudspeakers) 179
6.5 Suspended directive loudspeakers for avoiding an excitation of the upper
reverberant space 180
6.6 Double loudspeakers arranged in a staggered pattern for covering a
large-surface room 181
6.7 Sound reinforcement system for an outdoor swimming pool area 184
6.8 Simple sound system at a sports ground 184
6.9 Level relations between two loudspeakers arranged at distance a 185
6.10 Loudspeaker arrangement for decentralized coverage of a large square.
(a) Loudspeakers with bidirectional characteristics; (b) loudspeakers with
cardioid characteristics 186
6.11 Installation of passive directed sound columns on the platform of the
Munich main station. © Duran Audio 188
6.12 Radiator block to cover a platform in the main station in Frankfurt/Main
with sound. © Holoplot 189
6.13 Decentralized coverage of a church nave by means of sound columns for
speech 190
6.14 Simple sound reinforcement channel: positions of source S, microphone
M, loudspeaker L and listener H as well as associated angles 191
6.15 Sound transmission index LXY as a function of the distance ratio rH/rXY 193
6.16 Use of built-in arrays for source localization 195
6.17 Geometric relations in the case of centralized coverage without delay
equipment 196
6.18 Sound-field relations with different loudspeaker arrangements 196
6.19 L-C-R arrays in a larger conference hall (EASE simulation) 197
6.20 Use of supporting loudspeaker for coverage of a listener’s seat 198
6.21 Explanation of localization and phantom source formation as a
function of the time delay of one partial signal 200
6.22 Acoustical localization of a sound source (in the speaker’s desk) by means
of a delayed sound system (schematic) 200
6.23 Schematic layout of a delta-stereophonic sound reinforcement system.
(a) Working principle. (b) Equipment structure of the DSS 202
6.24 Source-oriented reinforcement system. (a) Tracking and delay localization
zones for mobile sound sources. (b) Placement of hidden positions of
installed trackers. (c) Visualization of 12 localization zones on a stage 203
6.25 Loudspeakers on stage for source support and tracking procedure with
corresponding software. (a) Stage area with hidden support loudspeakers
in a stage design in the Albert Hall performance. (b) Computer graphic
for visualization of the tracking procedure of a talker or singer on a stage.
(c) One tracker placement on stage for a performance in the Szeged
Drone Arena 205
6.26 d&b Soundscape 360° system 207
6.27 Use of induction loops for compensation of hearing loss in a theatre
main floor 209
6.28 Infrared field strength coverage simulation on listener areas of a lecture
hall with two SZI1015 radiators in blue 210
6.29 Sennheiser SZI1015 radiator/modulator and infrared receiver 211
6.30 FM receiver EK 1039 212
7.1 Simplified sound system or audio channel intelligibility chain 216
7.2 Subjective effect of delayed reflections and later arriving sounds 219
7.3 Typical speech waveform 223
7.4 Diagrammatic view of speech events (syllables or words) 224
7.5 Diagrammatic view of the effect of noise on speech, for high, moderate
and low signal to noise ratios 224
7.6 Diagrammatic effect of noise on speech elements of varying level 225
7.7 Diagrammatic effect of reverberation on speech elements of the same and
varying levels 225
7.8 Speech waveform for the word ‘back’ 226
7.9 Diagram showing the effect of reverberation times of 1.0 and 0.6 seconds
on the word ‘back’ 226
7.10 Temporal variability of speech: LAeq = 73 dB, LAmax = 82 dB, LCeq = 78 dB,
LCpk = 98 dB and average LCpk = 89 dB 227
7.11 Typical speech spectrum 228
7.12 Speech and test signal spectra from IEC 60268-16 2011 and 2020
(Editions 4 and 5) 229
7.13 Speech spectra of six different voices and comparison with IEC 60268-16
2011 spectrum 229
7.14 Typical octave band contributions to speech intelligibility 230
7.15 Octave band analysis of speech and interfering noise, with good
signal-to-noise ratio 230
7.16 Octave band analysis of speech and interfering noise, with poor
signal-to-noise ratio 231
7.17 Energy time curve for sound arriving at listening position from distributed
sound system in 1.6 s RT space 231
7.18 Integrated energy plot for distributed sound system 232
7.19 Sound energy ratios: C7 is effectively the ‘direct sound’ alone. C50 and
C35 include early reflections that will integrate with and increase the
effective level of the direct sound 232
7.20 Example of strong echo occurring in circular reverberant space 233
7.21 MTF plot for high-quality sound reinforcement system in 1.4 s RT space
(STI = 0.61) 237
7.22 Effect of speech level on STI for three reverberant conditions 238
7.23 STI qualification bands (categories) 240
7.24 MTI plots for two sound systems exhibiting STI values of 0.61 and 0.49
respectively 241
7.25 Theoretical effect of a delayed sound (echo) on STI 242
7.26 1/12 octave analysis of Edition 5 STIPA signal (centre frequencies at
125, 250, 500 Hz, 1, 2, 4 and 8 kHz) 243
7.27 Typical target speech response curve 246
7.28 Frequency response for cathedral sound system measured at typical listener
location 247
7.29 Set of frequency response curves for a concert hall high-quality sound system 248
8.1 Top: computer model of the main railway station of Berlin (EASE
software by AFMG). Bottom: Holoplot loudspeaker system installed at
Frankfurt Hauptbahnhof (main station) 252
8.2 Exemplary distribution of direct SPL across listening areas in a
medium-size church (EASE software by AFMG) 253
8.3 Ambisonics reproduction room 254
8.4 3D computer model of a German church (Frauenkirche Dresden) that
shows the level of geometrical details typically used for acoustic indoor
models (EASE software by AFMG) 255
8.5 Illustration of scattering effects: at low frequencies (left) the fine structure
of the surface is ignored. For wavelengths of the order of the structure’s
dimension, the incident sound wave is diffused. At shorter wave lengths
geometrical reflections dominate again (courtesy Vorländer 2006) 256
8.6 Directivity balloon for a line array (EASE SpeakerLab software by AFMG) 257
8.7 Material measurements in the reverberation chamber 258
8.8 Exemplary scattering and diffusion behaviour of a Schroeder diffuser
computed by AFMG Reflex 260
8.9 Schematic structure of a reflectogram 261
8.10 Exemplary room transfer function measured in a medium-size room
(EASERA software by AFMG). Typical smooth, modal structure in the
frequency range 50 Hz to 300 Hz; typical dense, statistical structure for
frequencies above 1 kHz 262
8.11 Computed modal sound field of a studio room (courtesy AFMG) showing
the surfaces of equal pressure 264
8.12 Numerical optimization scheme for sound system configurations as used by
AFMG FIRmaker 265
8.13 Positional maps showing an example of improvement of SPL uniformity
when using FIR numerical optimization. Top: without FIR optimization.
Bottom: with FIR optimization 266
8.14a Image source method. Top: construction of image source S1 by mirroring
at wall W1. The connection from image source S1 to receiver E
determines intersection point R1. Bottom: construction of the (possible)
reflection using the point R1 267
8.14b Image source method. Construction of image source S2 by mirroring at
wall W2. Intersection point R2 is outside of the actual room surface.
The reflection is impossible 267
8.15 Ray tracing. Rays are stochastically radiated by a source S in random
directions. Some hit the detection sphere E after one or more reflections.
In this example, the rays representing the floor reflection RF and the
ceiling reflection RC are shown. The direct sound path indicated by D is
computed deterministically 268
8.16 Pyramid- or cone tracing. Schematic illustration of a beam tracing
approach in two dimensions. Cones with a defined opening angle are used
to scan the room starting from the sound source S. Receivers E located
inside a cone are detected and validated 269
8.17 Radiosity method. Patch P3 is illuminated and excited by radiation from
patches P1 and P2. It may also radiate sound itself 270
8.18 Clarity C80 results shown as 3D mapping for theatre model
(courtesy EASE 5 AURA by AFMG) 273
8.19 Typical example for a result echogram generated by ray tracing simulation
methods (courtesy EASE 5 AURA by AFMG) 274
8.20 Binaural setup with HRTF selections by head tracker. Blue: Right ear
channel. Red: Left ear channel 274
8.21 Part of a typical project report (courtesy EASE Focus 3 by AFMG) 275
8.22 Computer model of a theatre with different acoustic materials assigned to
walls, ceiling and floor (courtesy EASE 5 by AFMG) 276
8.23 Eyring reverberation time calculated for a medium-size church
(courtesy EASE 5 by AFMG) 279
8.24 Schematic overview of binaural auralization process 280
9.1 Link offset determines latency 286
9.2 Phase coherence by identical link offset 286
9.3 Unit types within a network 287
9.4 Router connecting subnet A to subnet C 288
9.5 Star topology 291
9.6 Spine/leaf architecture 292
9.7 Ring topology 292
9.8 Audio to multiple destinations using unicast 293
9.9 Audio to multiple destinations via multicast 294
9.10 Querier activated in a switch with high-bandwidth links 294
9.11 Quality of Service (QoS) concept 296
9.12 A PTP leader synchronizes the absolute time across all followers.
Each device then derives its own media clock from this. 297
9.13 Principles of the Precision Time Protocol (PTP) 298
9.14 Example of a PTP scenario with several devices that can be leaders 299
9.15 Concept of a boundary clock switch 301
9.16 Concept of a transparent clock switch 302
9.17 Example of an SDP file (relevant parameters in red) 304
9.18 Stream variants with identical packet size: number of channels versus
packet time 306
9.19 Elements of total latency 307
9.20 Principle of SMPTE ST 2110 309
9.21 Synchronizing older Dante devices in an AES67 environment 313
9.22 Loop detection by the Spanning Tree Protocol (STP) 314
9.23 Link aggregation as safety net for cabling issues 314
9.24 Maximum safety through double networks 315
9.25 Example of a successful (unicast) connection to a device 316
9.26 Matrix crosspoints in senders and receivers must be correctly set 317
9.27 Example of a well-set link offset. All packets arrived within the
set latency 318
9.28 Example of a link offset that is too short. Not all packets arrived within
the set latency 318
10.1 Top view of a convention hall, showing measurement locations (R)
in the auditorium and source locations on stage 326
10.2 Example of an artificial human speech source for room acoustic
measurements 327
10.3 Tolerance curves for the reproduction frequency response in different
applications: (a) recommended curve for reproduction of speech;
(b) recommended curve for studios or monitoring; (c) international
standard for cinemas; (d) recommended curve for loud rock and pop music 328
10.4 Characteristic frequency spectra for white noise and pink noise. Graph
shows the power density spectrum in dB using an arbitrary normalization 334
10.5 Characteristic frequency spectra for white sweep, log sweep and
weighted sweep. Graph shows the power density spectrum with an
arbitrary normalization 336
10.6 Shift register for the construction of the maximal length sequence of
order N = 3 337
10.7 Section of the time function of an MLS of order N = 16. The sampling
rate is 24 kHz 338
10.8 TDS principle. (a) Measurement principle; (b) action of the
tracking filter 340
10.9 SysTune measurement system 341
10.10 Octave-band display of the spectral shape of white noise and pink noise.
Graph shows the band-related power sum spectrum with an arbitrary
normalization 342
10.11 Top view drawing of the Allianz arena in Munich, showing
measurement locations in the bleachers 343
10.12 Room-acoustic measurement setup 344
10.13 Partial spectrogram 346
10.14 Wavelet type presentation 347
10.15 Exemplary section of a measured impulse response where the sound
reinforcement system (after 90 ms) provides a higher signal level than
the original sound source on the stage (at about 44 ms) 349
10.16 Polarity checker from MEGA Professional Audio 351
11.1 Computer model with Atlas Sound Ceiling speakers FA 136 and
Renkus-Heinz Line arrays IC16 and listener areas 355
11.2 Calculated SPL distribution in the greeter area 355
11.3 Calculated intelligibility number STI from 0.5 to 0.75 in the
greeter area 356
11.4 Computer model of the main station in Berlin 356
11.5 Arrangement of the Duran Audio loudspeaker IntelliDiskDS-90 356
11.6 Radiation pattern of nine IntelliDisk DS-90 speakers along the upper
platform in the main station 357
11.7 Computer model of the complicated lobby structure 359
11.8 Overall RT, SPL and STI values in selected floor level 360
11.9 Olympia-Stadium in Berlin 2002 with no roof and sound coverage from
the perimeter of the field of play 362
11.10 View into the Getec Arena during a handball game 365
11.11 Wireframe model of the Getec Arena Hall 366
11.12 Overall RT, SPL and STI values in the Hall 367
11.13 Wireframe model of the inside geometry of the stadium 368
11.14 Twelve line array positions with a total of 124 Electro-Voice XLD 281
modules 369
11.15 Outstanding SPL and STI distributions in the bleachers 369
11.16 Location of the stadium in Cape Town close to residential areas 370
11.17 Noise map during a night time game 370
11.18 NTi XL2 Handheld Acoustical Analyzer including measurement
microphone M2211 372
11.19 Wireframe model of the 250-m-wide entrance lobby 373
11.20 Rendered view of the Grand Foyer including ceiling detail 374
11.21 Designed sound system with line arrays type JBL VT4888DP in the
centre of the foyer 375
11.22 Room acoustic design of the main hall. Left above: View to the hotel
design. Right above: Computer model of the main hall. Below: Echo
simulations in computer model 376
11.23 Recommended secondary structure in the main hall. (Left)
Architectural design of the hall (only hard surfaces); (right) hall with
acoustical treatment at ceiling and back wall. Light red faces at front
and back wall: Broadband absorber (e.g. slotted panels). Orange face at
back wall: Additional broadband absorber (slotted panels or curtain).
Dark red faces at ceiling: Broadband absorber or sound transparent grid.
Dark blue: Glass facade with inclined lamella structure 377
11.24 RT values in the main hall (left) without treatment and (right) with treatment 377
11.25 Main loudspeaker groups in a theatre 382
11.26 Rendered section of the Musiktheater Linz. ©Landestheater-Linz.at 385
11.27 Layout of the second floor 385
11.28 View of the stage opening with lowered line arrays 386
11.29 Positions of panorama loudspeakers along the railings of the three galleries 387
11.30 Loudspeaker layouts 387
11.31 View to the stage including loudspeaker systems 388
11.32 Overall sound level in the audience hall, broad band, A-weighted 388
11.33 STI mapping 389
11.34 Distribution of the STI values by consideration of noise and masking 389
11.35 View of the large (so-called Bruckner) rehearsal room 390
11.36 View of the studio theatre 390
11.37 Performance TRL stage manager system 391
11.38 Computer model and detail view of the wall structure 391
11.39 Modifying reverberation time by changing the ceiling height above the stage 392
11.40 Variable acoustics in the Concert Theatre Coesfeld, Germany 394
11.41 KKL Luzern Concert hall, rare view from within an echo chamber. © KKL Luzern 395
11.42 Variable low- and mid-frequency absorber aQflex by Flex Acoustics 396
11.43 Computer model showing all enhancement loudspeakers and a
rendered view of the hall 398
11.44 Opening concert (left) and hall with the new wall loudspeakers 399
11.45 The Anthem Hall Washington and view of the hall on the right 400
11.46 Computer model and calculated RT of the Anthem Hall 400
11.47 Mappings of SPL and STI 400
11.48 Buddhist temple inside and outside 401
11.49 Floor plan of a traditional synagogue 402
11.50 New Munich synagogue 403
11.51 Rendered computer model with calculation results in mapping and
probability distribution form 403
11.52 View into the Central Synagogue in New York 404
11.53 Ceiling detail of the Central Synagogue including small point-source
loudspeakers in the corners 404
11.54 Sound columns in the Gothic church Maria Himmelfahrt in Bozen, Italy 406
11.55 View of the iconostasis with installed directed sound columns in the
Christ the Saviour Cathedral in Moscow 406
11.56 Centrally arranged loudspeaker cluster in a church 407
11.57 Decentralized small line arrays in a church 408
11.58 Test setup for new sound system in the cathedral 409
11.59 New main sound system in the cathedral 410
11.60 St. Ursen Cathedral with two visible line arrays for sound coverage 411
11.61 St. Ursen Cathedral exterior 411
11.62 View into the Sheikh Zayed Mosque in Abu Dhabi, UAE 414
11.63 Line arrays close to the Mihrab prayer niche 415
11.64 View into the Al Eman Mosque in Jeddah KSA 415
Tables
4.1 Typical parameters for microphones with different directivity patterns 117
5.1 Noise criteria values in different room types 134
5.2 Noise rating (NR) values in different studio facilities 135
5.3 Relationship of distance in m and ft and run time in ms (path time
relation at 20°C) 151
6.1 Examples of achieved sound reinforcement 193
7.1 Reverberation time and sound reinforcement system design 218
7.2 STI matrix 235
7.3 STI matrix for sound system measurement – STI = 0.611 236
7.4 STIPA matrix 237
7.5 STI qualification bands and typical applications 239
7.6 Speech and STI/STIPA test signal characteristics 242
7.7 Minimum recommended number of spatially distributed STI measurements 244
7.8 Relationship between STI and % AlCons 245
9.1 Recommended QoS settings 296
9.2 Selection criteria for PTP leaders, sorted by BMCA rules 299
9.3 Recommended PTP parameter settings 300
9.4 Typical audio stream formats 306
9.5 Overview of the chosen approaches 311
Contributors
speech intelligibility and is vice chair of the AES standards committee on acoustics and
audio. In 2014 he was awarded the AES Bronze medal and in 2020 the UK Institute of
Acoustics’ Engineering Medal.
Dirk Noy has a Master of Science (MSc) Diploma in Experimental Solid-State Physics
from the University of Basel, Switzerland, and is a graduate from Full Sail University,
Orlando, USA, where he was one of John Storyk’s students. He is Director of Applied
Science and Engineering at WSDG. He frequently lectures at SAE, TBZ and the
ffakustik School of Acoustical Engineering, all in Zurich. He is also a member of the
Education Committee for the AES’s Swiss section.
A. Input – wired and wireless microphones, CD and streaming media players, radio tuners,
instruments and any other signal sources
B. Processing – signal routing, dynamics processing (gates, compressors), frequency
conditioning (equalizers, crossovers), mixing (gain setting and combining) and
distribution
C. Output – amplifiers and loudspeaker system(s)
Optionally, a control system adjusts system parameters on all items mentioned above in
real time and handles component switching, preset store and recall, algorithms, interfacing
with third-party technical systems etc.; a minimal sketch of this three-stage structure
follows below.
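As an illustration of the input/processing/output split described above, the following Python sketch models a signal chain as an ordered list of stages. All names and the toy stages (gain, limiter) are hypothetical and serve only to mirror the structure; this is not a real product architecture.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# One processing stage: a name plus a function acting on a buffer of samples.
@dataclass
class Stage:
    name: str
    process: Callable[[List[float]], List[float]]

# A sound reinforcement chain is simply input -> processing -> output in order.
@dataclass
class SignalChain:
    stages: List[Stage] = field(default_factory=list)

    def run(self, buffer: List[float]) -> List[float]:
        for stage in self.stages:
            buffer = stage.process(buffer)
        return buffer

# Hypothetical example: microphone gain, a hard limiter, then amplifier gain.
chain = SignalChain([
    Stage("input gain", lambda buf: [s * 2.0 for s in buf]),
    Stage("limiter",    lambda buf: [max(-1.0, min(1.0, s)) for s in buf]),
    Stage("amp",        lambda buf: [s * 0.8 for s in buf]),
])
print(chain.run([0.1, 0.4, 0.9]))
```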
According to the present state of the art, sound reinforcement systems can be categorized
as per the following list, taking into consideration the critical spatial relationship between
the signal source (e.g., a stadium announcer or a musician) and the signal receiver (e.g., the
listener in the audience).
A. Sound reinforcement systems where the signal source is remote, possibly a prerecorded
message, mostly invisible and not particularly spatially related to the listener:
Public buildings
Paging systems
Voice alarm systems
Shopping malls
Transportation hubs
Hotels
Museums and exhibition halls
Sports venues
Stadia
Arenas
Outdoor fields/campuses
B. Sound reinforcement systems where the signal source is somewhat distanced (e.g., on
a stage, on the cinema screen or in a recording room), but clearly visible and visually
relevant to the listener:
Performing arts centres
Clubs /discotheques
Music venues
Theatres
Opera houses
Concert halls
Multipurpose halls
Religious venues
Churches
Synagogues
Mosques
Media production facilities
Audio recording studios
Broadcasting facilities
Cinemas
THX, Dolby, DTS
Immersive acoustics
Home cinemas
C. Sound reinforcement systems where the signal source and the listener are co-located
and basically interchangeable as any listener can become a sound source and vice versa:
Corporate environments
Meeting rooms
Boardrooms
Video conferencing
Educational facilities
System and product standards:
International standards: ISO 7240-04, ISO 7240-16, ISO 7240-19, ISO 7240-24
CEN standards: EN 60849 (in future EN 50849), CEN TS EN 54-32, EN 54-04, EN 54-16, EN 54-24
National standards: BS 5839-8, BS 7827, BS 6259, NEN 2575, DIN VDE 0833-4 together with DIN 14675, DIN VDE 0828, NF S61-936, TRVB-158, Ö F3076, Ö F3074, NFPA 72
1.2.5 Hotels
Hotels are a collection of many of the specific room types treated individually in
this chapter, such as the ballroom or the cinema. The main reason for a full-facility sound
reinforcement system is again the emergency voice alarm functionality. Care must be taken
that guests do not suspect that the loudspeaker system might be used in reverse mode
to listen in on certain areas. Often the loudspeaker system is solely installed in the hallways
and other public spaces.
Besides the alarm and emergency systems, conventional sound reinforcement systems
are required for ballrooms and lobbies. Hotels quite often host conferences, meetings and
exhibitions where sound amplification is needed, mainly for highly intelligible
speech. Operational flexibility may call for mobile walls; the various configurations make
dealing with sound systems a complex task unless ceiling speakers are used. These lower-quality
1.3.1 Stadia
These single-purpose facilities are half-open or fully open rectangular or oval bowls with a
level field of play and a raked spectator seating area, sometimes distributed over one to
four tiers.
The walls and floors are commonly of concrete construction; the roof is often
a membrane material, sometimes visually transparent. Membranes act as low-frequency
absorbers (in addition to the panel resonance, low frequencies may also pass through the
membranes more readily than high frequencies, which are reflected), so in combin-
ation with the roof only the audience remains as a broadband absorber. If no further room-
acoustical treatment is provided, the reverberation times are very long under the roof, which
in turn increases the design complexity for a sound system that must deliver voice alarm
calls with sufficiently high speech intelligibility.
To verify the sound quality and to study the acoustical, architectural and technical
parameters (locations, materials and product specifications) during the design phase,
the use of computer simulation is strongly recommended.
Target sound pressure levels vary considerably between technical specifications
and are often predetermined by a sport’s governing body such as FIFA, UEFA, IOC and
1.3.2 Arenas
In contrast to a stadium, an arena is a large-volume closed venue. Hence acoustical energy
cannot escape and, without proper design, will tend to build up. Arenas are likely to be multi-
purpose, say for various sport disciplines and for entertainment alike, sometimes dividable
and reconfigurable (e.g., with mobile audience ranks), making the acoustical considerations
more complex; hence it is strongly recommended to perform studies by use of computer
simulation platforms.
While designing a sound system in such spaces, the architecture of the
facility must be studied first. If the architectural design is still in development and the acous-
tician and sound system designer are already involved, both must convince the architect to
keep the reverberation time at a reasonably low level, say below 2 seconds in the occu-
pied case. Quite often this cannot be achieved and modern sound system design must
succeed in providing clear and intelligible messages. The configuration of modern sound
systems such as line arrays might conflict with video facilities such as a centrally located
video cube.
Sound reinforcement systems in these facilities are often newer than the
building, which was not originally designed with sound systems in mind; yet today a theatre
or concert hall without a proper sound system is unimaginable. In a theatre we need
a sound system for the reproduction of all kinds of sound effects and audio play-ins. The
sound system signals are a part of the production, just like the theatrical lighting system.
The localization of the played-in signals is important as well, so a considerable number of
loudspeakers are often arranged around the stage opening, distributed within the depth
recording purposes in concert halls or for voice-over in post-production, the larger ones
allow recordings of entire bands, orchestras or orchestra groups. The user profile predefines
the acoustic properties of the space. Usually low reverberation times are required: the
standards indicate values of 0.2 to 0.5 s (Figure 1.5).
1.7 Cinemas
Figure 1.7 Dolby and THX recommendation for the reverberation time tolerance range (RT factor versus frequency, 31.5 Hz to 16 kHz).
For rooms with a volume of 2500 m³ (fewer than 200 seats) the reverberation time should be
decreased to values of around 0.6 s in the midrange.
Background noise levels should be kept to a minimum to allow for silent parts of the
movie to really be silent and undisturbed. The three primary potential noise sources are: (A)
mechanical equipment (HVAC), (B) noise from adjacent theatres and lobby and (C) outdoor noise.
After the first demonstration of a movie with sound in 1923 in New York, horn
loudspeakers came into widespread cinema use in the thirties. Afterwards one-channel, two-
channel and four-channel systems were used until 1950. The CinemaScope format uses
35 mm film. In the seventies Dolby noise suppression was introduced, and in 1975
Dolby Stereo sound (left, centre, right and surround) came to the cinema with a simple
matrixing algorithm. Tomlinson Holman created the THX standard in 1982.
Not only do the front loudspeakers have to be installed behind the acoustically trans-
parent screen, but with the THX standard the screen also had to be curved. There followed
the 5.1 and 7.1 surround sound formats, and the digital ‘Cinema Digital Sound’ (CDS)
format was introduced in 1990. In 1992 the Dolby Digital format was used for the first
time. One year later the Digital Theater System (DTS) was introduced, with sound on a sep-
arate CD-ROM. In parallel, Sony offered the digital optical sound format SDDS for large
cinemas with eight audio channels. In 1999 still higher audio channel counts were offered
by Dolby Surround EX. After 2000, spatial audio formats such as wave field synthesis with
a large number of surround speakers along the walls were introduced, mainly for tests; see
Figure 1.8.
These types of formats facilitate the creation of localization effects even inside the lis-
tener area. Compatibility with other systems is not yet solved; therefore, movies with
WFS sound formats are practically unavailable. Research on this topic is being performed
to evaluate the audience localization performance in the case of 3D video projection in
combination with 3D audio reproduction.
Usually the furniture and furnishings in a living room are sufficiently absorptive
to control the reverberation time; for designated home theatres an acoustical study is
recommended, perhaps specifying absorbers such as curtains or tapestries. Strong side- and
rear-wall reflections should be avoided at the listening position. The surround loudspeakers
can be installed higher than the listening zone and slightly tilted downwards to avoid
such reflections as well [6].
Additional efforts must be undertaken in designing home cinema rooms for more than
two or three listeners.
1.9.2 Classrooms
Classrooms are comparable to larger meeting rooms or boardrooms; please refer to the
appropriate section. Remote teaching is more important than ever, and more often than not
the classroom may be equipped with video conferencing infrastructure, taking into account
the issues listed above.
• loudspeakers
• microphones
• loudspeaker arrays
• vibration exciters
Sound sources have a certain range of acoustic power they may radiate. Figure 2.1 shows the
range of sound pressure values.
For thousands of years human beings have been familiar with natural sound sources and
their steadily growing portfolio as new sounds are added. Sound systems have been available
since loudspeakers and microphones were developed, that is, for just about 100 years.
Both types of sound sources – natural and electroacoustic – may be controlled in level
and frequency range. Natural speech or acoustic music is controlled by the talker, singer
Figure 2.3 Left figure: scattering coefficient in blue, important for simulation routines of sound
propagation. Right figure: reflection patterns for three selected frequencies, Reflection at
125 Hz (red), local reflections at 4000 Hz (blue) and scattering at 800 Hz (green).
$$RT = \frac{0.163\,V}{4mV - S\,\ln(1-\bar{\alpha})} \approx 0.163\,\frac{V}{A} \qquad (2.1)$$

V volume in m³
A equivalent absorption area in m²
ᾱ mean sound absorption coefficient (frequency-dependent)
S total surface of the room in m²
m damping coefficient as a function of air absorption and frequency in m⁻¹ [1]

$$A = \bar{\alpha}\,S = \sum_i \alpha_i S_i + \sum_n A_n + 4mV \qquad (2.2)$$
The sound power absorbed by the room boundaries is $P_{ab} = \frac{1}{4}\,w_r\,c\,A$; in the steady state this equals the radiated source power P, so the diffuse-field energy density follows as

$$w_r = \frac{4P}{cA} \qquad (2.3)$$
c velocity of sound
While the sound energy density wr is approximately constant in the diffuse sound field, the
direct sound energy and thus also its density decrease in the near field of the source with the
squared distance r from the source, hence given as
$$w_d = \frac{P}{c}\,\frac{1}{4\pi r^2} \qquad (2.4)$$
Strictly speaking, this is valid only for spherical point sources, but at a sufficient distance
from the source it is also acceptable for most practically available loudspeakers, in which
case the energy is weighted by the directivity characteristics.
In this zone of dominating direct sound, the sound pressure loss results as p ~ 1/r. By
doubling the distance r the sound level drops by 6 dB.
If the energy densities of the direct sound and the diffuse sound are equal (wd = wr), (2.3)
and (2.4) can be equated, i.e. one can derive a particular distance from the source, the rever-
beration radius rH. For a spherical source it follows that
$$r_H = \sqrt{\frac{A}{16\pi}} \approx \sqrt{\frac{A}{50}} \approx 0.141\,\sqrt{A} \approx 0.057\,\sqrt{\frac{V}{RT}} \qquad (2.5)$$
For a directional source the eq. (2.5) must be changed to (2.5a) and we obtain the so-called
critical distance Dc:
$$D_c = \Gamma(\vartheta)\,\sqrt{Q}\;r_H \qquad (2.5\mathrm{a})$$
In Figure 2.4 the variation of the overall energy density level 10 lg w dB is plotted as a
function of the distance r from the source (w = wd + wr). In the direct field of the source
one observes a decrease of 6 dB per doubling of distance. For a directional sound source
with the directivity factor Q this can be expressed as 10 lg wd dB ≈ 10 lg Q dB − 20 lg r dB.
Beyond the critical distance Dc there follows an area in which a constant diffuse-field level
10 lg wr dB ~ −10 lg A dB prevails. In a completely free field (A → ∞) the free-field
behaviour (6 dB decrease per distance doubling) would continue (dashed line in Figure 2.4).
Figure 2.4 also shows that the critical distance can be derived graphically; the critical
distance thus obtained here is Dc = 10 m.
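As a worked example, the following Python sketch evaluates equations (2.1), (2.2), (2.5) and (2.5a) for an illustrative room. All input values (room dimensions, mean absorption coefficient, directivity factor Q) are assumptions chosen for demonstration, not values taken from the text.

```python
import math

# Illustrative shoebox room (assumed): 20 m x 15 m x 8 m
V = 20 * 15 * 8                       # volume in m^3
S = 2 * (20 * 15 + 20 * 8 + 15 * 8)   # total surface in m^2
alpha_mean = 0.25                     # assumed mean absorption coefficient
m = 0.0                               # air damping neglected (mid frequencies)

# Equivalent absorption area and reverberation time, eqs. (2.2) and (2.1)
A = -S * math.log(1.0 - alpha_mean) + 4.0 * m * V
RT = 0.163 * V / A

# Reverberation radius for a spherical source, eq. (2.5)
r_H = math.sqrt(A / (16.0 * math.pi))

# Critical distance on axis (Gamma = 1) for directivity factor Q, eq. (2.5a)
Q = 10.0                              # assumed loudspeaker directivity factor
D_c = math.sqrt(Q) * r_H

print(f"A = {A:.0f} m^2, RT = {RT:.2f} s, r_H = {r_H:.2f} m, D_c = {D_c:.2f} m")
```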
Figure 2.4 Sound level in a closed space as a function of the distance from the source.
Figure 2.5 Recommended values of reverberation time at 500 Hz for different room types as a function
of volume.
1 Rooms for oratorios and organ music
2 Rooms for symphonic music
3 Rooms for solo and chamber music
4 Opera theatres, multi-purpose halls for music and speech
5 Drama theatres, assembly rooms, sports halls
Figure 2.6 Optimum reverberation time at 500 to 1000 Hz (according to Russel and Johnson).
Figure 2.7 Tolerance ranges for the recommended reverberation time values vs. frequency.
Figure 2.9 Time behaviour of the sound pressure p(t), of the sound intensity Jτo(t) integrated according
to the inertia of the hearing system, and of the sum energy E(t).
subjective sound impressions, which change from audience area to audience area along
with the variation of these initial reflections. Sound reinforcement systems provide the
option to improve unfavourable reflection behaviours (illustrated in reflectograms) (e.g.
low direct sound, missing short time reflections to support speech intelligibility) by electro-
acoustically compensating for the missing energy in those time intervals wherein room-
acoustical reflections do not occur.
The threshold between early and late energy portions depends on the genre of the
performance and on the build-up time and lies at about 50 ms for speech and at about
80 ms for symphonic music, after the direct sound. Early energy enhances clarity, late energy
enhances the spatial impression. Lateral incident energy within a time range of 25 to 80 ms
may even enhance both clarity and spatial impression [7]. This is of crucial importance for
the planning of sound reinforcement systems.
The definition measure C50 was derived from speech clarity D [8] as defined by Thiele:
$$C_{50} = 10\,\lg \frac{\int_0^{50\,\mathrm{ms}} p^2(t)\,dt}{\int_{50\,\mathrm{ms}}^{\infty} p^2(t)\,dt}\ \mathrm{dB} \qquad (2.6)$$
This means that the more sound energy arrives at the listener’s seat within the first 50 ms
the higher is the speech intelligibility, i.e. the definition. Good speech clarity is generally
given when C50 ≥ 0 dB.
The frequency-dependent definition measure C50 should increase by approx. 5 dB with
octave centre frequencies above 1 kHz (octave centre frequencies 2 kHz, 4 kHz and 8 kHz),
and decrease by this value with octave centre frequencies below 1 kHz (octave centre fre-
quencies 500 Hz, 250 Hz and 125 Hz).
Extensive investigations were carried out to establish a measure for the clarity of classical
music. It was found that with symphonic and choir music it is not necessary to distinguish
between temporal clarity and tonal clarity (the latter determines the distinction between
different timbres) [9]. Both are equally well described by the clarity measure C80:
$$C_{80} = 10\,\lg \frac{\int_0^{80\,\mathrm{ms}} p^2(t)\,dt}{\int_{80\,\mathrm{ms}}^{\infty} p^2(t)\,dt}\ \mathrm{dB} \qquad (2.7)$$
The value for a good clarity measure C80 depends strongly on the musical genre. For romantic
and most classical music, a range of approximately −3 dB ≤ C80 ≤ +4 dB is regarded as good,
whereas jazz and modern music will allow for values of up to +6 dB to +8 dB.
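Given a measured room impulse response, both measures reduce to the same early-to-late energy ratio with a different split time. A minimal Python sketch (the synthetic exponentially decaying impulse response below is an assumption used only for illustration):

```python
import numpy as np

def clarity(ir: np.ndarray, fs: int, split_ms: float) -> float:
    """Early-to-late energy ratio in dB, cf. eqs. (2.6)/(2.7)."""
    n = int(round(split_ms * 1e-3 * fs))  # sample index of the early/late boundary
    energy = ir.astype(float) ** 2
    return 10.0 * np.log10(energy[:n].sum() / energy[n:].sum())

# Synthetic impulse response (assumed): exponentially decaying noise,
# corresponding to a reverberation time of roughly 1.7 s
fs = 48000
t = np.arange(int(0.8 * fs)) / fs
rng = np.random.default_rng(0)
ir = rng.standard_normal(t.size) * np.exp(-t / 0.25)

print(f"C50 = {clarity(ir, fs, 50.0):.1f} dB")
print(f"C80 = {clarity(ir, fs, 80.0):.1f} dB")
```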
The most common measures and methods are:
1. Speech Transmission Index STI (developed by Houtgast and Steeneken in 1972 [10],
1985 [11])
2. Articulation Loss of Consonants Alcons (developed by Peutz and Klein in 1971 [12, 13])
3. Subjective intelligibility tests
As this topic is of major significance for the use of sound reinforcement systems, the details
will be laid out in a separate chapter (Chapter 7).
At a distance of r = 0.28 m from the assumed point source, the sound pressure level Ld and
the sound power level LW are numerically equal. At just 1 m distance (the reference distance)
both levels already differ by 11 dB, i.e. with a sound power of 1 W (⇒ LW = 120 dB sound
power level) the sound pressure level amounts to just Ld = 109 dB.
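These figures follow from the standard free-field relation between the sound pressure level and the sound power level of a point source:

$$L_d = L_W - 20\,\lg\!\left(\frac{r}{1\,\mathrm{m}}\right)\mathrm{dB} - 11\,\mathrm{dB}, \qquad 20\,\lg(0.28) \approx -11\ \mathrm{dB}$$

At r = 0.28 m the distance term cancels the 11 dB, so Ld = LW; at r = 1 m the difference is exactly the quoted 11 dB.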
For open-air installations where the distance between loudspeaker and listener may be
exceptionally large, an additional propagation attenuation Dr depending on temperature
and relative humidity must be considered. In this case the sound pressure level at distance r
is calculated for the assumed point source as above, with the additional attenuation Dr subtracted.
Curve 3 in Figure 2.10 illustrates the average value of the empirically derived curve
family that should be used in practice. It reveals that up to a distance of 40 m no
additional attenuation needs to be considered; this applies, for instance, to nearly all indoor
rooms. The additional attenuation Dr increases with increasing frequency (Figure 2.11).
This behaviour needs to be taken into account when designing a sound reinforcement
system.
Owing to the thermal expansion of air, the speed of sound increases by about 0.6 m/s
per kelvin. This implies that in a layered atmosphere, in which the individual
air layers are of different temperatures, the sound propagation is modified accordingly
(Figure 2.12) [14].
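A commonly used linear approximation of this temperature dependence (with the temperature θ in °C) is

$$c(\theta) \approx (331.4 + 0.6\,\theta)\ \mathrm{m/s}$$

which yields c ≈ 343 m/s at 20 °C.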
Figure 2.11 Atmospheric damping Dr as a function of distance r at 20°C, 20% relative humidity and
very good weather.
Parameter: frequency ƒ
Figure 2.12 Sound propagation influenced by temperature. (a) Negative temperature gradient; sound
speed decreases with increasing height. (b) Positive temperature gradient; sound speed
increases with increasing height.
Where the air is warmer near the ground and colder in the upper layer, an upward diffraction
of sound takes place so that sound energy is withdrawn from the ground-near transmission
path with increasing deterioration of the propagating conditions (Figure 2.12a). This case
occurs for example with strong sunlight on plain terrain as well as in the evening over water
surfaces which were warmed up during the day. Inverse conditions prevail with cool air
at ground level and warmer air in the upper layer, as is the case over snow areas or in the
morning over water surfaces. Under these conditions sound energy is diffracted from the
upper layers down to the lower layers (Figure 2.12b).
Given that wind speeds are relatively low compared to the speed of sound (wind speed
in a storm is approx. 25 m/s, while the average speed of sound c =340 m/s), sound propaga-
tion normally is not significantly influenced by wind. However, due to the roughness of
the Earth’s surface the wind speed is lower at ground level than in higher layers, so sound
Figure 2.13 Sound propagation against the wind (a) or with the wind (b); wind speed increasing in
both cases with height.
propagation may indeed be modified by the wind gradient in a similar manner as by the tem-
perature gradient (Figure 2.13). Thus, speech intelligibility may be significantly decreased
by a very whirly and gusty wind [14].
or a sound intensity of J₀ = 10⁻¹² W/m².
Figure 2.15 Frequency weighting curves recommended by the IEC 61672 for sound level meters.
The lower sensitivity of the auditory system for low and high frequencies at low sound levels is approximated by frequency weighting curves that act like filters in sound level meters and are specified in international standards (Figure 2.15). According to IEC
61672, the A-weighted curve approximately corresponds to the sensitivity of the ear at 30
phons, whereas the B-weighted curve and the C-weighted curves correspond more or less to
the sensitivity curves of the ear at 60 phons and 90 phons, respectively [16].
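For orientation, the A-weighting curve can be evaluated with the analytic approximation given in IEC 61672; the constants below are the standard pole frequencies of that definition, and the +2.00 dB term normalizes the curve to 0 dB at 1 kHz:

```python
import math

def a_weighting_db(f):
    """A-weighting in dB (IEC 61672 analytic approximation),
    normalized so that A(1000 Hz) = 0 dB."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00

for f in (63, 125, 1000, 4000, 16000):
    print(f"{f:5d} Hz: {a_weighting_db(f):+6.1f} dB")
```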
Figure 2.16 Excitation level LE on the basilar membrane caused by narrow-band noise with a centre
frequency of 1 kHz and a noise level of LG (indicated on the x-axis between 8 and
9 Bark).
z subjective pitch
Figure 2.17 Excitation level LE over the subjective pitch z (position on the basilar membrane), when
excitation is done by narrow-band noise of LG =60 dB and a centre frequency of ƒc.
LT resting threshold
neighbouring ranges. The inaudibility of weak background noises in the presence of much
louder audio signals may be attributed to the masking effect as well.
In this respect it is also interesting to know how to arithmetically summarize sound levels in a given spectrum. Since levels are logarithmic quantities, it is necessary to add the p²-proportional energy contents rather than the levels themselves. This is simple if n incoherent sound sources are of the same level L and the same spectrum, resulting in an overall level of LΣ = L + 10 lg n dB.
With sound stimuli arriving at the listener’s position in addition to the direct sound from
sound reinforcement systems (simultaneously or briefly delayed) the sound components to
be added are often of different levels. For ascertaining the overall level, one may use the
nomogram given in Figure 2.18.
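The calculation behind such a nomogram is a plain energetic summation; a minimal sketch:

```python
import math

def level_sum(*levels_db):
    """Energetic sum of incoherent sound pressure levels (the
    calculation behind the nomogram in Figure 2.18)."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in levels_db))

print(f"{level_sum(80.0, 80.0):.1f} dB")  # two equal levels -> +3.0 dB
print(f"{level_sum(80.0, 70.0):.1f} dB")  # 10 dB apart -> +0.4 dB above 80 dB
```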
Figure 2.19 Early to-late ratio of the sound reaching the ears, as a function of the incidence angle.
Figure 2.20 Variation of the sound level difference with discrete frequencies and horizontal motion of
the sound source around the head.
but continues localization by the primary event. The blurring threshold (see above) lies at about 50 ms. With longer delay times, timbre changes become noticeable from about 30 ms onwards, and at ≥ 50 ms distinct echoes (‘chattering effect’) are audible.
The precedence effect is frequently used for the localization of sources in sound reinforce-
ment systems. This will be covered in detail in Chapter 5.
If identical signals from two loudspeakers show a mutual delay of max. 3 ms at the point
of arrival they merge into one single signal as far as localization is concerned. The resulting
effect is a phantom source between the two directions of incidence, called a summing localization effect. If neither a delay nor a level difference exists between the two signals, the
phantom source is located on the bisecting line between the directions towards the two
loudspeakers. Delay or attenuation of one of the two loudspeaker signals causes a shift of
the phantom source determined by the summing localization effect and leaning towards the
loudspeaker radiating either the earlier or the louder signal.
Figure 2.21 Critical level difference ΔL between reflection and undelayed sound producing an appar-
ently equal loudness impression of both (speech) signals, as a function of the delay time Δt.
Soon it became obvious that the sound level produced by the system within the audience
area of the hall or in the open air must be sufficiently high. The following quantities are
related herewith:
Figure 2.22 Use of a Magnavox sound system in 1919 to target 75,000 people in San Diego.
Figure 2.23 Mass event in Germany in the 1930s with the then newly developed horn loudspeakers
and condenser microphones.
From the 1920s engineers strived to improve the components of a sound system step by step. In
the US, the development of horn and column loudspeakers was mainly driven by the expanding
movie industry, while during the Nazi regime in Germany efforts were being undertaken to
engineer sound systems to cover large halls and fields for mass events. New microphone types
based on the dynamic and electrostatic principle were developed as well; refer to Figure 2.23.
A rapid development of new loudspeaker types took off, mainly horn loudspeakers in the
US and line columns in Europe.
In 1957 a comprehensive book on sound system design was published: Petzold [25] set out the first basic design guidelines for sound coverage in stadia and theatres; see Figure 2.24.
The guidelines covered:
Figure 2.25 Schematic diagram of the magnetic tape delay system with perpetual tape loop, record
head to print the original signal and various reproduction heads at varying distances
down the tape loop to obtain a variation of delay times.
In the 1960s Olson introduced the first simple analogue tape delay lines (Figure 2.25) in
sound systems, which subsequently were further developed with the introduction of digital
delay units as a critical component of sound systems to compensate for travel time [26].
In 1975 Don Davis published his book ‘Sound System Engineering’ [27], which introduced
calculations regarding required acoustic gain and potential acoustic gain (the latter in con-
sideration of the positive acoustic feedback; Figure 2.26).
In the early 1980s the book ‘Basics in Sound Reinforcement’ was published in
German [28]. Around that period newly available computers were starting to be used to simulate acoustic and sound system relationships during the planning phase of a space, before the
installation was actually executed.
Acoustic calculation software packages like Odeon, CATT-Acoustics and EASE became
available.
In sound systems design, the following requirements could be found:
Figure 2.26 Basic measures for first sound reinforcement and first feedback considerations.
• the timbre depending on the transmission range and the frequency response of the
signal transmitted
• the required frequency response
• absence of distortions
In the beginning of the 1990s the first line array loudspeaker became available (Heil and
Urban, 1992 [29]) and in 1998 the first electronically controlled line column loudspeakers
were introduced (Duran Audio, 1998 [30]); compare Figure 2.27.
After the events of 11 September 2001, the focus of new developments and designs was
directed to high-performing emergency sound systems. The updated standard IEC 60268–
16 was published to specify the requirement of high speech intelligibility in public spaces.
In the existing acoustical software simulation packages, the influence of noise impact and
masking on speech intelligibility had to be considered and included. New loudspeaker design
developments can be observed as well such as highly directional loudspeaker arrangements
based on wave field synthesis [31].
In 2015 complex manipulation of sound fields was introduced in modern sound system
installations. Sound sources on stage should be correctly localized throughout the area [32],
while the concept of ‘immersive’ sound reproduction includes not only loudspeakers for
sound level coverage and speech intelligibility but also the creation of a natural acoustic
soundscape surrounding a listener in rooms and halls as well as in open-air installations [33].
Figure 2.27 (left) 3 dB attenuation by line array principle (C. Heil and M. Urban), (right) digitally controlled line column by Duran Audio.
All these requirements are the basis for sound reinforcement design work and serve as an
intermediary between the requirements of the client and the boundaries of technical feasi-
bility. In this context the designer must be informed in as much detail as possible not only about the technological requirements to be met by the system, but also about the room-acoustical conditions under which the sound system must work. Only through this process can an optimal solution be achieved.
These parameters are put together in a requirements document and are used as target
parameters for the new design.
Additionally, the design must consider all factors interfering with visibility conditions
for the audience. This is of particular importance as the loudspeakers are directed towards
the audience and thus are mostly arranged in an area susceptible to architectural or interior
design. Visual screening or ‘hiding’ of loudspeakers is possible only to a limited degree and
entails several acoustical problems like limited transmission range or poor sound coverage.
The requirements can vary considerably. For example, in sports facilities it is particularly important not to impair the view of the playing field. Requirements regarding field
clearance, as specified by international sport associations, must also be considered.
In modern multi-purpose halls, a visible arrangement of the loudspeakers is quite accept-
able (Figure 2.28), but the overall architectural design should not be disturbed nor must
the lighting and video projection facilities be impaired. Complicated conditions may arise
in highly prestigious or historically protected spaces. The example in Figure 2.29 illustrates such a solution, with the loudspeakers hidden in the portal area behind sound-transparent fabric.
Figure 2.29 Hidden loudspeakers in the portal area behind the blue fabric.
These conditions are nearly all relevant for medium and large multi-purpose halls. The
exposed position of such an audio mixing console, the loss of good audience seats, the
necessity of providing easy accessibility without disturbing the audience or being disturbed
by them, as well as the avoidance of visibility impact involve a number of design challenges
which have to be solved in close cooperation between the architect, the manager of the
venue and the sound system designer.
With smaller halls and cultural centres, the situation may be different. Sound systems
therein are usually controlled from a separate audio control room. However, hook-up points
for mobile mixing consoles should be available in the audience area when required.
In theatres the conditions are similar: hook-up points for mobile or permanently installed
mixing consoles within the audience area are required for the adjustment of the loud-
speaker system during rehearsals and for the performance of musicals and other productions
requiring a high standard of audio engineering.
(a) The feedback loop of the electroacoustic amplification circuit does not only contain an
electrical part but also an acoustically audible part.
(b) It is practically impossible to separate the feedback path in the room by subdividing it into individual parts (e.g. electroacoustic installation, acoustical path in the room).
(c) The feedback can occur via numerous loops and paths; the nature of acoustical feed-
back is more complex than that in purely electrical networks.
B(ω) = [(β₀ − µβRβ₀ + µβ₁β₂) / (1 − µβR)] · A(ω)   (2.10)

Expanding the denominator as a geometric series yields

B(ω) = (β₀ − µβRβ₀ + µβ₁β₂) · A(ω) · Σᵢ₌₀…∞ (µβR)ⁱ   (2.10a)
Whether the feedback system remains stable or becomes unstable depends on the term Σᵢ₌₀…∞ (µβR)ⁱ.
Because µ βR is frequency-dependent, the condition applies for all frequencies and not
only for the average values of the so-called open-loop amplification. With growing µ βR
the value B(ω) becomes very large. As βR and µ are complex numbers, the term can be
expressed as follows:
µβR = G · e^(jϕ)
In doing so, G and ϕ can be regarded as the amplification and phasing of the signal in a
closed loop. G and ϕ generally vary with the frequency. The feedback system is always stable
when the following two conditions are fulfilled:
Im{µβR} ≠ 0
Re{µβR} < 1, i.e. G < 1
d = λ(n + ϕ/2π)
fn = n · c/d;   n = 1, 2, 3, …
Acoustic positive feedback occurs at these frequencies fn (Figure 2.31). As c is always con-
stant, and d is constant for a certain configuration, f1 represents the fundamental frequency
at which the positive feedback sets in. It will then reappear at all integer multiples of this
fundamental frequency. A formerly even frequency response of an installation hence will
be modified by the feedback in a comb filter type pattern of periodic reoccurrence within the
audio spectrum.
Feedback occurs with the so-called loop gain |µβR| = |vS(f)| → 1. By increasing or reducing the distance d, positive feedback sets in periodically (loop amplification vS → 1 is assumed).
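A short sketch of this relationship; the 2 m loudspeaker-to-microphone distance is a hypothetical example value:

```python
def feedback_frequencies(d_m, c=340.0, n_max=5):
    """First few potential positive-feedback frequencies fn = n*c/d
    for a loudspeaker-microphone distance d (free field)."""
    return [n * c / d_m for n in range(1, n_max + 1)]

print([f"{f:.0f} Hz" for f in feedback_frequencies(2.0)])
# -> ['170 Hz', '340 Hz', '510 Hz', '680 Hz', '850 Hz']
```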
Whereas in a free sound field a comb filter curve characterizes the frequency behaviour
of the sound transmission from the loudspeaker to the microphone, many such comb filter
curves simultaneously act in the room due to reflections off the room’s limiting surfaces (a
sum of infinitely many curves) as a sound transmission curve between the loudspeaker and
the microphone. These statements, however, do not just apply to coefficient βR, but for all βi
in the room. Comprehensive investigations have been carried out regarding the frequency dependence of these transmission curves (i.e. frequency curves, sound transmission curves
of the room, marked as coefficient βi in Figure 2.31). Schroeder, Kuttruff and Thiele [35,
36, 37] showed that the statistical parameters of frequency curves in different rooms above a
threshold frequency fL are equal and depend only on the reverberation time RT.
fL = 2000 · √(RT / V) Hz   (2.11)

with the reverberation time RT in s and the room volume V in m³.
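Equation (2.11) is straightforward to evaluate; the room data below are hypothetical example values:

```python
import math

def threshold_frequency(rt_s, volume_m3):
    """Threshold frequency fL = 2000*sqrt(RT/V) per eq. (2.11), above
    which the statistics of room transmission curves depend only on RT."""
    return 2000.0 * math.sqrt(rt_s / volume_m3)

print(f"{threshold_frequency(1.8, 12000):.0f} Hz")  # large hall  -> ~25 Hz
print(f"{threshold_frequency(0.5, 80):.0f} Hz")     # small room  -> ~158 Hz
```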
2.6.1.3.1 DIFFERENCE BETWEEN THE AVERAGE VALUE AND THE PEAK VALUE OF A FREQUENCY CURVE
The peak values of the transmission curve shown in Figure 2.33 are the frequencies where
feedback will occur if the average amplitude of the transmission is appropriately enhanced.
The level difference between the peak and the average value of a transmission curve is of
interest: it has been demonstrated [37, 38] that, depending on the reverberation time RT and the bandwidth B of the transmitted signal, feedback occurs when the peak value exceeds the average value by a specific difference ∆L:

∆L = 10 · log10(ln N) dB

The number N of values into which the investigated frequency range of the bandwidth B is subdivided results from N = 0.1·B·RT. Calculations have found the relationship for ∆L in dB shown in Figure 2.34.
Figure 2.34 Difference of the peak and average value of the sound transmission curve, with B the bandwidth in Hz, RT the reverberation time in s and ε the error probability.
2.6.1.3.2 DIFFERENCE BETWEEN THE AVERAGE VALUES OF THE FREQUENCY CURVE AND THE POSITIVE
FEEDBACK THRESHOLD LEVEL
For optimizing the achievable amplification, the determining factor is not the difference
between the average value and the peak value of the frequency curve, but the level diffe-
rence between the average value of the sound transmission curve and the positive feedback
threshold.
This difference X is identical to the so-called loop amplification vS. As a result, we obtain
X = −20 · log10(vS) dB

To avoid positive feedback, it must be ensured at all times that X does not fall below a certain value ∆L ≈ 6–12 dB (depending on the bandwidth). From experience it is known
that for voice transmissions one can operate with a feedback reserve of 3 dB, which results
in X =9–15 dB. Kuttruff used a feedback reserve of approx. 5 dB for speech, and 12 dB for
music [39]. Hence X results in 11–17 dB or 18–24 dB, respectively.
For computer simulations not just including loudspeakers but also microphones a feed-
back factor R(X) is introduced:
R(X) = vS² / (1 − vS²) = 10^(−X/10) / (1 − 10^(−X/10))
It is obvious that R(X) is between 0.01 and 0.1 for the practical values of X (10–20 dB). This
order of magnitude may be used for practical calculations. By introducing the feedback level
LR =10 log10(R(X)) the relationship between LR and X is shown in Figure 2.35.
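The following sketch evaluates R(X) and LR for the practical range of X mentioned above:

```python
import math

def feedback_factor(x_db):
    """R(X) = 10^(-X/10) / (1 - 10^(-X/10)) for a feedback margin X in dB."""
    v2 = 10.0 ** (-x_db / 10.0)   # squared loop amplification vS^2
    return v2 / (1.0 - v2)

for x in (10.0, 15.0, 20.0):
    r = feedback_factor(x)
    lr = 10.0 * math.log10(r)     # feedback level LR
    print(f"X = {x:4.1f} dB: R(X) = {r:.3f}, LR = {lr:+.1f} dB")
# R(X) ranges from about 0.1 (X = 10 dB) down to 0.01 (X = 20 dB)
```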
Whereas in a free sound field the frequencies which lead to a positive feedback are deter-
minable, sound systems in rooms do not permit any statements about the absolute values of
positive feedback frequencies. It is only apparent that at the particular frequency with the
greatest peak the positive feedback occurs if the amplification µ is increased. If the location
of the loudspeaker or the microphone in the room is modified the positive feedback will
most probably occur at a different frequency as the greatest peak for this new geometrical
layout is located at a different frequency.
Figure 2.35 Relationship between the feedback threshold X and the required feedback level LR.
PL = µ(wMS + wML) = µ·wM

The second term in the sum determines the positive feedback sensitivity of the system. If the loop amplification of the system

vS² = wML / wM → 1

the amplification circuit becomes unstable; rearranging yields

R(X) = wML / wMS = vS² / (1 − vS²), with vS² = 10^(−X/10), as shown above
By inserting LR = 10 · log10 R(X) dB it can be stated that feedback can be avoided when using a sound system where LMS − LML = LR, with:
LMS total sound level on the microphone produced by the original sound source
LML total sound level on the microphone produced by the loudspeaker system
LR feedback level
To securely avoid feedback in all instances LR should be 12–15 dB, and a recommended
minimum value of LR should not fall below 6–9 dB (use of directional microphones).
The given sequence reflects the technical complexity and sophistication for the systems.
An important consideration for selection and arrangement of the sound reinforce-
ment devices and thus also for the selection of the technical solution to be installed is
the spatial relationship in which the listener is located regarding the original or supposed
(playback) source of the transmitted sound event. This original source may be completely
separated from the listener, as is the case with the aforementioned announcement systems
(for instance in department stores, transportation hubs etc.); it may be located in the same
room as the listener (in the action area) but separated from the listener (in the reception
area), or both areas may also overlap. Acoustical feedback may occur in the latter two cases.
Sound systems facilitate the transportation of information from a source to the listener
and operate in an environment (hall or open space) determined by the room acoustic prop-
erties of that space. These room acoustic properties are not greatly influenced by the sound
system but may be much influenced by a specifically installed reverberation enhancement
system. Establishing good audibility indoors as well as in the open air has been and remains
the subject of room and electro-acoustics.
Integrating electro-and room acoustics as well as architectural solutions is often challen-
ging. Some of the critical areas are:
• the sound source in question has only a limited sound power rate, so the dimensions of
the speakers may become an issue for the architectural design
• modifications of room acoustics may consequently lead to major updates to the archi-
tectural design and thus might not be optimally applied
• measures regarding room acoustics may cause a considerable number of constructional
changes and these can only be optimally integrated for one single intended purpose of
the room
• the constructional modification, despite its high costs, may result in only a limited effect
Because of these reasons sound systems are increasingly employed to adjust specific room
acoustic properties, thus improving audibility, intelligibility and spaciousness. At the lis-
tener position the level of direct sound is of great significance. Also ‘short time reflections’,
enhancing intelligibility of speech and clarity of music, can be provided by means of sound
systems.
• direct sound
• initial reflections with direct-sound effect
• reverberant initial reflections
• reverberation
For this reason, electronic techniques have been developed that introduce the possibility
of increasing the direct sound or reverberation time and energy in the hall, hence directly
modifying the acoustic room properties.
These methods of enhancing the room-acoustic properties of spaces are the application
of so-called ‘electronic architecture’. A good acoustic design is achieved when, listening to an event, it is impossible to distinguish whether the sound quality results from the interaction of the original source with the space alone or from an electro-acoustic enhancement system.
Therefore, enhancement systems are normally operated by the stage manager of the hall. The
configuration of such a system is calibrated during installation and programmed for different
applications like concerts, operas or speeches and the parameters cannot be modified by
the sound engineer. Hence the sound reinforcement system must be considered completely
separated from the enhancement system, although certain components of both systems like
loudspeakers may be used by both systems. An enhancement system is a designated part of
the room-acoustic properties of the hall which are not to be modified during a performance.
Figure 2.36 Block diagram of the ACS system.
The EMR unit scans the boundary microphones in cycles while the FIR filters impede
feedback.
For enhancing the reverberation, the microphones are partially located in the diffuse
sound field and partially in the source area (green dots in Figure 2.37b).
The loudspeakers are located at the wall and ceiling areas of the room. For enhancing
the early reflections four to eight microphones are located in the ceiling area near the
sources. The signals picked up by these are passed through FIR filters and are reproduced
as lateral reflections by loudspeakers located in the wall and ceiling areas of the room. The
loudspeakers are arranged in such a way that they cannot be located, since their signals are
to be perceived as natural reflections.
Furthermore, the AFC system allows signals to be picked up, e.g. in the central region of
the audience area, and the reproduction of them via ceiling loudspeakers in the area below
the balcony for the sake of enhancing spaciousness.
A (the theatre or concert hall) with a secondary room B (the ‘reverberant room processor’).
Simultaneously the number of reproduction channels is reduced along with the timbre
change of sound events. An enhancement of the early reflections is obtained as well; see
Figure 2.38.
Within the Constellation system a multitude of small loudspeakers L1 to LN (N =40 to
50) is distributed in the room, which, of course, may also be used for panorama and effect
purposes. Ten to 15 strategically located and visually inconspicuous microphones M1 to MN
pick up the sound and transmit it to the effect processor X(ω) in which the desired and
adjustable reverberation takes place. The output signals thus obtained are fed back into the room via the distributed loudspeakers.
2.7.2.4 Vivace
Vivace, developed by Mueller BBM, ensures a high degree of detail veracity, accurate tran-
sient response, and exceptional feedback stability. The result is a homogeneous and entirely
realistic three-dimensional sound which meets individual on-site acoustic requirements
with high flexibility and accuracy.
Vivace also enables sources and effects to be moved around, virtually, in the acoustic
environment.
A Vivace system consists of a few microphones picking up the performance on the
stage, the room-enhancement mainframe, an audio I/O matrix system, multichannel
digital amplifiers, monitored remotely, and loudspeakers. Vivace digitizes, analyses and
processes incoming stage-microphone signals in real time and subsequently plays them
back over precisely positioned speakers; refer to Figure 2.39. Using an intelligent con-
volution algorithm Vivace can recreate almost any environment in a low-reverberant
space [44].
References
1. Vitruvius, The Ten Books on Architecture, edited by I.D. Rowland and T. N. Howe. Cambridge
University Press, March 2001.
2. Xiang, N., Architectural Acoustics Handbook, chapter 12. J. Ross Publishing, 2017.
3. Ballou, G., Handbook for Sound Engineers, 5th ed., chapter 9. New York and London:
Focal Press, 2015.
4. DIN 18041:2016–03.
5. ISO 3382-1 and 2 and 3.
6. Sabine, W.C. Collected papers on acoustics. Cambridge, MA: Harvard University Press, 1923.
7. Lehmann, U. Untersuchungen zur Bestimmung des Raumeindrucks bei Musikdarbietungen und
Grundlagen der Optimierung. Diss. Tech. Univ. Dresden, 1974.
8. Thiele, R. Richtungsverteilung und Zeitfolge der Schallrückwürfe in Räumen. Acustica (1953)
Beih. 2, p. 291.
9. Reichardt, W., Abdel Alim, O., and Schmidt, W. Definitionen und Meßgrundlage eines objektiven
Maßes zur Ermittlung der Grenze zwischen brauchbarer und unbrauchbarer Durchsichtigkeit bei
Musikdarbietungen. Acustica 32 (1975) 3, p. 126.
10. Houtgast, T., and Steeneken, H.J.M. Envelope spectrum and intelligibility of speech in enclosures, presented at the IEEE-AFCRL 1972 Speech Conference.
11. Houtgast, T., and Steeneken, H.J.M. A review of the MTF concept in room acoustics and its use
for estimating speech intelligibility in auditoria. J. Acoust. Soc. Amer. 77 (1985) pp. 1060–1077.
12. Peutz, V.M.A. Articulation loss of consonants as a criterion for speech transmission in a room.
J. Audio Engng. Soc. 19 (1971) 11, pp. 915–919.
13. Klein, W. Articulation loss of consonants as a basis for the design and judgement of sound
reinforcement systems. Journal of the AES 19 (1971) 11, pp. 920–925.
14. Herrmann, U.F. Handbuch der Elektroakustik (Handbook of electroacoustics). Heidelberg:
Hüthig, 1983.
15. Barkhausen, H. Ein neuer Schallmesser für die Praxis. Z. tech. Physik (1926). Z. VDI (1927)
pp. 1471 ff.
16. Class 1 Sound Level Meter IEC 61672:2013.
17. Zwicker, E. Ein Verfahren zur Berechnung der Lautstärke. Acustica 10 (1960) p. 304.
18. ISO 532-1, Ed. 2017, Method for calculation of loudness –Part 1: Zwicker method.
19. Zwicker, E., and Feldtkeller, R. The Ear as a Communication Receiver. Acoustical Society of
America, 1999.
20. IEC 651 Ed. 1994. Precision sound level meters.
21. Dietsch, L. Objektive raumakustische Kriterien zur Erfassung von Echostörungen und Lautstärken
bei Sprach-und Musikdarbietungen (Objective room-acoustical criteria for registering echo
disturbances and loudnesses in speech and music performances). Diss. Tech. Univ. Dresden, 1983.
22. Blauert, J. Räumliches Hören (Spatial hearing). Stuttgart: Hirzel, 1974.
23. Jeffers, L.A., and McFadden, D. Differences of interaural phase and level detection and localization. J. Acoust. Soc. Amer. 49 (1971) pp. 1169–1179.
24. Haas, H. Über den Einfluß eines Einfachechos auf die Hörsamkeit von Sprache (On the influ-
ence of a single echo on the audibility of speech). Acustica 1 (1951) 2, pp. 49 ff.
Further Reading
Taschenbuch Akustik, Teil 2, VEB Verlag Technik Berlin, 1984.
Handbook of Acoustics, T.D. Rossing (Ed.), Chapter 18. Springer, 2007.
Everest, A.F. The Master Handbook of Acoustics, 4th edn. New York: McGraw-Hill, 2001.
3 Loudspeakers
Gottfried K. Behler
DOI: 10.4324/9781003220268-3
Figure 3.1 Sectional view of a magnetostatic ribbon tweeter of modern design. The conducting wires
are thin copper strips bonded directly to the thin foil membrane.
The more common magnetostatic loudspeaker is the modern version of the ribbon loud-
speaker and is thus a transducer working according to the dynamic principle. To achieve an
impedance between 4 and 16 ohms, which is typical for loudspeakers, the current-carrying
conductor (which in the classic ribbon was the low-impedance metallic ribbon itself) is
applied as a conductive flat voice coil on a thin plastic diaphragm, which is located within
a strong magnetic field.
The most common construction when high forces and large linear excursion of the mem-
brane are required (such as in a woofer for low-frequency reproduction) uses a circular voice
coil in a radial magnetic field.
The ring magnet between the two pole plates (usually a ferrite or neodymium compos-
ition) generates a constant magnetic field in the air gap of strength B. The cylindrical voice
coil is located in the centre of that gap; hence, the magnetic field crosses the wire section
of the voice coil. The voice coil is fixed to the membrane and is only allowed (by the spider
and the suspension) to move in an axial direction.
The current (I) in the voice coil and the magnetic field (B) cross perpendicularly; hence, the resulting force (F, given by the vector formula F = I·(l × B)) is directed in the axial direction and generates the desired excursion of the membrane. When the voice coil is moved
from its resting position, the spider is deformed, which produces the required restoring force
to take the membrane back to its resting position after switching off the current. The sus-
pension at the outer edge of the membrane guides the membrane without introducing much
force and is needed to seal the moveable parts from the fixed parts of the speaker.
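Because the radial field and the circular voice-coil wire are perpendicular everywhere, the magnitude of the driving force reduces to F = B·l·I. A minimal sketch with hypothetical motor data:

```python
def voice_coil_force(b_tesla, wire_length_m, current_a):
    """Axial force on the voice coil, F = B*l*I, valid because the
    radial field and the wire cross perpendicularly everywhere."""
    return b_tesla * wire_length_m * current_a

# Hypothetical woofer motor: B = 1.1 T, 8 m of wire in the gap, 5 A drive
print(f"F = {voice_coil_force(1.1, 8.0, 5.0):.1f} N")  # -> 44.0 N
```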
Typical materials for the magnet are ferrite and metallic alloys like neodymium (NdFeB).
For high flux density and low weight, the material neodymium is used, which allows small,
highly efficient and lightweight transducers to be built. One of the critical characteristics of
NdFeB, however, is its relatively low Curie temperature, where the material loses its ferro-
magnetic properties. Temperatures higher than 100°C are critical and careful cooling of the
loudspeaker motor unit is required.
Most membranes for woofers today are still manufactured from paper pulp. High-quality
synthetic membranes like cast polymer membranes or sandwich membranes are rare,
whereas membranes manufactured from foils using hot pressing techniques are more popular
because of the cheap manufacturing process. In compression drivers for high frequencies,
metallic membranes made of aluminium or titanium are often used due to their limited
weight and high stability.
The spider is made of synthetic fabric, which is pressed into folds to give a linearly increasing restoring force with excursion and is soaked with a special resin to reach the appropriate stiffness. The
outer suspension of high-power transducers is often made of fabric whereas for higher excursions
a rolled rubber suspension is used. The outer suspension should not induce strong reaction forces
(as the spider does) but is intended to damp bending waves of the membrane to avoid standing
waves known as ‘breakup’ modes. These breakup modes are the cause of peaks and dips in small
frequency bands, which cause linear distortions of the transfer function and may become aud-
ible as resonances. The mechanical construction of the frame holding the heavy magnet as well
as the lightweight membrane must be sturdy and should be free from resonances. The selection
of moulded aluminium frames for high-quality speakers is quite common. Nevertheless, a well-
designed pressed-steel frame is good as well and may be less costly.
more transducers and the corresponding cabinet. In the following a distinction is made: the
loudspeaker as a transducer component, and the loudspeaker as an enclosure with an
arrangement of a certain number of spatially distributed point sources, which interact with
each other (e.g. line source, loudspeaker arrays, multi-way loudspeakers).
Figure 3.3 Some typical loudspeaker types that can be described as point sources. Typical compact
PA systems in upper part (Klein+Hummel). Ceiling speaker horn systems in the lower
part: left, a large cinema horn by JBL and right a Tannoy speaker.
changed and for example tilted up- or downwards. These systems are very useful for ampli-
fication in reverberant spaces to increase the speech intelligibility by directing the amplified
sound to the audience seats and not exciting the room too much.
Since most arrays are actually constructed of individual point sources, the focus is put on
the measurement and description of the output of point source loudspeakers.
Figure 3.4 Comparison of the so-called ‘stacking’ array of speakers (left picture) and the nowadays
more common ‘line array’ concept (centre picture) to cover large audience areas. Rightmost
a typical active, digitally steerable line array using multiple identical loudspeakers mounted
in one line, individually driven by a DSP amplifier allowing individual adjustment of fre-
quency, time and phase response of the 16 point sources to create a coherent and nearly
cylindrical wave front (Renkus-Heinz).
• expectation for fidelity (linear and non- linear distortion) with respect to the
program
• budget limitations
• definition of target audience area and possibly areas with requirements for quietness
• architectural restraints (size, quantity and acceptable locations for loudspeakers)
Given that current signal processing can apply almost any required correction to the signals fed to the loudspeakers (i.e., equalizing, delaying etc.), the
main interest is in the specific output properties that are inherently connected to the loud-
speaker and cannot be changed by processing. Moreover, the order of importance is helpful
to find out which is the economically most efficient system.
In the first place the application of the loudspeaker requires consideration: it makes a
big difference whether an emergency call is to be announced or music is to be reproduced
at the highest sonic quality. This, for example, defines the lower and the upper limit of the
frequency range and to some extent the smoothness of the response, the permissible distor-
tion and the maximum output capability. Further details will be discussed in the following
sections 3.2.1 to 3.2.4.
One of the most underestimated properties of loudspeakers is their directivity as it
cannot be changed by manipulation of the input signal since it is inherently connected
to the mechanical design of the cabinet and the arrangement of the different transducers.
An exception is digitally controlled (DSP) loudspeaker arrays, where each of the many
transducers (arranged either as a line source or as a two-dimensional planar array) can be
individually driven by dedicated amplifiers with signals suitably filtered for magnitude and
phase so to create a specific directivity pattern; see section 3.3.5.
Other parameters which are difficult to correct by signal-processing are:
• power handling
• power compression
• distortion
• resonance behaviour
Lmeasured = 20 · log(pTF / p0) dB (SPL)   (3.1)
Figure 3.5 Frequency response of a loudspeaker system plotted for a rated input voltage of 2.83 V and
a measurement distance of 1 m on axis. Sensitivity (see equation (3.2)) and bandwidth
with respect to upper and lower cut-off frequency is depicted with dashed lines.
In Figure 3.5 the sensitivity can be found to be 95 dB for a given input voltage of 2.83 volts
at a distance of 1 m. The lower cut-off frequency is at 38 Hz and the upper cut-off is at
18 kHz. The narrow dips in the frequency response above 12 kHz may be ignored because they
are less than 1/3 octave wide. The sensitivity definition takes the nominal input impedance
Zn of the loudspeaker into account, which can be derived from the frequency-dependent
impedance plot as given for example in Figure 3.6. To assess the efficiency of loudspeakers,
an electrical power of 1 watt into the nominal impedance is required. Therefore, the ‘nom-
inal’ input voltage can be calculated as Un = √(Zn · 1 W). For an 8 ohm nominal impedance the input voltage needs to be 2.83 V.
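A one-line helper for this conversion (the 4 ohm case is added for comparison):

```python
import math

def nominal_voltage(z_nominal_ohm, power_w=1.0):
    """Input voltage driving the given power into the nominal
    impedance: Un = sqrt(Zn * P)."""
    return math.sqrt(z_nominal_ohm * power_w)

print(f"{nominal_voltage(8.0):.2f} V")   # 8 ohm  -> 2.83 V
print(f"{nominal_voltage(4.0):.2f} V")   # 4 ohm  -> 2.00 V
```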
For a correct measurement of the frequency response, the microphone should be placed
in the far field. This requirement leads to measurement distances rmeas of much more than
1 m, especially for large loudspeaker systems. The following equation refers to a nominal
power of 1 W during the measurement:
Lsens = 20 · log(pTF / p0) + 20 · log(rmeas / 1 m) dB (SPL, 1 W / 1 m)   (3.2)
Figure 3.6 Frequency-dependent input impedance of a loudspeaker. For this system the nominal
impedance Zn was defined by the manufacturer as 8 ohms. Taking into account the tolerance limit, which allows a −20% undercut, this loudspeaker does not fulfil ISO standards.
For both acoustical (sound pressure) and electrical (impedance) transfer functions,
in most cases the absolute value (magnitude) is shown while the imaginary part (phase
response) is often neglected. In Figure 3.7 the phase response plots for Figure 3.5 (acoustical
transfer function, upper graph) and Figure 3.6 (electrical transfer function, lower graph)
are depicted. The phase response of the acoustical output of a loudspeaker system should
be plotted without the excess phase shift introduced by the sound delay due to propagation
from the loudspeaker to the microphone. The phase of the electrical impedance clearly
shows the resonance of the system (bass reflex tuning frequency at about 43 Hz where the
phase response shows a zero-crossing) and the transition from compliance to mass loading
above 120 Hz. For higher frequencies the impedance is dominated by the crossover network.
Due to its high sensitivity, attenuation for the horn loaded compression driver is needed,
which in passive loudspeaker systems is introduced by the crossover network. Therefore,
the input impedance of the high-frequency unit is masked behind the higher impedance of
the network.
Another possible representation of the phase-related behaviour of a loudspeaker system is given by the group delay, calculated as the first derivative of the phase response over angular frequency:

tgr = −dϕ/dω s   (3.3)
Compared to the phase response, the group delay response is a linear distortion figure
showing the frequency-dependent delay introduced by the system (‘energy storage’ in the
Figure 3.7 Upper graph: phase response curves of the sound pressure transfer function (shown in
Figure 3.5); lower graph: the phase response curve of the impedance transfer function
(shown in Figure 3.6).
system). The audibility of group delay distortion very much depends on the frequency and
the amount of variation [2]. Audible effects with loudspeakers are mainly found at low
frequencies where the group delay variation is the highest [3]. The group delay response
derived from the phase response in Figure 3.7 is shown in Figure 3.8.
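Equation (3.3) can be evaluated numerically from a measured, unwrapped phase response; the sketch below, with an artificial pure 1 ms delay as a test signal, is an illustration only:

```python
import numpy as np

def group_delay(phase_rad, freq_hz):
    """Group delay t_gr = -dphi/domega (eq. 3.3), evaluated numerically
    from an unwrapped phase response sampled at the given frequencies."""
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    phi = np.unwrap(np.asarray(phase_rad, dtype=float))
    return -np.gradient(phi, omega)

# Pure delay of 1 ms: phi = -omega * t0, so t_gr must be 1 ms everywhere
f = np.linspace(20.0, 20000.0, 1000)
phi = -2.0 * np.pi * f * 1e-3
print(group_delay(phi, f)[:3])  # -> [0.001 0.001 0.001]
```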
Figure 3.8 Group delay distortion of the phase response shown in Figure 3.7.
The impulse response is said to be the only relevant measurement for a linear time-invariant (LTI) system, containing all information about the system. Unfortunately, loudspeakers are neither perfectly linear nor perfectly time invariant, mostly due to the mechanical nature
of the transformation of electrical current to acoustical sound. At low power levels most
loudspeakers can be considered as having low distortion factors. However, care must be
taken that a certain distortion limit is kept during the measurements. For them to be mean-
ingful, the distortion at low frequencies below 200 Hz should be THDmax = 3% and in the
mid and high frequencies THDmax = 1% should be kept (compare eq. (3.4)).
The step response is derived from the impulse response by integration over time. Likewise,
the impulse response can be calculated from the step response measurement by differenti-
ation over time. So, compared to the impulse response there is no specific new information
obtained by the step response function, it is just a different way of visualizing the same data.
The waterfall diagram combines the frequency response plot with time domain informa-
tion: it displays the frequency-dependent progression of the impulse response, or vice versa,
the time-dependent frequency response as a 3D Plot (x-axis: frequency [Hz], y-axis time
[s], z-axis SPL [dB]). However, since the duration of a single period of a sinusoidal signal
depends on the frequency, identical damping at different frequencies results in different
decay times. Simply speaking: the decay for low frequencies takes longer than for high fre-
quencies though both may die out during only one period, meaning they are identically
damped. Though apparently the decay time at higher frequencies is shorter, the damping
might be less. Therefore, the time in the waterfall plot should be time-compensated in a
way that one period independent of the frequency covers the same length on the time axis.
A plot like this can be achieved using a wavelet transform instead of the commonly used
sliding FFT-window plot.
It is possible to show the same data in a 2D coloured plot: in Figure 3.11 the most common display of the decay time over frequency is shown for the sample loudspeaker (frequency response shown in Figure 3.5). It displays a very well damped behaviour for the
entire frequency range. However, the above-mentioned problem (neglecting the frequency-
dependent duration of a period) is visible and it appears that low frequencies are not treated
as well as the high frequencies. Therefore, the magenta curve shows the duration of exactly
one period at each frequency, which would be equivalent to a frequency-independent con-
stant damping factor.
3.2.2 Distortion
Often, loudspeakers are blamed as the weakest element in the audio reproduction chain.
With respect to frequency response and distortion value this certainly is true. Whereas
the frequency response can be linearized to some degree using EQs, distortion becomes a
severe problem the closer a loudspeaker is driven to its power limits, which is often the case
for installed sound systems. Hence, knowledge about these limitations is crucial to avoid
planning errors.
Several distortion factors must be considered with a dynamic loudspeaker driver under
load: linearity of the stiffness of the membrane suspension, linearity of the force factor, the
change in voice coil inductance, flow resistance of the port in vented cabinets, parasitic
resonators in the mechanical build of the driver, mechanical excursion limits at low fre-
quencies, thermal limits mainly at mid and high frequencies, and finally power compression
due to voice coil heat-up. All these limiting factors are connected to each other but may be
evaluated separately.
Figure 3.11 Waterfall diagram of the loudspeaker from Figure 3.5. The magenta curve shows the the-
oretical decay time for constant relative damping equivalent to 1 period of the related
frequency.
Apart from the linear distortion that was discussed earlier, the frequency-dependent non-
linear distortion must be measured, usually the harmonics of the fundamental frequency.
The most important ones are the second-(octave) and third-order harmonics. These can
be plotted relative to the linear frequency response; see Figure 3.12.
Total harmonic distortion (THD) values are used to quantify summarized distortion values:

THD = 100 · √( Σₙ₌₂…N Un² / U1² ) %   (3.4)
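A minimal sketch of this calculation, assuming the harmonic voltages U1…UN have been obtained from a measurement; the example amplitudes are hypothetical:

```python
import math

def thd_percent(harmonic_voltages):
    """THD per eq. (3.4): RMS sum of harmonics 2..N relative to the
    fundamental U1, expressed in percent."""
    u1, *overtones = harmonic_voltages
    return 100.0 * math.sqrt(sum(u * u for u in overtones) / (u1 * u1))

# Fundamental 1.0 V with 2nd and 3rd harmonics of 10 mV and 5 mV
print(f"THD = {thd_percent([1.0, 0.010, 0.005]):.2f} %")  # -> 1.12 %
```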
Figure 3.13 Frequency-dependent maximum achievable SPL for a defined limit of the THD figure.
The figure shows the sensitivity of the loudspeaker, the theoretical SPL for the proclaimed
power handling of 1000 W input power (manufacturer rating) and the measured, achiev-
able SPL for a given THD of 3% and 10%.
actual THD measurements show much lower levels at low and high frequencies before the
3% or 10% distortion limit is reached. It can also be seen that for a wide frequency range
(200 Hz to 3 kHz) the achievable level for 10% THD seems to be identical to the 3% THD
limit. Due to the fact that the measurement was limited to the specified 1 kW it is possible
that neither distortion limit has been reached and the achievable SPL at those frequencies
could be higher given more input power.
the loudspeaker cabinet, the size and placement of the transducers, the use of wave guides (horns etc.) and the crossover frequencies (the latter being electrical rather than mechanical in nature, and often not user-accessible).
The information about the directivity is obtained by a set of frequency response
measurements carried out at different angles in space around the speaker. The most common
method is to define the on-axis front direction to be the 0 degree of a spherical coordinate
system. With respect to this, the elevation angle is defined to be positive to the upper and
negative to the lower hemisphere. Accordingly, to establish a coordinate system that fits the
Cartesian system the azimuth angle counts positive anticlockwise.
In Figure 3.14 the typical placement of the coordinate system is displayed. It illustrates
that from the reference point –which in most cases is coincident with the high-frequency
transducer –an axis (the x-axis) perpendicular to the front panel defines the 0° direction.
The same orientation is then used to measure the frequency response curve, as shown in
Figure 3.5. The distance to the microphone is referenced to this point. For a large loud-
speaker and rather close microphone, the distance between each individual driver and
the microphone will vary when rotating the loudspeaker around its reference point. In
the ideal far field, the loudspeaker becomes an actual point source, and the measurement
of angle-dependent transfer functions is perfect, whereas for shorter distances the error
due to the distance mismatch of each driver becomes more and more severe. As a rule
of thumb, the relative level error may be estimated by a simple formula considering the
distance variation given by the maximum offset of any of the loudspeakers in the front
plane ∆l to the reference point and the microphone distance D:
∆L = 20 · log(∆l/(2D) + 1) dB   (3.5)
Figure 3.14 Cartesian system defining the angles for the measurements of directivities with loudspeakers. Note that the distance to the point for the microphone is not defined, but must be chosen with respect to the size of the loudspeaker.
Consequently, the minimum microphone distance Dmin can be calculated from the
dimensions of a loudspeaker and a given maximum level error ∆L max :
Dmin = ∆l / (2 · (10^(∆Lmax/20) − 1)) m   (3.6)
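Equation (3.6) directly yields the minimum measuring distance; the 0.3 m driver offset and 1 dB tolerated error below are hypothetical example values:

```python
import math

def min_mic_distance(delta_l_m, max_level_error_db):
    """Minimum far-field measuring distance per eq. (3.6):
    Dmin = delta_l / (2 * (10^(dLmax/20) - 1))."""
    return delta_l_m / (2.0 * (10.0 ** (max_level_error_db / 20.0) - 1.0))

print(f"Dmin = {min_mic_distance(0.3, 1.0):.2f} m")  # -> 1.23 m
```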
Figure 3.15 Computer-controlled robot for measuring directionality information of loudspeaker systems. Note the tilting of the z-axis, which in consequence leads to an intersection of the x-axis with the ground plane of the pictured half-anechoic room. At this point of intersection, the microphone is placed so as to ensure that there is only one signal path between source and receiver. The distance in this set-up is 8 m. (Picture courtesy AAC Anselm Goertz.)
As a first result of this type of measurement, the directional factor Γ can be found, which
gives an angle-dependent relative value of the sound pressure found at any direction relative
to the front direction:
Γ(φ, θ) = p(φ, θ) / p(φ = 0, θ = 0)   (3.7)
Plots of the directivity ratio itself are not very common, whereas the directivity gain, i.e. the dB value of the directional gain D(φ, θ), is a common description found in polar plots and other representations of the directivity, such as the different plots shown in section 3.2.3.1:

D(φ, θ) = 20 · log Γ(φ, θ) = 20 · log(p(φ, θ)/p0) − 20 · log(p(φ = 0, θ = 0)/p0) dB   (3.8)
The data acquisition for a full sphere may take several hours depending on the reso-
lution in space; compare Figure 3.15. The minimum spherical resolution acceptable is 15°,
which already requires the measuring of 288 frequency response functions. For a resolution
of 5° the number of individual measurements is 2522 (considering that the measurements
at the poles are only taken once)! If each single response takes 5 seconds measurement
time, the procedure lasts about 3.5 hours. It is obvious that the handling of these data may
be difficult and for a general visualization of the directivity a less detailed solution must
be found.
Figure 3.16 Directivity of the sample loudspeaker shown in 1/3-octave bands. The frontal direction points to 0°. For positive and negative angles, the observation point, at a fixed distance, rotates around the reference point, as shown in Figure 3.14.
Figure 3.17 Isobar plots for a loudspeaker in 2D and 3D display for the horizontal and vertical
directivity. The frequency resolution is smoothed to 1/12th of an octave. The horizontal
x-axis shows the frequency. The vertical axis shows the level relative to the frontal dir-
ection (0 degree) in different colours, hence the 0° response is equal to 0 dB. The orange
range covers a deviation of ±3 dB around the 0 dB, whereas for all other colours the range
covers only 3 dB. The right axis shows the angle of rotation (either horizontal or vertical)
of the loudspeaker for a full 360° rotation.
LW = 10 · log10(P / P0) dB, with P0 = 10⁻¹² W   (3.9)
with P being the measured sound power of the source which is derived from sound pressure
level measurements performed in a reverberant chamber according to the regulations
stated in ISO 3741. In the case of loudspeaker measurements, the measurement conditions
shall be exactly the same as for the sensitivity measurements. Especially the input voltage
needs to be the same since in the end a comparison of the two measurements (eq. (3.12))
is done.
The calculation of the sound power is derived from the average level L calculated
from N pressure level measurements (Li) taken at several positions in the reverberant
chamber:
L̄ = 10 · log10( (1/N) · Σᵢ₌₁…N 10^(Li/10) ) dB   (3.10)
LWLS = L̄ − 10 · log10( (T60/s) / (VRC/m³) ) − 14 dB   (3.11)
Figure 3.19 The relationship between free field sensitivity and diffuse field sensitivity. The DI describes the difference between the two graphs. The diffuse field sensitivity is typically measured in 1/3-octave bands; therefore, the free field sensitivity needs to be averaged in the same bandwidth to evaluate the DI.
The corrections in this formula are required due to the frequency-dependent absorption
and the volume of the reverberant chamber, which both affect the sound pressure level
measured in the diffuse field. For a detailed derivation please refer to ISO 3741. Therefore,
the reverberation time T60 as well as the volume VRC of the reverberation chamber must
be known. Some further correction terms denoted in ISO (with respect to meteorological
conditions) are required for laboratory accuracy. The sound power level is equivalent to a
sound pressure level of an omnidirectional source measured at a distance of 28 cm. To com-
pare this with the typical free field measurements of loudspeakers taken at a distance of 1 m
the sound power level must be reduced by 11 dB. This finally leads to the calculation of the directivity index (see Figure 3.19):

DI = Lsens − (LWLS − 11 dB) dB   (3.12)
This value is often called the direct or frontal to random index, which is the logarithmic dB
value derived from the frontal to random factor, also known as directivity factor Q:
Q = ∫S p0R² dS / ∫S pR²(φ, θ) dS = S / ∫S Γ² dS   (3.13)
DI = 10 log Q dB (3.14)
where
pR is the sound pressure at a given distance R from the defined centre of radiation
p0R is measured in the frontal direction at the distance R
S is the spherical surface around the speaker at the distance R
φ, θ are the angles according to Figure 3.14
Since all these numbers strongly depend on frequency, many manufacturers show either
tables or plots of this dependency in the data sheets of their products. For approximate
calculations or measurements of the directivity index at least the range of the 500 Hz and
1000 Hz octave band needs to be covered.
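Where directivity data Γ(φ, θ) are available on a regular grid, Q per equation (3.13) and DI per equation (3.14) can be approximated by numerical integration over the sphere. The grid resolution and the two test patterns below (an omnidirectional source and a one-sided cosine lobe) are assumptions made for the example:

```python
import numpy as np

def directivity_factor(gamma, theta, phi):
    """Q per eq. (3.13): Q = S / integral(Gamma^2 dS), evaluated on a
    regular grid; theta is the polar angle measured from the 0° axis."""
    dtheta = theta[1] - theta[0]
    dphi = phi[1] - phi[0]
    # surface element of the unit sphere: sin(theta) dtheta dphi
    integral = np.sum(gamma**2 * np.sin(theta)[:, None]) * dtheta * dphi
    return 4.0 * np.pi / integral

theta = np.linspace(0.0, np.pi, 181)                 # 1° polar resolution
phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

omni = np.ones((theta.size, phi.size))               # Gamma = 1 everywhere
lobe = np.cos(theta)[:, None] * np.ones(phi.size)    # one-sided cosine lobe
lobe[theta > np.pi / 2, :] = 0.0

for name, g in (("omni", omni), ("cos lobe", lobe)):
    q = directivity_factor(g, theta, phi)
    print(f"{name}: Q = {q:.2f}, DI = {10 * np.log10(q):.1f} dB")
# -> omni: Q = 1.00, DI = 0.0 dB;  cos lobe: Q = 6.00, DI = 7.8 dB
```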
For combining the influence of the directional effect as well as that of the distribution
between directional and omnidirectional energy, the product of both quantities called the
directivity deviation factor g ( φ, θ ) is used:
g(φ, θ) = Q · Γ²(φ, θ)   (3.15)
Obviously, the directivity deviation factor is based on the previously introduced directional
factor Γ (eq. (3.7)) and is determined by multiplying by the constant value of the directivity
factor Q; only the range of values of the angle-dependent function changes. Since Γ 2 ( φ, θ )
is 1 for the reference angle ( φ, θ = 0 ) (cf. Figure 3.14), the ‘directivity deviation factor’ can
be understood as the angle-dependent decrease of the directivity factor (application see also
section 6.3.2.1).
The logarithmic expression of the directivity deviation factor g ( φ, θ ) is the directivity
deviation index G denoting the deviation in dB, which gives a better estimate for practical
application:
G(φ, θ) = 10 · log g(φ, θ) = D(φ, θ) + DI dB   (3.16)
The reader should be aware of the partially contradicting usage of the introduced values.
It is important to note that Q ( f ) and DI ( f ) are frequency-dependent and present a single
number of the directional behaviour (integrated over the full sphere) at certain frequencies
or frequency bands. By definition, they need to be calculated with respect to the frontal dir-
ection as defined by the manufacturer. On the other hand, the two properties G ( φ, θ, f ) and
g ( φ, θ, f ) depend on frequency and spatial direction.
η = Pac/Pel · 100% = (E²LS/(ρ0·c)) · (4πr0²/QLS) · 100%   (3.17)
where
Pac total radiated acoustic sound power
Pel electric power fed into the loudspeaker
ELS sensitivity of the speaker, defined as the sound pressure (in Pa) produced in the frontal direction for 1 W at 1 m distance
r0 reference distance, 1 m
Q LS directivity factor of the speaker
η = 3 · E²LS / QLS %   (3.18)
This correlation can be seen in Figure 3.20. In practice, the efficiency of loudspeaker
systems is between 0.1 and 10%. Eq. (3.18) suggests that owing to the frequency depend-
ence of the directivity factor (cf. Figure 3.19 and eq. (3.14)) and the mostly insignificant
frequency dependence of the free-field sensitivity, the loudspeaker system efficiency also
depends heavily on the frequency.
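For a rough check of this range, equation (3.18) can be applied to the sample loudspeaker from Figure 3.5; the directivity factor Q = 5 assumed below is a hypothetical value, and the sensitivity is first converted from dB SPL to pascals:

```python
def efficiency_percent(sensitivity_db_spl, q_ls):
    """Eq. (3.18): eta ≈ 3 * E_LS^2 / Q_LS %, with E_LS the on-axis
    sound pressure in Pa produced by 1 W at 1 m."""
    e_ls = 2e-5 * 10.0 ** (sensitivity_db_spl / 20.0)  # dB SPL -> Pa
    return 3.0 * e_ls**2 / q_ls

# 95 dB (1 W / 1 m) loudspeaker with an assumed directivity factor Q = 5
print(f"eta ≈ {efficiency_percent(95.0, 5.0):.2f} %")  # -> ~0.76 %
```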
Loudspeaker efficiency has become less important with the availability of high-power
amplifiers and high power-handling capacities of the loudspeaker drive units. Nevertheless,
it should be kept in mind that an almost inaudible level increase of 3 dB requires double the
power. To achieve twice the perceived loudness (a level increase of about 10 dB) requires 10
times the power (i.e., from 500 W to 5 kW).
material. It should be lightweight and stable with respect to its shape and dimensions when
heated up to temperatures of up to 250°C. Temperatures that high are not common but
possible during operation at the limit of the power-handling capability. At temperatures
above 200°C the risk of damaging the transducer is imminent. The following rules can be
followed to determine the maximum acceptable input power with respect to the voice coil
temperature.
According to IEC 60268-5 [1], short-time power handling (paragraph 17.2) and long-
time power handling (paragraph 17.3) are defined as the maximum input voltage of typical
program material that does not damage the device. For the short-time limit, the input signal
duration is 1 second and the signal is repeated 60 times with a pause of 1 min in between.
For the long-time limit, the input signal duration is 1 minute and this signal is repeated 10
times with 2 minutes pause in between. Besides these two figures, a continuous sine-signal power handling is defined in paragraph 17.4 of the standard. This is the sinusoidal input
voltage at a given frequency (within the range of application) that does not damage the
device when presented for at least 1 hour.
It is important to understand that these ‘rated values’ are not results of measurements but
are stated by the manufacturer and can be verified or falsified with the above-mentioned
measurement procedure stated in IEC 60268-5. The Audio Engineering Society has defined
methods for the measurement of ‘drive units’ in the AES2-2012 standard [6]. The differences
from the IEC 60268-5 are small and in most cases the AES2 standard refers to the IEC one.
Figure 3.21 Theoretical polar plots for a circular piston of 25 cm diameter (a typical 12′′ woofer) in an infinite baffle for frequencies from 500 Hz up to 2.5 kHz in steps of 1/3 octave. The total range is 50 dB. The dotted lines are spaced 6 dB apart, so the intersection points with the −6 dB line denote the beam width of the directivity, leading to approximately 150° at 1 kHz, 100° at 1.25 kHz, 75° at 1.6 kHz, 58° at 2 kHz and 45° at 2.5 kHz. Obviously, omnidirectional sound radiation can be assumed for frequencies below 500 Hz.
to quite bulky devices where low frequencies are included, since the horn size needs to be in the range of the wavelength to become effective. Therefore, the application is restricted mainly to mid and high frequencies.
Current horn constructions differ greatly from the conical shape; the first improvement was the invention of the exponential horn, which was used for the early gramophones. Webster's horn theory [7] dates back to 1914, long before the first electroacoustic systems were available. The pure exponential horn shape is not often found today, but the growth of the cross-section in general follows an exponential law. The main benefit of the exponential horn compared to a conical horn with identical low cut-off frequency is its significantly reduced length, which makes it applicable in standard loudspeaker cabinets in combination with a 10′′ to 15′′ woofer. This very popular two-way loudspeaker system can be found in many different applications, from mobile DJ systems to installed sound reinforcement systems in churches, theatres and other small venues. The examples presented in section 3.2 illustrate such a loudspeaker.
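The exponential growth law lends itself to a short worked sketch. The relations below assume the common textbook form S(x) = S0·e^(mx) with cutoff frequency fc = mc/4π; the throat and mouth sizes are assumed for illustration and are not data from the text:

```python
import math

c = 343.0  # speed of sound in m/s (assumed 20 deg C)

def flare_constant(fc_hz: float) -> float:
    """Flare constant m for a desired exponential-horn cutoff frequency."""
    return 4 * math.pi * fc_hz / c

def horn_length(s_throat_m2: float, s_mouth_m2: float, fc_hz: float) -> float:
    """Axial length needed to grow exponentially from throat to mouth area."""
    return math.log(s_mouth_m2 / s_throat_m2) / flare_constant(fc_hz)

# e.g. 500 Hz cutoff, a 2-inch throat (~20 cm^2) opening to a 0.25 m^2 mouth
print(round(horn_length(20e-4, 0.25, 500.0), 2))  # ~0.26 m
```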
Figure 3.22 JBL 2360 Bi-Radial® Constant Coverage (another name for CD) horn with attached 2′′ compression driver (courtesy JBL). The left picture shows the narrow slit in the neck of the waveguide, which continues into the final horn and is intended to diffract the soundwave horizontally into the final surface line of the horn, so as to cover a wide horizontal angle of 90°. The vertical angle of 40° is maintained throughout the length of the horn, with slight widening towards the horn mouth.
A typical example of a CD horn is the well-known theatre horn JBL 2360. Figure 3.22 shows the horn with the attached compression driver. In Figure 3.23 the horizontal directivity achieved by this horn is shown.

Figure 3.23 Horizontal directivity of the JBL 2360 with JBL 2445J 2′′ compression driver. The aimed coverage of ±45° is met in the frequency range between 600 Hz and 10 kHz. Covering lower frequencies would require a larger horn, and for higher frequencies the diffraction slit would probably need to be even smaller. (Measurement courtesy Anselm Goertz.)
One of the disadvantages of this construction is the long horn neck with only a small growth rate in cross-section. The neck is supposed to produce a flat wavefront that propagates to the diffraction slot, and the sound intensity must be distributed evenly over the wavefront to achieve a uniform sound level in the desired angular range. Although this goal may be achieved, the result is paid for with higher distortion due to non-linear wave propagation at high levels: the longer the horn neck, the more distortion is created. For this reason, horns of this type are used less frequently today.
Besides the JBL 2360A, which was somewhat typical, many other solutions from
different manufacturers were on the market. All followed the same idea and had similar
design principles. Whereas the ‘smaller’ horns like the JBL 2360 could only be used in two-
way systems with an additional low-frequency box, other manufacturers such as Electro
Voice offered full-range horn-loaded constant directivity designs. Figure 3.24 shows one
example of a large horn-loaded stadium loudspeaker. The application is focused on speech
reproduction; the lower cut-off frequency is at around 100 Hz. Though it is a two-way loudspeaker, it follows the principle of coincident placement of the low- and high-frequency transducers. The benefit of this arrangement is that the directivity at the crossover frequency is not affected by the interference that a side-by-side placement would cause. This phenomenon is discussed in the next section.
Modern horn design is still based on the CD horn principles. However, disadvantages
like the distortion-producing horn throat shape are found less often. In today’s more
common line arrays, the horn designs must essentially have a uniform horizontal directivity,
while vertically they should primarily produce a flat wavefront with constant sound pressure
and phase. This allows the creation of an approximately homogeneous line source when
Figure 3.24 Electro Voice MH 6040AC stadium horn loudspeaker covering the full frequency range
from 100 Hz up to 20 kHz. The construction uses two 10′′ woofers to feed the large
low-frequency horn and one 2′′ compression driver feeding into the small horn placed
coaxially into the large mouth. The dimensions: height 1.5 m, width 1 m, length 1.88 m,
weight 75 kg.
Figure 3.25 Two-way PA loudspeaker system with 15′′ woofer and 2′′ compression driver and CD horn
(courtesy Klein+Hummel).
almost identical and lead to the same dispersion angle at the crossover frequency. This is valid for the horizontal plane, since the lateral dimensions are identical and the superposition of the two elements does not change this. Vertically, the two sources combine to more than twice the size of a single unit at the crossover frequency; consequently, the radiation of sound narrows.
Figure 3.26 shows this typical behaviour in the two directivity plots. It is clearly visible that for the horizontal directivity the beamwidth does not change much at the crossover point, whereas vertically a narrowing and asymmetry is found in the frequency range from 800 Hz to 1.6 kHz. The crossover network design defines the frequency range and the symmetry of the narrowing: the higher the order of the crossover filter, the narrower the affected frequency range becomes, owing to the steeper filter slopes. If the phase response deviates from the intended 'in phase' design (n times 360° phase shift between the two elements), the main lobe of the sound beam will tilt away from the 0° direction.
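The narrowing and the lobe tilt can be reproduced with a minimal two-source model; the frequency, spacing and phase error below are assumed values for illustration, not data of the loudspeaker in Figure 3.26:

```python
import numpy as np

c = 343.0
f = 1200.0   # assumed crossover frequency in Hz
d = 0.35     # assumed vertical centre-to-centre spacing in m
phi = 0.0    # electrical phase error between the two ways, in radians

k = 2 * np.pi * f / c
theta = np.radians(np.linspace(-90, 90, 181))
# superposition of two coherent sources: path difference d*sin(theta) plus phi
p = np.abs(1 + np.exp(-1j * (k * d * np.sin(theta) + phi)))
level = 20 * np.log10(np.maximum(p, 1e-9) / p.max())

# walk outward from 0 deg to the first -6 dB crossing of the main lobe
idx0 = len(theta) // 2
half = next(i for i in range(idx0, len(theta)) if level[i] <= -6.0)
print(f"-6 dB half-angle: {np.degrees(theta[half]):.1f} deg")
# setting phi != 0 shifts the maximum away from 0 deg (tilted main lobe)
```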
Figure 3.26 Standard isobaric plots for the horizontal and vertical directivity of an ordinary two-way PA loudspeaker equipped with a 15′′ woofer and a CD horn with 1.4′′ compression driver. While the horizontal directivity is fairly symmetrical, with a slight narrowing at 2 kHz and becoming narrower at higher frequencies, the vertical isobaric plot shows a typical asymmetry due to the placement of the two speakers side by side and a strong constriction of the directivity at the crossover frequency (between 800 Hz and 1600 Hz) due to interference. (Courtesy four audio.)
To understand line-array technique, the straight line array built from identical sources shall be discussed first.
is well described in [10, 11]. In theory, to form a straight-line source, the individual point
sources need to be less than half a wavelength of the maximum desired frequency apart in
order to form a coherent wavefront. Figure 3.27 illustrates how the two directivities com-
bine for an example of a large column with 16 loudspeakers with a membrane diameter of
6 cm spaced at 8 cm from centre to centre.
As one can see in Figure 3.27, the resulting directivity is neither constant nor uni-
form: with increasing frequency, the directivity gets narrower and higher frequencies are
radiated into a very small angle, plus multiple side lobes (off-axis SPL maxima) can be
observed.
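A minimal sketch of this combined behaviour for the column of Figure 3.27, modelling the 16 drivers as ideal point sources (the real simulation additionally includes the membrane directivity):

```python
import numpy as np

c, N, d = 343.0, 16, 0.08  # 16 sources spaced 8 cm, as in Figure 3.27
theta = np.radians(np.linspace(-90, 90, 721))

def array_factor_db(f_hz: float) -> np.ndarray:
    """Normalized far-field level of N coherent, equally driven point sources."""
    k = 2 * np.pi * f_hz / c
    psi = k * d * np.sin(theta)
    p = np.abs(np.exp(-1j * np.outer(np.arange(N), psi)).sum(axis=0))
    return 20 * np.log10(np.maximum(p / N, 1e-6))

for f in (250, 1000, 5000):
    af = array_factor_db(f)
    off_axis = af[np.abs(theta) > np.radians(30)]
    print(f, "Hz: strongest level beyond 30 deg:", round(off_axis.max(), 1), "dB")
# the output shows the main lobe narrowing with frequency and strong
# off-axis (grating) lobes returning once d exceeds half a wavelength
```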
To address the first issue of this driver array, the active length of the column should be made frequency-dependent (longer at low frequencies, shorter towards higher frequencies), so that a constant size ratio relative to the wavelength is obtained. This can be achieved by frequency-dependent control of the individual loudspeaker drivers in such a way that only very few loudspeakers (either at one end or in the centre of the column) radiate the entire frequency range from the lowest to the highest frequencies. For all other loudspeakers in the column, the signal is low-pass filtered with decreasing cut-off frequency for loudspeakers further away (Figure 3.28). The second issue described above has its cause in the distance between the individual loudspeaker drivers in the column. Once this distance reaches half a wavelength or more, side lobes with rather high intensity are created by angle-dependent constructive and destructive interference. Figure 3.27 shows this behaviour in the rightmost plot for frequencies above 4 kHz.
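The frequency-dependent active length can be sketched with a crude shading rule that keeps the radiating length at a fixed multiple of the wavelength; the factor of two wavelengths below is an assumption for illustration:

```python
c, N, d = 343.0, 16, 0.08  # same column as above

def active_drivers(f_hz: float, length_per_lambda: float = 2.0) -> int:
    """Drivers kept active so the radiating length stays ~2 wavelengths."""
    n = round(length_per_lambda * c / (f_hz * d))
    return max(1, min(N, n))

for f in (250, 1000, 4000, 8000):
    print(f, "Hz ->", active_drivers(f), "of", N, "drivers active")
# the low-pass filtering assigns each outer driver the cut-off frequency
# above which it would no longer belong to the active set
```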
Figure 3.28 Same column as in Figure 3.27 except for the frequency-dependent low-pass-filtered loudspeaker arrangement, which achieves a constant active length of the column relative to wavelength. The width of the main lobe is significantly greater for high frequencies, though not perfectly smooth. The plot shows a simulation with piston-like membranes and theoretical radiation patterns; it reveals the great potential of DSP-controlled loudspeaker arrays.
Figure 3.30 A DSP-controlled loudspeaker line array (length 3.6 m) is optimized to deliver sound to two different audience areas. Each picture shows the optimization within a bandwidth of one octave, from upper left to lower right: 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz. As expected, the suppression of side lobes at high frequencies is difficult. (Calculation performed with the software dedicated to the Steffens Evolutone loudspeaker.)
Figure 3.31 The transformation from circular input to rectangular output. The DOSC geometry sets
all possible sound path lengths to be identical from entrance to exit, thus producing a flat
source with equal phase and amplitude.
from the circular opening of the compression chamber driver to a linear, planar wavefront at the output, with the amplitude and phase along the line being equal. However, to achieve this, the geometry of the waveguide became rather complicated [15].
By arranging several of these elements one above the other to obtain a straight line for
the high frequencies, and besides this keeping the distance of the low-frequency transducers
Figure 3.32 Representation of the variation of the distance for cylindrical sound radiation and far field
divergence angle (spherical radiation) with frequency for a straight-line source array of
height 5.4 meters.
Figure 3.33 Two-dimensional loudspeaker array using individual signal processing and power amplification for each driver. The software allows different types of directional patterns and sound field applications.
in front of the loudspeaker shows a more evenly declining characteristic. The next generation of large line arrays will have digital signal processing (comparable to the line arrays described in section 3.3.5.1) to supply each element with its individually corrected, delayed, equalized and band-limited signal.
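As a rough orientation for the behaviour shown in Figure 3.32, the distance up to which a straight line source radiates approximately cylindrically can be estimated with the common near-field approximation r ≈ L²f/2c; this is a simple textbook estimate, not the book's exact model:

```python
c = 343.0

def near_field_limit_m(array_length_m: float, f_hz: float) -> float:
    """Approximate near-field (cylindrical radiation) limit of a line source."""
    return array_length_m ** 2 * f_hz / (2 * c)

for f in (100, 1000, 10000):
    print(f, "Hz ->", round(near_field_limit_m(5.4, f), 1), "m")
# for a 5.4 m array the transition to spherical spreading moves from a few
# metres at low frequencies to several hundred metres at high frequencies
```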
References
1. IEC 60268-5 Sound System Equipment – Part 5: Loudspeakers, 2010.
2. W. Woszczyk and G. Soulodre. The audibility of spectral precedence, in Audio Engineering
Society Convention 93, 1992.
3. G.J. Krauss. On the audibility of group delay distortion at low frequencies, in 88th Convention
of the Audio Engineering Society, Montreux, 1990.
4. S. Müller and P. Massarani. Transfer-function measurement with sweeps, J. Audio Eng. Soc.
49(6), pp. 443–471, 2001.
5. DIN EN ISO 3741 – Determination of sound power levels and sound energy levels of noise sources using sound pressure – Precision methods for reverberation test rooms, 2011.
6. Audio Engineering Society. AES2-2012: AES standard for acoustics – Methods of measuring and specifying the performance of loudspeakers for professional applications – Drive units. New York: Audio Engineering Society, Inc., 2012.
7. A.G. Webster. Acoustical impedance, and the theory of horns and of the phonograph. Proc Natl
Acad Sci USA, pp. 275–282, July 1919.
8. P.W. Klipsch. Loud-Speaker Horn. USA Patent 2537141, 15 June 1951.
9. J.D.B. Keele. What’s so sacred about exponential horns?, in 51st Convention of the Audio
Engineering Society, 1975.
10. H.F. Olson. Acoustical Engineering, 2nd edn. New York: D. van Nostrand, 1960.
11. L.L. Beranek. Acoustics. New York: McGraw-Hill, 1954.
12. G. de Vries and G. van Beuningen. A digital control unit for loudspeaker arrays, in 96th
Convention of the Audio Engineering Society, Amsterdam, 1994.
13. F. Straube, F. Schultz, M. Makarski, S. Spors and S. Weinzierl. Evaluation strategies for the opti-
mization of line source arrays, in AES 59th International Conference, Montreal, Canada, 2015.
14. C. Heil and M. Urban. Sound fields radiated by multiple sound sources arrays, in 92nd Convention
of the Audio Engineering Society, Vienna, 1992.
15. M. Urban, C. Heil and P. Baumann. Wavefront sculpture technology, in 111th Convention of
the Audio Engineering Society, New York, 2001.
16. S.P. Lipshitz and J. Vanderkooy. The acoustic radiation of line sources of finite length, in Preprint 2417, 81st Convention of the Audio Engineering Society, Los Angeles, 1986.
17. www.holoplot.com.
4 Microphones
Gottfried K. Behler
This chapter gives a general overview of microphones and their applications. There are two main types of microphone in common use, the condenser and the dynamic microphone. In the first section the principle of operation is discussed; the second section discusses the characteristics that are important for the application, which are primarily determined by the directional characteristics and the intended use. Please refer to the additional literature [1] for more in-depth information.
• Capacitive transducer
• Piezoelectric transducer
• Dynamic transducer
• Magnetic transducer
DOI: 10.4324/9781003220268-4
Figure 4.1 Basic design of a condenser microphone. To the left, a sectional drawing of a classic meas-
uring microphone is displayed; to the right, the relationship between the components
involved in its construction (diaphragm mass, air volume behind the diaphragm as a
spring, and viscous friction of the air between the diaphragm and the back electrode) is
shown. (© B&K)
plates. This capacitor with the static capacitance C0 is now charged with a fixed voltage U0
in order to produce a defined charge Q0. It can now be assumed that this charge remains
unchanged during operation of the microphone, since it is brought to the capacitor via a
very high-impedance resistor. In the currently widely used electret condenser microphone,
this charge is built into the capacitor as a permanently electrically charged polymer plastic
material and therefore does not have to be fed by an external voltage source.
To prevent the resting position of the diaphragm of a condenser microphone from being
modified by the static external atmospheric air pressure, the volume behind the diaphragm
has a defined ‘leak’ (capillary tube), which allows the air pressure in the volume to equalize
with the ambient air pressure. This mechanically determines the lower cut-off frequency fu
of the microphone. The upper cut-off frequency is determined by the resonance frequency
created by the mass of the membrane and the compliance of the air cavity behind the
membrane. Above this resonance frequency, the mechanical impedance has a mass-spring
character and thus the excursion of the diaphragm declines by 12 dB per octave (filter of
second order).
Considering the relationship between the constant charge Q0 and the voltage applied across the capacitor, the following formula can be found:

\[ U = U_0 + U_\sim = \frac{Q_0}{C} = \frac{Q_0 \cdot (d_0 + d_\sim)}{\varepsilon_0 \cdot A} \;\;\mathrm{V} \tag{4.1} \]
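A small numeric consequence of eq. (4.1): with constant charge, the AC component of the output voltage is the bias voltage scaled by the relative diaphragm excursion, U~ = U0·d~/d0. The numbers below are assumed for illustration, not manufacturer data:

```python
# Sketch: condenser capsule output for a given diaphragm excursion,
# following U~ = U0 * d~/d0 from eq. (4.1). All values are assumptions.
U0 = 200.0      # polarization voltage in V (classic measuring microphone)
d0 = 20e-6      # static diaphragm-to-backplate gap in m
d_ac = 10e-9    # peak diaphragm excursion in m at some sound pressure

u_ac = U0 * d_ac / d0
print(f"AC output: {u_ac * 1e3:.1f} mV peak")  # 100.0 mV for these values
```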
It is obvious that condenser microphones cannot be operated as purely passive transducers. They always need an electronic circuit that has to perform several tasks. Even the generation of the capsule bias voltage in classic LF condenser microphones requires a special circuit: e.g., a DC-DC converter that generates the required capsule bias voltage of 60–100 volts from the lower phantom voltage of 12–48 volts. In particular, an impedance converter with very high input resistance (in the range of 10⁹ ohms) and low output impedance (typically < 200 ohms) is needed to drive a line between the microphone output and the input of the mixing console or similar. The increasing miniaturization of electronics makes it possible to accommodate these circuits even in very small microphones. For electret microphones, which do not require a bias voltage, a field-effect transistor as impedance converter and a battery in the microphone as voltage source are sufficient. Microphones that operate on the radio-frequency principle, using frequency modulation, are inconceivable without appropriate electronics. The power required to feed the electronics normally comes from the mixer via the cable if there is no internal battery. There are several standardized supply variants (refer to section 4.2.1.2).
Here Bl is the transducer constant, defined by the length of the voice-coil wire and the magnetic field strength, and v is the velocity of the coil moving in the magnetic field. It can therefore be stated that, for a proportional conversion of the sound pressure present in front of the diaphragm, the velocity of the diaphragm (and not its excursion, as with the condenser microphone) must follow the force acting on the diaphragm. Therefore, the impedance of the mechanical system (formed by the membrane mass, the compliance of the suspension spring and friction elements) must be friction-like. In this case the relationship between force and velocity is linear and independent of frequency. The corresponding formula then provides the relationship between force (sound pressure multiplied by the membrane area) and mechanical frictional resistance:
\[ v = \frac{F}{r_{mech}} \tag{4.3} \]
This correlation means that dynamic microphones must be operated as highly damped resonance systems. The resonance frequency lies somewhere in the centre of the frequency range that a dynamic microphone is supposed to cover.

Figure 4.2 Section through the capsule structure of the legendary Sennheiser MD441U. Note the multi-resonance construction with several chambers and ports. Furthermore, a cone-like sound guide is placed in front of the diaphragm, which serves to optimize the directional characteristics.

Figure 4.3 Basic construction of a ribbon microphone (the left figure shows the Beyer M130). The magnetic flux of the high-energy permanent magnets is guided around the ribbon by the ring-shaped yoke made of highly permeable, soft magnetic material. The internal magnetic field should be as homogeneous and tangential as possible through the ribbon.

Naturally, the frequency response of such
a construction can only be flat to a limited extent, as there is a trade-off between high sensitivity (weakly damped system) and a resonance curve that is as flat as possible (strongly damped system). In the first case the velocity at resonance is higher due to the weak damping, and therefore the frequency response is boosted around the resonance; in the second case the strong damping leads to a low but relatively frequency-independent (flat) output voltage across the whole frequency range. To increase the bandwidth and sensitivity of dynamic microphones, they are designed as multi-resonance systems, realized by coupling resonance chambers in front of and behind the membrane. In this way microphones can be designed that are almost equal to condenser microphones in terms of frequency range and linearity. One example of a microphone that still meets the highest demands in this respect is the Sennheiser MD 441 (see Figure 4.2).
A special dynamic microphone is the ribbon microphone (see Figure 4.3), in which not a cylindrical coil but a metal ribbon, usually made of very thin aluminium, oscillates in the magnetic field; the field passes transversely and tangentially through the ribbon. Both surfaces of the ribbon are exposed to the sound field, so that the movement of the ribbon is caused by the difference in force between its front and back. With appropriate damping, the velocity of the ribbon follows the sound pressure gradient, resulting in a figure-of-eight directivity. If one side of the ribbon is covered by a housing, the microphone can be used as a pressure sensor as well.
The sensitivity of ribbon microphones is very low for two reasons: first, the length of the conductor in the magnetic field is short, and second, due to the width of the ribbon the gap in the magnetic circuit is relatively wide, resulting in a weak magnetic flux B. Therefore, ribbon microphones very often use an output transformer to achieve a higher output voltage and a higher output impedance.
Requirement of power supply
• Dynamic microphones usually do not need an external power supply
• Capacitive microphones need a power supply for the impedance converter and for the bias voltage of the condenser (not in the case of an electret mic). The most common power supply is the so-called 'phantom power', which feeds the microphone with 48 V via the balanced signal cable. Some modern microphones are able to operate over a wider range, i.e., from 12 V to 48 V (so-called universal phantom supply). The supply must comply with IEC 61938 [3]
• If phantom power is needed, the required supply current shall be stated
Microphone sensitivity
• The output voltage relative to the input sound pressure; most commonly stated as the output voltage in mV relative to 1 Pa
• The sensitivity needs to reflect the application of the mic. The conditions for the measurement are either free field or diffuse field for typical microphones in PA systems
Transfer function (frequency response curve)
• The frequency-dependent sensitivity of the microphone. Unless otherwise stated, measurements are made under free field conditions, with the frequency response referring to plane waves whose wavefront propagates in the direction of the reference axis of the microphone
• If the microphone is intended for near field or other special use, the frequency response shown must relate to that application
• The frequency curve shows the logarithmic sensitivity (in dB) over frequency in compliance with IEC 60268-1
• Upper and lower cut-off frequencies (this is typically misleading information, used to look better than the competition; in reality it says almost nothing, and the important information is already contained in the frequency response curve)
Distortion figures and max. sound pressure level
• Total harmonic distortion (THD) for a given sound pressure level
• The maximum sound pressure that the microphone can convert linearly into
an output voltage for a given distortion limit (either 0.5% or 1% THD)
Equivalent input noise level
• An assumed sound pressure level that produces the same weighted output
voltage as that produced by the microphone’s self-noise in the absence of
an external sound field. This quantity provides a good estimate of the lowest
signal the microphone can pick up
Environmental conditions for ±2 dB deviation from stated parameters
• Temperature range
• Range of static air pressure
• Range of relative humidity
To characterize microphones with respect to their directivity, Table 4.1 lists the basic
categories and shows how the combination of pressure receiver (sphere) and pressure gra-
dient receiver (figure of eight) leads to desired directivities for practical application. Since
idealized directional functions do not exist in the real world, it makes sense to present some
special cases for the microphone types discussed here.
The membrane of a pressure receiver moves according to the ambient pressure change.
Since there is no directional dependence for the pressure in the sound field (it is a scalar
field), the physical orientation of a very small microphone is irrelevant. This leads to the
Table 4.1 Typical parameters for microphones with different directivity patterns: sphere, wide cardioid, cardioid, super-cardioid, hyper-cardioid, figure of eight, interference.
Figure 4.4 Comparison of the directivity of two pressure microphones: left side a ¼′′ capsule, right side a 1′′ capsule. The polar plots
show a clear directionality for the large membrane at high frequencies whereas the small membrane shows almost perfect
omnidirectional sensitivity. The frequency response curve for the ¼′′ microphone is flat for free field sound incidence, the
one for the 1′′ microphone shows a distinct presence boost in free field whereas for diffuse field the response is rather flat
until a roll off above 10 kHz (DPA).
non-directional sensitivity of omnidirectional microphones. However, for high frequencies
with wavelengths similar to or smaller than the circumference of the microphone capsule,
a directional effect occurs caused by diffraction. For large-diaphragm microphones, even
if they are pressure receivers, this results in a distinct directional characteristic at high
frequencies. Figure 4.4 shows the comparison of two omnidirectional microphones with
different membrane diameters.
The pressure gradient receiver is a microphone in which the diaphragm moves due to the
difference of the forces on the two sides of the diaphragm. If the membrane is sufficiently
small in relation to the wavelength, this force corresponds to the so-called pressure gradient,
which is proportional to the sound velocity (the second, vectorial sound field property). This
results in a sensitivity that depends on the orientation of the membrane in the sound field. If
the sound wave propagates tangentially across the membrane, the resulting force difference
becomes zero and no membrane movement, and therefore no output signal, is created. When the membrane is turned with one side towards the incident wave, the membrane movement, and therefore the output signal, increases until it reaches a maximum for perpendicular incidence of the sound wave. The directivity of such a microphone looks like a figure of eight
(see Table 4.1). Figure 4.3 shows a typical microphone with a figure of eight directivity.
Besides the ribbon microphone –a dynamic microphone –condenser microphones with
figure of eight directivity are common.
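The first-order family implied by Table 4.1 can be sketched as a weighted mix of a pressure (omni) and a pressure-gradient (figure-of-eight) response, Γ(θ) = a + (1 − a)·cos θ; the mixing coefficients below are the usual textbook values, not taken from the book's table:

```python
import numpy as np

theta = np.radians(np.arange(0, 181, 45))

for name, a in [("sphere", 1.0), ("wide cardioid", 0.7), ("cardioid", 0.5),
                ("super-cardioid", 0.37), ("figure of eight", 0.0)]:
    gamma = a + (1 - a) * np.cos(theta)   # first-order directional function
    print(f"{name:16s}", np.round(gamma, 2))
# e.g. the cardioid (a = 0.5) reaches exactly zero at 180 deg,
# while the figure of eight has a null at 90 deg and negative rear polarity
```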
Many modern condenser microphones not only provide a figure-of-eight pattern but also allow a stepwise selection of directivities between spherical and figure of eight. This feature is achieved by combining two membranes placed in front of either one common or two separate back electrodes. Typical constructions of such microphones are shown in Figure 4.5. For both microphones the wiring shows three connectors: one for the common back electrode and two for the membranes. Whereas one membrane (the one that provides in-phase polarity of the output signal relative to the sound) is in general set to a fixed bias voltage (typically between 60 and 120 V), the bias voltage of the second membrane can be changed from positive to negative, either continuously or in defined steps, in order to achieve dedicated directivities when the signals of both membranes are added. Each side of such a capsule construction looks the same; therefore, the side providing the in-phase output needs to be clearly marked.
If a higher directionality is required, either for the pickup of faraway sources or interviews
in noisy environments, the aforementioned directivities may not be directional enough.
The DI listed in Table 4.1 clearly shows that there is a limit for the frontal to random gain
of about 6 dB. To improve this, microphones with higher-order pattern are needed. To
create higher-order directivities requires microphone capsules with more than two active
elements. Microphones for the recording of 3D sound are typical representatives of this
type. Recording for first-order ambisonics (compare section 8.2) requires a setup with four
microphones of cardioid type (see left picture in Figure 4.6). For higher-order ambisonics
(HOA) microphones with up to 32 capsules arranged on a spherical solid body are commer-
cially available (see right picture in Figure 4.6) and setups with up to 256 capsules can be
found in research facilities.
In typical sound reinforcement applications these microphones are not very relevant, but for applications such as teleconferences, discussion groups, lecterns, etc., array microphones with a larger number of capsules in various arrangements can be beneficial. They all work on the same principle, phased-array technology. A sound wave coming from a certain direction arrives at the individual microphone capsules at different times. By applying appropriate delay times (to compensate for the different arrival times), the signals are added up in phase, leading to an amplification of the signal by +6 dB each time the number of capsules doubles, but only for the particular direction from which the wave comes. For any other direction, the signals from the capsules add up randomly, and the amplification factor is only +3 dB per doubling.
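The two summation regimes can be checked with a toy numeric model (random phases standing in for the uncorrelated off-axis condition; this is not a full beamformer simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 4, 8, 16):
    coherent = 20 * np.log10(n)  # in-phase amplitude sum: +6 dB per doubling
    phases = rng.uniform(0, 2 * np.pi, (10000, n))
    random_sum = np.abs(np.exp(1j * phases).sum(axis=1)).mean()
    incoherent = 20 * np.log10(random_sum)  # grows ~+3 dB per doubling
    print(f"{n:2d} capsules: coherent {coherent:5.1f} dB, random {incoherent:4.1f} dB")
```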
Figure 4.7 Line-array microphone with pronounced vertical (left panel) and wide horizontal (cardioid, right panel) directivity (Microtech
Gefell KEM 975). The microphone is built with eight differently spaced cardioid capsules set into a vertical line.
Figure 4.8 The KEM 975 in use at the lectern at the German Bundestag. A diversity switch ensures
that only one of the two microphones (the one with higher level) is in use at a time.
(Courtesy Microtech Gefell.)
almost invisible, and even in TV shows it is a solution often used today (refer to the left picture in Figure 4.10).
A tried and tested alternative is the so-called Lavalier microphone. Whereas in former times the microphone was carried on a neck cord in front of the talker's chest, today's practice is a clip that fixes the microphone to the talker's garment (refer to the right picture in Figure 4.10). The benefit in comparison to the neck cord is clear: a more stable placement, in combination with decoupling from the breastbone, which otherwise creates a peak in the frequency response around 700 Hz.
Another application is the pickup of musical instruments with close-up microphones.
Many manufacturers provide a wide range of clamps and clips for different instruments to
meet all possible applications (Figure 4.11). The microphone for this purpose needs to be
small, lightweight and capable of withstanding very high sound pressure. A trumpet easily
Figure 4.10 Head-mounted microphone (left); Lavalier microphone (right) (courtesy DPA). The placement of these microphones requires some EQ to provide sound without colouration.
Figure 4.11 Direct recording of the violin sound with a small condenser microphone, which is
mounted on the frame of the violin (courtesy DPA).
reaches more than 150 dB of sound pressure right in front of its bell. The advantage again is obvious: the sound operator can adjust the level of each instrument independently of the others, since the close placement of the microphone reduces the crosstalk from other instruments.
Figure 4.12 Microphone supply according to DIN EN IEC 61938 [3]. (a) Phantom power supply: U = 48 V, R1 = R2 = 6800 Ω, Imax = 10 mA; U = 24 V, R1 = R2 = 1200 Ω, Imax = 10 mA; U = 12 V, R1 = R2 = 680 Ω, Imax = 15 mA. (b) A-B power supply: U = 12 V, R1 = R2 = 180 Ω, Imax = 15 mA.
\[ M_E = \frac{u}{p} \;\;\frac{\mathrm{V}}{\mathrm{Pa}} \tag{4.4} \]

\[ G_E = 20 \log \frac{M_E}{M_0} \;\mathrm{dB}, \quad \text{using } M_0 = \frac{1\,\mathrm{V}}{1\,\mathrm{Pa}} \tag{4.5} \]
• Studio condenser microphones: ME ≈ 10 to 100 mV/Pa, or GE ≈ −40 to −20 dB
• Lavalier-type condenser microphones: ME ≈ 2 to 20 mV/Pa, or GE ≈ −54 to −34 dB
• Dynamic microphones: ME ≈ 1 to 20 mV/Pa, or GE ≈ −60 to −34 dB
• Ribbon microphones: ME ≈ 0.1 to 2 mV/Pa, or GE ≈ −80 to −54 dB
In addition to this definition, the reference sound field needs to be specified. The following
definitions include a frequency-dependent result for the sensitivity:
• The pressure field sensitivity: the ratio between the effective output voltage and the
effective sound pressure at the microphone created as a pressure field without propaga-
tion. The measurement conditions are created with the microphone in a chamber with
dimensions smaller than 1/2 of a wavelength.
• The free field sensitivity: the ratio between the effective output voltage and the effective sound pressure of a propagating planar sound wave in free field. The front of the microphone points perpendicular to the incident wavefront. The measurement takes into account the diffraction effects due to the microphone size and body.
• The diffuse field sensitivity: the ratio between the effective output voltage and the effective sound pressure in a diffuse sound field. A diffuse sound field represents a sort of average over all possible incidence angles of sound waves, all equally likely.
For all microphones, the different measurement conditions result in identical results for
the sensitivity at very low frequencies (i.e. < 250 Hz), because effects like diffraction or
directivity can be ignored.
Another important specification is the signal-to-noise (S/N) ratio, which defines the
self-noise created by the microphone. There are different definitions used in datasheets, the
most common being:
• Signal-to-noise ratio, CCIR (re. 94 dB SPL). It states the level difference between the noise floor and a signal of 94 dB. The noise is weighted linearly over frequency.
• Signal-to-noise ratio, A-weighted (re. 94 dB SPL). It states the level difference between the noise floor and a signal of 94 dB. The noise is A-weighted.
• Equivalent input noise level (dB), either A-weighted or CCIR (linear). The equivalent noise level is calculated by subtracting the S/N ratio from 94 dB (the level for 1 Pa). It is the most informative figure, since it directly states what signal level would create the same output level as the internal noise of the microphone. This information helps in deciding which microphone is suitable for the task at hand.
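A one-line sketch of the relation in the last bullet:

```python
def equivalent_input_noise(snr_db_re_94: float) -> float:
    """Equivalent input noise level from an S/N ratio stated re. 94 dB SPL."""
    return 94.0 - snr_db_re_94

print(equivalent_input_noise(80.0))  # 80 dB(A) S/N -> 14 dB(A) self-noise
```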
Finally, the maximum output level of a microphone is important. Here there are dependencies between the sensitivity and the available dynamic range: the more sensitive a microphone is, the less sound pressure it can typically handle. In the microphone specifications the maximum output voltage is rated with respect to a given distortion figure. Most common is a THD (eq. (3.4)) of 0.5%. The calculation for the maximum output voltage is
\[ \text{maxOutputVoltage} = \text{sensitivity} \cdot 10^{\frac{\text{maxSPL}-94}{20}} \;\;\mathrm{V} \tag{4.6} \]
This formula ignores the potential output voltage limit. It simply calculates the required
RMS value for the given SPL limit stated by the manufacturer. The internal electronic
circuit determines whether this output voltage can be provided. It allows assessment
of whether the input available at the mixing console or microphone amplifier can pro-
cess the output level of the microphone without distortion even at the highest acoustic
level. Attenuation switches (typ. −10 dB) are available with many microphones and
can prevent the risk of overloading, but usually with the disadvantage of an increased
noise floor.
If the maximum output voltage of a microphone is known, the maximum possible sound pressure level can be derived from the sensitivity. If the sensitivity is 50 mV/Pa and the typical maximum output voltage is +18 dBm (6.2 V), then the upper limit for the sound pressure level is 136 dB (SPL). The calculation is as follows:
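A hedged sketch of this calculation, obtained by solving eq. (4.6) for maxSPL; the 50 mV/Pa and 6.2 V figures are those of the example above:

```python
import math

def max_spl(sensitivity_v_per_pa: float, u_max_volts: float) -> float:
    """maxSPL = 94 dB + 20*log10(Umax / (ME * 1 Pa)), i.e. eq. (4.6) inverted."""
    return 94.0 + 20.0 * math.log10(u_max_volts / sensitivity_v_per_pa)

print(round(max_spl(0.05, 6.2)))  # -> 136 dB SPL
```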
\[ \Gamma(\vartheta) = \frac{M_{Ed}(\vartheta)}{M_{Ed}(0)} \le 1 \tag{4.8} \]
• the directional gain D as the 20-fold common logarithm of the directional factor Γ
• the coverage angle as the angular range within which the directional gain does not drop
by more than 3 dB (or 6 dB or 9 dB respectively) against the reference axis.
The relationship between the sensitivities by reception of a plane wave and those with
diffuse excitation characterizes the suppression of the room-sound components against the
direct sound of a source. This energy ratio is described by the following parameters:
• the directivity factor QM: if the sensitivity was measured in the direct field as MEd and in the diffuse field as MEr, the directivity factor is

\[ Q_M = \frac{M_{Ed}^2}{M_{Er}^2} \ge 1 \tag{4.9} \]
• the directivity index DI as the 10-fold common logarithm of the directivity factor:

\[ DI = 10 \cdot \log Q_M \;\;\mathrm{dB} \tag{4.10} \]
• They need to be robust against mechanical impacts and structure-borne sound. This leads to microphone designs with a robust housing and a lightweight, small membrane.
Quite often they are used at a close distance to the mouth, which requires protection against pop and wind noise as well as plosive sounds. To achieve this, foam covers are often used; they help against spit and humidity as well. Though dynamic microphones are in general more sensitive to structure-borne sound (due to the larger membrane mass compared to condenser microphones), they are robust and relatively insensitive to humidity. However, there are dedicated condenser microphone designs for handheld use as well.
• Any microphone can be used on a stand, of course; however, a typical microphone that needs a stand is the opposite of a handheld microphone. In this case we are talking about microphones in studios, or on stage, and, in general, indoors. All condenser microphones with large membranes belong to this group. They are the most sensitive to structure-borne sound, pop noise and humidity. Many small-membrane condenser microphones are intended for use on stands as well. Mostly they offer little protection against structure-borne sound and will transfer grip noise quite well. All these microphones are best used indoors and have only little protection against humidity and wind noise. However, protection against wind noise can be achieved using a windscreen (foam or basket).
• Microphones for close-up recording of sources require two main features: capability for high sound pressure levels and a correction for the proximity effect. The first is met by small-membrane condenser microphones that cover a range up to 150 dB; the latter depends on the construction. A pressure microphone does not show any proximity effect, whereas a cardioid microphone will boost the low frequencies more and more when approaching the source at short distances. This needs to be compensated either directly with an appropriate filter built into the microphone or in the signal processing later on.
• Microphones for recording sound from a larger distance should have a very good S/N ratio. The dynamic range should extend down to a very low level rather than allow a very high maximum sound pressure level. Typical equivalent noise level ratings for good microphones are in the range of 10 to 20 dB(A). The upper limit is not much of an issue; even with fortissimo passages of a large symphony orchestra, the peak level at some 10 m distance will not exceed 125 dB (SPL).
References
1. J. Eargle. The Microphone Book. London: Focal Press, 2012.
2. International Electrotechnical Commission. IEC 60268-4 Sound System Equipment – Part 4: Microphones, 2019-08.
3. International Electrotechnical Commission. IEC 61938 Multimedia systems – Guide to the recommended characteristics of analogue interfaces to achieve interoperability, 2018.
4. M. Zollner and E. Zwicker. Elektroakustik, 3rd edn. Berlin: Springer-Verlag, 1993.
5. W. Reichardt. Grundlagen der Elektroakustik. Leipzig: Akademische Verlagsgesellschaft Geest &
Portig K.G., 1960.
5 Design for Sound Reinforcement Systems

Wolfgang Ahnert
5.1 Introduction
This chapter deals with the most important design criteria for specifying a sound system for a wide range of different applications. Many of the criteria are identical across applications, but some are particular. A major differentiation is the application of the sound system either for speech or for music, which calls for a different design approach regarding, for example:
• sound amplification
• support of stage performances
• influence of acoustic impressions
• reproduction of sound events
\[ L = 20 \log_{10} \frac{p}{p_0} \;\;\mathrm{dB} \quad (\text{with } p_0 = 20\,\mu\mathrm{Pa}) \]
DOI: 10.4324/9781003220268-5
One factor which has not yet been mentioned but that may significantly determine the dimensioning of the amplification is the existing noise floor in the venue. Often a specific signal-to-noise ratio (S/N) is asked for in the specification for a sound system design; standards require an S/N ratio of 10–15 dB. Within rooms it is mostly the air-conditioning and ventilation systems that produce disturbing noise, and the requirement can be achieved relatively easily. To characterize background sound levels the so-called noise criteria (NC) curves have been developed; see Figure 5.2.
Different NC values are required for different room types; refer to Table 5.1.
Slightly adapted noise rating (GK) curves are in use according to DIN 15996; see Figure 5.3; refer also to Table 5.2. Above NR20 the so-called 'Grenzkurven' GK are identical to the noise rating curves NR.
A relationship between the noise rating curves and the overall sound pressure values in dB(A) is shown in Figure 5.4. One could ask why dB(A) or any other single-number sound pressure level could not simply be used to specify background noise: the explanation is that all such single-number ratings are averaged over the frequency range, so the noise level at one specific frequency could be very high while the average value is still very low. This cannot happen with NC and NR curves, which limit each frequency band individually.
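A small numeric sketch of why a single number can mislead: summing hypothetical octave-band levels with the standard A-weighting corrections can yield a modest overall dB(A) value even though one band is very loud:

```python
import math

# standard A-weighting corrections per octave band (dB)
a_weight = {63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
            1000: 0.0, 2000: 1.2, 4000: 1.0, 8000: -1.1}
# hypothetical HVAC noise spectrum with a strong 63 Hz rumble (dB SPL)
band_levels = {63: 55, 125: 44, 250: 35, 500: 29,
               1000: 25, 2000: 22, 4000: 20, 8000: 18}

total_a = 10 * math.log10(sum(10 ** ((band_levels[f] + a_weight[f]) / 10)
                              for f in band_levels))
print(f"overall level: {total_a:.1f} dB(A)")  # ~35 dB(A) despite 55 dB at 63 Hz
```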
The values in Table 5.2 are mainly in use for recording studios but are sometimes also asked for in performing arts facilities. Recording studios for symphonic music are similar to high-end concert halls; therefore, the noise floor should not exceed the NR5 curve (max. 18 dB(A)); this is only achievable with high demands on, and significant complexity of, the HVAC system.
Achieving a noise floor between 18 and 20 dB(A) in a concert hall is very costly. All technical equipment (HVAC, light fixtures, video projectors, loudspeaker systems on standby etc.) must adhere to very stringent criteria regarding not creating any noise, which increases the expense of such systems significantly. In most concert halls the NR15 noise rating curve should not be exceeded; this corresponds to 25 dB(A).
but rather by means of using predetermined measurement signals and fast post-processing
algorithms in a computer. Figure 5.5 shows the schematic block diagram to acquire impulse
responses of a space by means of computer software such as SMAART, DIRAC, EASERA
or Room EQ Wizard REW.
These advanced measurement systems can measure the complex transfer function or
impulse response of the system under test. For this purpose, the system is excited with a
known test signal and its response is recorded.
Figure 5.5 Computer-based measurement system for different excitation signals (schematic block
diagram).
Assuming that the system is a linear, time-invariant (LTI) system, the transfer behaviour can be obtained by deconvolution of the two data sets. That is because the response function a(t) is the convolution product of the excitation signal e(t) and the transfer function h(t):

\[ a(t) = h(t) \otimes e(t) \tag{5.1} \]
In the frequency domain the convolution becomes a multiplication, so the transfer function follows as

\[ H(\omega) = \frac{A(\omega)}{E(\omega)} \tag{5.2} \]
An inverse Fourier transform leads back to the time domain in the form of an impulse response:

\[ h(t) = \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega \tag{5.3} \]
The integration from negative to positive infinity can be simplified by introducing a lower and an upper frequency threshold, which is not really limiting in the case of the audio spectrum, as human hearing is itself band-limited (20 Hz – 20 kHz). In addition, the integration process can be significantly accelerated (fast Fourier transformation) by introducing a sample rate (basically dividing the curve into a series of discrete steps); again, for audio this is no major limitation, as the signal is already digitized (converted to steps) once it enters the algorithm. Measurement algorithms utilizing this deconvolution method use pseudo-random noise, swept-sine, MLS or other well-defined excitation signals. An impulse-like test signal is also possible in theory; however, this is rarely used in practice, since the short duration of the signal requires a high amplitude to sufficiently excite the system.
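A minimal sketch of this deconvolution as a regularized spectral division (the small eps term is an implementation detail guarding against near-zero bins, not part of the theory above):

```python
import numpy as np

def impulse_response(excitation: np.ndarray, response: np.ndarray,
                     eps: float = 1e-12) -> np.ndarray:
    """Deconvolve per eqs. (5.1)-(5.3): h = IFFT( A * conj(E) / (|E|^2 + eps) )."""
    n = len(excitation) + len(response) - 1
    E = np.fft.rfft(excitation, n)
    A = np.fft.rfft(response, n)
    H = A * np.conj(E) / (np.abs(E) ** 2 + eps)
    return np.fft.irfft(H, n)

# self-test: convolve a known impulse response with noise, then recover it
rng = np.random.default_rng(1)
e = rng.standard_normal(4096)                # noise excitation (MLS-like)
h_true = np.zeros(256)
h_true[[0, 40, 120]] = [1.0, 0.5, 0.25]      # direct sound plus two reflections
a = np.convolve(e, h_true)                   # the 'recorded' raw data
h_est = impulse_response(e, a)[:256]
print(np.allclose(h_est, h_true, atol=1e-3))  # True
```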
Figure 5.6 shows an overlay of three signals: green is the excitation signal (an MLS signal of 15th order) radiated by the loudspeaker in Figure 5.5; this corresponds to signal e(t) in eq. (5.1) and following. Blue is the so-called raw data signal, recorded with the microphone in Figure 5.5, and corresponds to signal a(t). By post-processing according to eqs. (5.2) and (5.3) the red impulse response is obtained (the amplitudes of the curves have been adjusted slightly to show all three signal parts in one graph).
The transfer function H(ω) in eq. (5.2) is the Fourier transform of the impulse response and characterizes the frequency dependence of the transfer behaviour of the system under test; see Figure 5.7. Note that this is not the same as the frequency response measured with a loudspeaker or a natural source as excitation.
With the Fourier transformation of an impulse response, not only the magnitude of the transfer function is obtained, but also its phase response; see Figure 5.8. This figure shows a wrapped presentation, i.e., the phase jumps between −180° and +180°. For loudspeaker measurements and other audio equipment (such as filters) the phase information is very important, as it can reveal frequency-dependent timing issues; for room-acoustic measurements, though, the phase information can be ignored.
Only when the source that excites the space has an ideal, flat frequency response will the transfer function and the frequency response measured at the same location correlate with each other; their amplitudes may differ depending on the excitation level of the source. Usually, though, the frequency response measured in a room is determined not only by the room behaviour but also by the frequency behaviour and the directivity of the source; see Figure 5.9.
Figure 5.7 Transfer function as Fourier transform of the impulse response.
The frequency response curve is usually shown on a logarithmic scale: the magnitude is squared and presented logarithmically. In the squaring, the phase information is lost.
Figure 5.9 shows not just the spectrum (here the frequency response) of the measurement signal, but also its spectrogram, i.e., the frequency content of the measured signal over time. Such presentations are common in sound level meters [1]; they have been used since the 1970s and are found today in modern handheld sound level meters such as those manufactured by B&K or Norsonics.
5.4.1.1 Introduction
Covering a specific area with sound is the basic job of any sound system. To select the correct loudspeaker its data must be studied, most specifically its radiation pattern, also called coverage or balloon data owing to the appearance of the 3D data representation. The radiation pattern basically answers the question of what sound pressure level is projected by the loudspeaker in which direction. When installing a loudspeaker, the main radiation pattern must be determined and off-axis sound radiation must be assessed, specifically to reduce sound energy radiated towards undesirable areas of the room such as back to the stage, to the ceiling or to the back wall. If such influences are not considered, feedback on stage may occur.
Figure 5.12 shows a newer form of data presentation for a point source similar to that in Figure 5.10. It may be seen that not only the phase data are used but filter settings as well, to obtain the required frequency response. These filter settings may be used to correctly set up the amplifiers and equalizers.
More complex loudspeaker data may be shown as well. Figure 5.13 shows the data for a line array consisting of eight modules. Filter settings and array curving may be used to cover the audience area with a flat frequency response.
Figure 5.10 Loudspeaker data and polar diagrams.
By knowing the exact loudspeaker data, most simulation programs are able to display the coverage. Figure 5.14 shows the coverage pattern of a line array in a stadium segment.
The audience area in Figure 5.14 is not well covered by sound; the lower and upper parts suffer from poor coverage. This could be improved by directing the sound better to these areas. Often digital signal processors (DSPs) are used to control the sound radiation towards the areas which must be covered and to avoid sound in areas which should remain silent. Hence not just the loudspeakers or line arrays determine the radiation pattern but also the zoning that controls sound levels per area. To illustrate this, assume as an example that the middle zone 2 should not be covered with sound. By using the relevant control buttons in Figure 5.15 it is possible to control the DSP settings by data export and to achieve the appropriate coverage pattern shown in Figure 5.16.
Comparing Figure 5.14 with Figure 5.16 makes clear that the physical configuration of the line arrays has not been changed, just the control settings. In contrast to the poor coverage in Figure 5.14 for the lower and upper zones, the sound energy in Figure 5.16 is now significantly improved, and the middle zone 2 is almost not covered with sound.
5.4.2 Delay Issues, Time Alignment, Equalization and Gain before Feedback
\[ c = 331.4 \sqrt{1 + 0.00366\,\vartheta} \;\;\mathrm{m/s} \]
Travel path (m)    Delay time (ms)    Travel path (ft)
1     2.92     3.28
2     5.83     6.56
3     8.75     9.84
4    11.66    13.12
5    14.58    16.40
6    17.49    19.68
7    20.41    22.96
8    23.32    26.24
9    26.24    29.52
10   29.15    32.81
required for compensating a travel path difference of, say, 20 m. In addition to the metric scale, the British foot is also indicated (1 ft = 0.3048 m). The foot scale offers a comfortable simplification in that 1 ms of travel time corresponds to about 1 ft of travel path.
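A minimal sketch combining the temperature formula above with the delay computation (values chosen to match the 20 m example):

```python
def speed_of_sound(theta_celsius: float) -> float:
    """c = 331.4 * sqrt(1 + 0.00366 * theta) in m/s."""
    return 331.4 * (1 + 0.00366 * theta_celsius) ** 0.5

def delay_ms(path_difference_m: float, theta_celsius: float = 20.0) -> float:
    return 1000.0 * path_difference_m / speed_of_sound(theta_celsius)

print(round(delay_ms(20.0), 1))  # ~58.3 ms for a 20 m path difference at 20 deg C
```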
• The feedback loop of an electro-acoustic amplifier channel consists not only of an elec-
trical, but also of an acoustic part
• It is practically impossible to separate the feedback path into its different components
(e.g., the electro-acoustic and the room-acoustic part)
To avoid acoustic feedback, the operator or installer of a sound system needs to comprehend the physical background of acoustic feedback, including the basics of how to arrange microphones and loudspeakers in relation to each other. In particular, the often very high level of monitor loudspeakers on stage may cause problems when a singer with a wireless microphone performs directly in front of these monitor loudspeakers.
Another approach to reducing the probability of acoustic feedback is to pay attention to the so-called secondary structure of the space, more specifically to the wall and ceiling materials close to microphones or loudspeakers. Wall areas that are covered by absorbers reduce the occurrence of acoustic feedback otherwise caused by strong reflections. In addition, sound-focusing effects in concave spaces or concert shells must be avoided, as they would promote feedback.
Normally a sound engineer does not have much influence on the wall or ceiling design
of a space, but some knowledge regarding acoustic feedback will assist in reducing this
unpleasant behaviour.
If such relatively simple methods cannot be employed to avoid acoustic feedback, technical procedures such as filters, frequency and phase shifting, and other feedback-suppressor devices may be used.
The timbre optimization depends on the application of the system. A balanced frequency response over the entire audio spectrum may, for instance, be desirable for high-quality systems designed for music transmission. For improving intelligibility in pure speech transmission systems, a reduction in the lower frequency range and an enhancement of certain formants in the range of about 2 kHz is appropriate [6]. Figure 5.20 shows recommended frequency response target curves for different speech and music applications [7].
Figure 5.20 Tolerance curves for the reproduction frequency response in different applications: (a)
recommended curve for reproduction of speech; (b) recommended curve for studios or
monitoring; (c) international standard for cinemas; (d) recommended curve for loud rock
and pop music.
Figure 5.21 Attenuation behaviour of filters of constant bandwidth (a) and of constant quality (b).
Another set of requirements may apply when optimizing the timbre of stage monitoring. In larger sound reinforcement systems filters are used at various points. For shaping the microphone frequency response, and in many cases also for suppressing the most dominant positive-feedback frequencies, these filters are normally located in the input channel of the mixing console.
It has to be pointed out that the use of filters almost always comes at the expense of the maximum attainable sound level. For this reason it is necessary to include corresponding power reserves when designing the system.
One distinguishes passive filters, which operate without an additional power supply and thus offer no amplification possibility (level enhancement), but also cause no degradation of the signal-to-noise ratio, and active filters; thanks to their more universal applicability, smaller dimensions and lower price, the latter are nowadays used almost exclusively in professional audio equipment.
Another distinguishing characteristic is the influence of damping on the behaviour of the filter curve. In this respect one distinguishes between filters of constant bandwidth and filters of constant quality Q (Figure 5.21).
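The constant-quality case lends itself to a short numeric sketch: for a fixed Q the absolute bandwidth grows proportionally with the centre frequency, with the band edges sitting geometrically symmetric around it (the Q value below approximates a 1/3-octave filter):

```python
import math

def band_edges(fc_hz: float, q: float) -> tuple:
    """-3 dB edges of a constant-Q band: f_lo*f_hi = fc^2, f_hi - f_lo = fc/q."""
    half = fc_hz / q / 2
    centre = math.sqrt(half ** 2 + fc_hz ** 2)
    return centre - half, centre + half

for fc in (100, 1000, 10000):
    lo, hi = band_edges(fc, q=4.3)
    print(fc, "Hz ->", round(hi - lo, 1), "Hz bandwidth")
```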
For reasons of cost, equipment complexity and ease of operation a number of different
practical designs are used:
1. To use 2D drawings and create the model by entering planes based on x, y, z coordinates
2. To use pre-programmed prototypes or sub-modules and adapt the coordinates as required
3. To import from AutoCAD or SketchUp files
4. To import from other simulation programs
Most newcomers to acoustical simulation believe that importing from any well-known CAD platform will solve all problems, as the architect can provide a 3D model that is relatively simple to import into a simulation program. Most of the time, though, this does not work without considerable additional effort, as the architectural drawings show far too many details that are irrelevant for the acoustical calculations and would lead to massively increased calculation times.
Reverberation time  The simplest way to obtain results quickly is to use the direct sound of one or more sources (loudspeakers) and to calculate the reverberation level of the room by means of reverberation time equations, assuming the room follows a statistically evenly distributed sound decay (homogeneous, isotropic diffuse sound field; that is, the reverberation time RT is constant over the room). Based on known room dimension data and the associated surface absorption coefficients, a computer program can very quickly calculate the RT according to the Sabine and Norris-Eyring equations (compare (2.1)). Also the volume of the closed space may be calculated, quite often an interesting number for architects, because the architect's common design tools might not deliver that information.
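A minimal sketch of the Sabine estimate referred to here (the room data below are hypothetical):

```python
def sabine_rt(volume_m3: float, surfaces) -> float:
    """RT60 = 0.161 * V / A, with A = sum of (area * absorption coefficient)."""
    a_total = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / a_total

# hypothetical hall: 12000 m^3, 800 m^2 audience (alpha 0.9),
# 2600 m^2 walls/ceiling (alpha 0.15)
print(round(sabine_rt(12000, [(800, 0.9), (2600, 0.15)]), 2))  # ~1.74 s
```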
Figure 5.22 Figures a–c show the same view of a 3D computer model in AutoCAD, SketchUp and in
the simulation software EASE.
As an example, for the ray tracing approach, the EASE AURA algorithm [15] calculates
the transfer function of a room for a given receiver position using the active sound sources.
For this purpose, a hybrid model is employed that uses an exact image source model for
early specular reflections and an energy-based ray tracing model for late and scattered
reflections. The transition between the two models is determined by a fixed reflection order;
see section 8.5.
For each receiver (and for all 1/3-octave band frequencies), a so-called echogram is created which contains energy bins linearly spaced over time. When a receiver is hit, the energy of the detected particle is added to the bin that corresponds to the time of flight. Also, as a separate step, the contributions of the image source model are added to the bins.
The particle model accounts for scattering in the following way: whenever a particle hits a surface, its energy is diminished as a function of the material's sound absorption characteristics. Then a random number is generated, and depending on the scattering factor the particle is either reflected geometrically or scattered at a random angle based on a Lambert distribution. Subsequently the particle continues to be traced until it hits either a receiver or another wall.
Modern multithread, multiprocessor, cloud-based and other network calculations
significantly decrease the calculation times for complex situations from days down
to hours.
In the case of cone-tracing a directional ray distribution starting at one single point and
then fanning out conically over a certain room angle is employed; a special form is pyramid-
tracing, where the cross section is not a circle but a square. The cone-tracing approach
allows for very fast ray calculations, but the fact that cones emanating from a single point
cannot seamlessly cover the entire source ‘sphere’ turns out to be a disadvantage. It is therefore
necessary to have adjacent cones overlap and to employ an algorithm that avoids multiple detections.
5.5.2.1 Aiming
Aiming the individual loudspeakers is a critical step in ensuring the proper spatial arrangement
and orientation of the sound reinforcement systems. Once the corresponding room or
open-space model is available and the mechanical and acoustical data of the loudspeaker
systems are accurately known, these systems are approximately positioned and possibly
fine-tuned within the same step. This also includes the beam settings for digitally con-
trolled arrays and/or their delays. Modern simulation programs employ an isobeam/isobar
method to initially aim the loudspeakers, preferably utilizing the −3 dB, −6 dB or −9 dB
contours.
Figure 5.24 shows various types of projection of the −3 dB, −6 dB and −9 dB contours
into the room. The superimposed aiming curves for multiple speakers can be studied in audi-
ence areas (Figure 5.25).
As long as a good direct sound coverage over the listener area is predicted, high
intelligibility indices are expected as well, under the condition that the reverberation is
well controlled.
The corresponding sound pressure calculations should take into account either measured
phase data for the individual loudspeakers or the phase shifts resulting from the travel-time
differences between the individual loudspeakers, if the inherent phase differences among these
loudspeakers can be ignored.
A complex summation (including phase conditions as well as travel-time differences)
has to be used as a standard method of calculating the direct SPL. In simulation algorithms
the complex sound pressure components of different coherent sources are added up and
afterwards squared to obtain SPL values. The so-called DLL or GLL approaches (see section
8.6) calculate the complex sum of all sources in the array.
Modern simulation programs are analysis tools, capable of calculating which
levels can be obtained by which loudspeakers and under which acoustical conditions. But
questions are often asked inversely: an advanced algorithm could query the user for a desired
average target SPL of the system, and subsequently adjust the power provided to each loud-
speaker (with a warning indication when the power required exceeds the loudspeaker’s cap-
abilities), taking into account target SPL, the sensitivity and directivity of the loudspeaker,
the distance of throw and the number of loudspeakers.
The goal of all these efforts is to evenly cover the entire audience area(s) with music-
ally pleasing and intelligible sound, while providing sound pressure levels suitable for the
intended purpose; compare Figure 5.28.
Figure 5.27 Echo detection in EASE: (a) initial time delay gap (ITD) mapping to check echo
occurrence in a stadium; (b) echogram in weighted integration mode at 1 kHz; (c) echo
detection curve for speech at 1 kHz.
Typical prediction results include:
(a) Predicted sound pressure level at 1 octave or 1/3 octave band frequencies, and at an
average of these frequencies (Figure 5.29)
(b) Predicted speech intelligibility values, listed as STI values (see Figure 5.30)
(c) Predicted acoustic parameters (at octave or 1/3 octave band frequencies), such as C80,
C50, Centre Time, Strength and other values according to ISO standard 3382 (com-
pare Figure 5.31).
5.5.3 Auralization
To make acoustical data more approachable and comprehensible for parties not necessarily
able to meaningfully analyse graphically displayed acoustic data, auralization was introduced
at the beginning of the 1990s. Auralization is a post-processing routine which utilizes the
calculated impulse responses and through convolution transforms an anechoic pre-recorded
music or speech signal into a sound file onto which the room’s acoustic signature is overlaid,
hence giving the aural impression of being positioned right inside the simulated space, which
might in reality not even be built yet. Auralization routines generate binaural data files
in WAV or similar formats (Figure 5.32), but other, more sophisticated sound file formats
such as a multichannel B-format can be generated as well. By coupling these formats to a
head-tracking device, a full three-dimensional sonic representation can be achieved which
properly changes its directionality while moving one’s head, e.g. a side-wall reflection becomes
full-on sound content when rotating the head to face the wall surface.
Figure 5.29 Sound pressure level mapping in simulation tools: (a) 2D presentation in CATT-Acoustic; (b) narrow-band presentation in EASE; (c) broadband presentation in ODEON.
Caution is advised, however, against:
• making final design decisions on the basis of just two auralizations of competing projects
• exclusive use of auralization for acoustic design work
Figure 5.30 Speech transmission index (STI) presentation in EASE: Top: three-dimensional presen-
tation in a hall and Bottom: STI presentation in a parliament.
Figure 5.31 Parameter presentation in EASE: Top: Clarity C80 and Bottom: Sound Strength G.
References
1. Standard IEC 651, 1979 – Sound Level Meters.
2. Bolt, R.H., and Doak, P.E. A tentative criterion for the short-term transient response of auditor-
iums. J. Acoust. Soc. Amer. 22 (1950) p. 507.
3. Schroeder, M.R. Frequency response in rooms. J. Acoust. Soc. Amer. 34 (1962) pp. 1819–1823.
4. Schroeder, M.R., and Kuttruff, H. On frequency response curves in rooms. J. Acoust. Soc. Amer.
34 (1962) pp. 76 ff.
5. Behringer | Product | FBQ2496.
6. Toole, F.E. Loudspeaker measurements and their relationship to listener preferences. JAES 34
(1986) 4, pp. 227–235, 5, pp. 323–348.
7. Mapp, P. Technical reference book ‘The Audio System Designer’, edited by Klark Teknik in 1995.
8. ODEON software, version 16, www.odeon.dk.
9. CATT-Acoustic software, version 9.1, www.catt.se.
10. EASE software, version 4.4, www.afmg.eu.
11. Bose-Modeler, version 6.11, worldwide.bose.com.
12. https://meyersound.com/product/mapp-xt/.
13. www.l-acoustics.com/products/soundvision/.
14. www.dbaudio.com/global/de/produkte/software/arraycalc/.
15. Schmitz, O., Feistel, S., Ahnert, W., and Vorländer, M. Merging software for sound reinforce-
ment systems and for room acoustics. Presented at the 110th AES Convention, May 12–15,
2001, Amsterdam, Preprint No. 5352.
16. Dalenbäck, B.-I. Verification of prediction based on randomized tail-corrected cone-tracing and
array modeling. 137th ASA/2nd EAA, Berlin, March 1999.
17. Naylor, G.M. ODEON – Another hybrid room acoustical model. Applied Acoustics 38 (1993)
2–4, p. 131.
6.1 Introduction
As mentioned in Chapter 1, sound reinforcement systems can be designed using different
approaches. Certain basic requirements, however, are given in all cases.
The sound level produced by the system within the venue’s audience area must ensure:
The sound level distribution, which enables assessment of the spatial distribution and coverage
of sound, must be sufficiently uniform. It depends on
• the timbre, depending on the transmission range and the frequency response of the
signal transmitted
• the frequency response
• absence of distortions
DOI: 10.4324/9781003220268-6
where the 1 m, 1 W level is then Ld,1m,1 W = LK (sensitivity) and ΓL(ϑ) the directional factor
according to (3.7).
For greater distances (above 40 m) the meteorological propagation loss Dr = DLH
according to Figure 2.10 has to be considered in (6.1).
The frequency range between 200 Hz and 4 kHz is generally recommended in the
standards (ISO, DIN) for ascertaining the characteristic sensitivity level of broadband
loudspeakers.
The above-mentioned method for calculating the sound level in the free field is gener-
ally applicable to outdoor systems, but the free-field propagation is also of interest for indoor
systems.
If in special cases only the sound power level LW of the loudspeaker and the directivity
index DI =10 lg QL dB are known, it is possible to calculate the required characteristic
sound level LK according to:
Contrary to the direct sound level, all simultaneously active loudspeakers contribute to the
formation of the diffuse sound level. Instead of Pel (installed power of a loudspeaker), the
installed power of all loudspeakers simultaneously active in the room Pelsum is entered under
consideration of their possibly different characteristic sound levels (or sensitivities LKi) and
directivity indices. This assumes the formation of a sufficiently diffuse sound field and there-
fore applies neither to flat rooms nor to very large or heavily damped rooms. These types of
calculations are now performed utilizing computer programs and will not be elaborated on
in detail (see [1]).
L = LK + 10 lg Pel dB + 10 lg [ Γ²L(ϑH) / r²LH + 16π / (QL max · A) ] dB . (6.5)
These equations can only be applied if one single loudspeaker is present or if the
loudspeakers are arranged in a very concentrated form, as is the case, for instance, with a
monocluster, where the directional factor Γges and the directivity factor Qges of the array
must be known.
The use of simulation programs is recommended for more complex loudspeaker
arrangements to achieve correct calculation results [1].
Typical applications are:
• foyers in airports, railway stations, congress and cultural centres, hotels, theatres
• restaurants
• shopping centres, sales floors
• open-plan offices
• museums, galleries, exhibitions
• workshops, factories, storerooms
Various loudspeaker arrangements have been developed for these applications, the most
important of which are mentioned here.
Ceiling loudspeakers. The ceiling loudspeakers used are uniformly distributed in the ceiling area and radiate
downwards. With this arrangement it is possible to obtain a uniform sound level distribu-
tion over a large area, also in the case of dense furnishing or partition walls put up between
individual areas (e.g. exhibition booths or office cubicles). All loudspeakers work simultan-
eously without any electronic delays.
To avoid flutter echoes between room ceiling and floor, it is necessary that either the floor or the ceiling be sufficiently sound-absorbent.
The reverberation time in the midfrequency range should be kept below 2–2.5 s, specifically
for high ceiling heights.
Calculating the quantity of required loudspeakers. The spacing between the loudspeakers and
thus the number of loudspeakers per surface area depends on the desired uniformity of sound
level distribution, the installation height of the loudspeakers and also the desired timbre of
the sound radiation. This number is furthermore affected by the radiation characteristics
of the loudspeakers, which in some cases are widened by means of additional components
arranged in front of the loudspeakers.
Figure 6.1 illustrates the relevant relationship between radiation angle and separation
of the loudspeaker from the average ear height above the floor. This relation between ear
height (1.2 m /4 ft for seated audience, 1.7 m /5.5 ft for standing audience), ceiling height
and directivity pattern of the loudspeakers may be optimized by different loudspeaker
arrangements:
• edge-to-edge arrangement
• minimum overlap
• centre-to-centre arrangement
In Figure 6.2 these arrangements are explained for hexagonal or square grid arrangements.
For comparison Figure 6.3 illustrates the sound coverage for a flat room (room height
4 m (13 ft), average reverberation time 1.2 s) with different distances between the speakers.
Left: the edge-to-edge arrangement needs up to 32 loudspeakers at an average distance of
3.2 m (10.5 ft). Right: the centre-to-centre arrangement needs 123 loudspeakers at a distance
of 1.6 m (5.3 ft).
The solution on the right is rather costly; normally a distance between the
loudspeakers of 4 m (13.2 ft) is sufficient for even coverage, therefore only 24 loudspeakers
are needed.
Several simulation programs are available which calculate the level coverage as well
as the achieved intelligibility. In this case the reverberation time has to be known or may
be calculated with the software; compare the STI results for speech intelligibility with an
average reverberation time of 1.2 s in Figure 6.4.
Figure 6.2 Installation grids of ceiling loudspeakers. (a) Centre to centre; (b) minimum overlap;
(c) rim to rim.
Figure 6.4 STI coverage (left with 32, right with 123 loudspeakers).
The green colour in Figure 6.4 indicates STI values > 0.5, relevant for the installation of
voice alarm systems (compare standards CEN TS EN 54-32 in Europe and NFPA 72 in the
US; see section 1.1). With 123 loudspeakers the same STI values are achieved as with 32
and even with just 24 loudspeakers (4 m average loudspeaker spacing). It therefore makes
no sense to install an excessive number of loudspeakers, as this increases only the cost, not
the quality, of the system.
To obtain high speech intelligibility for information systems the installation height
of the loudspeakers should not exceed 6 m (maximum of 8 m). Heights above 4 m are
acceptable only for very heavily damped rooms or/and spaces with very large volumes. With
greater installation heights one has to consider that several widely spaced loudspeakers are
perceived simultaneously and travel-time differences increase the spaciousness, which is
detrimental for definition and speech intelligibility.
Loudspeaker installation. Various installation techniques exist to mount the loudspeakers
into the ceiling. Aesthetically most pleasing is certainly the integration of the loudspeakers
within a closed or acoustically transparent false ceiling.
For installation in a closed false ceiling, it is possible to use open loudspeaker chassis which
are mounted directly above the acoustically transparent openings of the ceiling. The false
ceiling then serves as a practically ‘infinite’ baffle.
If the loudspeakers are installed above an acoustically transparent false ceiling, e.g.
a complete sound-absorbent ceiling for room damping, they must be enclosed in a
backbox or arranged on baffle boards so as to avoid an ‘acoustic short-circuit’ through
the ceiling perforation. The loudspeakers can then be installed without any particular
opening; hence the architectural concept is not impaired. A precondition for this kind
of installation is sufficient acoustic transparency of the perforated boards or baffled false
ceiling. This transparency depends on one hand on the degree of perforation of the
visual covering, i.e. on the ratio between the open and the closed portions of the sur-
face beneath the radiation area of the loudspeaker. On the other hand, it also depends
on the actual perforation, on its depth and to a minor degree on the spacing of the
perforations. All these factors determine the upper frequency limit for which the ceiling
covers are transparent. A thin steel plate of 1 to 2 mm thickness, even if it has a rela-
tively low degree of perforation of about 15%, may be more favourable than a 20 mm
thick gypsum board with a considerably higher degree of perforation. Also, for thicker
plates, numerous small openings are more favourable than a couple of large ones, since
the latter may additionally give rise to narrow-band blocking resonances (frequency-
selective attenuations or notches).
If ceiling installation of the loudspeakers cannot be realized, three alternatives are available:
• The loudspeakers are installed directly under the ceiling and radiate downwards.
• The loudspeakers are arranged on walls and supports and radiate horizontally or slightly
inclined.
• The loudspeakers are suspended from the ceiling by means of long cables and radiate
downwards. (This solution is also to be chosen when the room to be covered is higher
than 6 m or if the ceiling area is heavily occupied by ductwork, bridging joists, etc.).
Suspended loudspeakers. For loudspeakers installed directly below the ceiling or suspended
from short pendants the same conditions apply as for loudspeakers installed in the ceiling.
Other installation variants, however, require some additional observations.
In very reverberant, high rooms one can achieve, for instance, a significant improvement
of the signal’s definition by suspending the loudspeakers not too far above the ear-height
level (Figure 6.5). This is very effective if the upward radiation of the loudspeakers is min-
imal and if the covered area is sound-absorbent (sound projection to the audience). Both
conditions contribute to minimizing the excitation of the upper part of the room.
The installation height of the loudspeakers has to be chosen in such a way that the audi-
ence is still within the range of the critical distance of the loudspeakers Dc: the maximum
installation height above ear-height level is approximately given by
Figure 6.5 Suspended directive loudspeakers for avoiding an excitation of the upper reverberant space.
• The loudspeakers must be installed above the head level of standing persons, to avoid
masking by the audience. The preferable installation height is between 2 and 3 m. If a
greater height is required the aiming should be inclined accordingly.
• The spacing between neighbouring loudspeakers should not exceed 17 to 20 m.
• The loudspeakers are to be aimed in such a way that no initial or phase-coherent wave
front hits a planar reflecting surface.
A special form for covering extensive rooms or long aisles consists in using radiators with
bidirectional characteristics, consisting, for example, of two loudspeaker boxes joined
back-to-back. As shown in Figure 6.6, these double loudspeakers are arranged in a staggered
pattern so that the widest coverage area of one loudspeaker fits into the narrowest coverage
area of the staggered next loudspeaker, to obtain a relatively uniform sound level and timbre
distribution.
Figure 6.6 Double loudspeakers arranged in a staggered pattern for covering a large-surface room.
• sound systems for large exhibition grounds, factory installations and other large open-
air sites
• sound and information systems for sports centres like outdoor swimming pools, sports
grounds, stadiums
• information systems for station platforms, bus terminals, etc.
The goal is to cover large areas economically with echo-free sound at a sufficient signal-to-
noise ratio (about 10 dB). Unlike indoor sound reinforcement, the diffuse sound caused
by reflections can be ignored. For covering large distances, as is rather frequently the case
with open-air sound reinforcement, one has, however, to consider an additional weather-
dependent attenuation (see Figures 2.10 to 2.13). Since in most cases the information
systems concerned are used for speech transmission, a treble drop above 10 kHz is usually
acceptable.
With sound transmissions over large distances, one has to take care that listeners close to
the loudspeakers are not exposed to excessive sound levels. This risk is particularly relevant
for centralized arrangement of the loudspeakers.
With a decentralized arrangement of the loudspeakers, however, ‘double hearing’ may
occur when two wave fronts from two separately located loudspeakers or loudspeaker arrays
arrive at the listener with a time difference of more than 50 ms so as to be perceived separ-
ately. This may also occur if the listener perceives the direct sound as well as a strong reflec-
tion. Additionally, it must be ensured that the sound levels arriving on adjacent areas that
are not to be covered are kept within the legally permitted limits.
Centralized arrangement of the loudspeakers. This is the common solution for the coverage
of smaller sports facilities like outdoor swimming pools and sports grounds. The loudspeakers
are installed at a central elevated position near the paging station. In an outdoor swimming
pool this may, for example, be the lifeguard’s cabin roof around which the pools and the
leisure areas are grouped (Figure 6.7). The individual loudspeakers and loudspeaker arrays
are dimensioned in such a way that the desired sound level L is achieved at the greatest dis-
tance to be covered in each individual case.
It is not always possible to realize a centralized radial coverage. On small sports grounds,
for instance, the loudspeakers have to be installed at the edge of the playground, mostly in the
vicinity of the grandstand (Figure 6.8), in which case the spacing of the loudspeakers should
not exceed 15 m. If the system is used mostly for speech announcements and only occa-
sionally for music transmissions, it is possible to use compression driver horn loudspeakers.
With larger sports grounds higher standards as to reproduction quality, and particularly
to the transmissible frequency range, are required, hence loudspeakers capable of produ-
cing a wider transmission range must be used. Moreover, it is necessary to pay much more
attention to the weather-dependent attenuation, as well as to an adequate electro-acoustical
headroom.
Stadiums and sport grounds require special design solutions; refer to section 11.2.3.
Decentralized arrangement of the loudspeakers. With loudspeaker spacings of up to 17 m
there are no echo disturbances to be expected, since then the wave fronts stemming from adjacent
loudspeakers arrive at the listener’s location within 50 ms. With spacings exceeding 17 m a
level difference of > 10 dB between the near and the remote loudspeakers has to be ensured.
Then the nearer and thus louder loudspeaker masks the remote one so as to eliminate the
risk of double audibility, i.e., of echoes.
Figure 6.9 shows the sound propagation of two loudspeakers arranged at a distance a from
each other. At a point between the loudspeakers, i.e., at a distance r1 from the loudspeaker
S1 and r2 from the loudspeaker S2, the sound pressure level difference is
ΔL = 20 lg (r1/r2) dB. (6.6)
If the radiated sound pressure levels are equal, no echo effects occur in the region of
r1 − r2 ≤ 17 m.
Since according to Figure 6.9 the distance between the loudspeakers is a = r1 + r2, one
obtains a maximum in-line spacing of
a = 17 m · (10^(ΔL/20 dB) + 1) / (10^(ΔL/20 dB) − 1) . (6.7)
The delay time corresponding to a path-length difference s follows from
Δt [ms] ≈ s [m] / 0.341 . (6.8)
Delay systems enable large distances between the loudspeakers. It is important in this
respect, however, that the backward radiation, i.e. the radiation in the direction contrary
to that of the delay used, is suppressed, since otherwise the risk of echo formation owing to
the travel time plus additional delay is significantly increased. Therefore, loudspeakers used
for such delayed, decentralized coverage should have bidirectional or cardioid characteristics,
as shown in Figure 6.10.
Figure 6.10 Loudspeaker arrangement for decentralized coverage of a large square. (a) Loudspeakers
with bidirectional characteristics; (b) loudspeakers with cardioid characteristics.
On platforms, under protective roofs as well as to a certain extent in large halls room-
acoustical conditions prevail which lie somewhere in between those of indoor rooms and
open spaces. Owing to the magnitude of the resulting equivalent sound absorption areas
one may generally expect free-field propagation conditions. On the other hand, it is neces-
sary to diminish the perceived reverberation, which may be excessive, especially with
large halls.
The sound coverage of platforms is one of the most frequent and also most complicated
sound reinforcement tasks for transportation hubs. This holds specifically true if the
platforms are covered by a large, mostly closed dome. Under these conditions the sound
reinforcement system has to meet the following requirements:
• realization of a largely uniform sound level and a consistent timbre along the total
length (150 to 350 m) and width (7.5 to 15 m) of the platform
• minimizing crosstalk to adjacent platforms (distance 8 to 20 m)
• adaptation of loudness to the constantly varying environmental noise levels (65 to
90 dB(A))
• high intelligibility according to the applicable standards
Figure 6.11 Installation of passive directed sound columns on the platform of the Munich main
station. © Duran Audio.
Figure 6.12 Radiator block to cover a platform in the main station in Frankfurt/Main with sound.
© Holoplot.
Figure 6.13 Decentralized coverage of a church nave by means of sound columns for speech.
A centrally arranged array system will provide good sound intelligibility as no other
loudspeakers are installed which would produce reverberant sound in the listener areas
covered by the main system. In this case two or more staggered loudspeakers may
provide worse intelligibility values in comparison to one single array. Of course, the main
system may be substituted by two or three arrays in front of the audience.
On the other hand, an even closer path to the listeners in reverberant spaces can be
achieved by means of an individual radiation by loudspeakers integrated in the seat backs or
tables directly in front of the listeners. The loudspeakers are aimed directly at the listeners,
who largely absorb the sound.
Many sound reinforcement systems fall into this category. A system must consist of at least
a microphone, an amplifier and a loudspeaker.
Based on Figure 6.14 and by using the mathematics in Appendix 6.1 the maximum sound
gain vL is obtained as:
vL = R(X) · (qSM · qLH) / (qLM · qSH) . (6.9)
The corresponding directivity and beaming properties are summarized in the directivity
factors: see Appendix 6.1.
In level notation the acoustic gain index is obtained as
VE =10 lg vL dB
or
VE = LR + LSM + LLH − LLM − LSH , (6.10)
where LR is the feedback index and LXY =10 lg qXY is the transmission measure between the
quantities X and Y.
In a room it is mostly possible to disregard the additional level attenuations LSH, LLM
(i.e., here Li =0 dB). Since with a centralized sound reinforcement the distances source–
listener rSH and loudspeaker–microphone rLM are larger than the critical distance Dc = √Q · rH
prevailing in the room, (6.10) is simplified to
VE = LR + LSM + LLH . (6.11)
For the feedback index LR the values to be expected are between −6 and −15 dB, depending
on the degree of equalization of the sound reinforcement system (recommended LR =−9 dB).
The values for the sound transmission measures LSM and LLH can be gathered from
Figure 6.15, which shows the dependencies of the sound transmission measures LXY in gen-
eral terms as a function of the ratio rH/rXY. Parameters are the respective directivity factors
or the coupling factors Q(ϑ).
Further examples are compiled in Table 6.1, in which the directivity factor of the source
is QS =1 (which means a source with omnidirectional characteristics).
Since in practice it is often possible to approximate the distances rSH and rLH and since
the microphone is directed towards the source and the loudspeaker towards the listener,
(6.11) is simplified after some conversions so as to produce an acoustic gain index of
Figure 6.15 Sound transmission index LXY as a function of the distance ratio rH/rXY.
Parameter: directivity factor Q(ϑ) or coupling factor Q(ϑ,ϕ).
Table 6.1 Examples of the achievable acoustic gain index VE (rightmost column, in dB; QS = 1):
1 | 5 | 1 | 3 m | 2 | 12 m | 0.5 | ≈0
2 | 5 | 1 | 0.5 m | 12 | 12 m | 0.5 | 7
3 | 5 | 3 | 3 m | 2 | 12 m | 0.5 | ≈0
4 | 5 | 3 | 0.5 m | 12 | 12 m | 0.5 | 12
5 | 5 | 3 | 10 cm | 60 | 12 m | 0.5 | 26
6 | 10 | 3 | 10 cm | 60 | 24 m | 0.25 | 23
7 | 10 | 5 | 5 cm | 120 | 24 m | 0.25 | 31
8 | 10 | 5 | 5 cm | 120 | 6 m | 1 | 43
For transducers with omnidirectional characteristic: Q(ϑ) =1; for directional transducers
the coupling factors can be Q(ϑ) =50 to 150.
6.3.2.1.3 CONCLUSIONS
For rough calculations of the achievable sound reinforcement in rooms and in the open it is
sufficient to consider only one reinforcement channel: the one with the highest loop amplification.
The procedure comprises:
1. determination of the distance relations to be taken into account rH/rSM, rH/rLH or rLM/rSM,
etc. (the indexes mean: S source, M microphone, L loudspeaker, H listener)
2. determination of the actual directivity factor or the coupling factor Q(ϑ)
3. reading of the actual sound transmission measure LXY from Figure 6.15
4. application of (6.11), (6.12) or (6.13)
A more exact calculation of the sound level values can be realized only by means of a
computer simulation program that dispenses with approximations and takes into consideration
the exact interactions existing between the different operating quantities. For practical
purposes, however, the above algorithm will suffice.
A centralized loudspeaker arrangement is characterized by:
• extensive coherence of the wave fronts of the loudspeaker sound (of the secondary
sources) and generally also of the sound emitted by the original sources
• the acoustic orientation of the listeners is largely directed towards the original source (a
mislocalization towards the top may occur near the action area)
• no delayed sound stemming from secondary sources can cause travel time interferences
in the action area
This loudspeaker arrangement offers an enlargement of the critical distance, owing to the
directional effect according to (2.5a):
Dc = √QL · ΓL(ϑ) · rH .
As an approximation, the directional factor ΓL(ϑ) in (2.5a) may be set to 1, since the
loudspeaker arrangement is directed towards the audience area. The reverberation radius rH
of one loudspeaker is thus enlarged by the square root of the directivity factor.
This arrangement enables a relatively large critical distance and thus a large direct-
sound-determined (reverberation-free) area to develop, which results in high speech intel-
ligibility in medium-sized acoustically difficult rooms.
If the action area is relatively small, e.g. a platform of less than 15 m × 15 m, a speaker’s
desk or a boxing ring, it is possible to ensure acoustic localization without delay equipment
by an appropriate arrangement of the loudspeakers. A precondition in this regard is that
the precedence effect is considered, i.e. that the direct sound from the original source must
reach the listener before the amplified signal of the central loudspeaker array and that the
level of the original direct sound must not be more than 6 to 10 dB lower than that of the
amplified signal. The level of the original source often does not suffice, therefore support
loudspeakers for boosting the original sound are required in the range of the source. A typ-
ical example is a loudspeaker built into the speaker’s desk; see Figure 6.16. Additionally, a
delay of more than 30 to 50 ms has to be avoided between the first wave front arriving at
the listener from the original source or its support loudspeaker and the wave fronts of the
amplifying sources. Figure 2.21 illustrates the time and level conditions which have to be
considered in this case.
For succeeding without delay equipment, the following additional requirements are
applicable:
• The distance loudspeaker–listener rLH should be greater than the distance original source–
listener rSH (Figure 6.17). This condition must be met for the greatest possible distance
between source and listener.
• A sufficient loudness of the original source has to be ensured (if needed by support
loudspeakers).
Apart from enabling a broad coincidence of the visual and acoustical source localization,
the centralized loudspeaker arrangement offers the advantage that unintentional enhance-
ment of spaciousness does not occur, as only slight travel-time differences exist for the
listener.
This condition becomes evident in Figure 6.18. One sees that with a large time diffe-
rence of the sound arriving at the lateral front seats (H1) the effective critical distance DC1
of the radiators becomes reduced by the parallel arrangement, since loudspeaker L2 increases
the reverberant component in the coverage area of the other loudspeaker L1. For a listener’s
location H2 in the central or rear area of the hall this is not the case, since the travel time
from the two loudspeakers L1 and L2 is almost equal.
Figure 6.17 Geometric relations in the case of centralized coverage without delay equipment.
Figure 6.18 Sound-field relations with different loudspeaker arrangements. L1 and L2, loudspeakers
at the stage portal (left and right) with the critical distances Dc1 and Dc2; L3, supporting
loudspeaker above the balcony with critical distance Dc3.
The critical distance depends not only on the room-acoustical and technical loudspeaker
data, but also on the arrangement of the loudspeakers. A delayed loudspeaker L3 in the
rear area of the hall, for instance, always has a clarity-reducing effect for the front seats. Its
energy therefore must be radiated directly onto the target audience area so that it becomes
largely absorbed. At a rear seat H3, however, the front loudspeakers may also contribute
to enhancing the intelligibility of the signal, if the signal is fed to the local loudspeaker
with such a delay that it arrives at the listener slightly later than the signals from the front
loudspeakers.
This problem does not occur with installed L-C-R line arrays. Just the travel-time
difference original source–arrays (small distance) and arrays–listener (small to large dis-
tance) as well as the uniformity of sound level distribution must be observed. All these
conditions may be verified in advance with modern simulation programs; compare
Figure 6.19.
When supporting (infill) loudspeakers are used, two cases have to be distinguished:
(a) Although the source is located near a centralized loudspeaker, it is not the signal from
the source, but an amplified sound signal coming from the infill loudspeaker which the
listener perceives as the originating point (cause: level L2 > L1 and distance l2 < l1) (pre-
cedence effect; see section 2.3.4).
(b) With increasing distance between main and infill loudspeakers, definition decreases
and with distances above 17 m the risk increases that the signal arriving from the
loudspeakers is perceived separately by the ear and is considered as echo.
While case (a) concerns rather the sound impression and the incident direction
(mislocalization and thus confusion are the consequence of diverging acoustical and visual
impressions), case (b) has to be avoided, since the echoes lead to poor speech intelligibility
Figure 6.20 Use of supporting loudspeaker for coverage of a listener’s seat. L1, central loudspeaker;
L2, loudspeaker near the listener. (a) Increased level at the listener’s location because of
close supporting loudspeaker. (b) Echo elimination by travel time compensation: ∆t =(l1
− l2)/c. (c) Acoustical localization with a delay slightly longer than the transmission
path: ∆t =(l1 − l2)/c +15 ms.
and non-consistent sound images. To avoid this, the infill loudspeakers must be operated
with a time delay. The delay times required for travel-time compensation may be
calculated by using eq. (6.8) or Table 5.3.
If just the acoustical travel-time difference source–listener and infill loudspeaker–listener is
compensated, a phantom source is located somewhere between the two loudspeakers when
the levels and spectra of both arriving signals are approximately equal.
By introducing a further delay of 15 to 20 ms, localization clearly jumps over to loud-
speaker L1, provided the level of L2 is not more than 6 to 10 dB higher than that of L1
(Figure 6.20c).
The echo elimination is obtained in both cases, i.e., either by the travel path compensa-
tion (Figure 6.20b), or also by an additional delay for ensuring localization (Figure 6.20c).
Currently, most multi-channel systems prefer to use delay units for smooth coverage of
the audience areas.
Decentralized systems are characterized by the use of a sometimes great number of indi-
vidual loudspeakers at close range to the listeners, so as to achieve high speech intelligibility,
eliminating the reverberant sound through direct irradiation. A consistent product of this
decentralization is the loudspeaker built into the back of the seat. Such systems in which a loud-
speaker is assigned to every listener at close range have proved their outstanding efficiency
for congresses with large numbers of attendees. Although the front seat is mostly located
in the direction of the platform so that the visual contact to the speaker is not impaired,
the use of delay equipment is sometimes advisable, e.g. to avoid echo disturbances, if such a
system is to be operated together with a portal system.
With a decentralized arrangement of the loudspeakers, it is generally not possible for the
‘internal travelling time’ of the system, i.e. the travel-time difference between the nearest
and the furthest audible loudspeaker, to be kept below 50 ms. Therefore, either the infill
• The reproduction loudness should be adapted to the environment (the noise level).
• No unusual non-linear distortions are perceptible.
• No additional disturbing noises (noise, cracking, etc.) are produced.
• No feedback phenomena occur.
• The occurring linear distortions should correspond to the standard room-acoustical
conditions.
• The visual and acoustical localisation of the real or imaginary sources (primary sources)
should coincide.
Except for the last requirement, all problems can be solved by technical measures. Apart
from technical factors, psychoacoustical properties are of primary importance for localiza-
tion. The fundamentals hereof were dealt with earlier in Chapter 2; the application of the
precedence effect (see section 2.3.4) is discussed here.
Figure 6.21 explains the fundamental effect of time delay. Without time delay one
localizes a phantom source in the centre between the two loudspeakers. If the signal is
delayed in one channel, the loudspeaker of the non-delayed channel is localized, and only
if the level of the delayed signal is increased by more than 6 to 10 dB (see section 2.3.4,
Figure 2.21) does the localization point (the phantom source) return to the centre between
the loudspeakers and finally move over to the delayed loudspeaker [3].
For finding the required delays, eq. (6.8) can be used. Starting from one loudspeaker or
original source providing the reference level, the delay times and levels of the individual
loudspeakers are chosen so that localization remains on the source or the reference
loudspeaker, in accordance with each loudspeaker's distance from the listener.
This is illustrated by Figure 6.22. The voice of a talker at the desk is radiated by a
loudspeaker integrated in the desk. The sound level of this loudspeaker does not have
to prevail over the sound reinforcement system of the hall, but only to supply the localiza-
tion reference. The loudspeakers integrated for instance into the seats and used as the
main sound reinforcement system are supplied via delay devices according to their distance
from the speaker’s desk and produce a sufficient sound level thanks to their proximity to
Figure 6.21 Explanation of localization and phantom source formation as a function of the time delay
of one partial signal.
Figure 6.22 Acoustical localization of a sound source (in the speaker’s desk) by means of a delayed
sound system (schematic).
the listeners. Acoustically localized, however, is the speaker’s desk, since its signal arrives
earlier, though at a lower level, at the listener’s location. In the rearmost rows it is possible
for the less delayed front loudspeakers to serve as a localization reference, so that also
here localization is towards the front, although the desk loudspeaker is perhaps no longer
audible.
Figure 6.24 Source-oriented reinforcement system. (a) Tracking and delay localization zones for mobile
sound sources. (b) Placement of hidden positions of installed trackers. (c) Visualization of
12 localization zones on a stage.
Figure 6.24 Continued
Figure 6.25 Loudspeakers on stage for source support and tracking procedure with corresponding
software. (a) Stage area with hidden support loudspeakers in a stage design in the Albert
Hall performance. (b) Computer graphic for visualization of the tracking procedure of a
talker or singer on a stage. (c) One tracker placement on stage for a performance in the
Szeged Drone Arena.
Figure 6.25 Continued
Over the last few years the world of ‘immersive sound’, coming from studio and cinema
applications, has found its way into the sound reinforcement business. These sound
systems give the listener the impression of being enveloped by sound. Loudspeakers not only
on or above stages but also around the listener areas are used to create a complex spatial
sound impression for the listeners. Tracker devices allow the simulation of moving sound
sources on stage. Figure 6.26 shows such a loudspeaker arrangement for a ‘Sound Scape’
sound reproduction.
Concert and multipurpose halls may be grouped into the following types:
1. Pure classic concert halls with highest acoustic criteria (less than 5% of existing halls)
Speech performances use sophisticated sound systems, but the achievable quality of
speech reproduction is limited, since concert acoustics have priority; the reverberation time
is up to 2.2 s, depending on volume.
Figure 6.26 d&b Soundscape 360° system.
2. Multipurpose halls with high acoustic quality suitable for classic concert performances
(around 20% of existing halls)
Classic concerts may be performed in high quality, and speech-related events like con-
gress sessions or company meetings will happen quite often. Also chamber music events
or jazz performances might happen. Reverberation time should not exceed 1.7 to 1.9 s.
3. Pure multipurpose halls for speech and music performances with good acoustic quality
(around 50% of existing halls)
These are often existing city halls or similar assembly halls for every type of event and gathering,
from speech presentations to classic concerts, the latter with less acoustic success.
Average reverberation time does not exceed 1.6 s.
4. Assembly halls mainly used for speech performances (around 25% of existing halls)
Mainly used for speech, but chamber music performances are possible too. Most halls
of this type handle electroacoustic supported music reproduction well, including rock
and pop concerts. Reverberation time varies between 1.0 and 1.4 s depending on
volume.
5. Hall types 1 and 4 are not really multipurpose halls. In type 1 a sound system is to be
designed for announcements and speech reproduction and in type 4 a sound system
may or may not be required for speech; music performances are supported by sound
systems permanently or temporarily installed.
As 70% of all existing larger halls are either of type 2 or 3, these will be explained in more
detail.
Type 2 halls demand extra efforts in acoustics. Here the room-acoustic design will in any
case be carefully checked by computer simulation and for larger halls even by scale-
model measurements. The sound systems must be designed carefully as well, so the same
computer model should be used to verify the sound system design for good coverage and
high speech intelligibility.
The design of Type 3 halls is no less complicated if good acoustic properties are to
be achieved. Any standard solutions must be avoided. In close cooperation between the
architect and the acoustician the room-acoustic properties of the hall must be designed.
Depending on the size and the shape of the hall computer simulation is recommended, espe-
cially if glass structures or rounded wall parts dominate. Also, a sound system should not be a
‘standard’ one. Depending on seat count, hall geometry and stage structure, different sound
systems may be advisable [7]. Quite often the architect wants to hide the loudspeakers; this
is often possible, but the sound system design must ensure that the covering surface in front
of the loudspeaker is acoustically entirely transparent. For rock and pop concerts in these
halls the artists might bring their own preferred equipment. In many of these cases the
resulting sound quality is worse than that of a permanently installed, hall-specific sound
system, so the ‘house’ system is often needed in addition to support the rented system in
achieving good sound quality.
For hearing-impaired listeners, assistive listening systems (ALS) are realized, for example, by means of:
• induction loops inside the floor, used with visible personal induction receivers
• infrared transmission to visible infrared receivers
Figure 6.27 Use of induction loops for compensation of hearing loss in a theatre main floor.
Induction loops may, however, be unsuitable where:
• there is substantial background noise, which will reduce the effectiveness of any
assistive listening system
• there is no practical way to install the loop cable
• there is no sufficiently good-quality audio source available
• electrical instruments such as electric guitars or dynamic microphones are used within
the area covered by the loop.
Figure 6.28 Infrared field strength coverage simulation on listener areas of a lecture hall with two
SZI1015 radiators in blue.
In Figure 6.28 a computer model of a theatre is shown, equipped with two SZI1015
radiators (Sennheiser) operating in the broadband range 30 kHz to 6 MHz; see Figure 6.29 left.
The simulation shows a smooth coverage of 25 dB and more at all listener seats in the theatre.
If the influence of noise is high, these field-strength values are insufficient; stronger infrared
radiator settings may then be selected or the position of the radiators modified, again to be
verified by means of computer routines.
Mobile infrared receivers are used to convert the modulated infrared signal into an audio
signal; see Figure 6.29 on the right. Languages may be switched; multi-language support is
possible.
6.3.6.3 FM Transmission
Tour-guide receivers are also in use as ALS systems. They are designed for applications such
as guided tours, multi-language interpretation and command applications, for example in
the fields of sports, and also for assistive listening. As an example, the EK 1039 by Sennheiser
(see Figure 6.30) can be used with corresponding headphones or personally worn induction
neck loops. The handling and operation of the receiver are very simple and intuitive. Speech is
highly intelligible thanks to an audio bandwidth of 15 kHz.
Reliability:
• 75 MHz switching bandwidth – the tour-guide system with the widest range
• Adaptive diversity technology for improved RF reception quality
• Reliable operation with standard AA cells or rechargeable batteries
Appendix 6.1
vL = R(X) · (qSM · qLH) / (qLM · qSH) . (6.9)
If the loudspeaker is mainly aimed at the audience area, which offers the advantage of
exciting the reverberation of the room less, the angle-dependent directivity factor
of the loudspeaker is increased by the equivalent beaming factor QPL.
VE =10 lg vL dB .
References
1. https://ease.afmg.eu or www.catt.se or www.odeon.dk.
2. Blauert, J. Spatial Hearing. Cambridge, MA: MIT Press, 1983.
3. Haas, H. On the influence of a single echo on the audibility of speech (in German). Acustica 1
(1951) 2, 49–58.
4. Steinke, G., Ahnert, W., Fels, P., and Hoeg, W. True directional sound system orientated to
the original sound and diffuse sound structures – new applications of the Delta Stereophony
System (DSS). Presented at the 82nd AES Convention, 1987, March 10–13, London, Preprint
No. 2427.
5. DP 394/584: Procedure and arrangement for a locally as well as temporally variable signal dis-
tribution over a large sound reinforcement system, especially for audiovisual performances in
auditoriums, preferably dome-shaped rooms.
6. www.outboard.co.uk.
7. Adelman-Larsen, N.W. Rock and Pop Venues. Springer, 2014.
8. Induction and Hearing Loop System Design and Amplifiers (ampetronic.com).
Intelligibility is the single most important factor when it comes to designing and operating
sound systems intended for amplification and distribution of speech. Indeed, if such a system
isn’t intelligible, then there is no point in having it. It should be realised that speech intelli-
gibility and sound quality are not the same thing, as it is quite possible to have a fairly poor-
sounding system that is highly intelligible (consider a telephone for example); conversely,
an expensive Hi-Fi system, whilst sounding wonderful in a domestic setting, is unlikely to
provide adequate intelligibility in a reverberant swimming pool, railway station concourse
or large reverberant cathedral. Intelligibility, however, is a not a binary condition flipping
between intelligible and unintelligible but instead a gradual change occurs and the level
of intelligibility required for one application may not be suitable for another. For example,
the degree of intelligibility required for passenger information announcements in a railway
station would not need to be as high as that needed when listening to a complex lecture,
theatrical drama or in a law court, where every fragment of speech needs to be easily and
fully heard.
The aim of this chapter is to explore the factors that affect speech intelligibility and
see how these affect a potential design and to then establish how intelligibility can be
measured and quantified. However, these processes are interactive and some knowledge
of how intelligibility is rated is required in order to understand certain design aspects and
implications. Several descriptors are frequently used to try and express what is meant by
intelligibility, such as ‘audibility’, ‘clarity’, ‘articulation’ and ‘understanding’. Moreover, a
particular piece of speech, however inherently intelligible, may not be understood if it is
spoken in a language with which the listener is not familiar or fluent.
ISO 9921 [1] defines intelligibility as ‘a measure of the effectiveness of understanding
speech’.
DOI: 10.4324/9781003220268-7
Secondary Factors
As can be seen from the above list, there are a large number of factors that can potentially
affect the perceived intelligibility of a sound reinforcement system or an audio transmission
channel.
enough to understand basic announcements and messages – albeit with the need to listen
carefully with slightly increased effort. (This assumes that the received speech is free from
reverberation or other degradations.)
Increasing the directivity of a loudspeaker (narrowing its coverage) reduces the level of
reverberant excitation within a space as the radiated sound is confined into a smaller area
and nominally away from reflective surfaces such as the ceiling and potentially the walls,
depending on how the loudspeaker system is set up and aimed.
The factors highlighted combine to produce the direct-to-reverberant ratio (DRR), which
ideally needs to have a positive value in order to yield good intelligibility (though it may be as
low as −3 dB to −5 dB). However, it is RT-dependent and can only be used as a rough guide
as early reflections (useful reflections arriving within approximately 35–50 ms of the direct
sound) serve to increase the ‘direct’ component and increase the perceived intelligibility. This
concept is illustrated in Figure 7.2 and discussed later in the chapter when considering sound
clarity (C50) measures.
Figure 7.5 Diagrammatic view of the effect of noise on speech, for high, moderate and low signal to
noise ratios.
In Figure 7.5(top), a low level of steady ambient noise has been added, but this would not
affect the intelligibility of the speech as this is well below the level of the speech. However,
in Figure 7.5(middle) the level of the noise is greater and now obscures or ‘masks’ part of the
speech signal (the lower amplitude components). In the lower diagram, the level of noise
has been increased so that it is masking most of the speech signal. However, parts of the
signal will still be heard and potentially understood.
In practice, words and syllables will have different amplitudes and so will be affected
differently by a given level of noise, as depicted in Figure 7.6. Here certain elements (words
or syllables) are either not affected or only partially affected by the noise, whilst others are
masked completely.
It may well be that a listener is able to ‘piece together’ the meaning of the speech
by recognising some parts and inferring what the missing elements might be. The more
complex or unfamiliar the speech is, the greater the task and the greater the cogni-
tive load.
Figure 7.7 Diagrammatic effect of reverberation on speech elements of the same and varying levels.
Figure 7.9 Diagram showing the effect of reverberation times of 1.0 and 0.6 seconds on the word
‘back’.
survives and is audible. This enables the word ‘back’ to be distinguished from other words
beginning with ‘ba’ such as bag, ban, band or bath for example.
In reality, the amplitude of a speech signal varies significantly from one second to the
next, as shown in Figure 7.10, which shows a 30-second speech extract measured with
a resolution of 100 milliseconds. The lower curve presents the amplitude in terms of an
r.m.s. measurement whilst the upper curve shows the corresponding true peak levels. The
long-term average peak level is 20 dB greater than the r.m.s., indicating this speech extract
to have a crest factor of 20 dB – which is a very typical value. Clearly, the amplitude of
speech varies significantly over a given time period. In order to ascribe a level to speech,
Figure 7.10 Temporal variability of speech: LAeq =73 dB, LAmax =82 dB, LCeq =78 dB, LCpk =98 dB
and average LCpk =89 dB.
it is therefore customary to measure the long-term average or the average value over the
length of a message or the speech segment under consideration. This is most conveniently
carried out by measuring the equivalent energy level (Leq). In the example in Figure 7.10,
the average SPL (LAeq) is 73 dB, whilst the maximum r.m.s. level is 82 dBA; the peak
level is 98 dBC and the average peak level 89 dBC.
It is important to appreciate that speech is not produced at a static level but typic-
ally may vary by about 10 dBA, though for some talkers this range may be greater.
Having a knowledge of not only the dynamic range of speech but its likely maximum
levels is extremely important when designing a PA or sound reinforcement system, as this
determines the necessary amplifier voltage (and power) headroom. However, for signal
to noise ratio calculations the LAeq is usually employed to set or determine the speech
level – though it can be argued that a measure such as the LA10 (10% level exceedance) also
provides a realistic approach. A more detailed discussion of speech signal characteristics
can be found in [9].
The other primary factor of importance for speech is its spectral characteristic. Typically
speech sounds occur over the range 100 Hz to 8–10 kHz. Individual voices vary enormously
and again in order to be able to predict or measure the potential intelligibility of a system,
an average spectrum needs to be used. Various studies have been carried out and a number
of standards have been developed. These all employ the long-term average spectrum and,
whilst generally depicting the same general trends, vary considerably, primarily due to the
way the measurement was performed and the size and composition of the talker sample
employed. A typical speech spectrum is shown in Figure 7.11.
As the figure shows, the maximum energy occurs around 250 Hz to 500 Hz and then
decreases with increasing frequency. The Speech Transmission Index (STI) standard (IEC
60268-16 [10]) follows a similar approach but was updated in 2020 (Edition 5) to bring it
better into line with other standards and the latest research. Figure 7.12 compares the old
and new STI spectra.
The spectra are idealised, in that the high frequencies roll off at a constant 6 dB per
octave. In reality however, most real speech has an increased high-frequency content as
compared to the standard, as shown for example by Figure 7.13, which compares the long-
term spectra of six different male voices and the standardised STI male spectrum.
In Western languages, the vowels carry most of the power (SPL) of the voice and
typically cover the range from 125 Hz to 1 kHz, whilst the consonants carry the infor-
mation and occur at frequencies above approximately 1 kHz (note that this is a gross
simplification).
In terms of intelligibility, the 2 kHz octave band information is the most important,
followed by the 4 kHz and 1 kHz bands, as depicted in Figure 7.14. Different standards apply
slightly different importance weightings to the speech frequency bands dependent on how
the background research and testing was conducted and the nature of the speech test signal
under consideration.
As noted above, although the low and mid frequencies carry the power of the voice,
it is the significantly weaker high frequencies that are mainly responsible for
intelligibility. A single, wideband signal-to-noise measurement may therefore not be par-
ticularly useful, but using an 'A' weighted measurement, whereby the 1, 2 and 4 kHz bands
are emphasised and the 500 Hz and 250 Hz bands are attenuated, can provide a reasonable
single-figure approximation (hence the earlier empirical guidance of 6 and 10 dBA signal to
noise ratios, as noted in section 7.1).
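As a minimal illustration of this single-figure approach, the Python sketch below combines hypothetical octave-band speech and noise levels into A-weighted totals (the band levels are invented for the example; the weighting corrections are the standard IEC 61672 octave-band A-weighting values):

```python
import math

# IEC 61672 A-weighting corrections (dB) for the octave bands 125 Hz .. 8 kHz
A_WEIGHT = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0,
            2000: 1.2, 4000: 1.0, 8000: -1.1}

def a_weighted_level(band_levels_db):
    """Energetically combine octave-band SPLs into one A-weighted level."""
    total = sum(10 ** ((band_levels_db[f] + A_WEIGHT[f]) / 10)
                for f in band_levels_db)
    return 10 * math.log10(total)

# Hypothetical octave-band levels (dB SPL) for speech and for ambient noise
speech = {125: 58, 250: 62, 500: 63, 1000: 59, 2000: 54, 4000: 48, 8000: 42}
noise = {125: 55, 250: 52, 500: 48, 1000: 45, 2000: 41, 4000: 38, 8000: 35}

snr_dba = a_weighted_level(speech) - a_weighted_level(noise)
print(f"A-weighted signal to noise ratio: {snr_dba:.1f} dBA")
```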
Figure 7.12 Speech and test signal spectra from IEC 60268-16 2011 and 2020 (Editions 4 and 5).
Figure 7.13 Speech spectra of six different voices and comparison with IEC 60268-16 2011 spectrum.
Figure 7.15 Octave band analysis of speech and interfering noise – with good signal to noise ratio.
By carrying out an octave band spectral analysis of the interfering noise and the speech
content, a more detailed evaluation of a given signal-to-noise issue can be undertaken.
Furthermore, by considering the octave band intelligibility weightings, it is possible to see
not only where the problem lies but also to gain an insight into what may be done to
improve the intelligibility with respect to the noise. Figure 7.15 demonstrates the idea.
Here the speech signal is well above the ambient noise in each of the octave bands and
so it can be inferred that intelligibility should be good. In contrast the analysis shown in
Figure 7.16 indicates that the intelligibility will be poor as the higher (consonant) frequen-
cies are below the noise.
Figure 7.16 Octave band analysis of speech and interfering noise – with poor signal to noise ratio.
Figure 7.17 Energy time curve for sound arriving at listening position from distributed sound system
in 1.6 s RT space.
In a similar manner to signal to noise ratio, the direct to reverberant ratio and temporal
aspects of a system can be assessed by evaluating the impulse response of the system in a
room or space. Figure 7.17 shows a typical example.
Figure 7.17 shows the energy time curve (ETC) of a distributed sound system in a
church having a mid-frequency reverberation time of 1.6 seconds. The plot shows the
direct sound (first spike on the left of the graph) to be strong and well above the reverber-
ation (sloping decay towards the right). Immediately following the direct sound from the
nearest loudspeaker there are a number of other discrete (separately identifiable) sound
arrivals – primarily from other loudspeakers in the system. These sounds all arrive within
50 ms of the primary arrival and so should integrate and enhance the intelligibility. This
is indicated by Figure 7.18, which is an integrated energy plot and implies that the direct
sound and early reflections will combine to enhance the subjectively perceived direct to
reverberant ratio.
Figure 7.19 Sound energy ratios – C7 is effectively the 'direct sound' alone. C50 and C35
include early reflections that will integrate with and increase the effective level of the
direct sound.
Using the information presented in Figure 7.17, the direct to reverberant ratio can be
quantified and as might be expected is frequency-dependent. Figure 7.19 plots the Direct
to Reverberant sound energy ratio in decibels for three analysis settings C7, C35 and C50,
where the suffix denotes the length of the window set (in ms) to include the ‘direct energy’
or ‘useful’ sound component. C50 is defined as the ratio of the sound energy arriving within
the first 50 ms to the total sound energy arriving after 50 ms and is expressed in dB. It was
noted in 7.1 that 50 ms is often used as a measure of the useful sound which includes the
direct component and early reflections.
C50 = 10 × log [ Energy(0–50 ms) / Energy(50 ms–end) ]   dB
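A minimal sketch of this calculation, assuming the impulse response is available as a sampled array and taking the strongest peak as the direct-sound arrival:

```python
import numpy as np

def clarity(ir, fs, split_ms=50.0):
    """Early-to-late energy ratio in dB (50 for C50, 35 for C35, 7 for C7)."""
    onset = int(np.argmax(np.abs(ir)))                  # direct-sound arrival
    split = onset + int(round(split_ms * fs / 1000.0))
    early = np.sum(ir[onset:split] ** 2)                # 0 .. split_ms energy
    late = np.sum(ir[split:] ** 2)                      # split_ms .. end energy
    return 10.0 * np.log10(early / late)

# Example with a synthetic exponential decay standing in for a measured IR
fs = 48000
t = np.arange(int(1.6 * fs)) / fs
ir = np.random.randn(t.size) * np.exp(-6.91 * t / 1.6)  # ~1.6 s RT decay
print(f"C50 = {clarity(ir, fs, 50.0):.1f} dB")
```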
Modulation      Octave band (Hz)
frequency       125    250    500    1k     2k     4k     8k
0.63 Hz          x      x      x      x      x      x      x
0.80 Hz          x      x      x      x      x      x      x
1.0 Hz           x      x      x      x      x      x      x
1.25 Hz          x      x      x      x      x      x      x
1.6 Hz           x      x      x      x      x      x      x
2.0 Hz           x      x      x      x      x      x      x
2.5 Hz           x      x      x      x      x      x      x
3.15 Hz          x      x      x      x      x      x      x
4.0 Hz           x      x      x      x      x      x      x
5.0 Hz           x      x      x      x      x      x      x
6.3 Hz           x      x      x      x      x      x      x
8.0 Hz           x      x      x      x      x      x      x
10.0 Hz          x      x      x      x      x      x      x
12.5 Hz          x      x      x      x      x      x      x
These indexes are then weighted with respect to their contribution to intelligibility (see
Figure 7.14) and combined to produce a single, overall STI value. An example is shown in
Table 7.3.
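The underlying arithmetic can be sketched as follows, assuming a 7 × 14 matrix of measured modulation transfer values. Each value is converted to an apparent signal to noise ratio, limited to ±15 dB, rescaled to a transmission index and averaged per band; the band weights below are illustrative stand-ins for the tabulated values of IEC 60268-16, and the redundancy, masking and level corrections of the full standard are omitted:

```python
import numpy as np

# Octave-band weighting factors in the style of the IEC 60268-16 male
# tables (illustrative values, normalized below)
ALPHA = np.array([0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173])

def sti_from_mtf(m):
    """First-order STI sketch from a 7 x 14 modulation transfer matrix."""
    m = np.clip(np.asarray(m, float), 1e-6, 1 - 1e-6)
    snr_app = 10 * np.log10(m / (1 - m))       # apparent SNR per data point
    snr_app = np.clip(snr_app, -15.0, 15.0)    # limit to +/-15 dB
    ti = (snr_app + 15.0) / 30.0               # transmission indices, 0..1
    mti = ti.mean(axis=1)                      # one MTI per octave band
    return float(np.dot(ALPHA, mti) / ALPHA.sum())

# Example: a uniform modulation reduction of m = 0.7 in every band
print(f"STI = {sti_from_mtf(np.full((7, 14), 0.7)):.2f}")   # about 0.62
```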
When first developed in the 1970s, a mainframe computer was required to calculate the
STI. In the 1980s a method of deriving the STI from the system-room impulse response
was found based on the relationship established by Schroeder [15].² However, whilst appro-
priately taking account of the reverberant component, the background noise element has
to be manually added, frequently resulting in inaccurate or erroneous assessments being
made. The IR-based procedure still required a relatively powerful PC or laptop computer
and so could not be easily used on site, though as laptop devices became more powerful and
measurement software improved on-site measurement became a reality. In 2001, a simpli-
fied method of obtaining the STI using a handheld portable device or modified sound level
meter was introduced, specifically with the assessment of sound systems in mind. The new
method was termed STIPA and used a sparse modulation matrix, as shown in Table 7.4.
Here just 14 rather than 98 modulation data points are employed. However, despite the
reduction in resolution, STIPA and STI measurements agree extremely well [16, 17] and
STIPA has become widely adopted for measuring (and predicting) public address (PA) and
voice alarm (VA) system performance.
Plotting the modulation data in a graphical format can provide a useful insight into what
is happening in each frequency band and by examining the decay it can be ascertained if
the modulation reductions are due to reverberation, noise or echo interference. Figure 7.21
shows the data presented in Table 7.3 graphically. The STI scale ranges from 0 (completely
unintelligible) to 1.0 (perfect intelligibility). The just noticeable difference (JND) is usually
taken to be 0.03 STI. As noted earlier, intelligibility is non-linear and so STI follows this
trend. The difference between 0.40 and 0.45 or 0.45 and 0.55 is very much more notice-
able than a corresponding change say between 0.65 and 0.70 or 0.70 and 0.80 for normal
speech or a speech announcement. The relationships between STI, sentences, phonetically
balanced (PB) words and consonant-vowel-consonant (CVC) words are given in annex E of
the IEC 60268-16 standard.
Modulation      Octave band (Hz)
frequency       125    250    500    1k     2k     4k     8k
0.63 Hz                        x
0.80 Hz                                             x
1.0 Hz                  x
1.25 Hz                                      x
1.6 Hz           x
2.0 Hz                                x
2.5 Hz                                                     x
3.15 Hz                        x
4.0 Hz                                              x
5.0 Hz                  x
6.3 Hz                                       x
8.0 Hz           x
10.0 Hz                               x
12.5 Hz                                                    x
Figure 7.21 MTF plot for high-quality sound reinforcement system in 1.4 s RT space (STI = 0.61).
Apart from replicating the speech spectrum and speech modulations, later enhancements
to STI (and STIPA) also incorporated adjacent band redundancy factors, the effects of
frequency masking and a sound level dependency function. The first two functions take
account of how adjacent octave bands can contribute to speech intelligibility and also how
lower frequencies can mask higher-frequency information. The latter effect is based on the
psychoacoustic phenomenon known as ‘the upward spread of masking’ whereby a lower
frequency or band of frequencies, if sufficiently higher in level than the adjacent band, will
cause the higher frequencies to become completely or partially masked, effectively making
them inaudible. The effect is level-dependent with the masking slopes becoming steeper at
higher sound levels and therefore increasing the masking effect as the sound level increases.
Associated with, although separate from, the effect of frequency masking is the observation
that higher-level speech (above approximately 80 dB) gradually becomes less intelligible as
the level is further increased. Conversely, there is a corresponding effect at low levels below
about 50 dB SPL. STI takes both of these effects into account. However, the reduction in
STI due to the absolute SPL is not linear but is dependent on the pre-existing reduction
in modulation caused, for example, by reverberation or other factors. Figure 7.22 illustrates
the realisation of this, demonstrating the reduction in STI as a function of sound level for
three different reverberant conditions corresponding to reverberation times of 0, 1.0 and
2.0 seconds.
Under anechoic conditions (0 seconds RT) or for an electronic system without echo or
distortion, the upper trace in Figure 7.22 clearly shows the decrease in STI due to the effect
of SPL and associated masking to be quite significant. However, the reduction in STI is not
linear, as shown by the smaller reductions in STI over the same range of SPLs calculated
for different reverberation conditions (equivalent to 1.0 and 2.0 seconds) presented in the
lower traces. Further information on how STI accounts for the effects of masking and abso-
lute SPL on STI can be found in Annex A of the STI standard [10].
The Speech Transmission Index is a sophisticated measurement and to a first-order
approximation generally gives good agreement with perceived intelligibility. However, as
we have seen, the way that the ear and brain work together to extract meaning from the
acoustic signals that make up the speech sounds we use for communication is remarkably
complex. It is therefore not surprising that it is possible to fool the relatively simple STI
technique or for STI not to have the resolution or refinement to successfully predict the
potential intelligibility in some situations. STI knows nothing, for example, about the
talker's clarity of articulation or rate of speech, nor about the potential binaural advantage
available to the listener.
intelligibility for a noisy and reverberant railway station but this would be totally inadequate
for a theatre or law court – yet the same criteria were often being applied. Intelligibility was
being treated as a binary, black-and-white issue.
Figure 7.23 presents a more detailed scale from the STI standard (IEC 60268-16).
Figure 7.24 MTI plots for two sound systems exhibiting STI values of 0.61 and 0.49 respectively.
to natural variations in the results and so at least three measurements should be made at
each measurement position and an average taken – provided that the difference between
readings is 0.03 STI or less. Any reading outside this range should be ignored and the meas-
urement should be repeated.
In many cases, readings cannot be taken under normal operating or occupied conditions
and so are made out of hours to avoid disturbance to the public or to building occupants.
However, the ambient noise at this time will not be representative of when the system is
normally used. This problem, however, can be readily overcome by separately measuring
the normal ambient noise and then correcting the STI readings to account for the reduced
signal to noise ratio that will occur during ‘out of hours’ testing. Indeed, many meters and
software provide this function or enable the measured data to be exported to a spreadsheet
for background noise correction.
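One common form of this correction scales the modulation indices in each band by the intensity ratio implied by the band signal to noise ratio; the sketch below assumes that form, with invented levels:

```python
import numpy as np

def correct_m_for_noise(m_quiet, speech_db, noise_db):
    """Correct 'out of hours' modulation indices for occupied ambient noise.

    m_quiet:   measured modulation indices (7 bands x 14 mod. frequencies)
    speech_db: per-band speech level at the measurement position (dB)
    noise_db:  per-band ambient noise level during normal use (dB)
    """
    snr = np.asarray(speech_db, float) - np.asarray(noise_db, float)
    reduction = 1.0 / (1.0 + 10.0 ** (-snr / 10.0))   # intensity ratio, 0..1
    return np.asarray(m_quiet, float) * reduction[:, np.newaxis]

m = np.full((7, 14), 0.85)                # hypothetical quiet-condition data
speech = [72, 70, 68, 62, 56, 50, 44]     # dB per octave band, 125 Hz..8 kHz
noise = [60, 58, 55, 50, 45, 40, 35]
print(correct_m_for_noise(m, speech, noise)[0, 0])  # 0.85 scaled by 12 dB SNR
```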
The sparse nature of the STIPA matrix makes it unsuitable for assessing the effects of
echoes, which requires the full STI matrix. Even then there is some difference of opinion
as to the suitability of STI for this task – a view perhaps influenced by subjective impres-
sion which relates more to the ‘ease of listening’ than to the objective loss of intelligibility
[18]. Figure 7.25 shows the effect of a single echo (of similar amplitude to the direct sound)
on STI.
Some forms of signal-processing such as AGC, echo and noise cancellation can also
affect STI measurements –which are particularly dependent on the nature of the test signal
employed. It should also be understood that the crest factor and dynamic characteristics of
the STI/STIPA test signals are very different to real speech. Table 7.6 summarises a number
of characteristic parameters. From the table it can be seen that the crest factors of speech
and STIPA are noticeably different and so could potentially affect the transmission of the
Parameter          Crest factor   Crest factor   LA1 −   LA10 −   LAmax −   LCmax −
                   (dB)           (dB)           LAeq    LAeq     LAeq      LAeq
Speech (typical)   20             17             7       3        9         12
Pink noise         12.0           11.2           2.0     0.1      0.4       2.6
STIPA              12.4           11.6           1.8     1.0      1.8       9.2
Sinewave           3              3              0       0        -         -
signal through a sound system. However, there is little energy in the peaks. Of greater con-
cern are the differences in the dynamic r.m.s. behaviours. Typically the r.m.s. speech signal
maxima may be around 10 dB above the long-term average – and these maxima contain
significant energy and will cause signal processors such as compressors and limiters to react.
The A weighted r.m.s. maxima for the STIPA signal on the other hand are typically some
7 dB lower and the average maximum level is approximately 5 dB lower. Clearly the STIPA
signal does not replicate the energetic and dynamic behaviour of typical speech. The dis-
crepancy is significantly worse if a sine sweep is employed as the test signal, as indicated in
Table 7.6.
A further common error that widely occurs in practice relates to the setting up of a
measurement and adjusting the equivalent speech and STIPA test signal levels. It is often
thought that this should be achieved by setting the STIPA signal to the same LAeq value as
that of a normal speech announcement or reinforcement level (that is, measure the LAeq of
the speech and set the STIPA signal to the same LAeq). However, this is incorrect. According
Figure 7.26 1/12 octave analysis of Edition 5 STIPA signal (centre frequencies at 125, 250, 500 Hz,
1, 2, 4 and 8 kHz).
to IEC 60268-16, for equivalence, the STIPA signal must be set to an LAeq value that is 3
dB higher than the speech. This difference can have a significant impact when dealing with
low signal to noise ratios and may also put a higher demand on system amplifiers. It should
also be realised that the STI/STIPA signal does not have a continuous spectrum as perhaps
suggested by Figure 7.12 but instead consists of a discrete series of separated ½-octave-wide
elements, as illustrated by Figure 7.26.
Whilst STI is the best metric that we have for assessing the potential intelligibility
of a sound system, it is far from infallible, but it should provide a reasonable indication.
Assuming that it has been measured correctly, it is unlikely that a system achieving an STI
score of ≥0.50 will be unintelligible. However, STI knows nothing about the clarity or spec-
tral makeup of the talker’s voice or their rate of speech –all factors that will significantly
affect the perceived intelligibility. Equally, nothing is known about the listener’s hearing
acuity and language skills or the relative location of the source of sound and interfering
noise. One of the great advantages of STI, however, is that it can be predicted from a know-
ledge of the basic acoustic details of a room or space (for example, the volume, reverberation
time, surface treatments/acoustic absorption coefficients). This enables sound systems to
be designed with a high degree of confidence that they will be intelligible or capable of
meeting a given STI target. However, this task requires the talents of a skilled computer
modeller, particularly when dealing with challenging acoustic environments. Acoustic com-
puter modelling programs are only as good as the data that are fed into them but at times
they can positively encourage the user to compute an incorrect result –particularly with
respect to STI. A detailed understanding of room acoustics, the way in which loudspeakers
behave in a given space, how loudspeakers radiate sound and a good understanding of STI
and its underlying science are all required.
Area (m²)      Number of measurement positions
<25            2
25–100         3
100–500        6
500–1500       10
1500–2500      15
>2500          15 per 2500 m²
%ALcons = 200 × D² × RT² × (n + 1) / (V × Q × m)   [applicable where D < 3.16 Dc]   (7.1)
where:
D = distance to the loudspeaker (or talker) (m)
RT = reverberation time (s)
n = number of loudspeakers operating and contributing to the reverberant field
V = volume of the space (m³)
Q = directivity of the source
m = acoustic modifier (usually set to 1)
Dc = critical distance
%ALcons (%)    STI
15             0.45
10             0.52
5              0.65
≤3             ≥0.75
Note the equivalence only really works for the disused RaSTI method, which was limited to just the
500 Hz and 2 kHz octave bands.
Being simpler to calculate, %ALcons was for a while used as a quick method of esti-
mating the likely STI value for a PA system, using the relationship between STI and
%ALcons determined by Houtgast and Steeneken and by Farrell Becker. Typical equivalent
values are shown in Table 7.8.
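As a worked sketch, the fragment below evaluates eq. (7.1) for an assumed room and converts the result to an approximate STI using the commonly quoted form of Farrell Becker's relationship, whose constants reproduce the equivalences of Table 7.8:

```python
import math

def alcons_percent(D, RT, V, Q, n=1, m=1.0):
    """%ALcons from eq. (7.1); valid where D < 3.16 x Dc."""
    return 200.0 * D**2 * RT**2 * (n + 1) / (V * Q * m)

def sti_from_alcons(alcons):
    """Approximate STI equivalent of a %ALcons value (Becker's relation)."""
    return 0.9482 - 0.1845 * math.log(alcons)

al = alcons_percent(D=12.0, RT=1.6, V=8000.0, Q=8.0, n=4)
print(f"%ALcons = {al:.1f}%  ->  approx. STI = {sti_from_alcons(al):.2f}")
```

For the assumed values this yields about 5.8% and an STI of roughly 0.62, consistent with the 5% / 0.65 row of Table 7.8.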
For many years, the acceptable maximum percentage loss of consonants (% ALcons)
was 15%, and was based on the requirement to have at least 25 dB of signal to noise ratio
together with a uniform frequency response in the 2–4 kHz region –the critical range for
speech intelligibility. In later years, the continuing study of speech intelligibility (in part
due to the increasing use and understanding of STI) led to the conclusion that 10% was
a more appropriate value (maximum loss) for most purposes. When the information being
delivered is familiar or expected, 10% is quite acceptable. In a learning environment, espe-
cially for people with hearing impairment, the target % ALcons should be closer to 5%.
Whereas % ALcons can still be useful for assessing the Direct to Reverberant ratio effects,
the method was not developed to the same extent as STI nor standardised. In practice, STI
or STIPA has become the preferred and internationally standardised method for rating the
potential intelligibility of a sound system.
7.4.3.2 Coherence
A new measurement technique that does show some promise is based on the measurement
of the coherence of the signal received by a listener [24]. This, however, does require know-
ledge of the original signal to act as the reference. Whilst in some applications this may be
possible, in many cases there is no way of obtaining this reference – remote measurements
in an airport terminal or large industrial site, for example. Here either an asynchronous tech-
nique or a stand-alone signal such as STIPA is required. The use of coherence as a measure
does have the advantage that it provides a more stable result for reflections and may also be
viewed in real time.
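For experimentation, a generic magnitude-squared coherence estimate (not necessarily the specific formulation of [24]) can be computed with standard tools; here the received signal is simulated as a delayed, noisy copy of the reference:

```python
import numpy as np
from scipy.signal import coherence

fs = 48000
reference = np.random.randn(2 * fs)            # known original signal
received = (np.roll(reference, 480)            # 10 ms propagation delay
            + 0.5 * np.random.randn(2 * fs))   # plus uncorrelated noise

f, Cxy = coherence(reference, received, fs=fs, nperseg=4096)
band = (f >= 250) & (f <= 4000)
print(f"Mean coherence 250 Hz-4 kHz: {Cxy[band].mean():.2f}")
```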
At this stage, there is no proven relationship between coherence and speech intelli-
gibility. Considerably more research is required in order to explore and exploit the link
and derive a robust measurement scale. As with STI however, coherence is an indirect
method of assessment, relying on an acoustic parameter that correlates with intelligibility.
Furthermore, it is not clear how the parameter may be predicted from a knowledge of room
acoustic and loudspeaker parameters. The ultimate goal for the measurement and assessment
of speech intelligibility must be to use real speech and the transmitted speech signal itself –
though this is a considerable way off being readily realised.
Figure 7.29 Set of frequency response curves for a concert hall high-quality sound system.
the unoccupied venue. There is no doubt that system frequency response and perceived
intelligibility are inextricably linked. Whereas we know how to measure intelligibility we
still do not fully understand how to measure the perceived spectral response of a sound
system in a large space. That the direct sound should be nominally flat and well extended
is certainly true but how to take account of the sound of the early and late reflections and
reverberation is not well understood.
Notes
1 Uniformity of coverage affects both the direct to reverberant ratio and signal to noise ratio.
2 IEC 60268-16 makes a clear distinction between the two methods of making an STI measurement,
referring to them as the ‘direct’ and ‘indirect methods’.
References
1. ISO 9921, Ergonomics – Assessment of Speech Communication (2003).
2. Haas, H. The influence of a single echo on the audibility of speech. J. Audio Eng. Soc. 20(2)
(1972).
3. Muncey, R.W., Nickson, A.F.B. and Dubout, P. The acceptability of artificial echoes with rever-
berant speech and music. Acustica 4 (1954).
4. Lochner, J.P.A., and Burger, J.F. The subjective masking of short time delayed echoes by their
primary sounds and their contribution to the intelligibility of speech. Acustica 8 (1958).
5. Dietsch, L., and Kraak, W. Ein objektives Kriterium zur Erfassung von Echostörungen bei
Musik- und Sprachdarbietungen. Acustica 60 (1986).
6. Mapp, P. Frequency response and systematic errors in STI measurements. Proc IOA Vol 27 Pt 8,
Reproduced Sound 19 (2003).
7. Leembruggen, G., Hippler, M., and Mapp, P. Further investigations into improving STI’s rec-
ognition of the effects of poor frequency response on subjective intelligibility. AES 128th
Convention, London (2010).
8. Wijngaarden, S., Steeneken, H., and Houtgast, T. Quantifying the intelligibility of speech in
noise for non-native listeners. JASA 111(4) (2002).
9. Mapp, P. Some effects of speech signal characteristics on PA system performance. AES 139th
Convention, New York (2015).
10. IEC 60268-16: 2011 and 2020, Objective Rating of Speech Intelligibility by Speech
Transmission Index.
11. ANSI S3.5-1969, Methods for Calculation of the Articulation Index.
12. ANSI S3.5-1997 (R2017), Methods for Calculation of the Speech Intelligibility Index.
Figure 8.1 Top: computer model of the main railway station of Berlin (EASE software by AFMG).
Bottom: Holoplot loudspeaker system installed at Frankfurt Hauptbahnhof (main station).
Figure 8.2 Exemplary distribution of direct SPL across listening areas in a medium-size church (EASE software by AFMG).
adequately document possible problems in reports as well as proposed design solutions. More
often than not the client or the contracting authority has to be convinced of the conceptual
approach. The problem and the solution approach must be presented in a way that is easily
understood by people who are not experts in the acoustic field (Figure 8.2). While objective
quantities in the form of tables and graphics will help with that, an actual demonstration of
the modelled performance of a sound system or the acoustics of a room will often be most
helpful. The acoustic characteristics of the space can be easily evaluated subjectively by
means of auralization, which is the process of making a sound in the room audible through
computational means, and without the need to actually construct the facility. For this pur-
pose, headphones (binaural reproduction) or specific loudspeaker setups (e.g., Ambisonics)
can be used; compare Figure 8.3.
Acoustic modelling is used in other specific applications as well. For example, many AR/
VR applications rely to some degree on simulated acoustics in order to realistically repro-
duce certain sounds in the virtual environment. Acoustic simulations are also often used
for educational purposes to illustrate or teach basic concepts, e.g., to university students.
A field that is remotely related but also gaining traction is the creation of acoustic effects
for movies, games or AR/VR scenes by using basic simulation models.
1. Model geometry
2. Acoustic materials
3. Loudspeaker data
Figure 8.4 3D computer model of a German church (Frauenkirche Dresden) that shows the level of
geometrical details typically used for acoustic indoor models (EASE software by AFMG).
Figure 8.5 Illustration of scattering effects: at low frequencies (left) the fine structure of the surface
is ignored. For wavelengths of the order of the structure’s dimension, the incident sound
wave is diffused. At shorter wavelengths geometrical reflections dominate again (courtesy
Vorländer 2006).
Figure 8.6 Directivity balloon for a line array (EASE SpeakerLab software by AFMG).
magnitude data at a 10° angular resolution, modern approaches are based on 1/24th octave
data at up to 1° or 2° angular spacing (Figure 8.6). In addition, low-frequency models such
as BEM are used in an effort to better describe the baffle effects between neighbouring cabinets
of an array which cannot be measured easily because of the size and the weight of a large
system. For some specific applications loudspeaker directivity data are also represented in
spherical harmonics instead of directional transfer functions.
A related area of research has moved into focus over the last few years. It deals with
the question of how natural sources like human speakers, singers or musical instruments
can be modelled with reasonable accuracy. In this respect the frequency response as well
as the directional radiation patterns are of great interest as they vary over time and fre-
quency and depend on the talker or musician, as well. For reliable simulation results these
characteristics must be quantified in a reproducible manner. It is of similar interest which
parameters can be used to describe and configure such types of sources in the simulation
tradeoff between these aspects depends strongly on the type of room. For simpler rooms faster
and less accurate methods may be employed whereas for complex spaces time-consuming,
more precise modelling approaches must be used. It is often down to the experience of the
user to judge which approach should be used in which case.
In the time domain the RIR can be characterized by dividing it into three distinct parts
(Figure 8.9). The first part is the direct sound arrival from the source or sources at the
receiver. The second part consists of early, discrete reflections that have a defined level and
arrival time. The third part is the diffuse reverberation which consists of many late specular
reflections that overlap as well as scattered reflections from random directions. The point of
time at which a significant number of reflections are received and subjectively are no longer
perceived as individual reflections is called the reverberation onset.
In the frequency domain the room transfer function consists of two different parts
(Figure 8.10). For low frequencies the room response is normally dominated by modal
behaviour. In this region the transfer function therefore shows the peaks and dips of the
room modes that are excited by the source. These are usually few, which is why the response
shows a rather smooth contour. For high frequencies the density of modes is very high and
modes overlap strongly so that the course of the response function represents the statistical
average across many modes. The transition region between these two regimes is located
around the Schroeder frequency, which again is a function of reverberation time and room
volume; compare eq. (2.11).
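The Schroeder frequency is commonly stated as f_s ≈ 2000 · √(RT/V); assuming this is the form of eq. (2.11), a small worked example:

```python
def schroeder_frequency(rt_seconds, volume_m3):
    """Transition between modal and statistical room behaviour (Hz)."""
    return 2000.0 * (rt_seconds / volume_m3) ** 0.5

print(f"{schroeder_frequency(1.6, 12000):.0f} Hz")   # large hall: ~23 Hz
print(f"{schroeder_frequency(0.4, 60):.0f} Hz")      # small studio: ~163 Hz
```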
Most simulation methods have been developed exploiting these characteristics. In the
time domain, different methods are used to determine the direct sound, the early reflections
as well as the reverberation. In the frequency domain, wave-based methods are typic-
ally used for the low-frequency region and particle-based methods are used for the high-
frequency range. An overview over the most common approaches is given in the following.
Figure 8.10 Exemplary room transfer function measured in a medium-size room (EASERA software
by AFMG). Typical smooth, modal structure in the frequency range 50 Hz to 300 Hz;
typical dense, statistical structure for frequencies above 1 kHz.
are high enough. Using such data, simulation results can be accurate up to about ±1 dB for
typical sound systems. However, this so-called CDPS (complex directivity point source)
model only works in the far field of the single element or measured transducer. It also
assumes that diffraction or boundary effects by the loudspeaker cabinets are included within
the measurements. For that reason, the detailed low-frequency analysis of a loudspeaker
arrangement is often conducted during the product design phase using BEM (boundary
element method), which uses a wave-based approach in the frequency domain and can
account for any baffle or edge diffraction effects by the loudspeaker case or its surroundings.
However, it is not computationally affordable at high frequencies. Worth mentioning is
also another specialized method, which is the measurement and storage of loudspeaker
radiation data in the form of spherical harmonics coefficients. Since fairly high orders are
needed to reproduce the magnitude and phase information at high frequencies this format
has advantages primarily for single transducers and simpler loudspeakers with less complex
directional characteristics.
8.3.3 Reverberation
The reverberation part of the room response is usually determined differently from the
direct field and the early reflection’s part. That is because the precise knowledge of indi-
vidual sound arrivals is not as important for late or scattered (non-specular) reflections.
The density of reflections grows over time and their respective level declines. Therefore,
often Monte Carlo-based statistical analysis is used to estimate the reverberant field. For
example, in AFMG’s EASE AURA software, particles are emitted by the sound source in
random directions, traced through the room and detected at receiver locations. If enough
particles are used and the randomization is statistically correct, the result will converge
quickly even though only a small fraction of all possible reflection paths are actually
included.
Sometimes the reverberant tail of the room response is artificially generated based on
results from statistical room acoustics, e.g., using the Eyring RT equation. This approach
will, however, only work in rooms that have an approximately homogeneous and isotropic
diffuse field, which is not true for most acoustically challenging environments.
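For reference, a minimal sketch of the Eyring estimate next to the Sabine formula, for an assumed room (V in m³, S in m², mean absorption coefficient ᾱ):

```python
import math

def eyring_rt(V, S, alpha_mean):
    """Eyring reverberation time: RT = 0.161 V / (-S ln(1 - alpha_mean))."""
    return 0.161 * V / (-S * math.log(1.0 - alpha_mean))

def sabine_rt(V, S, alpha_mean):
    """Sabine reverberation time for comparison: RT = 0.161 V / (S alpha)."""
    return 0.161 * V / (S * alpha_mean)

V, S, a = 6000.0, 2200.0, 0.25   # hypothetical hall
print(f"Eyring: {eyring_rt(V, S, a):.2f} s, Sabine: {sabine_rt(V, S, a):.2f} s")
```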
1. Finite element method (FEM): This approach is based on numerically solving the wave
equation in the frequency domain. It is mostly used for closed spaces, from loudspeaker
cabinets to small rooms. Because the space has to be meshed at a resolution finer than
the wavelength it is computationally very expensive and not feasible to use it for large
venues or high frequencies (Figure 8.11).
2. Boundary element method (BEM): Similar to FEM this approach also solves the wave
equation numerically in the frequency domain. However, it uses points only on the
modelled surface. It is often used for modelling the acoustic properties of loudspeaker
cones, baffles, diffusers and other structured surfaces. It is limited in a way similar to FEM
as the surface of interest also has to be resolved into a grid finer than the wavelength.
Figure 8.11 Computed modal sound field of a studio room (courtesy AFMG) showing the surfaces of
equal pressure.
3. Finite difference time domain method (FDTD): Due to the obstacles faced by using
FEM and BEM, FDTD has received more attention in the past years. This approach
is applied in the time domain. It models the sound wave numerically as it propagates
through the room. While FDTD still requires a spatial grid the calculations can be
conducted in a computationally more efficient way and a consistent, broad-band result
is obtained. However, in practice this approach is primarily employed for academic
research due to the necessary calculation times, which are still long.
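To make the principle concrete, the following is a minimal one-dimensional FDTD sketch in normalized units (air density set to 1), with rigid ends and a soft Gaussian source; real room solvers work on three-dimensional grids with boundary and air-absorption models:

```python
import numpy as np

c, dx = 343.0, 0.05            # speed of sound (m/s), grid step (m)
dt = dx / c                    # time step at the 1-D stability limit
n = 400                        # pressure points (20 m domain)
p = np.zeros(n)                # pressure grid
u = np.zeros(n + 1)            # velocity points staggered between them

for step in range(1000):
    # leapfrog update: velocity from the pressure gradient, pressure from
    # the velocity divergence; u[0] and u[-1] stay 0 (rigid ends)
    u[1:-1] -= (dt / dx) * (p[1:] - p[:-1])
    p -= (dt / dx) * c**2 * (u[1:] - u[:-1])
    p[n // 2] += np.exp(-((step - 30) / 10.0) ** 2)   # soft Gaussian source

print(f"peak pressure after {step + 1} steps: {np.max(np.abs(p)):.3f}")
```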
Using such methods, low-frequency room modes, pressure distributions as well as transfer
functions at receiver positions may be calculated. As a practical example, even if the
receiver has no direct line of sight to the position of the source, e.g., if it is located behind
a pillar, the diffracted direct sound is computed at the receiver. As an approximation of
this wave-based approach, some particle- or ray-based simulation programs have introduced
edge diffraction models. These allow accounting for diffracted sound, for example, from the
orchestra in the pit to the visually shadowed stalls (see references 8 and 9 in Chapter 5).
Figure 8.12 Numerical optimization scheme for sound system configurations as used by AFMG
FIRmaker.
innovation: beam-steering based on FIR filters. This approach uses modelling results for
the unprocessed loudspeaker systems in order to calculate FIR processing configurations
specifically for a given venue or audience layout. In this manner the sound radiation of loud-
speaker arrays can be optimized to match the geometry of each individual space as well as
possible. In addition, it is also possible to avoid the radiation of unwanted sound into other
selected parts of the room.
The FIR filters are typically derived using numerical optimization methods. The room
geometry, the location of the sound sources and the design goals are given as input values.
The algorithm then evaluates a large number of possible sound system configurations, i.e.
FIR filters, and tries to find those configurations that yield the best results with respect to
SPL, coverage, power efficiency and other criteria (Figure 8.12).
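The basic mechanism can be illustrated with pure delays, the simplest special case of FIR processing. The sketch below steers the main lobe of an idealized 16-element line array (omnidirectional point sources, invented spacing) and evaluates the far-field response at one frequency; a real optimizer searches over full FIR filters against mapped coverage targets instead:

```python
import numpy as np

c = 343.0
n_boxes, spacing = 16, 0.35                  # element count and pitch (m)
y = (np.arange(n_boxes) - (n_boxes - 1) / 2) * spacing
steer_deg = -6.0                             # aim six degrees downwards
delays = y * np.sin(np.radians(steer_deg)) / c   # per-box steering delays

f = 2000.0                                   # analysis frequency (Hz)
angles = np.radians(np.linspace(-90, 90, 361))
# far-field pattern: phase-aligned sum of the delayed element contributions
phase = 2j * np.pi * f * (y[:, None] * np.sin(angles)[None, :] / c
                          - delays[:, None])
response_db = 20 * np.log10(np.abs(np.exp(phase).sum(axis=0)) / n_boxes)

print(f"main lobe at {np.degrees(angles[np.argmax(response_db)]):.1f} degrees")
```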
The improvements that can be achieved by this kind of optimization can be substan-
tial, for example, with respect to SPL uniformity. The diagrams of Figure 8.13 show the
measured frequency response on-axis of a line array of 16 boxes covering a distance of
about 70 m. These so-called positional maps show the colour-coded level as a function
of frequency and distance from the array. The map on the top depicts the unprocessed
array whereas the map on the bottom displays the optimized results using one FIR filter
per box.
Obviously, the level distribution across the hall has become much smoother for the
entire frequency range. The standard deviation dropped from about 2–3 dB between 200 Hz
and 6 kHz to about 1–1.5 dB. Generally, pattern control is improved up to 13 kHz.
Given today’s abundance of computing power even in conventional desktop PCs as well
as the increasing capabilities of modern loudspeaker systems with respect to mechanical
control and signal-processing, numerical optimization solutions will become a standard tool
in the future. Already now, so-called Auto-Splay features are widely provided by software
that is used for mechanical aiming of line arrays, such as AFMG’s EASE Focus. At this
point in time the first mass-production 2D loudspeaker arrays, such as those offered by the
manufacturer Holoplot [3], are also entering the market and provide beam-steering and
Figure 8.13 Positional maps showing an example of improvement of SPL uniformity when using FIR
numerical optimization. Top: without FIR optimization. Bottom: with FIR optimization.
beam-shaping functions in the horizontal as well as vertical planes instead of being limited
to just the vertical plane.
Figure 8.14a Image source method. Top: construction of image source S1 by mirroring at wall W1.
The connection from image source S1 to receiver E determines intersection point R1.
Bottom: construction of the (possible) reflection using the point R1.
Figure 8.14b Image source method. Construction of image source S2 by mirroring at wall W2.
Intersection point R2 is outside of the actual room surface. The reflection is impossible.
⇨ If the connecting line bypasses the surface element there is no geometrically pos-
sible reflection (Figure 8.14b).
Obviously, this procedure can be applied to determine first-order reflections. It can also
be used recursively in order to calculate higher-order reflections. Additionally, due to
the reciprocity of the algorithm the receiver could be mirrored instead of the source
as well.
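A minimal numeric sketch of the first-order construction, assuming an infinite floor plane z = 0; for a finite wall the intersection point must additionally be tested against the wall polygon, as in Figure 8.14b:

```python
import numpy as np

def mirror_point(source, plane_point, plane_normal):
    """Mirror a source position across a wall plane."""
    n = np.asarray(plane_normal, float)
    n = n / np.linalg.norm(n)
    d = np.dot(np.asarray(source, float) - np.asarray(plane_point, float), n)
    return np.asarray(source, float) - 2.0 * d * n

S = np.array([2.0, 3.0, 1.5])                 # source above the floor z = 0
E = np.array([8.0, 4.0, 1.2])                 # receiver
S1 = mirror_point(S, plane_point=[0, 0, 0], plane_normal=[0, 0, 1])

t = -S1[2] / (E[2] - S1[2])                   # line S1 -> E crossing z = 0
R1 = S1 + t * (E - S1)                        # candidate reflection point
print("image source:", S1, "reflection point:", R1)
```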
Figure 8.15 Ray tracing. Rays are stochastically radiated by a source S in random directions. Some hit
the detection sphere E after one or more reflections. In this example, the rays representing
the floor reflection RF and the ceiling reflection RC are shown. The direct sound path
indicated by D is computed deterministically.
While this method provides the complete set of possible reflections it is at the same
time computationally very expensive (the effort grows as N³). Therefore, it is mostly used
for academic or theoretical investigations or for determining first-order reflections only. In
practice different methods are required to compute reflectograms in complicated rooms and
at acoustically relevant reverberation times. It also has to be noted that this direct image
source algorithm cannot account for scattering effects.
Figure 8.16 Pyramid- or cone tracing. Schematic illustration of a beam tracing approach in two
dimensions. Cones with a defined opening angle are used to scan the room starting from
the sound source S. Receivers E located inside a cone are detected and validated.
Figure 8.17 Radiosity method. Patch P3 is illuminated and excited by radiation from patches P1 and
P2. It may also radiate sound itself.
The radiosity method is another concept that tries to model the diffuse field itself. In
this case the original sound sources are not considered, but the surface elements or surface
patches are regarded as diffusely radiating boundaries instead. Each surface element is
considered as an emitter of sound energy that is sent into the room and onto other surface
elements (Figure 8.17).
Another way to improve performance and memory requirements of the Monte Carlo
approach above is to use particles instead of rays. The basic idea is that sound particles are
emitted by the sound sources and detected by receivers. However, their propagation path is
not stored and duplicates are considered in a statistical way. In other words, the particle does
not carry any information except for its energy, propagation time and travelling direction.
8.5.6 Limitations
When considering the numerical methods mentioned above, one must be aware that
any approach has its shortfalls, inaccuracies and limitations. One of the most important
limitations of the geometrical approach has already been stated: typically, these methods
cannot provide accurate information in the low-frequency range. It is also a limitation of
typical ray-tracing approaches that if they are supposed to provide highly accurate results,
e.g., for listening purposes, they require large amounts of memory in order to store time,
level, directional and spectral information. Therefore, these can only be run in parallel for
a limited number of receiver locations.
Still, much of the result data may have to be compacted to some degree. For example,
reflectograms or impulse responses may be using a reduced time resolution of 1 or 5 ms, at
least for the late part. The computed time length of the response may have to be limited as
well. Such adaptations will obviously affect the result accuracy.
In many cases it is also important to consider how the algorithm treats phase relationships
between sources or reflections. The coherent radiation of sound, for example with respect
to the elements of a line array, requires that phase relationships are maintained. Discarding
phase information originating from the source radiation characteristics and the propaga-
tion path, as is often done for example when using particle models, will only work if the
phase response of one signal arrival (either direct sound or reflection) can be considered as
random (incoherent) relative to any other arrival. Obviously, there are other limitations as
well that stem from the underlying physical model for the simulation process. For example,
many algorithms assume that any reflections happen locally, i.e., that the surface absorbs
and reflects sound only at the point of intersection with the ray. It is also often assumed that
the propagation medium is homogeneous and isotropic, which is not true, for example, if
there are temperature gradients or air flow.
Another limitation is possibly imposed by the number of receivers, sources and surface
elements that are supported by the modelling algorithm. Calculation times may be very
long or the calculation may not be possible at all due to memory limitations. Similarly,
the number of particles or rays that are used for the simulation may be limited. This could
lead to results that are not as accurate as required because the number of rays is not suffi-
ciently high.
1. For example, the modelling process may yield a fairly detailed table about the proposed
sound system components, which brands and models are used, where loudspeakers
should be positioned and aimed and how they should be configured with respect to
gain, delay and filter settings.
Especially for line array systems it can be very helpful for system engineers to receive
the planned configuration of the system ahead of the event since they need to consider
the number of boxes required as well as the splay angles between individual elements
(Figure 8.21).
2. In a similar way, the acoustic treatment of the room can be documented in order to
generate reports concerning the acoustic materials that have to be used, for example,
to achieve a certain target reverberation time (Figure 8.22). Such a list may also be part
of a set of documents supplied for a tender.
3. Last but not least, the room geometry itself can be documented in order to either show
the underlying data and assumptions made or the proposed changes in order to improve
the acoustic performance. This could include, for example, the positions of absorbers or
the shapes of mounting niches for loudspeakers.
Obviously the room geometry can also be exported in other ways, such as in a CAD
file format like DWG or SKP.
4. Most importantly, graphics of relevant calculation results and modelling data are
created in order to be made available to other parties working on the project, to be
included in reports or in documents for a tender.
Figure 8.18 Clarity C80 results shown as 3D mapping for theatre model (courtesy EASE 5 AURA by AFMG).
Figure 8.19 Typical example for a result echogram generated by ray tracing simulation methods (cour-
tesy EASE 5 AURA by AFMG).
Figure 8.20 Binaural setup with HRTF selections by head tracker. Blue: Right ear channel. Red: Left
ear channel.
1. The room geometry can only be entered or modelled as accurately as the dimensions
that are given by architectural drawings or on-site dimension measurements. For
new rooms or venues, it is also foreseeable that the final space may not be built pre-
cisely according to the blueprint. In either case, dimensions may simply be wrong
by a few centimeters. Additionally, details may be forgotten or omitted on purpose
in the simulated room. It makes little sense to model a power plug or the knob of
a door for acoustic purposes because their effect is negligible at low frequencies
and at high frequencies calculation results (and uncertainties) are dominated by
other contributions. However, some other geometrical elements of the room may or
may not affect the end result significantly, e.g., small windows, door steps, ceiling
grids etc.
2. The acoustic materials used for the simulation contribute to the overall uncertainty
as well. First of all, in many cases there are no accurate absorption data available for a
modelled wall, floor or ceiling. In this case assumptions have to be made or dedicated
absorption measurements have to be conducted. Secondly, even if there are reasonable
measurement data available they are limited to the assumptions and conditions made by
the measurement standard, such as ISO 354 or ASTM C423. Depending on mounting,
size and circumference, the actual absorption characteristics of the material in the room
will be somewhat different from the data acquired for the standardized sample in the
reverberation room. Last but not least, most absorption data sets are published only at
one-octave resolution and only for the range 125 Hz to 4 kHz. Therefore, extrapolation
and interpolation steps are required for many calculation approaches, e.g., if they work
with ⅓ octave data from 50 Hz to 20 kHz.
It is even more difficult to acquire additionally required data such as the scattering
coefficient of a surface material or structure. Often these have to be estimated based on
experience.
3. The data available for the sound sources used in the model play a significant role as
well. For most applications it is critical that the simulated loudspeakers have been
measured at high resolution, i.e., using impulse response or equivalent data, to deter-
mine the directional characteristics used in the simulation. These directional data sets
must include phase information and have a frequency resolution that is high enough
to avoid interpolation artefacts. Also, the angular resolution must be high enough to
avoid any sampling problems.
It has been shown in various publications that if the quality of all input data is sufficiently
high the direct field predictions are accurate within 1 dB compared to measurements. Room
acoustic results such as for the Speech Transmission Index (STI) can be within the just-
noticeable-difference (JND), as well. However, if some of the input quantities are less pre-
cise, their effect on the result accuracy will depend on how much they contribute to the
overall result. This is not always easy to estimate.
frequencies where, for example, a large part of a light-construction wall can resonate when
a smaller part is excited by an incident sound wave.
It is equally important to precisely understand the settings and parameters for the calcu-
lation. In the simplest case, choosing a resolution setting for mapping on surfaces that is too
rough may easily lead to errors. Similarly, choosing too few particles may lead to erroneously
detected echoes or missing major reflections that will be established properly only when
using sufficiently high particle quantities. Sometimes there are other complex settings that
can be modified by the user and that may have a less clear effect on the end results.
8.8 Auralization
The goal and purpose of auralization are to make the properties of the room and sound
system audible. This requires several steps (Figure 8.24):
1. The detailed response function for the listening point(s) has to be computed. This typ-
ically includes information about the direct sound arrivals as well as reflections with
respect to arrival time, level, direction and frequency response. This data set is not yet
specific for any type of receiver or listening setup.
2. For binaural reproduction, e.g., via a headphone, the response function is convolved
with the head-related transfer function (HRTF) selected for the modelled listener. For
each ear, the individual arrivals are weighted based on the HRTF according to their
direction of incidence. This yields a set of two monaural impulse responses, one for
each ear. It is a disadvantage of binaural setups that in-head localization may occur if
the selected HRTF does not match the individual listener sufficiently well.
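A schematic sketch of step 2, assuming the simulation delivers a list of arrivals (delay, gain, azimuth) and some HRTF lookup is available; the toy HRTF below encodes only interaural time and level differences and is merely a stand-in for measured data:

```python
import numpy as np

def binaural_ir(arrivals, hrtf_lookup, fs, length):
    """Assemble a left/right impulse-response pair from simulated arrivals."""
    out = np.zeros((2, length))
    for delay, gain, az in arrivals:
        hl, hr = hrtf_lookup(az)
        i = int(round(delay * fs))
        out[0, i:i + hl.size] += gain * hl    # left-ear contribution
        out[1, i:i + hr.size] += gain * hr    # right-ear contribution
    return out

def toy_hrtf(az, fs=48000):
    """Crude stand-in: pure interaural time and level difference."""
    itd = int(round(abs(np.sin(np.radians(az))) * 0.0007 * fs))
    near, far = np.zeros(64), np.zeros(64)
    near[0], far[itd] = 1.0, 0.6
    return (near, far) if az <= 0 else (far, near)

arrivals = [(0.010, 1.0, 0.0), (0.024, 0.5, 40.0), (0.031, 0.4, -65.0)]
brir = binaural_ir(arrivals, toy_hrtf, fs=48000, length=4800)
print(brir.shape)  # convolve a dry signal with each row for headphone playback
```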
Auralization is a very powerful and effective tool to present acoustic problems and solutions
to laymen. It is also a good tool to obtain a general acoustic impression of a space at various
locations in the room.
For practitioners of auralization a number of aspects are important to consider.
Auralization in general cannot and must not be the only basis for acoustic design
decisions or even commercial tenders. Auralizations are always subjective as they depend
on the perception of the listener and therefore can only complement objective, quanti-
tative results. When working with laymen in the field of acoustics, auralization is a good
tool to create a general awareness of different acoustic effects and more clearly illustrate
the difference between different design options. In contrast, professional listeners, such
as musicians, may often be distracted by the perceived artificiality of the auralization
as a simulation result cannot accurately reproduce all details of a real-world listening
experience.
It should be clear that auralization results are limited by the same factors that were
discussed before for general simulation results. For example, to this day, no available simu-
lation program can adequately model the low-frequency behaviour of rooms, particularly
also diffraction effects. Aspects that are not considered by the simulation engine will be
missing in the auralization result as well. Comparably, if the input data into the simu-
lation are low-quality the auralization results will suffer as well. A typical example is
insufficient information about the absorption characteristics of the wall materials. As
a consequence, simulations of venues with high-quality demands on acoustics, such as
concert halls, should at present be generally complemented by scale-model measurements
and studies.
Finally, auralization is a tool that builds on top of modelling results. The methods and
circumstances used for the auralization will also affect the accuracy of the reproduction.
Both binaural and spatially based auralization will always add uncertainties and detrimental
effects as indicated above. That means that even if the simulation results are highly accurate
a poor reproduction setup may still cause unconvincing auralization results.
References
1. Weinzierl, S. (2017). A database of anechoic microphone array measurements of musical
instruments. SEACEN project.
9 Audio Networking
Stefan Ledergerber
9.1 Introduction
Since the late 1990s, the professional audio industry has been shifting from point-to-point
digital transmission formats (such as AES/EBU or MADI) to IP-based standards (such as
AES67). This packet-based networking has brought massive flexibility, as well as enhanced
control and monitoring capabilities, to audio systems. It offers the flexibility of a physically
fixed installation becoming adaptable and expandable at a later stage through software con-
figuration and updates. Signal paths are no longer tied to physical cables but can be changed
at any time with the click of a mouse – without the need for dedicated audio routing hard-
ware. It is in the nature of packet-oriented transmission that audio signals automatically
reach the desired destination via the IT network.
Launched in 1996 by Peak Audio (later acquired by Cirrus Logic), CobraNet is widely
regarded as the first successful audio-over-Ethernet implementation and has become the
backbone of many audio
installations, such as convention centres, theatres, concert halls, airports and theme parks.
There are still plenty of CobraNet installations, but issues with a relatively high latency
and limited scalability restricted its suitability in latency-sensitive applications such as live
sound, recording studios and broadcast facilities.
Dante, developed by Australian company Audinate [1] and introduced about 10 years
after CobraNet, stands for ‘Digital Audio Network Through Ethernet’. Dante offers sev-
eral major benefits over the first generation of audio-over-IP technologies, including
better usability and higher compatibility with standard network infrastructure. Dante
benefits from a huge equipment ecosystem with thousands of devices by hundreds of
manufacturers.
Before Dante reached its current position of dominance, there was considerable excite-
ment around a technology called AVB (Audio Video Bridging) due to its robust nature and
high level of automatic configuration of AVB-capable network hardware. Other industries
such as automotive and industrial automation adopted AVB and gave it a more general
name, as it no longer relates just to audio and video applications: AVB was renamed as TSN
(Time-Sensitive Networking) by the developing group of cross-industry manufacturers, the
AVnu Alliance [2]. Subsequently the Milan working group, a consortium of audio/video
manufacturers, decided to develop a more refined specification for use in professional audio/
video systems, called Milan. It is a specific version of TSN focusing on providing interoper-
ability amongst audio/video vendors – interoperability that the basic TSN specifications
alone did not guarantee.
However, TSN requires special IT hardware to take care of audio requirements and there
is only a limited number of switch models available that support TSN. Furthermore, it has
severe limitations in terms of size and scalability of installations. For these reasons this
technology has so far seen only limited adoption in large-scale professional audio systems.
On the other hand, audio-over-IP networks may confront the user with the following
disadvantages:
• Since several audio samples of a channel are usually put into one packet to improve
overall efficiency, there is a given minimal latency, since the sender must first wait for
the audio samples to become available before sending them over the network. This latency
is normally higher than with point-to-point digital audio standards but can be minimized
by using optimal packet formats and network setups.
• Since IT networks are not deterministic in terms of travel time of packets, a safety
margin in the form of an audio buffer must be inserted on the receiver end. This buffer
results in added latency. The fewer packet collisions are present in the network, the
more this safety margin (and therefore the latency) can be reduced.
• Added complexity by the variety of audio packet formats, requiring receivers and
senders to be aligned to the same settings. The complexity of audio-over-IP technology
is significantly higher than with previous technologies. The industry still has significant
work to do to reduce this complexity for the user by introducing intelligent and user-
friendly software solutions for managing audio networks.
time offset at all receivers, the so-called link offset. When a packet arrives at a receiver, it
remains in its buffer until it is time to play it out. Hence, the moment the audio gets played
out equals the sending time plus the link offset.
All receivers (e.g., loudspeakers with built-in audio-over-IP connectors) can achieve
phase accuracy amongst each other under two conditions:
1. Accurate time synchronization to the PTP clock leader (identical time base)
2. Identical link offset value set by the user in all receiving devices
Consequently, the link offset must be chosen based on the worst-case delay of all connections
in question. It is recommended to include a certain allowance in case of unforeseen
deviations of the packet delivery times.
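The playout rule itself is deliberately simple; a sketch assuming a 1 ms link offset:

```python
# Every PTP-synchronized receiver applies the same rule, so all devices
# play a given packet at exactly the same instant.
LINK_OFFSET_S = 0.001      # chosen above the worst-case network transit time

def playout_time(ptp_send_time_s):
    """Playout instant = PTP send timestamp + common link offset."""
    return ptp_send_time_s + LINK_OFFSET_S

# A packet stamped at t = 172.500000 s plays at t = 172.501000 s everywhere,
# whether it spent 0.2 ms or 0.8 ms in the network.
print(f"{playout_time(172.500000):.6f} s")
```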
In the following text, these three topics will serve as a guideline for illustrating all relevant
aspects of audio networks and comparing the technologies available today.
9.2 Connectivity
This section covers some of the aspects that are essential for understanding the requirements
of audio networking. It also introduces terminology that will be referred to in the following
sections.
Deciding whether two IP addresses belong to the same subnet is impossible without veri-
fying their corresponding subnet masks. If the destination IP address of a packet is not in
the same subnet, the sending device must direct it to the IP address of the router instead of
sending it straight to the receiving device.
Sending packets within a subnet has many similarities with company-internal telephone
calls. All company numbers start with the same digits and differ only in the last part or in the
extensions. Likewise, two hosts within the same subnet have similar IP addresses, differing
only in the last digits. The first part is called network address, the second, individual to a
device, is called host address. The split between the two is indicated by the subnet mask:
the network address corresponds to the positions where the subnet mask holds a value
greater than '0', while the host address is the remaining right-hand part, where the subnet
mask indicates '0'. This can best be understood by means of examples:
Host A
Host B
Host C
➔ Host C is in a different subnet than host A and B because it differs in its network
address (192.168.134 instead of 192.168.020). Host C cannot exchange packets with
hosts A and B without a router. In order to allow this host to communicate with A and
B, it must get a different IP address, starting with 192.168.020… . Alternatively, a
different subnet mask may be chosen for the entire setup, such as 255.255.0.0.
The decimal notation of a subnet mask used above is called dot-decimal notation. To indi-
cate the same information in shorter writing there is an alternative method commonly used
in IT departments. It is called CIDR notation or slash notation: right after the IP address,
separated by a slash, it specifies the subnet mask by giving the number of '1' digits in its
binary form, in which '255' corresponds to '11111111'. The subnet masks in the examples
above therefore contain 24 '1's in
their binary form.
Hosts from the above example in CIDR notation:
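The printed address boxes for hosts A to C have not survived reproduction here, but the surrounding text fixes the network addresses (192.168.020 and 192.168.134) and a /24 mask; the host parts below are therefore illustrative. Python's standard ipaddress module can check subnet membership directly (note that it requires octets without leading zeros):

```python
import ipaddress

host_a = ipaddress.ip_interface("192.168.20.10/24")    # 192.168.020.x in text
host_b = ipaddress.ip_interface("192.168.20.11/24")
host_c = ipaddress.ip_interface("192.168.134.10/24")   # 192.168.134.x in text

print(host_a.network)                # 192.168.20.0/24
print(host_b.ip in host_a.network)   # True  -> same subnet, no router needed
print(host_c.ip in host_a.network)   # False -> packets must pass a router

# With the wider mask mentioned above, all three share one subnet:
print(host_c.ip in ipaddress.ip_network("192.168.0.0/16"))   # True
```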
Installations using routers and connecting multiple subnets operate on layer 3 of the so-
called OSI model. This model divides the generic functionality of a network into seven
layers, each describing a set of functionalities networking devices must provide in order to
guarantee correct transmission of information, such as forwarding packets to the correct
recipient. All current IT devices follow this well-defined abstraction layer concept for facili-
tating interoperability between manufacturers. Installations operating on layer 3 can inter-
pret IP addresses, subnet masks etc. and therefore may forward packets across subnets. All
the technologies discussed here can operate in such scenarios. In contrast, some technolo-
gies are restricted to layer 2. This means their packets are delivered exclusively based on
MAC addresses and do not contain subnet information. Consequently, layer 2 networks
cannot be split into multiple subnets, their packets cannot be forwarded via routers and
their scalability is therefore somewhat limited. A popular example of layer 2 networks is the
already mentioned TSN/Milan as well as – further back in history – CobraNet.
9.2.3.1 Star
The star is in many ways the preferred topology. Multiple hosts are connected to a redirec-
tion device such as a switch or router.
Today, networks often combine two levels of stars to form a ‘star of stars’ in a hierarchical
sense. This is called spine/leaf architecture. The central switch/router (spine) must usually forward more traffic than the peripheral switches (leaves), as most traffic between the segments passes through it. If the high-bandwidth link between spine and leaf is unable to forward
the traffic of all hosts simultaneously, this design is blocking. The opposite is a non-blocking
network design, where the high-bandwidth links are capable of transmitting the total traffic
of all hosts connected to the respective leaf switch.
9.2.3.2 Ring
Nodes can have multiple interfaces: at least two are required to realize a ring topology. Each
link between two nodes offers full bandwidth and it is up to the nodes to forward packets within
the ring. In this sense, each node acts as a switch, forwarding packets between its two interfaces.
Choosing a ring topology often makes sense when large distances need to be bridged and
the connections are costly. Practical examples are a network between several locations, but
rings are also formed for connection of devices within a rack where there is no space for an
additional switch. Ring topologies offer a certain built-in redundancy –all devices can be
reached even if an interconnection is broken.
Unicast often uses the Transmission Control Protocol (TCP), whereby the receiver
confirms successful reception of each packet back to the sender. If the confirmation is not
received, the sender automatically re-sends the packet. An alternative to TCP is UDP
(User Datagram Protocol). In this case, the sender trusts the network that the packets will
successfully arrive at the receiver. There is no acknowledgement of receipt, and if the packet
is lost, its content is lost. Although perhaps unexpected, this is in fact the preferred transmission mode for professional audio networks. As latency must be low, retransmission of packets cannot be afforded: it would cost time and therefore increase the overall latency of the audio transmission. If a live audio packet is lost, it is best to continue playing the next audio samples instead of trying to recover the previous one.
In audio applications, there is often a requirement to receive an audio signal at multiple
destinations in parallel, e.g. a microphone signal routed in parallel to the mixing consoles
for front-of-house and monitoring. Even a third destination could exist such as an audio
recording device. When the sender transmits audio packets in unicast, the audio signal
comes as three packets with identical content but different destination addresses. This
results in an unnecessary processor load on the sending host, but also takes up bandwidth to
each of the three destinations. This can be optimized using multicast.
The use of multicast has numerous advantages, including less processor load on the
sender and less overall traffic on the network. The sender addresses its packets to multi-
cast addresses and not to host addresses. It does not know which recipient(s) the packets
will arrive at. Multicast addresses are comparable to frequencies on a radio: anyone who is
interested can tune in and receive the content. The sender puts the audio data in a packet
once, sends it to a multicast address and the receivers need to know which multicast address
they want to listen to. Hence, this is more of a ‘pull’ than a ‘push’ principle.
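The 'tune in' behaviour can be sketched in a few lines of Python. Joining the group via the IP_ADD_MEMBERSHIP socket option is what generates the IGMP membership requests discussed below; the group address and port here are arbitrary illustrative values, not ones mandated by any audio technology.

import socket
import struct

GROUP, PORT = "239.69.1.1", 5004          # illustrative example values

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# 'Tune in': the operating system now sends IGMP membership reports,
# which snooping switches use to open the floodgate for this stream.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, sender = sock.recvfrom(2048)  # one audio packet per call
    print(len(packet), "bytes from", sender)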
Note that multicast addresses are inherently unrelated to subnets, as they are not related
to nodes and their IP addresses. Therefore, multicast packets are received by devices across
subnets unless they are separated by VLANs (which they usually are).
The only difference between a UDP unicast packet and a multicast packet is its destin-
ation address. By definition, any packet with a destination address in the range 224.x.x.x–
239.x.x.x is a multicast packet and gets treated accordingly by the network. Several
subsequent packets containing a particular audio signal are called an audio stream, or a
multicast stream, to specifically describe multicast operation. The switches involved must
have multicast forwarding enabled. This is a device-internal setting. If the destination
address of a packet is in the aforementioned range, a multicast-capable switch will forward
these packets to multiple interfaces. This bears the risk of unnecessary (over)load of
the network, as not all connected hosts are interested in a certain audio stream. Some of
them may be, for example, printers that have nothing to do with audio at all. Therefore, it
is important that the multicast traffic only reaches hosts asking for it. The solution for this
is IGMP snooping (Internet Group Management Protocol). All discussed audio network
technologies support IGMP snooping by default. If enabled within the switch, multicast
packets are only sent through those interfaces on which periodic IGMP requests arrive from the connected host. If no requests are received, the corresponding multicast is stopped so that no unnecessary traffic gets onto that link. IGMP snooping can be considered a kind of floodgate that is closed by default and only opened on request. It is strongly recommended to enable IGMP snooping in a multicast network – in all switches. Otherwise, the risk of overloading the network and non-participating hosts is unnecessarily high.
The function of an IGMP querier should also be mentioned: it triggers the periodic requests from all hosts. Normally this function is activated in one particular switch.

Table 9.1 Recommended traffic prioritization

Priority                                      DSCP value
Highest priority (for example, queue 4)       46 and 56 (synchronization and Dante audio)
Medium priority (for example, queue 2 or 3)   34 (audio, all with the exception of Dante)
Lowest priority (for example, queue 1)        All others
In Dante networks, audio gets tagged with the number 46 while synchronization gets the number 56. Hence, unfortunately, number 46 denotes two different traffic types: Dante uses it for its audio packets while all other technologies use it for synchronization. Therefore, some non-Dante devices allow the user to alter this value manually so that they can be adjusted to the Dante policy. Where this is not possible, the set-up in Table 9.1 is a good compromise and likely to work best in practice.
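On the sending side, a device writes its DSCP value into the IP header of every packet it emits. As a minimal Python sketch (values as in Table 9.1; the IP_TOS option is the conventional way to set this field on Unix-like systems):

import socket

DSCP_EF = 46   # 'expedited forwarding': Dante audio, or PTP in other technologies

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# The legacy TOS byte carries the 6-bit DSCP value in its upper bits,
# hence the shift by two.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)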
9.3 Synchronization
To operate synchronously and with low latency, all IP senders and receivers must be
synchronized to the same clock. With traditional audio technologies, the devices were
synchronized either by a separate word clock connection or by using synchronous audio
formats such as AES/EBU or MADI. The receivers were able to directly derive their fre-
quency and phase from these formats, as they provide some kind of ‘pulse’ to indicate
the moment when an audio sample is being generated or played back, for example in
analogue-to-digital converters.
Figure 9.12 A PTP leader synchronizes the absolute time across all followers. Each device then derives
its own media clock from this.
Audio-over-IP no longer relies on traditional clocks in the form of ‘pulses’, but on abso-
lute time information instead. All devices on the network get synchronized to the same
time of day using the Precision Time Protocol (PTP). The origin of this time is in a device
called the clock leader (also called clock master) while the devices adjusting to it are clock
followers (also called clock slaves). Each device must then generate the desired traditional
clock internally (for example, 48 kHz), derived from the absolute time received via PTP.
This resulting internal clock is the media clock. If properly implemented by the manufac-
turer, the media clocks of each device show the same frequency and phase. High accuracy
can be achieved technically but represents a challenge for audio manufacturers. Therefore, the phase accuracy between PTP-synchronized devices may vary depending on the quality of the implementation. An acceptable variance is <1 µs.
Since IT networks are not sufficiently deterministic as to when a packet gets delivered,
accurate synchronization of the devices requires a sophisticated approach. It is the principal
job of each PTP follower to compensate for two effects occurring in any network:
1. JITTER COMPENSATION
The current time is indicated in sync messages by the PTP leader to all followers using a well-
known multicast address (224.0.1.129). By the nature of the network and queues in switches
being also used by other traffic, this information does not always arrive with a constant delay
at the follower end. This variance is called packet jitter or packet delay variation (PDV) and
must be smoothed out by every PTP follower. Typically, audio networks use a sync rate of
1–8 messages per second with 8 being the recommended value for maximum compatibility.
2. DELAY MEASUREMENT
The second crucial task for a follower is the measurement of the packet delay between
leader and follower. This is necessary to correct the time received in the sync messages by
298
the time they took to travel through the network. This measurement includes delays of all
components between them, including cables and switches. In other words, the cable length
and the number of switches inserted between leader and follower does not matter. The PTP
time amongst all followers is the same with an accuracy down to nanoseconds. The only
condition for PTP to work accurately is for the delay to stay constant and symmetrical in
both directions, from leader to follower and vice versa. But this condition is normally met
in local network installations. The delay gets measured by an exchange of two messages,
the delay request from the follower and subsequently the delay response from the leader.
Measuring this delay is often carried out at the same rate as the sync rate, anywhere between
1 and 8 times per second.
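The arithmetic behind this message exchange is worth spelling out. With t1 the departure time of a sync message at the leader, t2 its arrival time at the follower, t3 the departure of the delay request at the follower and t4 its arrival at the leader, the follower can compute both quantities. The sketch below assumes, as stated above, that the path delay is symmetrical; any asymmetry appears directly as an offset error.

def ptp_update(t1, t2, t3, t4):
    """Classic PTP servo input: mean path delay and clock offset."""
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2.0
    offset_from_leader = ((t2 - t1) - (t4 - t3)) / 2.0
    return mean_path_delay, offset_from_leader

# Example: a 10 us network delay and a follower clock running 3 us fast:
# t2 - t1 = 13 us, t4 - t3 = 7 us
print(ptp_update(0.0, 13e-6, 100e-6, 107e-6))  # -> (1e-05, 3e-06)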
Since a leader must be able to exchange messages with every single follower, there is a
limit to the maximum number of followers a leader can handle. Unfortunately, this is not
a clearly specified limit as it depends on the message rates used. In typical audio setups
without a dedicated PTP leader device, master-capable audio devices may typically serve
anywhere from 25 to 250 followers.
Figure 9.14 Example of a PTP scenario with several devices that can be leaders.
The unit announcing the lowest value wins over all others. If a priority 1 value is lower on one unit than on others, that unit becomes
the leader. The value under priority 2 is only relevant if all criteria before it – including priority 1 – are the same in multiple units. This can occur in installations with two identical
types of devices whose priority 1 has been set to the same value by the user. In this case, the
priority 2 determines which of the two units acts as the main leader and which one acts as
a backup.
Note that some devices do not offer the possibility of entering a numerical value for pri-
ority 1 and 2 to the user. Instead, they simply show an option to select a device as ‘Preferred
Leader’. Technically these products use a fixed value in their announce messages, defined by
the manufacturer. It will therefore still be possible to win over such a device by entering an
even lower value in another PTP leader. Some devices also support a setting called ‘Slave
only’ or ‘Follower only’. When enabled, the device never tries to take over and become the
PTP leader.
On top of the two user-definable values priority 1 and 2, there are further criteria that
influence the leader selection. But those are set by the manufacturer and cannot normally
be changed by the user. They specify the accuracy of the oscillators and whether the leader
follows an external time source such as GPS. The complete list of these criteria by which
the BMCA process selects the leader is given in Table 9.2.
[Table 9.2: BMCA selection criteria, listed as parameter/value pairs.]
Note: The PTP domain may allow the use of multiple time bases on the same network. For audio
applications, it is not beneficial to use more than one PTP domain within the same installation.
Unfortunately, some products do not allow the user to change this value and are fixed at domain 0. Therefore, operating on domain 0 throughout all installations is recommended.
For audio, standardization organizations such as AES and SMPTE have defined recommended sets of values referred to as PTP profiles. The most important ones are currently
• Standard [6]
• AES67 Media [6]
• SMPTE ST 2059-2 [7]
To meet all these requirements and operate audio equipment as reliably as possible, the
settings in Table 9.3 are recommended in practice (see also [8]).
1. Prioritization of PTP
The minimum requirement is to prioritize all PTP messages over others. As described
earlier, the corresponding functionality is available in most switches today and is called
Quality of Service (QoS). If the switch is correctly set up, it forwards PTP packets
immediately whenever there is one waiting in the queue. All other packets are tempor-
arily held back to prioritize PTP. In networks where PTP and audio share the network
with other traffic like office-type applications, enabling QoS is essential. In separated
networks though, it may not make a significant difference – provided there is sufficient bandwidth for PTP and audio.
2. Boundary clock switch
Some newer switch models actively participate in PTP synchronization rather than just
forwarding its packets. These boundary clock switches synchronize themselves to the
leader in the same way as any other follower and become themselves the leader for all
subsequent devices, using the identical time base. The unit that synchronizes the entire
setup centrally is now referred to as the grandmaster or primary leader.
Communicating with followers – such as individually answering delay requests – requires processing power in the leader device. With boundary clock switches, this load gets distributed amongst all switches. Offloading synchronization tasks to
switches saves processing resources on the grandmaster, but it also can keep the PTP
packet jitter low throughout the network since every switch individually re-generates
the sync messages.
Boundary clock switches get their own PTP priority settings. If the grandmaster is
lost, these settings get disseminated by announce messages as with any other leader.
They may even be able to run freely for a while, temporarily synchronizing the entire
network themselves. Because of all these advantages, using boundary clocks in a setup
makes PTP scalable up to very large systems with virtually no size limitations.
3. Transparent clock switch
This type of switch actively participates in PTP synchronization as well, although
not to the same degree as boundary clocks. All PTP messages pass through the
switch, forwarded from the grandmaster to its followers. Hence, the grandmaster’s
message load is not reduced in any way. But transparent clock switches measure the
variable delay of each packet as it passes through them. This delay gets entered into
the designated correction field in sync and delay request messages. Packet jitter occurring within this switch can therefore easily be compensated in the follower, which simply takes the correction value into account during the synchronization process. In practice this means transparent clock switches do not introduce
any relevant packet jitter to PTP but keep the processing load of the grandmaster
unaltered.
9.3.3.1 Recommendation
PTP is the foundation for any audio-over-IP technology discussed here. Proper PTP syn-
chronization has a major impact on system reliability. Shortcomings such as high jitter or
loss of synchronization cause all sorts of undesirable effects on audio transmission, including
dropouts or total loss of signal.
A clean PTP concept is comparable to a solid grounding concept during analogue audio
days. If it is done properly, the likelihood of problems is low. If not carefully set up, intermit-
tent and seemingly random issues may occur that lead to long troubleshooting sessions.
For small systems, using non-PTP-aware switches is not a problem, possibly with QoS
enabled. However, when it comes to scalable or larger systems, using boundary clock
switches throughout the network is the recommended approach to achieve full stability. It
is difficult to define a limit in system size beyond which the use of boundary clock switches
is necessary, since it depends on how well the chosen products can cope with gradually
degraded PTP. Recommendations range from 30 to 250 nodes.
• Multicast address and port of the multicast packets containing the desired audio
• Settings used by the sender:
• Sampling frequency (e.g., 48 kHz)
• Audio resolution (e.g., 24 bit)
• Number of channels combined in one packet
• Number of audio samples of each channel (packet time, e.g., 1 ms)
Variations in these values result in different stream formats, and not all devices are capable
of generating or receiving every combination. For this reason, it makes sense to reduce these
variations, as given in the AES67 standard [6].
A management system is needed to oversee the entire system and to set up senders
and receivers. Several manufacturers have started to develop their own connection management tools. One of these is the Dante Controller software by Audinate [1],
which is used to establish audio connections between Dante devices. Another tool is
ANEMAN by Merging Technologies [5], which can be used for a variety of RAVENNA-
based products.
To describe a stream, virtually all technologies make use of a standardized method: the
SDP file. The file is generated by the sender and contains all the information required by
the receiver to properly retrieve the respective audio stream. It is the control software’s task
to copy this file from the sender to the receiver.
An example of an SDP file is given in Figure 9.17, just to illustrate how all relevant infor-
mation for the receiver is contained therein.
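Figure 9.17 itself is not reproduced here, but a representative AES67-style SDP description has the following shape (all addresses, names and identifiers below are invented for illustration):

v=0
o=- 1690000000 1690000000 IN IP4 192.168.20.10
s=Stage box channels 1-8
c=IN IP4 239.69.1.1/32
t=0 0
m=audio 5004 RTP/AVP 98
a=rtpmap:98 L24/48000/8
a=ptime:1
a=ts-refclk:ptp=IEEE1588-2008:00-1D-C1-FF-FE-12-34-56:0
a=mediaclk:direct=0

From these few lines the receiver learns the multicast address and UDP port, the coding (24-bit linear), the sampling frequency, the channel count, the packet time and the PTP grandmaster to which the stream is locked.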
The SDP file is usually hidden from the user, but some products show it in their extended
user interface. The SDP file is an IT industry standard that has been used for many years in
applications such as video conferencing or IP telephony. Even the popular free VLC Media
Player software [10] can interpret its content, subsequently receive an audio stream and play
it back via the PC’s internal speakers.
In practice, audio packets contain a specific number of samples per channel and mul-
tiple channels on top of that, to achieve a reasonable packet size and thus make efficient use of the bandwidth in the network as well as the processing power of the connected nodes.
The most common stream formats are shown in Table 9.4 (48 kHz, 24-bit resolution).
9.5 Latency
If a packet contains, for example, 1 ms of audio, the latency of this connection will always
be >1 ms. This is obvious because the sender must then buffer 1 ms of audio before putting it
into a packet and then sending it over the network. This delay is followed by the travel time
through the network with all its switches and queues before finally reaching the buffer in the receiving device.
Figure 9.18 Stream variants with identical packet size: number of channels versus packet time.
Table 9.4 Common stream formats (48 kHz, 24-bit resolution)

Channels per packet    Samples per channel (packet time)
2                      48 samples (1 ms)
2                      6 samples (0.125 ms)
8                      48 samples (1 ms)
8                      6 samples (0.125 ms)
16                     6 samples (0.125 ms)
64                     6 samples (0.125 ms)

Note: 16- or 64-channel streams are not feasible with 1 ms packet time, as packing 16 or more channels with 48 samples each into one packet would exceed the MTU.
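The MTU limit and the bandwidth cost of each format can be estimated in a few lines. The header sizes below are the standard RTP/UDP/IPv4 and Ethernet overheads; everything else follows from the stream parameters.

def stream_bandwidth(channels, samples, fs=48000, bytes_per_sample=3):
    """Payload size and wire bandwidth of one AES67-style stream."""
    payload = channels * samples * bytes_per_sample      # audio bytes
    ip_packet = payload + 12 + 8 + 20                    # + RTP + UDP + IPv4
    wire = ip_packet + 14 + 4 + 8 + 12                   # + Ethernet header, FCS, preamble, gap
    packets_per_s = fs / samples
    return ip_packet, wire * packets_per_s * 8 / 1e6     # packet size, Mbit/s

print(stream_bandwidth(8, 48))    # -> (1192, 9.84): fits the 1500-byte MTU
print(stream_bandwidth(16, 48))   # -> (2344, ...): exceeds the MTU, as noted above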
The total latency of an IP audio connection, the link offset, is therefore the sum of three factors:
1. Packet time
2. Travel time on the network
3. Receive buffer
In practice the technical term link offset carries many alternative names, depending on
the manufacturer, for example delay or latency. It is the user’s responsibility to choose a
link offset that is sufficiently long so the receive buffer never runs empty and hence audio
interruptions never occur. Despite the widespread perception that networks are slow, in a typical situation the network itself is not the dominant contributor to overall latency. In a network with very little traffic, the dominant contributor is most likely the sender's packet time. Therefore,
some users may decide to select a shorter packet time despite the increased processing power
requirements and packet overhead. But if there is significant competing traffic in the net-
work, the packet jitter of the audio stream increases (more queuing-up in the switches) and
the receive buffer will be the dominant contributor to the total latency as it must compen-
sate for the packet jitter.
Reducing the competing traffic in a network directly leads to a reduction in packet jitter,
which subsequently reduces the need for its compensation in the receive buffer and then
ultimately allows for a minimum latency setting. It is generally worthwhile to reduce the
packet time only when the jitter effects have been minimized.
9.6 Standards
Looking at the previous explanations, it becomes apparent that a great number of parameters can be user-defined within an audio-over-IP installation. In order to limit the number of variations, the availability of open standards is valuable. In many ways, all manufacturers have attempted to achieve the same thing, often without any substantial technical reason to do it differently from the others. Fortunately, after some manufacturers had developed their own technologies, they agreed on a common denominator that all would use as a basis going forward. This was the initiation of the AES67 standard in 2013. This
standard defines a minimum set of parameters that must be supported by all manufacturers
that adhere to AES67. Hence, certain manufacturers may support more parameter settings
than the AES67 minimum; in this case, it is the user’s responsibility to verify that all devices
on the network support those.
9.6.1 AES67
According to the AES67 standard [6], all devices must meet the following minimum
specifications (excerpt):
Note: Many additional parameters and values are stated in the standard, but they are not a
minimum requirement as listed above.
To keep all these streams in sync and to avoid the problem of audio being out of sync with
the video (bad lip sync), all essence streams must include a PTP timestamp in each packet.
A receiver can then easily reconcile them and ensure that there are no lip sync issues,
also known as AV delay. This overall concept is described in a separate standard with the
number SMPTE ST 2110-10 [11].
Fortunately, the audio part of this concept is almost identical to the specifications of
AES67 [6], hence any AES67-compatible device can participate as an audio device in an
SMPTE ST 2110 network.
9.6.3 NMOS
While the previously mentioned standards focus on the interoperability of audio and video
streams between manufacturers, the Networked Media Open Specifications (NMOS) purely
specify the control aspects of the application, which are in fact fully independent of audio or video. Therefore, NMOS is a well-suited complement to the above standards, completing
them by specifying topics such as device/stream discovery and connection management.
NMOS is a series of constantly evolving specifications attempting to standardize more and
more subtopics. The most important of these are:
• IS-04: Device and stream discovery (initially using Bonjour/mDNS, but also for larger
facilities with multiple subnets).
• IS-05: Connection management (e.g., how multicast addresses are assigned to senders
and how SDP files are transferred from senders to receivers).
• IS-08: Control of audio crosspoint matrices in senders and receivers (which channels
are fed into/from a stream).
Further documents are under development, specifying additional topics, e.g., net-
work security, management of devices etc. The NMOS specifications are developed
by a group of industry representatives called Advanced Media Workflow Association
(AMWA) [9].
9.6.5 AES70
Even before the development of the above standards, a group of companies called Open
Control Alliance (OCA) [17] developed a protocol in 2011 with the goal of making devices
from different manufacturers interoperable in terms of device control and monitoring,
such as changing processing parameters in an audio mixing console or setting a micro-
phone gain.
In addition, the protocol contains a specification of how audio stream-related IP parameters are set. Although AES70 is sufficiently generic to serve audio and video applications in general, its primary focus was audio. Its acceptance in the market has remained somewhat limited [18, 19, 20].
9.6.6 SNMP
The IT industry has been using the Simple Network Management Protocol (SNMP) [21]
for many years to monitor the health of hardware components. SNMP traps contain an
individual message ID and are sent spontaneously by each monitored device. Monitoring
software must have access to the device-specific MIB (Management Information Base) file to translate the received ID into a human-readable error message. One of the shortcomings of SNMP is the fact that
messages are transmitted using the UDP protocol. Therefore, if the message is lost on the
network, the missing information about the error state of a device is not immediately
retransmitted.
• No oversubscription
An audio stream does not get established unless there is sufficient available bandwidth
throughout the network. Other packets cannot overload the link since priority is given
to audio by the nature of this technology. Due to this automatic bandwidth reservation,
there is no need for manual configuration of prioritization.
• Traffic shaping
Packet jitter is reduced by each switch upon exiting its queues. Each switch takes care
not to transmit audio at irregular intervals.
• Transparent clock by default
AVB/TSN-capable switches operate in a mode similar to a PTP transparent clock and
therefore provide stable PTP synchronization without further considerations.
9.7.1 Ember+
Talking about control protocols, another variant is Ember+. The specification as well as the source code was made openly available by the company Lawo [22].
9.7.2.1 Connectivity
When it comes to multicast addresses, Dante products allow specification of a ‘prefix’ for
the addresses used by the system, narrowing down the range of multicast addresses within an
installation. This means that this product can then only work within this range of multicast
addresses and other AES67-compatible products must operate within the same range as well.
9.7.2.2 Synchronization
Dante products use PTP version 1 (PTPv1) by default, while AES67 requires at least PTP
version 2 (PTPv2). It is important to note that these two versions are not compatible, although
they can run in parallel on a given network. When setting up an AES67 network, it is
recommended to switch all Dante products to AES67 mode and thus enable PTPv2 synchron-
ization. Subsequently the device with the highest clock quality should be selected as the PTP
leader or grandmaster. AES67-capable Dante devices will then automatically send out PTPv1
messages just in case some older Dante products are also present on the network. Therefore,
care must be taken to ensure that in every VLAN containing older Dante products (including
the Dante Virtual Soundcard software) at least one AES67-enabled Dante device is present
for the purpose of PTP translation. However, if multiple AES67-enabled Dante devices are
present, all of them will synchronize to PTPv2 directly and only legacy devices will use PTPv1.
Dante products seem to require at least 4 sync messages per second. Therefore, the use of the AES67 Media [6] or SMPTE ST 2059-2 [7] profile, both of which use 8 sync messages per second, is effectively mandatory, while the default profile (1 sync message per second) does not meet the synchronization requirements of Dante products. It should also be noted that Dante
devices will operate in PTP domain 0. This can only be modified by using Audinate’s Dante
Domain Manager software.
The AES67 option must be selected. In this case the packet time for Dante senders is fixed to 1 ms, but receivers can accept shorter packets as well.
Dante sends the SDP file to all interested devices using the Session Announcement
Protocol (SAP) on multicast address 239.255.255.255. Since no discovery protocol is speci-
fied in the AES67 standard, the user should check if any of the other AES67 devices have
the option of SAP support. If yes, discovery will work automatically across technologies. If
not, conversion software tools are available, as mentioned in section 9.4.1.
9.8 Redundancy
In the early days of audio networking, some users were sceptical about the reliability of
IT hardware. Due to its widespread use though, IT equipment is well proven and often
more reliable than traditional audio equipment. In addition, most IT network components
offer several mechanisms for diagnosis and quick problem-solving in the case of equipment
failure, some of them described here. Note that most likely these processes must be manu-
ally activated in the network switches.
One robust approach is to operate two completely independent, parallel networks between sender and receiver. In such a setup, each node needs to provide two network interfaces connected to
both networks. The sender creates two packets with identical (audio) content, stamps the
identical PTP time on both and then sends them over both networks. On the receiver end,
both packets are received and unpacked. Even if one of the packets is lost, the remaining
packet contains all the information and ensures that the audio continues without interrup-
tion. In fact, this mechanism is the only approach in a network to compensate for occa-
sional packet loss without having to repeat them from the sender and therefore add latency.
In other words, stream redundancy protects against the loss of single packets all the way up
to the complete failure of one half of the network.
Fortunately, a standard describes this mechanism, called seamless protection switching
or hitless merge. The standard is SMPTE ST 2022-7 [23] and describes how content is
duplicated and how the packets are handled on the receiving end to ensure uninterrupted
signal flow. It is formulated to be independent of the actual data type contained in the
packets and is therefore applicable to audio and video.
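Conceptually, the receiving end is a de-duplication buffer keyed on the RTP sequence number: whichever copy of a packet arrives first is used and its twin is discarded. The following minimal sketch illustrates the principle (sequence-number wrap-around and reordering are deliberately ignored):

class HitlessMerge:
    """Accept packets from two redundant legs, deliver each RTP
    sequence number exactly once (SMPTE ST 2022-7 style)."""

    def __init__(self, history=1024):
        self.seen = set()
        self.history = history

    def accept(self, seq, payload):
        if seq in self.seen:
            return None                # duplicate from the other leg
        self.seen.add(seq)
        if len(self.seen) > self.history:
            self.seen.pop()            # crude bound on memory use
        return payload                 # first arrival wins

merge = HitlessMerge()
for leg, seq in [("A", 1), ("B", 1), ("B", 2), ("A", 2)]:
    if merge.accept(seq, b"...") is not None:
        print("delivered seq", seq, "from leg", leg)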
Figure 9.27 Example of a well-set link offset. All packets arrived within the set latency.
Figure 9.28 Example of a link offset that is too short. Not all packets arrived within the set latency.
Otherwise, distorted audio or occasional audio dropouts may be observed. Another rule of thumb is: do not build mixed networks without active Quality of Service!
(a) Extend the link offset of the connections to/from the computer
(b) Close some applications running on this computer
(c) Change the respective sender setup to use a longer packet time and thus generate fewer
packets per second, or
(d) Over-provision the computer hardware in terms of computing power, since common
operating systems do not guarantee real-time execution of critical applications.
Since conditions on a particular PC may change over time and with any newly installed software package, it is advisable to leave a device in the same software state for as long as possible, unless audio can be re-tested for an extended period of time before the machine is used productively again.
Last, but not least: do not forget to disable any standby or power-saving modes on a PC during audio use. Not doing so may unexpectedly interrupt audio playout or recording.
• Multicast
• IGMP snooping, including querier
• Quality of Service (QoS)
In addition, there is one more thing to consider: some switches save energy by grouping packets in their queues and then sending them all together at once. With audio, such grouping introduces packet jitter, so energy-saving features of this kind (often marketed as Energy-Efficient Ethernet or 'green' mode) should be disabled.
References
1. www.audinate.com.
2. https://avnu.org.
3. www.qsc.com.
4. www.ravenna-network.com.
5. www.merging.com.
6. AES67-2018: AES standard for audio applications of networks – High-performance streaming audio-over-IP interoperability, available on https://aes.org.
7. SMPTE ST 2059-2:2021: SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in
Professional Broadcast Applications, available on www.smpte.org.
8. AES-R16-2016: AES project report – PTP parameters for AES67 and SMPTE ST 2059-2 interoperability, available on https://aes.org.
9. www.amwa.tv.
10. www.videolan.org.
11. SMPTE ST 2110-10:2017: Professional Media Over Managed IP Networks: System Timing and
Definitions, available on www.smpte.org.
12. SMPTE ST 2110-20:2017: Professional Media Over Managed IP Networks: Uncompressed
Active Video, available on www.smpte.org.
13. SMPTE ST 2110-30:2017: Professional Media Over Managed IP Networks: PCM Digital Audio,
available on www.smpte.org.
14. SMPTE ST 2110-40:2018: Professional Media Over Managed IP Networks: SMPTE ST 291-1
Ancillary Data, available on www.smpte.org.
15. https://ipmx.io.
16. https://aimsalliance.org.
17. www.ocaalliance.com.
18. AES70-1-2018: AES standard for audio applications of networks – Open Control Architecture – Part 1: Framework, available on https://aes.org.
19. AES70-2-2018: AES standard for audio applications of networks – Open Control Architecture – Part 2: Class structure, available on https://aes.org.
20. AES70-3-2018: AES standard for audio applications of networks – Open Control Architecture – Part 3: OCP.1: Protocol for IP Networks, available on https://aes.org.
10.1 Introduction
Prior to putting a sound reinforcement system into operation, the installer and/or the consultant need to perform tests regarding its electrical, mechanical as well as its acoustical functionality. These tests ensure that the system is ready for operation according to the specifications. They include checking for electrical and mechanical defects, polarity of the loudspeakers, sound pressure level calibration (including amplifier gain, SPL for different zones, SPL limiters) and time delay settings, as well as the adaptation of the loudspeakers to the existing room acoustical conditions: frequency response optimization (equalization), minimizing room excitation and boundary reflections, and maximizing the uniformity of SPL distribution and speech intelligibility over the entire audience area.
The following sections will help in getting an overview on the topics that need careful
consideration during setup and optimization of new installations as well as maintaining and
improving existing systems.
10.2.1 Electrical
Before powering up the system, some investigations concerning the electrical connectivity and setup are required. The first item is electrical power: how are the various electronic devices powered, which power phase do they use, and what is the configuration of the fuses? It is recommended to use a sequenced power-on schedule that powers the amplifiers (or active loudspeakers) last when turning the system on and powers them down first when switching off. Is electrical power redundancy available in the system (uninterruptible power supply, backup generator or similar) and is it operational?
Secondly, it is beneficial to verify whether the signal interconnectivity between devices
is correct –balanced vs. unbalanced cables and connectors, line level vs. microphone
level, digital or analogue signals, etc. In modern systems, this task involves substantial IT
knowhow, as signal linking increasingly takes place in a digital matrix or in a network
switch (see Chapter 9 for details). It is advisable to have the person responsible for the
installation at hand for this type of in-depth testing.
After the signal routing test is completed, it is advisable to test the impedance of the load on each amplifier channel (assuming the amplifiers are not built into the loudspeakers). For this, the loudspeakers must be disconnected from the amplifier output and a classic resistance test needs to be performed. This procedure enables identification of short circuits, open lines and incorrect load impedances.
10.2.2 Mechanical
Verifying the correct mechanical installation includes checking the installation racks
for sufficient airflow and/or cooling to and from the heat-producing equipment (such
as amplifiers), making sure no vents are blocked by wiring harnesses etc.; all cables
need to be securely fixed to the rack in order to minimize stress on the connectors.
When it comes to the mechanical side of the loudspeaker installation, the correct positioning and aiming of each device in the room needs to be inspected, as does the choice of mounting hardware and whether the loudspeakers are properly secured, especially if they are mounted above the audience. While it might not be the responsibility of the consultant to install the loudspeakers correctly and according to local code, it is the planner's responsibility to draw the appropriate installer's attention to anything that needs to be improved or fixed.
10.2.3 Acoustical
Finally, after testing electrical and mechanical matters, the system can be switched on and
initial acoustical tests can be performed. It will be immediately evident if there are any
undesirable noises when turning on the system. There should not be any loud plops or
clicks, nor any constantly audible noise, buzz, hum etc. after turning on the system.
Testing should start with low gain (low sound pressure levels) at first to make sure that no
components are harmed if there are still some faults in the installation. It is also advisable
to test one loudspeaker or, if possible, even just one driver at a time. Pink noise test signals
can quickly help in identifying the frequency content of a device under test but are not very helpful in determining whether the loudspeaker is causing distortion or whether mechanical noise or rattling occurs. These issues become more obvious when using sine sweep signals: a sinusoidal test signal that sweeps the frequency range from very low (usually 20 Hz) to very high frequencies (usually 20 kHz) (see 10.6.2).
10.3 Troubleshooting
• Humming
This effect usually occurs at the electrical mains frequency (50/60 Hz and harmonics).
The cause could be:
10.4.1 Electrical
Electrical optimization and commissioning of larger sound reinforcement systems is an
important part of the overall tuning process and typically represents a precondition for the
subsequent acoustic tuning of the system.
Firstly, the gain structure throughout the signal chain needs to be optimal: all devices
within the system need to operate within their nominal input and output voltage range,
otherwise signal degradation will occur. All source devices that will be used in the system
need to be tested for their actual output level and signal type (balanced, unbalanced, digital
with all its sub-types). Unnecessary signal conversions need to be avoided, be it from
balanced to unbalanced or from digital to analogue and back.
If an analogue input is available for public use (for example a line input to plug in a guest
device for rehearsals etc.), it needs to be correctly limited in level to make sure that the
input to the system cannot be overloaded. Microphone gain settings must sometimes accommodate a multitude of possible users; this requires careful setting of gain, enhancer, compressor and limiter parameters.
If gain before feedback is an issue in the installation, compression on microphones needs
to be set with caution. The use of a dedicated narrow notch filter bank, an automatic feed-
back suppressor or careful equalization is recommended.
In addition to the above, crosstalk between channels as well as signal-to-noise ratios
(S/N ratios) can be measured.
10.4.2 Mechanical
It has to be verified that the loudspeakers are firmly mounted in their brackets, the truss
or the bumper and that nothing is rattling. It is very important that the loudspeakers can
radiate freely, with no obstruction of the dispersion through installations such as light
fixtures, trusses, beams, curtains or pillars. Sometimes not all of these elements are known
in the planning phase and might have been added just prior to commissioning.
It is often possible to significantly optimize the coverage of the audience area by
adjusting the alignment (orientation or aim) of loudspeakers, the inter-box angles in line
arrays or the directivity pattern of digitally steered line sources. This includes directing
sound energy away from open microphones (increasing gain-before-feedback) and hard
reflecting surfaces.
10.4.3 Acoustical
The acoustic optimization and tuning of a sound reinforcement system should always be
preceded by the evaluation of the room acoustical properties of the space. This is necessary
to determine to which degree the quality of the sound transmission will be affected by the
room acoustic environment. In a professional design of a sound system this factor will have
been considered prior to the installation, maybe in computer simulations, and can now be
compared to the actual measurements of the finished room.
Room acoustical data can sometimes be obtained from the measurements performed by
the responsible acoustic consultant.
Figure 10.1 Top view of a convention hall, showing measurement locations (R) in the auditorium and
source locations on stage.
Figure 10.2 Example of an artificial human speech source for room acoustic measurements.
only display single-number SPL values but can also analyse the spectral frequency response of the noise. This allows for later analysis of the respective noise criteria (NC – Noise Criterion, NR – Noise Rating, GK – GrenzKurve).
• Sound level distribution in the auditorium and on the stage area. The level diffe-
rence between the highest and the lowest sound level measured over the relevant
audience area should not exceed ±3 dB, but in any case should stay within ±5 dB
(SPL, dBA and dBZ).
• The frequency response in the audience and stage areas. Geometry, room acoustics
and the choice of loudspeakers usually do not result in a flat frequency response in
the audience areas throughout the venue. Measuring the frequency response allows
one to optimize it using equalizers. The tolerance ranges of reproduction curves for
different applications are given by Mapp [1] (Figure 10.3). Not all acoustic phenomena that cause deviations in the frequency response can be compensated by equalization, though: destructive interference from boundary reflections or loudspeaker placement, for example, will remain after an equalizer is applied, since the direct sound and the interfering reflection are raised or lowered by the same amount.
Figure 10.3 Tolerance curves for the reproduction frequency response in different applications: (a) recommended curve for
reproduction of speech; (b) recommended curve for studios or monitoring; (c) international standard for cinemas;
(d) recommended curve for loud rock and pop music.
10.5 Documentation
All too often this part of the commissioning is neglected since it has no immediate effect on
the outcome of the process. But in later phases of the project, maybe after some years when
the first update or replacement of equipment occurs, the quality of the documentation will
determine the quality of the work that was originally done. It is quite possible that another party will be responsible for the update, and if they can rely on proper documentation, it will be appreciated.
10.5.1 Electrical
All settings and presets need to be exported, saved and backed up, and written notes on
firmware versions, routing and settings that cannot be saved (such as analogue equalizer and
gain settings, microphone types used etc.) need to be taken and archived. Choosing logical,
unique names for the files, including the date, will prove helpful.
10.5.2 Mechanical
All changes that were made in the process need to be documented, such as loudspeaker
orientation and positioning. If anything needs to be modified or added (safety mounting,
doors or hoods for racks and mixing desks etc.), this should be documented as well.
10.5.3 Acoustical
The state of the venue during the measurements should be documented: was an audience present, which seats were in place, what were the positions of acoustic banners and the iron curtain, and what was the state of dividing walls, doors and windows?
All measurements that were taken should be saved and named appropriately. It might be
required to measure the system in multiple setups (i.e., all loudspeakers active, only indi-
vidual loudspeakers or zones active, each with different presets etc.).
10.6.2.1 Fundamentals
Acoustic measurement methods are generally based on the recording of sound signals and
their evaluation. Since the very beginning, frequency analysis was found to be an important
tool for the assessment of recorded data, as it enabled investigations comparable to the
human perception of sound.
In general, Fourier analysis (Fourier, French mathematician, 1768–1830) is understood
as the spectral decomposition of a time signal with respect to the contributing harmonic
frequencies. A simple continuous sine wave will look like a narrow line at a particular fre-
quency in the spectrum, representing the single frequency that is present. Mathematically,
the signal amplitude at a given frequency is determined by the scalar product of the time
signal a(t) and the harmonic function at frequency ω.
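Written out with the usual harmonic kernel (the exact sign and scaling convention is left implicit here), this scalar product reads

Ã(ω) = ∫ a(t) · e^(−jωt) dt

with the integration running over the duration of the recorded signal.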
The resulting complex frequency spectrum Ã(ω) provides insight into the contributions
of individual frequencies or tones to the sum signal. Therefore, Fourier analysis is on one
hand useful for comparing subjectively perceived spectra of tones with objectively measured
ones, but also to draw conclusions from the objective measurement regarding the subjective
impression. Furthermore, this method facilitates the identification of resonances in com-
plex processes which cannot be observed by human hearing when masked by the overall
signal. Therefore, evaluation in the frequency domain has become a common measurement
method and it complements investigations in the time domain.
The development of digital signal-processing and computer-based measurement systems
along with the use of analogue-to-digital (A/D) converters made it necessary to handle time-discrete, sampled data. The sampling frequency fs limits the highest frequency fmax that can be represented:

fmax = fs / 2 (10.1)
The sampling interval T determines the density Δf of the discrete frequency spectrum:
Δf ≈ 1 / T (10.2)
Accordingly, all digital audio and measurement systems are subject to these basic constraints.
In practice a variety of such measurement systems are being utilized. Typical
representatives of simple measurement systems, which only display and analyse the fre-
quency spectrum at the input, are complex handheld sound level meters (e.g. B&K
2250, Norsonic Nor145, NTi XL2) and mobile analysers and mobile phone apps (Faber
Acoustical, Ivie IE-45, decibel app etc.). They provide broadband figures such as the overall
sound pressure level as well as results based on a ⅓-octave or octave band resolution. This
method also allows for the determination of the frequency response of the system under test
(SUT), but only with respect to its magnitude. For this purpose, a broadband noise signal with a pink or white spectral profile is employed. In a display of sum levels based on fractional-octave frequency bands, pink noise turns into a constant function over frequency. Consequently, the frequency response of the system under test can be assessed immediately from the deviation of the curve when the signal is fed into the system. These devices are unaware of, and independent of, the signal that is fed into the system under test.
Advanced measurement systems can measure the complex transfer function or impulse response (real and imaginary part) of the system of interest. For this purpose, the system is excited with a known test signal and its response is recorded; compare section 5.3.
In practice this technique, also known as inverse filtering, is not well suited for low
signal-to-noise ratios and excitation signals e(t) of insufficient density or limited spectral
coverage, as frequencies not contained in the input signal cannot be accounted for.
Therefore, for measurements utilizing the deconvolution method, pseudo-random noise,
swept sine and other well-defined excitation signals are commonly used as they cover the
entire spectrum.
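In discrete form this deconvolution becomes a spectral division, as in the following numpy sketch. The small constant eps is a common regularization guard, not part of the formal definition; it keeps the division well-behaved at frequencies the excitation does not contain.

import numpy as np

def deconvolve(response, excitation, eps=1e-12):
    """Impulse response of a linear system: h = IFFT( Y / E )."""
    n = len(response)
    E = np.fft.rfft(excitation, n)
    Y = np.fft.rfft(response, n)
    H = Y * np.conj(E) / (np.abs(E) ** 2 + eps)    # regularized division
    return np.fft.irfft(H, n)

# Sanity check with a known 'room' consisting of two reflections:
fs = 48000
rng = np.random.default_rng(0)
e = rng.standard_normal(fs)                        # broadband excitation
h_true = np.zeros(256); h_true[0] = 1.0; h_true[100] = 0.5
y = np.convolve(e, h_true)                         # measured response
h = deconvolve(y, e)
print(np.allclose(h[:256], h_true, atol=1e-6))     # True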
Real-world systems are only approximately linear. In this respect, several aspects must
be considered. On one hand, the noise floor which always exists and re-appears in the
impulse response must be accounted for. On the other hand, higher non-linear terms
also come into play; they may be caused by the loudspeaker system (distortion), by other
parts of the measurement chain or by inhomogeneities of air as the transfer medium,
such as air movements and temperature gradients. Often, time variance of the system
under test is a cause of measurement errors, especially when measuring outside (stadia,
open-air venues).
Therefore, the best-suited measurement signal should be chosen depending on the nature of the disruptions. Swept sine signals are advantageous because the deconvolution allows distortion products to be separated cleanly from the linear impulse response (see below). Pink noise, in contrast to white noise, has a power density spectrum that falls with frequency:

S(ω) ∝ 1 / ω (10.3)
Figure 10.4 Characteristic frequency spectra for white noise and pink noise. Graph shows the power
density spectrum in dB using an arbitrary normalization.
To determine the complex transfer function of the system under test by means of decon-
volution, exact knowledge of the excitation signal’s time function is required. Therefore,
many computer-based measurement systems use pseudo-random noise which is precisely
determined in advance with regard to its amplitude function over time while mimicking the
properties of true random noise.
Another classic excitation technique is the impulse test. For this purpose, typically an
alarm pistol, clapping sticks or a balloon burst are used. The loud impulse sound is recorded
and subsequently analysed in post-processing. Mainly time parameters such as RT60, EDT etc. can be analysed, since the frequency response depends strongly on the stimulus used and its capability to excite very low or high frequencies. This method is limited to certain investigations in room acoustics and is not applicable for the tuning and alignment of sound reinforcement systems.
A sweep can be written as a(t) = â · sin(φ(t)), where the phase φ(t) can depend on time t in different ways. Usually, this relationship is characterized by the instantaneous frequency
Ω(t) = dφ(t) / dt (10.5)
If the instantaneous frequency changes linearly over time, the sweep is a simple sweep, a
so-called ‘white sweep’.
Ω(t) = α · t + ω0 (10.6)
In this case, the sweep rate α in Hz/s is constant since the signal covers the same frequency
range in the same period of time.
If the dependency is exponential, Ω(t) = ω0 · exp(β1 · t), the sweep is called a 'pink sweep' or 'log sweep', which has a pink-noise-type energy distribution.
The pink sweep has a constant sweep rate of β = β1 / ln(2) in octaves/s, as the same number
of fractional octave bands is covered in the same period of time. In addition to the sweep
rate, also the start and stop frequencies are important parameters. These should be defined
so that they include the entire frequency range of interest [5].
Compared with other signal types, such as pink noise or maximum length sequences (MLS), the swept sine represents a continuous function over frequency. This is advantageous, for example, for digital-to-analogue (D/A) converters: compared to stepped or discontinuous signals, the probability of overshoot in the anti-aliasing filters of the D/A converters is lower. In addition, depending on the exact type and length of the sweep, non-linearities caused by distortion in the measurement chain can be removed fairly easily from the measured impulse
response. In particular the log sweep allows precise identification, isolation and analysis of
all higher-order harmonics separate from the fundamental. Furthermore, the sweep is also
less vulnerable to small time variances of the system under test [6]. Finally, another sig-
nificant advantage is that sweep measurements allow the engineer to subjectively identify
distortions and perturbations during the measurement itself, which is more difficult with
noise signals.
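A log sweep and its matched inverse filter, following the approach of Farina [5], can be generated in a few lines of Python; this is a bare sketch, omitting the fade-in/fade-out windows a production implementation would add.

import numpy as np

def log_sweep(f1, f2, duration, fs):
    """Exponential (log) sweep plus the inverse filter that turns the
    recorded response into an impulse response by convolution [5]."""
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f2 / f1)                  # total log-frequency span
    sweep = np.sin(2 * np.pi * f1 * duration / R * (np.exp(t * R / duration) - 1))
    # Time-reversed sweep, attenuated by 6 dB/octave so that the overall
    # excitation-times-inverse spectrum becomes flat.
    inverse = sweep[::-1] * np.exp(-t * R / duration)
    return sweep, inverse

fs = 48000
sweep, inverse = log_sweep(20, 20000, 2.0, fs)
n = 2 * len(sweep)                       # FFT length for linear convolution
h = np.fft.irfft(np.fft.rfft(sweep, n) * np.fft.rfft(inverse, n), n)
print(np.argmax(np.abs(h)) / fs)         # peak near t = 2.0 s: the 'time zero'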
Figure 10.5 shows three fundamental sweep signals, a white sweep, a log sweep and a
weighted sweep. The latter provides an adapted spectral shape which is especially useful
for loudspeaker measurements (higher signal excitation at low frequencies). In contrast to
the log sweep its level reduction in the high-frequency range is smaller and therefore the
signal-to-noise ratio is usually higher.
Figure 10.5 Characteristic frequency spectra for white sweep, log sweep and weighted sweep. Graph
shows the power density spectrum with an arbitrary normalization.
Figure 10.6 Shift register for the construction of the maximal length sequence of order N = 3.
combinations of N bits are counted through, except for the zero vector. Creating a sequence of length 2^N − 1 samples requires only a minimal amount of memory, namely the N register bits.
Depending on the algorithm several different MLS for the same order N may result; see
Figure 10.7. It can be advantageous to choose among these different versions, for example if
the first one proves to be inappropriate due to small non-linearities [8].
The MLS measurement itself is a two-step process. First the generated maximum length
sequence is sent into the system under test and the response is recorded. Then, the correl-
ation of the two sequences (original and recorded) is computed, which results in the impulse
response of the system under test. Due to the nature of the MLS the numerically expen-
sive computation of the correlation function can be dramatically simplified. This specific
transformation which exploits the particular properties of the MLS is called the Hadamard
transform.
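Both steps can be demonstrated compactly. The sketch below implements the order-3 register of Figure 10.6 with one valid choice of feedback taps, and uses a plain circular correlation where an optimized implementation would use the fast Hadamard transform.

import numpy as np

def mls(order=3):
    """±1-valued maximum length sequence from a Fibonacci shift register."""
    state = [1] * order                   # any non-zero start vector
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1.0 if state[-1] else -1.0)
        feedback = state[-1] ^ state[-2]  # taps of x^3 + x^2 + 1 (order 3 only)
        state = [feedback] + state[:-1]
    return np.array(seq)

m = mls(3)                                # length 2^3 - 1 = 7
# The circular autocorrelation is (almost) a perfect impulse, which is why
# correlating the recorded response with the sequence yields the system's
# impulse response:
print([int(np.dot(m, np.roll(m, k))) for k in range(7)])
# -> [7, -1, -1, -1, -1, -1, -1]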
The spectrum of the MLS signal is constant over frequency (white). Its crest factor is minimal (in theory 0 dB) and allows for a high signal-to-noise ratio. The strongly discontinuous time function, however, makes the measurement comparatively sensitive to non-linearities and to time variances of the system under test [9].
Figure 10.7 Section of the time function of an MLS of order N = 16. The sampling rate is 24 kHz.
Figure 10.8 TDS principle. (a) Measurement principle; (b) action of the tracking filter.
Figure 10.10 Octave-band display of the spectral shape of white noise and pink noise. Graph shows
the band-related power sum spectrum with an arbitrary normalization.
Figure 10.11 Top view drawing of the Allianz arena in Munich, showing measurement locations in
the bleachers.
Figure 10.15 Exemplary section of a measured impulse response where the sound reinforcement
system (after 90 ms) provides a higher signal level than the original sound source on the
stage (at about 44 ms).
At time differences of more than 50 ms, the second wave front might be perceived as a discrete echo.
By analysing the measured impulse responses, the arrival times of the individual sources can be determined. For this purpose, sequential measurements are made; initially just a single loudspeaker on stage is active (possibly even a speaker simulator on the lectern in front of the microphone) and then additional loudspeaker groups are switched on step by step. With this procedure the influence of each group can be studied individually. Figure 10.15 shows the impulse response of a situation where the direct sound is disturbed by the late and loud amplified sound arriving at the listener position.
Localization errors will occur if the initial impulse of the source that should be localized
arrives later than the initial impulse of the loudspeaker, or if the initial impulse is exceeded by more than 10 dB within the first 30 ms and by more than 6 dB within 30 to 60 ms (rule of the first wave front; compare Figure 2.21).
References
1. Mapp, P. First published in Audio System Design & Engineering. Klark Teknik, 1985.
2. Oppenheim, A.V., and Schafer, R.W. Zeitdiskrete Signalverarbeitung. München: R. Oldenbourg
Verlag, 1992.
3. Rife, D.D., and Vanderkooy, J. Transfer function measurement with maximum-length sequences. J. Audio Eng. Soc., vol. 37, no. 6 (1989), pp. 419–444.
4. AES/Heyser, R.C. Time Delay Spectrometry – An Anthology of the Works of Richard C. Heyser. New York: AES, 1988.
5. Farina, A. Simultaneous measurement of impulse response and distortion with a swept-sine technique. Presented at the AES 108th Convention, Paris (19–22 February 2000).
6. Müller, S., and Massarani, P. Transfer-function measurement with sweeps. J. Audio Eng. Soc., vol. 49, no. 6 (2001), pp. 443–471.
7. Jacob, K., Steeneken, H., Verhave, J., and McManus, S. Development of an accurate, handheld
simple-to-use, meter for the prediction of speech intelligibility. Proc. IOA, vol. 23, Pt 8.
8. Vanderkooy, J. Aspects of MLS measuring systems. J. Audio Eng. Soc., vol. 42 (1994), p. 219.
9. Vorländer, M., and Bietz, H. Der Einfluss von Zeitvarianzen bei Maximalfolgenmessungen.
DAGA (1995), p. 675.
10. http://easera.afmg.eu.
11. SIM Audio Analyzer, www.meyersound.com/products/#sim, Meyer Sound, Berkeley, CA, USA;
www.meyersound.com.
12. Smaart Software, www.rationalacoustics.com, Rational Acoustics, Putnam, CT, USA.
13. SysTune, http://systune.afmg.eu, AFMG Technologies GmbH, Berlin, Germany, http://afmg.eu.
14. www.iris.co.nz.
15. Standard DIN EN ISO 3382:2008-09: Acoustics – Measurement of room acoustic parameters.
16. IEC 61260:1995: Octave-Band and Fractional-Octave-Band Filters.
17. IEC 61672-1:2013: Sound Level Meters – Part 1: Specifications.
The following chapter illustrates a number of real-world applications and case studies,
characterized by their functionality and particular type of venue.
What are the common features and where are the differences?
DOI: 10.4324/9781003220268-11
• Glass walls
• Concrete or gypsum board surfaces
• Natural stone floors
• Metal structures
All these materials reflect sound almost completely, which increases the rever-
beration time. If an acoustician is involved in the project, the sound system designer should
encourage them to implement sound-absorbing areas. The necessity for absorption should be
discussed in the early design phase and the corresponding treatment implemented in the
agreed design. The situation should be avoided where the architect finishes his or her design
work without any consultation with the paging and emergency system designer.
Figure 11.1 Computer model with Atlas Sound Ceiling speakers FA 136 and Renkus-Heinz Line
arrays IC16 and listener areas.
Figure 11.3 Calculated intelligibility number STI from 0.5 to 0.75 in the greeter area.
Sound systems again serve two purposes: the support of matchday operations
(announcements, information, paging, missing persons, advertisements) and their
use as voice alarm systems, as emergency instructions must be understood clearly and at
sufficient level. Standards such as ISO 7240-19 [8] or US NFPA 72 [9] provide design guidelines and
functional requirements. In Europe the Standard EN 54-32 [10] plus national regulations
are applicable. Figure 1.2 gives an overview of the current standards situation. All selected
loudspeaker types must be certified according to ISO 7240-24 or EN 54-24 [1]. Additionally,
sports facilities designed for international big-ticket events (such as football World Cups or
Olympic competitions) have to comply with the regulations of the appropriate sports
authority, such as UEFA, FIFA or the IOC.
Each of the facility types mentioned above has different acoustic and sound reinforce-
ment requirements. In school and university environments the acoustic measures
are usually rather basic. Most countries have standards and guidelines concerning the
recommended reverberation time in sport halls and the implementation of emergency
call systems.
In larger halls and arenas used for sport events a specific acoustical design is needed,
focussing on the geometry of the halls, the acoustic quality of wall, floor and ceiling surface
materials and also on issues such as exterior noise intrusion into the hall or internal noise
from HVAC systems.
While decentralized systems are perfectly acceptable for radiating information
signals, sound coverage that offers acoustic localization is desirable in certain cases, for
instance for:
This is enabled by loudspeaker arrangements capable not only of ensuring speech intelligi-
bility and music clarity, but also of localization of the sound sources.
11.2.1 Large Meeting Rooms Used for Sport Events and Smaller Sports Halls
In smaller sports halls or gyms of universities and schools (volumes up to 20,000 m³) basic
announcement systems are needed for paging, information and emergency calls. Quite often
ceiling or wall loudspeakers are employed, as well as centrally arranged loudspeaker systems
installed at one end of the hall close to the platform in use. Just as with centralized
systems, the required speech intelligibility is achieved by a staggered installation of sound
reinforcement systems, often of single loudspeakers or clusters. Acoustic localization towards
the action area can be achieved by varying the installation height or appropriately delaying
certain loudspeakers. Appropriate staggering of the loudspeakers (spacing ≤15 m) in the
depth of the hall prevents travel-time differences that could cause echoes.
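The 15 m figure follows from a one-line travel-time estimate (a sketch assuming a speed of sound of 343 m/s):

    SPEED_OF_SOUND = 343.0  # m/s, dry air at approx. 20 degrees C

    def travel_time_ms(spacing_m):
        """Travel-time difference between two staggered loudspeaker positions."""
        return 1000.0 * spacing_m / SPEED_OF_SOUND

    print(travel_time_ms(15.0))  # approx. 44 ms, below the ~50 ms echo threshold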
• Schedule guidance
• Opening and closing rituals
• Team and athlete introductions
• Goal announcements
• Winner honouring
• Announcements of special sport events
For advanced music and speech transmission in a stadium additional technical efforts
are required, such as an extended frequency band to be transmitted by use of subwoofers
(total minimum frequency range 60 Hz to 12 kHz). Typical programming with higher
requirements, e.g.:
• Pre and post event: general sound coverage for background music or information
• Advertisement clips with acoustic localization to the active video wall, hence acoustic
and visual impressions are identically localized for a stronger impression
• Music reproduction including dynamic popular tracks, anthems and other
content
Figure 11.9 Olympia-Stadium in Berlin 2002 with no roof and sound coverage from the perimeter of
the field of play.
Depending on these architectural solutions different acoustic measures are called for.
The shape of the roof must be studied and the roof may be acoustically treated, taking into
account the following issues:
• A curved roof quite often creates a longer reverberation time than a flat one
• The roof underside should be absorptive if possible
• The acoustic behaviour below the roof differs with the construction; i.e., pure steel
construction, an absorptive layer or membrane material produces more or less reverberation;
this must be considered during computer simulation
• Use of tilted walls and windows (e.g., at skyboxes) to avoid direct reflections and
echoes
• Reverberation time under the roof should never exceed 3 s when occupied
• Use of tilted walls and window parts to deflect sound from the PA system
• Good sight lines promise good direct sound coverage (important if loudspeakers are
mounted at the pitch edge)
• No large building walls outside the stadium so as to avoid echoes
Figure 11.14 Twelve line array positions with a total of 124 Electro-Voice XLD 281 modules.
In spaces such as ballrooms, exhibition halls and convention centres modern digitally
controlled as well as passive line arrays are used. In the design phase computer simulation
is highly recommended. For this purpose, the room acoustic data of the relevant surfaces
must be known.
• Glass panes
• Concrete or gypsum board surfaces
• Natural stone floors
• Metal structures
All these materials reflect sound more or less completely, which increases the reverber-
ation time; hence it is important that the need for acoustical absorption is
discussed in the early design phase and the corresponding treatment is integrated in the space
design.
Figure 11.18 NTi XL2 Handheld Acoustical Analyzer including measurement microphone M2211.
Figure 11.23 Recommended secondary structure in the main hall. (Left) Architectural design of the
hall (only hard surfaces); (right) hall with acoustical treatment at ceiling and back wall.
Light red faces at front and back wall: Broadband absorber (e.g. slotted panels). Orange
face at back wall: Additional broadband absorber (slotted panels or curtain). Dark red
faces at ceiling: Broadband absorber or sound transparent grid. Dark blue: Glass facade
with inclined lamella structure.
Figure 11.24 RT values in the main hall (left) without treatment and (right) with treatment.
This design resulted in an acceptable reverberation time and absence of echoes (see
Figures 11.23 and 11.24).
(a) Concert hall similar to an opera house, but employing an orchestra shell in the rear
part of the stage in lieu of a stage house.
• The most famous hall of this kind is the Carnegie Hall in New York, which is a
horseshoe theatre with the favourable properties of such a shape. As in a theatre, all
seats are at approximately equal distance from the podium, with good sight lines and hence
good direct sound coverage. The structured proscenium and the balustrades of the
galleries supply enough short time reflections. Multiple reflections are rare because
of the absence of larger flat wall parts. Therefore, high clarity values combined
with lower spaciousness and reverberance are observed in these halls.
(b) Shoebox-shaped halls, such as medieval guild halls; these have also been used for
music performances since their construction. An example is the Guerzenich Hall in
Cologne, Germany, built in the fifteenth century and in use for musical performances for
over 200 years. Another example is the ‘Gewandhaus’ in Leipzig, now in its third building.
The last type of single-purpose hall is the drama theatre. The original dramatic theatre
type goes back to Greek and Roman times. The Renaissance saw a significant
increase in travelling troupes playing their shows at fairs or in the courts of nobles,
with the prominent example of the company surrounding William Shakespeare in the
late sixteenth and early seventeenth centuries, for which permanent playhouses such as
the Globe Theatre in London were built. During the seventeenth and eighteenth centuries
more and more playhouses were built, first only for kings or dukes and their entourages,
but later also for the general public. In 1821 in Berlin a famous playhouse was designed
by Karl Friedrich Schinkel, which was rebuilt as a concert hall in 1984 after destruction
during the Second World War. Further theatre buildings mainly in horseshoe shape are the
Burg Theatre in Vienna (1888), the Royal Dramatic Theatre Stockholm (1908) and the
Comédie-Française Paris (1900). Today many different geometries are being selected, and
most of them are not just used for speech performances but also for concert shows and opera
presentations.
(a) The traditional approach is to build a hall with a rather long reverberation time of
1.5–1.8 s and then to use banners, curtains or other architectural measures to reduce
the natural reverberation time down to 1.1–1.4 s. This approach is well suited for music
and opera performances.
(b) If the new-build or existing theatre is designed mainly for drama, an
electroacoustic enhancement system may be used to adapt the hall for music
performances. These systems offer an acoustic quality that a layperson cannot distin-
guish from the natural acoustic properties of the hall.
• Providing an adequate sound pressure level with the required speech intelligibility and
clarity for the audience for quiet sources (singer or talker) or for sources which are
to be reproduced only by means of electroacoustic amplification (e.g., electroacoustic
instruments)
In addition to providing these functions, the systems should ideally be installed unobtru-
sively into the structure of the building.
This system mainly serves to provide sufficient sound pressure level in the audience area.
In a theatre such as that shown in Figure 11.25, loudspeakers, often line arrays, are
installed left and right in the side walls of the forestage and above the stage proscenium.
Additionally, delay systems cover the back of the stalls or any balcony areas with sound (see
main groups 1 to 3).
If the main system does not produce sufficient low-frequency energy, corresponding
subwoofers are required.
11.4.2.2.1.1 Effect Signal Sound System Powerful sound systems are installed left and right of,
in front of and above a backstage opening to play back effect signals out of the depth of the
stage; see main groups 4 and 6 in Figure 11.25. As these systems are often covered by scenery
or curtains, they must radiate signals with high acoustic power. Furthermore, loudspeaker
groups can be installed on side galleries (main group 7) and on the rear wall of an existing
back stage (main group 5).
All systems used consist of mid/high speaker arrangements with additional
powerful subs.
Highly directive sound systems are used for these long-throw applications.
To play back effect signals or for moving sound signals additional loudspeakers are installed
in the audience hall. Altogether around 40 to 70 speakers can be evenly distributed on side
and back walls and on the ceiling, to better integrate the audience into the events on stage,
sometimes even providing 3D immersive sound impressions. These same loudspeakers may
also be used as an enhancement system to radiate simulated early and late reflections out of
the different room directions with the aim of increasing reverberation, i.e., the prolongation
of the reverberation time.
Sometimes loudspeakers are installed in the centre of the ceiling or hidden in the light
installation for directional playback of signals from above the audience.
11.4.2.2.1.2 Stage Monitors Often the performance area itself needs a sound system to
supply audio monitoring for the actors to facilitate acoustic control of the play and mutual
listening among the performers.
Figure 11.25 Main loudspeaker groups in a theatre.
11.4.2.2.1.3 Mobile Systems A number of stage boxes are equipped with outputs for
connecting mobile loudspeakers.
The routing of amplifiers to the outputs is done by patch fields or electronic matrix
systems.
11.4.2.2.1.4 Microphone Selection Any theatre or concert hall needs a basic complement
of various microphones of different type, directivity and sensitivity, some of them
wireless.
• Adequate sound coverage, i.e., the sound pressure level should be evenly distributed
over the entire audience zone
• The sound level difference from the first to the last row should not exceed 10 dB.
This may also be indicated as strength, which should be higher than 0 dB and not
exceed 10 dB
• Based on ISO 7240-19 or NFPA 72 the speech transmission index STI must exceed 0.5
in 90% of the audience areas; compare section 7.4.2. This must be the case under the
given room acoustic conditions, independent of the reverberation time [8, 9] (a simple
coverage check is sketched after this list)
• Echoes from walls or ceilings or slap-back reflections from a stage house caused by the
sound system cannot be accepted
• The frequency range of the sound system must be adequate for the content of the signal
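A minimal sketch of such a coverage check (the helper name and its input, measured STI values per audience position, are hypothetical):

    def sti_coverage_ok(sti_values, threshold=0.5, required_fraction=0.9):
        """True if STI exceeds the threshold in at least the required
        fraction of the measured audience positions (cf. ISO 7240-19)."""
        passed = sum(1 for v in sti_values if v > threshold)
        return passed / len(sti_values) >= required_fraction

    # Ten positions, one below 0.5 -> exactly 90% pass, still compliant:
    print(sti_coverage_ok([0.62, 0.58, 0.55, 0.51, 0.49,
                           0.57, 0.60, 0.53, 0.52, 0.56]))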
These subsystems describe the electroacoustic system in a theatre or concert hall and result
in different acoustic perceptions in the context of the existing room acoustic properties. It
is essential to establish congruence between room and electroacoustic properties.
Motorized line array loudspeakers were specified in accordance with the planned use,
suspended on the left and the right as well as along the central axis of the proscenium. The
following components are installed:
The accompanying subwoofer loudspeakers are installed above the right and left arrays.
For nearfield coverage on the stage and for source location on stage, side fills and small
loudspeaker systems are installed (16× d&b E0 at the front of the stage).
Each rigging point consists of two motor-driven pulling ropes per loudspeaker location.
Compact loudspeaker systems are installed on three levels at the right and left side of the
proscenium for cases when the line arrays are not present (6× d&b Ci90-90×40).
The above-mentioned sub-loudspeakers and the installed localization systems will be
used in parallel in this case.
Additionally, a panorama sound system is installed on three levels to produce moving
sound images throughout the hall, complemented by small ceiling loudspeakers. These
approximately 50 loudspeakers on the ceiling and the sidewalls may also be used for
electro-acoustic enhancement.
Six d&b Ci80s and two d&b C4 tops are installed on stage, in both portal towers and
above the entrance to the rear stage. Further mobile loudspeakers may be installed ad hoc
as required in the stage area and connected to floor and wall boxes.
The installed Salzbrenner Aurus consoles may limit the frequency response for speech: 200
Hz (with 6 dB rolloff towards lower frequencies) up to 4 kHz (3 dB rolloff towards higher
frequencies). Variations of the response level are lower than ±3 dB.
For musical performances subwoofers will be used for the lower-frequency range, with a
resulting frequency response after appropriate equalization of 50 Hz to 16 kHz ±3 dB.
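As an illustration only, such a speech band limit can be approximated with a first-order Butterworth high-pass and low-pass pair (an assumption made for this sketch; the console's actual processing may differ):

    import numpy as np
    from scipy.signal import butter, sosfilt

    fs = 48000  # sample rate in Hz (assumed)

    # First-order corners at 200 Hz and 4 kHz (-3 dB at each corner).
    sos_hp = butter(1, 200.0, btype='highpass', fs=fs, output='sos')
    sos_lp = butter(1, 4000.0, btype='lowpass', fs=fs, output='sos')

    def speech_band_limit(x):
        """Apply the speech-mode band limit to a 1-D signal array x."""
        return sosfilt(sos_lp, sosfilt(sos_hp, x))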
The power amplifiers of the loudspeaker system, installed at three locations, offer the
possibility of digital signal processing, thus enabling equalization, delay and volume regula-
tion for each loudspeaker group. The configuration of the system can be done by means of
a network, in order to enable setup, monitoring and calibration from distributed locations
such as the sound control booth or via a wireless tablet PC in the audience hall.
The digital signal distribution and matrix routing to the amplifiers of the loudspeaker system
are realised via the digital audio network Nexus by Stagetec. Fifteen base devices are installed
at various locations in the hall, on stage and on rehearsal stages. The Aurus mixing consoles
can be connected here: two are permanently installed and two are for mobile use in the house.
The system is designed to provide a continuous minimum sound level of 94 dB plus an add-
itional level reserve of 12 dB at all seats. In the case of higher sound pressure levels, it is
possible to reduce the general loudness of the system to avoid producing high distortion in
the transmitted frequency response. For short-time peak levels the system is able to reach
values of 120 dB without producing distortion.
Figure 11.29 Positions of panorama loudspeakers along the railings of the three galleries.
All loudspeaker units are operated so that the origin of the sound is localized on stage.
According to ISO 7240 speech intelligibility of STI > 0.5 is required for 90% of the sound
system’s coverage area.
The equivalent noise level in the hall with and without the new sound system is rated at
NR-10 and does not exceed 23 dB(A). These values are valid for the entire hall.
Figure 11.32 Overall sound level in the audience hall, broad band, A-weighted.
For limiting the background sound levels in the hall, special fanless components have
been specified.
The large orchestra rehearsal stage may also be used for chamber music concerts and other
performances; compare Figure 11.35.
An audience of up to 150 may be present in this room. Three load bars are installed at
the ceiling to connect audio and lighting components. Further rehearsal spaces are present
for the chorus and the ballet.
The studio theatre is equipped with fixed and mobile loudspeakers, loudspeaker suspen-
sion devices, cabling and transport cases, so that a flexible system is at hand, to be installed
and modified for diverse performance requirements. The amplifiers are installed in a tech-
nical room nearby and their outputs are supplied via permanently installed cables to floor
and wall boxes.
Finally, the comprehensive intercom and stage manager systems are based on Delec
Oratis and Mediacontrol components.
Figure 11.37 shows the Performance TRL stage manager console with five displays for
different views. The console may be put on either side of the stage.
• Case 1: Mainly a conference hall with some concerts or musical events during the
year or
• Case 2: Mainly a hall for musical presentations with some conferences or congresses
In the first case the hall is laid out for speech performances, so depending on the size of the
hall the reverberation time should vary between 1.0 and 1.4 s. But what measures can be taken
when more reverberation is desired for the musical events? Two approaches are feasible:
In the case of variable room acoustics, the secondary and/or even the primary structure of
the hall will be modified either manually or by motorized installations such as curtains or
draperies that are removed and thereby expose reflecting or absorptive surfaces. This may
happen by vertical or horizontal movements of wall parts or by turning wall and ceiling
parts back and forth (Figure 11.40).
Another option is the modification of the hall’s volume. By opening hinged or sliding
doors or other wall parts the volume of the hall may be increased to create higher reverber-
ation times with added sub-volumes; see for example the echo chamber solution in the KKL
Luzern concert hall (Figure 11.41).
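The tendency follows from Sabine's relation RT ≈ 0.161 · V/A: added volume with a nearly unchanged absorption area raises the reverberation time. A short sketch with purely illustrative numbers:

    def sabine_rt(volume_m3, absorption_m2):
        """Sabine reverberation time: RT = 0.161 * V / A."""
        return 0.161 * volume_m3 / absorption_m2

    # Opening an echo chamber adds volume while the absorption area barely
    # changes, so the reverberation time rises (illustrative values only):
    print(sabine_rt(18000, 1600))          # base hall: approx. 1.8 s
    print(sabine_rt(18000 + 4000, 1700))   # with added sub-volume: approx. 2.1 s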
Another option to modify the room acoustic layout and to enhance the reverberation
including first short-time reflections is the application of methods known as ‘electronic
architecture’. This term was introduced by C. Jaffe to describe such a system [14]. Around
ten different approaches are known to increase the spaciousness and reverberation: some
have common features, and others are unique. All of them pick up the source signal above
the proscenium or in the diffuse sound field, subsequently post-process it in different ways
and finally distribute the audio signal over an arrangement of numerous loudspeakers hidden
in walls and ceiling. By selection of stored presets different spaciousness and reverberation
values may be obtained in the hall. Because they use microphones and loudspeakers these
systems are quite often called sound systems as well. This may lead to rejection of these
systems, especially by musicians, who do not appreciate being supported by sound systems.
Nevertheless, modern enhancement systems are of such high sonic accuracy and quality that
world-renowned conductors recommend the use of such systems in halls with poor room
acoustic conditions. A detailed explanation of the most important enhancement systems
can be found in section 2.7.2.
After it has been determined that the spaciousness of a multipurpose hall needs
to be enhanced, it must be ensured that spoken words will be intelligible under all
acoustic conditions of the hall. Without the enhancement system (or with exposed
absorbers in the case of mechanically variable acoustics), i.e., with low reverberation
times, the sound system design is straightforward. The situation is more complex in
concert mode, i.e. with higher reverberation times. Highly directional sound systems
such as line arrays or electronically steered sound columns will ensure that spoken
words are intelligible.
Figure 11.41 KKL Luzern Concert hall, rare view from within an echo chamber. © KKL Luzern.
Since such halls are often rather wide with a relatively low ceiling, the required coverage
calls for a sophisticated loudspeaker arrangement.
Owing to the strong level decrease over the depths of such halls and the directivity pattern
of the main front arrays used it is often not possible for these arrays to provide a uniform
coverage of all audience areas. The main loudspeaker arrays should operate with a default
delay setting to correctly localize the original sources on the stage throughout the venue.
Further balcony or sub-balcony loudspeakers must take this basic delay into account. At a
greater distance from the action area, where the angles between the original sources on stage
and the front loudspeaker arrays are small and where the sources on stage are no longer
perceived because of their low acoustic power, the front loudspeaker arrays can be used as
reference sources.
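Such a default delay can be estimated from the geometry; the sketch below assumes a speed of sound of 343 m/s and a typical precedence offset of about 10 ms (helper and numbers are illustrative):

    def array_delay_ms(d_source_listener_m, d_array_listener_m,
                       c=343.0, offset_ms=10.0):
        """Delay for a loudspeaker array so that its signal arrives just
        after the natural sound from the stage (precedence effect)."""
        dt_ms = 1000.0 * (d_source_listener_m - d_array_listener_m) / c
        return max(dt_ms, 0.0) + offset_ms

    # Listener 30 m from the stage source but only 22 m from the array:
    print(array_delay_ms(30.0, 22.0))  # approx. 33 ms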
In the case of a multipurpose hall designed for mainly natural music use and amplified
events only every now and then (case 2) the target reverberation times are in the range
of 1.6 to 1.8 s, with higher values applicable for pure classic concert halls or spaces to per-
form organ concerts. Variable acoustics may be used to decrease the spaciousness and rever-
beration for speech events, with the variable absorption resulting in reverberation times
of 1.3–1.4 s. Inflatable membrane absorbers may be used specifically for low frequencies
just by turning the air pump on and off for the inflatable membranes; see Flex Acoustics
(Figure 11.42).
An enhancement system may be used to increase the reverberation time even more to
values over 2 s. The electronic reduction of reverberation time, however, is not feasible.
The sophisticated sound system is similar to the system used in case 1; see above.
• Conferences
• Meetings
• Exhibitions
• Rock and pop concerts
• Jazz concerts
• Theatre performances
• Presentations
• Athletic events
For other performances (estimated 20% of the user profile), such as:
• Symphonic concerts
• Musicals
• Folk music performances
• Brass music concerts
Figure 11.44 Opening concert (left) and hall with the new wall loudspeakers.
and presentations have since demanded the extension of the complex with new restaurants
and a covered lobby area. Various types of cultural performances are now possible, including
symphonic concerts by using a mobile concert shell on stage. As the acoustics of the hall
were optimized for shows and pop music during the 1990s, the sonic quality for symphonic
concerts was unsatisfactory. In 2018/19 the implementation of an enhancement system
increased the reverberation time and now supplies early reflections.
Seventy-four loudspeakers (JBL Control 28) in the ceiling (originally from an old rever-
beration system) and 35 newly installed wall loudspeakers (Renkus-Heinz CX61) have been
used to radiate the stage and ceiling microphone signals. These signals are processed by the
‘Amadeus Concert Hall Processor’ and hence allow the following settings:
Figure 11.45 The Anthem Hall Washington and view of the hall on the right.
and right arrays (14 d&b J-series boxes per array), with the option to use centre and front
fill loudspeakers and a directional subwoofer array (d&b J-Subs and J-Infra Subs) for even
low-frequency distribution.
11.6.1 Buddhism
Siddharta Gautama was the founder of the Buddhist teachings, living in India from about
560 to 480 BC. He gave up his royal life at an adult age and moved through the country
as an ascetic and found ‘the highest salvation and unparalleled peace’ in April/May of 528
BC. Since then, he has been considered a Buddha, an ‘enlightened one’, because he has
recognized the four ‘truths’.
The Buddhist place of worship is the temple, initially referred to as a stupa, later as a
pagoda. The stupa is a reliquary as well as a monument, and is understood as
a place of pilgrimage. The Buddhist temples and monasteries hosting the Buddhist rituals
have a hall with one or more Buddhas in the centre. The meeting and meditation rooms are
mostly large and carpeted.
Some monks hold sermons inside the temples on holidays such as the birth of Buddha or
the first sermon of Buddha. Social ceremonies are celebrated in the temples such as births,
becoming an adult, weddings or memorial services.
All spoken words must be clearly understood and are partly repeated, hence sound
systems are currently standard equipment in temples. Small point sources or line arrays
cover congregations from 20 to more than 3000 worshippers.
11.6.2 Judaism
Synagogues from medieval Prague or Budapest are built in Gothic style. In the nineteenth
century, after the synagogue had been authorized as a representative building, oriental his-
toricism prevailed for several decades, e.g., the new Berlin Synagogue. Modern architectural
designs are also very common today, such as the synagogues in Munich and Dresden. Most
synagogues have a rectangular floorplan with a second floor for female worshippers, and nor-
mally use sound systems for preaching and praying.
Depending on the size and the shape of the halls simple or more complex sound systems
are specified; the room acoustical design of the synagogues therefore is to be considered and
excessive reverberation must be avoided. A small group of traditional Jewish worshippers
reject any electricity in their houses, so their synagogues are not equipped with sound
systems. In these cases, the room acoustic design is especially important to ensure perfect
intelligibility of the spoken words of the rabbi.
The old main synagogue in Munich was destroyed in 1938 by the Nazis; the new syna-
gogue was built between 2004 and 2006. Room- and electro-acoustic studies have been
conducted.
All surfaces in the synagogue are acoustically hard (stone and glass); only the worshippers
absorb the sound. Long reverberation times result, and highly directional, digitally con-
trolled line arrays have been installed left and right of the podium to secure high speech
intelligibility for the worshippers.
Figure 11.51 Rendered computer model with calculation results in mapping and probability distri-
bution form.
Figure 11.53 Ceiling detail of the Central Synagogue including small point-source loudspeakers in
the corners.
Devastated by a catastrophic fire in August 1998, the Central Synagogue in New York
City was reduced to a burned-out shell. Following this disaster, the treasured syna-
gogue was rebuilt. To augment the natural acoustics, and to provide additional support
for organ performances and other musical events, the synagogue is equipped with a
LARES electronic reverberation enhancement system. Instead of the traditional speaker
cluster, the system includes 48 smaller loudspeakers positioned discreetly around the
temple.
In-house audio and video recording and playback capabilities are an equally integral and
multi-purpose element of the Central Synagogue A/V installation. In addition to serving
to document weddings, bar mitzvahs, lectures, memorial services and religious services, the
system was designed to facilitate remote broadcast to enable people outside the facility to
participate in services and events.
11.6.3 Christianity
• The basilica is the most important basic type of an early and medieval church building,
the interior of which is separated into several longitudinal naves by rows of columns.
• The single-room church is a single-nave church building, which consists of a single,
hall-like room, usually with elevated choir.
• The hall church is similar to the basilica, but its longitudinal naves are of the same or
approximately the same height and usually united under a common pitched roof.
• In centralized construction types, the major axes are of the same length, resulting in
circular, oval, square, cross-shaped or otherwise centralized floor plans. The central
building is widespread in Western Europe, especially in Italy and is frequently applied
to Eastern Orthodox churches.
The main architectural elements of a traditional European church are the choir (altar
house), the transept and the nave. The facade often has one or two belltowers. The nave
section is usually multi-aisled, i.e., it comprises a central nave and two or four side aisles.
The crossing is located between the transept and the nave.
The principal musical instrument in these churches is the organ; the worshippers sit in
pews on all accessible floor levels.
Since the Reformation, Protestant churches have increasingly become multipurpose
buildings. Protestant meeting houses are thus not only used for worship, but also for various
other gatherings.
In the last 30 years the sound coverage in reverberant spaces has been achieved with
distributed sound systems installed on the pillars (Figure 11.54), currently also by more cen-
trally arranged line arrays.
Figure 11.54 Sound columns in the gothic church Maria Himmelfahrt in Bozen, Italy.
Figure 11.55 View of the iconostasis with installed directed sound columns in the Christ the Saviour
Cathedral in Moscow.
The interior of Orthodox church buildings is designed according to the requirements of the
Eastern Church rite, in Europe mostly the Byzantine rite:
• The sanctuary is visually separated from the communal room by a partition wall covered
with icons, the iconostasis. This wall is transparently designed in such a way that, des-
pite this division of space, the liturgy spoken and sung behind the partition can be well
understood in the community room.
• Orthodox churches do not have organs because Orthodox Christianity considers the
human voice as the only acceptable instrument for praising God.
• Orthodox churches usually have no pews; the worshippers remain standing during the
liturgy. A handful of chairs may be available, mainly used for elderly people who are
unable to stand for a prolonged time.
In large churches (especially in gothic ones) with sometimes excessively long reverber-
ation times and a large number of columns shadowing the sound and screening the side
aisles, a centralized loudspeaker arrangement is not a successful approach. Decentralized
arrangements with highly directive digitally controlled line arrays installed at the columns
in close range to the audience are then preferred.
The Berlin Cathedral in its current form is based on the construction of the new cath-
edral that began in 1895 under the direction of the builder Julius Carl Raschdorff. The church
was consecrated in 1905 and heavily damaged during the Second World War.
Reconstruction was decided in 1973; a first phase was completed in 1980 with the restoration
of the outer skin and the consecration of the baptistery and wedding chapel to the south for
worship purposes. The reopening of the Sermon Church subsequently took place on 3 June
1993 after extensive interior restoration work.
The main room of the Oberpfarr- und Domkirche zu Berlin is the so-called Sermon
Church. This centrally located room has a quasi-circular base area of around 40 m diameter,
spanned by a dome around 60 m high, and is oriented in a west-east direction.
The main and diagonal axes hold two-storey apsidal areas which, as boxes, provide add-
itional seating next to the main room on the ground floor. A total of 1586 people can be
seated in the church.
In 1953 a special cluster consisting of Klein & Hummel loudspeakers was installed
(still visible in the background in Figure 11.58 during a test setup with line arrays in
2001). This cluster was then substituted in 2002 by Duran Audio line arrays; compare
also Figure 11.59.
Solothurn is recognized as Switzerland’s most significant baroque town. Its major hallmark
and tourist attraction is the St. Ursen Cathedral. In January 2011, a fire set by a men-
tally disturbed person massively damaged the Cathedral’s 60 m × 30 m centre congrega-
tion area and side aisles. A careful assessment determined that a full cleaning and repair
of all surfaces could restore the damaged room to its former glory. The project included
all aspects of the building: surfaces, art, lighting, heating, electrical and electro-acoustic
infrastructure.
11.6.4 Islam
The geometry of a mosque is different from any of the spaces discussed so far. Larger mosques
mostly have a substantial central space, topped by one or more domes. Smaller mosques are
not domed and have a rectangular floorplan.
A mosque has the following basic interior elements: a carpet that covers the floor,
a niche, the mihrab, in the Qibla wall for determining the direction to Mecca, and,
finally, every mosque has a minbar on which the imam holds the Friday sermon. Women
have a separate space so that the strict gender segregation can also be observed during
prayer.
Figure 11.60 St. Ursen Cathedral with two visible line arrays for sound coverage.
Mosques are multi-functional public spaces where various worship activities are performed
through various modes of use. Three distinct activities are performed in the mosque: the first
is praying, individually or in a group led by an imam. The second is preaching, delivered
separately or in conjunction with Friday prayers. The third is listening to or reciting verses
from the Quran. While conducting these activities in the mosque, two general modes of use
may be identified:
The optimum acoustical environments in the mosque may be expressed in terms of some
basic aural requirements such as:
Most of the existing mosques have sound-reflecting finish materials on all surfaces, except
the floor area, which is usually carpeted. They have wooden doors and large single-glazed
windows.
Other factors that emphasize the importance of the acoustical environment in mosques
include the fact that Arabic is used exclusively during prayers, even though many Muslims
worldwide are non-native-Arabic speakers. Therefore, a low background noise floor is one
of the most important qualities for Muslims during prayers. Sound absorption in mosques
is very limited; it is mainly provided through the carpeted floor as well as the worshippers.
The different modes of use as well as the variation in the number of worshippers attending
daily prayer and Friday prayer greatly affect the total sound absorption in the mosque
and therefore make control over the actual reverberation time quite complex. All these
factors illustrate the importance and requirements for special acoustical environments in
mosques.
Modifying the primary structure of the mosque (height, length and width of the mosque)
is not feasible in most cases, so the secondary structure has to be considered: the material
and geometry of the walls, ceiling and floor of the mosque. As mentioned above, most of
the walls are covered with ornamented marble and gypsum and are therefore only slightly
sound-absorbing.
Sufficient sound pressure levels can be achieved by different sound-system approaches
covering the prayer area.
The most important task of the sound system is to achieve high speech intelligibility. As
always, good intelligibility requires highly directional sound and a high level of early
reflections. Because of the columns, coverage from the front alone will quite often not be
adequate, and high ceilings and large domes additionally reduce the direct sound; hence a
system with coverage from the front, supplemented by delayed loudspeakers, is recommended.
This can be achieved by modern digitally controlled line arrays. Speech transmission indices
STI > 0.5 must be reached.
The last design criterion, localizing the imam, can only be achieved with sound systems
that place the perceived sound source towards the Qibla wall. Ceiling loudspeaker systems
mostly fail in this regard.
11.6.1.4.4.1 Mosque –Sheikh Zayed Bin Sultan Al Nahyan Mosque, Abu Dhabi, UAE The
sound system design for the Sheikh Zayed Bin Sultan Al Nahyan Mosque is based on
optimized room-acoustic measures; see a view into the main hall in Figure 11.62.
Regarding the room acoustic design, the carpeted floor acts as a mid-high-frequency
absorber and the originally planned letter wall as a low-frequency absorber. The letter
wall was to be a huge marble wall with large perforated Arabic letters embedded. These
hollow letters were to be approx. 5 cm wide and would have helped to achieve low-
frequency absorption due to the Helmholtz effect. It was then decided to cover the back
side of the wall with a golden fleece with a high specific flow resistance. Furthermore, an
airgap of approx. 80 cm to the concrete wall behind it was planned, thereby implementing
a large 135 × 22 m low-frequency absorber, with an estimated absorption maximum at 250
Hz. In the course of the construction, however, this letter wall was not implemented and
was replaced by a closed, letter-bearing marble wall. The absence of the low-frequency
absorber has led to higher reverberation times at low frequencies; nevertheless, the RT will
not exceed 6 s at low frequencies, and at 500 to 2000 Hz (the speech domain) it will not
exceed 4 s.
Figure 11.62 View into the Sheikh Zayed Mosque in Abu Dhabi, UAE.
The sound design is based on digitally controlled line arrays; compare Figure 11.63.
Sound columns of different size and power are used, installed behind the
existing sound-transparent fleece or integrated into the structural columns. Each array gets
an individual input signal (via a distribution matrix) so that line arrays can be switched on
or off as a function of the mosque’s occupancy.
11.6.1.4.4.2 Mosque –Daily used Al Eman Mosque, Jeddah, Saudi Arabia Most of the daily
used mosques do not have domes or other sophisticated architectural features. Quite often
these are rectangular rooms, rarely with columns. One wall is the Qibla wall with a simple
praying niche. Figure 11.64 shows the Al Eman mosque in Jeddah in Saudi Arabia. The
mosque is small (just 555 m³); the Qibla wall with the Mihrab niche is visible on the left.
The lines on the carpet, common in such mosques, mark the prayer rows for a maximum of
300 worshippers.
The reverberation time in the unoccupied mosque (including the carpet) is relatively
high at 2.5 s in the mid-frequency range and increases to 3.5 s at low frequencies. The
simple loudspeaker arrangement along the walls using small point-source Tannoy
CPA5 loudspeakers supports the voice of the imam insufficiently; the speech intelligibility
could be improved by use of modern line arrays.
Index
Note: Page numbers in italic denote figures and in bold denote tables.
Acoustic Control System (ACS) 58–59, 58 160, 256, 263, 268–269, 268, 271, 274, 278;
acoustic enhancement see electroacoustic reliability 277–280, 279; results and output
enhancement systems 160–161, 160, 271–272, 273, 274, 275, 276;
acoustic feedback 50–56, 51; calculation of reverberation 157–158, 263; room impulse
55–56; in closed rooms 52–55, 53, 54, 56; response (RIR) 259–261, 261; of room modes
measurements 350; in open air 51–52, 52; 263–264, 264; room transfer function 261,
positive 3, 45, 50, 52, 55, 56, 173–174, 262; simulation methods 259–264, 261; sound
176, 190, 191; suppression 153–154; pressure levels (SPL) 162–164, 166, 166, 167,
troubleshooting 324 168; speech intelligibility 166, 169; surface
acoustic gain calculation 192–193, 193, 193, material data 258–259, 258, 260, 277; tail
212–214 estimation methods 269; time-arrivals,
acoustic gain index 192, 214 delay and alignment 162, 163, 164, 165;
acoustic localization 39–40, 40, 41; multi- validation of results 280; wave-based models
channel systems 197–198, 198, 202–203, 259, 261, 270
202, 203, 204, 205, 206; naturalness of sound acoustic optimization 325–329, 326, 327,
reproduction and 199–203, 200, 202, 203, 328
204, 205, 206; simple sound systems 190, acoustic overall impression 25
194–195, 195, 196; single-channel delay acoustical measurements 323, 330–350; acoustic
system 201 feedback 350; alignment 349–350; with
acoustic modelling 156–168, 158, 159, 251–281, arbitrary excitation signals 331, 339–342, 341;
252, 253; aiming loudspeakers 161, 161, 162; averaging 348; with conventional excitation
auralization 166–168, 171, 254, 254, 263, 272, signals 333–334, 334; determining timing
274, 280–281, 280; boundary element method of sources 348–349, 349; electroacoustic
(BEM) 259, 262, 263; cone-tracing methods properties 327–329, 328; filtering 348;
159–160, 263, 269, 269; of direct field frequency domain 345; with frequency sweeps
261–262; of early reflections 262–263; finite 323, 332–333, 334–335, 336; fundamentals
difference time domain method (FDTD) 259, 331–333; with maximum length sequences
264; finite element method (FEM) 259, 263, (MLS) 331, 336–339, 337, 338; measurement
264; geometrical data 255–256, 255, 256, location selection 343, 343; with other noise
277; image source method 262–263, 266–268, signals 336; performing 342–350, 343, 344,
267; input data 254–259, 255, 256, 257, 346, 347, 349, 351; polarity testing 350,
258, 260, 277–278; interpretation of results 351; room acoustic properties 325–327, 326,
279–280; limitations 271; loudspeaker data 327, 343–344, 344; time-delay spectrometry
256–258, 257, 277–278; model calibration (TDS) 331, 339, 340; time domain 345;
278, 279; modelling engine 278–279; Monte waterfall diagrams 345, 346, 347
Carlo approach 263, 268–270, 268, 269, 270; acoustics 20–32; critical distance 27, 28; energy-
numerical optimization 264–266, 265, 266; time curve 30–31, 30, 31; fundamentals
parameter presentation 166, 170; performance 23, 24; general issues 22–23, 22; historical
considerations 270–271; pyramid-tracing overview 20–21; reverberation time 26–30,
methods 159, 263, 269, 269; radiosity method 28, 29; speech intelligibility and music clarity
270, 270; ray-tracing methods 158–159, criteria 31–32; subjective assessment of sound
quality 25–26; see also acoustic feedback; proprietary technologies 311–313, 311, 313;
electroacoustic enhancement systems Quality of Service (QoS) 295–296, 296, 296,
ACS see Acoustic Control System (ACS) 300, 305, 308, 316–318; redundancy
Active Field Control (AFC) 59, 59, 60 313–315, 314, 315; setup of senders and
active filters 156 receivers 303–304, 304; spanning tree
AES standards: AES2-2012 93; AES67 283, protocol (STP) 313–314, 314; standards
300, 302, 307–308, 311, 311, 312–313, 313, 307–311, 309; stream discovery 303, 320;
319; AES70 310 stream formats 304–305, 306, 306, 308,
AFC see Active Field Control (AFC) 319; subnet masks 289–290, 315; switches
AGC see automatic gain control (AGC) 287–288, 288, 293–294, 294, 300–302, 301,
airport buildings 5, 189, 354, 355, 356 302, 311, 313, 314, 319–320; synchronization
Al Eman Mosque, Jeddah, Saudi Arabia 414, 285–286, 287, 296–302, 296, 297, 298, 299,
415 299, 300, 301, 302, 308, 312, 313, 318–319;
alignment measurements 349–350 transparent clock switches 301, 302, 311;
ALS see assistive listening systems (ALS) unicast and multicast 291–295, 293, 294, 308,
Amadeus Active Acoustics 62, 65 315; virtual sound cards 319
Ambisonics 254, 254, 272, 281 audio recording studios 12–14, 13
ANEMAN 304 audio video bridging (AVB) 283, 310–311,
ANSI standard S3.5 234 311
Anthem Hall, Washington DC, USA 399–401, auralization 166–168, 171, 254, 254, 263, 272,
400 274, 280–281, 280
arenas, sports 7, 358, 359–361, 364, 365, 366, automatic gain control (AGC) 5, 188–189, 222,
367 223, 241
array loudspeakers: line arrays 45, 46, 48, 71–72, automatic volume control 188
73, 98–106, 100, 101, 102, 104, 105; two- AVB see audio video bridging (AVB)
dimensional arrays 106, 106 averaging acoustical measurements 348
array microphones 120–121, 121, 123
Articulation Index (AI) 234 background noise measurements 326–327
Articulation Loss of Consonants 32, 244–245, balloon plots 86, 89
245 basilicas 405
artificial human speakers 326, 327 BEM see boundary element method (BEM)
ASA see Astro Spatial Audio (ASA) Beranek, Leo Leroy 21
assembly halls 11, 208 Berlin Cathedral, Germany 408–409, 409, 410
assistive listening systems (ALS) 208–211, 209, Berlin Main Station, Germany 354, 356, 357
210, 211, 212 Best Master Clock Algorithm (BMCA)
ASTM C423 standard 258, 277 298–299, 299, 299
Astro Spatial Audio (ASA) 62, 64 binaural auralization 168, 171, 254, 272, 274,
audio networking 283–320; advantages and 280–281, 280
disadvantages 284–285; Best Master Clock binaural localization 39–40, 40, 41
Algorithm (BMCA) 298–299, 299, 299; BMCA see Best Master Clock Algorithm
boundary clock switches 300–301, 301, 302, (BMCA)
311, 320; common mistakes 315–320, 316, boardrooms 17–18
317, 318; connection management 287, Bonjour 289, 302–303, 320
302–305, 308; connectivity 287–296, 287, boundary clock switches 300–301, 301, 302,
288, 291, 292, 293, 294, 296, 296, 308; 311, 320
device discovery 289, 302–303, 320; IGMP boundary element method (BEM) 93, 259, 262,
snooping 293–295, 294, 316, 320; 263
IP addresses 288–290, 315; latency 286, broadcasting facilities 12–14, 13
305–307, 307, 316, 318; link aggregation 314, Buddhist temples 401, 401
314; link offset 286, 286, 306–307, 307, 316,
318; network topologies 290–291, 291, 292; calibration and optimization 325–329, 326, 327,
packet delay measurement 297–298, 298; 328
packet headers 305; packet jitter 297, capacitive transducers 108–111, 109
300–301, 307, 311, 320; phase accuracy case studies: churches 408–410, 409, 410, 411;
285–286, 286; Precision Time Protocol (PTP) hotels 372, 376, 377, 377; mosques 413–414,
285–286, 287, 296, 297–302, 297, 298, 299, 414, 415; multipurpose halls 397–401, 398,
299, 300, 301, 302, 308, 312, 313, 318–319; 399, 400; museums 372, 373, 374, 375; paging
and voice alarm systems 354–358, 355, 356, Dante 283, 295–296, 303–304, 311, 312–313,
357, 359, 360; sports venues 364, 365, 366, 313, 316
367, 368, 369, 370; synagogues 402–405, 403, DRR see direct to reverberant ratio (DRR)
404; theatres, opera houses and concert halls deconvolution method 136–137, 332–333, 334,
355–358, 359, 360, 384–391, 385, 386, 387, 341, 341
388, 389, 390, 391, 392; transportation hubs delay issues 149–151, 151
354, 355, 356, 357 delay systems 186, 187, 198; acoustic
cathedrals see churches localization without 194–195, 196; multi-
Catholic churches 405 channel 202–203, 202, 203, 204, 205, 206;
CD horns see constant directivity (CD) horns single-channel 201
CDPS (complex directivity point source) model Delta Stereophony System (DSS) 202, 202
261–262, 278 device discovery 289, 302–303, 320
CDS see Cinema Digital Sound (CDS) format Differentiated Services Code Point (DSCP)
Central Synagogue, New York City 404, 405 value 295–296, 296, 296, 305, 308
Chladni, Ernst F.F. 21 DiffServ (Differentiated Services) 295
Christianity see churches diffuse field sensitivity, defined 126
churches 11–12, 102, 102, 103, 134, 189–190, diffuse reflections 23, 24
190, 405–410, 406, 407, 408, 409, 410, 411 diffusion, defined 26
CIDR notation 290 Digital Theater System (DTS) 15
Cinema Digital Sound (CDS) format 15 digitally controlled (DSP) loudspeaker arrays
cinemas 14–17, 14, 15, 16, 17 71–72, 73, 74, 102, 103, 104, 145
circular pistons 94, 94 Dirac delta function 78
clarity 25, 31–32, 232–233, 232, direct to reverberant ratio (DRR) 219, 219, 220,
234–235, 383 231–233, 231, 232, 234, 244
classrooms 18 directed reflections 23, 24
clipping protection 35 directional factor: loudspeakers 91; microphones
clubs 8 128
CobraNet 283, 290 directional gain: loudspeakers 86; microphones
coherence, speech intelligibility and 245 128
comb filter effects 8, 52, 53 directivity deviation factor, loudspeakers 91
commissioning: calibration and optimization directivity factor: acoustic gain calculation
325–329, 326, 327, 328; documentation 192–193, 193, 193, 213; loudspeakers 90–91,
329–330; functional testing and installation 92; microphones 128
verification 322–323; subjective evaluation directivity index: loudspeakers 88–91, 90;
329; troubleshooting 323–324; see also microphones 116, 128
acoustical measurements directivity of loudspeakers 71, 74, 83–92,
computer modelling see acoustic modelling 93–106; circular pistons 94, 94; digitally
concert halls see theatres, opera houses and controlled (DSP) arrays 71–72, 74, 102,
concert halls 103, 104; directional factor 91; directivity
condenser microphones 108–111, 109, 113, 115, deviation factor 91; directivity factor 90–91,
119, 124–125, 124, 126, 129 92; directivity index 88–91, 90; display of
cone-tracing methods 159–160, 263, 269, 86, 87, 88, 89; efficiency and 91–92, 92;
269 horn-loaded systems 94–97, 95, 96, 97; line
Congress Centrum Suhl, Germany 397–399, arrays 71–72, 98–106, 100, 101, 102, 104,
398, 399 105; measurements 83–86, 84, 85; speech
consonants, percentage loss of 244–245, 245 intelligibility and 219; two-dimensional arrays
constant directivity (CD) horns 95–97, 95, 96, 106, 106; two-way systems 94, 97–98, 98, 99
97 directivity of microphones 113, 114, 116–122,
Constellation system 60–62, 61 117, 118, 120, 121, 123, 127–128
convention centres 365, 369–372, 397–399, discotheques 8
398, 399 distortion: loudspeakers 80–83, 82, 83;
corporate environments 17–18 microphones 115, 127; speech intelligibility
coverage angle, microphones 128 and 220
coverage issues 140–149, 141, 142, 143, 144, documentation 329–330
145, 146, 147, 148, 149, 150 Dolby standard 14–15, 14, 15
coverage, uniformity of 220 dot-decimal notation 290
critical distance 27, 28 double hearing 183, 186
DSCP (Differentiated Services Code Point) FIR filters 59, 59, 265, 265
value 295–296, 296, 296, 305, 308 fire detection regulations 3, 4
DSP (digital signal processors) see digitally fluctuating FIR (fluc-FIR) 59, 59
controlled (DSP) loudspeaker arrays flutter echoes 26, 177, 181
DSS see Delta Stereophony System (DSS) FM transmission 211, 212
DTS see Digital Theater System (DTS) Fourier analysis 331, 341
dynamic microphones 111–113, 112, 115, 119, Fourier transform 78, 136–137, 138, 259, 345
124–126, 129 free field sensitivity, defined 126
dynamic transducers: loudspeakers 68–70, 69, frequency response 134–139, 136, 137, 138,
70; microphones 108, 111–113, 112 139, 155–156, 155, 173; loudspeakers 74,
75, 76, 77, 78; measurements 327–329, 328;
echoes: behaviour 38–39; defined 25; flutter 26, microphones 113, 115, 118, 156; simple
177, 181; perceptibility of 39; suppression/ sound systems 190; speech intelligibility and
elimination 8, 151, 152, 153, 198, 198, 223, 217, 246–248, 246, 247, 248
241 frequency shifters 154–155
educational facilities 18 frequency sweeps 323, 332–333, 334–335, 336
effect signal sound systems 381, 382 frequency weighting curves 36–37, 36, 330
Elbphilharmonic Hall, Hamburg, Germany functional testing 322–323
355–358, 359, 360 fundamental frequency 52, 81
electrical functional testing 322–323
electrical optimization 325 geometrical reflections 23, 24
Electro Voice horn loudspeakers 96, 97 Getec-Arena, Magdeburg, Germany 364, 365,
electroacoustic enhancement systems 11, 23, 366, 367
57–62; Acoustic Control System (ACS) Green Point Stadium, Cape Town, South Africa
58–59, 58; Active Field Control (AFC) 364, 368, 369, 370
59, 59, 60; Amadeus Active Acoustics 62, group delay distortion: loudspeakers 76–77, 78;
65; Astro Spatial Audio (ASA) 62, 64; microphones 113
Constellation system 60–62, 61; multipurpose
halls 394–396, 396, 399; theatres, opera Hagia Sophia, Istanbul, Turkey 12
houses and concert halls 11, 380, 381, 384; Hamad International Airport, Doha, Qatar 354,
Vivace system 62, 63 355, 356
electroacoustic measurements 233–235, handheld sound level meters 330, 332, 354, 371,
327–329, 328 372
electronic interference 223, 324 hearing impairment 222, 245; assistive listening
electronic microphone rotators (EMRs) 59, 59 systems (ALS) 208–211, 209, 210, 211, 212
Ember+ 311–312 hearing threshold and range 35–38, 36, 37, 38,
energy-time curves (ETC) 30–31, 30, 31, 231, 131–132, 132
231, 233, 233 Helmholtz, Hermann von 21
equalization 155–156, 155, 156, 220 hitless merge 315
equivalent sound absorption area 26, 174, 187 home cinemas 16–17, 17
exhibition halls 6, 181–182, 365, 369–372 horn-loaded loudspeaker systems 94–97, 95, 96,
97
false ceilings 179 hotels 5–6, 365, 369–372, 376, 377, 377
fast Fourier transform (FFT) 78, 136–137 howling see acoustic feedback
FDTD see finite difference time domain method humming 323–324
(FDTD)
feedback see acoustic feedback iconostasis 406, 407
FEM see finite element method (FEM) IEC standards: IEC 60268-1 114, 115; IEC
figure of eight directivity, microphones 113, 116, 60268-4 114; IEC 60268-5 93; IEC 60268-16
117, 119, 120 45, 228, 229, 236, 239, 240, 243, 272; IEC
filters: acoustical measurement and 348; active 61672 36–37, 36; IEC 61672-1 132; IEC
and passive 156; equalization by filtering 61938 115, 125
155–156, 155, 156; FIR 59, 59, 265, 265; IGMP snooping 293–295, 294, 316, 320
narrow band 154; notch 154 image source method 262–263, 266–268,
finite difference time domain method (FDTD) 267
259, 264 immersive sound systems 16, 206, 207
finite element method (FEM) 259, 263, 264 impedance, loudspeakers 69, 75–76, 76, 77
impulse response 134–139, 136, 137, 138, 139; loudspeakers 78–79, 79; room impulse response (RIR) modelling 259–261, 261
impulse tests 334
induction loops 209–210, 209
information system layouts 176–190, 353–354, 371; ceiling loudspeaker grids 176–181, 177, 178, 179; complexes of individual rooms 182; factory and exhibition halls 181–182; flat rooms 176–181, 177, 178, 179, 180, 181; horizontally radiating loudspeakers 181, 181; outdoor areas 182–187, 184, 185, 186; reverberant halls 189–190, 190; suspended loudspeakers 180–181, 180; transportation hubs 182–183, 187–190, 188, 189
information systems 3, 4, 5, 6, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; standards 3, 4; see also information system layouts
infrared transmission 210–211, 210, 211
installation verification 322–323
intelligibility see speech intelligibility
interference, electronic 223, 324
Internet Group Management Protocol see IGMP snooping
inverse filtering 332
IP addresses 288–290, 315
IP networks see audio networking
IPMX standard 309–310
Islam see mosques
ISO standards: ISO 354 258, 277; ISO 3382 272; ISO 3741 88–89, 90; ISO 7240 244; ISO 17497 259; ISO 60268-5 75; for voice alarm systems 3, 4
isobar plots 86, 88, 98, 99

JBL 2360 horn loudspeaker 95, 96, 96
Judaism see synagogues

Kircher, Athanasius 20

Lavalier microphones 123, 124, 126
layouts see information system layouts; system layouts
Lee effect 65
line array loudspeakers 45, 46, 48, 71–72, 73, 98–106, 100, 101, 102, 104, 105
linear dynamic behaviour, loudspeakers 74–80, 75, 76, 77, 78, 79, 80, 81
linear time invariant (LTI) systems 78, 136
link aggregation 314, 314
link offset 286, 286, 306–307, 307, 316, 318
localization, acoustic 39–40, 40, 41
localization errors 349
locally directed reflections 23, 24
log sweep 335, 336
loop amplification 45, 51, 52, 54, 55, 56
loudness perception 35–38, 36, 37, 38
loudspeakers 68–106; aiming 161, 161, 162; alignment measurements 349–350; artificial human speakers 326, 327; ceiling grids 176–181, 177, 178, 179; as complex systems 70–72, 72, 73; coverage issues 140–149, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150; digitally controlled (DSP) arrays 71–72, 73, 74, 102, 103, 104, 145; directional factor 91; directivity deviation factor 91; directivity display 86, 87, 88, 89; directivity factor 90–91, 92; directivity index 88–91, 90; directivity measurements 83–86, 84, 85; directivity of circular pistons 94, 94; distortion 80–83, 82, 83; dynamic transducer principle 68–70, 69, 70; efficiency 91–92, 92, 171–172; frequency response 74–77, 75, 76, 77, 78, 83–84; group delay distortion 76–77, 78; horizontally radiating 181, 181; horn-loaded systems 94–97, 95, 96, 97; impedance 69, 75–76, 76, 77; impulse response 78–79, 79; line arrays 45, 46, 48, 71–72, 73, 98–106, 100, 101, 102, 104, 105; linear dynamic behaviour 74–80, 75, 76, 77, 78, 79, 80, 81; parameter overview 72–74; point source model 71, 72; power handling 92–93; sensitivity 75, 75, 83, 90, 92, 92; sound level calculation 174–175; sound power levels 71, 88–90, 90, 174–175; step response 79, 80; suspended 180–181, 180; time domain behaviour 78–80, 79, 80, 81; two-dimensional arrays 106, 106; two-way systems 94, 97–98, 98, 99; waterfall diagrams 79–80, 81; see also speech intelligibility; system layouts
LTI see linear time invariant (LTI) systems
Lusail Hotel, Doha, Qatar 372, 376, 377, 377

magnetic transducers 108
masking effect 37–38, 37, 38; speech intelligibility and 237–238
maximum length sequences (MLS) 331, 336–339, 337, 338
maximum transmission unit (MTU) 295, 305
mDNS 289, 302–303
mechanical installation verification 323
mechanical optimization 325
media production facilities 12–14, 13
meeting rooms, corporate 17–18
microphones 108–130; array microphones 120–121, 121, 123; condenser microphones 108–111, 109, 113, 115, 119, 124–125, 124, 126, 129; directivity 113, 114, 116–122, 117, 118, 120, 121, 123, 127–128; distance to sound source 129; distortion 115, 127; dynamic microphones 111–113, 112, 115, 119, 124–126, 129; environmental conditions 115, 129–130; equivalent input noise level 115, 127; handheld 128–129; interconnection of 114, 124; Lavalier microphones 123, 124, 126; maximum sound pressure level 115, 127; musical instrument-mounted 123–124, 124; parameter overview 114–116; power supply 115, 124–125, 125; pressure gradient receivers 119, 120; pressure receivers 116–119, 118; ribbon microphones 112, 113, 119, 126; selection guidelines 128–130, 383; sensitivity 110, 113, 115, 116, 125–126; shotgun microphones 122, 123; signal-to-noise (S/N) ratio 126–127, 222; speaker-worn 122–123, 124; stand-mounted 129; transducer principles 108–113, 109, 112, 114; for transportation hubs 188–189
Milan 283–284, 290, 310–311, 311
mixing consoles, architectural implications 49–50
MLS see maximum length sequences (MLS)
MLSSA (Maximum Length Sequence System Analyzer) 331, 339
modelling see acoustic modelling
Modulation Transfer Index (MTI) 236, 240, 241
monaural localization 39
Monte Carlo approach 263, 268–270, 268, 269, 270
mosques 11–12, 410–414, 414, 415
MTI see Modulation Transfer Index (MTI)
MTU see maximum transmission unit (MTU)
multicast and unicast 291–295, 293, 294, 308, 315
multipurpose halls 11, 380, 393–401; acoustics and system design 393–396, 394, 395, 396; case studies 397–401, 398, 399, 400; electroacoustic enhancement systems 394–396, 396, 399; integration of sound systems 47, 48, 49; measurement approaches 397; mixing consoles 49; reverberation times 29, 393, 395–396, 399; sport events 359–361; system layouts 206–208, 397; target measures 396–397
museums 6, 365, 369–372, 373, 374, 375
music clarity 31–32
music venues 8–9; see also multipurpose halls; theatres, opera houses and concert halls
musical instrument-mounted microphones 123–124, 124
Musiktheater Linz, Austria 384–389, 385, 386, 387, 388, 389, 390, 391

narrow band filters 154
National Museum, Beijing, China 372, 373, 374, 375
naturalness of sound reproduction 173, 199–206, 200, 202, 203, 204, 205, 206, 207
networking see audio networking
NMOS (Networked Media Open Specifications) 309, 310
noise cancellation techniques 222, 241
noise criteria 133, 133, 134
noise-dependent volume regulation 188
noise measurements 326–327
noise rating 133, 134, 135, 135
noise-to-signal ratio see signal-to-noise (S/N) ratio
notch filters 154
Nyquist frequency 332

octave band spectral analysis 230, 230, 231
Ohel Jakob Synagogue, Munich, Germany 402, 403
open-air sports venues 7–8; information system layouts 182–187, 184, 185, 186
Open Control Alliance (OCA) 310
opera houses see theatres, opera houses and concert halls
optimization 325–329, 326, 327, 328
Organ and Concert Hall Kharkov, Ukraine 391, 391, 392
Orthodox churches 406, 407
OSI model 290

packet delay measurement 297–298, 298
packet headers 305
packet jitter 297, 300–301, 307, 311, 320
paging systems 3, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; see also information system layouts
pain threshold 35, 131
passive filters 156
percentage loss of consonants 244–245, 245
performing arts centres: clubs/discotheques 8; music venues 8–9; see also multipurpose halls; theatres, opera houses and concert halls
phantom power 115, 124–125, 125
phantom sources 40, 198, 199, 200
phase accuracy 285–286, 286
‘phon’ scale 35, 330
piezoelectric transducers 108
ping command 315, 316
pink noise signals 220, 242, 323, 332, 333, 334, 335, 342, 342
pink sweep 335
polar plots 86, 87, 94, 94
polarity testing 350, 351
positive acoustic feedback 3, 45, 50, 52, 55, 56, 173–174, 176, 190, 191
power handling, loudspeakers 92–93
power supply, microphones 115, 124–125, 125
precedence effect 39–40, 40, 194–195, 197, 199, 202
Precision Time Protocol (PTP) 285–286, 287, 296, 297–302, 297, 298, 299, 299, 300, 301, 302, 308, 312, 313, 318–319
pressure field sensitivity, defined 126
pressure gradient microphones 119, 120
pressure microphones 116–119, 118
Protestant churches 405
pseudo-random noise 78, 137, 331, 332, 334
PTP see Precision Time Protocol (PTP)
public buildings 3–6; convention centres 365, 369–372, 397–399, 398, 399; exhibition halls 6, 181–182, 365, 369–372; hotels 5–6, 365, 369–372, 376, 377, 377; museums 6, 365, 369–372, 373, 374, 375; shopping malls 4; see also information systems; multipurpose halls; transportation hubs
pyramid-tracing methods 159, 263, 269, 269
Pythagoras of Samos 20

ribbon microphones 112, 113, 119, 126
ring topology 291, 292
RIR see room impulse response (RIR)
room acoustics 22–32; critical distance 27, 28; energy-time curve 30–31, 30, 31; fundamentals 23, 24; general issues 22–23, 22; measurements 325–327, 326, 327, 343–344, 344; reverberation time 26–30, 28, 29; speech intelligibility and music clarity criteria 31–32; subjective assessment of sound quality 25–26; see also acoustic feedback; electroacoustic enhancement systems
room impulse response (RIR) 259–261, 261
room-size impression 25
room transfer function 261, 262
RTAs see real-time analysers (RTAs)
RTP (Real-time Transport Protocol) 305
signal-to-noise (S/N) ratio 133; microphones 126–127, 222; speech intelligibility and 217–218, 220, 222, 228, 230, 230, 231, 234
Simple Network Management Protocol (SNMP) 310
simulations see acoustic modelling
sine sweep signals 137, 242, 242, 323, 331, 332–333, 334–335, 336, 339
slash notation 290
SMAART (Sound Measurement and Acoustical Analysis in Real Time) 341
SMPTE standards: SMPTE ST 2022-7 315, 315; SMPTE ST 2059-2 300, 312, 319; SMPTE ST 2110 308, 309–310, 309
SNMP see Simple Network Management Protocol (SNMP)
SOR see source-oriented reinforcement (SOR)
sound energy density 27, 55–56, 59, 174
sound energy ratios 32, 232–233, 232, 234–235
sound focussing 220, 233, 233
sound level calculation 174–175
sound level distribution 173, 177, 194, 196, 201
sound power absorbed 26–27
sound power levels 33; loudspeakers 71, 88–90, 90, 174–175
sound pressure levels (SPL) 22, 32–33, 131–133, 132; hearing threshold 35, 131–132, 132; loudspeakers 71, 74, 75, 88–90, 90, 174–175, 185–186, 185; maximum 115, 127, 256, 329; measurements 326, 327, 329; microphones 115, 127, 188–189; modelling 162–164, 166, 166, 167, 168; speech intelligibility and 217; sports stadia 6–7; troubleshooting 324
sound propagation 32–40; acoustic localization 39–40, 40, 41; echo behaviour 38–39; loudness perception and masking effect 35–38, 36, 37, 38; in open air 32–35, 33, 34, 35, 185, 185; precedence effect 39–40, 40, 194–195, 197, 199, 202
sound reinforcement systems: basic requirements 47, 173–174; categories 1–2, 57; components 1, 2; historical overview 41–45, 42, 43, 44, 45, 46; integration into architectural design 47–50, 48; use of 57–58, 62–66; see also electroacoustic enhancement systems; system layouts
source-independent measurements (SIM) 341
source-oriented reinforcement (SOR) 203, 203, 204, 205, 206
spaciousness 25, 179, 181, 378, 379, 383; see also electroacoustic enhancement systems
spanning tree protocol (STP) 313–314, 314
spatial impression 25, 31
speech intelligibility 5, 6, 31–32, 215–249, 216; Articulation Loss of Consonants 32, 244–245, 245; ceiling loudspeaker grids 178–179, 179; coherence 245; direct to reverberant ratio (DRR) and 219, 219, 220, 231–233, 231, 232, 234, 244; electroacoustic measurements 233–235, 329; frequency response and 217, 246–248, 246, 247, 248; hotels and museums 369–371; measurement 233–235, 326; modelling 166, 169; primary factors affecting 216, 217–219, 218, 219; secondary factors affecting 216–217, 220–223; signal-processing and 222, 223, 241; signal-to-noise (S/N) ratio and 217–218, 220, 222, 228, 230, 230, 231, 234; sound energy ratios 32, 232–233, 232, 234–235; speech signal and system design 223–233, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233; subjective intelligibility tests 32, 233, 234, 329; summary of design factors 248–249; see also Speech Transmission Index (STI)
Speech Transmission Index (STI) 3, 32, 228, 235–244; description 235–240, 235, 236, 237, 238; modelling 166, 169; qualification bands 239, 240; relationship with % loss of consonants 245, 245; STIPA 236, 237, 239, 240–243, 242, 243, 245, 326, 336; typical applications 239; use and limitations 240–244, 241, 242, 242, 243, 244
Speech Transmission Index for Public Address Systems (STIPA) 236, 237, 239, 240–243, 242, 243, 245, 326, 336
spine/leaf architecture 291, 292
SPL see sound pressure levels (SPL)
sports venues 6–8, 358–364; arenas and large sport halls 7, 358, 359–361, 364, 365, 366, 367; case studies 364, 365, 366, 367, 368, 369, 370; first guidelines for 42, 43; information system layouts 182–187, 184, 185, 186; integration of sound systems 47; multipurpose halls 359–361; small sport halls 358–359; stadia 6–7, 42, 43, 361–364, 362, 368, 369, 370
stadia 6–7, 42, 43, 361–364, 362, 368, 369, 370
stage monitoring 65–66, 156, 381–383, 382
standards: for audio networking 307–311, 309; for voice alarm systems 3, 4; see also AES standards; IEC standards; ISO standards; SMPTE standards
star topology 291, 291
step response, loudspeakers 79, 80
STI see Speech Transmission Index (STI)
STIPA see Speech Transmission Index for Public Address Systems (STIPA)
STP see spanning tree protocol (STP)
stream discovery 303, 320
stream formats 304–305, 306, 306, 308, 319
stream redundancy 314–315, 315
subjective evaluation/intelligibility tests 32, 233, 234, 329
subnet masks 289–290, 315
summing localization effect 40
swept sine signals 137, 242, 242, 323, 331, 332–333, 334–335, 336, 339
synagogues 11–12, 134, 401–405, 402, 403, 404
system layouts 175–176; assistive listening systems (ALS) 208–211, 209, 210, 211, 212; immersive sound systems 206, 207; multi-channel systems 197–199, 198, 202–203, 202, 203, 204, 205, 206; multipurpose systems 206–208, 397; naturalness of sound reproduction 173, 199–206, 200, 202, 203, 204, 205, 206, 207; simple sound systems 190–196, 191, 193, 193, 195, 196, 197; single-channel delay system 201; see also information system layouts
SysTune measurement system 341–342, 341

tail estimation methods 269
TCP see Transmission Control Protocol (TCP)
TDS see time-delay spectrometry (TDS)
TEF see Time Energy Frequency (TEF) analyser
temperature, sound propagation and 33–34, 34
THD see total harmonic distortion (THD)
theatres, opera houses and concert halls 9–11, 206, 377–391; assistive listening systems (ALS) 208–211, 209, 210, 211; case studies 355–358, 359, 360, 384–391, 385, 386, 387, 388, 389, 390, 391, 392; effect signal sound systems 381, 382; electroacoustic enhancement systems 11, 380, 381, 384; first guidelines for 42, 43; functions of sound systems in 380–383, 382; integration of sound systems 47–49, 48; measurement approaches 384; microphone selection 381; mixing consoles 49–50; mobile systems 381; repertoire and all-purpose theatre 379–380; reverberation times 9, 10, 29, 29, 326, 378, 380, 391, 392; single-purpose facilities 378–379; stage monitoring 65–66, 156, 381–383, 382; system layouts 384; target criteria 383
THX standard 14–15, 15
time-delay spectrometry (TDS) 331, 339, 340
time domain behaviour, loudspeakers 78–80, 79, 80, 81
Time Energy Frequency (TEF) analyser 331
time-sensitive networking (TSN) 283–284, 290, 310–311, 311
time varying control (TVC) 59, 59
total harmonic distortion (THD): loudspeakers 81–83, 83; microphones 115, 127; speech intelligibility and 220
tour-guide receivers 211, 212
transducer principles: loudspeakers 68–70, 69, 70; microphones 108–113, 109, 112, 114
Transmission Control Protocol (TCP) 292
transparent clock switches 301, 302, 311
transportation hubs 5; case studies 354, 355, 356, 357; information system layouts 182–183, 187–190, 188, 189
travel time phenomena 149–151, 151
troubleshooting 323–324
TSN see time-sensitive networking (TSN)
TVC see time varying control (TVC)
two-dimensional loudspeaker arrays 106, 106
two-way loudspeaker systems 94, 97–98, 98, 99

UDP see User Datagram Protocol (UDP)
unicast and multicast 291–295, 293, 294, 308, 315
uniformity of coverage 220
universities 18
User Datagram Protocol (UDP) 292, 293, 310

video conferencing rooms 18
virtual LANs (VLANs) 287–288, 293, 303
virtual room acoustic systems 60–62, 61
virtual sound cards 319
Vitruvius 20, 377
Vivace system 62, 63
voice alarm systems 3, 4, 5, 6, 176, 352–354, 365; case studies 354–358, 355, 356, 357, 359, 360; standards 3, 4; see also information system layouts
volume control 188

waterfall diagrams 79–80, 81, 345, 346, 347
wave-based models 259, 261, 270
wave-field synthesis (WFS) 15, 16, 58
weather, sound propagation and 32–35, 33, 34
weighted sweep 335, 336
weighting curves 36–37, 36, 330
WFS see wave-field synthesis (WFS)
whistling see acoustic feedback
white sweep 335, 336
Wilson, Woodrow 41
wind speeds, sound propagation and 34–35, 35