ACEX30-18-112 Vasishta Kanthi
VASISHTA KANTHI
Typeset in LATEX
Printed by [Name of printing company]
Gothenburg, Sweden 2019
Investigation on the Effect of Loudspeaker Ringing on Perceived Spectral Balance
VASISHTA KANTHIMATHINATHAN SUBRAMANIAN
Department of Architecture and Civil Engineering
Chalmers University of Technology
Abstract
Ringing artefacts of a loudspeaker can have a large impact on the perception of music
in home, studio and car audio systems. A large contributing factor to this ringing effect
is the driver’s natural resonance. In car audio systems, additional mounting components
and cavities within the car’s door and the body of the driver can enhance the ringing
effect. Unfortunately, the ringing effect cannot be eliminated completely, as it is an
intrinsic property of any resonating object. Its effect can, however, be controlled.
This thesis investigates different methodologies for detecting and measuring the driver’s
natural resonances, as well as the perceptual effect of ringing caused and enhanced by
resonances from cavities, mounting enclosures and other components of a car door.
Different systems, created with synthetic transfer functions employing artificial low
frequency resonances, were compared. These artificial low frequency resonances have
characteristic properties similar to those of driver resonances. They are detected and
measured using signal processing techniques involving Cumulative Spectral Decay (CSD)
and Continuous Wavelet Transform (CWT) plots, as well as a System Identification Method
that uses the Steiglitz-McBride algorithm. These methods rely on the impulse response
measurement of a loudspeaker. The methodologies successfully detected the driver’s
natural resonance frequency and its strength in terms of Quality (Q) factor, with minimal
variance between the methods. From these results, three low frequencies and their
corresponding Q factors were chosen and implemented in a listening test platform as
second order IIR peak filters. The platform was implemented using MATLAB’s GUI and
Simulink interface. The test used two music excerpts of different genres: a recorded jazz
ensemble and a synthetic electronic ensemble. To emulate the effect of delayed
resonances, three delay time values were applied to the resonance settings. The test
focuses primarily on three case studies: the threshold of audibility of resonances, the
threshold at which a non-resonant system with a parametric bandpass filter is perceptually
similar to a resonant system, and the threshold at which a system’s resonance is inaudible
when the same filter is used. The results show a high degree of dependency between the
frequency under test and the audio. Very low frequencies are found to have the highest
audibility thresholds in the case of recorded audio. The control of ringing depends on
the type of audio played, as indicated by the high variance in the results for the recorded
audio in comparison to the synthetic audio. The delayed resonances had a minimal impact
on the results. These findings are promising, and further work is needed to create a
metric for the loudness of a mounted car loudspeaker resonance, as such a metric could
suggest a severity threshold for the effect of ringing on the spectral balance of audio.
This thesis serves as a starting point for developing a metric for ringing loudness.
Keywords: Ringing, Resonance, Quality Factor, Centre Frequency, CSD, Wavelet,
System Resonance, Listening Test
Acknowledgements
I would like to thank my Supervisor, Jens Ahrens, who has been the backbone of my
progress and inspiration to reach my intended goal in my current field of interest.
Thank you for all the patience, time and energy spent in carrying out this project
successfully, and at the same time, giving me all the knowledge needed in the field,
which I will take forward for years to come, and teach others as well.
I am very thankful to, and proud of, my classmate and friend, Nikolaos Chrysovalantis
Roumpakis, who not only discussed critical problems and issues in great detail,
but also supported me morally and personally during the course of the thesis
and studies.
I thank all the staff and students at the Division of Applied Acoustics at Chalmers
University of Technology, who have provided a friendly and motivating environment.
Finally, I thank my parents and grandparents. Thank you for all the moral and
emotional support throughout my life, through my times of despair and
happiness, which has made me the man I am today.
Contents
List of Figures
1 Introduction
1.1 Scope of the Thesis
2 Theory
2.1 Impulse Response
2.2 Convolution
2.3 Laplace Transform
2.4 Fourier Transform
2.5 Cumulative Spectral Decay (CSD)
2.6 Continuous Wavelet Transform (CWT)
3 Description
3.1 Ringing
3.2 Ringing Example: Pulse
3.3 Ringing Example: Genelec 8020B
3.4 Perceptual Model to Test Loudspeaker Ringing
4 Methods
4.1 Impulse Response Measurement
4.1.1 Measurement Equipment
4.1.2 Block Diagram
4.2 Detection & Estimation of Resonances
4.2.1 Method I: Cumulative Spectral Decay (CSD) and Continuous Wavelet Transform
4.2.1.1 CSD Method
4.2.1.2 Continuous Wavelet Transform (CWT)
4.2.2 Method II: Pole - Zero Identification Method
4.3 Listening Test Model
4.3.1 Simulink Models
4.3.2 MATLAB Graphical User Interface (GUI)
4.3.2.1 Control Panel for Listening Test I
4.3.2.2 Control Panel for Listening Test II
4.3.3 Listening Test Experiment
5 Results
5.1 Methodology Results
5.1.1 Cumulative Spectral Decay (CSD)
5.1.2 Continuous Wavelet Transform (CWT)
5.1.3 System Identification Method
5.1.4 Comparison of All Methods
5.2 Listening Test Results
5.2.1 Test I - Threshold of Audibility
5.2.2 Test II - Threshold of Equivalence
5.2.3 Test III
5.3 Overview and Discussion on the Listening Test
5.4 Limitations
Bibliography
A Appendix
A.0.1 Cumulative Spectral Decay (CSD)
A.0.2 Continuous Wavelet Transform (CWT)
A.0.2.1 3D and 2D Magnitude Plot
A.0.2.2 2D Front View, Full and Half Slices
A.0.3 System Identification Method (Steiglitz - McBride)
A.0.3.1 Full Impulse Response and Frequency Response Reconstruction
A.0.3.2 Sliced Impulse Response and Frequency Response Reconstruction
List of Figures
A.1 Cumulative Spectral Decay Plot of Genelec 8020B and Genelec 7050B
A.2 Continuous Wavelet Transform - Magnitude Plot of Genelec 8020B and Genelec 7050B
A.3 Continuous Wavelet Transform - 2D Front View and Location of Resonances of
Genelec 8020B and Genelec 7050B
A.4 Impulse Response, Frequency Response Reconstruction of Genelec 8020B and Genelec 7050B
A.5 Impulse Response, Frequency Response Reconstruction of Genelec 8020B and Genelec 7050B
List of Tables
1 Introduction
Music, in its most intrinsic form, is the art of combining tones or sounds, produced
by instruments of different forms, into a composition that creates harmony and
expresses emotion. These instruments may be vocal, percussive, stringed, wind, or
even electronic. Each instrument has a unique character in tonality, timbre, colour
and quality, depending on its construction and the way it produces sound, and this
character defines the role of the instrument in an ensemble. The main characteristic
in question is resonance. The resonant behaviour of an instrument determines its
tonality, timbre, colour and quality, and its purpose in a composition. In fact, many
musical genres today depend indirectly on the sound of these resonances. For example,
the classical sound of Flamenco comes from the classical Spanish guitar. In Indian
classical music such as Carnatic, the Saraswathi Veena, Mridangam and Tampura are
unique instruments with specific resonant behaviours that define the quality of the
genre. In Jazz, the piano, saxophone, trumpet, drums, guitar and contrabass each
contribute their unique features, individually or as a whole. In all genres, the human
voice is the most common instrument. It has several resonances, such as mouth
resonance, nasal resonance and chest resonance, each of which gives a distinctive
quality and colour to the tonal sound of the voice. In essence, resonances are a
crucial part of the art form of music. At some level, they dictate its type, harmony
and emotion. However, resonances from other bodies can be detrimental. Any object that
can vibrate has a resonant frequency at which, theoretically, it would vibrate
indefinitely. As objects are subject to natural damping from their environment, the
vibrational energy dissipates over a short period of time. The main difference in this
case is that the decay time of a vibrating object is longest at its resonant frequency,
and this prolonged decay causes the problem of "Ringing".
This implies that not every resonance in an acoustical system is pleasant, especially
loudspeaker driver and port-reflex resonances. These loudspeaker resonances add
unwanted, sustained sounds that can merge with the audio. This is conveniently called
ringing noise. Fortunately, these resonances, being low frequency resonances that range
between 40 - 50 Hz in the case of monitor and mid-range loudspeakers and 15 - 45 Hz in
the case of subwoofers, can be minimised. There has been a considerable amount of
research over decades on minimising the effect of low frequency driver resonances. Most
of today’s high end loudspeakers, such as those from Genelec, Neumann, Creative, etc.,
have a good overall frequency response. Most of this research focuses on the design of
the loudspeaker driver and enclosure, the usage of filters, etc., which improve the
Thiele - Small Parameters and the overall frequency response.
The same cannot be said of car loudspeakers. High end loudspeakers such as Genelec,
Neumann, etc., have their own well designed enclosures and electric circuitry that
produce high quality audio with an overall flat frequency response. Car loudspeakers,
however, share a common enclosure, namely the car’s internal body. This raises the
problem of a non-stationary enclosure mount that could induce vibrations directly into
the driver, primarily due to the movement of the car and secondarily due to a varying
compliance. As a result, the driver’s resonance frequency will shift. Modern cars have
enclosure mountings with some degree of damping that isolate the effect of vibrations.
In spite of this, there is still an element of doubt regarding the audibility of these
resonances, especially when audio is played.
Another issue with car loudspeaker systems is the presence of cavities within the
mounting. As new car designs emerge, the interior design may not take the loudspeaker’s
design into account. As a result, cavities may be left around a driver’s mounting. The
main danger in this case is the production of cavity resonances. As explained in the
beginning, these resonances have their own characteristic perceptual behaviour, and can
affect the perceived spectral balance if the input energy is high. Since the driver can
induce high energy vibrations, and since the driver’s resonant frequency could be
shifted by the varying compliance of the car’s internal enclosure, the probability of
inducing audible cavity resonances is high.
Unlike a loudspeaker driver’s resonance, the effect of cavity resonances is much harder
to control, as it is inherent to the interior design of the car. Localising these
resonances is complex, as they vary between interior designs. Conventional Sound
Pressure Level (SPL) measurements are impractical, as it is difficult to deduce level
differences between a resonance and background noise when heavy backward masking
effects are present. For the same reason, psychoacoustical models can struggle to deduce
the loudness and roughness of these resonances.
In spite of these difficulties, the effect of ringing noise caused by these resonances
can fortunately be dealt with. But some questions do arise:
1. How does one quantify the loudness and roughness of ringing? As discussed
already, conventional methods are impractical.
2. How much control is necessary to minimise the ringing noise effectively? One
could use filters or damping methods to minimise the effect, but does that
come at the cost of affecting the spectral balance of the audio?
3. How much does a loudspeaker driver’s resonance contribute to the effect of
car-induced resonances, knowing that the primary source of vibrational energy
is the loudspeaker’s driver?
4. How can one determine a standard or metric to give a severity value to ringing?
Since car interior designs change over short periods of time, how does
one ensure that new designs fall within agreeable limits?
2 Theory
A major part of the methodologies used to detect and estimate resonances, as well
as to create the resonance model, involves signal processing concepts and techniques.
This chapter presents the basic theory of the techniques and methods used to
evaluate the perception of loudspeaker ringing. Most of the theory explained in this
chapter is based on signal processing theory.
2.1 Impulse Response
The block diagram in figure 2.1 shows a general representation of a system
with an impulse response h(t), an input signal x(t) and an output signal y(t).
In the Laplace domain, the system is represented as the ratio between the output
signal and the input signal:
\[ H(s) = \frac{Y(s)}{X(s)} \tag{2.1} \]
where H(s), Y(s) and X(s) are the Laplace Transforms of h(t), y(t) and x(t)
respectively. Equation 2.1 gives the transfer function of the system.
2.2 Convolution
Convolution is an operation on two functions that produces a third function, which
expresses how the shape of one is modified by the other. This is an important
concept in signal processing, as it describes how a signal is altered by the
response of a system.
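For sampled signals, the operation described above can be sketched as a direct double loop. This is a minimal illustration only; the function name `convolve` and the example values of `x` and `h` are hypothetical and do not come from the thesis.

```python
def convolve(x, h):
    """Direct discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical input: two impulses of different height.
x = [1.0, 0.0, 0.0, 0.5]
# Hypothetical impulse response: a short decaying tail.
h = [1.0, 0.6, 0.36]

# Each input impulse is replaced by a scaled copy of h.
y = convolve(x, h)
```

Each impulse in the input reappears in the output as a scaled copy of the impulse response, which is how a system response reshapes the signal passing through it.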
In equation 2.2, the envelope of the impulse response h(t) modifies the input signal
x(t), which leads to a modified signal y(t).
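Equation 2.2 itself is not reproduced in this section. Assuming it is the standard convolution integral of a linear time-invariant system (an assumption, since the original equation is not available here), it would read:

```latex
y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau
```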
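2.3 Laplace Transform
The Laplace Transform (LT) of a function f(t) is defined as:
\[ F(s) = \int_{0}^{\infty} f(t)\, e^{-st}\, dt \tag{2.4} \]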
where ‘s’ is a complex number equal to σ + jω. By this definition, a function of
the real variable time ‘t’ is transformed into a function of the complex frequency
variable ‘s’. The LT has an intrinsic property, which states that the convolution of
two signals is equivalent to the product of their Laplace transforms. Looking at
equation 2.2,
\[ y(t) = x(t) * h(t) \tag{2.5} \]
\[ \mathcal{L}[y(t)] = \mathcal{L}[x(t) * h(t)] \tag{2.6} \]
\[ Y(s) = X(s)H(s) \tag{2.7} \]
The LT plays an important role in control theory, as it represents the convolution
of a linear time invariant system as a multiplication and allows the transfer
function of a system to be defined (as shown in equation 2.1).
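2.4 Fourier Transform
The Fourier Transform (FT) of a function f(t) is defined as:
\[ F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \]
Like the LT, the FT has the intrinsic property of representing the convolution of two
signals as the product of their Fourier Transforms. Looking at equation 2.2,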
Equation 2.10 can also represent the transfer function of the system. The main
difference between the transfer functions in the LT domain and the FT domain is that
the LT can represent the transient response of a system, whereas the FT represents
the steady state response of a system.
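The equations that follow equation 2.7 and lead up to equation 2.10 are not reproduced in this section. By analogy with equations 2.5 to 2.7, and assuming the same notation, the Fourier-domain relations would take the following form (equation numbers are omitted because the originals are not available here):

```latex
y(t) = x(t) * h(t)
\quad\Longrightarrow\quad
Y(\omega) = X(\omega)\, H(\omega)
\quad\Longrightarrow\quad
H(\omega) = \frac{Y(\omega)}{X(\omega)}
```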
In signal processing, the FT of a signal is implemented by the Discrete Fourier
Transform (DFT) of the signal, which is given by the equation:
\[ F(n) = \sum_{k=0}^{N-1} f(k)\, e^{-j \frac{2\pi}{N} n k} \tag{2.11} \]
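Equation 2.11 can be evaluated literally, as in the minimal sketch below. The test signal (a 50 Hz sinusoid sampled at 400 Hz) and all names are illustrative assumptions; in practice an FFT routine would be used instead of this direct sum.

```python
import cmath
import math


def dft(f):
    """Direct evaluation of F(n) = sum_k f(k) * exp(-j * 2*pi/N * n * k)."""
    N = len(f)
    return [
        sum(f[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]


# Hypothetical test signal: 50 Hz sinusoid, 40 samples at 400 Hz (5 full periods).
fs = 400.0
N = 40
signal = [math.sin(2.0 * math.pi * 50.0 * k / fs) for k in range(N)]

spectrum = dft(signal)
magnitudes = [abs(v) for v in spectrum[: N // 2]]

# The bins are spaced fs / N apart, so bin n corresponds to n * fs / N Hz.
peak_bin = max(range(N // 2), key=lambda n: magnitudes[n])
peak_freq = peak_bin * fs / N
```

Because the signal contains an integer number of periods, all of its energy falls into the single bin at 50 Hz.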
\[ \phi(t) = \frac{1}{\sqrt{\pi B}}\, e^{j\omega_0 t}\, e^{-t^2 / B} \tag{2.14} \]
where ‘ω0’ indicates the centre frequency of the wavelet, and ‘B’ is the bandwidth
parameter, which controls the decay of the oscillation in the wavelet [9].
[Figure 2.2: Morlet wavelet, unscaled (u = 1) and scaled (u = 20)]
The scaling factor is used to stretch or compress the wavelet, which changes the
frequency of the wavelet. Figure 2.2 shows the Morlet wavelet when unscaled and
scaled. Convolving the impulse response with these scaled wavelets yields a filtered
impulse response, representing an approximate frequency given by the following
equation:
\[ f = \frac{f_c f_s}{u} \tag{2.15} \]
where fc is the centre frequency of the wavelet, fs is the sampling frequency of the
wavelet, and ‘u’ is the scaling factor.
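Equations 2.14 and 2.15 can be sketched in a few lines. The numerical values below (`fc = 0.8125`, `fs = 1200`, `B = 2`) are hypothetical choices made only for illustration, and the function names are not from the thesis.

```python
import cmath
import math


def morlet(t, omega0, B):
    """Complex Morlet wavelet of equation 2.14 evaluated at time t."""
    envelope = math.exp(-(t * t) / B) / math.sqrt(math.pi * B)
    return envelope * cmath.exp(1j * omega0 * t)


def scale_to_frequency(fc, fs, u):
    """Approximate frequency in Hz analysed at scale u (equation 2.15)."""
    return fc * fs / u


# Assumed values, for illustration only.
fc = 0.8125   # centre frequency of the unscaled wavelet, relative to the sample rate
fs = 1200.0   # sampling frequency in Hz

f_u20 = scale_to_frequency(fc, fs, 20)   # 48.75 Hz
f_u40 = scale_to_frequency(fc, fs, 40)   # 24.375 Hz: doubling the scale halves the frequency

# The magnitude of the wavelet is its Gaussian envelope, which B controls.
B = 2.0
peak = abs(morlet(0.0, 2.0 * math.pi * fc, B))
tail = abs(morlet(3.0, 2.0 * math.pi * fc, B))
```

Larger scales stretch the wavelet and therefore probe lower frequencies, which is why a range of scales is needed to cover the low frequency region of interest.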
3 Description
To give a clear understanding of the problem and the scope of the thesis, this chapter
defines Ringing and describes how it is created and how it affects the response
of a system. It also describes how a perceptual model can be implemented to evaluate
the perception of ringing in the spectral balance of audio.
3.1 Ringing
When audio is played through a loudspeaker and contains frequencies that match the
resonance frequency of the loudspeaker, the diaphragm will start to vibrate
indefinitely¹ and produce its own sound. This not only affects the quality of the
audio, but the diaphragm also keeps vibrating and has a sustained effect even after
the signal has stopped. This is conveniently known as "Ringing". The loudspeaker
ringing effect arises because the driver is a form of mechanical mass-spring system,
in which the cone attached to the wire coil acts as the mass, and the foam or rubber
surround that joins the inner and outer edges to the frame acts as the spring² [3].
As a mass-spring system, there will be a frequency that causes the diaphragm to
vibrate indefinitely. This frequency is the mechanical resonance frequency of the
driver. The vibration also generates a back electromotive force (EMF) that travels
back through the loudspeaker cables to the power amplifier. Essentially, any vibrating
object that has a mass-spring characteristic has a resonance frequency, and therefore
has the tendency to induce ringing.
Ringing is a characteristic associated with resonance. The amount of ringing from
a resonance depends on the strength and Q of the resonance. In loudspeakers,
the ringing effect of the mechanical resonance of the driver can be perceptible,
depending on the loudness of the input signal and the strength of the resonance.
Generally, unlike distortion, ringing does not add new frequencies; rather, it
sustains existing frequencies [4]. This may not always be the case if the strength
of the resonance is high. For instance, a room with prominent modes can flatten
audio that has frequency content close to the resonant frequency. The amount of
flatness is proportional to the strength and Q of the room resonance.
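As a rough guide to how Q controls the amount of sustain, the free decay of an idealised, isolated second-order resonance can be written out. This is standard resonator theory rather than a result of this thesis, and the symbols f_0 (resonance frequency), f_d (damped oscillation frequency), τ (time constant of the envelope) and T_60 (time to decay by 60 dB) are introduced here only for illustration:

```latex
h(t) \propto e^{-\pi f_0 t / Q} \sin(2\pi f_d t), \qquad
\tau = \frac{Q}{\pi f_0}, \qquad
T_{60} = \tau \ln(1000) \approx \frac{2.2\, Q}{f_0}
```

Under this idealisation, the decay time grows in direct proportion to Q at a fixed resonance frequency, and a resonance of a given Q rings longer the lower its frequency.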
¹ Theoretically, a resonance will vibrate indefinitely. In reality, a resonance is
subject to damping, so its energy dissipates over time.
² i.e., the diaphragm of the driver.
Figure 3.1: Time Signal and Frequency Response of a Short Duration Pulse (left:
amplitude versus time in seconds; right: magnitude in dB versus frequency in Hz)
The signal is passed through two peak filters, both with a centre frequency fc of
1000 Hz and a gain of 18 dB. The Q factors of the two filters are set to 6 and 24
respectively. The resultant output is shown in figure 3.2.
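The experiment described above can be approximated with the following sketch. The thesis does not state how its peak filters were designed, so the widely used Audio EQ Cookbook peaking form is assumed here; the 48 kHz sampling rate, the unit impulse standing in for the short pulse, and all function names are likewise assumptions made for illustration.

```python
import cmath
import math


def peak_filter(fc, q, gain_db, fs):
    """Second order peaking filter (b, a), Audio EQ Cookbook form (assumed design)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / A
    b = [(1.0 + alpha * A) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * A) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / A) / a0]
    return b, a


def filter_signal(b, a, x):
    """Apply the biquad with a direct form I difference equation."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gain_db_at(b, a, f, fs):
    """Magnitude response in dB at frequency f."""
    z = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))


def ring_time(y, fs, drop_db=60.0):
    """Time of the last sample whose magnitude is within drop_db of the peak."""
    threshold = max(abs(v) for v in y) * 10.0 ** (-drop_db / 20.0)
    last = max(n for n, v in enumerate(y) if abs(v) >= threshold)
    return last / fs


fs = 48000.0                                   # assumed sampling rate
pulse = [1.0] + [0.0] * (int(0.5 * fs) - 1)    # unit impulse standing in for the pulse

results = {}
for q in (6.0, 24.0):
    b, a = peak_filter(1000.0, q, 18.0, fs)
    output = filter_signal(b, a, pulse)
    results[q] = (gain_db_at(b, a, 1000.0, fs), ring_time(output, fs))
```

Both filters give the same 18 dB boost at 1000 Hz, but the output of the Q = 24 filter takes noticeably longer to fall 60 dB below its peak, which is the sustain referred to as ringing.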
[Figure 3.2: output of the pulse passed through the two peak filters; amplitude
versus time in seconds]
The effect of ringing can be clearly seen in both cases. Although this example
uses a pulse, the same effect occurs with audio or speech passed through a
loudspeaker. When the input signal contains frequencies close to the centre
frequency, the result is a boost as well as sustain. The amount of sustain is
proportional to the Q value. In figure 3.3a, although the Q is seen to be
[Figure: Frequency Response of Pulse with Filter (magnitude in dB versus frequency
in Hz). Filter parameters: fc = 1000 Hz, Gain = 18 dB; (a) Q = 6, (b) Q = 24]
The boost at 1000 Hz can clearly be seen. However, the frequency response gives no
information on the evolution of frequencies over time. A Short Time Fourier
Transform (STFT) is needed to obtain a time - frequency display. The STFT is subject
to a trade-off between time and frequency resolution, which depends on the type of
window used to compute the plot. An alternative to the general STFT is the
cumulative spectral decay (CSD), which is a form of STFT that uses a unit step
function as the window.
[Figure: CSD plots of the filtered pulse; (a) Q = 6, (b) Q = 24]
The spectral decay profiles are more prominent, as shown in figure 3.5. The sustain
observed in figure 3.2 can now be correlated with the given CSD plots. It is much
clearer to see how the time taken for the resonance to decay varies with Q.
Figure 3.5: Impulse Response, Frequency Response and CSD of Genelec 8020B
Monitor Loudspeaker (fs = 2400 Hz; panel (c): CSD)
The drop in magnitude observed in figure 3.5b is caused by the anti-aliasing filter
as the frequency approaches the Nyquist frequency, and is a consequence of
resampling the impulse response.
The same filters used earlier with Q values of 6 and 24 are used on the IR, with
the same gain of 18 dB. This time, the centre frequency is shifted to 45 Hz instead,
since the system’s response starts to be flat from 40 Hz, as observed in 3.5b, as well
³ The procedure and reason for downsampling are explained in Chapter 4.
Figure 3.6: Impulse Response, Frequency Response and CSD of Genelec 8020B
Monitor Loudspeaker through filters (filter parameters: fc = 45 Hz, Gain = 18 dB)
As expected, the filters boost the response at 45 Hz and produce a long sustain. The
filters also alter the magnitude of neighbouring frequencies. This is similar to the
phenomenon observed with room modes, as explained earlier: frequencies around the
centre frequency have an altered response due to the strong resonance.
system identification method involving the Steiglitz - McBride Algorithm [11]. A
perceptual model may be needed to test the effect of ringing through a series of
listening tests, in order to justify the methodology. To conduct these listening
tests properly, and to give due consideration to the metric needed for reference,
three perceptual models are created, answering three questions:
• Threshold of Audibility: What is the minimum level, relative to the level of
the audio, at which a listener is able to perceive the ringing effect?
• Threshold of Equivalence: What level boost is required for a non-resonant
system to have the same bass energy as a system with resonance?
• Threshold of Flatness: What level cut is required for a resonant system
to have a flat response?
The second and third questions listed above are rather intuitive, as they indicate
the energy required to produce a perceptible resonance and the energy required to
nullify the audibility of the resonance, respectively.
Creating a loudness metric for the perception of resonance ringing is a rather
complex task, as several factors need to be taken into account, such as the acoustic
environment, which can influence the audibility of these resonances [4]. Several
tasks are therefore required in order to create the metric, and each task has to be
examined thoroughly before proceeding to the next. As a starting point, and as the
scope of this thesis, it is necessary to verify whether resonance ringing has any
influence on the spectral balance of audio, and to create a methodology to detect
and estimate these resonances.
It is then necessary to have a range of centre frequencies along with their
corresponding Q values, in order to evaluate several loudspeaker cases, such as
subwoofers and mid-range drivers. The main frequency of interest is the low
frequency resonance, i.e. the mechanical resonance of the loudspeaker driver as well
as the additional resonances from cavities and enclosures in car bodies. The low
frequency resonances of drivers may be expected to range between 10 - 90 Hz, but
this is only an estimate, and a more rigorous and exact determination of a
loudspeaker’s resonance is necessary. To this end, two methods are followed, each of
which determines the centre frequency and the corresponding Q value. The two methods
are compared to show their accuracy and reliability. This requires impulse response
measurements of several loudspeakers, in order to cover a variety of cases. Both
methods, as well as the measurement procedure, are explained in detail in Chapter 4.
Once the range of centre frequencies and corresponding Q values has been determined,
the listening test model can be implemented to answer the above questions, which are
needed to evaluate the possibility of creating the loudness metric as a reference.
4 Methods
[Figure: block diagram of the impulse response measurement; recoverable labels:
1 m, Signal Conditioner, DAQ, PC]
A CSD plot essentially represents the decay in magnitude of each frequency over time.
The CSD of a loudspeaker’s driver can be determined from the impulse response
measurement described in the previous section.
Following the equation described in Chapter 2, the CSD is the FFT of the signal
computed at different time intervals. This is obtained by multiplying the signal by
a series of unit step windows separated by a time interval ∆t, as illustrated below:
Figure 4.2: Impulse Response, Unit Step Window and Sliced Impulse Response
(each panel: amplitude versus time in seconds)
The signal, the impulse response of the KH805 subwoofer, is multiplied by a
unit step window starting at time ∆t, after which the FFT of the resultant signal,
shown in figure 4.2c, is computed. The result corresponds to the frequency response
of the subwoofer after ∆t seconds. In essence, to compute the CSD, the first slice
is the FFT of the signal with the unit step window starting at t = 0, and the
subsequent slices use windows starting at multiples of the time interval ∆t, i.e.,
∆t, 2∆t, 3∆t, 4∆t, etc.
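The slicing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the impulse response is a synthetic decaying 50 Hz sinusoid rather than a measured loudspeaker, the spectrum is evaluated at only a handful of frequencies to keep the example small, and the function names are hypothetical. The thesis itself uses MATLAB and an FFT.

```python
import cmath
import math


def dft_magnitude(x, freq_hz, fs):
    """Magnitude of the Fourier sum of x evaluated at a single frequency."""
    w = -2j * math.pi * freq_hz / fs
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(x)))


def csd(h, fs, dt, freqs):
    """One spectrum in dB per unit step slice starting at 0, dt, 2*dt, ..."""
    step = max(1, round(dt * fs))
    slices = []
    for start in range(0, len(h), step):
        # Unit step window: everything before `start` is set to zero.
        windowed = [0.0] * start + list(h[start:])
        slices.append(
            [20.0 * math.log10(dft_magnitude(windowed, f, fs) + 1e-12) for f in freqs]
        )
    return slices


# Synthetic stand-in for a measured impulse response (assumed values).
fs = 1200.0
n_samples = 600
h = [
    math.exp(-n / (0.1 * fs)) * math.sin(2.0 * math.pi * 50.0 * n / fs)
    for n in range(n_samples)
]

freqs = [30.0, 40.0, 50.0, 60.0, 70.0]
slices = csd(h, fs, dt=0.05, freqs=freqs)

# Level of the 50 Hz resonance in each successive slice.
level_at_resonance = [s[2] for s in slices]
```

Plotting the slices one behind the other gives the waterfall view, in which the level at the resonance frequency falls from slice to slice as the window start moves later in time.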
Computing the CSD over the whole frequency range is computationally expensive,
and requires a large amount of RAM in order to process the information
efficiently. Moreover, this expense can become larger, depending on the resolution
[Figure: CSD waterfall plot; axes: time in seconds, frequency in Hz, magnitude in dB]
These plots are drawn as waterfall plots, since this kind of plot clearly represents
the decay of frequencies over time. The clarity of the content in the plots depends
on the time interval ∆t. For a signal of 1 second, a small ∆t is needed in order to
have enough slices within the length of the signal. For example, if a time interval
of 0.1 seconds is chosen at a sampling frequency of 1200 Hz, this time interval
corresponds to 120 samples. Hence, for a signal of 1200 samples in length, only 10
slices are obtained. This is illustrated in figure 4.3.
Although the magnitude decay can be seen in figure 4.3, the lack of slices makes
individual events indistinguishable. A small ∆t is required in order to observe the
events in the plot. For the same resampled signal, a time interval of ∆t = 0.001
seconds is chosen, which corresponds to 1000 slices.
This leads to the following plot:
The increase in the number of slices improves the clarity and resolution of the
information shown in the plot.
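The slice counts quoted above follow from simple arithmetic, which can be checked with the short sketch below. The helper names are hypothetical; the sampling frequency, signal length and time intervals are the ones given in the text.

```python
def samples_per_slice(dt, fs):
    """Number of samples (possibly fractional) between consecutive slices."""
    return dt * fs


def slice_count(duration, dt):
    """Number of slices that fit into a signal of the given duration."""
    return round(duration / dt)


fs = 1200.0       # sampling frequency in Hz, as in the text
duration = 1.0    # signal length in seconds, as in the text

# Coarse interval: 0.1 s per slice.
coarse = (samples_per_slice(0.1, fs), slice_count(duration, 0.1))

# Fine interval: 0.001 s per slice.
fine = (samples_per_slice(0.001, fs), slice_count(duration, 0.001))
```

Note that the fine interval corresponds to 1.2 samples, which is not a whole number of samples, so the start of each slice generally falls between two samples.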
Although the plot in figure 4.4 appears clear, it can be further improved by
modifying the unit step window, a process known as apodization, a concept introduced
by John D. Bunton and Richard H. Small [8]. The authors suggest smoothing the edges
and ridges by using alternative windows such as a rectangular window, triangular
window, Kaiser window, Blackman window or Gaussian window. These windows are applied
in the same way as the unit step window, i.e., as time slices separated by a time
interval ∆t. Of all the windows, the Gaussian window gives the best smoothing
outcome. The reason for this modification will be explained in detail in Chapter 5.
Upon applying the modified window, the following plot is obtained:
Clearly, the content in figure 4.5 is much smoother than in figure 4.4, now that
the edges and ridges have been smoothed. In this case, a time interval of
0.005 seconds is chosen, because the apodization method employed requires the
time interval in samples to be a factor of the total number of samples of the
time signal. Previously, the CSD was computed using the heaviside function
in MATLAB, which is essentially the unit step function. Any time interval can be
used in that case, since the function uses interpolation to find the probable
location of the time interval between two samples. This interpolation is not
necessary when implementing the apodization, since the chosen time interval gives
a good resolution, as seen in figure 4.5.
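The slicing procedure can be sketched as follows in plain Python. This is not the MATLAB implementation used in this work: the impulse response is a synthetic decaying 50 Hz sinusoid, the DFT is a naive one, and the smooth window is an assumed Gaussian-shaped onset standing in for the apodized windows of [8]. All names and parameter values are illustrative.

```python
import cmath
import math

def dft_mag_db(x, n_bins):
    """Magnitude in dB of the first n_bins DFT bins (naive DFT, short signals only)."""
    n = len(x)
    mags = []
    for k in range(n_bins):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(20 * math.log10(abs(acc) + 1e-12))
    return mags

def csd(ir, fs, dt, sigma=0.0, n_bins=32):
    """Return one spectrum per time slice.

    sigma == 0 -> unit step window (hard cut at the slice start).
    sigma > 0  -> assumed Gaussian-shaped onset of width sigma seconds.
    """
    step = round(dt * fs)                       # slice spacing in whole samples
    slices = []
    for start in range(0, len(ir), step):
        windowed = []
        for t, v in enumerate(ir):
            if t < start:
                w = 0.0
            elif sigma == 0.0:
                w = 1.0
            else:
                u = (t - start) / fs
                w = 1.0 - math.exp(-u * u / (2 * sigma * sigma))
            windowed.append(w * v)
        slices.append(dft_mag_db(windowed, n_bins))
    return slices

# Synthetic test signal (assumption): 0.2 s decaying 50 Hz resonance at fs = 1200 Hz
fs = 1200
ir = [math.exp(-20 * t / fs) * math.sin(2 * math.pi * 50 * t / fs) for t in range(240)]
hard = csd(ir, fs, dt=0.02)                     # unit step window
soft = csd(ir, fs, dt=0.02, sigma=0.005)        # smoothed onset
peak_decay = [s[10] for s in hard]              # bin 10 = 50 Hz (5 Hz per bin)
```

Here the slice spacing is a whole number of samples, so no interpolation is needed, which mirrors the constraint described above for the apodized case.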
The centre frequency of the resonance can be determined at this stage by
calculating the decay times for a set of magnitude drops, ranging from a -5 dB
drop to a -40 dB drop in steps of 5 dB. This can be achieved by linear
regression, in which a line is fitted to the decay of each frequency and the
time taken for the required drop in level is read from that line.
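A minimal sketch of such a decay time estimate is given below. It assumes that the line is fitted to the points from the start of the decay down to the requested drop, which is one possible reading of the procedure; the function name and the synthetic decay curve (an exact 100 dB/s decay) are illustrative only.

```python
def decay_time(times, levels_db, drop_db):
    """Least-squares line through the decay curve; time for the line to fall by drop_db."""
    ref = levels_db[0]
    pts = []
    for t, level in zip(times, levels_db):
        if level < ref - drop_db:               # stop at the first point below the drop
            break
        pts.append((t, level))
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = num / den                           # dB per second, negative for a decay
    return drop_db / -slope

# Synthetic decay (assumption): slices every 5 ms, level falling at 100 dB/s
times = [i * 0.005 for i in range(100)]
levels = [-100.0 * t for t in times]
profile = {drop: decay_time(times, levels, drop) for drop in range(5, 45, 5)}
```

Repeating this for every frequency bin gives a decay time profile, in which a resonance shows up as a frequency with noticeably longer decay times than its neighbours.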
The quality factor Q is calculated by estimating the frequency bandwidth at a
3 dB drop from the level at the centre frequency. As can be seen from the CSD
example in figure 4.5, the resonances become visible in the plot only after a
certain time. In the frequency response of the entire impulse response, the
exact 3 dB points around the centre frequency cannot be determined, because the
magnitudes of the adjacent frequencies obscure the resonance. In a slice taken
at a later time, the shape of the resonance is clearer, and so the frequency
bandwidth can be determined.
The Q value can be determined from the following equation:
\[ Q = \frac{f_c}{\Delta f} \tag{4.1} \]
\[ Q = \frac{f_c}{f_2 - f_1} \tag{4.2} \]
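As an illustration of equations 4.1 and 4.2, the following sketch locates the peak of a single magnitude slice and interpolates the two 3 dB points f1 and f2 on either side. The slice used here is a synthetic second-order resonance with a known Q of 5 at 50 Hz; the function name and the test values are assumptions made for the example.

```python
import math

def q_from_slice(freqs, mags_db, drop_db=3.0):
    """Return (centre frequency, Q) from the bandwidth drop_db below the peak."""
    i_pk = max(range(len(mags_db)), key=mags_db.__getitem__)
    target = mags_db[i_pk] - drop_db

    def crossing(indices):
        prev = i_pk
        for i in indices:
            if mags_db[i] <= target:
                # linear interpolation between the last point above and first below
                frac = (mags_db[prev] - target) / (mags_db[prev] - mags_db[i])
                return freqs[prev] + frac * (freqs[i] - freqs[prev])
            prev = i
        raise ValueError("no crossing found on this side of the peak")

    f1 = crossing(range(i_pk - 1, -1, -1))      # lower 3 dB frequency
    f2 = crossing(range(i_pk + 1, len(freqs)))  # upper 3 dB frequency
    return freqs[i_pk], freqs[i_pk] / (f2 - f1)

# Synthetic slice (assumption): ideal resonance at 50 Hz with Q = 5, 0.1 Hz grid
f0, q_true = 50.0, 5.0
freqs = [20.0 + 0.1 * i for i in range(601)]
mags = [-10 * math.log10(1 + (q_true * (f / f0 - f0 / f)) ** 2) for f in freqs]
fc, q = q_from_slice(freqs, mags)
```

The estimate depends on the frequency resolution of the slice, since f1 and f2 are interpolated between neighbouring bins.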
The concept of wavelets has been used for many decades, and in recent years it
has found application in loudspeaker response analysis. The authors in [2], [9]
and [10] demonstrate its use in giving a time-frequency display of loudspeaker
responses, as well as in modal analysis and reverberation time estimation.
Footnote 1: In CWT analysis, the scale coefficients represent the content of the
signal at an approximate frequency range.
In the case of loudspeaker response analysis, the Morlet wavelet is used, since
its properties satisfy the conditions necessary to extract the information
needed in this case [9][10]. The authors in [10] propose to exploit the
convolution property of the FFT, since direct convolution can be computationally
expensive. In other words, the CWT can alternatively be computed by taking the
product of the FFT of the impulse response and the FFT of the wavelet, and then
taking the IFFT of the result.
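The FFT-based computation can be sketched as below. This is a simplified illustration rather than the implementation used in this work: the Morlet wavelet is written directly in the frequency domain as a Gaussian centred on the analysis frequency (its small correction term and any normalisation are omitted), the FFT is a basic radix-2 routine, and the test signal and the value w0 = 6 are assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]

def morlet_cwt(x, fs, freqs, w0=6.0):
    """|CWT| of x at the given analysis frequencies, one row per frequency."""
    n = len(x)
    spectrum = fft(x)
    rows = []
    for f in freqs:
        product = []
        for k in range(n):
            fk = k * fs / n
            # analytic Morlet in the frequency domain: positive frequencies only
            psi = math.exp(-0.5 * (w0 * (fk / f - 1.0)) ** 2) if 0 < k < n // 2 else 0.0
            product.append(spectrum[k] * psi)
        rows.append([abs(v) for v in ifft(product)])
    return rows

# Synthetic impulse response (assumption): decaying 50 Hz resonance, fs = 1200 Hz
fs = 1200
x = [math.exp(-20 * t / fs) * math.sin(2 * math.pi * 50 * t / fs) for t in range(512)]
rows = morlet_cwt(x, fs, [25.0, 50.0, 100.0])
peaks = [max(r) for r in rows]
```

Each row is the time envelope of the response around one analysis frequency, and the row at the resonance frequency carries the largest and longest-lasting envelope.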
The plot clearly shows the sustain effect of the resonances. The balance between
time and frequency resolution is adequate to distinguish the events visually.
At times it may not be necessary to view the plot in 3D, as transient events are
clearly distinguishable when viewed in 2D. This can be seen in figure 4.7.
[Figure 4.7: 2D view of the CWT plot; frequency axis: Frequency in Hz]
Detecting and estimating the resonances proved challenging in this case. Although
the CWT plots show the effect of resonances clearly, detecting the resonance
through the decay profile method was difficult, mainly because the data points
in each frequency slice follow an almost perfect curve, so that fitting a line
through this data set is impractical. Moreover, since the frequency range is
given in terms of scale coefficients, increasing the number of frequency bins is
redundant.
This has led to an alternative approach to detect and estimate the resonances.
In this case, a more manual approach, based on a postulate given in [2], is
followed. It states that an ideal system containing resonances with equal Q and
equal relative bandwidth shows a true, perceptually relevant visualization of
these resonances. An example plot obtained from [2] shows the decay pattern of
such a system.
What matters in this diagram is the behaviour and visual pattern of the
resonances. With this as the visual reference for identifying a resonance, the
estimated CWT is now plotted in 2D, in this case approximate frequency against
magnitude.
Figure 4.8: Ideal system with two resonances of equal Q - Source : [2]
[Figure 4.9: CWT slices in 2D; x-axis: Approximate Frequency in Hz, y-axis: Magnitude in dB]
Figures 4.8 and 4.9a are compared in order to identify the possible resonance
locations. Because of the high density of slices in figure 4.9a, 12 slices are
chosen for better visualization, as shown in figure 4.9b. The annotated
frequencies in the plot show the locations of the resonances. As mentioned
already, this is a rather manual way of identifying the resonances, since it
relies on visual interpretation instead of automating the process as in the
other methods. However, since the goal is to verify whether the CWT can detect
and estimate the resonances, it is fair to follow this method.
As in the CSD method, a slice after a certain time is taken, and the Q values at
the located resonances are estimated. The following table shows the estimated
results for all the loudspeakers.
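\[ H_R(s) = \frac{R}{s + \alpha + j\beta} + \frac{R^{*}}{s + \alpha - j\beta} \tag{4.3} \]
\[ H_R(s) = 2r\,\frac{s + \alpha + \beta\,\frac{v}{r}}{s^{2} + s\,\frac{\omega_n}{Q} + \omega_n^{2}} \tag{4.4} \]
where, with the residue written as R = r + jv,
\[ \omega_n = \left(\alpha^{2} + \beta^{2}\right)^{1/2} \quad \text{(natural resonant frequency)} \]
\[ Q = \frac{\omega_n}{2\alpha} \quad \text{(Q factor)} \]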
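The step from equation 4.3 to equation 4.4 can be written out as follows. This is a worked sketch that assumes the residue is written as R = r + jv, with r and v real; it is added here for clarity and is not taken from [2].

```latex
\begin{align*}
H_R(s) &= \frac{R\,(s+\alpha-j\beta) + R^{*}(s+\alpha+j\beta)}{(s+\alpha)^{2}+\beta^{2}} \\
       &= \frac{(R+R^{*})(s+\alpha) - j\beta\,(R-R^{*})}{s^{2}+2\alpha s+\alpha^{2}+\beta^{2}} \\
       &= \frac{2r\,(s+\alpha) + 2\beta v}{s^{2}+2\alpha s+\alpha^{2}+\beta^{2}}
        = 2r\,\frac{s+\alpha+\beta\,\frac{v}{r}}{s^{2}+2\alpha s+\alpha^{2}+\beta^{2}}
\end{align*}
```

Comparing the denominator with \(s^{2} + s\,\omega_n/Q + \omega_n^{2}\) gives \(\omega_n^{2} = \alpha^{2}+\beta^{2}\) and \(\omega_n/Q = 2\alpha\), which are the two relations stated above.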
From the above equation, upon estimation of the poles, the centre frequency and
corresponding quality factor of the resonance can be determined. This requires the
estimation of the transfer function coefficients in the S-domain. It is given by the
following equation:
\[ H(s) = \frac{B(s)}{A(s)} = \frac{\sum_{i=0}^{m} b_i s^{i}}{\sum_{k=0}^{n} a_k s^{k}} \tag{4.5} \]
It can be seen that, to estimate the transfer function of the resonant component
of the system, knowledge of the coefficient values a_k and b_i, as well as of the
orders m and n, is needed. The authors in [2] and [3] propose methods to
reconstruct the transfer functions through mathematical models and algorithms.
The Steiglitz-McBride nonlinear least squares estimation method [2] is used in
this case. Unfortunately, estimation of poles and zeros in the S-domain is
numerically unstable. To counteract this, the estimation of poles and zeros is
done in the Z-domain instead. The corresponding transfer function in the
Z-domain is as follows:
\[ H(z) = \frac{B(z^{-1})}{A(z^{-1})} = \frac{\sum_{i=0}^{m} b_i z^{-i}}{\sum_{k=0}^{n} a_k z^{-k}} \]
The above equation can be reduced by partial fraction expansion, in which the
constituent residues and poles resemble the transfer function given in equation
4.3. The Z-domain poles can be converted into S-domain poles by the impulse
invariant transformation method, given by the following equation:
\[ s_{p_k} = \frac{1}{T} \ln z_{p_k} \]
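The conversion from a Z-domain pole to a centre frequency and Q factor can be sketched as follows, taking T as the sampling period 1/fs. The function name is illustrative; the round-trip check uses the 46 Hz, Q = 6.1 resonance from the results as an example value and builds the Z-domain pole from it under the same impulse invariant mapping.

```python
import cmath
import math

def pole_to_resonance(z_pole, fs):
    """Map a Z-domain pole to (frequency in Hz, Q) via s = ln(z) / T."""
    s = cmath.log(z_pole) * fs                  # T = 1 / fs
    alpha, beta = -s.real, abs(s.imag)          # pole at s = -alpha +/- j*beta
    wn = math.hypot(alpha, beta)                # natural frequency in rad/s
    return wn / (2 * math.pi), wn / (2 * alpha)

# Round trip for an example resonance at the downsampled rate fs = 1200 Hz
fs = 1200.0
wn = 2 * math.pi * 46.0
q_true = 6.1
alpha = wn / (2 * q_true)
beta = math.sqrt(wn ** 2 - alpha ** 2)
z = cmath.exp(complex(-alpha, beta) / fs)
f_est, q_est = pole_to_resonance(z, fs)
```

The principal branch of the logarithm is sufficient here because the pole angle of a resonance below fs/2 always lies within ±π.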
From here, the frequency and quality factor of each pole can be determined. The
resonance frequencies found from the decay time estimation can then be matched
against the pole frequencies, which gives the corresponding quality factor.
The Steiglitz-McBride nonlinear least squares estimation of the transfer function
is a complex problem, and solving it by hand can be tedious. Fortunately, MATLAB
provides a function called stmcb() that estimates the transfer function
coefficients. The function requires the orders of the numerator and denominator
as input, and the accuracy increases with the order. At the original sample
rate, fs = 51.2 kHz, a very large order would be required, which is
computationally expensive. As for the previous methods, the impulse response is
therefore downsampled by decimation to a sample rate of fs = 1200 Hz.
This method of determining the resonant frequencies and corresponding Q factors
is repeated for different loudspeaker measurements, which gives the range of
driver resonance frequencies and their corresponding quality factors. The range
of centre frequencies and Q factor values is given in the table below:
which block models can be built and executed in real time. It is widely used in
simulations of many processes and systems in the engineering field, and in
combination with MATLAB it provides good post-processing capability. Given below
are the models that are used to perform the three listening tests.
The first model consists of an input audio channel, for which two audio signals
have been chosen: a recorded jazz ensemble and a synthetic electronic ensemble.
The audio signal is fed to a resonance filter with parameters estimated in the
previous section, and the filter output is added back to the original audio
signal. The resonances are delayed in time, to emulate the response of delayed
resonances with 0 ms amplitude response. The delay times are chosen arbitrarily,
but with the aim of representing a realistic delayed-resonance scenario. The
model is integrated with a Graphical User Interface (GUI) with a setting that
changes the gain of the resonance, so that the user can adjust the audibility of
the resonance, as well as an A/B option to compare the reference audio signal
with the stimulus.
The second model uses the same input audio channel. The audio signal is
convolved with a Synthetic Impulse Response (SIR), which has the response of the
delayed resonances that were used in the first model. This SIR is created by
passing an ideal pulse through the same resonance filter used in the first model
and adding the result back to the original pulse. The resultant SIR is convolved
with the audio signal by the overlap-save fast convolution method, and the result
is fed to a boost peak filter with adjustable gain. The output signal from the
peak filter is compared with the original audio signal, and this comparison
suggests the similarity in audible resonance between the two. A GUI allows the
user to adjust the gain of the peak filter and to indicate the amount of
similarity between the two signals.
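The construction of the SIR can be sketched in Python as follows. The structure of the resonance filter is not specified here, so a standard second-order band-pass section (constant 0 dB peak gain) is assumed; the sample rate, Q, gain and delay are likewise hypothetical values chosen only for illustration.

```python
import math

def bandpass_biquad(f0, q, fs):
    """Assumed resonance filter: second-order band-pass with 0 dB peak gain."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def biquad_filter(b, a, x):
    """Direct-form difference equation for one second-order section."""
    y = []
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y.append(acc)
    return y

def synthetic_ir(f0, q, fs, n, gain_db, delay_s):
    """Ideal pulse plus a delayed, scaled copy of the resonance filter's response."""
    pulse = [1.0] + [0.0] * (n - 1)
    b, a = bandpass_biquad(f0, q, fs)
    ring = biquad_filter(b, a, pulse)
    d = round(delay_s * fs)                     # delay in samples
    g = 10 ** (gain_db / 20)
    return [pulse[i] + (g * ring[i - d] if i >= d else 0.0) for i in range(n)]

# Hypothetical settings: 60 Hz resonance, Q = 8, 10 ms delay, fs = 48 kHz
sir = synthetic_ir(60.0, 8.0, 48000, 2000, 0.0, 0.01)
```

Convolving an audio signal with such an SIR gives the original signal plus its delayed resonant component, which is what the first model produces directly with the filter in the signal path.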
The third model is similar to the second model, with the difference that the
convolved signal is passed through a cut peak filter. On the same GUI, the user
removes the effect of the resonance by changing the gain of the peak filter, and
the result is compared with the original audio signal.
The first model is used to determine the threshold of audibility, and forms the
first listening test. This test is undertaken separately from the other two, as
its results determine the level at which the chosen resonances are audible for
the given audio signal. With this information, the initial conditions for the
resonance filter are set such that the resonance is clearly audible in the
second and third tests.
Audibility to find the minimum level at which a resonance is perceived. The second
one will test both the Threshold of Equivalence and the Threshold of Flatness, to
find the level required for a peak filter to match the level of the resonance and to
remove the effect of the resonance respectively.
The control panel for the first listening test is shown in figure 4.13. It
consists of a gain slider, a reference-to-stimulus toggle switch, playback
buttons and a Next button. This control panel is used for testing and
controlling the settings of Model 1, shown in figure 4.10. The gain slider
adjusts the gain level of the resonance filter in figure 4.10, and the
reference-to-stimulus switch toggles between the unfiltered audio track and the
filtered audio track. The Next button saves the gain level value and loads a new
stimulus setting into the resonance filter. The markings on the side of the gain
slider deliberately do not reveal the actual gain level of the resonance filter,
since the test requires the level to be hidden from the user as part of the
experiment.
Unlike conventional A/B comparison listening tests, this listening test runs in
real time. This is essential for all tests, since the threshold level varies
with each subject’s hearing sensitivity. The users are able to adjust the level
of the filter whilst listening to the change in auditory perception of the test
audio, which greatly improves a subject’s accuracy in determining their
threshold.
The control panel for the second listening test is shown in figure 4.14. This
panel has the same controls as shown in figure 4.13, with the addition of a
similarity slider, which is used to rate the similarity between the reference
and stimulus audio. This panel is used to control the second and third listening
models. Moreover, the gain slider in this control panel adjusts the level of the
peak filter instead of the resonance filter. As in the previous test, the gain
level markings are kept ambiguous, in order to hide the actual level set on the
filter, and the test runs in real time.
are randomised. Moreover, to check that the subjects choose their levels
consistently, the audio samples are repeated once, which increases the total
number of audio samples to 36. Furthermore, the gain levels of the resonance
filter vary randomly with each change in audio sample, so that the subjects
cannot anticipate the settings.
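A possible way to build such a randomised trial list is sketched below. Only the repetition and the total of 36 trials are taken from the text; the stimulus labels, the function name and the range of the random starting gain are hypothetical placeholders.

```python
import random

def build_trials(unique_stimuli, repeats=2, gain_range_db=(-20.0, 20.0), seed=None):
    """Return a shuffled list of (stimulus, random starting gain in dB) pairs."""
    rng = random.Random(seed)
    trials = [s for s in unique_stimuli for _ in range(repeats)]
    rng.shuffle(trials)                         # randomise the presentation order
    return [(s, rng.uniform(*gain_range_db)) for s in trials]

# 18 unique stimulus settings presented twice each give the 36 trials in the text;
# the labels below are placeholders, not the actual stimulus definitions
stimuli = ["stimulus_%02d" % i for i in range(18)]
trials = build_trials(stimuli, seed=1)
```

Fixing the seed is useful only for debugging; in an actual test each subject would receive a different order.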
In the first listening test, the reference signal is the raw audio signal and
the stimulus signal is the audio signal with the delayed resonance. The test
commences when the playback button is pressed. While the audio signal runs, the
subject can switch between the reference channel and the stimulus channel
without stopping the playback. The subject can also adjust the gain level of the
resonance filter while the audio is running. The task in this test is for the
subject to adjust the gain level of the resonance filter to the level at which
the resonance in the stimulus channel is just audible. Upon determining this
level, the subject presses the Next button, after which the chosen gain is
recorded and the next resonance setting is loaded, all while the audio signal is
running. The task is carried out for all the stimulus settings.
In the second listening test, there are two separate settings involved, depending
on the listening model. For the second listening model, the reference signal is
the audio signal with the delayed resonance and the stimulus signal is the audio
signal with the boost peak filter. For the third listening model, the reference
signal is the raw audio signal and the stimulus signal is the audio signal with
the delayed resonance and the cut peak filter. As with the first listening
model, the test is carried out without stopping the audio playback, and the
subject can switch between the reference channel and the stimulus channel while
the audio is playing. The gain slider alters the gain level of the peak filter,
while the gain of the resonance filter is fixed. The task for the subject this
time is to adjust the peak filter gain, depending on the listening model. For
the second listening model, the task is to adjust the peak filter gain so as to
match the loudness level of the reference signal. For the third listening model,
the task is to adjust the peak filter gain so as to remove the effect of the
resonance in the stimulus channel. Once the subject has determined the adequate
level, the subject presses the Next button, after which the chosen gain is
recorded and the next resonance setting is loaded.
The listening tests were conducted in the listening test room at the Division of
Applied Acoustics, Chalmers University of Technology. The test ran directly from
MATLAB into the Motu Ultralite MK4 sound interface, which gives an adequately
flat response and a reasonable sound level. For a clear perception of low
frequency resonances, conventional loudspeakers would not suffice, as the room
may influence the experiment. To avoid this, the subjects listened through
Harman AKG K702 headphones, which have a flat frequency response, especially in
the low frequency region.
(a) Motu Ultralite MK4 Sound interface (b) Harman AKG K702 Headphones
5
Results
All three methods agree with each other in estimating the centre frequency and
the quality factor, with a nominal degree of variance. The following sections
summarise each method in turn, discussing the observations made with it as well
as its limitations.
Although the methodology was applied to all the loudspeakers, the following plots
and observations are based on the Neumann KH805 subwoofer, since the events that
occur are quite similar across the loudspeakers. The results for each
loudspeaker are given in the Appendix.
The centre frequency of the loudspeaker’s driver resonance was identified through
the decay time profile plot, as shown in figure 5.1.
It may be noticed from figure 5.1 that there seems to be an increase in decay
times for larger dB drops. This may be a consequence of the linear fit method
used to estimate the decay times, as the slope of a line fitted through a series
of data points varies with the dB drop value. To illustrate this, a comparison
of two frequencies for different drop levels is given.
As the decay pattern differs for each frequency, a line fitted to the data points
has a different slope for each dB drop, and for larger dB drops the slope
differs drastically. As a result, the estimated decay time, which is based on
the time difference between the two points on the line corresponding to the dB
drop, will not necessarily correspond to the actual drop in level observed in
the CSD plot. A possible solution would be to fit two lines instead of one, each
line corresponding to one of the probable slopes of the data set. This would be
needed if one wanted to observe the decay time for large drops in level, but it
is not necessary in this case, since the point of interest is to identify the
resonances, whose centre frequencies tend to have longer decay times than other
frequencies, as clearly observed in figure 5.1.
The accuracy in estimating the centre frequency depends on three major factors.
The first factor is the unit step window used. As stated in Chapter 4, in order
to have a better resolution of the content and events in the CSD, it is
necessary to apodize the window used to compute the CSD, owing to the spectral
leakage caused by the unit step window [8]. When computing the FFT of a signal,
the signal is assumed to be periodic over the whole signal length. This means
that the moment the signal is altered, the periodicity changes, which affects
the spectral content. Moreover, in this case, the abrupt change in signal
content caused by the unit step window introduces artefacts into
the result, affecting the spectral content. This is clearly seen in figure la, wherein
the magnitudes of certain slices are suddenly larger than those of the preceding
slice, especially in the low frequency region. This is a clear consequence of
spectral leakage. Apodization helps to reduce the effect of spectral leakage by
smoothing the abrupt change caused by the unit step window.
The second factor is the length of the time signal. The number of samples within
one second can have an effect on the resolution of the events that occur. To
illustrate this factor, the following plots compare CSDs computed without
apodization:
Figure 5.2 shows a comparison of the CSD of the subwoofer for varying impulse
response lengths. Clearly, an event occurs in the 0.4-0.5 second time range that
cannot be seen in figure 5.2a. It is crucial to have a long impulse response
signal in order to observe the events that can occur, even though the impulse
responses of loudspeakers are supposed to be short.
A number of challenges arose in the quality factor estimation. Initially, owing
to the spectral leakage introduced by the unit step window, the resolution of
the CSD plot was unclear, and the method applied to estimate the Q factor gave
erroneous values. This led to the application of the apodization technique to
improve the resolution, which gave more reliable results.
The following table gives the estimated centre frequencies and their
corresponding quality factors for all the given loudspeakers.
Table 5.1: Estimated Centre Frequency and Quality Factor for Loudspeakers using
CSD
Table 5.2: Estimated Centre Frequency and Quality Factor for Loudspeakers using
CWT
[Figure 5.3: Original IR and Reconstructed IR; x-axis: Time in Seconds, y-axis: Amplitude]
using the in-built stmcb() function in MATLAB, which requires the numerator and
denominator orders as input. Since the impulse response signal used has been
downsampled to 1200 Hz, the maximum order number that gave the best
reconstructed signal was 1200. The numerator order was chosen to be 1170 and the
denominator order was chosen to be 1200. The order numbers were not made equal
because this would have created an FIR filter response with no poles. It is
necessary to make sure that the chosen numerator order is lower than the
denominator order. Figure 5.3 shows the comparison between the original impulse
response and the reconstructed impulse response. The algorithm reconstructed the
impulse response almost perfectly; visually, it appears to be an exact match.
The slight difference can be seen in the frequency response, as shown in
figure 5.4.
Figure 5.4: Frequency response comparison of the original impulse response and
the reconstructed impulse response (x-axis: Frequency in Hz, y-axis: Magnitude
in dB)
The minute differences can be seen clearly, although the frequency range of
interest shows an exact match, which is what is required in this case. Applying
the equations given in Chapter 4, the centre frequency and corresponding Q value
were estimated. At this stage a problem arises: there are 1200 poles, giving
1200 frequencies with corresponding Q factors. The obvious problem is how to
discard the redundant pole frequencies and identify the true resonance
frequency. One solution given in [2] is to use the Singular Value Decomposition
(SVD) method to reduce the order number, by decomposing the transfer function
into two matrices of singular vectors and a diagonal matrix, and estimating the
residue of the diagonal matrix. The diagonal matrix gives the true order number
of the transfer function. Another solution is to use the CSD slice at the time
chosen in the CSD method. The reconstructed signal created with the estimated
transfer function corresponds to the number of poles and zeros that were
arbitrarily given as input to the function. When observing the estimated Q
values, one sees that the values are very high, and none are below 15. This is
because the estimated poles are matched to the given impulse response, so as to
give the consequent frequency response. This means that if the CSD slice chosen
in the previous section is converted back into a time signal, this time signal
corresponds to the impulse response of the apparent frequency response of the
chosen CSD slice. Since the resonances are much more evident after a certain
decay time, it is then possible to estimate poles that match this impulse
response and give the required response.
The SVD method was attempted, but due to unexplained circumstances it proved
rather difficult to implement. The CSD slice method, however, proved to be
easier. The IFFT of the chosen CSD slice was taken to give a time signal that
corresponds to the impulse response of the slice. This was used as the input
signal to the function, and the procedure shown in Chapter 4 was followed. This
led to the following plots:
[Figure 5.5: Original IR and Reconstructed IR; x-axis: Time in Seconds, y-axis: Amplitude]
The strange low-amplitude artefact seen at the beginning may be a consequence of
apodization, since the CSD slice was smoothed. Regardless, the method recreated
the impulse response almost exactly, as shown in figure 5.5. The frequency
response is as follows:
[Figure 5.6: Original and Reconstructed frequency response; x-axis: Frequency in Hz, y-axis: Magnitude in dB]
As expected, the estimated poles now match the impulse response and give an
almost exact frequency response. The frequencies and corresponding Q values can
now be estimated from the poles, and by matching the frequencies at the peaks of
the plot in figure 5.6 with the estimated frequencies, the centre frequency of
the resonance and its Q value are obtained. This procedure is followed for all
the loudspeakers, and the following table is obtained.
Table 5.3: Estimated Centre Frequency and Quality Factor for Loudspeakers using
the System Identification Method
                     Loudspeaker
                     Genelec 8020B         Genelec 7050B         Neumann KH805
Method               Centre     Quality    Centre     Quality    Centre     Quality
                     Frequency  Factor     Frequency  Factor     Frequency  Factor
CSD                  44 Hz      6.1        28 Hz      3.1        20 Hz      2.9
                     98 Hz      10         46 Hz      6.2        46 Hz      6.1
CWT                  41 Hz      5.5        15 Hz      2.7        21 Hz      2.6
                     98 Hz      10         46 Hz      4.7        49 Hz      6.3
Sys. Ident. Method   -          -          28 Hz      3.8        20 Hz      3
                     44 Hz      5.3        46 Hz      8.1        46 Hz      8.3
The frequencies that were estimated by all methods are tabulated to show the
trend in their estimations. As can be seen, for almost all the loudspeakers the
methods were able to pinpoint the resonant frequencies and to estimate the
approximate quality factor.
The results for Test I show that the relative level between the audio and the
resonance lies between 3 and 6 dB, with the exception of Audio 1’s result for
35 Hz, as shown in figure 5.7.
The most probable cause of this wide difference is that 35 Hz is rather
difficult to perceive if the audio sample has high reverberance and many
transients, which happened to be the case for Audio 1. The subjects had
difficulty in determining the threshold specifically for this audio, owing to
the large number of transients and events occurring in it.
The difficulty experienced with Audio 1 is also evident for the other
frequencies, as shown in figures 5.8 and 5.9, considering the range in variance,
which is within 12 dB and in some cases even larger. However, the subjects felt
that the threshold for Audio 2 was much easier to determine. This is to be
expected, since Audio 2 has no reverberance and, being an electronic ensemble,
is synthetically created.
This suggests how the choice of audio can vary the threshold of audibility. The
difference in threshold range for different audio obtained here is on par with
the results of Toole and Olive [1], who tested subjects with white noise, pink
noise, an orchestra and a pop song. The estimated thresholds in [1] vary with
audio and frequency, the most noticeable variance being at low frequencies. Even
so, there is still a noticeable variance in relative level in all cases. This
may be attributed to the number of subjects that participated and to the hearing
capability of each subject. The variation in the estimated thresholds therefore
does not necessarily correspond entirely to the type of audio used, although the
25th and 75th percentiles span a range of 5 dB at most. This can be regarded as
an acceptable threshold value to be used as reference for Test II and Test III.
Considering that the minimum level needed to perceive the 35 Hz frequency in
Audio 1 is 10 dB, the resonance level for Test II and Test III was set to 18 dB.
This value was still not enough to perceive the 35 Hz resonance in Audio 1, as
the results will indicate, but any higher level would risk loud distortion being
perceived at the other frequencies, since their minimum perceptual levels were
much lower. It could also damage the AKG K702 headphones, affecting the
listening test.
The results in figure 5.10 support the same postulate, namely that the
perception of the frequency depends on the audio. The variance in Audio 1 shows
otherwise. The variance in Audio 2 is very small, indicating the ease with which
the level could be matched. In addition, since the perceptual threshold of
Audio 1 for 35 Hz was estimated to be 10 dB, it was clear that the subjects
would again have difficulty perceiving the 35 Hz resonance in Audio 1 at 18 dB.
What is surprising is that the subjects found the 35 Hz stimulus in Audio 2 to
be only moderately similar to the resonance, in comparison to Audio 1.
Initially, the expectation was a similarity index of around 8 to 9, which was
the case for every other frequency. Upon receiving this result, the perception
of Audio 2 at 35 Hz was cross-checked to ensure that this was actually the case.
Although the check was made as a biased subject, it surprisingly revealed a
small hint of modulation at this frequency in Audio 2, even though the initial
assumption had been that the high variance in the percentiles was itself the
outcome of the small number of subjects.
This may be an indication that the ringing effect of very low frequencies can
perceptually modulate the audio. This could, however, be a special case for
synthesized audio such as Audio 2, since this audio contained very low frequency
bass transients. Moreover, this finding is not conclusive, as it needs separate
verification before any conclusion can be drawn from it.
In the case of 60 Hz, a wide variance is observed in the similarity plots for
both audio samples, as shown in figure 5.13. This high variance may be due
either to the small number of subjects or to the sheer level of the resonance,
since the level is high with respect to the threshold. This could be the case,
since the similarity indices shown in figure 5.13 are as expected.
In the case of 90 Hz, Audio 1 shows a higher variance than Audio 2, which has a
nominal variance. This could be regarded as normal, since 90 Hz is more easily
audible than the lower frequencies. This might not hold for Audio 1, since, as
postulated, the recorded sound track can mask the frequencies.
Unlike the results of Test II, a high variance in similarity index is expected
for both audio samples. The reason is that the level of the peak filter used to
cut the resonance effect can easily be misjudged. There is uncertainty about the
exact level setting needed to reduce the resonance effect, and a high chance of
setting the level too high, which can affect the similarity. This is observed in
all cases.
A key factor to be noted in figures 5.17, 5.19 and 5.21 is the variability in
the similarity index. A clear indication in this case is that, although the
subjects were able to match the reference audio in terms of gain level, the peak
filter had an adverse effect on the quality and timbre of the audio, more so in
the case of Audio 1. The high variance in the results suggests that the
perceived similarity of the audio differs from subject to subject.
A possible reason for this high variance in similarity is that low Quality factor
values affect all frequencies around the centre frequency of a peak filter.
Determining the correct loudness may be tricky, especially if the audio contains
many transients. Since the clarity of audio 2 is high enough to perceive the correct
loudness (as seen in Figures 5.16, 5.18 and 5.20), the subjects were able to judge
its timbre and quality better than for audio 1. If the subjects perceived the wrong
loudness, even by 3 dB, this would be enough to affect the surrounding frequencies,
thereby making the perceived audio less similar to the original.
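The influence of a low Quality factor on neighbouring frequencies can be illustrated
with a short sketch. It is not the filter used in the listening test: it assumes the
widely used Audio EQ Cookbook peaking-filter coefficients, and the sample rate,
centre frequency and Q values are hypothetical choices made only for illustration.

```python
import cmath
import math


def peaking_gain_db(f, f0, gain_db, q, fs=48000.0):
    """Gain in dB at frequency f of a second-order peaking filter.

    Assumption: Audio EQ Cookbook peaking-EQ coefficients; f0 is the
    centre frequency, gain_db the boost (+) or cut (-) at f0.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


if __name__ == "__main__":
    # Hypothetical 3 dB cut at 90 Hz, evaluated at 60 Hz for two Q values.
    for q in (1.0, 8.0):
        print(q, round(peaking_gain_db(60.0, 90.0, -3.0, q), 3))
```

With the same 3 dB cut, the low-Q setting attenuates 60 Hz more than the high-Q
setting, which is consistent with the explanation that a misjudged level affects
the surrounding frequencies more strongly when Q is low.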
6 Discussion, Limitations and Conclusion
6.1 Discussion
The listening test results show good agreement with what is expected. In Test I,
the estimated perceptual threshold shows the minimum level required to
perceive the low frequency resonances. As indicated in [2], this threshold level
corresponds to a resonator’s steady state level, Lr. The audibility of a resonance
depends on the level difference between the resonance’s maximum level and the system
magnitude level, termed ∆L. According to [2], if ∆L is lower than Lr, the resonance
will not be audible. In the case of a high end loudspeaker like the Genelec 8020B,
whose level difference is approximately 10 dB at a resonance frequency of 43 Hz,
the audibility of the resonance will depend on the type of audio. If, for example,
audio 2 is used, the threshold at 43 Hz may correspond to a level of 5 dB, which is
below the level difference, indicating that the resonance will be audible. In the
case of a subwoofer like the Neumann KH 805, the level difference is much higher.
This indicates that in car loudspeakers, especially subwoofers, any additive
resonances are audible only if their level differences are higher than the
estimated steady state level, which varies with the audio used.
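The audibility criterion described above can be written as a small check. This is
only a sketch: the function name is hypothetical, and treating the case where the
two levels are equal as inaudible is an assumption, since the text only states the
two strict cases.

```python
def audibility_margin(delta_l_db, l_r_db):
    """Compare the level difference (Delta L) with the threshold level (Lr).

    Returns (audible, margin_db). Assumption: equality counts as inaudible.
    """
    margin_db = delta_l_db - l_r_db
    return margin_db > 0.0, margin_db


if __name__ == "__main__":
    # Values quoted in the text: Delta L of about 10 dB, threshold of 5 dB.
    print(audibility_margin(10.0, 5.0))
```

The threshold passed in has to be the one estimated for the audio sample in
question, since it varies with the audio used.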
Assuming that these additive resonances are audible, the question arises of how to
measure their loudness and how to control it. Conventional sound pressure level
measurements are impractical because the resonance levels can easily be masked
by any audio signal used.
This is where Test II and Test III come in. Although not so evident, the main
purpose of designing these two tests is to create a model that can emulate
any resonance. With the rise of new car models and designs, a method of
verification is needed to test the effect of the design of the mounting
structures of the car loudspeakers. Measuring the resonance levels robustly is
impractical, time consuming and expensive. The methodology used to detect and
estimate the resonances showed that it is possible to quantify and emulate any
resonance. Combining the methodology with the listening test model makes it
possible to estimate quickly the effect of ringing, based on the design of the
loudspeaker and its mounting. In Test II, determining the level equivalence of the
resonances gives a range of severity levels indicating when the ringing effect
becomes severe, with respect to each type of audio used. Test III determines
whether a filter can minimize the resonance effect while maintaining the spectral
balance of the audio.
This indicates that a loudness metric for the effect of ringing can be created.
Unfortunately, as mentioned in Chapter 3, this requires several tests involving
different environments and setups. Nevertheless, this paves the way for the next
steps in developing the metric.
6.2 Limitations
The methodologies used, as well as the listening tests, come with several limitations.
In the CSD method, the resolution of the 3D plot comes at the cost of the resolution
of the apodizing window. Although it reduces the spectral leakage caused by the
unit step window, the apodizing window causes a slight loss in power spectral
density.
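A minimal sketch of a cumulative spectral decay calculation is given here to make
the role of the apodizing window concrete. It is not the implementation used in this
work: the raised-cosine onset, the slice spacing and all parameter names are
assumptions chosen for illustration.

```python
import numpy as np


def cumulative_spectral_decay(h, fs, n_slices=20, hop=None, rise=32):
    """Return (slice start times, frequencies, levels in dB) for impulse response h.

    Each slice keeps the response from a later start sample onwards. The unit
    step is softened with a raised-cosine onset of `rise` samples (assumed
    apodization) to reduce spectral leakage.
    """
    h = np.asarray(h, dtype=float)
    n = len(h)
    if hop is None:
        hop = max(1, n // (2 * n_slices))
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(rise) / rise))  # rises from 0 towards 1
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    levels = []
    for k in range(n_slices):
        start = k * hop
        window = np.zeros(n)
        window[start:] = 1.0                    # unit step at the slice start
        m = min(rise, n - start)
        window[start:start + m] = ramp[:m]      # apodized onset
        spectrum = np.fft.rfft(h * window)
        levels.append(20.0 * np.log10(np.abs(spectrum) + 1e-12))
    times = np.arange(n_slices) * hop / fs
    return times, freqs, np.array(levels)


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(2000) / fs
    h = np.exp(-t / 0.2) * np.sin(2 * np.pi * 50.0 * t)  # synthetic 50 Hz ringing
    times, freqs, levels = cumulative_spectral_decay(h, fs)
    print(freqs[np.argmax(levels[0])], levels[0].max() - levels[-1].max())
```

A longer onset ramp suppresses leakage more effectively but discards more of the
energy at the start of each slice, which is one way to see the trade-off noted
above.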
The CWT is very sensitive to changes in the scaling factor. Normally, to compute the
CWT of a signal, the scaling factor is chosen as a power of 2, with the powers
proportional to the number of octaves needed for adequate resolution in both time
and frequency. The CWT calculation was implemented manually through MATLAB scripts,
even though MATLAB has built-in CWT functionality. This is because the built-in
resolution is fixed, without any freedom to vary parameters such as the centre
frequency, the bandwidth and the scaling factor. This would hinder the visual
accuracy of the frequency data points, as the scaling factor maps the scaled
wavelet to each frequency. Care is needed here in order to avoid massive
divergence. As a consequence, the scaling factor was not chosen in octaves, which
hindered the resolution of the plot. With proper care and better scripting of the
CWT calculation, the resolution can be improved further.
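As an illustration of a manually implemented CWT with freely chosen parameters, the
following sketch evaluates a complex Morlet wavelet at centre frequencies spaced in
fractions of an octave. It is not the MATLAB script used in this work: the choice of
wavelet, the number of cycles (which sets the bandwidth) and the frequency grid are
all assumptions.

```python
import numpy as np


def morlet_cwt(x, fs, freqs, n_cycles=6.0):
    """Complex Morlet CWT of x, one row per centre frequency in freqs.

    n_cycles controls the bandwidth (assumed parameterization): more cycles
    give finer frequency resolution and coarser time resolution.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        sigma = n_cycles / (2.0 * np.pi * f)          # Gaussian std in seconds
        half = int(np.ceil(4.0 * sigma * fs))
        if 2 * half + 1 > len(x):
            raise ValueError("signal too short for the lowest centre frequency")
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2.0 * sigma ** 2))
        wavelet /= np.sum(np.abs(wavelet))            # equal gain at every scale
        out[i] = np.convolve(x, wavelet, mode="same")
    return out


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(2000) / fs
    x = np.exp(-t / 0.5) * np.sin(2 * np.pi * 50.0 * t)   # synthetic 50 Hz ringing
    freqs = 20.0 * 2.0 ** (np.arange(25) / 12.0)          # 12 steps per octave
    w = morlet_cwt(x, fs, freqs)
    print(freqs[np.argmax(np.abs(w).max(axis=1))])
```

Spacing the centre frequencies in fractions of an octave keeps the relative
resolution constant across the plot, which is the usual motivation for choosing the
scaling factor as a power of 2.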
The Steiglitz-McBride method used to recreate the impulse response can become
unstable when computing the coefficients, depending on the length of the impulse
response and the number of samples. A high number of samples can overload the CPU
and, if the RAM is exhausted, crash the system. Resolution therefore becomes a
challenge, and an adequate balance is needed for the best visualization and
estimation of the resonances. Fortunately, since the focus is on the low-frequency
region, downsampling the impulse response helps to reduce the number of
coefficients needed to estimate the resonances. This would become problematic
when estimating mid and high frequency resonances. There are alternatives for
processing a large number of coefficients, such as parallel computing, which may
improve the computational efficiency and also yield more stable coefficients.
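Two simple checks related to this limitation can be sketched as follows. Neither is
taken from the scripts used in this work: the function names, the oversampling
margin and the example values are hypothetical. The first picks an integer
downsampling factor for a given highest frequency of interest, and the second tests
whether a set of estimated denominator coefficients describes a stable filter.

```python
import numpy as np


def decimation_factor(fs, f_max, oversample=4.0):
    """Largest integer factor that keeps the new rate >= oversample * f_max.

    Assumption: a margin of 4x the highest frequency of interest, i.e. twice
    the Nyquist minimum, to leave room for the anti-aliasing filter.
    """
    return max(1, int(fs // (oversample * f_max)))


def max_pole_radius(a):
    """Largest pole magnitude for denominator coefficients a = [1, a1, a2, ...]."""
    return float(np.max(np.abs(np.roots(a))))


def is_stable(a):
    """A discrete-time filter is stable when all poles lie inside the unit circle."""
    return max_pole_radius(a) < 1.0


if __name__ == "__main__":
    print(decimation_factor(48000.0, 200.0))
    print(is_stable([1.0, -1.5, 0.56]), is_stable([1.0, -2.5, 1.0]))
```

Checking the pole radius after each estimation run is a cheap way to notice when a
chosen model order or impulse response length has produced unstable coefficients.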
The statistical box plots used in the listening tests are limited by the number of
subjects that took part in the tests. Some of the subjects showed a high degree of
deviation and had to be excluded from the calculation, as they had an impact on
the result.
Originally, the prime focus of the methodology was the measurement and validation
of cavity resonances in a car’s door panel. Due to the unavailability of resources
and time constraints, the focus of the validation was changed. The methodology
would be more credible if the chosen loudspeakers had been taken from an existing
car. The Genelec and Neumann loudspeakers have high fidelity and well damped
casings, which made the measurement and estimation of resonances a challenging
task. In spite of this, the methodology performed as intended, which could be
considered a plus, considering that cavity resonances may be a bigger challenge
to estimate.
6.3 Conclusion
The results from both the estimation of resonances and the listening tests show
good agreement, indicating that the methodology can be used to detect cavity
resonances. The combination of the CSD, CWT and System Identification methods can
pinpoint the frequency of the resonance and determine its strength through the
Quality factor. In addition, the perception of these resonances can be estimated
using the models used in the listening tests.
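One common way to read a resonance frequency and Quality factor from an identified
model is to convert a discrete-time pole to its continuous-time equivalent. The
sketch below assumes the mapping z = exp(s / fs) and the second-order relation
Q = w0 / (2 |Re s|); it is an illustration and not necessarily the procedure
followed in this work, and the function name is hypothetical.

```python
import cmath
import math


def pole_to_f0_q(pole, fs):
    """Natural frequency in Hz and Q factor of a stable complex pole.

    Assumption: the pole maps to the s-plane through s = fs * ln(z).
    """
    s = cmath.log(pole) * fs
    if s.real >= 0.0:
        raise ValueError("pole must lie inside the unit circle")
    w0 = abs(s)                       # undamped natural frequency in rad/s
    q = w0 / (2.0 * -s.real)
    return w0 / (2.0 * math.pi), q


if __name__ == "__main__":
    # Build a pole for a hypothetical 43 Hz resonance with Q = 5 at fs = 800 Hz.
    fs, q = 800.0, 5.0
    w0 = 2.0 * math.pi * 43.0
    s = complex(-w0 / (2.0 * q), w0 * math.sqrt(1.0 - 1.0 / (4.0 * q * q)))
    print(pole_to_f0_q(cmath.exp(s / fs), fs))
```

The same conversion can be applied to each complex pole pair of the estimated
denominator to list candidate resonances together with their strengths.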
Due to the high variance in the perceptual test results, a higher number of
subjects is necessary for an in-depth and conclusive result. This would greatly
enhance the perceptual model and allow a better evaluation of the audibility of
the resonances. The methodology for detection and estimation of resonances will
serve well for detecting cavity and other mechanical resonances in a car at low to
mid frequencies, if speed and efficiency are required.
Bibliography
[1] Floyd E. Toole, Sean E. Olive (1988) The Modification of Timbre by Resonances: Perception and Measurements, Journal of Audio Engineering Society, vol. 32, 122 - 142
[2] Ivo Mateljan, Heinrich Weber, Ante Doric (2007) Detection of Audible Resonances, 3rd Congress of the Alps Adria Acoustics Association
[3] Jacob Dyreby, Sylvain Choisel (2007) Equalization of loudspeaker resonances using second-order filters based on spatially distributed impulse response measurements, 123rd Convention Audio Engineering Society, New York
[4] Shelley Uprichard, Sylvain Choisel (2008) The Influence of Acoustic Environment on the Threshold of Audibility of Loudspeaker Resonances, 125th Convention Audio Engineering Society, San Francisco
[5] Ethan Winer (2012) Chapter 1 - Audio Basics, The Audio Expert, 3 - 39
[6] Ethan Winer (2012) Chapter 3 - Hearing, Perception and Artifact Audibility, The Audio Expert, 65 - 104
[7] Nikos (2019) A Living Room for the Evaluation of multiple auditory scenes, Master Thesis, Chalmers University of Technology
[8] John D. Bunton, Richard H. Small (1982) Cumulative Spectra, Tone Bursts and Apodization, Journal of Audio Engineering Society, vol. 30, No. 6
[9] S.J. Loudritis (2005) Decomposition of Impulse Responses Using Complex Wavelets, Journal of Audio Engineering Society, vol. 53, No. 9, 796 - 811
[10] D.B. Keele (1999) Time-Frequency Display of Electroacoustic Data using Cycle-Octave Wavelet Transforms, Audio Engineering Society 99th Convention, New York
[11] K. Steiglitz, L.E. McBride (1965) A Technique for Identification of Linear Systems, IEEE Trans. Automatic Control, vol. AC-10, 461 - 464
A Appendix
Estimation of Resonances
The following plots represent the estimated resonances from all three methodologies
used: Cumulative Spectral Decay, Continuous Wavelet Transform and System
Identification, for the loudspeakers Genelec 8020B and Genelec 7050B.
Figure A.1: Cumulative Spectral Decay Plot of Genelec 8020B and Genelec 7050B
Figure A.3: Continuous Wavelet Transform - 2D Front View and location of Resonances
of Genelec 8020B and Genelec 7050B
(a) Genelec 8020B - 2D Full Slice View
(b) Genelec 8020B - 2D Half Slice View
(c) Genelec 7050B - 2D Full Slice View
(d) Genelec 7050B - 2D Half Slice View
(a) Genelec 8020B - Original and Reconstructed Impulse Response
(b) Genelec 8020B - Original and Reconstructed Frequency Response
(c) Genelec 7050B - Original and Reconstructed Impulse Response
(d) Genelec 7050B - Original and Reconstructed Frequency Response
(a) Genelec 8020B - Original and Reconstructed sliced Impulse Response
(b) Genelec 8020B - Original and Reconstructed sliced Frequency Response
(c) Genelec 7050B - Original and Reconstructed sliced Impulse Response
(d) Genelec 7050B - Original and Reconstructed sliced Frequency Response