Vibrato Extraction and Parameterization in the Spectral Modeling Synthesis Framework

Perfecto Herrera, Jordi Bonada
Audiovisual Institute, Pompeu Fabra University
Rambla 31, 08002 Barcelona, Spain
{pherrera, jboni}@iua.upf.es http://www.iua.upf.es

[Published in the Proceedings of the Digital Audio Effects Workshop (DAFX98), 1998]
Abstract

Periodic or quasi-periodic low-frequency components (i.e. vibrato and tremolo) are present in the steady-state portions of sustained instrumental sounds. If we are interested either in studying their expressive meaning or in building a hierarchical multi-level representation of sound in order to manipulate and transform it for musical purposes, those components should be isolated and separated from the amplitude and frequency envelopes. Within the SMS analysis framework it is now feasible to extract high-level time-evolving attributes starting from basic analysis data. In the case of frequency envelopes we can apply STFTs to them, check whether there is a prominent peak in the vibrato/tremolo range and, if so, smooth it away in the frequency domain; finally, we can apply an IFFT to each frame in order to reconstruct an envelope that has been cleaned of those quasi-periodic low-frequency components. Two important problems nevertheless have to be tackled, and ways of overcoming them will be discussed in this paper: first, the periodicity of vibrato and tremolo, which is quite exact only when the performers are professional musicians; second, the interactions between formants and fundamental frequency trajectories, which blur the real tremolo component and hinder its analysis.

1 Introduction

Long sustained notes become boring and musically uninteresting if their steady states have a strictly constant fundamental frequency. For that and other musical reasons to be found in music performance treatises, good performers invest a lot of time in developing proficiency in techniques for the continuous modulation of frequency and/or amplitude. These modulations are respectively called vibrato and tremolo, and their feasibility on each instrument depends on its sound generation mechanisms (for example, string instruments favor vibrato whereas reeds favor tremolo). The scientific study of vibrato can be traced back to the work of Seashore [1] who, notwithstanding his technological limitations, yielded a rough but valid characterization of the phenomenon. More recent studies ([2], [3], [4], [5]) have been carried out with the help of modern analysis techniques and devices, but we can conclude that, although we understand the basic facts about vibrato in different musical instruments, more research is needed on vibrato as a physical phenomenon (not to mention as a musical resource), especially on its temporal evolution and the way it changes between consecutive notes. In any case, if it is possible to describe vibrato parametrically and/or procedurally, it should be possible to manipulate it for musical, engineering, or acoustical purposes.

As Desain and Honing [6] noted, the shape of modulated signals is too complex to be extracted without a solid analysis model. One such model could be Spectral Modeling Synthesis (SMS), developed by Serra [7]. Recent software developments inside that framework ([8], [9], [10]) have made it possible to segment a continuous signal such as a musical phrase or a long note into different regions that have different basic features or parameters, both static and evolving over time (e.g. mean fundamental frequency, mean amplitude, amplitude tendency, noise profile, amplitude and fundamental frequency envelopes, etc.); once those parameters have been extracted, it is possible to manipulate them separately in order to achieve delicate sound transformations during the re-synthesis stage. Consequently, there are situations in which it could be useful to separate the contribution of modulation processes from a stable set of parameters in order to achieve greater flexibility and better quality of synthesis and transformation.

The vibrato problem can be decomposed into three subproblems: 1) Identification (or Detection and Parameterization); 2) Extraction; and 3) Re-synthesis. Considering that our target system is an off-line (non-real-time) one, in this paper we will focus on the first two points (see [11] for a synthesis-oriented paper).

2 Frequency domain strategy

From the frequency-domain point of view, vibrato detection in an off-line system assumes that a steady state has been correctly delimited and parameterized in a previous stage of the analysis; that is to say, we have obtained a fundamental frequency track whose frequency is constrained to a range of less than a whole tone around an ideal mean (although the usual vibrato depth reported by different studies carried out with professional musicians is less than half a tone, it should be noted that less well-trained performers generate larger excursions from the nominal fundamental frequency). The fundamental frequency track obtained in SMS analysis is an envelope of data representing Hertz over time, and has a number of values per second equal to the frame rate of analysis (typically we use 345 points per second, so that each one of our envelope frames integrates information for such a temporal lapse); thus, that envelope will be the starting data for the process (for details see [9]).

Figure 1. Two fundamental frequency tracks: a) from a steady-state portion of a sound without vibrato; b) from a steady-state portion of a sound with vibrato.

The vibrato detection proceeds as follows: the discrete fundamental frequency track is first transformed into a 0-centered track by computing the global mean and subtracting it from every fundamental frequency value in the original track. This smoothed and 0-centered track is then windowed. A window size of 128 points, or 0.37 seconds (more than 2 times the shortest period that is expected to be found for a vibrato), with a 50% window overlap has proved suitable for our purposes. Different kinds of windows (Blackman-Harris, Kaiser, Hamming) have been tested, yielding no significant differences. For each window its FFT is then computed, and spectral peaks are calculated with parabolic interpolation. If the analyzed region has vibrato we get a prominent peak around 5 to 6 Hz (in fact it is the most prominent peak detected). As expected, such a clear and stable peak is not present when the region has no vibrato. The vibrato detection process concludes with the extraction and storage of the rate and the depth of vibrato as high-level parameters of the analyzed frame (in fact, they will later be pooled with the values for all the envelope frames, and global mean values will be extracted for a whole region).

At this point, the vibrato extraction proceeds. Different algorithms could be implemented, for example one similar to the SMS low-level analysis (i.e. by additive synthesis and subtraction of the harmonic part), but it is more economical and easier to crop the prominent peak (and sometimes the second one) of every envelope. Then the IFFT of the altered spectrum is computed so that we get a signal without the modulation components, that is, more stationary than the original one.
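
As a rough illustration of the detection and extraction steps just described, the following Python/NumPy sketch centers a fundamental frequency track, analyses it in 128-point windows with 50% overlap, locates the strongest spectral peak with parabolic interpolation, crops that peak when it looks like vibrato, and rebuilds the track from the inverse FFTs. It is not the SMS implementation. Only the 345 points-per-second frame rate, the 128-point window and the 50% overlap come from the text; the 4 to 8 Hz search range, the 0.5 Hz minimum depth, the number of cropped bins, the periodic Hamming window, the overlap-add reconstruction, the reading of "depth" as peak deviation in Hz, and all function and variable names are assumptions made for the example.

```python
import numpy as np

FRAME_RATE = 345.0       # envelope points per second (value quoted in the text)
WIN, HOP = 128, 64       # 128-point windows with 50% overlap (from the text)
VIB_BAND = (4.0, 8.0)    # ASSUMED vibrato search range in Hz
MIN_DEPTH = 0.5          # ASSUMED minimum peak deviation (Hz) to accept a peak

# Periodic Hamming window (an assumption): copies shifted by half a window
# add up to a constant, which keeps the overlap-add reconstruction simple.
WINDOW = np.hamming(WIN + 1)[:-1]


def analyse_frame(frame):
    """Return (spectrum, peak_bin, rate_hz, depth_hz) for one 0-centered frame."""
    spectrum = np.fft.rfft(frame * WINDOW)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    k = int(np.argmax(log_mag[1:-1])) + 1              # strongest non-DC bin
    a, b, c = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    denom = a - 2.0 * b + c
    p = 0.5 * (a - c) / denom if denom != 0.0 else 0.0  # parabolic offset (bins)
    rate = (k + p) * FRAME_RATE / WIN
    # Interpolated peak height converted to the amplitude of the modulation.
    depth = 2.0 * np.exp(b - 0.25 * (a - c) * p) / WINDOW.sum()
    return spectrum, k, rate, depth


def remove_vibrato(f0_track, half_width=1):
    """Detect vibrato per frame, crop its peak, and overlap-add the IFFTs."""
    f0_track = np.asarray(f0_track, dtype=float)
    mean = f0_track.mean()
    centered = f0_track - mean                          # 0-centered track
    out = np.zeros_like(centered)
    norm = np.zeros_like(centered)
    params = []                                         # (rate, depth) per frame
    for start in range(0, len(centered) - WIN + 1, HOP):
        spectrum, k, rate, depth = analyse_frame(centered[start:start + WIN])
        if VIB_BAND[0] <= rate <= VIB_BAND[1] and depth >= MIN_DEPTH:
            params.append((rate, depth))
            # Crop the peak bin and its neighbours, leaving the DC bin alone.
            spectrum[max(k - half_width, 1):k + half_width + 1] = 0.0
        out[start:start + WIN] += np.fft.irfft(spectrum, n=WIN)
        norm[start:start + WIN] += WINDOW
    covered = norm > 0
    out[covered] /= norm[covered]
    out[~covered] = centered[~covered]   # tail not reached by any full frame
    return out + mean, params


if __name__ == "__main__":
    t = np.arange(1035) / FRAME_RATE                    # three seconds of track
    track = 440.0 + 5.0 * np.sin(2.0 * np.pi * 5.5 * t)
    cleaned, found = remove_vibrato(track)
    print("frames with vibrato:", len(found))
    print("mean rate (Hz):", np.mean([r for r, _ in found]))
    print("mean depth (Hz):", np.mean([d for _, d in found]))
```

Averaging the per-frame (rate, depth) pairs returned by the hypothetical `remove_vibrato` gives region-level values in the spirit of the pooling mentioned above. With 128 points at 345 points per second the bin spacing is about 2.7 Hz, so in this sketch the interpolation step carries most of the rate accuracy, and cropping three bins also removes any slow trend that falls in those bins.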

Figure 2. Comparison of a fundamental frequency track of a steady-state portion of sound: a) with its original vibrato; b) the same fundamental track after vibrato extraction in the frequency domain.

3 Time-domain strategy

In the time domain there are several robust techniques for fundamental frequency estimation [12] that could be suitable for vibrato extraction. Besides that, time-domain strategies offer important advantages such as the option of using shorter windows. In such a scenario, we could find practical situations that only demand getting rid of vibrato, but not necessarily characterizing it in full detail. Given that constraint, a filtering strategy seems quite suitable (on the other hand, see [13] for a complete time-domain, although not real-time, solution without filters).

In order to design an appropriate filtering algorithm for this task we have to take into account the fact that the values fed to the filter at every point in time should be centered around a conventional 0 (in this case the mean fundamental frequency). For an off-line system such a central value could be the mean frequency of the steady state, but in a real-time system that center must be approximated from the past behaviour of the fundamental frequency track. If much previous information is used in computing such an approximation then we will lose the temporal trends of the pitch, but if only very recent values are taken into account then we will lose the low-frequency modulations that we are addressing (in fact this is something of a paradox, because losing low-frequency modulations is what we are trying to achieve!). An acceptable solution should be a number of points spanning more than a common vibrato cycle, and at least one of the shortest vibrato cycles we could find. After some trial and error we settled on a filter buffer that takes into account the preceding 80 envelope points (about one vibrato cycle at 4.5 Hz) and does not blur the mid-term variations of the fundamental. If the system does not yet have 80 data points it uses the mean of the available points. We feel, nevertheless, that this mechanism should be thoroughly refined in order to obtain better results, as can be seen in Figure 3.

After we get the discussed mean value, we can apply a filter to the incoming data. Because both the vibrato rate and depth will be constrained, we have implemented a 6th-order Butterworth high-pass filter that effectively eliminates frequencies lower than 10 Hz from the fundamental frequency track. The selection of the filter was done with the help of MATLAB, and finally we opted for a filter defined by the following parameters: passband = .25 radians (approx. 21 Hz), stopband = .11 radians (approx. 9 Hz), passband ripple = 3 dB, stopband attenuation = 40 dB.

Although this strategy does not allow vibrato to be characterized at the filtering stage, the blackboard-like model implemented in the SMS analysis framework makes it possible to extract vibrato parameters later on by picking the relevant information from other concurrent analysis modules (of course there is an arguable time-resolution payoff).

Figure 3. Comparison of an original fundamental track of a steady-state portion of sound: a) with original vibrato; b) after time-domain vibrato extraction.

4 Interaction between vibrato and spectral shape

If we examine the amplitude track of a region with vibrato, there also seem to be cyclic modulations around an ideal mean value. It could be tempting to consider them examples of a concomitant tremolo.
Before proceeding with the amplitude track as we did with the frequency track, however, we should be warned that superficially similar expressive resources such as vibrato and tremolo can have different musical meanings and uses, and need not be associated. It should also be noted that (at least in human singing) amplitude variations follow a pattern that is not as regular as that of frequency. In fact the main factor behind the observed variations in amplitude is, rather than a tremolo process, the interaction between the vibrato process and the resonances of the vocal tract [14], [15]. Therefore, our strategy for eliminating those amplitude modulations goes as follows: in the frequency-domain case, a spectral envelope for the steady region is computed; then, we proceed by re-calculating the right amplitude value for every track (or partial) frame by frame. By right amplitude we mean the amplitude that the track should have, considering the trajectory correction induced by the vibrato-suppression procedure (for example, let us suppose that the original fundamental frequency track entered a resonance region; after vibrato suppression its amplitude would still reflect its presence in that resonance region, but in fact the track is not there anymore, so we interpolate from the spectral envelope the amplitude corresponding to the current spectral location of that track).

On the other hand, in the time-domain case we are just starting to implement an incremental-resolution spectral envelope extraction algorithm much in the vein of the spectral tracings used by [11], and similar to the one apparently used by humans [16]. Such an algorithm, whereby the correction of the amplitudes is done frame by frame (as explained before), computes a spectral envelope that gains resolution as we get more frames from the basic analysis.

5 Conclusions

In this paper we have presented a two-fold approach for managing vibrato inside a specific analysis/synthesis framework such as SMS. Although the higher-level attributes extracted in the analysis process allow both the satisfactory characterization of vibrato and its removal from a steady-state portion of a sound in the frequency domain, there will be practical situations in which only removal is required, and then we can apply a simpler time-domain strategy. Nonetheless, more research is needed, and will be pursued by us, in order to refine the current algorithms and, afterwards, achieve a flexible and acceptable synthesis of vibrato notes.

Sound examples related to this paper can be found at: http://www.iua.upf.es/~perfe/papers/dafx98poster-soundexamples.html

References

[1] C. E. Seashore. Psychology of Music. New York: McGraw-Hill, 1938. (Reprint: Dover, New York, 1967).
[2] J. Sundberg. Vibrato and vowel identification. Arch. Acoust. 2, 257-266, 1977.
[3] E. Prame. Measurements of the vibrato rate of J. Acoust. Soc. Am. 96 (4), pp. 1979-1984, 1994.
[4] H. Honing. The vibrato problem, comparing Computer Music Journal, 19 (3), 1995.
[5] P. Desain and H. Honing. Towards algorithmic descriptions of continuous modulations of musical parameters. Proceedings of the ICMC, 1995.
[6] P. Desain and H. Honing. Modeling continuous modulations of music performance. Proceedings of the ICMC, 1996.
[7] X. Serra. A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition. Ph.D. Dissertation, Stanford University, 1989.
[8] X. Serra and others. Integrating Complementary Spectral Models in the Design of a Musical Synthesizer. Proceedings of the ICMC, 1997.
[9] X. Serra and J. Bonada. Sound Transformations Based on the SMS High Level Attributes. Proceedings of the Digital Audio Effects Workshop (DAFX98), 1998.
[10] P. Cano. Fundamental Frequency Estimation in the SMS Analysis. Proceedings of the Digital Audio Effects Workshop (DAFX98), 1998.
[11] R. Maher and J. Beauchamp. An investigation of vocal vibrato for synthesis. Applied Acoustics, 30, pp. 219-245, 1990.
[12] W. Hess. Pitch determination of speech signals. Berlin: Springer-Verlag, 1983.
[13] S. Rossignol and others. Feature Extraction and Temporal Segmentation of Acoustic Signals. Proceedings of the ICMC, 1998.
[14] Y. Horii. Acoustic analysis of vocal vibrato: a theoretical interpretation of data. J. of Voice, 3 (1), 36-43, 1989.
[15] M. Mellody and G. H. Wakefield. Modal distribution analysis of vibrato in musical signals. Proceedings of SPIE Conf., 1998.
[16] J. H. Ryalls and P. Lieberman. Fundamental frequency and vowel perception. J. Acoust. Soc. Am. 72 (5), 1631-1634, 1982.
