Acoustic Phonetics 2017-18

The document presents a course on Acoustic Phonetics, focusing on the acoustic properties of speech, formants, and their relationship to vowel sounds. It discusses the fundamental frequency and its role in distinguishing voiced sounds, as well as the historical context of formant theory. Additionally, it covers acoustic analysis techniques, including spectrograms, and the complexities of consonant acoustics.

Acoustic Phonetics

Anna Sfakianaki
Phonetician/Linguist (PhD)
Laboratory Teaching Staff, CSD UoC

CS-578 Digital Speech Signal Processing
Spring Term 2017-18
University of Crete
Anna Sfakianaki, CS-578, 2017-18, University of Crete

A COURSE IN PHONETICS

 Ladefoged & Johnson (2011)


 Material from this book was used in the slides.

QUESTIONS

 Which are the acoustic properties of speech?


 How do we “read” spectrograms?

https://home.cc.umanitoba.ca/

1. Formants
 Sounds differ from each other in three ways
 pitch
 loudness/intensity
 quality
 A vowel sound contains a number of different pitches simultaneously
 pitch at which it was spoken
 various overtone pitches that give it its distinctive quality
 Vowel quality is determined by overtone structure
 Overtones = Formants
 The lowest 3 formants distinguish vowels from each other
 F1, F2, F3

1.1 How do formants arise?
 The air in the vocal tract acts like the air in a bottle.
 Tap on a bottle.
 Open your mouth, make a glottal stop and flick a finger against your neck just to the side and below the jaw.
What do you observe?

Articulate [i, e, a, o, u] without producing sound.
What do you observe?

Pitch of F1 going up for [i, e] and down for [a, o, u]

TUBE MODELS
 Formants that characterize different vowels are the result of the different shapes of the vocal tract.
 Any body of air will vibrate in a way that depends on its size and shape.
 Blow across the top of
 an empty bottle
 a partially filled bottle
What do you observe?
[Figure: vocal tract shapes for the vowels in awake, hood, hard, tu (French, [y]), heed ([i]) and had ([æ]). Adapted from Fant (1960)]
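To make the bottle analogy concrete, here is a minimal sketch using the textbook quarter-wavelength approximation for a uniform tube that is closed at one end (the glottis) and open at the other (the lips). This simplification is not taken from the slides. The tube lengths and the speed of sound are assumed round values, and `tube_resonances` is a hypothetical helper name.

```python
def tube_resonances(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength approximation: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]

# ASSUMPTION: 17.5 cm is a commonly quoted adult vocal tract length.
adult = tube_resonances(0.175)
# ASSUMPTION: 12.5 cm stands in for a shorter (e.g. child-sized) tract.
shorter = tube_resonances(0.125)

print([round(f) for f in adult])    # [500, 1500, 2500]
print([round(f) for f in shorter])  # [700, 2100, 3500]
```

A smaller body of air resonates at higher frequencies, which is the same effect heard when a bottle is partially filled. Real vowels deviate from these values because the vocal tract is not a uniform tube.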

 The vocal tract has a complex shape → contains several bodies of air with different volumes → different overtones
1) Vocal folds open and close, sending out pulses of acoustic energy at different pitches and amplitudes.
2) These pulses act like sharp taps on the air in the vocal tract.
3) The resonating cavities are set into vibration, enhancing or damping different frequencies.

 One pulse in the vocal tract produces three different waveforms.
 The air in the back of the vocal tract produces a low-frequency waveform (1).
 The air in front of the tongue, a smaller cavity, produces a higher-frequency waveform (2).
 In vowels, we actually hear the sum of these waveforms added together.
[Figure: component waveforms 1, 2 and 3, and the sum of waveforms]
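The idea that one pulse sets several resonances ringing, and that we hear their sum, can be sketched as follows. Each resonance is modelled as an exponentially decaying sine wave. The sample rate, pulse rate, resonance frequencies and decay rate are all assumed illustrative values, not measurements from the slides.

```python
import math

SAMPLE_RATE = 16000                      # samples per second (assumed)
SAMPLES_PER_PULSE = SAMPLE_RATE // 125   # 128 samples: one pulse every 8 ms, F0 = 125 Hz (assumed)
RESONANCES_HZ = [500, 1500, 2500]        # illustrative resonance frequencies (assumed)
DECAY_PER_S = 400.0                      # how quickly each resonance dies away (assumed)

def damped_sine(freq_hz, n_samples):
    """One resonance ringing after a single tap: an exponentially decaying sine."""
    return [math.exp(-DECAY_PER_S * n / SAMPLE_RATE)
            * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

# Three component waveforms produced by one pulse.
components = [damped_sine(f, SAMPLES_PER_PULSE) for f in RESONANCES_HZ]

# The vowel waveform for one pulse is their sample-by-sample sum.
one_period = [sum(samples) for samples in zip(*components)]

# Repeating the summed response at every pulse gives a vowel-like waveform.
vowel = one_period * 10

print(len(one_period), len(vowel))
```

Changing `SAMPLES_PER_PULSE` changes the pitch of the result, while changing `RESONANCES_HZ` changes its vowel-like quality.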

1.2 Fundamental Frequency (F0)
 Fundamental frequency: number of vocal fold vibrations per second.
 Vocal folds must be vibrating in order to have F0.
 It corresponds to variations in pitch (speech melody or intonation).
 Vocal folds may vibrate faster or slower giving higher or lower pitch to the sound, BUT the formants of the sound remain the same as long as vocal tract shape remains unchanged.
 Male voice: 120 Hz
 Female voice: 220 Hz
 Child voice: 260-280 Hz
 All voiced sounds are distinguishable due to their formants.
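Since F0 is the number of vibrations per second, it can be estimated by finding the time shift at which a voiced signal best matches itself. The sketch below uses a plain autocorrelation search. It is one common approach among several, not a method named in the slides. The test signals are synthetic, and `harmonic_signal` and `estimate_f0` are hypothetical helper names.

```python
import math

def harmonic_signal(f0_hz, sample_rate, n_samples, n_harmonics=5):
    """Periodic test signal: a few harmonics of f0 with falling amplitude."""
    return [sum(math.sin(2 * math.pi * k * f0_hz * n / sample_rate) / k
                for k in range(1, n_harmonics + 1))
            for n in range(n_samples)]

def estimate_f0(signal, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000  # assumed
estimates = {
    label: estimate_f0(harmonic_signal(f0, SAMPLE_RATE, 400), SAMPLE_RATE)
    for label, f0 in [("male", 120), ("female", 220)]
}
for label, value in estimates.items():
    print(label, round(value, 1), "Hz")
```

The estimate is limited to whole-sample lags, so it lands near the true value rather than exactly on it.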

SOME HISTORY…
 The general theory of formants was stated by the great German scientist Hermann von Helmholtz (1821-1894) about 150 years ago.
 His scientific work covers the disciplines of Physiology, Psychology (theories of vision), Physics (energy, electrodynamics, thermodynamics), Philosophy and Aesthetics.
 He was Hertz’s supervisor and, during Hertz’s studies in 1879, he suggested that Hertz's doctoral dissertation be on testing Maxwell's theory of electromagnetism, resulting in Hertz’s discovery of electromagnetic waves.
SOME HISTORY…
 Helmholtz invented the Helmholtz resonator to identify the various frequencies or pitches of the pure sine wave components of complex sounds containing multiple tones.
[Figure: Helmholtz Resonator]
 The Helmholtz resonator inspired Alexander Graham Bell to invent the telephone based on the harmonic telegraph principle.
 “A given vowel is merely the rapid repetition of its peculiar note” Robert Willis (English physicist)
 A vowel is the rapid repetition (corresponding to the vibrations of the vocal folds) of its peculiar two or three notes (corresponding to its formants).
 All voiced sounds are distinguishable from one another by their formant frequencies.

1.3 Speech synthesis (demo)
This speech was synthesized in 1971 by Peter Ladefoged on a synthesizer at UCLA.

http://www.phonetics.ucla.edu/course/chapter8/speechbird/speechbird.html

2. Acoustic Analysis
 It is possible to analyze sounds so that we can measure the actual frequencies of the formants and represent them graphically.
 Average of F1, F2 and F3 frequencies in eight American English vowels: heed, hid, head, had, hod, hawed, hood, who’d

2.1 Spectrogram
 Computer programs can analyze sounds and show their components. The display produced is called a spectrogram.
 In spectrograms
 horizontal axis: time
 vertical axis: frequency
 degree of darkness or colour: intensity
[Figure: spectrogram with the formants marked]
 Spectrograms: dark bands for concentrations of energy at particular frequencies, showing the source and filter characteristics of speech
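As a rough sketch of what such a program computes, the code below cuts a signal into short overlapping frames and measures the energy at each frequency in each frame. This gives the time by frequency grid whose values a spectrogram draws as darkness. It is a deliberately naive implementation using a direct DFT for readability, not how Praat or Wavesurfer are implemented. The test signal, frame length and hop size are assumed values.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ])
    return frames

SAMPLE_RATE = 8000   # assumed
FRAME_LEN = 80       # 10 ms frames, so bins are 100 Hz apart (assumed)

# Synthetic test signal: 500 Hz for the first 25 ms, then 1500 Hz.
tone = [math.sin(2 * math.pi * (500 if n < 200 else 1500) * n / SAMPLE_RATE)
        for n in range(400)]

frames = spectrogram(tone, FRAME_LEN, hop=40)

# Frequency of the strongest ("darkest") bin in each frame.
peaks_hz = [max(range(len(f)), key=f.__getitem__) * SAMPLE_RATE / FRAME_LEN
            for f in frames]
print(peaks_hz[0], peaks_hz[-1])  # 500.0 1500.0
```

In a real spectrogram each row of `frames` becomes one vertical strip of the picture, with larger magnitudes drawn darker.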

2.2 Computer Programs for acoustic analysis (free access)
 Praat
http://www.fon.hum.uva.nl/praat/
University of Amsterdam
 Wavesurfer
http://www.speech.kth.se/wavesurfer/
KTH (Royal Institute of Technology, Stockholm)

TRADITIONAL VOWEL CHART
[Figure: vowel chart with axes front/back tongue position and tongue height]
Chart based on X-ray data
Ladefoged, Vowels & Consonants (2001:115)

2.3 Spectrograms of words (American English)
[Figure: spectrograms of heed, hid, head, had, hod, hawed, hood, who’d]
 The vertical scale goes up to 4000 Hz which is sufficient to show the component frequencies of vowels.
 The exact position of the higher formants varies a great deal from speaker to speaker. They are indicative of a person’s voice quality.
 Observe the effect of the consonant at the end of the vowel.

2.4 Formants in relation to traditional articulatory descriptions
 F1
 increases from [i] to [æ] as vowel height decreases.
 decreases from [ɑ] to [u] as vowel height increases.
 Hence F1 is inversely related to vowel height.

 F2
 higher for front vowels
 lower for back vowels
 affected by lip rounding: decrease of F2 & F3

2.5 F1 by F2 plot
[Figure: F1 by F2 plot, with F2 on the horizontal axis and F1 on the vertical axis]
 Zero frequency is placed at the top right corner because formants are inversely related to traditional articulatory parameters.
 F2 scale not as expanded as F1, due to less prominent energy (F1: 80% of vowel energy).

COMPARISON
[Figure: plot with axes F2 and F1]
 “Traditional vowel diagrams express acoustic facts in terms of physiological fantasies.” Oscar Russell (1930s)
 Vowel height: corresponds to F1, not actually to tongue height
 F2: front-back dimension + lip rounding
 Degree of backness: F1-F2 difference
 The closer together F1 and F2, the more “back” a vowel sounds.
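The F1-F2 difference as a cue to backness can be illustrated with a short sketch. The formant values below are approximate, illustrative figures for the eight words used earlier. They are an assumption made for this example and were not read off the figures in the slides. `backness_cue` is a hypothetical helper name.

```python
# ASSUMPTION: approximate (F1, F2) values in Hz, for illustration only.
FORMANTS = {
    "heed": (280, 2250),
    "hid": (400, 1920),
    "head": (550, 1770),
    "had": (690, 1660),
    "hod": (710, 1100),
    "hawed": (590, 880),
    "hood": (450, 1030),
    "who'd": (310, 870),
}

def backness_cue(f1, f2):
    """Distance between F2 and F1: the smaller it is, the more 'back' the vowel sounds."""
    return f2 - f1

# Order the words from most front-sounding to most back-sounding.
by_backness = sorted(FORMANTS, key=lambda w: backness_cue(*FORMANTS[w]), reverse=True)

# Order the words from highest to lowest vowel (low F1 = high vowel).
by_height = sorted(FORMANTS, key=lambda w: FORMANTS[w][0])

for word in by_backness:
    f1, f2 = FORMANTS[word]
    print(f"{word:6s} F1={f1:4d} F2={f2:4d} F2-F1={backness_cue(f1, f2):4d}")
```

With these values the four front vowels come out first and the back vowels last, which matches the slide's point that backness tracks how close F1 and F2 are.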

Exercise: Make your own F1 by F2 plot


[Figure: empty plotting grid with axes F2 and F1]

3. Acoustics of Consonants
 The acoustic structure of consonants is usually more complicated than that of vowels.
 In many cases, there is no distinguishable feature during the consonant articulation itself, e.g. the silent part of [p, t, k].
 We have to look for the identity of the consonant at the beginning or the ending of the vowel beside it.
[Figure: spectrograms of bab, dad, gag]

3.1 Stops
 Each of the stop sounds conveys its quality by its effect on the adjacent vowel.
 The formants of [æ] correspond to the particular shape of the vocal tract.
 During the production of [bæ] the formants correspond to the particular shape that occurs the moment the lips come apart.
 Closure of the lips causes a lowering of all formants.
 The syllable [bæb] will begin with formants in a lower position, then they will rapidly rise to the positions of [æ], and finally descend again as the lip closure is formed.

Anticipatory Coarticulation
 For the production of e.g. [bib] or [bab], the tongue will be in position for the vowel even when the lips are closed at the beginning of the word.
 This happens because the part of the tongue not involved in the formation of the consonant closure is already in position for the following vowel.
 The formants at the moment of consonantal release will vary according to vowel.
 The apparent point of origin of the formant for each place of articulation is called the locus of that place of articulation.
 The locus depends on adjacent vowels.

3.2 Formant transitions
 Faint voicing striations near the baseline for each of the stops [b, d, g] (voice bar).
 In all three words, F1 rises from a low position due to consonant closure, hence it does not distinguish one place of articulation from another.
 What distinguishes the three stops are the onsets and offsets of F2 and F3.
[Figure: spectrograms of [bɛd], [dɛd], [ɡɛɡ]]

3.2 Formant transitions
 [bɛd]
 F2 & F3 start at a lower frequency than in [dɛd].
 F2 & F3 are noticeably rising from a low locus.
 [dɛd]
 F2 is fairly steady at the beginning.
 F3 drops a little.
 [ɡɛɡ]
 Characteristic coming together of F2 & F3 → velar pinch
[Figure: spectrograms of [bɛd], [dɛd], [ɡɛɡ]]

3.3 Voiceless stops
 The release of aspirated stops is marked by a sudden sharp spike (a thin vertical line).
 Period of aspiration noise: absence of energy in F1 & no vertical striations
 Frequency & intensity
 Whisper [t, t, t, k, k, k, p, p, p]. What do you observe?
 [t] > [k] > [p]
 Intensity of [p] burst is sometimes so low that there is no evidence of it on a spectrogram.
[Figure: spectrograms of [pʰæm], [tʰæn], [kʰæŋ]]

3.3 Voiceless stops
 Formant transitions also present in aspiration noise.
 [pʰæm]: F2 & F3 rising into the vowel.
 [tʰæn]: F2 steady, F3 dropping and then rising.
 [kʰæŋ]: characteristic velar pinch
[Figure: spectrograms of [pʰæm], [tʰæn], [kʰæŋ]]

3.4 Nasals
 A clear mark of a nasal (and a lateral) is an abrupt change in the spectrogram at the time of the formation of the articulatory closure.
 A nasal has a formant structure similar to that of a vowel. Differences:
 Bands are fainter.
 Bands located in particular frequency locations depending on characteristic resonances of the nasal cavities.
 F1: around 250 Hz
 Large region above F1 with no energy.
 F2 etc: varying according to speaker (here around 2000 Hz).
 Place cues sometimes not very clear.
[Figure: spectrograms of [pʰæm], [tʰæn], [kʰæŋ]]

3.5 Voiceless fricatives
 Highest frequencies in speech occur over fricatives.
 Frequency scale increased to 8000 Hz.
 Diphthong [aɪ]: F1 & F2 start close together for low central [a] and move apart for high front [ɪ].
 Fricatives: Random energy distributed over a wide range of frequencies.
[Figure: spectrograms of fie, thigh, sigh, shy]

[f ]
Voiceless fricatives [f,
 Same pattern in [f] and [].
 Difference: Movement of F2 into following vowel.
 Very little movement in [f].
 In [], F2 starts around 1200 Hz and moves down.
 Often confused in noisyy settings.
g
 Fallen together in some accents of English, such as London Cockney
 fin and thin both pronounced with a [f].

fie thigh i h
sigh shy
h
Anna Sfakianaki, CS-578, 2017-18, University of Crete

[s ]
Voiceless fricatives [s,
 The noise in [s] is centered at a high frequency, 5000 – 6000 Hz.
 In [] itt iss lower,
o e , extending
e te d g down
do to about 2500500 Hz.
 Both [s, ] have larger acoustic energy and produce darker patterns than [f, ]
 Both [s, ] are marked with distinctive formant transitions.
 Th llocus off F2 transition
The t iti iincreases th
throughout
h t th
the words
d
 [f] < [] < [s] < [] (see arrows in fig.)
 Before [] F2 of [a] is in a position comparable to its location in [i].
fie thigh sigh shy
Anna Sfakianaki, CS-578, 2017-18, University of Crete

v ]
3 6 Voiced fricatives [v,
3.6
 Voiced fricatives [v, , z, ] have patterns similar to their voiceless
counterparts [f, , s, ].
 Voiced fricatives also have vertical striations indicative of voicing.
 Vertical striations due to voicing are apparent throughout [v] and [].
 The fricative component of [v] is very faint.
faint
 F2 higher around [] than [v].

ever whether fizzer pleasure


Anna Sfakianaki, CS-578, 2017-18, University of Crete

[z ]
Voiced fricatives [z,
 Fricative energy in higher frequencies very apparent in [z, ].
 Voice bar
 faint in [z]
  –vertical striations due to voicing in 6-8 kHz.
hard to see in []
 F2 transition into [] is
 level from[z]
 d
descending
di from
f []
[ ]
ever whether fizzer pleasure
Anna Sfakianaki, CS-578, 2017-18, University of Crete

3.7 Lateral and central approximants
 Voiced approximants have formants not unlike those of vowels.
 The initial [l] has formants with center frequencies of approx. 250, 1100 & 2400 Hz, which change abruptly in intensity at the beginning of the vowel.
 A marked change in formant pattern is characteristic of voiced nasals and laterals.
 A final lateral may have little or no central contact, making it not really a lateral but a back unrounded vowel.
 A formant around 1100 or 1200 Hz is typical of most initial laterals for most speakers.
[Figure: spectrograms of led, red, wed, yell]

3.7 Lateral and central approximants
 The most obvious feature of approximant [ɹ] is the low frequency of F2 and F3.
 F3 begins at 1600 Hz!
 There is great similarity between red and wed. Young children have difficulty trying to distinguish them.
 The approximant [w] also starts with a low position for all three formants.
 F2 of [w] has the sharpest rise, as if it were a very short [u].
 The movements of formants for [j] are like those of a very short [i].
 This is why [w] and [j] are appropriately called semivowels, that is, semi versions of vowels [u] and [i] respectively.
[Figure: spectrograms of led, red, wed, yell]

INTERPRETING SPECTROGRAMS (I)


 In connected speech, many of the sounds are more difficult to distinguish.
 Transcribe the segments in the following phrase: “She came back and started again.” (American English)
 i k e m b æ k  n s t  t  d   /æ n

TYPES OF SPECTROGRAMS
[Figure: wide-band and narrow-band spectrograms of “Is Pat sad or mad?”]

TYPES OF SPECTROGRAMS
Wide-band spectrograms
 Very accurate in the time dimension
 They show each vibration of the vocal folds as a separate vertical line.
 They indicate the precise moment of a stop burst with a vertical spike.
 Less accurate in the frequency dimension
 There are usually several component frequencies present in a single formant, all of them lumped together in one wide band on the spectrogram.

Narrow-band spectrograms
 More accurate in the frequency dimension (at the expense of accuracy in the time dimension).
 The spikes of stop releases are smeared in the time dimension in the narrow-band spectrogram.
 The frequencies that compose each formant are visible.
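The trade-off between the two types comes from the length of the analysis window. The sketch below uses the rule of thumb that frequency resolution is roughly the reciprocal of window duration. The window lengths of 3 ms, 5 ms and 25 ms are assumed typical settings, not values given in the slides, and `describe` is a hypothetical helper name.

```python
def analysis_bandwidth_hz(window_duration_s):
    """Rule of thumb: frequency resolution is about 1 / window duration."""
    return 1.0 / window_duration_s

def describe(window_ms, f0_hz):
    """What a spectrogram with this window can resolve for a voice with this F0."""
    duration_s = window_ms / 1000.0
    bandwidth = analysis_bandwidth_hz(duration_s)
    return {
        "bandwidth_hz": bandwidth,
        # Harmonics are spaced F0 apart, so they separate only if the bandwidth is narrower.
        "resolves_harmonics": bandwidth < f0_hz,
        # Individual vocal fold pulses separate only if the window is shorter than one period.
        "resolves_glottal_pulses": duration_s < 1.0 / f0_hz,
    }

wide_male = describe(3, 120)      # short window: wide-band behaviour
narrow_male = describe(25, 120)   # long window: narrow-band behaviour
mid_male = describe(5, 120)       # same 5 ms window, low-pitched voice
mid_female = describe(5, 220)     # same 5 ms window, higher-pitched voice

for name, result in [("3 ms, 120 Hz", wide_male), ("25 ms, 120 Hz", narrow_male),
                     ("5 ms, 120 Hz", mid_male), ("5 ms, 220 Hz", mid_female)]:
    print(name, result)
```

The last two cases use the same window: for the 120 Hz voice it still behaves like a wide-band analysis, while for the 220 Hz voice it starts to pick out individual harmonics. This is one way to see why a higher F0 makes formants harder to locate.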

FEMALE VOICE
 Women’s voices usually have a higher pitch.
 The higher the F0, the more difficult it is to locate formants, because the harmonics interfere with the display of formants.
Greek phrase uttered by a male and a female Greek adult:
Λέγε «παππού» πάλι. (Say “grandfather” again.)
[Figure: spectrograms of the male and the female speaker]

7. INDIVIDUAL DIFFERENCES
 It is important to know what sort of differences exist between different speakers.
1. When trying to measure features that are linguistically significant, one must know how to discount purely individual features.
2. When trying to find out whether a speaker has speech problems.
3. For valid speaker identification in forensic situations.
 Individual variation is readily apparent when studying spectrograms → relative quality

7. INDIVIDUAL DIFFERENCES
 Same phonetic quality
 Similar relative positions
 Different absolute values
[Figure: vowels pronounced by 2 speakers of Californian English]

7. INDIVIDUAL DIFFERENCES
 No simple technique to average out individual characteristics so that a formant plot shows only the phonetic qualities of vowels.
 F4 indicator of individual’s head size
 Express values of other formants as percentages of the mean F4.
 F4 values are not usually reported.
 Phoneticians do not really know how to compare acoustic data on the sounds of one individual with those of another.
 We cannot write a computer program that will accept any individual’s vowels as input and then output a narrow phonetic transcription.
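The suggestion to express the other formants as percentages of the mean F4 can be sketched as below. The two speakers and all their formant values are hypothetical. Speaker B is simply speaker A scaled by 1.2, an idealised case chosen so that the effect of the normalisation is easy to see. `normalize_by_f4` is a hypothetical helper name.

```python
def normalize_by_f4(tokens):
    """Express F1-F3 of each vowel token as a percentage of the speaker's mean F4."""
    mean_f4 = sum(t["F4"] for t in tokens) / len(tokens)
    return [
        {"vowel": t["vowel"],
         **{name: 100.0 * t[name] / mean_f4 for name in ("F1", "F2", "F3")}}
        for t in tokens
    ]

# HYPOTHETICAL data in Hz, for illustration only.
speaker_a = [
    {"vowel": "heed", "F1": 280, "F2": 2250, "F3": 2900, "F4": 3500},
    {"vowel": "who'd", "F1": 310, "F2": 870, "F3": 2250, "F4": 3500},
]
# Speaker B: every value of speaker A multiplied by 1.2 (a smaller vocal tract).
speaker_b = [
    {"vowel": "heed", "F1": 336, "F2": 2700, "F3": 3480, "F4": 4200},
    {"vowel": "who'd", "F1": 372, "F2": 1044, "F3": 2700, "F4": 4200},
]

norm_a = normalize_by_f4(speaker_a)
norm_b = normalize_by_f4(speaker_b)

for a, b in zip(norm_a, norm_b):
    print(a["vowel"], round(a["F2"], 1), round(b["F2"], 1))
```

In this idealised case the two speakers become identical after normalisation. Real speakers do not differ by a single scale factor, which fits the slide's point that no simple technique fully removes individual characteristics.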

Read…
 Ladefoged & Johnson “A course in phonetics”, chapter 8
 Ladefoged “Vowels & Consonants”, chapter 7
 Lieberman & Blumstein “Speech physiology, speech perception & acoustic phonetics”, chapter 5, pp. 51-73
 Clark & Yallop “An Introduction to Phonetics & Phonology”, chapter 7
 Ladefoged, “Elements of Acoustic Phonetics”, chapter: “The Production of Speech”

…& visit
 https://corpus.linguistics.berkeley.edu/acip/
Material for chapter 8 from UC Berkeley Linguistics, “A course in phonetics” including online exercises
 https://soundphysics.ius.edu/?page_id=812
An Interactive eBook on the physics of sound (Indiana University Southeast)
 http://zonalandeducation.com/mstm/physics/waves/waveAdder/WaveAdder1.html
Wave Adder
 http://www.linguistics.ucla.edu/people/hayes/103/SpectrogramReading/Index.htm
Spectrogram reading practice (by Bruce Hayes, UCLA)
 http://home.cc.umanitoba.ca/~robh/howto.html
Monthly Mystery Spectrogram Webzone - Rob Hagiwara's professional web-space
 http://www.youtube.com/watch?v=Gg4IHbiITd0
Introduction to spectrogram analysis (FloridaLinguistics.com)

PHONETICS ASSIGNMENT
 Do the waveform and spectrogram reading exercises on Phonetics Assignment_CS-578_2017-18.pdf (on course website)
 Submit to asfakianaki@csd.uoc.gr by March 12th
 10% of projects’ grade