Experiments in The Perception of Stress by D.B. Fry (1958)

EXPERIMENTS IN THE PERCEPTION OF STRESS


D. B. FRY
(University College, London)

Differences of stress are perceived by the listener as variations in a complex pattern
bounded by four psychological dimensions: length, loudness, pitch and quality.
The physical correlates of these perceptual factors are the duration, intensity,
fundamental frequency and formant structure of the speech sound waves. Experiments
have been made in order to measure the effect on stress judgments of changes in three
of these physical dimensions, duration, intensity and fundamental frequency. The
experimental method was to synthesize speech stimuli in which these quantities could
be controlled and varied over a considerable range and to use this material to construct
listening tests which were carried out by large groups of subjects.
English word-pairs of the type subject, object, digest, formed the language material
for the tests; in the first experiment variations in duration were combined with
variations in intensity. The results showed that both duration and intensity act as cues
in stress judgments; duration produced the greater overall fluctuation in the judgments
and a method is suggested of making a quantitative comparison of the effect of the
two cues.
The second experiment combined duration changes with step changes of fundamental
frequency. The results showed that the direction of a step change of frequency had
a strong influence on stress judgments but the magnitude of the frequency change had
no marked effect. The tendency was for a higher syllable to be heard as stressed in
preference to a lower one.
The third test included variations in fundamental frequency within one syllable and
contained a range of patterns which imposed sentence intonation on the test items.
The results again demonstrated the all-or-none effect of frequency changes and showed
that this may outweigh the duration cue altogether.

STRESS AS A TERM IN A DESCRIPTIVE SYSTEM

A number of the terms used in descriptive linguistics refer to events that occur at
different levels and at different stages in the process of speech communication. One
such term is “stress”, which generally denotes both an aspect of the articulatory or
motor side of speech and also a feature of the sounds perceived by a listener. Part
of the usefulness of the term to linguistic description lies in the very fact that it spans
both the transmission and the reception phase of speech, but its use sometimes forms
the basis for the unjustifiable assumption of a one-to-one correlation between
transmission and reception in this particular domain. Writers on phonetics and linguistics
generally use “stress” to denote either “the degree of force with which a syllable is
uttered” (Jones, 1949) or “degree of loudness” (Bloch and Trager, 1942), but it is
often implied, or explicitly stated, that these two things are completely correlated;
Bloomfield (1933), for example, says that stress “consists in speaking one of these
syllables louder than the other or others”.
An experimental approach to the problem of stress requires somewhat more rigorous
formulation of the uses of the term and of the types of event that are open to
investigation. In the common usage, succeeding parts of an utterance are said to
bear stronger or weaker stress in comparison with other parts of the utterance, and
normally the parts so characterized are syllables. Hence stress is a term that refers
to a relation between syllables and successive variations in this relation constitute the
rhythmic pattern of an utterance just as successive variations in tone-relations make
up the intonation pattern. The rhythmic pattern plays a very important role in
English and the work reported in this paper deals only with examples drawn from
this language.
Stress as a descriptive term may, then, on the one hand, refer to features of the
skilled movements that constitute the transmission side of speech and it may be
possible to devise experimental methods of measuring variations in the force of
utterance during a speech sequence. If this were done, it is likely that the variations
would be seen to be more closely connected with phonation than with articulation
and it is unlikely that there would be an exact correlation between degrees of stress
in the linguistic sense and measured force of utterance. On the other hand, stress
may refer to the reception of speech and in this case it denotes a complex of perceptual
dimensions.
A sound stimulus may be varied along several physical dimensions, and such
variations, provided they fall within certain ranges, will give rise to changes in basic
psychological dimensions: pitch, loudness, quality and length. These are basic
dimensions in the sense of being independently variable. It is possible to present to
a listener sounds which he will judge to be different in pitch but the same loudness,
quality and length, or different in quality, but the same pitch, loudness and length, and
so on. In addition to these basic perceptual dimensions, there are others which
constitute in effect a complex of these first four. Thus in psycho-physical experiments
on the “volume” or on the “density” of sounds, the listener operates with a
complex of the simple dimensions.

Perception of the sounds of speech always involves a complex of these dimensions;
the listener is never concerned exclusively with one of them. He takes in continuous
variations along all of the basic dimensions and his linguistic judgments are
determined by their interaction. This fact is only one more illustration of the
redundant character of speech as a mode of communication. The listener, in normal
conditions, has a number of cues that he can use as the basis of any single judgment
and these cues are provided by variations in any and all of the perceptual dimensions.
On the other hand, the listener may, for a specific judgment, be more dependent
on one than on another : in establishing a phonemic sequence, he may depend very
largely on succeeding variations in quality ; in taking in an intonation pattern, he
may commonly rely mainly on variations in pitch.

PERCEPTUAL FACTORS IN STRESS JUDGMENTS

In the case of stress judgments, even in one particular language, all four dimensions
may play a part and this accounts to some extent for the difficulty of defining the
term and for the occurrence in descriptive linguistics of terms such as "pitch accent",
"force accent", etc., which are used to denote the supremacy of one dimension over
the others in specific circumstances. If we consider the stress patterns of English in
perceptual terms, there are a number of factors that influence a judgment of stress.
The listener relies on differences in (1) the length of syllables, (2) the loudness of
syllables, (3) the pitch of syllables, (4) the sound qualities occurring in the syllables
and (5) in the kinaesthetic memories associated with his own production of the
syllables he is receiving.
These factors form a complex in which no one is independent of the others. Thus
a stress judgment may be influenced by the length of a syllable, and particularly by
the length of the vowel that it contains, but not independently of the vowel quality.
In the English word /mɔ:bid/ the first syllable is perceived as stressed, partly because
the first vowel is long. This vowel is, however, long in opposition to the first vowel
of /mɔ:biditi/ and not in contrast with the second vowel /i/, for in the latter word,
the first vowel is still long in contrast with the second, although the stress is now
perceived to be on the second syllable.
Certain quality differences in English have particular significance in stress
judgments. The substitution of the neutral vowel /ə/ for some other vowel, the
reduction of a diphthong to a pure vowel, or the centralization of a vowel are all
powerful cues in the judgment of stress. Some features of consonant quality, such as
the strength of friction or aspiration and the sharpness of onset of the consonant
sound may act in a similar way.
It has sometimes been denied that the listener's kinaesthetic memories of his own
speech can play any part in the reception of speech. Experimental demonstration of
the operation of this factor may indeed be difficult though it might possibly be
achieved. The arguments generally advanced against this view are however largely
irrelevant. Thus it is said that a listener may be able to understand speech in a
language of which he cannot himself utter a word, and this is taken to " prove " that
kinaesthetic patterns contribute nothing to the reception of speech. This, however, is
merely to assume an identity between two statements, (1) that kinaesthetic patterns
contribute to the reception of speech, and (2) that kinaesthetic patterns are essential
for the reception of speech. The second of these is, of course, quite unjustifiable and
is indeed contrary to the whole character of speech as a mode of communication.
The redundancy of speech has been demonstrated in a number of ways; it is important
to realize that it is to be found at every level of speech activity and as a consequence
there is scarcely any feature which can be said to be essential for speech communication.
A system that is common to the speaker and the listener and a time pattern
of change in the medium of communication are indeed the only two factors that can
be regarded as essential. For the rest, speech consists of features that sub-serve these
requirements and operate in combinations that depend upon the conditions of the
moment. The purpose of experimental work is to explore these combinations and to
study their relation to the conditions in which they occur. In ordinary working, and
particularly in the case of a listener receiving his native language, it is probable that
the listener’s kinaesthetic memories play some part in his reception of speech. If
this is so, it is likely that the contribution will be particularly strong in the case of
stress judgments since rhythm of all kinds has a powerful motor component.

PHYSICAL CORRELATES OF THE PERCEPTUAL FACTORS

In order to experiment with judgments of stress it is necessary first to determine
the physical dimensions of the speech stimulus that we may expect to be closely
correlated with the perceptual factors. We have already said that the influence of
the listener’s kinaesthetic images is not directly accessible to experimental investigation
but the other four factors can be assigned physical correlates reliably from established
experimental data. This does not, of course, mean that there is in any case a
one-to-one correlation between the stimulus dimension and the perceptual effect.
The length of sounds will be closely correlated with the duration of given sections
of the speech wave-form. Differences in loudness will be associated in part with the
intensity of the speech wave-motions and this in turn will depend upon the frequency
complex or formant structure of the sound. Pitch differences will depend mainly
upon variations in fundamental frequency and quality differences on variations in
formant structure.
The basic method of experimental study consists in presenting to a group of
listeners speech sounds in which these physical dimensions can be varied independently
and systematically, developing a method by which the listener’s stress judgments can
be recorded and determining by statistical treatment the influence of the physical
variations. The experiments reported in this paper are concerned with the first three
dimensions only, that is, with variations in the duration, intensity and fundamental
frequency of the speech stimulus. One set of such experiments has already been
reported in some detail (Fry, 1955).

SYNTHESIS OF THE TEST MATERIAL

The essence of this method is that the properties of the speech signals may be
closely controlled. This is generally not possible in the case of live speech and only
partially so in recorded speech, so that the most satisfactory method is to synthesize
the required speech sounds in some way that will afford the necessary control over
all the variables of the speech. The pattern playback equipment at the Haskins
Laboratories was used for the purpose (see Liberman, 1952). In this machine,
speech-like sounds are generated and controlled by means of a painted spectrogram, which
can be made to resemble to any desired degree a spectrogram from live speech. As in
the common type of speech spectrogram, the frequency composition of the sound (its
formant structure) is related to the disposition of the pattern with respect to the
vertical axis, the total intensity of the sound depends on both the area and the density
of the traces, and the duration of any segment is associated with the extent of any
configuration along the horizontal axis. The painted spectrogram forms the control
system in the process of speech synthesis. The pattern playback equipment generates
an extended range of harmonics of a single fundamental (120 c.p.s.) and does not
afford the possibility of changing the fundamental frequency of the synthesized sounds.
The apparatus was used for the first series of experiments concerned with the duration
and intensity of the synthesized syllables; in these, the fundamental frequency was
kept constant at 120 c.p.s. The second series was concerned with the effect of varying
the fundamental frequency, and for this purpose a modification of the Vocoder (the
Voback) was used (Borst and Cooper, 1957). The same painted spectrograms in this
case controlled the output of the channels of a Vocoder synthesizer unit, and additional
tracks on the spectrogram controlled the switching on of the pseudo-larynx tone and
the frequency of this tone (the larynx frequency).

LISTENERS' JUDGMENTS OF STRESS

The next problem in these experiments was to formulate the questions to be asked
of the listeners. In all projects of this nature, it is an advantage if the subjects used
can be induced to supply an operational response to the speech stimulus in conditions
that do not differ too widely from those of normal speech communication. In
experimenting with variations at the phonemic level, it is possible to achieve this
satisfactorily by asking the subject to write down or to speak back what he hears.
No special training in phonetic techniques is needed to enable the subject to show that
he takes one stimulus to be key and another, tea. Reaction to differences of stress
is in another category in the sense that orthography does not mark stress variations
and the subject has no ready-made code in which to record them. As a consequence,
the untrained subject is less aware of stress than of phonemic distinctions and it is
correspondingly difficult to evoke an operational response to stress differences. There
is in English, however, an association between stress pattern and grammatical function
in certain classes of word; for most English speakers, the word /'sʌbdʒikt/, with
trochaic rhythm is a noun, and the word /səb'dʒekt/, with iambic rhythm, a verb.
It has been found that listeners with no phonetic training, on hearing an isolated word
of this type, can judge whether they hear the noun or the verb form and in this
way can register whether they hear the stress on the first or second syllable. The
material used was confined to five pairs of words, all of this type: subject, object,
digest, contract, permit.

THE ANALYTICAL BASIS OF THE TEST MATERIAL

The next task was to synthesize material for listening tests in which variations in
the chosen physical parameters could be made systematically. This involved a
decision on three major points : the physical dimensions to be explored, the range
of variation to be covered and the size of the steps within each range. The obvious
basis for such decisions is to be found in analytical study of the type of material to
be synthesized and spectrograms were made of utterances of the test words by a
number of different speakers. An account of this work, together with some of the
measurements obtained, is to be found in a previously published report (Fry, 1955),
and it will be enough here to indicate the general method. The selected words, both
nouns and verbs, were included in sentences and great care was taken to ensure a
common context, as far as possible, for both the noun and the verb in each pair.
Twelve speakers then recorded all the sentences and spectrograms were made from
the recordings.
The physical parameters selected for the first series of experiments were duration
and intensity and the spectrograms were examined and measured in order to establish
the modes and range of variation which were associated with the two word classes,
noun and verb. Several well-marked features emerged as a result of this analysis.
First, the differences between a noun and a verb were carried almost entirely by the
"vowel" stretches of the wave-motions (see Fry, 1955) and it was evident that in
synthesizing test material the whole range of variation might justifiably be made in
the "vowel" stretches. Second, the distribution of both durations and intensities
showed a well-defined bi-modality; that is to say the noun/verb opposition was
reflected in the physical data and in fact there was very little overlapping of the values
for the members of each pair of words.
This effect was even more apparent when the ratio of one vowel to another was
plotted rather than the absolute value for either duration or intensity. This agrees with
the linguistic description of stress as a relation between syllables and is very much
to be expected at the physical level since stress relations survive changes in the rate
of utterance (involving changes in absolute durations) and also changes in the mean
intensity level of the speech. Hence the synthesis of test material was carried out
having regard to suitable ratios of duration and intensity and the range of variation
was established in similar terms.
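
The argument for working with ratios can be checked with a little arithmetic. The following Python sketch is illustrative only: the vowel durations and powers are invented values, the function names are assumptions introduced here, and the decibel figure assumes the usual convention of 10 log10 of a power ratio. It suggests why a uniform change in rate of utterance or in overall level leaves the V1/V2 relation untouched.

```python
import math


def duration_ratio(v1_sec, v2_sec):
    """Duration of vowel one divided by duration of vowel two."""
    return v1_sec / v2_sec


def intensity_ratio_db(v1_power, v2_power):
    """Intensity of vowel one relative to vowel two, in db (power ratio)."""
    return 10.0 * math.log10(v1_power / v2_power)


# Invented measurements for one utterance (not data from the paper).
v1_sec, v2_sec = 0.06, 0.12
v1_power, v2_power = 4.0, 1.0

base_duration = duration_ratio(v1_sec, v2_sec)
base_intensity = intensity_ratio_db(v1_power, v2_power)

# Slower rate of utterance: both vowels lengthened by the same factor.
slow_duration = duration_ratio(v1_sec * 1.5, v2_sec * 1.5)

# Higher mean intensity level: both vowels raised by the same factor.
loud_intensity = intensity_ratio_db(v1_power * 8.0, v2_power * 8.0)

print(base_duration, slow_duration)
print(base_intensity, loud_intensity)
```

Absolute durations and intensities change in both cases, while the two ratios stay the same, which is the property the synthesis relied on.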
The third feature of the analytical data was that the distribution of duration and
intensity ratio showed certain differences. Fig. 1 shows that the measurements fall
into two groups with a well-defined cross-over point from noun to verb values ; in
the case of intensity, this fell approximately in the middle of the range for all the five
pairs of words. That is to say, the range of intensity ratios covered by the twelve
speakers was approximately the same in the noun as in the verb; for subject, the
ratio V1/V2 was 14 db. in the noun and -14 db. in the verb, with the cross-over
point at equal intensity for the two vowels. In the case of duration ratio, each pair
of words had its own pattern of variation; the range and the cross-over value were
different for each of the five pairs of words. For contract, for example, the range
of duration ratios was from 0.1 to 1.06 and the cross-over value 0.50, while for digest
the range was from 0.53 to 2.87, and the cross-over at 1.25.

[Fig. 1(a). Measured vowel durations for the word-pair subject. Horizontal axis: duration of vowel one (sec.).]

In selecting values of duration and intensity for synthesizing the test words, the
chief object was to cover as nearly as possible the total range of observed values and
at the same time to make certain of exploring the part of the range close to the
cross-over value from noun to verb. On the basis of the analytical data, it was decided
to adopt an intensity range of ±10 db. for all the five pairs of words and to use
a different range of duration ratios for each pair.

[Fig. 1(b). Measured intensities for the word-pair subject. In the plotting of intensity, the over-all
intensity level is brought to the same value for all speakers. Key: x = SUBject, o = subJECT.
Horizontal axis: intensity of vowel one.]

The number of steps to be used in each range was partly determined by the length
of the listening test that subjects could be asked to undergo and it was found that,
from this point of view, five steps in each dimension was a suitable number. In both
duration and intensity, the two extreme values were chosen to be near the ends of
the observed range, the middle value was approximately at the cross-over value from
noun to verb and the two intervening values were chosen with the object of exploring
the uncertainty range between noun and verb. In the case of subject, for example,
the observed duration ratios (V1/V2) ranged from 0.15 to 1.28, with the cross-over at
0.66, and the chosen experimental values were 0.25, 0.40, 0.60, 1.00 and 1.25. For
all pairs of words, the intensity ratios (V1/V2) were -10, -5, 0, 5 and 10 db.

THE COMBINATION OF PHYSICAL CUES

It has been pointed out already that judgments of stress depend upon a complex
of perceptual factors which are interdependent. It follows that the effects of the
physical correlates of these perceptual factors are also likely to be inter-related.
In any speech sequence presented to a listener, the duration, intensity, fundamental
frequency and formant structure all act as cues which determine the listener's stress
judgments and there is no method of rendering any of these physical dimensions
inoperative. The clearest example of this is to be found, perhaps, in the formant
structure of the speech sounds. In the verb /səbdʒekt/ the first syllable contains
the vowel /ə/ and the second /e/, and formant structure typical of these sounds is
an important factor in determining the listener's stress judgment. A modification of
formant structure, in the direction of /ʌ/ in the first vowel or in the direction of /i/
in the second, would at once bias the stress judgments towards the trochaic or noun
form. In synthesizing these words, therefore, whatever the formant structure may be,
it is bound to exert a biasing effect. Hence in experiments with synthesized speech,
we may decide to vary any one of the four physical dimensions and to keep the other
three constant, but the chosen values for the latter will none the less contribute to
the listeners' stress judgments.

THE DURATION AND INTENSITY TEST

In the first series of experiments, it was decided to maintain constant values for
formant structure and for fundamental frequency and to vary duration and intensity.
Fundamental frequency for all the voiced sounds was kept constant at 120 c.p.s. The
formant structure during the vowel stretches gave a vowel quality corresponding to
the stressed vowel in every case ; that is, the first vowel in all versions of subject
sounded like /ʌ/ and the second vowel, like /e/, and similarly for all the other
word-pairs. Hence the biasing effect of the formant structure would tend in the
opposite direction in the first and second syllables of a word and would thus be
partially cancelled out. Another consideration was that the test was first made with
a large group of American listeners. In American speech, it happens quite commonly
that there is little or no opposition of vowel quality in such noun and verb pairs and
hence the biasing effect would be rather less considerable. It turned out, in practice,
that there was no marked difference between the responses of the American subjects
and those of a small group of English subjects.
The variations in duration and intensity ratio covered the required range in five
steps, as has been already indicated. In order to economize in test material, the two
sets of variations were combined together in one set of test items. For each of the
five word-pairs, versions were synthesized which covered the five steps of duration
ratio and the five steps of intensity ratio, each value of duration being combined with
each value of intensity. This gave a listening test of 125 items, which appeared to be
about the longest test that listeners could comfortably manage on one testing occasion.
All versions of the test word-pairs were recorded and assembled in random order.
Each test item was inserted in a carrier sentence (also synthesized) and was heard in
the context “Where is the accent in ____?” Listeners were asked to make a
response to every item and to register this on a test sheet where the appropriate
word-pair was printed for each test-item in this form: SUBject : subJECT, CONtract
: conTRACT and so on. They were asked to underline the form that they heard.
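
The construction of the 125-item test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function name `build_test_items` and the seeded shuffle are inventions for the example, and duration is represented by a step number from 1 to 5 because the text gives the actual duration ratios for subject only.

```python
import itertools
import random

WORD_PAIRS = ["subject", "object", "digest", "contract", "permit"]
INTENSITY_DB = [-10, -5, 0, 5, 10]   # V1/V2 intensity ratios, same for all words
DURATION_STEPS = [1, 2, 3, 4, 5]     # step index; the ratio itself differs per word

# Duration ratios are stated in the text for "subject" only.
SUBJECT_DURATION_RATIOS = [0.25, 0.40, 0.60, 1.00, 1.25]


def build_test_items(seed=0):
    """Combine every word with every duration step and intensity ratio,
    then put the items in random order (seeded so the order is repeatable)."""
    items = [
        {"word": word, "duration_step": step, "intensity_db": db}
        for word, step, db in itertools.product(
            WORD_PAIRS, DURATION_STEPS, INTENSITY_DB
        )
    ]
    random.Random(seed).shuffle(items)
    return items


items = build_test_items()
print(len(items))  # 5 words x 5 durations x 5 intensities
```

Crossing the two cues fully, rather than varying one at a time, is what later allows the row and column averages of the results to be read as separate intensity and duration effects.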

RESULTS OF THE DURATION AND INTENSITY TEST

This test was carried out by 118 subjects; the effect of variation in the physical
cues was measured in terms of the proportion of these listeners who judged a given
stimulus to be a noun or verb, that is to have trochaic or iambic rhythm. Since all
subjects made a judgment about every test item, the number of noun judgments for
one item is equal to 118 - (the number of verb judgments). For simplicity, therefore,
all results of the test are given as the number of noun judgments, usually presented
as a percentage of the total number of subjects.
In the case of all five word-pairs, the total range of stimuli was enough to cause a
complete swing of the listener’s judgments from noun to verb; one version in each
set produced a noun judgment from 97-100% of the listeners, and at the other end
of the range, one version produced less than 10% of noun judgments, with the
exception of permit in which the lowest value was 13%. The change in judgments
followed the expected trend: where V1 was long in proportion to V2, there was a
majority of noun judgments, and similarly where V1 was more intense than V2. The
effect was reinforced in versions where V1 was both longer and more intense than V2.
The disagreement amongst subjects was greater, that is the percentage of noun
judgments was nearest to 50%, when the duration and intensity cues were opposed to each
other, as for example in versions where V1 was longer but of lower intensity than V2.

THE RELATIVE STRENGTH OF THE DURATION AND THE INTENSITY CUE

There is no doubt from the experimental results that in the English word-pairs used
in the test, both duration and intensity ratio have a marked influence in determining
stress judgments. An interesting question that one might try to answer on the basis
of these results concerns the relative strength of the two cues. Information on this
point can be abstracted from the results by summing the noun judgments for all
intensity ratios at each duration ratio, i.e. by taking the mean of the column values in
the matrix of results. This gives the effect of changing duration ratio, and similarly
summing for each intensity ratio, i.e. taking the row averages, gives the effect of
changing intensity. The total taken for all five word-pairs showed that the total change
in noun judgments due to duration was from 12% to 92%, and that due to intensity
ratio was from 40% to 82%.
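
The averaging just described can be sketched as follows. The matrix below is invented purely for illustration (it is not the experimental data), and the layout, with rows as intensity steps and columns as duration steps, together with the function names, is an assumption made for the example.

```python
def column_means(matrix):
    """Average over rows: one value per duration step."""
    n_rows = len(matrix)
    return [sum(row[j] for row in matrix) / n_rows for j in range(len(matrix[0]))]


def row_means(matrix):
    """Average over columns: one value per intensity step."""
    return [sum(row) / len(row) for row in matrix]


# Invented percentages of noun judgments (NOT the results of the test).
# Rows: intensity steps 1-5; columns: duration steps 1-5.
noun_percent = [[10 + 15 * j + 5 * i for j in range(5)] for i in range(5)]

duration_effect = column_means(noun_percent)
intensity_effect = row_means(noun_percent)

# Total change in noun judgments attributable to each cue.
duration_swing = duration_effect[-1] - duration_effect[0]
intensity_swing = intensity_effect[-1] - intensity_effect[0]

print(duration_effect, duration_swing)
print(intensity_effect, intensity_swing)
```

In the invented matrix the duration swing is larger than the intensity swing, mirroring the direction (though not the values) of the reported 12% to 92% against 40% to 82%.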
In order to establish the significance of this relation, we need to make a quantitative
comparison of the duration and intensity ratios used in the experiment. Since the
range of values was approximately equal to those found in the analytical data, that
is in natural speech, the range of duration change can be regarded as at least in this
sense equivalent to the range of intensity change. In Fig. 2, the aggregate of noun
judgments for each duration and intensity ratio is plotted. This is a formal
representation of the results in which the abscissae are simply succeeding steps of duration or
intensity change and not points on a quantitative scale. It is evident from the
experimental results that an extension of the duration range would not lead to any
major change in noun judgments since these already cover nearly the whole range
0-100%. Whether extension of the intensity range would give judgment values
near to 0 or to 100% could be determined by experiment, but it was in fact clear from
the preliminary syntheses that preceded the final test that extreme steps of intensity
change from V1 to V2 served only to make the stimulus sound very unnatural without
increasing the impression of strong stress. Such an experiment would, further, leave
unresolved the question of equivalence between duration range and intensity range and
it seemed therefore worth while to seek an alternative method of treating the existing
results in order to reach some conclusion concerning the relative strength of the
duration and intensity cues.
As we have already said, the response to any stimulus in the test is made up of
four factors: the response due to duration, that due to intensity, to fundamental
frequency and to formant structure. The force of any of these factors could be more
reliably abstracted from the data if the degree of agreement amongst subjects were
expressed on a scale which was not artificially compressed by the barriers of 0 and
100%. Such a measure is provided by taking the logit number for each test item
instead of the percentage of noun judgments. The subjects were able to make one
of two responses to each item. If p = proportion of noun judgments, and q = (1 - p)
= proportion of verb judgments, then logit p = logₑ (p/q). The range of logit values
will be ±∞; the smallest degree of agreement (50%) will have the logit 0; positive
values of logit p will indicate agreement in a noun judgment and negative values
agreement in a verb judgment. The logit response for each test item will represent
a factor due to duration and a factor due to intensity and these factors can then be
abstracted as before by taking the row and column averages of the matrix of results.
An inspection of the crude data made it clear that they would not yield an exact fit
with this type of treatment since there were several irregularities in the pattern. A
difficulty arises with values of 0% and 100%, which would theoretically give logits of
-∞ and +∞; it seemed good enough for our purposes to consider them crudely
as ½% (logit = -5.293) and 99½% (logit = 5.293) since the irregularities in the
pattern make it impossible to use the most refined statistical methods.

[Fig. 2. Percentage of listeners’ “noun” judgments for all test words as a function of
(a) vowel duration ratio and (b) vowel intensity ratio. Horizontal axis: steps 1 to 5 of increasing ratio V1/V2.]

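
The logit measure and the crude treatment of 0% and 100% can be written out as a short Python sketch. The function name and the `floor` parameter are assumptions introduced for illustration; the natural logarithm is used because it reproduces the quoted value of 5.293 for 99½%.

```python
import math


def logit_from_percent(noun_percent, floor=0.5):
    """Logit of the proportion of noun judgments.

    Percentages of 0 and 100 are moved to `floor` and 100 - `floor`
    (0.5% and 99.5% by default) so that the logit stays finite.
    """
    clipped = min(max(noun_percent, floor), 100.0 - floor)
    p = clipped / 100.0          # proportion of noun judgments
    q = 1.0 - p                  # proportion of verb judgments
    return math.log(p / q)


for percent in (0, 10, 50, 90, 100):
    print(percent, round(logit_from_percent(percent), 3))
```

Equal disagreement (50%) maps to 0, majorities for the noun give positive values and majorities for the verb give negative values of the same size, so the scale is symmetrical about the point of greatest uncertainty.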
The procedure was to calculate the logit values for all percentages occurring in the
results and to tabulate these for each of the word-pairs used in the test. The common
logit for each duration ratio was obtained by taking the column averages and for each
intensity ratio by taking the row averages. The supposition is that the logit for any
combination of duration and intensity can be expressed as a sum of a duration effect
and an intensity effect: one may reasonably expect this to be approximately true
138 Expffimenk in ths Perception of Stress

J
0.2
. I

0.4
I

0.6 0.8
I

1.0 1.2
1
1.4
Duration ratio VvV2

Fig. 3(a). Common logit vaIues for duration ratio from the results for the word-pair subject.

although, as we have said, the irregularities in the distribution make it impossible to
test the hypothesis with any exactness. We can only measure relative effects: since
both the column and the row averages may be considered to contain the general
average, we have subtracted this general average from all column (i.e. duration)
averages to avoid counting it twice over. The logits for each word-pair were plotted
separately as a function of duration and intensity ratio and a typical result (for the
word-pair subject) is given in Fig. 3.
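
The row-and-column averaging described above can be sketched as follows. The matrix of logits is invented purely for illustration and is not the paper's data; the function names and the layout (rows for intensity ratios, columns for duration ratios) are likewise assumptions. The general average is subtracted from the column averages only, as the text describes, so that a row value plus a column value does not count it twice.

```python
def additive_effects(logits):
    """Split a matrix of logits into intensity (row) and duration (column) parts.

    logits[i][j] is the logit for intensity ratio i and duration ratio j.
    """
    n_rows = len(logits)
    n_cols = len(logits[0])
    general_average = sum(sum(row) for row in logits) / (n_rows * n_cols)
    # Row averages keep the general average.
    intensity_logits = [sum(row) / n_cols for row in logits]
    # Column averages have the general average removed.
    duration_logits = [
        sum(logits[i][j] for i in range(n_rows)) / n_rows - general_average
        for j in range(n_cols)
    ]
    return intensity_logits, duration_logits


def predicted_logit(intensity_logits, duration_logits, i, j):
    """Logit expected for one cell if the two effects simply add."""
    return intensity_logits[i] + duration_logits[j]


# Invented, exactly additive example: 3 intensity ratios by 5 duration ratios.
example = [
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    [-1.5, -0.5, 0.5, 1.5, 2.5],
    [-1.0, 0.0, 1.0, 2.0, 3.0],
]
intensity_logits, duration_logits = additive_effects(example)
```

For data that are exactly additive, as in this invented matrix, `predicted_logit` reproduces every cell; with real data the residuals would show the irregularities the text mentions.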

Fig. 3(b). Common logit values for intensity ratio from the results for the word-pair subject.
(Horizontal axis: intensity ratio V1/V2, -10 to 10 db.)

It will be seen that the logits both for duration and for intensity lie approximately
on a straight line. We may conclude from this that succeeding steps of duration change
produce equal changes in the logit and the same is true for intensity changes. This
means that the ratio p / q , i.e. noun/verb, is multiplied by nearly the same factor for
equal changes in duration and intensity and thus rises in a geometrical progression.
Since this is so, we may now compare the effect of duration and intensity by comparing
the slope of the two lines. In the case of subject, the whole range of intensity change
of 20 db. produces a logit rise of 2.5. On the duration line, a change in the logit of
2.5 is effected by a change in duration ratio of approximately 0.6. Similar calculations
in the case of the other word-pairs give the following results: object, 20 db. is
approximately equivalent to a duration ratio change of 0.4, digest, 0.16, contract, 0.35 and
permit 0.9. This method of treating the data therefore affords a means of making a
quantitative comparison of the duration and intensity cues and their influence on stress
judgments.
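
The slope comparison for subject can be reproduced with a little arithmetic. This is a sketch under the assumption that both lines are exactly straight; the function names are illustrative, and the only figures taken from the text are the logit rise of 2.5, the 20 db. range and the duration ratio change of about 0.6.

```python
import math


def slope(delta_logit, delta_cue):
    """Logit change per unit change in a cue, for a straight-line fit."""
    return delta_logit / delta_cue


def equivalent_duration_change(intensity_change_db, intensity_slope, duration_slope):
    """Duration-ratio change giving the same logit change as an intensity change."""
    return intensity_change_db * intensity_slope / duration_slope


def odds_multiplier(delta_logit):
    """Factor by which the noun/verb ratio p/q is multiplied for a logit change."""
    return math.exp(delta_logit)


intensity_slope = slope(2.5, 20.0)   # logit units per db.
duration_slope = slope(2.5, 0.6)     # logit units per unit of duration ratio
equivalent = equivalent_duration_change(20.0, intensity_slope, duration_slope)
per_db_factor = odds_multiplier(intensity_slope)
```

Because the logit is linear in each cue, every additional db. multiplies the noun/verb ratio by the same factor `per_db_factor`, which is the geometrical progression referred to in the text.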

STRESS JUDGMENTS AND VARIATION IN FUNDAMENTAL FREQUENCY

A second series of experiments was undertaken to explore the role of fundamental
frequency variations in determining stress judgments. It is clear that such variations
will affect the intonation pattern perceived by the listener, and in English speech, the
perception of a rhythmic pattern is very closely bound up with the
perception of intonation. In the case of the word-pairs used in the previous
experiment, in many contexts the sentence intonation pattern would have an over-
riding influence on the decision as to whether the noun or verb form had occurred.
This factor complicates the problem of examining the part played by fundamental
frequency variation in stress judgments since the resulting pitch changes will not
only contribute to the perception of stress but will tend to impose upon the stimulus
sequence a sentence intonation which may be decisive for the stress judgment.
The purpose of this set of experiments was to study the effect of fundamental
frequency variation in conditions where the influence of sentence intonation is reduced
to a minimum. It is generally true that in English the functionally most important
parts of a sentence intonation pattern are syllables in which the pitch changes in the
course of the syllable. Syllables with level tone are the nearest one can get to units
that are neutral with respect to sentence intonation, though it is obviously not possible
to make sentence intonation inoperative. These experiments were made, therefore,
by synthesizing sequences in which fundamental frequency remained constant
throughout a syllable and any change of frequency was made between syllables.
The material was confined to the word-pair subject. Versions of this were
synthesized in which a change of fundamental frequency was effected at the junction
between the first and second syllable. It has been already said that this synthesis
was carried out with the Voback (Borst and Cooper, 1957), a device in which hand-
painted spectrograms are used to control the synthesizer action of an 18-channel
Vocoder. The duration, intensity and formant structure of the synthesized sounds
were controlled in a manner similar to that used in the previous experiments (though
the sound produced by the Vocoder is of a different character) and additional sections
of the painted pattern served to control the switching of the buzz and hiss generators
and the frequency of the fundamental during the buzz sequences. This frequency was
measured by means of a General Electric audio-frequency meter connected across the
buzz generator.
As in the previous test, two physical dimensions were explored at the same time;
the duration ratios already used in subject were used in the new test and were combined
with step-changes of frequency ranging from 5 C.P.S. to 90 C.P.S. The intensity ratio
was constant at equal intensity for V1 and V2 and the formant structure was the same
as in the first test.

The choice of fundamental frequencies for this test involved a number of
considerations that should be briefly mentioned. The listeners were to hear a series of
sense-groups, each containing two syllables, and to make a judgment about the stress
pattern. The effect of sentence intonation was to be minimized, but apart from this it was
desirable that the stimuli should be as natural as possible since this was likely to
make the judgments more consistent. In English speech there is a strong tendency
for a sense-group to be spoken in one key and for musical modulation to take place
between groups. This effect of key depends largely upon the occurrence in the group
of some reference pitch, of which the speaker is unaware, but which regulates the
pitch of all the syllables in the group. In the test items it was therefore decided to
adopt a reference frequency which would occur in every item, and in order to limit the
number of variables in the test, the same reference was used throughout the test.
The synthesized speech was intended to sound like that of a male speaker, and the
selected reference frequency of 97 C.P.S. gave this effect successfully.
The range of variation in fundamental frequency was decided on similar grounds.
In the intonation patterns heard from most English speakers changes in pitch of more
than one octave are infrequent and are not often met with in successive syllables, even
from the most excitable speakers. Preliminary syntheses showed in fact that a change
of 90 C.P.S. on 97 C.P.S. (approximately a semi-tone less than one octave) produced
stimuli that sounded rather unnatural and hence this upper limit was adopted as
being likely to show up the maximum effect of frequency change without introducing
very unnatural stimuli which would perhaps make listeners respond in a random
manner.
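
The remark that a step of 90 C.P.S. on 97 C.P.S. falls roughly a semitone short of an octave can be checked numerically. The sketch below assumes the usual equal-tempered convention of 12 semitones per doubling of frequency, which the text does not itself state; the names are illustrative.

```python
import math

REFERENCE_HZ = 97.0  # reference frequency given in the text


def interval_in_semitones(lower_hz, upper_hz):
    """Size of a frequency interval in equal-tempered semitones (assumed convention)."""
    return 12.0 * math.log2(upper_hz / lower_hz)


# Largest step used in the test: 90 C.P.S. above the 97 C.P.S. reference.
largest_step = interval_in_semitones(REFERENCE_HZ, REFERENCE_HZ + 90.0)
shortfall_from_octave = 12.0 - largest_step
```

On this convention the largest step comes out between 11 and 12 semitones, that is, somewhat less than a full semitone short of the octave.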
The relation between the reference frequency and that of the other syllable was
found to be important for the naturalness of the stimulus. Each syllable was on one
tone, that is of constant frequency, and if the relation between the syllables was such
as to make the impression of an exact musical interval, the test word appeared to be
sung and listeners found it difficult to make a stress judgment. Care was taken
therefore to avoid this effect as far as possible and this was one reason for the fixing
of the reference frequency at 97 C.P.S. In preliminary experiments a reference
frequency of 100 C.P.S. with frequency intervals of multiples of 5 C.P.S. was used.
Many of the stimuli then had much too musical an effect which was eliminated by
the change of the reference frequency to 97 C.P.S. Frequency changes as small as
3 C.P.S. were used in the first experiments but listeners' responses to these items were
very inconsistent and were disregarded in the final test. The frequency steps
ultimately selected were designed to explore adequately the range of variation up to
90 C.P.S. and the experimental values were: 5, 10, 15, 20, 30, 40, 60 and 90 C.P.S.
It was expected that an important factor in determining stress judgments would be
the direction of frequency change in the course of the stimulus word and it was
necessary therefore to make the step-change of frequency in both directions, that is
in one case with the first vowel on a higher frequency than the second, in the other
case with the first vowel lower. In all cases the lower vowel was at the reference
frequency of 97 C.P.S. The total of frequency changes was therefore 16, each of the
8 intervals used in two directions. The 5 duration ratios were combined with each of
the frequency changes, giving a total of 80 test items. In this test the items were
not inserted in a carrier sentence since this would tend to increase rather than to
minimize the influence of sentence intonation on the results. Subjects were asked
to register their responses in the same way as in the previous test.
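
The design of the test items can be enumerated directly, which also confirms the total of 80. This is a sketch: the field names are illustrative assumptions, and the duration ratios are represented only by their step numbers 1 to 5, since their actual values are not given in this part of the paper.

```python
from itertools import product

REFERENCE_HZ = 97                              # lower vowel is always at the reference
STEPS_HZ = [5, 10, 15, 20, 30, 40, 60, 90]     # frequency steps listed in the text
DIRECTIONS = ["step-down", "step-up"]          # first vowel higher / first vowel lower
DURATION_STEPS = [1, 2, 3, 4, 5]               # step numbers only; ratio values not given here


def build_test_items():
    """Return one dictionary per test item in the fundamental frequency test."""
    items = []
    for step, direction, duration_step in product(STEPS_HZ, DIRECTIONS, DURATION_STEPS):
        if direction == "step-down":
            v1_hz, v2_hz = REFERENCE_HZ + step, REFERENCE_HZ
        else:
            v1_hz, v2_hz = REFERENCE_HZ, REFERENCE_HZ + step
        items.append({
            "v1_hz": v1_hz,
            "v2_hz": v2_hz,
            "direction": direction,
            "duration_step": duration_step,
        })
    return items


test_items = build_test_items()
```

Each of the 16 frequency changes appears once with every duration step, so no cue is confounded with the other in the design.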

RESULTS OF THE FUNDAMENTAL FREQUENCY TEST

The effect of pitch on the perception of stress is generally held to be that a
higher pitch produces an impression of greater stress. This experiment was designed
to test first the hypothesis that, if two syllables differ in fundamental frequency, the
syllable having the higher frequency is more likely to be judged as stressed. It was
intended also to determine whether this principle, if it operates at all, is subject to
modification through the effect of duration ratio, which was shown by the first
experiment to be an important factor in stress judgments. Last, the experiment was
intended to show whether the size of a frequency step between syllables has a marked
effect on stress judgments.
If a syllable of higher fundamental frequency tends to be judged stressed then in
this test the step-down change of fundamental would lead listeners to perceive the
stress on the first syllable of the test word, that is, it would tend to increase the
number of noun judgments, and the step-up change would decrease the number of
noun judgments. In all, 41 subjects carried out this test ; they included a group of
American and a group of English speakers. Fig. 4 gives the results of the test as
percentage noun judgments for each duration ratio, with the step-down and step-up
changes plotted separately.
The effect of changing duration ratio re-appears clearly in the results of this test
and the shape of the curves is similar to those obtained in the first experiment, but
there is good evidence of the effect of fundamental frequency change suggested by
the hypothesis. The step-up change of frequency moves the whole curve in the
direction of fewer noun judgments and the step-down change displaces it in the
direction of more noun judgments. The difference between means for step-up and
step-down change is significant at the 1% level for all duration ratios.

Fig. 4. The effect of step changes of fundamental frequency on "noun" judgments for the
word-pair subject. (Curves labelled step-down and step-up; horizontal axis: increasing
ratio V1/V2, steps 1 to 5.)

EFFECT OF THE SIZE OF THE STEP-CHANGE IN FREQUENCY

In the case of both duration and intensity ratio it has been shown that progressive
increase in these quantities is reflected in increasing noun judgments by the subjects.
The next question to be asked with regard to fundamental frequency is whether
increase in the frequency ratio of V1 to V2 would have a similar effect, or whether
fundamental frequency change, unlike duration and intensity change, tends to produce
an all-or-none effect.

The effects of frequency change were abstracted from the data by combining all
duration ratios for each step change of frequency. In order to detect any possible
trend in the results, the logit response for each frequency was calculated and the
values are shown in Fig. 5. The first important feature of these results is the
discontinuity in logit response between the values -5 and 5 C.P.S., that is at the
cross-over from a step-up to a step-down change in fundamental frequency. This
confirms the conclusion already reached by inspecting the results for duration ratio
in this experiment. Increase in the size of the frequency step appears to produce no
marked trend in the results, however. The logit values for the step changes lie
approximately on a horizontal line, indicating that the size of the change is having
no appreciable effect. For the step-up change, if there is any trend, it is in the
direction opposite to the expected one. An increase in the size of the step-up gives
a slight increase in noun judgments, rather than the expected decrease. This effect
is contributed largely by the 90 C.P.S. change and it may well be that this large step-up
appeared even more unnatural to the listeners than an equal step-down and thus
caused greater uncertainty in the judgments.
These results provide good evidence for supposing that a step-change of fundamental
frequency affects stress judgments in a specific way. It appears likely that so long as
the resulting pitch change is easily perceptible to the listener, he tends to judge a
higher syllable as more stressed, but the magnitude of the pitch change makes little
contribution to his judgment. This would be consistent with the fact that a frequency
change of 3 C.P.S. led to a dispersion of the listeners' judgments; it may well have
been too small to cause the all-or-none effect in the perception of stress.

SENTENCE INTONATION AND STRESS JUDGMENTS

The role of intonation in determining stress judgments has already been touched
upon in connection with the previous experiment in which efforts were made to reduce
the influence of sentence intonation. It is clear, however, that any account of the
factors affecting stress judgments is incomplete without an attempt to answer certain
questions about sentence intonation. The most important of these is the question
whether, as one would expect, sentence intonation is so strong as to be capable of
outweighing all other factors in stress judgments.
A third set of experiments was carried out to answer this question. As in the
previous experiments, these were designed to explore a range of variation in physical
cues and to determine the effect of this variation on stress judgments in the same way
as before. The important variable was again fundamental frequency, but this time
the variations were chosen to allow sentence intonation the maximum effect.
It was said earlier that, broadly speaking, a syllable containing a pitch change is
functionally more important in English intonation than a level syllable, and for this

Fig. 5. Logit response for step change of fundamental frequency. Frequencies are plotted
on a logarithmic scale. (Horizontal axis: frequency step, -90 to 90 C.P.S.)

third test versions of the word-pair subject were synthesized in which fundamental
frequency changed in the course of one vowel stretch. It should be made clear,
however, that the purpose was not to reproduce faithfully certain English intonations,
but rather to cover a wide range of patterns of fundamental frequency variation and
to study the effect of these.

Fig. 6. Types of fundamental frequency change used in the syllable inflection test.
(Panels plot frequency against time for a short vowel and a long vowel.)

Again the intensity ratio of the two vowels in each version was kept constant at
equal intensity and the same formant structure was used as before. The five duration
ratios were combined with the fundamental frequency variations. In order to reduce
the number of variables, the frequency range over which the fundamental varied
within one vowel was kept constant throughout the test. A reference frequency of
97 C.P.S. was again used, that is at some time during the stimulus word the fundamental
reached this minimum value. The highest frequency used was 130 C.P.S. and when
frequency changed in the course of a syllable it covered the whole of this range from
97 to 130 C.P.S. A number of stimulus words included one level syllable and the
fundamental frequency for such syllables was either 97 or 130 C.P.S.
Two types of frequency change within the syllable were used. In the first type, the
frequency changed continuously throughout the vowel, and in the second, the
frequency change occupied only half the vowel duration. Fig. 6 shows the graph
of frequency change with time for the types of syllable used in the test. It will be
seen that the rate of change of frequency was allowed to vary with the duration of the
vowel. Stimulus words were synthesized which covered a range of 16 patterns, each
combined with 5 duration ratios. The different patterns are listed in Table 1 where
the frequency variation for each word is shown symbolically and the letters serve to
identify the patterns in discussing the results.

RESULTS OF THE SYLLABLE INFLECTION TEST

Responses to this test were obtained from 76 subjects, including both American
and English speakers. The first important consideration in examining the results is
that the frequency variations cannot in this case be placed on a quantitative scale;
the test was designed to show up an all-or-none effect and it is for this that we have
to look in the data from the test. It is to be expected, and the data indeed show once
more, that increasing duration ratio will have the effect of increasing the number of
noun judgments, but the first question is whether any patterns of frequency variation
over-ride the duration cue. In the absence of a fundamental frequency cue, for
example when the five duration ratios are combined with equal intensity in the two
vowels, on a monotone, then the smallest duration ratio produces a majority of verb
judgments, and the largest ratio gives a majority of nouns. A simple criterion might
be applied first of all to the data from the syllable-inflection test and we might look
for any frequency pattern for which the number of noun judgments either never falls
below or never reaches 50%, that is for cases in which the whole curve is transposed
above or below the 50% level. Such cases are to be found in the results and Fig. 7
gives the curves for two such patterns, A and B. For pattern A, even with the
smallest duration ratio, there is a majority of noun judgments and for pattern B, the
greatest duration ratio still produces a substantial majority of verb judgments. These
two frequency patterns will, obviously, sound to the listener like two common English
intonation patterns in which the fall normally occurs in the stressed syllable and it is
not surprising that they should influence stress judgments so strongly. A similar
effect is to be found for patterns J and M, which are functionally similar to A and B.

TABLE 1.

Fig. 7. The effect on "noun" judgments of two patterns of fundamental frequency change
(see Table 1). (Horizontal axis: increasing ratio V1/V2, steps 1 to 5.)

The range and the mean of noun judgments for all patterns are given in Table 1, and
it will be seen that the range for J is 49-95%, that for M is 3-49%. The influence
of fundamental frequency change is not, however, confined to patterns giving rise
to a familiar intonation. Both E and F produced an un-English intonation but
nonetheless evoked a large majority of noun judgments because of the inflection in the
first syllable.

THE EFFECT OF DIFFERENT TYPES OF FREQUENCY PATTERN

A wide variety of patterns was used in this experiment in the hope of answering
certain questions concerning the effectiveness of different types of fundamental
frequency variation in determining stress judgments. The stimulus words contained
three kinds of syllable: level syllables, syllables with a linear change of frequency
and syllables with a curvilinear change. These syllables occurred sometimes as the
first and sometimes as the second syllable of a stimulus word and it was possible by
grouping the results to obtain some information on the relative power of these
syllabic patterns to influence stress judgments. If we compare patterns A and B, for
example, a noun judgment for A means that the subject heard a linear change of
frequency as stressed in contrast to a level syllable. In B, a verb judgment means
the same thing. But a verb judgment for A or a noun judgment for B means that
the subject heard a level syllable as stressed in contrast to a linear change. Provided
that the five duration ratios are equally represented in the samples, we can group
sets of data together in this way and obtain some indication of the association between
types of syllable and the judgment that the syllable is stressed. The first contrast
treated in this way was that between inflected and level syllables. In all patterns that
contained both a level and an inflected syllable, 66% of all inflected syllables were
judged stressed and 33% of level syllables. This difference was highly significant
at the 1% level.
The two types of inflected syllable were compared in a similar way. For example,
patterns A and J contain an inflected first syllable, in the one case a linear and in
the other a curvilinear inflection. By comparing the number of noun judgments in
this and in similar cases we gain a measure of the relative effectiveness of the two
types of syllable. Of all syllables with linear frequency change, 62% were judged
stressed, whilst 72% of the syllables with curvilinear change were heard as stressed.
This difference is not significant.
The last comparison made in this way was between rising and falling inflections.
The intonation patterns of English involve both rising and falling tones and the
word-pairs used in these experiments could certainly occur in contexts where noun
and verb might both be required by the sentence intonation pattern to bear a rising or
a falling tone. It would appear, therefore, that this stress judgment should be
independent of the difference between rising and falling changes in fundamental
frequency. The result obtained by grouping the data was that 61% of rising syllables
were judged stressed and 64% of falling syllables, a difference that was not significant.
A final comment is necessary on this experiment with frequency patterns. The
variations in frequency indicated in Table 1 should not be simply equated with
English intonation patterns. Whilst it is true that many items appeared to have a
fairly natural intonation, it cannot be assumed that this intonation was necessarily
the one suggested by the frequency pattern. A preliminary attempt has been made
to correlate the intonation pattern with the frequency pattern by asking several trained
listeners to note the intonation they heard in each item. It is clear from these
judgments that a number of the vowels are so short that a change of fundamental
frequency is not perceived and the syllable is judged to have a level tone. Other
effects of this sort may appear as a result of further investigation on these lines.

CONCLUSIONS
The experiments reported in this paper represent an attempt to explore three
physical dimensions which appear to be important in determining stress judgments
in English : duration, intensity and fundamental frequency. The importance of the
duration ratio is confirmed by the fresh data presented here; it seems that in English,
in a considerable variety of conditions, changes of vowel duration ratio can swing
listeners’ perception of strong stress from the first to the second syllable in the type
of disyllable that has been considered. There seems no reason to doubt that this
factor operates in stress judgments in other rhythmic contexts. Intensity ratio has a
similar influence but it is somewhat less marked. The data show no case in which
change of intensity ratio caused a complete shift of the stress judgment from first to
second syllable.
Change in fundamental frequency differs from change of duration and intensity in
that it tends to produce an all-or-none effect, that is to say the magnitude of the
frequency change seems to be relatively unimportant while the fact that a frequency
change has taken place is all-important. The experiments with a step-change of
frequency show that a higher syllable is more likely to be perceived as stressed;
the experiments with more complex patterns of fundamental frequency change suggest
that sentence intonation is an over-riding factor in determining the perception of
stress and that in this sense the fundamental frequency cue may outweigh the
duration cue.
In conclusion, it may be necessary to reiterate that all judgments of stress in
natural speech depend on the complicated inter-action of a number of cues.
Experiments such as those described above require a drastic simplification of the
conditions in which the judgment is made and even so there are still a number of
factors which cannot be controlled until further work has been done in this field. The
formant structure cue still remains to be investigated and it is quite probable that
for English listeners, at least, the changes in vowel quality introduced by variations
in formant structure may prove one of the most powerful factors in determining stress.

The author wishes to thank Dr. F. S. Cooper and the staff of the Haskins
Laboratories for their help in carrying out some of these experiments and
Dr. C. A. B. Smith for suggesting methods of treating the data.

BLOCH, B. and TRAGER, G. L. (1942). Outline of Linguistic Analysis (Baltimore).
BLOOMFIELD, L. (1933). Language (New York).
BORST, J. M. and COOPER, F. S. (1957). Speech research devices based on a channel Vocoder.
J. acoust. Soc. Amer., 29, 777.
FRY, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. J. acoust.
Soc. Amer., 27, 765.
JONES, D. (1949). An Outline of English Phonetics (Leipzig).

Clars 0' Molesey Ltd. (T.U.), 79 Bridge Road, East Molescy, Surrey.
Copyright of Language & Speech is the property of Sage Publications, Ltd. and its content may not be copied or
emailed to multiple sites or posted to a listserv without the copyright holder's express written permission.
However, users may print, download, or email articles for individual use.
