Experiments in The Perception of Stress by D.B. Fry (1958)
STRESS AS A TERM IN A DESCRIPTIVE SYSTEM
A number of the terms used in descriptive linguistics refer to events that occur at
different levels and at different stages in the process of speech communication. One
such term is “stress”, which generally denotes both an aspect of the articulatory or
motor side of speech and also a feature of the sounds perceived by a listener. Part
of the usefulness of the term to linguistic description lies in the very fact that it spans
both the transmission and the reception phase of speech, but its use sometimes forms
the basis for the unjustifiable assumption of a one-to-one correlation between transmission
and reception in this particular domain. Writers on phonetics and linguistics
generally use “stress” to denote either “the degree of force with which a syllable is
uttered” (Jones, 1949) or “degree of loudness” (Bloch and Trager, 1942), but it is
often implied, or explicitly stated, that these two things are completely correlated;
Bloomfield (1933), for example, says that stress “consists in speaking one of these
syllables louder than the other or others”.
In the case of stress judgments, even in one particular language, all four dimensions
may play a part and this accounts to some extent for the difficulty of defining the
term and for the occurrence in descriptive linguistics of terms such as "pitch accent",
"force accent", etc., which are used to denote the supremacy of one dimension over
of change in the medium of communication are indeed the only two factors that can
be regarded as essential. For the rest, speech consists of features that sub-serve these
requirements and operate in combinations that depend upon the conditions of the
moment. The purpose of experimental work is to explore these combinations and to
study their relation to the conditions in which they occur. In ordinary working, and
particularly in the case of a listener receiving his native language, it is probable that
the listener’s kinaesthetic memories play some part in his reception of speech. If
this is so, it is likely that the contribution will be particularly strong in the case of
stress judgments since rhythm of all kinds has a powerful motor component.
The essence of this method is that the properties of the speech signals may be
closely controlled. This is generally not possible in the case of live speech and only
partially so in recorded speech, so that the most satisfactory method is to synthesize
the required speech sounds in some way that will afford the necessary control over
all the variables of the speech. The pattern playback equipment at the Haskins
Laboratories was used for the purpose (see Liberman, 1952). In this machine, speech-like
sounds are generated and controlled by means of a painted spectrogram, which
can be made to resemble to any desired degree a spectrogram from live speech. As in
the common type of speech spectrogram, the frequency composition of the sound (its
formant structure) is related to the disposition of the pattern with respect to the
vertical axis, the total intensity of the sound depends on both the area and the density
of the traces, and the duration of any segment is associated with the extent of any
configuration along the horizontal axis. The painted spectrogram forms the control
system in the process of speech synthesis. The pattern playback equipment generates
an extended range of harmonics of a single fundamental (120 c.p.s.) and does not
afford the possibility of changing the fundamental frequency of the synthesized sounds.
The apparatus was used for the first series of experiments concerned with the duration
and intensity of the synthesized syllables; in these, the fundamental frequency was
kept constant at 120 c.p.s. The second series was concerned with the effect of varying
the fundamental frequency, and for this purpose a modification of the Vocoder (the
Voback) was used (Borst and Cooper, 1957). The same painted spectrograms in this
case controlled the output of the channels of a Vocoder synthesizer unit, and additional
tracks on the spectrogram controlled the switching on of the pseudo-larynx tone and
the frequency of this tone (the larynx frequency).
LISTENERS' JUDGMENTS OF STRESS
The next problem in these experiments was to formulate the questions to be asked
of the listeners. In all projects of this nature, it is an advantage if the subjects used
can be induced to supply an operational response to the speech stimulus in conditions
that do not differ too widely from those of normal speech communication. In
experimenting with variations at the phonemic level, it is possible to achieve this
satisfactorily by asking the subject to write down or to speak back what he hears.
No special training in phonetic techniques is needed to enable the subject to show that
he takes one stimulus to be key and another, tea. Reaction to differences of stress
is in another category in the sense that orthography does not mark stress variations
and the subject has no ready-made code in which to record them. As a consequence,
the untrained subject is less aware of stress than of phonemic distinctions and it is
correspondingly difficult to evoke an operational response to stress differences. There
is in English, however, an association between stress pattern and grammatical function
in certain classes of word; for most English speakers, the word /ˈsʌbdʒɪkt/, with
trochaic rhythm, is a noun, and the word /səbˈdʒekt/, with iambic rhythm, a verb.
It has been found that listeners with no phonetic training, on hearing an isolated word
of this type, can judge whether they hear the noun or the verb form and in this
way can register whether they hear the stress on the first or second syllable. The
material used was confined to five pairs of words, all of this type: subject, object,
digest, contract, permit.
The next task was to synthesize material for listening tests in which variations in
the chosen physical parameters could be made systematically. This involved a
decision on three major points : the physical dimensions to be explored, the range
of variation to be covered and the size of the steps within each range. The obvious
basis for such decisions is to be found in analytical study of the type of material to
be synthesized and spectrograms were made of utterances of the test words by a
number of different speakers. An account of this work, together with some of the
measurements obtained, is to be found in a previously published report (Fry, 1955),
and it will be enough here to indicate the general method. The selected words, both
nouns and verbs, were included in sentences and great care was taken to ensure a
common context, as far as possible, for both the noun and the verb in each pair.
Twelve speakers then recorded all the sentences and spectrograms were made from
the recordings.
The physical parameters selected for the first series of experiments were duration
and intensity and the spectrograms were examined and measured in order to establish
the modes and range of variation which were associated with the two word classes,
noun and verb. Several well-marked features emerged as a result of this analysis.
First, the differences between a noun and a verb were carried almost entirely by the
"vowel" stretches of the wave-motions (see Fry, 1955) and it was evident that in
synthesizing test material the whole range of variation might justifiably be made in
the "vowel" stretches. Second, the distribution of both durations and intensities
showed a well-defined bi-modality; that is to say the noun/verb opposition was
reflected in the physical data and in fact there was very little overlapping of the values
for the members of each pair of words.
This effect was even more apparent when the ratio of one vowel to another was
plotted rather than the absolute value for either duration or intensity. This agrees with
the linguistic description of stress as a relation between syllables and is very much
to be expected at the physical level since stress relations survive changes in the rate
of utterance (involving changes in absolute durations) and also changes in the mean
intensity level of the speech. Hence the synthesis of test material was carried out
having regard to suitable ratios of duration and intensity and the range of variation
was established in similar terms.
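
The point that stress is carried by ratios rather than by absolute values can be checked with a small calculation. The sketch below is illustrative only: the function names and the sample durations and levels are assumptions, not measurements from the study. It expresses the duration relation as the ratio V1/V2 and the intensity relation as a difference of levels in db., and shows that both are unchanged when the rate of utterance or the mean intensity level changes.

```python
def duration_ratio(v1_sec, v2_sec):
    """Ratio of the first vowel's duration to the second's (V1/V2)."""
    return v1_sec / v2_sec


def intensity_ratio_db(v1_level_db, v2_level_db):
    """Intensity relation V1/V2 expressed as a difference of levels in db."""
    return v1_level_db - v2_level_db


# Hypothetical values for one utterance (not data from the experiments).
v1_sec, v2_sec = 0.12, 0.20
v1_db, v2_db = 62.0, 68.0

# Same utterance spoken more slowly: both vowels are 1.5 times longer.
slow_ratio = duration_ratio(v1_sec * 1.5, v2_sec * 1.5)

# Same utterance spoken 10 db louder overall.
loud_ratio_db = intensity_ratio_db(v1_db + 10.0, v2_db + 10.0)

print(round(duration_ratio(v1_sec, v2_sec), 3), round(slow_ratio, 3))
print(intensity_ratio_db(v1_db, v2_db), loud_ratio_db)
```

Both printed pairs agree, which is the sense in which the ratio survives changes in rate and in over-all level.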
The third feature of the analytical data was that the distribution of duration and
intensity ratio showed certain differences. Fig. 1 shows that the measurements fall
into two groups with a well-defined cross-over point from noun to verb values; in
the case of intensity, this fell approximately in the middle of the range for all the five
pairs of words. That is to say, the range of intensity ratios covered by the twelve
speakers was approximately the same in the noun as in the verb; for subject, the
ratio V1/V2 was 14 db. in the noun and -14 db. in the verb, with the cross-over
point at equal intensity for the two vowels. In the case of duration ratio, each pair
of words had its own pattern of variation; the range and the cross-over value were
different for each of the five pairs of words. For contract, for example, the range
of duration ratios was from 0.1 to 1.06 and the cross-over value 0.50, while for digest
the range was from 0.53 to 2.87, and the cross-over at 1.25.
[Fig. 1. (a) Plot for the word-pair subject; horizontal axis: duration of vowel one (sec.).]
In selecting values of duration and intensity for synthesizing the test words, the
chief object was to cover as nearly as possible the total range of observed values and
at the same time to make certain of exploring the part of the range close to the
cross-over value from noun to verb. On the basis of the analytical data, it was decided
to adopt an intensity range of ±10 db. for all the five pairs of words and to use
a different range of duration ratios for each pair.
Fig. 1. (b) Measured intensities for the word-pair subject. In the plotting of intensity, the over-all
intensity level is brought to the same value for all speakers. Key: x = SUBject, o = subJECT.
The number of steps to be used in each range was partly determined by the length
of the listening test that subjects could be asked to undergo and it was found that,
from this point of view, five steps in each dimension was a suitable number. In both
duration and intensity, the two extreme values were chosen to be near the ends of
the observed range, the middle value was approximately at the cross-over value from
noun to verb and the two intervening values were chosen with the object of exploring
the uncertainty range between noun and verb. In the case of subject, for example,
the observed duration ratios (V1/V2) ranged from 0.15 to 1.28, with the cross-over at
0.66, and the chosen experimental values were 0.25, 0.40, 0.60, 1.00 and 1.25. For
all pairs of words, the intensity ratios (V1/V2) were -10, -5, 0, 5 and 10 db.
It has been pointed out already that judgments of stress depend upon a complex
of perceptual factors which are interdependent. It follows that the effects of the
physical correlates of these perceptual factors are also likely to be inter-related.
In any speech sequence presented to a listener, the duration, intensity, fundamental
frequency and formant structure all act as cues which determine the listener's stress
judgments and there is no method of rendering any of these physical dimensions
inoperative. The clearest example of this is to be found, perhaps, in the formant
structure of the speech sounds. In the verb /səbˈdʒekt/ the first syllable contains
the vowel /ə/ and the second /e/, and formant structure typical of these sounds is
an important factor in determining the listener's stress judgment. A modification of
formant structure, in the direction of /ʌ/ in the first vowel or in the direction of /ɪ/
in the second, would at once bias the stress judgments towards the trochaic or noun
form. In synthesizing these words, therefore, whatever the formant structure may be,
it is bound to exert a biasing effect. Hence in experiments with synthesized speech,
we may decide to vary any one of the four physical dimensions and to keep the other
three constant, but the chosen values for the latter will none the less contribute to
the listeners' stress judgments.
In the first series of experiments, it was decided to maintain constant values for
formant structure and for fundamental frequency and to vary duration and intensity.
Fundamental frequency for all the voiced sounds was kept constant at 120 c.p.s. The
formant structure during the vowel stretches gave a vowel quality corresponding to
the stressed vowel in every case; that is, the first vowel in all versions of subject
sounded like /ʌ/ and the second vowel like /e/, and similarly for all the other
word-pairs. Hence the biasing effect of the formant structure would tend in the
opposite direction in the first and second syllables of a word and would thus be
partially cancelled out. Another consideration was that the test was first made with
a large group of American listeners. In American speech, it happens quite commonly
that there is little or no opposition of vowel quality in such noun and verb pairs and
hence the biasing effect would be rather less considerable. It turned out, in practice,
that there was no marked difference between the responses of the American subjects
and those of a small group of English subjects.
The variations in duration and intensity ratio covered the required range in five
steps, as has been already indicated. In order to economize in test material, the two
sets of variations were combined together in one set of test items. For each of the
five word-pairs, versions were synthesized which covered the five steps of duration
ratio and the five steps of intensity ratio, each value of duration being combined with
each value of intensity. This gave a listening test of 125 items, which appeared to be
about the longest test that listeners could comfortably manage on one testing occasion.
All versions of the test word-pairs were recorded and assembled in random order.
Each test item was inserted in a carrier sentence (also synthesized) and was heard in
the context “Where is the accent in ____?” Listeners were asked to make a
response to every item and to register this on a test sheet where the appropriate
word-pair was printed for each test-item in this form: SUBject : subJECT, CONtract
: conTRACT, and so on. They were asked to underline the form that they heard.
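
The construction of the 125-item test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys, the function name `build_test_items` and the fixed random seed are hypothetical, and because the text gives actual duration ratios only for subject, the duration dimension is indexed here by step number (1 to 5) rather than by ratio.

```python
import itertools
import random

WORD_PAIRS = ["subject", "object", "digest", "contract", "permit"]

# Intensity ratios V1/V2 in db., the same five steps for every word-pair.
INTENSITY_STEPS_DB = [-10, -5, 0, 5, 10]

# Each word-pair had its own five duration ratios; only those for
# "subject" are stated in the text, so steps are numbered 1 to 5 here.
DURATION_STEPS = [1, 2, 3, 4, 5]
SUBJECT_DURATION_RATIOS = [0.25, 0.40, 0.60, 1.00, 1.25]


def build_test_items(seed=0):
    """Combine every duration step with every intensity step for each
    word-pair, then put the items in random order (seed is an assumption)."""
    items = [
        {"word": word, "duration_step": d_step, "intensity_db": i_db}
        for word, d_step, i_db in itertools.product(
            WORD_PAIRS, DURATION_STEPS, INTENSITY_STEPS_DB
        )
    ]
    random.Random(seed).shuffle(items)
    return items


items = build_test_items()
print(len(items))
```

Five word-pairs, five duration steps and five intensity steps give 5 x 5 x 5 items, which is the test length reported in the text.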
This test was carried out by 118 subjects; the effect of variation in the physical
cues was measured in terms of the proportion of these listeners who judged a given
stimulus to be a noun or verb, that is to have trochaic or iambic rhythm. Since all
subjects made a judgment about every test item, the number of noun judgments for
one item is equal to 118 - (the number of verb judgments). For simplicity, therefore,
all results of the test are given as the number of noun judgments, usually presented
as a percentage of the total number of subjects.
In the case of all five word-pairs, the total range of stimuli was enough to cause a
complete swing of the listener’s judgments from noun to verb; one version in each
set produced a noun judgment from 97-100% of the listeners, and at the other end
of the range, one version produced less than 10% of noun judgments, with the
exception of permit in which the lowest value was 13%. The change in judgments
followed the expected trend: where V1 was long in proportion to V2, there was a
majority of noun judgments, and similarly where V1 was more intense than V2. The
effect was reinforced in versions where V1 was both longer and more intense than V2.
The disagreement amongst subjects was greater, that is the percentage of noun judgments
was nearest to 50%, when the duration and intensity cues were opposed to each
other, as for example in versions where V1 was longer but of lower intensity than V2.
There is no doubt from the experimental results that in the English word-pairs used
in the test, both duration and intensity ratio have a marked influence in determining
stress judgments. An interesting question that one might try to answer on the basis
of these results concerns the relative strength of the two cues. Information on this
point can be abstracted from the results by summing the noun judgments for all
intensity ratios at each duration ratio, i.e. by taking the mean of the column values in
the matrix of results. This gives the effect of changing duration ratio, and similarly
summing for each intensity ratio, i.e. taking the row averages, gives the effect of
changing intensity. The total taken for all five word-pairs showed that the total change
in noun judgments due to duration was from 12% to 92%, and that due to intensity
ratio was from 40% to 82%.
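
The averaging procedure just described can be illustrated with a short sketch. The matrix below is invented purely for illustration and does not reproduce the experimental results; the function names are likewise assumptions. Rows stand for the five intensity steps and columns for the five duration steps, so that column means give the effect of duration and row means the effect of intensity.

```python
N_SUBJECTS = 118  # number of listeners reported for this test


def noun_percentage(noun_count, n_subjects=N_SUBJECTS):
    """Noun judgments as a percentage of all subjects; the verb count is
    n_subjects - noun_count because every subject judged every item."""
    return 100.0 * noun_count / n_subjects


def column_means(matrix):
    """Mean over intensity steps at each duration step."""
    return [sum(col) / len(col) for col in zip(*matrix)]


def row_means(matrix):
    """Mean over duration steps at each intensity step."""
    return [sum(row) / len(row) for row in matrix]


# HYPOTHETICAL percentages of noun judgments (not data from the study).
# Rows: intensity steps -10 ... 10 db.; columns: duration steps 1 ... 5.
noun_pct = [
    [5, 20, 45, 70, 85],
    [10, 30, 55, 80, 90],
    [15, 40, 65, 85, 95],
    [20, 50, 75, 90, 97],
    [30, 60, 85, 95, 99],
]

duration_effect = column_means(noun_pct)
intensity_effect = row_means(noun_pct)
print(duration_effect)
print(intensity_effect)
```

The spread between the first and last entries of each list corresponds to the "total change in noun judgments" quoted in the text for each cue.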
In order to establish the significance of this relation, we need to make a quantitative
comparison of the duration and intensity ratios used in the experiment. Since the
range of values was approximately equal to those found in the analytical data, that
is in natural speech, the range of duration change can be regarded as at least in this
sense equivalent to the range of intensity change. In Fig. 2, the aggregate of noun
judgments for each duration and intensity ratio is plotted. This is a formal representation
of the results in which the abscissae are simply succeeding steps of duration or
intensity change and not points on a quantitative scale. It is evident from the
experimental results that an extension of the duration range would not lead to any
major change in noun judgments since these already cover nearly the whole range
0 to 100%. Whether extension of the intensity range would give judgment values
near to 0 or to 100% could be determined by experiment, but it was in fact clear from
the preliminary syntheses that preceded the final test that extreme steps of intensity
change from V1 to V2 served only to make the stimulus sound very unnatural without
increasing the impression of strong stress. Such an experiment would, further, leave
unresolved the question of equivalence between duration range and intensity range and
it seemed therefore worth while to seek an alternative method of treating the existing
results in order to reach some conclusion concerning the relative strength of the
duration and intensity cues.
As we have already said, the response to any stimulus in the test is made up of
four factors: the response due to duration, that due to intensity, to fundamental
frequency and to formant structure. The force of any of these factors could be more
reliably abstracted from the data if the degree of agreement amongst subjects were
expressed on a scale which was not artificially compressed by the barriers of 0 and
100%. Such a measure is provided by taking the logit number for each test item
instead of the percentage of noun judgments. The subjects were able to make one
of two responses to each item. If p = proportion of noun judgments, and q = (1 - p)
= proportion of verb judgments, then logit p = log_e (p/q). The range of logit values
will be ±∞; the smallest degree of agreement (50%) will have the logit 0; positive
values of logit p will indicate agreement in a noun judgment and negative values
agreement in a verb judgment. The logit response for each test item will represent
a factor due to duration and a factor due to intensity and these factors can then be
abstracted as before by taking the row and column averages of the matrix of results.
An inspection of the crude data made it clear that they would not yield an exact fit
with this type of treatment since there were several irregularities in the pattern. A
difficulty arises with values of 0% and 100%, which would theoretically give logits of
-∞ and +∞; it seemed good enough for our purposes to consider them crudely
as 0.5% (logit = -5.293) and 99.5% (logit = 5.293) since the irregularities in the
pattern make it impossible to use the most refined statistical methods.
Fig. 2. Percentage of listeners’ “noun” judgments for all test words as a function of
(a) vowel duration ratio and (b) vowel intensity ratio. Horizontal axis: increasing ratio V1/V2 (steps 1 to 5).
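
The logit transformation described above, together with the crude treatment of 0% and 100%, can be written out as a brief sketch. The function names are assumptions; the substitution of 0.5% and 99.5% for the two extreme values follows the procedure stated in the text.

```python
import math


def logit(p):
    """Natural logarithm of p/q, where q = 1 - p and 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_from_percentage(noun_pct):
    """Logit of a percentage of noun judgments.

    Exactly 0% and 100% would give infinite logits, so they are
    replaced by 0.5% and 99.5% before the transformation is applied.
    """
    if noun_pct <= 0.0:
        noun_pct = 0.5
    elif noun_pct >= 100.0:
        noun_pct = 99.5
    return logit(noun_pct / 100.0)


for pct in (0, 25, 50, 75, 100):
    print(pct, round(logit_from_percentage(pct), 3))
```

Complete disagreement (50%) maps to 0, and the two substituted extremes map to equal and opposite values, so the scale is symmetrical about the point of greatest uncertainty.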
The procedure was to calculate the logit values for all percentages occurring in the
results and to tabulate these for each of the word-pairs used in the test. The common
logit for each duration ratio was obtained by taking the column averages and for each
intensity ratio by taking the row averages. The supposition is that the logit for any
combination of duration and intensity can be expressed as a sum of a duration effect
and an intensity effect: one may reasonably expect this to be approximately true
Fig. 3. (a) Common logit values for duration ratio from the results for the word-pair subject. Horizontal axis: duration ratio V1/V2.
Fig. 3. (b) Common logit values for intensity ratio from the results for the word-pair subject. Horizontal axis: intensity ratio V1/V2 (db.).
It will be seen that the logits both for duration and for intensity lie approximately
on a straight line. We may conclude from this that succeeding steps of duration change
produce equal changes in the logit and the same is true for intensity changes. This
means that the ratio p / q , i.e. noun/verb, is multiplied by nearly the same factor for
equal changes in duration and intensity and thus rises in a geometrical progression.
Since this is so, we may now compare the effect of duration and intensity by comparing
the slope of the two lines. In the case of subject, the whole range of intensity change
of 20 db. produces a logit rise of 2.5. On the duration line, a change in the logit of
2.5 is effected by a change in duration ratio of approximately .6. Similar calculations
in the case of the other word pairs give the following results: object, 20 db. is
approximately equivalent to a duration ratio change of .4, digest, .16, contract, .35 and
permit .9. This method of treating the data therefore affords a means of making a
quantitative comparison of the duration and intensity cues and their influence on stress
judgments.
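
The comparison of slopes can be sketched numerically. In the example below the only figures taken from the text are those for subject (a logit rise of 2.5 over 20 db., and the same rise over a duration ratio change of about .6); the logit points used to demonstrate the line fit are constructed to lie exactly on a straight line and are not experimental data. The function names and the use of an ordinary least-squares fit are assumptions, since the text does not say how the lines were drawn.

```python
def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def equivalent_duration_change(intensity_slope, duration_slope,
                               intensity_range_db=20.0):
    """Duration ratio change giving the same logit rise as the whole
    intensity range."""
    logit_rise = intensity_slope * intensity_range_db
    return logit_rise / duration_slope


# Idealised points on a line rising 2.5 logit units over 20 db.
intensity_db = [-10, -5, 0, 5, 10]
intensity_logits = [0.125 * x for x in intensity_db]
intensity_slope = fit_slope(intensity_db, intensity_logits)

# Duration line for "subject": 2.5 logit units per .6 of duration ratio.
duration_slope = 2.5 / 0.6

print(intensity_slope)
print(round(equivalent_duration_change(intensity_slope, duration_slope), 3))
```

Because equal steps add equal amounts to the logit, the ratio p/q is multiplied by a constant factor at each step, which is why a single slope per cue is enough for the comparison.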
The choice of fundamental frequencies for this test involved a number of considera-
tions that should be briefly mentioned. The listeners were to hear a series of sense-
groups, each containing two syllables, and to make a judgment about the stress pattern.
The effect of sentence intonation was to be minimized, but apart from this it was
desirable that the stimuli should be as natural as possible since this was likely to
make the judgments more consistent. In English speech there is a strong tendency
for a sense group to be spoken in one key and for musical modulation to take place
between groups. This effect of key depends largely upon the occurrence in the group
of some reference pitch, of which the speaker is unaware, but which regulates the
pitch of all the syllables in the group. In the test items it was therefore decided to
adopt a reference frequency which would occur in every item, and in order to limit the
number of variables in the test, the same reference was used throughout the test.
The synthesized speech was intended to sound like that of a male speaker, and the
selected reference frequency of 97 c.p.s. gave this effect successfully.
The range of variation in fundamental frequency was decided on similar grounds.
In the intonation patterns heard from most English speakers changes in pitch of more
than one octave are infrequent and are not often met with in successive syllables, even
from the most excitable speakers. Preliminary syntheses showed in fact that a change
of 90 c.p.s. on 97 c.p.s. (approximately a semi-tone less than one octave) produced
stimuli that sounded rather unnatural and hence this upper limit was adopted as
being likely to show up the maximum effect of frequency change without introducing
very unnatural stimuli which would perhaps make listeners respond in a random
manner.
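
The size of the largest step can be checked against the statement that it falls a little short of an octave. The sketch below is an illustration only; it assumes the equal-tempered convention of twelve semitones to the octave, which the text does not specify, and the function name is hypothetical.

```python
import math


def interval_in_semitones(f_low, f_high):
    """Size of the interval between two frequencies in equal-tempered
    semitones (12 semitones = one octave, a frequency ratio of 2)."""
    return 12.0 * math.log2(f_high / f_low)


reference = 97.0      # c.p.s., reference frequency used in the test
largest_step = 90.0   # c.p.s., largest change of fundamental frequency

semitones = interval_in_semitones(reference, reference + largest_step)
print(round(semitones, 2))
print(round(12.0 - semitones, 2))
```

On this convention the change from 97 to 187 c.p.s. comes to between eleven and twelve semitones, that is, rather less than a semitone short of the octave.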
The relation between the reference frequency and that of the other syllable was
found to be important for the naturalness of the stimulus. Each syllable was on one
tone, that is of constant frequency, and if the relation between the syllables was such
as to make the impression of an exact musical interval, the test word appeared to be
sung and listeners found it difficult to make a stress judgment. Care was taken
therefore to avoid this effect as far as possible and this was one reason for the fixing
of the reference frequency at 97 c.p.s. In preliminary experiments a reference
frequency of 100 c.p.s. with frequency intervals of multiples of 5 c.p.s. was used.
Many of the stimuli then had much too musical an effect which was eliminated by
the change of the reference frequency to 97 c.p.s. Frequency changes as small as
3 c.p.s. were used in the first experiments but listeners’ responses to these items were
very inconsistent and were disregarded in the final test. The frequency steps
[Figure: curves labelled "step-down" and "step-up"; horizontal axis: increasing ratio V1/V2 (steps 1 to 5).]
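
The remark above about the reference frequency of 100 c.p.s. producing too musical an effect can be illustrated by a simple check. The sketch rests on an assumption that is not in the text: an "exact musical interval" is read here as a small-integer frequency ratio, and the list of ratios and the function name are chosen for illustration. It tests exact ratios only and says nothing about how near a ratio must be to sound musical to a listener.

```python
from fractions import Fraction

# ASSUMPTION: a few small-integer ratios taken to represent exact
# musical intervals; the text itself gives no such list.
SIMPLE_RATIOS = {
    Fraction(6, 5): "minor third",
    Fraction(5, 4): "major third",
    Fraction(4, 3): "fourth",
    Fraction(3, 2): "fifth",
    Fraction(8, 5): "minor sixth",
    Fraction(5, 3): "major sixth",
    Fraction(2, 1): "octave",
}


def exact_intervals(reference, steps):
    """Return {step: interval name} for every step (in c.p.s.) whose
    upper frequency forms one of the simple ratios with the reference."""
    found = {}
    for step in steps:
        ratio = Fraction(reference + step, reference)
        if ratio in SIMPLE_RATIOS:
            found[step] = SIMPLE_RATIOS[ratio]
    return found


steps = range(5, 95, 5)  # multiples of 5 c.p.s. up to 90 c.p.s.
print(exact_intervals(100, steps))
print(exact_intervals(97, steps))
```

With a reference of 100 c.p.s. several of the steps fall exactly on simple ratios, whereas with 97 c.p.s. none of the same steps does.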
In the case of both duration and intensity ratio it has been shown that progressive
increase in these quantities is reflected in increasing noun judgments by the subjects.
The next question to be asked with regard to fundamental frequency is whether
increase in the frequency ratio of V1 to V2 would have a similar effect, or whether
fundamental frequency change, unlike duration and intensity change, tends to produce
an all-or-none effect.
The effects of frequency change were abstracted from the data by combining all
duration ratios for each step change of frequency. In order to detect any possible
trend in the results, the logit response for each frequency was calculated and the
values are shown in Fig. 5. The first important feature of these results is the
discontinuity in logit response between the values -5 and 5 c.p.s., that is at the
cross-over from a step-up to a step-down change in fundamental frequency. This
confirms the conclusion already reached by inspecting the results for duration ratio
in this experiment. Increase in the size of the frequency step appears to produce no
marked trend in the results, however. The logit values for the step changes lie
approximately on a horizontal line, indicating that the size of the change is having
no appreciable effect. For the step-up change, if there is any trend, it is in the
direction opposite to the expected one. An increase in the size of the step-up gives
a slight increase in noun judgments, rather than the expected decrease. This effect
is contributed largely by the 90 c.p.s. change and it may well be that this large step-up
appeared even more unnatural to the listeners than an equal step-down and thus
caused greater uncertainty in the judgments.
These results provide good evidence for supposing that a step-change of fundamental
frequency affects stress judgments in a specific way. It appears likely that so long as
the resulting pitch change is easily perceptible to the listener, he tends to judge a
higher syllable as more stressed, but the magnitude of the pitch change makes little
contribution to his judgment. This would be consistent with the fact that a frequency
change of 3 c.p.s. led to a dispersion of the listeners' judgments; it may well have
been too small to cause the all-or-none effect in the perception of stress.
The role of intonation in determining stress judgments has already been touched
upon in connection with the previous experiment in which efforts were made to reduce
the influence of sentence intonation. It is clear, however, that any account of the
factors affecting stress judgments is incomplete without an attempt to answer certain
questions about sentence intonation. The most important of these is the question
whether, as one would expect, sentence intonation is so strong as to be capable of
outweighing all other factors in stress judgments.
A third set of experiments was carried out to answer this question. As in the
previous experiments, these were designed to explore a range of variation in physical
cues and to determine the effect of this variation on stress judgments in the same way
as before. The important variable was again fundamental frequency, but this time
the variations were chosen to allow sentence intonation the maximum effect.
It was said earlier that, broadly speaking, a syllable containing a pitch change is
functionally more important in English intonation than a level syllable, and for this
Fig. 5. Logit response for step change of fundamental frequency. Frequencies are plotted
on a logarithmic scale. Horizontal axis: frequency ratio V1/V2.
Fig. 6. Types of fundamental frequency change used in the syllable inflection test. Panels: short vowel, long vowel; horizontal axis: time.
Again the intensity ratio of the two vowels in each version was kept constant at
equal intensity and the same formant structure was used as before. The five duration
ratios were combined with the fundamental frequency variations. In order to reduce
the number of variables, the frequency range over which the fundamental varied
within one vowel was kept constant throughout the test. A reference frequency of
97 c.p.s. was again used, that is at some time during the stimulus word the fundamental
reached this minimum value. The highest frequency used was 130 c.p.s. and when
frequency changed in the course of a syllable it covered the whole of this range from
97 to 130 c.p.s. A number of stimulus words included one level syllable and the
fundamental frequency for such syllables was either 97 or 130 c.p.s.
Two types of frequency change within the syllable were used. In the first type, the
frequency changed continuously throughout the vowel, and in the second, the
frequency change occupied only half the vowel duration. Fig. 6 shows the graph
of frequency change with time for the types of syllable used in the test. It will be
seen that the rate of change of frequency was allowed to vary with the duration of the
vowel. Stimulus words were synthesized which covered a range of 16 patterns, each
combined with 5 duration ratios. The different patterns are listed in Table 1 where
the frequency variation for each word is shown symbolically and the letters serve to
identify the patterns in discussing the results.
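
The two types of frequency change, and the way the rate of change follows from the vowel duration, can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the change is drawn as a linear rise, the half-vowel change is placed in the first half of the vowel with a level continuation, and the function names and sample durations are hypothetical. Only the limits of 97 and 130 c.p.s. and the count of 16 patterns with 5 duration ratios come from the text.

```python
F_LOW, F_HIGH = 97.0, 130.0   # c.p.s., limits of the frequency change
N_STIMULI = 16 * 5            # 16 patterns, each with 5 duration ratios


def rising_contour(duration_sec, change_fraction=1.0, n_points=5):
    """Sample (time, frequency) pairs for a vowel whose fundamental rises
    from F_LOW to F_HIGH during the first `change_fraction` of the vowel
    and is level afterwards (ASSUMED shape: linear rise, rise first)."""
    glide_sec = duration_sec * change_fraction
    samples = []
    for k in range(n_points):
        t = duration_sec * k / (n_points - 1)
        if t >= glide_sec:
            f = F_HIGH
        else:
            f = F_LOW + (F_HIGH - F_LOW) * t / glide_sec
        samples.append((t, f))
    return samples


def rate_of_change(duration_sec, change_fraction=1.0):
    """Rate of frequency change in c.p.s. per second; the range is fixed,
    so the rate varies inversely with the time occupied by the change."""
    return (F_HIGH - F_LOW) / (duration_sec * change_fraction)


print(rising_contour(0.2))                       # change over whole vowel
print(rising_contour(0.2, change_fraction=0.5))  # change over half the vowel
print(round(rate_of_change(0.1)), round(rate_of_change(0.2)))
```

A falling change would be the mirror image of this contour; halving the vowel duration doubles the rate of change because the frequency range is held constant.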
Responses to this test were obtained from 76 subjects, including both American
and English speakers. The first important consideration in examining the results is
that the frequency variations cannot in this case be placed on a quantitative scale;
the test was designed to show up an all-or-none effect and it is for this that we have
to look in the data from the test. It is to be expected, and the data indeed show once
more, that increasing duration ratio will have the effect of increasing the number of
noun judgments, but the first question is whether any patterns of frequency variation
over-ride the duration cue. In the absence of a fundamental frequency cue, for
example when the five duration ratios are combined with equal intensity in the two
vowels, on a monotone, then the smallest duration ratio produces a majority of verb
judgments, and the largest ratio gives a majority of nouns. A simple criterion might
be applied first of all to the data from the syllable-inflection test and we might look
for any frequency pattern for which the number of noun judgments either never falls
below or never reaches 50%, that is for cases in which the whole curve is transposed
above or below the 50% level. Such cases are to be found in the results and Fig. 7
gives the curves for two such patterns, A and B. For pattern A, even with the
smallest duration ratio, there is a majority of noun judgments and for pattern B, the
greatest duration ratio still produces a substantial majority of verb judgments. These
two frequency patterns will, obviously, sound to the listener like two common English
intonation patterns in which the fall normally occurs in the stressed syllable and it is
not surprising that they should influence stress judgments so strongly. A similar
effect is to be found for patterns J and M, which are functionally similar to A and B.
[TABLE 1. Frequency patterns used in the test, with the range and mean of noun judgments for each pattern.]
[Fig. 7. Curves of noun judgments for patterns A and B; horizontal axis: increasing ratio V1/V2, steps 1 to 5.]
The range and the mean of noun judgments for all patterns are given in Table 1, and
it will be seen that the range for J is 49 - 95%, that for M is 3 - 49%. The influence
of fundamental frequency change is not, however, confined to patterns giving rise
to a familiar intonation. Both E and F produced an un-English intonation but none-
theless evoked a large majority of noun judgments because of the inflection in the
first syllable.
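The simple criterion applied above can be written out as a small check. This is a minimal sketch with a hypothetical function name and hypothetical labels; the only data used are the two ranges quoted in the text for patterns J and M.

```python
def classify_range(lowest_pct, highest_pct):
    """Classify a pattern from the range of its noun-judgment percentages
    over the five duration ratios."""
    if lowest_pct >= 50:
        return "never below 50%"    # whole curve lies at or above the 50% level
    if highest_pct < 50:
        return "never reaches 50%"  # whole curve lies below the 50% level
    return "crosses 50%"            # the duration cue still swings the majority


# Ranges of noun judgments quoted in the text for patterns J and M.
reported_ranges = {"J": (49, 95), "M": (3, 49)}
classification = {
    name: classify_range(low, high) for name, (low, high) in reported_ranges.items()
}
```

Applied strictly, the check places M wholly below the 50% level, while J misses the criterion by a single percentage point (its lowest value is 49%), which is consistent with describing the effect for J and M as similar to that for A and B rather than identical.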
A wide variety of patterns was used in this experiment in the hope of answering
certain questions concerning the effectiveness of different types of fundamental
frequency variation in determining stress judgments. The stimulus words contained
three kinds of syllable: level syllables, syllables with a linear change of frequency
and syllables with a curvilinear change. These syllables occurred sometimes as the
first and sometimes as the second syllable of a stimulus word and it was possible by
grouping the results to obtain some information on the relative power of these
syllabic patterns to influence stress judgments. If we compare patterns A and B, for
example, a noun judgment for A means that the subject heard a linear change of
frequency as stressed in contrast to a level syllable. In B, a verb judgment means
the same thing. But a verb judgment for A or a noun judgment for B means that
the subject heard a level syllable as stressed in contrast to a linear change. Provided
that the five duration ratios are equally represented in the samples, we can group
sets of data together in this way and obtain some indication of the association between
types of syllable and the judgment that the syllable is stressed. The first contrast
treated in this way was that between inflected and level syllables. In all patterns that
contained both a level and an inflected syllable, 66% of all inflected syllables were
judged stressed and 33% of level syllables. This difference was highly significant
at the 1% level.
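The grouping procedure described above can be sketched as a tally. This is an illustration only: the function names are hypothetical and the response list is invented purely to show the bookkeeping, not taken from the experimental data. The syllable types for patterns A and B follow the description in the text, and a noun judgment is taken to mean that the first syllable was heard as stressed and a verb judgment the second.

```python
from collections import Counter

# Syllable types (first, second) for the two patterns described in the text:
# A has a linear change followed by a level syllable, B is the reverse.
PATTERNS = {"A": ("linear", "level"), "B": ("level", "linear")}


def stressed_type(pattern, judgment):
    """Return the type of syllable that the listener heard as stressed."""
    first, second = PATTERNS[pattern]
    if judgment == "noun":   # stress heard on the first syllable
        return first
    if judgment == "verb":   # stress heard on the second syllable
        return second
    raise ValueError(f"unknown judgment: {judgment!r}")


def percent_stressed(responses):
    """Pool (pattern, judgment) pairs and return the percentage of judgments
    in which each syllable type was heard as stressed."""
    counts = Counter(stressed_type(p, j) for p, j in responses)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Invented responses, for illustration only (not data from the experiment).
example_responses = (
    [("A", "noun")] * 3 + [("A", "verb")] * 1
    + [("B", "verb")] * 2 + [("B", "noun")] * 2
)
pooled = percent_stressed(example_responses)
```

Each stimulus word in such a group contains exactly one syllable of each type, so the two percentages sum to 100. The pooling is only meaningful under the condition stated in the text, that the five duration ratios are equally represented in the samples being grouped.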
The two types of inflected syllable were compared in a similar way. For example,
patterns A and J contain an inflected first syllable, in the one case a linear and in
the other a curvilinear inflection. By comparing the number of noun judgments in
this and in similar cases we gain a measure of the relative effectiveness of the two
types of syllable. Of all syllables with linear frequency change, 62% were judged
stressed, whilst 72% of the syllables with curvilinear change were heard as stressed.
This difference is not significant.
The last comparison made in this way was between rising and falling inflections.
The intonation patterns of English involve both rising and falling tones and the
word-pairs used in these experiments could certainly occur in contexts where noun
and verb might both be required by the sentence intonation pattern to bear a rising or
a falling tone. It would appear, therefore, that this stress judgment should be
independent of the difference between rising and falling changes in fundamental
frequency. The result obtained by grouping the data was that 61% of rising syllables
were judged stressed and 64% of falling syllables, a difference that was not significant.
A final comment is necessary on this experiment with frequency patterns. The
variations in frequency indicated in Table 1 should not be simply equated with
English intonation patterns. Whilst it is true that many items appeared to have a
fairly natural intonation, it cannot be assumed that this intonation was necessarily
the one suggested by the frequency pattern. A preliminary attempt has been made
to correlate the intonation pattern with the frequency pattern by asking several trained
listeners to note the intonation they heard in each item. It is clear from these
judgments that a number of the vowels are so short that a change of fundamental
frequency is not perceived and the syllable is judged to have a level tone. Other
effects of this sort may appear as a result of further investigation on these lines.
CONCLUSIONS
The experiments reported in this paper represent an attempt to explore three
physical dimensions which appear to be important in determining stress judgments
in English : duration, intensity and fundamental frequency. The importance of the
duration ratio is confirmed by the fresh data presented here ; it seems that in English,
in a considerable variety of conditions, changes of vowel duration ratio can swing
listeners’ perception of strong stress from the first to the second syllable in the type
of disyllable that has been considered. There seems no reason to doubt that this
factor operates in stress judgments in other rhythmic contexts. Intensity ratio has a
similar influence but it is somewhat less marked. The data show no case in which
change of intensity ratio caused a complete shift of the stress judgment from first to
second syllable.
Change in fundamental frequency differs from change of duration and intensity in
that it tends to produce an all-or-none effect, that is to say the magnitude of the
frequency change seems to be relatively unimportant while the fact that a frequency
change has taken place is all-important. The experiments with a step-change of
frequency show that a higher syllable is more likely to be perceived as stressed ;
the experiments with more complex patterns of fundamental frequency change suggest
that sentence intonation is an over-riding factor in determining the perception of
stress and that in this sense the fundamental frequency cue may outweigh the
duration cue.
In conclusion, it may be necessary to reiterate that all judgments of stress in
natural speech depend on the complicated inter-action of a number of cues.
Experiments such as those described above require a drastic simplification of the
conditions in which the judgment is made and even so there are still a number of
factors which cannot be controlled until further work has been done in this field. The
formant structure cue still remains to be investigated and it is quite probable that
for English listeners, at least, the changes in vowel quality introduced by variations
in formant structure may prove one of the most powerful factors in determining stress.
The author wishes to thank Dr. F. S. Cooper and the staff of the Haskins
Laboratories for their help in carrying out some of these experiments and
Dr. C. A. B. Smith for suggesting methods of treating the data.
Copyright of Language & Speech is the property of Sage Publications, Ltd. and its content may not be copied or
emailed to multiple sites or posted to a listserv without the copyright holder's express written permission.
However, users may print, download, or email articles for individual use.