Haskins Laboratories Status Report on Speech Research

1993, SR-114, 31-80

Learning to Perceive the Sound Pattern of English*

Catherine T. Best†

Preparation of this chapter was supported by an NIH Research Career Development award from the National Institute on Deafness and Communication Disorders (DC00045). Research from my lab that is described herein was also supported by NIH research grant DC00403 to the author and by program project grant HD01994 to Haskins Laboratories. Many thanks are due to Alice Faber, Andrea Levitt, Steven Braddon, Janet Werker, Peter Jusczyk and Carolyn Rovee-Collier for their helpful comments on an earlier draft of the chapter, to Alice also for managing the infant lab in my near-absence, and to a wonderful group of research assistants who kept the lab running smoothly while I wrote: Janet Calderon, Sandy Chiang, Ron Dewitt, Domenica Giancolo, and Alyssa Wulf. Thanks also to Steve for doing more than his fair share of child care, and to my daughters Aurora and Vanessa for being patient.

Language lies at the heart of human cognitive and social development. Infants, who are by definition "without language," become speaker-hearers of particular languages within their first few years, through their experience with the speech of their caregivers and other significant people in their environment. The foundation for the emergence of language proper is the infant's discovery of sound-meaning correspondences in the utterances produced by those significant people. Social and physical context provide support for the semantic meaning of an utterance, although determining the specific referent of an unknown word from non-linguistic context alone may be no simple task (see Quine, 1960). The present discussion, however, will focus on the other side of the sound-meaning relation, the sound pattern itself. It is still far from clear how the infant comes to recognize in the stream of connected speech the sequence of consonants and vowels that may underlie the diverse pronunciations of a given word in different sentences, by different speakers, and under different speaking conditions (e.g., in rapid casual speech versus slow, exaggerated infant-directed speech). Presumably, these accomplishments are built on the infant's prior abilities to discriminate and classify the audible properties that correspond to various levels of organization in speech, e.g., consonants and vowels (phonetic segments), rhythmic stress patterns, prosodic phrases, and so forth.

It is these perceptual abilities for handling the "surface phonetic structure" of speech that are the primary concern of this chapter. In particular, we will focus on how the infant's experience with a particular language begins to influence perception of consonant and vowel contrasts that fall outside the phonetic inventory employed by that language. Developmental changes in perception of such non-native contrasts can provide important insights about the aspects of the native phonological system to which infants are becoming attuned as they gain experience with native speech. The central goal of this chapter is to describe and provide evidence for a model of how language-specific experience influences infants' and adults' perception of non-native phonetic contrasts. The model is the Perceptual Assimilation Model of cross-language speech perception.

First, however, we must briefly review the basic pattern of developmental change in perception of non-native phonetic contrasts, and describe the phonetic and phonological organization in spoken language that the infant must come to perceive. Following that introduction to speech and its perceptual requirements, we will consider two major theoretical perspectives that might be extended to account for language-specific developmental changes, to provide a backdrop for the presentation of the Perceptual Assimilation Model.

Infants' perception of phonetic properties in speech

Young infants can discriminate a wide range of phonetic contrasts between consonants (e.g., [b] vs. [d]) or between vowels (e.g., the vowels in boot
vs. book), whether or not the tested phonetic features are employed linguistically by the ambient language. But by adulthood, in fact much earlier in development, experience with the native language comes to exert some rather striking effects on the perception of phonetic contrasts. The experiential influence is particularly apparent for perception of contrasts that are not part of the native language's phonological system. As will be explained more fully in the next section, the phonological system refers to the rules by which a given language employs certain phonetic differences as linguistic contrasts that can convey differences in word meanings. It treats certain other phonetic differences as linguistically equivalent, and yet other phonetic features as non-permissible altogether even though the same features may be used linguistically by some other language. Mature listeners often have substantial difficulty discriminating and categorizing phonetic contrasts which are not part of their own phonological system, but young infants from the same language environment have no difficulty discriminating those same contrasts. Effects of language-specific experience emerge in speech perception during the second half of the infant's first year, and are clearly evident by 10-12 months for perception of many non-native consonant contrasts (see reviews by Best, 1984, 1993, in press, a; Werker, 1989, 1991; Werker & Pegg, 1992).

Why and how does experience with the native language come to shape the perception of the phonetic properties of speech in this manner? How do infants become familiar with the sound system of their native language, and how does that process subsequently shape perception of unfamiliar consonants and vowels from languages not heard before? Infants' initial experience with their language begins with only the surface phonetic patterns of spoken utterances, but ultimately they must use that input to develop knowledge of the underlying semantic concepts and syntactic rules of the language. Thus, the first inroads the infant makes into discovering the systematic structure of the language take place at some level of its sound system. Many believe that this discovery process commences at the prosodic level.

Recent research on prosodic bootstrapping, the notion that conversational speech (particularly infant-directed speech) provides converging intonational and rhythmic markers that guide infants' attention to clause and phrase boundaries in speech, has made important advances in our understanding of how infants may discover the boundaries of syntactic units at varying levels (e.g., Gleitman, Gleitman, Landau, & Wanner, 1988; Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright Cassidy, Druss, & Kennedy, 1987; Jusczyk & Kemler Nelson, in press; Kemler Nelson, Hirsh-Pasek, Jusczyk, & Wright-Cassidy, 1989; Morgan, 1990). However, prosodic bootstrapping may not help the infant so much with segmenting sound at the word level. Broad prosodic markers do not consistently specify word boundaries in continuous speech (cf. Gerken & McIntosh, 1993; Jusczyk, Cutler, & Redanz, 1993), especially in languages like French which lack syllabic stress alternation patterns like those found in English. But word boundaries are often marked by characteristic differences in the exact way that the surrounding consonants and/or vowels are pronounced (e.g., aspirated [t] and reduced "uh" vowel in citrus but not in sit. Russ), phonetic characteristics to which even very young infants appear to be sensitive (Christophe, Dupoux, Bertoncini, & Mehler, submitted; Hohne & Jusczyk, 1992). Thus, word-segmentation may be aided not so much (or not only) by prosodic bootstrapping but more by what might be called phonetic bootstrapping.

It is the infant's attention to this sort of detailed phonetic information that would seem to be most relevant to the discussion of how language-specific experience begins to influence perception of consonants and vowels, also referred to as phonetic segments. A basic premise of this chapter is that infants make use of surface phonetic details to discover the more abstract phonological properties of their native language. As will be described more fully in a subsequent section, the phonological system refers to the inventory of phonetic segments that a given language employs to convey meaningful differences among words. This inventory is organized systematically and hierarchically around multiple contrasting phonetic features that define linguistically important relations among phonetic segments. The systematicity of a language's phonological system makes possible the vast expansion of vocabulary that takes place in early childhood, and somewhat later serves as the linguistic framework for the child's acquisition of reading and writing abilities. But the relation between the surface phonetic details of utterances and the more abstract phonological system of a language is not always transparent, in part because of contextually-determined differences in the phonetic details of consonants and vowels, and other effects such as speaker and speaking rate differences in pronunciations. Thus, in order to
learn the sound pattern of the ambient language sufficiently to determine sound-meaning relations, the infant must begin to untangle the complex relationship between the surface phonetics and the underlying phonological system, at least to some approximation.

To provide a foundation for considering developmental changes in speech perception, we will turn now to an overview of the hierarchical nesting of linguistic information conveyed in the speech signal. We will focus in particular on the relationship between the lower-order patterning at the surface phonetic level of speech and the more abstract, higher-order organization at the phonological level of a given language. Differences in the sound patterns of different languages reflect differences not only in their inventories of consonants and vowels, but also especially in the patterns by which they relate phonetic details to phonological structure. It is the relationship between phonetic details and phonological organization that is most germane to understanding the effects of language experience on the perception of non-native speech sound contrasts. Any theory of the acquisition of native language sound patterns, and of the perception of those patterns, must be able to take into account the sound structure of the spoken message and the observations of language- and dialect-specific differences in that structure.

The structure of the spoken message

When we convey a spoken message to a listener, the utterance we produce via the audible, and to some extent visible, articulatory movements of our vocal tract is organized according to the multiple levels of linguistic structure of the language we speak (the property of dual structure: Hockett, 1963). That is, the spoken utterance concurrently reflects the organizing of sound into words, the syntactic organization of those words into the larger units of noun, verb, or other phrases, and the superordinate syntactic organization of phrases into clauses, one or more of which may comprise a sentence. At the same time, prosodic organization is evident in the intonation, temporal patterns, and amplitude changes that provide a common carrier for the words at the phrase, clause and sentence levels, and serve to signal linguistic stress, pragmatic emphasis and emotional tone. But there is also nested structure if we look in the opposite direction, below the level of individual words. A word is composed of one or more units of meaning, referred to as morphemes, e.g., the word incomplete contains the stem morpheme complete plus the negation prefix in-. Morphemes are comprised of one or more syllables, each made up of consonants and vowels, which are defined in standard linguistic analysis as phonological segments.

Phonological patterning

Phonological segments are the smallest units of the language-specific grammatical system. They are themselves composed of phonetic features, the matrix of articulatory/acoustic properties that characterize the way a given phoneme is produced. These properties are described according to a universal set of distinctive feature contrasts by which one segment can differ critically from all others (e.g., Jakobson, Fant, & Halle, 1963; for an introduction to phonetics, see Catford, 1988; Ladefoged, 1982). For example, the consonants and vowels in the word incomplete may be broadly transcribed to correspond to phonemic segments as /ɪnkəmplit/. However, additional phonetic details that are present in the actual production of the word can be represented in a narrow phonetic transcription as [ɪŋkəmpʰl̥it̚]. The narrow transcription indicates that the /n/ preceding the /k/ is actually produced as a nasalized constriction [ŋ] near the soft palate at the back of the mouth, rather than at the alveolar ridge behind the upper front teeth [n]. The vowel in the second, unstressed syllable is the reduced vowel schwa [ə], which is somewhat like the "uh" ([ʌ]) in butter, but shorter in duration. The /p/ is produced with breathy aspiration [pʰ], which causes the following /l/ to be devoiced [l̥]. And the tongue-tip closure for the final /t/ is not audibly released at the end of the word [t̚]. (For an introduction to phonology see Kenstowicz & Kisseberth, 1979).

The phonology of a language is the set of systematic constraints the language places on the sound patterning of its consonants and vowels. To begin with, every language employs but a subset of all humanly-producible consonant and vowel sounds to produce minimal phonological contrasts in word meanings. As an illustration of minimal contrast, English uses /b/ and /p/ to differentiate the meaning of words that are matched in their other phonemic elements, such as bat vs. pat. Likewise, the vowel contrast /ɪ/-/ɛ/ distinguishes the minimally contrasting words pit-pet (/pɪt/-/pɛt/). However, modern English lacks the throaty fricative at the beginning of the Yiddish word chutzpah.

The phonology of a language also includes contextually-determined allophonic variations in the phonetic details of a given phoneme produced in
different surrounding contexts. For example, in English the /p/ in pan is produced with aspiration and a long lag before voicing starts after the release of the bilabial closure, denoted phonetically as [pʰ]. But the /p/ in span is produced with a much shorter voicing lag and without aspiration, denoted as the allophone [p]. However, this difference in pronunciation does not signal a phonological contrast in English. Phonological analyses of the range and constraints on allophonic variants reveal which one is the underlying phonological form, and which others are the variants of that underlying form. In this case, [p] is a variant of underlying [pʰ]. There are no English minimal word pairs whose meaning is differentiated phonologically solely by the /p/-/pʰ/ difference.

Certain other contextually-determined effects on the phonetic details of segments in a spoken message result from more global changes, such as different speech rates and styles. To illustrate, the phrase did you eat... in slow, careful speech is typically produced with two clear /d/s and the "ih" vowel in did, clear "y" and long "oo" sounds for you, and a clear "ee" and /t/ in eat. But in rapid, casual speech the phrase may become dyeat... where the initial /d/ and vowel in did have been omitted, the final /d/ seems to combine with the "y" of you to form a "j" sound, and the long "oo" has become an unstressed schwa [ə] (e.g., Oshika, Zue, Weeks, Neu, & Aurbach, 1975; Browman & Goldstein, 1990a).

Languages also have phonotactic constraints on the distributional patterns of consonants and vowels, including permissible sequences in syllables and permissible positions that particular sounds can occupy within a syllable or word. For example, /spa/ and /mop/ (mope) are permissible English syllables but */psa/ and */mpo/ are not. Also, English words may end but may not begin with the velar nasal /ŋ/ (as in song), and may have an internal voiced palatal fricative "zh" (as in measure) but may not begin with this sound.

Thus, the phonological system of a language refers to the underlying linguistically-defined relations among the consonant and vowel sounds it employs. The language's use of consonant or vowel differences for contrastive differentiation of word meanings, the allophonic patterning of those phonemes, and their phonotactic distributional constraints all reflect abstract invariant properties that underlie the surface phonetic details of spoken utterances. As should be clear from these examples, the relation between the phonetic details and the phonological organization of a language is often far from a simple, transparent mapping.

To address how infants might learn aspects of the language-specific phonology from ambient speech, and how that might influence their perception of non-native phonetic contrasts, we need to briefly review next how languages differ in the ways they relate the phonetic details of speech to phonological structure.

Language differences in phonology and phonetics

An obvious way in which the sound patterns of languages differ is in their inventories of phonological segments and minimal contrasts. Although certain basic segment types seem to be universal, or nearly so, across the inventories of the world's languages, other sounds and contrasts are present only in some languages and are absent in others. Among the universally-shared phonological segments are the stop consonants /p/ and /t/ and the vowels "ah" as in father, "ee" as in see, and "oo" as in boot.¹ Language differences in phonological inventories are numerous, however. For example, the /l/-/r/ contrast found in the inventory of English is absent from many Asian languages, such as Japanese and Korean, as well as from a number of other languages; indeed, the English /r/ is quite rare across languages. Similarly, the English vowels in hook and hawk, respectively, are lacking in Spanish, Native Hawaiian and many other languages. Conversely, English lacks the click consonants of Zulu and other southern African languages, as well as the dental versus retroflex stop consonant contrast /d̪/-/ɖ/ of Hindi (our /d/ has a tongue-tip position in between the Hindi sounds). English also lacks the front rounded vowels /y/-/ø/ found in French, German, Swedish and elsewhere.

The neat and straightforward description of language differences in phonological inventories is seemingly complicated, however, by the fact that languages also either require or permit certain context-conditioned or free allophonic variants for at least some of their phonemes. For example, the French /r/ is characterized as a voiced uvular trill at the back of the throat, yet context-conditioning causes its surface phonetic form to become a voiceless uvular fricative when it follows a voiceless consonant, e.g., as in quatre, the French word for "four." Permissible differences among speakers also result in other freely varying allophones.

Allophonic variations may even, at times, appear to obfuscate claims that one language
lacks a particular phoneme or contrast found in another. To illustrate, neither the dental nor the retroflex stop that contrast in Hindi is found in the English phonological inventory. Our /d/ is underlyingly a voiced alveolar stop [d]. However, a dental stop does occur phonetically in English speech, as an allophone of /d/ that is context-conditioned due to coarticulation (overlapping production) with adjacent dental sounds. The dental allophone occurs when /d/ is adjacent to a dental fricative, e.g., in birthday. These observations might seem to belie the claim that only Hindi, and not English, has a dental stop in its phonological inventory. The important point, though, is that this dental form does not contrast with /d/ in English. It is a context-conditioned allophone of /d/ and is heard as /d/. The adjacent dental segment is perceived as the source of the variant property (see also Fowler & Smith, 1986; Kent, Carney, & Severeid, 1974; Krakow, Beddor, Goldstein, & Fowler, 1988; Mann, 1980, 1986; Whalen, 1983), apparently even by young infants (Fowler, Best, & McRoberts, 1990).

The discussion about language differences in allophonic patterning prompts consideration of a similar phenomenon in which different languages, and different dialects of a single language, can differ in their phonetic realizations of the "same" phonological segment. If the phonetic details differ, then on what basis is the underlying segment in such cases "the same," in at least some crucial way? This question is more problematic for the cross-language case, but several observations suggest that underlying identity of segments, or at least close similarity, may often be a reasonable assumption nonetheless (see also Flege, 1987, in press). For one thing, the phonetic feature matrix that defines a given phonological segment includes only those features critical for distinguishing it from other segments in a language's phonology. Allophones are encompassed in the definition because they vary on non-critical features. Thus, English and Spanish both have the phonological segment /p/ even though it is often aspirated in English but never in Spanish. It is important to note, however, that listeners are quite sensitive to foreign accent in their native language, suggesting that listeners may nonetheless detect such sub-phonemic differences. Findings indicate that while some of the sensitivity to foreign accent is attributable to prosodic differences, for at least some cross-language segmental similarities the phonetic differences between the corresponding native and non-native segments are also perceptible (e.g., Flege, 1984, in press; Flege & Eefting, 1987; Flege & Fletcher, 1992).

Cross-language identity and similarity are corroborated by the phonological forms speakers use when learning a new language with unfamiliar pronunciations, as when a Spanish speaker's initial pronunciation of English pit may sound like beet because he or she uses the Spanish unaspirated /p/ and an "ee" vowel because Spanish has no "ih" sound. Cross-language segmental similarities are also suggested by the phonological forms speakers of one language give to loan-words from another language (see also Silverman, 1992). For example, the French calorique, pronounced with an unaspirated /k/, a uvular trilled /r/ and the vowels "ah," "o" and "ee," has been adopted into English as caloric and pronounced with an English aspirated /k/, English /r/, and unstressed schwa [ə] in the first and final vowel positions.² Moreover, similar sorts of phonological substitutions are seen in pidgins and creoles, inter-languages which result from social contact between two independent language groups, and which often derive only from spoken forms at least in their early stages (e.g., Holm, 1988; Romaine, 1988). Finally, the patterns by which listeners label non-native segments, not surprisingly, provide further converging evidence about cross-language segmental similarities, as will be described later.

By comparison to the cross-language case, the segmental identity issue seems relatively straightforward for the cross-dialect case, at first glance. For mutually intelligible dialects, the vocabulary, the grammar (phonology, morphology, syntax) and even the written forms are typically nearly identical between dialects. In this case, there is no doubt about phonological identity between corresponding segments in the dialects, even though they differ in some phonetic details. Here again, listeners nonetheless detect dialectal accent easily, and show differential sensitivity to phonetic differences among segments in the native vs. non-native dialects (see Faber, Best, & Di Paolo, 1993).

Numerous examples of cross-dialect phonetic variants of underlying segments can be found in languages. On portions of Long Island in New York, words such as long are pronounced with a final /g/, although the final /g/ is omitted elsewhere in the U.S. To take an example from another language, the nasalization of vowels in Canadian French commences later into the vowel than in continental French (van Reenen, 1982). Paralleling another between-language difference,
one dialect may lack a phonological contrast found in other dialects of the same language (or found historically in the language), a situation termed a "merger" of the contrast. For example, English speakers from Canada, western U.S., and areas of the midwest U.S. fail to produce or reliably label the "aw"-"ah" difference, as in hawk-hock, a vowel difference that is maintained in the northeast U.S. (e.g., Di Paolo, 1992). Similarly, Texans have merged the "ih"-"eh" difference before /n/, pronouncing pin and pen as homonyms (both like pin).

Sometimes, a merger is not absolute, but rather is a "near-merger" (see Faber, Di Paolo, & Best, submitted; Labov, 1974; Labov, Karen, & Miller, 1991). In a near-merger, a phonological contrast found elsewhere in the language is no longer evident in a given dialect, but productions of the near-merged sounds still show reliable acoustic differences and/or the contrast reappears in a subsequent sound change in the dialect. One such historical reversal occurred in early Modern English. The vowels in words like meat-mate, which had merged earlier, later re-established different pronunciations when the meat class but not the mate class vowels merged with the vowel in words like meet (the meat-meet merger still stands today) (Labov, 1974). As an example of near-merger in current American English, /r/ is dropped after "ah" in some Boston dialects. Thus, word pairs such as cod-card are produced as near-homonyms (Costa & Mattingly, 1981). A similar effect is found in many dialects of British English. A near-opposite pattern occurs in Brooklyn, where speakers add /r/-color to the "aw" sound, pronouncing sauce like source (Labov, Yaeger, & Steiner, 1972). In Albuquerque and the Salt Lake Valley, vowel pairs such as "ee"-"ih" and long "oo"-short "oo" (as in boot-book) show near-merger in the context of a following /l/. That is, word pairs such as pool-pull and heel-hill are pronounced as near-homophones (Di Paolo & Faber, 1991; Faber, 1992; Labov, Yaeger, & Steiner, 1972).

To return to cross-language differences, languages often differ in the phonotactic constraints they place on the sequences and word positions permitted among the segments in their inventories. As an illustration, English does not permit the "zh" sound word-initially, but a number of other languages do, as in the French word for magazine, journal, and the Russian word for woman, zhenshchina. Likewise, English disallows "ng" ([ŋ]) in word-initial position, but that position is allowed in Vietnamese, as in the name Nguyen. On the other hand, stop consonants such as /p/, /t/, /k/ can occur in initial but not in final positions in Mandarin Chinese words and syllables; in English they can occur in either position. Finally, English phonotactics disallows certain phoneme sequences in syllables that are nonetheless permissible in other languages, such as */psa/ (e.g., in Greek), */mpo/ (e.g., Chaga), and */dzva/ (Polish).

In addition, the types of phonological alternations present in one language may be absent from others. As an example, Turkish uses a phonological principle of vowel rounding harmony within words, whereby the vowels in a word must agree in whether they have lip-rounding (e.g., "o" and long "oo") or not (e.g., "ee" or "ih"). Thus, the possessive form of dere, the word for river, is deresi, but the possessive form of boru, the word for pipe, is borusu. English, of course, does not require any sort of vowel harmony. Other languages have a rule of vowel epenthesis to maintain a regular pattern of consonant-vowel alternation, whereby a vowel is inserted between any adjacent consonants. For example, pluralizing the Chukchee word for river, wejem, by adding the plural morpheme -ti results in wejemet and not *wejemti because the /m/ and /t/ must be separated by a vowel (the final i is deleted through a separate phonological rule). As a final example, some dialects of Spanish have a rule of spirantization by which voiced stop consonants /b/, /d/, /g/ become voiced fricatives following a vowel, as in the pronunciation of nada, the word for "nothing," with a dental fricative instead of a /d/. It is interesting to note that the early words of young English-learning children often display phonological constraints that are absent from adult English, but similar to rules found in other languages. For example, complete vowel harmony is evident in "baba" for bottle and "dada" for daddy, while vowel epenthesis is evident in "buhlue" for blue. However, children's early phonologies sometimes also display other constraints that are seldom if ever seen in adult phonologies, such as the childish consonant harmony constraint by which doggy is produced as "dawddy" or ducky as "gucky."

Language differences in phonological inventories and in the phonetic properties of identical or similar phonological segments are the primary aspects of phonology with which we will deal in the remainder of the chapter. These are the aspects of speech most likely to be relevant to considering the lowest-level invariants of native language structure that infants may initially recognize in the consonants and vowels of the
ambient language. But how is it that the infant dissimilarities vis a vis native phonological
moves from the surface phonetics to the categories. The model is based on the principles of
underlying phonology? And how might the infant's information pickup and perceptual learning put
progress on this front be reflected in changing forth in the ecological theory of perception, as
perceptual responses to non-native phonetic applied to listeners' recognition of language-
patterns? specific relations between surface phonetic details
and the underlying phonological principles that
On accounting for developmental changes in have been characterized by linguistic research.
perception of phonetic information The model will be discussed in light of recent
Two comprehensive, but radically different, cross-language perceptual findings with infants
theoretical approaches stand out in the scientific and adults, from my own and others' laboratories.
literature as providing possible accounts of how In addition, PAM's implications for the
infants become attuned to the phonetic properties development of phonological knowledge about the
of their native language and begin to sort out the native language will be considered.
phonetics-phonology relations. The first approach Let us turn now to our evaluation of Chomsky's
is Noam Chomsky's linguistic theory of the proposal about language acquisition, and of the
grammatical structure of language and of its Gibsons' theory of perception and perceptual
implications for language acquisition. Chomsky's learning. This discussion provides the groundwork
premise of an innate Language Acquisition Device for PAM.
(LAD) is probably the most well-known and
widely-accepted nativist perspective on language Chomsky and the Language Acquisition
development. It is probably less widely known Device
that his LAD was meant to apply to phonological To set the stage, consider a quote from
as well as syntactic processes. The second is a Chomsky's Language and Mind (1972), which
psychological theory that is rarely applied to illustrates his reasoning about the need for a
language or its development, James and Eleanor language acquisition device. This particular
Gibson's ecological perspective on perception. passage was chosen because of its emphasis on the
Their notion of perception as information pickup role of the LAD in phonological development.
would suggest, as an alternative to an innate
"[W]e can provide an explanation for a certain
linguistic device, that perceptual learning may be
aspect of perception and articulation in terms of a
the means by which language experience affects
very general abstract principle, namely the
perception of native versus non-native phonetic
principle of cyclic application of rules. It is
information.
difficult to imagine how the language learner
To provide the foundation and rationale for the
might derive this principle by 'induction' from
Perceptual Assimilation Model of language-
the data presented to him. In fact, many of the
specific effects on speech perception to be
effects of this principle relate to perception and
presented in the subsequent section, this section
have little or no analogue in the physical signal
of the chapter will critically examine Chomsky's
itself, under normal conditions of language use,
and the Gibsons' theoretical approaches. It will be
so that the phenomena on which the induction
argued that while Chomsky's theory has provided
would have been based cannot be part of the
important insights about the grammatical
experience of one who is not already making use
structure of language, including its phonological
of the principle.... Therefore, the conclusion
properties, some of his basic claims about the
seems warranted that the principle of cyclic
phonetics-phonology relation have not been
application of phonological rules is an innate
supported by subsequent work in phonology. More
organizing principle of universal grammar that is
important, difficulties with his nativist
used in determining the character of linguistic
perspective on development lead me to reject that
experience and in constructing a grammar that
view as an approach to understanding the
constitutes the acquired knowledge of language."
development of language-specific effects on speech
(Chomsky, 1972, p. 45)
perception, in favor of the perceptual learning
approach outlined by the Gibsons. As indicated, a core premise of Chomsky's
Following this theoretical discussion, PAM will theory is that humans possess an innate biological
be developed as a perceptual learning account of specialization for learning language. This
listeners' perception of non-native contrasts specialization is devoted solely to determining the
according to their phonetic similarities and specific grammatical structure of the native

language, within the innately-specified "The hearer makes use of certain cues and
constraints on possible human grammars, on the certain expectations to determine the syntactic
basis of spoken input. The biological device, the structure and semantic content of an utterance....
LAD, is endowed with the universal grammar, A person who knows the language should 'hear'
that complement of grammatical functions found the predicted phonetic shapes.... Notice,
universally across languages. Thus, it includes the however, that there is nothing to suggest that
mechanisms that generate the language-specific these phonetic representations also describe a
rules by which the surface phonetic physical or acoustic reality in any detail.. ..
representations of utterances are derived from the Accordingly, there seems no reason to suppose
underlying deep structure, or abstract phrasal that [even] a well-trained phonetician could
organization of intended meaning. Cross-language detect such contours with any reliability or
similarities in the structure of children's early precision in a language that he does not know ... "
grammatical constructions, their common (Chomsky & Halle, 1968, p. 24-25)
phonological simplifications in pronouncing early
words, and the disparity between those childish Thus, Chomsky and Halle posit that a listener's
constructions and the grammars of the adult perception of phonetic patterns is determined by
languages, are taken as evidence for an innate the phonological component of the specific
biological specialization for language acquisition. grammar of his or her native language, once the
The LAD makes possible the child's construction listener knows the language. But if only a person
of a representation of the grammatical system of who knows the language hears the phonetic
the native language, which includes the shapes predicted by the grammar-the
phonological rules by which sound and meaning meaningful contrasts and phonetic equivalencies
are related, as can be seen in the following quote. within its phonological component-then how
should those same phonetic patterns be perceived by
"[T]he child constructs a grammar-that is, a
someone who does not know the language? More
theory of the language of which the well-formed
specifically, how does perception of the phonetic
sentences of the primary linguistic data constitute
details of an unknown language differ between a
a small sample.... A child who is capable of
listener who knows at least one language (i.e.,
learning language must have (i) a technique for
knows a different language-specific grammar) and
representing input signals, (ii) a way of
a listener who has not yet learned a first language
representing structural information about these
(i.e., does not yet know a particular grammar)?
signals, (iii) some initial delimitation of a class
How, indeed, does the first language learner
of possible hypotheses about language structure,
acquire the native phonology, based on the spoken
(iv) a method for determining what each such
input from his or her language environment?
hypothesis implies with respect to each sentence,
The answer to the last question, according to
(v) a method for selecting one of the
Chomsky, is that the LAD helps young children to
(presumably, infinitely many) hypotheses that
determine the language-specific grammatical op-
are allowed by (iii) and are compatible with the
erations that relate the surface phonetic forms of
given primary linguistic data." (Chomsky, 1965,
p. 25-30)
native utterances to their underlying phonologi-
cal, syntactic and semantic representations.
Although his work on syntax is more extensive Because young infants innately possess the set of
and more widely known outside of linguistics than universal phonetic features, they should perceive
his work on phonology, it is important to note that the full range of possible surface phonetic con-
Chomsky considered the phonological patterning trasts in non-native as well as in native speech. In
of a language to be a component of its grammar. this way, they remain open to learning whichever
Therefore, the endowment of the LAD also had to language is presented to them. But why, then,
include the universal set of phonetic features-the don't adults and older children also perceive the
full range of possible speech sound features from universal phonetic features in non-native speech?
which all languages select a subset for the surface The brief treatment of this issue in SPE points to
phonetic representation of utterances. The next the answer. It cannot be that mature language
quote, from The Sound Pattern of English users have somehow lost the universal phonetic
(Chomsky & Halle, 1968-henceforth referred to features with which they were born. Rather, it
as SPE), describes the predicted effects that must be that for them the language-specific
knowledge of a particular language should have grammatical rules they have come to possess nec-
on the perception of phonetic features in speech. essarily translate the surface phonetic features of

utterances to the underlying phonological repre- details for specific dialectal or allophonic variants
sentations that are in accord with the grammati- of a given segment. The latter sorts of detailed de-
cal principles of their language(s). That is, once scriptions might be provided (by phoneticians) to
the child has determined the rules of the lan- fully characterize allophone-specific, dialect-spe-
guage-specific grammar, s/he will "hear" the pho- cific, or even language-specific properties of utter-
netic shapes predicted by the phonological compo- ances. But these would not be part of the lan-
nent of that grammar. guage-specific grammar, and so are not essential
This process would not constrain young infants' descriptions of phonological segments, which are
perceptions because they have not yet accrued abstract. Phonological segments represent the
sufficient language input to determine the under- functional patterning of sound by the language's
lying language-specific phonological representa- grammar, and therefore are blind to allophonic or
tions of the ambient language's grammar. The dialectal differences, which are phonologically
LAD and its universal grammar are, nonetheless, equivalent in the underlying representation.
present and operating even in the young infant. It is important to point out, however, that this
Its function in phonological development at this segmental or linear view of phonology as
early stage is to construct the underlying gram- propounded in SPE has largely been supplanted
mar of the phonological component of the lan- more recently by nonlinear or autosegmental
guage by generating and testing hypotheses that phonology (e.g., Archangeli, 1988; Archangeli &
could account for the observed patterning of the Pulleyblank, in press; Clements, 1985; Keating,
surface phonetic details in ambient speech. 1988, 1990; McCarthy, 1988; Prince & Smolensky,
To understand how this was expected to take 1993; Sagey, 1986; for an introduction to
place, we must briefly examine Chomsky and autosegmental phonology, see Goldsmith, 1976).
Halle's basic assumptions about how phonetic de- The nonlinear approach has developed in response
tails relate to phonological representations. The to several difficulties with the classic linear
classic view of SPE was that each consonant and model's handling of certain aspects of phonological
vowel in an utterance is a discrete segment, repre- patterns and phonetic implementations across
sented phonologically as a feature matrix of all languages. For one, the SPE claim that all
and only those phonetic features that distinguish features are binary fails to account for certain
it from all other segments in the language's inven- phonological processes; the nonlinear approach
tory. The role of the phonological component of the instead recognizes multivalent settings for certain
grammar is to assign a language-appropriate pho- phonological features. For another, the exclusively
netic feature matrix for the surface structure of segmental domain of the SPE model failed to
each utterance generated by the syntactic compo- coherently incorporate certain effects of stress
nent of the grammar. Thus, the phonological patterns, intonation, and syllable structure
mapping to phonetic features is a part of the lan- (phonotactics) on segmental properties. These
guage-specific grammar. But the phonetic features effects are handled in nonlinear accounts by
are assumed to be binary, abstract, and timeless assuming instead that segments, stress, tonality,
representations, even though their physical artic- and syllable organization are distinct but
ulatory instantiations extend over time and space interacting subcomponents of the phonology (e.g.,
and show graded variability. That is, each static Ito, 1986; Leben, 1978; McCarthy, 1986, 1989;
phonetic feature in a segmental matrix has only a Pierrehumbert & Beckman, 1988).
positive (+) or a negative notation (-); the values Another common phonological pattern is that
for all features hold absolutely and concurrently phonetic features of one segment often "carry
in a segmental representation that has no time over" to other segments in an utterance, e.g.,
dimension. These static, binary feature specifica- vowel harmony, context-conditioned allophones.
tions of the surface phonetic representation are Because SPE assumed phonetic features are
automatically translated into the continuous, linked to individual phonological segments, these
scalar articulatory details of real utterances, with phenomena required a proliferation of rules for
temporal and spatial extent, by the universal moving phonetic features between segments. In
grammar. That is, the translation to physical ar- nonlinear phonology, the effects follow
ticulations is not part of the language-specific automatically from an assumption that all
grammar. For these reasons, phonological features are independent of specific segments,
representations do not incorporate all of the actual with possible associations to one or more
articulatory details associated with particular segmental "slots" (e.g., Cohn, 1990; Goldsmith,
physical instantiations, such as the full range of 1976; Inkelas & Leben, 1990; Kahn, 1980).
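
To make the SPE conception of a segment concrete, the following is a minimal sketch in Python, not a rendering of SPE's actual feature system. The three feature names, the five-segment inventory, and the two helper functions are illustrative assumptions chosen for brevity. The sketch shows only the formal point at issue: each segment is a static bundle of binary (+/-) values with no time dimension, and two segments contrast just in case their matrices differ on at least one feature.

```python
from typing import Dict, List

# Hypothetical, heavily simplified feature set (an assumption for illustration;
# SPE itself uses a much larger inventory of features).
FEATURES = ("voice", "nasal", "continuant")

# Each segment is a timeless bundle of binary values: True = '+', False = '-'.
SEGMENTS: Dict[str, Dict[str, bool]] = {
    "p": {"voice": False, "nasal": False, "continuant": False},
    "b": {"voice": True,  "nasal": False, "continuant": False},
    "m": {"voice": True,  "nasal": True,  "continuant": False},
    "s": {"voice": False, "nasal": False, "continuant": True},
    "z": {"voice": True,  "nasal": False, "continuant": True},
}


def distinguishing_features(a: str, b: str) -> List[str]:
    """Return the features on which segments a and b carry opposite values."""
    return [f for f in FEATURES if SEGMENTS[a][f] != SEGMENTS[b][f]]


def matrix_row(seg: str) -> str:
    """Render a segment's feature matrix in the familiar +/- notation."""
    cells = [("+" if SEGMENTS[seg][f] else "-") + f for f in FEATURES]
    return "[" + " ".join(cells) + "]"


if __name__ == "__main__":
    for seg in SEGMENTS:
        print(seg, matrix_row(seg))
    print(distinguishing_features("p", "b"))
```

Under these assumptions, /p/ and /b/ differ only in [voice], while /p/ and /z/ differ in both [voice] and [continuant]. Nothing in such a matrix can express graded, time-extended, or language-specific articulatory detail, which is the limitation at issue in the surrounding discussion.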

Finally, language- and dialect-specific rules of the grammar itself and the interaction of
differences in productions of segments with these rules. At a deeper level, these same
identical phonetic feature specifications call into phenomena are explained by the principles that
question the SPE argument that articulatory determine the selection of the grammar on the
implementation of phonological representations is basis of the restricted and degenerate evidence
automatic and universal, suggesting instead that available to the person who has acquired
articulatory details are part of language-specific knowledge of the language, who has constructed
grammar (see Fourakis & Port, 1986; Keating, for himself this particular grammar." (Chomsky,
1988, 1990a, b; Mohanan, 1986). For example, the 1972, p. 27)
ejective stop /p'/ is released later and hence more Chomsky asserts in numerous places in his
forcefully in Navajo than in Quechua (Lindau, writings that the spoken input from the language
1984), and nasal vowels have more delayed
environment provides inadequate information
nasalization in Canadian French relative to about the underlying grammar of the language for
continental French (van Reenen, 1982). the child to apprehend that grammar directly. As
Although it has gone beyond the SPE model in the argument goes, each utterance of adult models
handling certain phonetic and phonological offers the young child only an incomplete glimpse
patterns, however, the nonlinear approach has of the grammar of the language; some utterances
apparently retained the other basic theoretical are even ungrammatical. Moreover, caregivers
premises of SPE. The nonlinear approach still generally fail to provide the sort of negative
assumes that phonological features are abstract evidence that would unequivocally refute any
and timeless. Moreover, nonlinear phonology incorrect hypotheses the child might entertain
proponents have had very little to say about about the grammar of the language (e.g., Marcus,
ontogenetic development, certainly nothing that Pinker, Ullman, Hollander, Rosen, & Fei, 1992).
differs substantively from Chomsky's nativist
In short, the input is a sample of utterances that,
assumptions (e.g., Archangeli & Pulleyblank, in individually, are incomplete (and consequently,
press). That is, the nonlinear approaches retain,
sometimes ambiguous) reflections of the
either tacitly or explicitly, the notion of an innate
underlying grammatical system, and that,
language acquisition device containing a universal
collectively, present but a tiny subset of the
grammar, with universal phonetic features.
infinite grammatically acceptable sentences that a
However, those unquestioned assumptions, par-
native speaker-hearer could automatically
ticularly certain assumptions underlying the
understand and produce.
posited innate linguistic device, raise some vexing
Thus, the input utterances are taken to be
problems. In-depth critiques of Chomsky's general
informationally inadequate to specify the
theoretical framework have been offered from a
grammar completely and uniquely. Therefore, the
linguistic perspective by Derwing (1973) and
reasoning proceeds, the child must innately
Sampson (1980), and from a psychological per-
possess a specialized device to construct a model
spective by Bohannon, MacWhinney, and Snow
of the grammar and test hypotheses against this
(1990), among others (see special issue of
input. Because this sort of data base has the
Developmental Psychobiology, 1990, 23(7), for de-
potential to permit a large number of logically
bate on both sides of the innateness issue). For the
possible alternative descriptions of a grammar,
purposes of the present discussion, we will focus
innate constraints on the forms of permissible
on one of those problematic assumptions from
grammars are posited to be built into the LAD.
Chomsky's claims about the LAD, exemplified in
Although these arguments have been developed
the following quote. The notion it conveys, that
primarily to account for acquisition of syntactic
the input from the environment is inadequate in
processes, it is presumed that phonology is subject
itself to directly specify the grammar of a lan-
to the same general principles as syntax. The
guage to a learner, characterizes a broader epis-
surface phonetic input inadequately specifies the
temological paradox of historical concern to epis-
underlying phonological system, therefore
temologists and perception theorists.
phonological acquisition must depend on innate
"The native speaker has acquired a grammar mechanisms. In the remainder of the current
on the basis of very restricted and degenerate discussion, comments about the acquisition of
evidence; the grammar has empirical grammar refer primarily to the phonological
consequences that extend far beyond the component (see Dent, 1990, re: similar criticisms
evidence. At one level, the phenomena with of nativist claims about semantic and syntactic
which the grammar deals are explained by the development).

Here is the crux of the paradox: The grammar of similarities among their disparate input sets are
a language, including its phonology, must be sufficient for children of a language community to
shared sufficiently well by the members of the select the same (or quite highly overlapping)
language community for them to understand each "most elegant" solutions from among the various
other's utterances. Chomsky's argument is that alternative grammars that each one privately
the child cannot get the grammar directly from generates, then surely this must mean that the
the inadequate evidence provided by adult input from adults provides robust and consistent,
utterances, and so must use innate linguistic rather than inadequate, evidence about the
mechanisms to determine the grammar. But how grammar of the language. Indeed, if this be the
can a shared grammar be developed in this way, case, why must the children construct their own
individual mind by individual mind, based on private grammars at all? Why not learn the
inadequate input? How could such private grammar directly from the patterning of the
grammars ever be verified, given the presumed publicly available information in utterances, i.e.,
inadequacy of the utterances 3 which are the only learn the phonological system directly from the
direct evidence that speaker-hearers can present surface phonetic patterning of utterances?
to one another? How could those private The problems just summarized reduce to the
grammars become mutually adjusted so that their philosophical paradox inherent in indirect theories
users would be speaker-hearers of the same of perception. The paradox has been recognized
language?4 historically even by proponents of indirect
Chomsky's solution apparently is that the basis theories. Specifically, it is that if inputs convey
of this mutual adjustment is the innate endow- inadequate veridical information about the world,
ment of linguistic concepts in the universal then we cannot directly know the outer world. The
grammar that all humans share. Those innate notion that we must know the world only
concepts are employed to generate and test hy- indirectly, through deduction and interpretation of
potheses about the grammar of a language against inadequate input, comes down to a claim that we
the primary linguistic data each child receives. can perceive in the world only what we already
However, as Chomsky acknowledged, a given set know is there to be perceived. This is, of course,
of primary linguistic data usually will support the reasoning behind the standard nativist claim
multiple solutions. To keep this problem from get- for innate knowledge. And as James Gibson (1979)
ting out of hand, he proposed that the number of argued, it is circular reasoning.
potential solutions is limited by innate constraints
"Note that categories cannot become
on permissible grammatical forms. Nonetheless,
established until enough items have been
multiple grammatical hypotheses are still to be
classified but that items cannot be classified until
expected; the language learner must select the
categories have been established. It is this
"best" of the possible grammatical hypotheses
difficulty, for one, that compels some theorists to
generated to account for the observed data.
suppose that classification is a priori and that
Evaluation criteria for choosing the best among a
people and animals have innate or instinctive
set of possible solutions generally rely on concepts
knowledge of the world. The error lies ... in
such as elegance or simplicity, which can be noto-
assuming that either innate ideas or acquired
riously difficult to define and reach consensus on
ideas must be applied to bare sensory inputs for
(see Anderson, 1985; cf. Jeffreys & Berger, 1992).
perceiving to occur. ... Knowledge of the world
Again, the handling of this problem is attributed
cannot be explained by supposing that
to innate mechanisms-the requisite linguistic
knowledge of the world already exists." (J. J.
evaluation criteria are part of the LAD. But the
Gibson, 1979, p. 252-253)
difficulties of this line of explanation remain,
compounded by the fact that the linguistic data The claim for innate ideas would also seem to be
set each individual receives will be different in at odds with the basic evolutionary principle of
particulars from that received by each other indi- natural selection, dependent as that principle is
vidual, even within the same community. Given on the organism's fit to an ecological niche. That
this fact, how would the individual children of a is, a species' survival is optimized when its
language community end up generating and se- physical structure and behaviors are well-suited
lecting the same, or similar-enough,5 grammars? to those veridical properties of its world that are
All normal children, and many who are relevant to satisfying its procreative and survival
exceptional in some way, acquire the language needs. I would argue that, as applicable as these
spoken to them within a few short years. If the concerns are for indirect theories of perception of

the physical world, they apply equally to space, and that it is directly detected rather than
Chomsky's nativist model for acquisition of the being "interpreted" by innate knowledge,
phonological grammar of a language. In computation, inference, stored memories, or
particular, they are directly relevant to the arbitrary associations.
assumptions that model makes about indirect
"The evidence ... shows that the available
perception of phonetic patterns in speech.
stimulation surrounding an organism has
A fundamental problem of the indirect
structure, both simultaneous and successive, and
perception view is that it conceives of input to the
that this structure depends on sources in the outer
perceiver from the world as a series of
environment. If the invariants of this structure
instantaneous collections of stimulus features can be registered by a perceptual system, the
which impinge on the special sensory organs (i.e.,
constants of neural input will correspond to the
eyes, ears, nose), and which inadequately specify
constants of stimulus energy, although the one
their dynamic and substantive sources in the
will not copy the other. But then meaningful
world. Like snapshots, these inputs individually
information can be said to exist inside the
have no extension in time or space. A somewhat
nervous system as well as outside. The brain is
analogous view can be found in the nativist
relieved of the necessity of constructing such
linguistic assumptions about the language input
information by any process-innate rational
to the child, which could be characterized as
powers (theoretical nativism), the storehouse of
"sound-bites" of language-individual utterances memory (empiricism), or form-fields (Gestalt
each of which can provide only partial evidence theory). The brain can be treated as the highest of
about the underlying grammar, including its several centers of the nervous system governing
phonological component. According to indirect the perceptual systems. Instead of postulating
perception theories, because the stimulus cues are
that the brain constructs information from the
impoverished with respect to real-world events
input of a sensory nerve, we can suppose that the
and objects, the perceiver presumably must use
centers of the nervous system, including the
additional mechanisms of brain and/or mind to
brain, resonate to information." (J. J. Gibson,
further process the sensory inputs, deduce what
1966, p. 267).
their sources must have been, draw inferences,
develop memorial associations, etc., in order to As this passage indicates, information about the
mentally construct an indirect representation of external world-about distal events, surfaces, and
the world. But how could such mechanisms ever objects-is assumed to be directly picked up from
have evolved, given that the presumed inadequacy stimulation, by integrated perceptual systems. To
of the input would make it impossible for their illustrate the perceptual system concept, the
outputs ever to be verified vis a vis the real world? retina of the eye does not gather visual
It was in response to these and other sorts of information by working in isolation. Rather, it is
concerns about indirect theories of perception and an integral part of the perceptual system for
perception-dependent knowledge in general that seeing: two movable eyes fixed in a head, which is
the Gibsons formulated an alternative, ecological attached to a body that can move to shift location
approach to perception and perceptual learning and orientation of the viewer with respect to the
(E. Gibson, 1969; J. Gibson, 1966, 1979). They external spatial layout; these components are
argued that all animals, for the sake of their neurally integrated with one another and with
survival, must know the world directly from higher centers in the brain. Thus, the perceptual
information available in stimulation. systems are assumed to have evolved to permit
active, physical exploration of the world in the
The direct realism alternative: Gibsons' service of gathering and disambiguating distal
ecological theory of perception information.
The ecological theory of perception represents Thus, the ecological approach, like the linguistic
the opposite philosophical extreme from the nativist approach espoused by Chomsky, is
nativist assumptions of Chomsky's theory. The concerned with biological specialization. However,
philosophical stance taken by the Gibsons' the two views differ dramatically in their
ecological theory of perception is that of direct assumptions about the nature of biological
realism, as opposed to indirect or innate specializations-the information they handle, the
knowledge. As the quote below illustrates, way they work, and the forces behind their
ecological theory assumes that stimulation is evolution. According to ecological theory the
structured and dynamic, extending over time and biologically specialized perceptual systems have

evolved, and continue to function, for the pick-up information about the distal sources of
of veridical information from the world. This view stimulation. As a result of this active exploratory
admits the possibility of perceptual systems being behavior of the perceptual systems, the perceiver
specialized for pick-up of information about becomes better attuned, with increases in
specific types of distal objects or events, such as experience, to the invariants in stimulation that
the information in speech that specifies the specify the defining characteristics of specific
configuration and movements of the vocal tract events, the persisting identity of particular
producing the signal (see Best, 1984, 1993, in objects, and the higher-order commonalities
press, a, b). Such specializations may be abstractly shared by similar events or by similar objects.
analogous to that of the human hands for grasping The transformational invariants of an event are
and manipulating objects, and the complementary those properties of the energy flow that remain
perceptual ability to detect the graspability and constant across the participation of different
manipulability of distal objects. Evidence for objects in that event. For example, the
primitive components of the latter abilities, and of transformational invariant of repetitive rotation
their responsiveness to the physical properties of about an axis specifies the same event of spinning
distal objects (size, distance, speed of movement) whether a top is spinning on a surface, an
is found quite early in development (e.g., von amusement park "anti-gravity" ride is spinning to
Hofsten, 1980). As for the pick-up of distal produce centrifugal force, or the wheels of a car
articulatory information in the speech signal, are rotating on their axles. The structural
Gibson summarized in general terms how and invariants of spherical shape and elastically
why this should be possible (see also Best, 1984, in deformable solid specify an identity relation-the
press a, b; Fowler, 1986, 1989, 1991): same baseball across the events of rolling,
throwing, bouncing, and juggling. Invariants can
"An articulated utterance is a source of a
also specify similarity relations among objects or
vibratory field in the air. The source is
events. The more abstract invariant of a convexly-
biologically "physical" and the vibration is
curved plane characterizes the primary similarity
acoustically "physical." The vibration is a
among the outer surface of an eyeglass lens, the
potential stimulus, becoming effective when a
dome of an enclosed sports arena, and the
listener is within range of the vibratory field. The
silhouette of an old Volkswagen "beetle." And
listener then perceives the articulation because
although the following do not reflect literally the
the invariants of vibration correspond to those of
same event, they involve abstractly similar
articulation. In this theory of speech perception,
curvilinear movement transformations: the
the units and parts of speech are present both in
slithery, winding progression of a snake, the
the mouth of the speaker and in the air between
sinewy movements of a traditional Thai dance,
the speaker and the listener. Phonemes are in the
and the wave-like motion of tall grass rippling in a
air. They can be considered physically real if the
breeze (for further discussion of structural and
higher-order invariants of sound waves are
transformational invariants, see Shaw, McIntyre,
admitted into the realm of physics." (1. 1. Gibson,
& Mace, 1974). Experience-dependent changes in
1966, p. 94)
attunement to such invariants occur through
The direct realist philosophy assumes that perceptual learning.
information from the world is a rich multimodal The ecological perspective has concerned itself
flow of temporally and spatially distributed primarily with general perceptual principles
energy patterns that are lawfully and rather than with linguistically specialized
systematically shaped by distal events and mechanisms. However, I believe it is eminently
objects. The systematic structure in this applicable to children's learning of the sound
information flow is picked up by perceptual pattern of their native language, and to the
systems-extracted, detected, discovered- concomitant effect of this learning on the
through active, physical exploration of the events, perception of non-native sounds and contrasts. If
surfaces and objects that shape the energy flow. we take an ecological view on the realm of
By shifting position and orientation with respect language, the spoken input available to the young
to the objects and the spatial layout, as well as by child is a flow of many utterances, occurring
moving and manipulating objects, the perceiver multimodally within a rich behavioral context
produces changes in the flow of stimulation that that extends over time and people. The flow of this
are systematically influenced by the exploratory linguistic and social stimulation, extending as it
actions in ways that provide rich, direct, veridical does over time and speakers, should reveal
regularities or invariants across utterances that the infant comes to recognize as the sound-organizing principles of the phonology of the language (e.g., Best, in press, a).

I have taken the ecological perspective to account for how experience with the ambient language comes to influence the infant's perception of non-native speech contrasts. To do so, I will apply this perspective to linguistic insights about the sound structure of languages, which should form the basis for the child's developing recognition of the relations between the phonetic properties of speech and the phonological organization of the grammar of his or her native language. For the purposes of this chapter, we are particularly interested in how ecological principles apply to perceptual learning, specifically with respect to infants' and young children's perception of the sound pattern of their native language. Therefore, we will turn next to examine in greater depth the ecological approach to perceptual learning.

The ecological perspective on perceptual learning

Two quotes exemplify the ecological viewpoint on perceptual learning, the first from James Gibson's (1979) book The ecological approach to visual perception, the second from Eleanor Gibson's address on "Perceptual development and the reduction of uncertainty" at the 18th International Congress on Psychology in Moscow.

"The perceiving of the world begins with the pickup of invariants.... [T]he theory of information pickup... needs to explain learning, that is, the improvement of perceiving with practice and education of attention.... The state of a perceptual system is altered when it is attuned to information of a certain sort. The system has become sensitized. Differences are noticed that were previously not noticed. Features become distinctive that were formerly vague." (J. J. Gibson, 1979, p. 254)

"Discrimination learning proceeds ... by discovering distinctive features of objects and invariants of events in stimulation.... The effective stimulus which active and educated perception picks out is a reduced stimulus. It is extracted, filtered out, whereas other stimulus information which has no utility for differentiation is ignored by the educated attention." (E. J. Gibson, 1966, pp. 10-15)

When a perceptual system becomes attuned to a particular type of information, it becomes altered by experience. The claim is that the attuned perceiver is more quickly and efficiently able to pick up from the flow of stimulation just that information to which the perceptual system has become sensitized, as opposed to, perhaps, simply increasing the speed of a cognitive search through mental space. This sensitization of the perceptual system entails detection of critical distinctions among objects or events that had previously gone unnoticed. What is suggested by perceptual learning, then, is an optimization and economization of pickup or extraction of critically distinctive properties. Perceptual learning is probably more readily apparent for detecting abstract, higher-order invariants (such as the curvilinear movement invariant described earlier) than for detecting the simple, lower-order invariants to which perceptual systems are innately tuned even very early in life (e.g., basic color categories: Bornstein, 1979).

These principles have been more completely drawn out by Eleanor Gibson in her numerous writings on perceptual learning (e.g., E. Gibson, 1963, 1966, 1969, 1977, 1988; E. Gibson & J. Gibson, 1972; J. Gibson & E. Gibson, 1955). As her opening quote indicates, perceptual learning leads to improved discrimination, but this does not mean simply the discrimination of smaller and finer stimulus differences, hence of always increasing numbers of individual stimuli. Instead, perceptual learning entails the discovery, for specific purposes, of the critically distinctive features of objects and invariants of events in stimulation. It involves the education of attention for most efficient detection of the most telling differences among objects and events that are of importance to the perceiver. As she has argued, the utility that critical distinguishing features and invariants of events have for the perceptual learner is that they reduce uncertainty among choices in a world that otherwise presents too much, rather than too little, information. Educated attention, i.e., a perceptual system that is attuned to certain types of information, picks up reduced stimulus information, which is selected, extracted, or filtered out from the larger flow specifically because of its ability to critically differentiate things that are of interest or usefulness to the perceiver. Other stimulus information that does not serve this purpose of utility is ignored, i.e., not picked up.

This account leaves open the possibility for re-education of perception, because the undetected information is still available in stimulation. Stimulus information that is irrelevant for well-
used distinctions, and therefore has been systematically ignored, could later prove important for other new distinctions. It is conceivable, perhaps even likely, that having first learned to economize information pickup by overlooking certain information as irrelevant (or by perceiving it as equivalent to some other pattern of information) may make it more difficult to re-learn to attend to it later than would be the case for a novice learning to attend to the same information for the first time. Ecological theory has not directly addressed these possibilities. However, they are relevant for understanding whether and to what extent second language learners may learn to detect non-native phonetic distinctions that are not utilized in their native language, and in what way this may be affected by varying degrees of experience with the native language.

Indeed, the Gibsons did not address speech perception in great detail in their primary accounts of the ecological approach to perceptual learning (cf. E. Gibson & J. Gibson, 1972), although Eleanor Gibson did address certain aspects of language in her research on reading development (e.g., E. Gibson, 1971). The ecological view on perceptual learning has primarily addressed the general issues of how perception is shaped by experience. Perceptual learning entails the discovery of invariants in stimulation that reveal the structural and functional properties of the source objects and events. Often, these invariants are hierarchically nested in complex events, so that higher-order invariants may depend on, or be derivatives of, lower-order invariants. Discovery of certain higher-order invariants may thus be possible only once the perceiver has learned which of the lower-order invariants are critical to the distinction and which are not. Perhaps, for some distinctions, there may even be several levels of lower invariants supporting the discovery of a higher-order invariant.

Spoken language provides an excellent example of the sort of complex organization in which higher-order invariants, such as those that specify syntactic principles, may not be detectable until the perceiver has learned to pick up certain distinctive information at lower levels, such as the critical differences in the phonetic patterns of similar-sounding but meaningfully different words. For the infant, then, learning the sound pattern of the native language is the quintessential task of perceptual learning, i.e., discovering the multiple levels of invariant principles by which the stimulus flow is patterned.

The ecological premise is that the complex, nested hierarchy of linguistic organization, including phonological patterning, exists in the infant's language environment. It is all there, that is, if we consider the available language stimulation to span the history of utterances the infant hears, along with the rich behavioral contexts in which those utterances occur. The flow of spoken utterances in context provides the infant a window on the patterning of the ambient language. This is the flow of stimulation from which infants must learn to recognize and abstract the invariants that specify all levels of linguistic structure. Of course, the infant is not initially able to detect or abstract from that flow the invariant properties specifying most of the levels of linguistic organization summarized above. In fact, the only level of the available information that the infant is likely to be able to detect initially is the surface phonetic information. And it is necessarily from among those phonetic details that the infant must learn to recognize the higher-order invariant patterns that specify words, syntax, morphology, and in particular, phonology.

Thus, the ecological view is that utterances provide a rich flow of information about dynamic speech events which extend over time, and that through perceptual learning the individual becomes attuned to various levels of invariant structure available in that flow. This view suggests a radical departure from the standard assumption of discrete, timeless features and calls instead for a model of phonetics and phonology in which the crucial dynamic attributes of events in the speech world are integral to the model. The ecological perspective has begun to offer alternative insights and evidence both about the phonetic details of speech production (Fowler, Rubin, Remez, & Turvey, 1980; Kelso, Saltzman, & Tuller, 1986; Saltzman & Munhall, 1989), and also about its phonological organization (Browman & Goldstein, 1986, 1989, 1990a, b, c, 1992a; Fowler, 1980; Goldstein & Browman, 1986). The latter work has offered an articulatory gestural model of phonology, which we will examine next as the basis for an ecological, perceptual learning account of language-specific effects on the perception of non-native phonetic contrasts. The following summary is based on the works of Browman and Goldstein cited above.

Gestural phonology

The tenets of gestural phonology are grounded in the spatiotemporal organization of articulatory gestures in speech, which are themselves
grounded in the biomechanical organization of the human vocal tract. Rather than assuming abstract and timeless phonetic features as the atoms or primitives from which phonological representations are built, the gestural model assumes that the phonological primitives are articulatory gestures, the coordinated actions of vocal tract articulators. The model organizes these gestural features within the framework of a hierarchical articulatory geometry based on the anatomical relations among the articulators involved in speech. The vocal tract is comprised of three relatively independent articulatory systems that are represented as separate nodes within the articulatory geometry: the glottal system (vocal cords), the nasal system (the velum, the valve that permits or prohibits air flow through the nasal cavity), and the oral system, which includes the lips and the tongue as separate subsystems. There is an additional subordinate level in the tongue subsystem: tongue tip versus tongue body, whose actions are differentiated by different intrinsic and extrinsic muscles of the tongue. This hierarchically organized set of articulators functions within the confines of the walls of the vocal tract, which is structured basically as a bent tube of varying diameter, optionally connected to a second side tube (nasal cavity) via the open velum. The coordinated actions of the articulators can cause constrictions at various locations (place of articulation) along the vocal tract (e.g., dental, alveolar, velar, etc.) (see Figure 1 for additional places of articulation). Each place can display several variations in degree of constriction, which determines the manner of the sound produced (complete closure for stop consonants, critical constriction for causing turbulent airflow in fricatives, narrow constriction for some vowels and for approximant consonants such as /w/ and /r/, wide opening for the velum in nasals and the glottis in voiceless sounds).

Articulatory geometry is compatible, in many respects, with the nonlinear or autosegmental approaches that have supplanted SPE phonology. Some important distinctions must be noted, however, between the two approaches. Specifically, gestural phonology posits phonological elements to be gestures defined by a set of dynamic equations describing the movement of articulators over space and time, rather than a specification of abstract, timeless phonetic features. To illustrate, the equation set for the syllable ma describes a velum opening gesture and lip closing gesture which begin simultaneously and reach their peaks synchronously to produce the /m/, and a slower, less extreme tongue body gesture to narrow the pharynx (upper throat) for the "ah" vowel, which begins synchronously with the other two gestures but peaks later and lasts longer.

Figure 1. Schematic lateral view of vocal tract, with major articulators labeled and the nasal cavity identified. Many of the common places of articulation, or locations of articulatory constrictions, are indicated in italics.

Thus, articulatory geometry is closely related to the anatomical structures and movement patterns of the vocal tract. This way, in the gestural model the phonological primitives and their physical instantiations derive from a single domain grounded in the spatiotemporal properties of real articulatory events. Because of this, phonological representations can specify the relative timing, or phasing, of one articulatory gesture relative to another. For example, the Canadian French versus continental French difference in vowel nasalization that was mentioned earlier (van Reenin, 1982) can be specified dynamically as a difference in the relative timing, or phasing, between the onset of velum lowering for nasalization and the peak of tongue movement for the vowel. This characterization departs critically from the phonetics-phonology relationship held by classic SPE phonology and by nonlinear phonologies, neither of which can phonologically represent the dialectal difference, even though the nasalization difference appears to be part of the language-specific grammar in the two dialects. This representational inability occurs for the latter two views because they posit that
phonetic and phonological information exists in two divergent, informationally incompatible domains, one physical (actual articulations) and the other only mental (underlying phonological representations).

In gestural phonology, the dynamical specifications of articulatory gestures describe change over time in particular vocal tract variables and their associated articulators (e.g., location and degree of a constriction by the tongue tip or tongue body somewhere along the vocal tract tube; opening of the nasal tube by movement of the velum). The model assumes that articulator motion is governed by dynamic principles of spring-like physical systems,6 in which the values of several parameters of the tract variable(s) are specified: mass, stiffness, damping, rest position, instantaneous position, acceleration and velocity. All tract variables are assumed to have a resting, or default, setting. The resting state is not, of course, specified as a gesture; gestures are active articulatory movements away from the resting state. A given gesture is a particular transformation of a tract variable (e.g., complete closure of the lips) that remains invariant across different contexts, speaking rates and styles, and speakers. There may also be variation in the exact articulators or coordinations among articulators that are used to achieve essentially identical gestural goals. For example, bilabial closure may be achieved by moving only the lips and keeping the jaw angle constant, or by keeping the lips immobile and changing only the jaw position to bring the lips closer together (see Abbs & Gracco, 1984). Therefore, the dynamical description of a particular gesture defines a family of articulatory trajectories that all achieve the same gestural target of a particular degree of constriction at a particular location along the vocal tract tube.

Some phonological elements are composed of only a single gesture, whereas others involve a specific pattern of coordination between two or more individual gestures. Coordinations among two or more gestures are called gestural constellations. Let us illustrate the difference with the /p/-/b/ contrast, which in classic phonological description share the phonetic features [+anterior], [-continuant] and [-sonorant], and are distinguished only on the feature [+/- voice]. But in gestural description, the voiced stop /b/ in gabbing involves only a single bilabial closure gesture (complete closure and release of constriction at the lips). The state of the glottis, or opening between the vocal folds, is maintained in the default adducted position (critical constriction rather than tightly closed) and produces voicing throughout the word. In other words, there is no active glottal gesture just for the /b/. In contrast, the cognate voiceless stop /p/ in gapping involves two gestures which must be correctly phased relative to each other. Specifically, the bilabial closure must co-occur with an active glottal opening gesture, which prevents voicing and instead permits turbulent airflow (i.e., aspiration noise) through the vocal folds. The peak opening of the glottis coincides with release of the bilabial constriction; the glottis returns to its default state (vocal folds together for voicing) after bilabial release. The /p/ example illustrates a gestural constellation that corresponds to the segmental level of traditional phonology. But gestural constellations may also describe articulatory coordination at the level of syllables, words, prosodic phrases, etc. Analogous to nonlinear phonological approaches, these nonsegmental levels of linguistic organization among gestures are specified for different articulatory tiers, such as those representing syllable structure and stress units. However, neither gestures nor constellations bear a one-to-one relationship either to segments or to classic phonetic features.

Because gestures are defined by a dynamical pattern of articulatory movements, each gesture has both an intrinsic spatial aspect and an intrinsic temporal aspect. This grounding in the physical properties of events over time departs qualitatively from the classic and the nonlinear views of static, dimensionless phonetic features. In gestural phonology, the phasing principles among the gestures in a given utterance are represented in both their spatial and temporal relations in a gestural score. To illustrate, a schematic gestural score for the word mob ([mab]) is shown in Figure 2. The abscissa represents the time line of the utterance, the ordinate represents the tiers in the articulatory geometry that are needed to display the critical gestures involved in that particular word. The rectangular boxes represent the temporal extent during which given gestures are active for their corresponding articulatory tiers, or articulatory sets (e.g., tongue tip, tongue body, etc.). Inside each activation interval box, the degree of constriction achieved in the gesture and its specific location along the vocal tract are denoted. An American English utterance of mob begins as was described earlier for the syllable ma. The pharyngeal gesture for the vowel ("ah") extends into the final bilabial closure that corresponds to /b/.
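
The gestural score and the spring-like gesture dynamics described for mob can be sketched computationally. The following Python sketch is illustrative only and is not Browman and Goldstein's task-dynamic implementation. The names (Gesture, MOB, REST, simulate, overlaps), the 0-to-1 aperture scale, all timings in milliseconds, and the stiffness values are hypothetical. Critical damping is likewise an assumption, since the text says only that mass, stiffness, and damping are among the specified parameters. Each active gesture pulls its tract variable toward a target; when no gesture is active, the variable relaxes to its resting setting.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    tier: str         # articulatory tier, e.g. "lips"
    label: str        # constriction degree and location, as in a score box
    onset_ms: int     # start of activation interval (hypothetical timing)
    offset_ms: int    # end of activation interval (hypothetical timing)
    target: float     # tract-variable goal while active (made-up 0..1 scale)
    stiffness: float  # spring stiffness; lower values give slower movement


# Resting (default) settings on a made-up scale: 1.0 = open, 0.0 = closed.
# The glottis rests in its adducted, voicing position (assumed value 0.0).
REST = {"velum": 0.0, "tongue body": 1.0, "lips": 1.0, "glottis": 0.0}

# Hypothetical gestural score for "mob": velum opening and lip closure start
# together for /m/; the slower vowel gesture starts with them, lasts longer,
# and extends into the final lip closure for /b/. No glottal gesture occurs.
MOB = [
    Gesture("velum", "wide", 0, 150, 1.0, 2000.0),
    Gesture("lips", "closed labial", 0, 150, 0.0, 2000.0),
    Gesture("tongue body", "narrow pharyngeal", 0, 450, 0.3, 500.0),
    Gesture("lips", "closed labial", 350, 500, 0.0, 2000.0),
]


def overlaps(a, b):
    """True if the activation intervals of two gestures overlap in time."""
    return a.onset_ms < b.offset_ms and b.onset_ms < a.offset_ms


def simulate(score, tier, duration_ms=600, mass=1.0, rest_stiffness=2000.0):
    """Trajectory of one tract variable, sampled once per millisecond.

    Assumes a critically damped mass-spring system, integrated with a
    semi-implicit Euler step of 1 ms.
    """
    dt = 0.001  # seconds per step
    x, v = REST[tier], 0.0
    trajectory = []
    for t in range(duration_ms):
        active = [g for g in score
                  if g.tier == tier and g.onset_ms <= t < g.offset_ms]
        goal = active[0].target if active else REST[tier]
        k = active[0].stiffness if active else rest_stiffness
        b = 2.0 * math.sqrt(k * mass)        # critical damping (assumed)
        a = (-k * (x - goal) - b * v) / mass
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory
```

Under these assumptions, simulate(MOB, "lips") yields a lip aperture that closes for /m/, reopens during the vowel, and closes again for /b/, while the glottis tier, having no gesture, stays at its default setting throughout.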
Figure 2. Schematic gestural score for the word <mob> [mab] using box notation to indicate activation intervals for gestures and phasing among gestures. [Tiers shown: velum (wide), tongue body (narrow pharyngeal), tongue tip, lips (closed labial), glottis.]

Thus far, the gestural phonology approach has been applied in detail primarily to American English alone, but it can be extended (and in some cases has been) to suggest gestural characterizations of certain similarities and differences between the gestural constellations for some non-native phonetic contrasts and contrasts found in the English phonological system. A few cross-language comparisons will be offered here as illustrations. However, we must bear in mind an important caveat from Browman and Goldstein (1992b), that any proposed gestural analysis is obviously incomplete and speculative in the absence of hard data on the actual gestural processes involved in the utterances being considered. The comparisons here are based on currently available phonetic, acoustic, and physiological descriptions for the phonological contrasts involved. But the schematic gestural scores offered are necessarily speculative because of the incompleteness of actual gestural evidence, especially with respect to temporal extent and precise phasing of gestures.

Figure 3 shows the Hindi dental-retroflex contrast [d̪a]-[ɖa] and English [da], which is gesturally most similar to both Hindi patterns. The schematized Hindi gestural scores and the English one are essentially the same except that the Hindi constriction locations are just anterior and just posterior, respectively, to the English alveolar location. Recall also that English does have context-conditioned dental and retroflex allophones of /d/, but not in the context of an isolated [da].

Figure 3. Schematic gestural scores for the Hindi dental-retroflex /d̪a/-/ɖa/ contrast (top panels) and English /da/ (bottom panel).

Figure 4. Schematic gestural scores for the English and Zulu voiceless velar stop /kʰa/ (left) and for the Zulu ejective velar stop /k'a/ (right).

Schematic gestural scores for the Zulu aspirated versus ejective velar stops [kʰa]-[k'a] are compared to the correspondingly most similar English gestural constellation, that for [kʰa], in Figure 4. In this case, the Zulu aspirated token is virtually identical to the English one, whereas the ejective token deviates from it in the constriction degree of the glottal gesture, which is closed rather than wide, producing silence rather than aspiration prior to the onset of voicing for the vowel. A different type of Zulu contrast is between voiced and voiceless lateral fricatives. These gestural constellations are produced with essentially the same alveolar tongue tip closure and uvular tongue body narrowing as in English /l/. They differ, however, in employing a smaller constriction degree along the two sides of the tongue (against the upper lateral teeth) than for /l/. Instead, the lateral constriction is critical, producing airflow turbulence analogous to that at
the tongue tip for fricatives such as English /z/ or "zh" or voiced "th" (in that) versus /s/ or "sh" or voiceless "th" (in think). Thus, the Zulu lateral fricatives gesturally resemble both the liquid /l/ and the voiced-voiceless fricative distinctions of English that involve tongue tip constrictions at anterior locations. Larger English gestural constellations (multi-segmental) that may approximate the patterns found in the lateral fricatives include /zl/-/sl/ (paisley, slaw), "zhl"-"shl" (rougeless, Ashley), or voiced vs. voiceless "thl" (blithely, breathless). Finally, the Zulu alveolar versus lateral click consonants incorporate gestural constellations that are quite dissimilar from any in English. Both have full closures at two locations, alveolar (tongue tip) and velar (tongue body). A vacuum is created in the intervening zone by drawing the tip or one side of the tongue downward until the suction is released. In syllabic context, this is followed immediately by release of the velar closure. The double closure plus suction release does not closely resemble any English gestural constellation.

Gestural phonology can also account parsimoniously for a wide variety of phonological phenomena within its articulatory framework, using gestural primitives that have intrinsic temporal and spatial dimensions, unlike static, dimensionless phonetic features. In most cases these gestural accounts are backed by speech production data. For example, minimal contrasts are two gestural constellations that are identical except for a critical difference in constriction location (e.g., /b/ vs. /d/) or constriction degree (e.g., /b/ vs. /w/) in the oral tier of the articulatory geometry, presence/absence of a gesture of the velum (e.g., /ma/ vs. /b/) or glottis (/p/ vs. /b/), etc. The tube geometry of the vocal tract also appears to account straightforwardly for certain natural classes, i.e., groupings of different types of phonetic categories that nonetheless participate together in widespread phonological processes. To illustrate, nasals, liquids (/r/, /l/) and vowels form the class defined traditionally by the [+sonorant] feature, which has been difficult to define objectively. In gestural phonology, these phonetic types share the simple gestural similarity that they all maintain one of the two vocal tract pathways (oral, nasal) wide open for outward airflow (Browman & Goldstein, 1989). Many allophonic variants can be explained as the overlapping of adjacent gestures, or coarticulation, as in the dental allophone for /n/ in ten themes, which results from overlapping of the wide velum for /n/ and the dental location of the tongue tip for "th" (Browman & Goldstein, 1989). Analogously, gestural overlap can account for certain cases of phonological assimilation, as when the /n/ in seven plus assimilates to /m/ in casual speech. The feature-based rule is that the labial feature of the /p/ spreads forward to the /n/. The gestural explanation is that the bilabial closure gesture of the /p/ overlaps the velum opening gesture for the /n/, thus "hiding" the aerodynamic evidence of the alveolar tongue gesture for /n/ and producing the bilabial nasal /m/ (Browman & Goldstein, 1989). Cases of phonological deletion can be handled likewise from a gestural perspective. For example, feature-based approaches posit a deletion rule whereby the final /t/ of the first word in perfect memory gets deleted, but in gestural terms it is simply the case that the alveolar /t/ gesture gets hidden by overlap with the /m/ of memory (Browman & Goldstein, 1989, 1990c). Gestural overlap can even account for the insertion of an additional segment between other segments, called epenthesis. As an illustration, something is often pronounced in American English with a /p/ between the /m/ and the "th," leading feature-based accounts to invoke an insertion rule. But the /p/ arises gesturally from the overlap of the bilabial closure gesture for the /m/ and the glottal opening gesture for the following "th" (Browman & Goldstein, 1990c). The phenomenon of metathesis, in which the sequential order of segments becomes reversed by some phonological process, has been particularly vexing for generating feature-based rules that are powerful enough to describe the phenomenon but not so overly powerful as to generate many non-occurring reversals. Such ordering reversals often occur in speech errors, as when the rapid production of Bob flew by Bligh Bay comes out as Blob faa by Bligh Bay. A gestural analysis of tongue movement for the /l/s in these utterances reveals evidence of the temporal "sliding" or overlap of the tongue tip constriction gesture with those preceding it in the represented sequence, causing both overt and covert speech errors (Browman & Goldstein, 1992a).

The gestural phonology model has received some criticism from nonlinear phonologists, as well as some praise. On the positive side, some phonologists acknowledge that placing articulatory constraints on phonological processes is advantageous (see also Archangeli, 1988; Archangeli & Pulleyblank, in press), especially with respect to better delineation of the relation between phonology and phonetics (e.g., Clements, 1992; Pierrehumbert & Pierrehumbert, 1990). By and large, the criticisms reflect two underlying observations:
1) gestural phonology rejects static, timeless phonological features that differ in kind from physical, phonetic realizations; 2) it does not invoke abstract cognitive rules about phonological representations (e.g., Pierrehumbert, 1990; Pierrehumbert & Pierrehumbert, 1990; Steriade, 1990). In other words, gestural phonology rejects two central tenets held by both SPE and nonlinear phonologies. These criticisms also suggest some partial misunderstanding of gestural phonology. The model does include discrete, or categorical, elements at the phonological level of the task dynamics used to generate gestures (Browman & Goldstein, 1992b). Moreover, it does distinguish between phonological and phonetic levels of representation, but views them as macroscopic versus microscopic descriptions of the same dynamic, physical domain of speech events (Browman & Goldstein, 1990a; see also Ohala, 1990). This brings us back to the central claims of the ecological approach, which assumes that perception must be grounded in physical reality. On that note, let us return to the issue of how the physical properties of native speech are perceived by the adult and learned by the child.

The ecological approach to perceptual learning of speech

All of the phonological approaches discussed, including gestural phonology, have taken their task to be the generation of a physical phonetic output from the more abstract phonological component of the grammar. But we began with, and now return to, the opposite process: how a perceiver, particularly a young learner, gets from the phonetic surface to the phonological structure via perception. Specifically, the chapter began with the question of how experience with one's native language comes to affect one's perception of non-native speech sounds and contrasts from unfamiliar languages. Phonology has provided little guidance here. Although Chomsky and Halle stated in SPE that a listener who knows the language being spoken will hear the phonetic shapes predicted by the phonology, it is unclear how they would expect the phonology to handle discrepancies between the phonetic features in a non-native sound and the feature matrices defined by the phonological system of the listener's language. Indeed, how would it even handle perception of corrupted native speech (e.g., foreign-accented or disordered speech), or the phonetic patterns of an unfamiliar dialect? Nonlinear phonological approaches don't help much, as they also have devoted minimal attention to theoretical issues in perception. And gestural phonology, the youngest of the approaches, has also focused the majority of effort on production. Moreover, none of these phonological approaches has given any depth of consideration to how infants and young children perceptually learn about the phonological structure of their native language.

To address these issues, we return to the direct realist view of speech perception based on the Gibsons' ecological theory of perception. This view assumes that listeners perceive information in speech about the distal articulatory gestures that shaped the phonetic patterns (Best, 1984, 1993, in press a, b; Fowler, 1986, 1989, 1991). Because it assumes that phonological processes derive from the same physical, dynamic domain as the phonetic details of actual utterances, gestural phonology lends itself to an ecological perspective on cross-language influences in perception, as well as on how the infant learns the phonological properties of the native language. Articulatory gestures would provide a common metric for both perception and production of speech. The interrelation of perception and production is central to both speech imitation and language acquisition.

The direct realist view posits that perceivers recover information from speech, and from other sound-producing events, about the distal structures and events that produced the sounds. This view assumes that information about articulatory gestures is directly perceived in speech, as opposed to being the end-product of cognitive processing of the raw acoustic input. The speech signal is shaped by the structure and movements of the vocal tract according to physical laws, as indicated by the earlier quote from James Gibson. Thus, evidence about articulatory gestures is available to perceivers as structured information about the speech events that produced the signal. This view is not the same as that of the well-known motor theory of speech perception (e.g., Liberman & Mattingly, 1985), which posits that perceivers refer to the motor control of their own speech in order to perceive the phonetic structure of speech input. The ecological claim is that listeners perceive the speaker's articulatory gestures as such, without referring to their own articulatory commands and, indeed, regardless of whether they can themselves produce similar signals.

That listeners perceive gestural information in speech is supported by cross-modal speech perception research (see also Best, 1993; Studdert-Kennedy, 1993). McGurk (McGurk & MacDonald, 1976) found that when presented with audiovisual
syllables in which the synchronized consonants in the two modalities are from different categories, listeners perceive a unified phonetic pattern that is compatible with both modalities, rather than noticing the discrepancy. That is, the two modalities apparently provide evidence about a common, underlying dimension such as articulatory gestural patterns. An alternative argument that the perceptual link between visual and auditory information is learned by association is illogical in the general case, according to the Gibsons' arguments, and has been empirically refuted for the speech perception case by two recent reports. Cross-modal integration does occur for synchronized but discrepant consonants presented auditorily and tactually (blindfolded subjects manually felt the movements of an experimenter's silent lip movements, synchronized with audio recordings), although they had never had such tactile-auditory experience with speech. Yet there was no cross-modal integration for synchronized audio and written syllables, in the face of the subjects' extensive associative experience with the relation between text and speech (Fowler & Dekle, 1991). In another study, young English-learning infants heard repetitive audio presentations of the French lip-rounded vowel /y/, which does not occur in English, synchronous with side-by-side silent videos of English lip-rounded long "oo" and unrounded "ee" (Walton & Bower, 1993). The infants preferentially fixated on the "oo" video when hearing /y/. Given their lack of prior experience with /y/, this could not have been a learned association, but rather suggests detection of the articulatory commonality of lip-rounding across modalities.

More in-depth treatment of the rationale and evidence for the general direct realist approach to speech perception can be found in other reports (e.g., Best, 1984, 1993, in press a, b; Fowler, 1986, 1989, 1991; Fowler, Best, & McRoberts, 1990; Verbrugge, Rakerd, Fitch, Tuller, & Fowler, 1984). Our concern here is specifically with how infants' and adults' perception may be differently affected by experience with the native language, particularly by its phonological structure. What we perceive in both native and non-native speech appears to depend on what we've learned about the native phonology through experience with that language.

Language-specific phonetic-gestural properties and perceptual learning

Recall the basic tenets of perceptual learning according to the ecological perspective: that perceptual systems become attuned by experience to particular types of information; that this involves optimization in the pickup of relevant information; that it entails the discovery of critically distinguishing properties of distal structures and events; and that this is accomplished via perceivers' active search for invariants in the flow of stimulation that most economically specify those crucial properties. Educated attention minimizes uncertainty about objects and events in the world, by selecting or extracting reduced information specifically for its ability to critically differentiate things of interest or usefulness to the perceiver. Earlier it was argued that the identity of objects and events is specified by structural and transformational invariants available in the flow of stimulation over time and space. Moreover, recognition of similarities and differences among things often depends on abstraction of higher-order invariants which depend on prior detection of other, lower-order invariants. As Eleanor Gibson remarked, the critical invariants are generally relational in nature, rather than isolated, independent attributes.

To consider how higher-order relational invariants might be discovered in speech through perceptual learning, I will turn briefly to some central concepts developed in work on an ecological approach to the formation of complex coordinated skills and behaviors (e.g., Kugler, Kelso, & Turvey, 1982; Saltzman & Kelso, 1987; Turvey, 1980, 1990) including speech (Saltzman & Munhall, 1989). The goal of coordination is to maximize the adaptability and flexibility of achieving some goal of action by minimizing the number of separate dimensions that must be directly controlled. As Turvey (e.g., 1980, 1990) and others have argued, this is accomplished by forming task-specific synergies among muscle groups, or coordinative structures. To understand this concept, consider an example commonly cited by ecological researchers: the task of a puppeteer and the way that the construction of her marionette simplifies the control of its movements. By linking the puppet's limbs with strings to a controller bar, the puppeteer obviates the need to move each joint of each limb separately, instead producing coordinated movements among multiple limbs by a single movement of the controller. By this means, the many degrees of freedom controlling the joints of the separate limbs have become joined together into a coordinative structure with fewer degrees that must be directly controlled.

Research on locomotion indicates that coordinative structures account for the coordination of flexion and extension of each leg joint in proper sequence during the swing of each leg, the alternation between the legs, and the postural adjustments required throughout for maintenance of balance. Coordinative structures show task-specific flexibility in that temporary perturbations result in automatic, immediate compensatory adjustments among the coordinated elements so that the general goal is preserved without requiring numerous command decisions about specific elements.

Saltzman and Munhall (1989) provide logical and empirical evidence that in speech coordinative structures accomplish the gestural goal of forming a constriction of a particular degree at a particular vocal tract location, by harnessing together the specific articulators in ways that automatically compensate for perturbations and contextual variations. The language-specific gestural phasing patterns of Browman and Goldstein's gestural constellations are examples of higher-order coordinative structures in speech. Coordinative structures in motor control can form and re-form, and operate as emergent properties of self-organizing systems (see Madore & Freeman, 1987; Prigogine, 1980; Prigogine & Stengers, 1984; Schöner & Kelso, 1988; Turvey, 1980, 1990). Emergent properties of self-organizing systems, including their sensitivity to initial conditions, have been proposed as the basis for the evolution of maximal dispersion among the elements of language-specific phonological inventories (Lindblom, 1992; Lindblom, Krull, & Stark, 1993; Lindblom, MacNeilage, & Studdert-Kennedy, 1983), as well as for the ontogeny of phonological organization in the child (Mohanan, 1992; Studdert-Kennedy, 1989). The latter proposals point to the importance of viewing the native phonology as an organized system when considering how language-specific experience may affect perception of phonetic patterns that fall outside the native phonological system.

Insights about coordinative structures and self-organizing processes, and about the importance of minimizing the degrees of freedom that must be separately controlled, will serve as useful heuristics for thinking about perceptual learning of phonetic and phonological structure in native speech. Indeed, they are crucial to an ecological approach to the issue, given the direct realist assumption that speech perception entails the pickup of information about the distal articulatory events that produced the signal. The ecological approach assumes that perceivers actively explore the rich flow of multimodal information in spoken utterances for invariant patterns that are of interest or utility to them. Educated perception should therefore actively seek and extract critical features of the coordinative structures responsible for the gestural organization of native speech. These coordinative structures should include language-specific articulatory gestures and constellations of phasing among gestures at all levels in the language, from traditional segments to syllables, words, prosodic phrases, etc. The information detected for the language-specific coordinative structures would be higher-order invariants, consistent with the principle that an attuned perceptual system optimizes information pickup by extracting a reduced stimulus, one that minimizes the degrees of freedom that describe the events producing the flow of stimulation. Analogous to the way coordinative structures combine articulators to produce gestural events, detection of higher-order invariants would automatically account for contextual variations such as speaking rate and style, allophonic variation due to phonetic context, speaker differences, and so on. Such invariants allow the perceiver to "hear through" lower-order variations that are irrelevant to phonetic coordinative structures in native speech. To illustrate, take the case of a man saying Bob normally vs. while clenching a pipe in his teeth. Bilabial closure for /b/ involves simultaneous jaw and lip narrowing movements, while the "ah" vowel involves jaw opening along with tongue body movement for pharyngeal narrowing. When the pipe is clenched, however, the jaws are held in a fixed, nearly-closed position. As a result, the speaker must accomplish the bilabial closure solely with the lips, and the vowel gesture solely with the tongue. The lower-order articulatory invariants of specific jaw, lip and tongue positions at specific times would thus differ between the two utterances, which together permit an attentive listener to hear whether the speaker's teeth are clenched. But the higher-order phonological invariant in both utterances is that bilabial closure occurs at both ends of the utterance, and a pharyngeal narrowing occurs between the two closures. Thus, the word Bob is perceived in both cases (i.e., the listener "hears through" the lower-order differences to detect the phonological structure). The higher-order description provides "reduced" information, relative to the lower-order one, by capturing fewer individual degrees of freedom.

The perception of non-native speech sounds by the native-language-educated attention of mature
listeners would certainly be influenced by the perceiver's seeking of familiar higher-order invariants. In other words, the flip side of the efficiency of extracting native higher-order invariants may be an increase in difficulty of essentially "going back down a notch" to pick up the lower-order, and therefore more numerous, gestural details in unfamiliar non-native categories and contrasts which are irrelevant to critical distinctions among native gestural constellations (for further discussion of implications for second-language learning, see Best, in press b; cf. Flege, in press).

Although language-specific higher-order invariants are present in native speech, reflecting the coordinative structure among the distal articulatory events that produced it, most or all of these are initially beyond the perceptual reach of infants. They must still discover how the lower-order invariants of the simple articulatory components of gestures, which they are able to detect from early on, are harnessed into higher-order coordinative structures or gestural constellations by native speakers. Perceptual learning of the critical relational properties of higher-order structural and transformational invariants in native speech should thus entail a progressive reduction in the quantity of stimulus detail that must be detected, analogous to the reduction in directly-controlled degrees of freedom that results from the formation of coordinative structures in motor skill acquisition (or coordinated control of marionette limbs). This occurs because infants actively explore utterances to discover the optimal sets of gestural invariants that specify the native language structures that are interesting and useful to them. The latter, of course, continue to change as the infant develops, the discovery of lower-order invariants permitting the further discovery of higher-order ones.

By this ecological account, then, to learn to perceive the sound pattern of the native language, i.e., its phonological structure, is to discover the critical invariants specifying the various nested levels of gestural constellations in native speech. Learning to detect the crucial higher-order invariants means, of course, that there will be developmental change in the perception of native speech categories and contrasts. But given the presumed ability to detect lower-order articulatory invariants early on, developmental change in the perception of native patterns may be apparent mainly as increased efficiency in extraction of critical invariants. This increased efficiency may foster the infant's emerging ability to recognize words (sound-meaning relations) by the third quarter of the first year. That is, the infant should more easily and rapidly recognize the crucial gestural properties that define a given word irrespective of the irrelevant variation in its specific details when it occurs in different speech contexts, is produced by different speakers, etc. But perceptual learning of native gestural constellations also carries implications for developmental change in perception of non-native phonetic patterns during the same period. Developmental changes in perception of non-native sounds should be, and are, more dramatic because when the infant begins to discover language-specific invariants in native speech, he/she will pick them up in native speech but will often be unable to find those familiar invariants in non-native utterances.

We turn now to the Perceptual Assimilation Model (PAM), which I developed to account for the developmentally-changing effect of experience with a particular language on the perception of non-native phonetic contrasts (Best, 1993, in press a, b; Best, McRoberts, & Sithole, 1988; Best & Strange, 1992). I began developing this model several years ago in an attempt to provide a coherent theoretical account for a number of observations in the literature on adult cross-language speech perception and on developmental changes in infant speech perception. Specifically, as indicated at the beginning of the chapter, adults often have difficulty discriminating non-native phonetic contrasts, while young infants have no such difficulty. Before the end of the first year, however, infants also begin to display difficulties discriminating non-native contrasts. However, no existing theoretical treatment offered a single, comprehensive explanation for 1) why, exactly, language-specific effects might occur in either adults or infants, 2) whether and why the effects might differ between adults and older infants, and 3) what the effects might suggest about the influence of phonological knowledge on perception. Certain complexities in reported adult findings would also have to be accounted for: discrimination levels appear to vary among different types of non-native contrasts, perception of non-native contrasts can be improved somewhat through perceptual training or through second language learning but this also depends on the type of contrast involved, and discrimination of non-native contrasts can be strongly affected by various task manipulations (the findings are reviewed and discussed in greater detail in Best, 1993, in press a, b).

Based on the considerations laid out in the preceding portion of this chapter, I used the ecological theory of perception as the foundation for developing a coherent theoretical account of the observations on cross-language speech perception in adults and infants. Thus, PAM is based on the ecologically-motivated assumption that efficient detection of native gestural patterns in speech may guide and constrain listeners' pickup of information in non-native phonetic categories and contrasts. This model is unique in several respects. First, it follows an ecological line of reasoning about perceptual learning rather than relying on innate linguistic abilities, information processing concepts, or cognitive development. Second, it attempts to provide a unified account for both adult cross-language perception findings and developmental changes in infancy. Third, it is the first to provide a detailed, coherent basis for predicting which non-native contrasts should be difficult to discriminate and which should be easy, and why. To the extent that PAM is compelling and is able to coherently account for the phenomena of cross-language speech perception in adults and infants, it obligates us to give serious consideration to the ecological approach.

We will turn next to an overview of how PAM accounts for the perception of non-native phonetic patterns by adults. For readers who are familiar with PAM, I should point out that there are several new features, by comparison with earlier versions of the model (i.e., Best, McRoberts, & Sithole, 1988; Best, in press a). Specifically, the relation between assimilation of non-native segments and discrimination of non-native contrasts has been clarified, additional discrimination types are now recognized and described, and the developmental aspects of the model are more fully delineated.

Perceptual assimilation model

The basic premise of the Perceptual Assimilation Model (PAM) is that adults actively seek higher-order invariants in speech which specify familiar gestural constellations, whether confronted with native or non-native utterances. Therefore, what they will perceive in non-native speech, at least initially when they have had little or no linguistic experience with the language involved, are the similarities and dissimilarities between the non-native gestural patterns and the familiar gestural constellations of their native language's phonological system (for more traditional accounts of the related phenomena of code-switching and loan-word phonology, see Elman, Diehl, & Buchwald, 1977; Silverman, 1992). For non-native phonetic patterns whose gestural organization is reasonably similar to the gestural invariant for one or more native phonetic categories, the adult listener is likely to detect native gestural invariants, and the non-native sound will be perceptually assimilated to the most similar native category(s). At the same time, however, listeners should also detect certain discrepancies between non-native phonetic patterns and native gestural constellations. After all, they are quite sensitive at detecting foreign-accented utterances of their native language (Flege, 1984; Flege & Fletcher, 1992) and non-native dialect accents. Note that these predictions are quite open to the possibility of individual differences among listeners regarding which invariants and discrepancies are detected, and how readily.7 This is because non-native gestural constellations are not, of course, exactly the same as the native constellations but only resemble them more or less, i.e., they display similarity relations rather than identity relations. The resemblances are generally only partial; indeed, a given non-native gestural pattern may resemble more than one native constellation. Perception of the cross-language similarities would thus ride on selective attention, which is dependent on the listener's history of perceptual learning with the native language (for example, the particular invariants one learns could vary with the style and breadth of native utterances with which one has been engaged), as well as with other languages or other dialects of the native language (e.g., Chambers, 1992).

For consideration of the possible ways in which listeners may perceive non-native phonetic patterns, it is useful to conceptualize the native phonetic domain as the range of vocal tract sounds that are globally speechlike in their gestural properties, vis-à-vis the types of gestures and constellations employed in the native inventory of phonetic categories (for further development of this concept, see Best, in press b). Outside of this domain, in non-phonetic space, are vocal tract-generated sounds such as coughs, chokes, laughs, whistles, razzes ("raspberries"), tongue clucking, squeals, etc. The latter three, and other non-speechlike vocalizations, occur in infant babbling and sound play. However, many infant vocalizations seem at least globally speechlike to adults, some being quite similar to native categories (as in /bobo/ or /didi/) whereas others sound foreign, not falling clearly in any particular native categories (e.g., for an English speaker, the latter might include guttural sounds, tongue
trills, etc.) (Oller, 1980; Oller & Lynch, 1992; Stark, 1980).

Analogously, there are three broad ways in which a non-native phonetic segment may be perceived with respect to the native phonetic domain (see Table 1). First, the perceiver may detect some resemblance to the gestural invariant of a native category (or perhaps more than one), in which case the non-native sound is perceptually assimilated to the native category, i.e., is categorizable. In cases of assimilation to a native category, the non-native segment may be virtually identical to the native gestural constellation, such that no cross-language discrepancy is perceived. Alternatively, the non-native segment may be somewhat discrepant but still sufficiently similar to be perceived as a good or acceptable exemplar of the native category. Or it may be even more obviously discrepant and thus be perceived as a poor exemplar of the category. Second, the non-native segment may be perceived as globally speechlike, but its gestural organization may not resemble any particular category in the native inventory very clearly. In this case, it will be perceived as speechlike but will not be assimilated to a specific native category. Rather, it will fall in an unfamiliar area of the phonetic domain and be an uncategorizable speech sound, as are the foreign-sounding elements in infant babbling. Third, the non-native segment may fall entirely outside the gestural range of the native phonetic domain and thus fail to be assimilated as speech, falling instead in non-phonetic space. These segments are non-assimilable as speech, and so will be perceived as nonspeech events, e.g., as nonspeech mouth sounds, snaps, clicks, etc.

Table 1. Perceptual assimilation of non-native phonetic segments.

1. assimilated to a native phonetic category
   a. identical to native gestural invariant: native sound
   b. reasonably similar to native invariant: acceptable exemplar of native category
   c. somewhat similar to native invariant, but noticeable discrepancies: deviant exemplar of native category
2. falls in unfamiliar region of native phonetic domain, outside any native categories: unclassifiable speech sound
3. falls in non-phonetic space, beyond the boundaries of the native phonetic domain: nonspeech sound

However, the assimilation of individual non-native segments with respect to categories in the native inventory only touches the surface of the phonological component of the listener's language-specific grammar. Phonology encompasses the systematic functional relations among phonetic forms within a language, including distinctive segmental contrasts, allophonic alternations, phonotactic constraints, and other phonological processes (e.g., Jakobson & Halle, 1957; Silverman, 1992). From the ecological perspective on perceptual learning, the invariants that determine category membership differ qualitatively from the higher-order relational invariants which capture the critical differences that define the systematic relationships among categories. Thus, perceiving category membership can be more basic than recognizing critically distinctive relationships between categories. That is, one can recognize a particular instance of /b/ as an exemplar of the /b/ category because it has a complete bilabial closure and concurrent glottal vibration, without necessarily grasping that the critical difference from /d/ is constriction location.

For category membership, the perceiver may begin by extracting a set of lower-order properties of category members. But critical comparisons between categories depend on the abstraction of higher-order invariants that conjointly acknowledge the similarities that make comparison possible and capture the differences which crucially set the categories apart with respect to some purpose, such as a phonological contrast that serves to differentiate word meanings (J. Gibson, 1979). A critical contrast between events is characterized by distinctive features. Distinctive features do not merely list the lower-order properties of the individual classes, but rather they capture the relations between classes which remain invariant over contexts and non-identity-changing transformations, and which thereby define the uniqueness of each class with respect to the other (E. Gibson, 1963). The distinctive higher-order invariants that define phonetic contrasts indicate mere 'otherness' and cannot be heard independently of a speech segment (E. Gibson & J. Gibson, 1972), e.g., location of constriction in the example above. Thus, they are more economical than category-defining properties, and optimize information pickup by an experience-attuned perceiver.

For these reasons, the influence of the systematic functional relations within the native phonology should be more readily apparent in perceptual comparisons between contrasting non-native categories than in a perceptual response to a single
non-native category. As summarized in Table 2, PAM predicts that listeners will easily discriminate between non-native categories when they can detect in those sounds an invariant that specifies a critical difference, or phonological contrast, between gestural constellations in the native language (referred to as a Two-Category assimilation type, or TC). They should discriminate moderately well to very well between a non-native category for which they detect strong similarity to a given native gestural constellation and another non-native category for which they detect less similarity (or greater discrepancy) to the same native category (Category Goodness difference, or CG assimilation type), or versus one for which they cannot detect clear similarity to any single native constellation (Uncategorized vs. Categorized assimilation type, or UC). When the non-native categories both bear only a global resemblance to the gestural constellations of (native) speech but do not assimilate clearly into any particular native phonetic category(s), they will both be assimilated as uncategorizable speech sounds (both Uncategorizable, or UU), and will be moderately to fairly difficult to discriminate, depending on whether they bear any remote similarity to any native category(s) and the extent to which any such similarities overlap between the two non-native sounds. Discrimination should also be very difficult when both members of the non-native contrast are perceived to fit within a gestural constellation for a single native category equally well (Single Category assimilation type, or SC). The SC case and the CG case actually fall at different points along a single dimension, in that both involve non-native contrasts whose members are assimilated to a single native category. Thus, to the extent that prototype effects in perception of phonetic categories (i.e., asymmetries in discrimination around good vs. poor exemplars of a category; e.g., Grieser & Kuhl, 1989; see description in next section) are operative in speech perception, they should combine with the SC and CG assimilation patterns to predict better SC discrimination when both non-native categories are assimilated as poor (non-prototypical) rather than as good exemplars of the native category, and to predict CG discrimination asymmetries that reflect greater category generalization (poorer discrimination) around prototypical exemplars than around non-prototypical exemplars of the native category. Discrimination should be moderately to very good, comparable to the CG assimilation type, if both non-native gestural patterns are perceived to fall outside the native phonetic domain altogether, in non-phonetic space (Non-Assimilated type, or NA).

Table 2. Assimilation effects on discrimination of non-native contrasts.

Contrast Assimilation Type               Discrimination Effect

Two-Category (TC)                        excellent discrimination
    each non-native sound is assimilated to a different native category

Category-Goodness Difference (CG)        moderate to very good discrimination
    both non-native sounds assimilated to the same native category, but they
    differ in discrepancy from native "ideal" (e.g., one is acceptable and
    the other is deviant)
    can vary in degree of difference as members of native category

Single-Category (SC)                     poor discrimination
    both non-native sounds assimilated to the same native category, but are
    equal in fit to the native "ideal"
    better discrimination for pairs with poor fit (equally poor) to native
    category than pairs with good fit (equally good)

Both Uncategorizable (UU)                poor to moderate discrimination
    both non-native sounds fall within unfamiliar phonetic space
    can vary in their discriminability as uncategorizable speech sounds

Uncategorized vs. Categorized (UC)       very good discrimination
    one non-native sound assimilated to a native category, the other falls
    in unfamiliar phonetic space, outside native categories

Non-Assimilable (NA)                     good to very good discrimination
    both non-native categories fall outside of speech domain and are heard
    as non-speech sounds
    can vary in their discriminability as nonspeech sounds
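
The mapping in Table 2 from assimilation pattern to predicted discrimination can be read as a simple decision procedure. The Python sketch below is only an illustration of that reading, not part of PAM as published: the `Assimilation` record, the 0-1 `goodness` scale, and the `goodness_tol` threshold are all assumptions introduced here, and the example values are invented rather than measured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Assimilation:
    """How a listener hears one non-native sound (hypothetical encoding)."""
    is_speech: bool = True          # False: heard as a nonspeech event
    category: Optional[str] = None  # native category label; None = uncategorizable
    goodness: float = 1.0           # assumed 0-1 rating of fit to the native "ideal"


def pam_type(a: Assimilation, b: Assimilation,
             goodness_tol: float = 0.1) -> Tuple[str, str]:
    """Return (assimilation type, predicted discrimination) following Table 2."""
    if not a.is_speech and not b.is_speech:
        return ("NA", "good to very good")
    if not (a.is_speech and b.is_speech):
        # Table 2 lists no type for a speech vs. nonspeech pair.
        return ("unlisted", "no prediction in Table 2")
    if a.category is None and b.category is None:
        return ("UU", "poor to moderate")
    if a.category is None or b.category is None:
        return ("UC", "very good")
    if a.category != b.category:
        return ("TC", "excellent")
    if abs(a.goodness - b.goodness) > goodness_tol:
        return ("CG", "moderate to very good")
    return ("SC", "poor")


# Illustrative inputs only; the goodness values are invented, not measured.
good_k = Assimilation(category="k", goodness=0.9)
poor_k = Assimilation(category="k", goodness=0.4)
click = Assimilation(is_speech=False)
print(pam_type(good_k, poor_k))  # ('CG', 'moderate to very good')
print(pam_type(click, click))    # ('NA', 'good to very good')
```

The threshold that separates CG from SC is the least principled part of the sketch: the text treats the two as points along a single dimension, so any cutoff on goodness difference is a convenience, not a claim of the model.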

The earlier comparisons of gestural scores for English and non-English phonetic categories illustrate some of these cross-language gestural similarities and dissimilarities. In the Hindi [d̪a]-[ɖa] example (Figure 3), the dental versus retroflex constriction locations do not distinguish English stop consonants; in fact, they occur as phonologically equivalent (i.e., non-distinctive) allophonic variants of (alveolar) /d/. As for the Zulu [kʰ]-[k'] example (Figure 4), a distinctive property of the voiceless velar stop in English [kʰ] is a glottal opening gesture coordinated with closure (as in Zulu [kʰ]). This critical gesture is lacking from Zulu [k'], which instead has a glottal closure and is therefore notably discrepant from [kʰ]. The Zulu voiced-voiceless lateral fricatives differ by essentially the same glottal voicing distinction (open glottis versus critically closed glottis) found in similar English fricative contrasts (e.g., /s/-/z/, "sh"-"zh"). Lastly, the dual alveolar+velar closures and the suction release gesture for Zulu alveolar versus lateral clicks are globally unlike anything in English phonology, and resemble nonspeech events such as cork popping and finger-snapping rather than being even generically speechlike for most English listeners.
PAM thus predicts that adults' attunement for detecting the articulatory gestural invariants that specify familiar phonetic categories of the native language will foster detection of both similarities and dissimilarities between non-native segments and the native inventory. Even more importantly for questions about perceptual influences of the native phonological system, discrimination of non-native contrasts is predicted to depend on the listener's abstraction of higher-order invariants that specify distinctive oppositions in the native phonology, as well as on their detection of discrepancies between the native contrasts and gestural properties of contrasting non-native segments. But what of young infants, who are not yet perceptually attuned to native phonetic categories, and especially to the native phonological system? When and how do infants begin to extract the gestural invariants of native categories and the higher-order invariants of critical distinctions found in native contrasts? And how does this early perceptual learning of the phonetic categories and relationships of the native language begin to affect perception of non-native phonetic forms?
To provide a basis for discussing these issues, we will begin with a brief review of empirical findings on developmental changes in infants' perception of native and non-native phonetic contrasts. Following that, we can outline a perceptual learning account of development that appears to accommodate those facts. That outline will provide the background for studies I have conducted with students and colleagues to test several predictions of PAM for perception of varying non-native phonetic contrasts by adults and infants.

Developmental changes in infant perception of phonetic contrasts
Young infants, up to about 4 months of age, have had relatively limited experience hearing the native language. Even the language experience they have had generally focuses attention more on prosodic patterns than on minimal segmental contrasts. The infant-directed speech that is typically addressed to them is characterized by exaggerated pitch contours and durational properties, relative to adult prosody, in most cultures (Fernald et al., 1990; Fernald & Mazzie, 1991; Fernald & Simon, 1984; Grieser & Kuhl, 1988; cf. Bernstein Ratner & Pye, 1984).
Moreover, infants from birth to at least 4 months of age prefer listening to infant-directed speech more than to adult-directed speech (Cooper & Aslin, 1990; Fernald, 1984, 1985; Fernald & Kuhl, 1987; Werker & McLeod, 1990). In contrast with its prosodic properties, infant-directed speech is not marked by exaggeration or emphasis of segmental distinctions (Bernstein Ratner, 1984, 1986; Bernstein Ratner & Luberoff, 1984; Malsheen, 1980). Even so, many findings indicate that young infants do discriminate a broad range of consonant and vowel contrasts in nonsense syllables, regardless of whether or not the contrasts occur in their language environment (e.g., Eimas, Siqueland, Jusczyk, & Vigorito, 1971; Jusczyk & Thompson, 1978; Jusczyk, Copan, & Thompson, 1978; for comprehensive reviews, see e.g., Aslin, 1987; Aslin, Pisoni, & Jusczyk, 1983; Best, 1984; Kuhl, 1987; Jusczyk, 1994). Evidence for developmental decline in discrimination of certain non-native contrasts will be discussed in depth in a subsequent section.
A few phonetic differences have been suggested to pose difficulties for young infants, viz., certain native fricative voicing contrasts (e.g., English /s/-/z/: Eilers, 1977; Eilers & Minifie, 1975) and fricative place contrasts (e.g., /f/-"th" [think]: Eilers, Wilson, & Moore, 1977). However, more recent work by those researchers, as well as by others, has shown that infants do discriminate those same contrasts (Eilers, Gavin, & Oller,
1982; Holmberg, Morgan, & Kuhl, 1977; Levitt, Jusczyk, Murray, & Carden, 1988). Moreover, infants discriminate other fricative place of articulation contrasts, both native (e.g., /s/-"sh": Eilers & Minifie, 1975; Eilers, Wilson, & Moore, 1977; Kuhl, 1980) and non-native (e.g., Czech retroflex vs. palatal voiced fricatives: Trehub, 1976; Eilers et al., 1982). The balance of that evidence indicates that young infants can discriminate native and non-native fricative contrasts.
In addition to the basic discrimination findings, infants under 4 months show other revealing perceptual patterns. When familiarized with a set of syllables that share either a common vowel and different consonants, or the converse, 2-month olds and newborns can detect the addition of a new syllable that differs in either consonant or vowel or both (e.g., Bertoncini, Bijeljac-Babic, Jusczyk, Kennedy, & Mehler, 1988; Jusczyk & Derrah, 1987), although newborns are more affected by attentional manipulations (Jusczyk, Bertoncini, Bijeljac-Babic, Kennedy, & Mehler, 1990). This pattern suggests that young infants perceive the syllables holistically rather than as a combination of discrete segments. Infants between 2-4 months can also discriminate 3-5 syllable utterances whose medial syllables differ, but apparently only if the contrasted elements are highlighted by the exaggerated prosodic contours of infant-directed speech, or differ on more than one articulatory feature (e.g., /r/-/k/) (Goodsitt, Morse, Ver Hoeve, & Cowan, 1984; Fernald & Kuhl, 1982, cited in Karzon, 1985; Karzon, 1985; see review by Jusczyk, 1993).
Vowel prototype, or "magnet," effects may also be found quite early. The magnet effect refers to a perceptual pattern in which listeners show preferences for and greater generalization (poorer discrimination) around good rather than poor exemplars of a vowel category (as per adult goodness ratings) (Grieser & Kuhl, 1989). These perceptual asymmetries around good versus poor tokens indicate that perception of vowel categories is not absolute, but rather shows systematic within-category differentiation, an effect which occurs only in humans and not in monkeys (Kuhl, 1991). The discrimination asymmetry for good vs. poor tokens has been found in human newborns with both native and non-native vowels (Walton & Socotch, 1993). By 6 months of age, infants still show the effect for a native vowel (Grieser & Kuhl, 1989) but not for a non-native one (Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992; Polka & Werker, in press) (the latter findings will be discussed in more detail later).
In addition, young infants are able to perceive, for at least some consonants and vowels, an underlying phonetic category identity throughout the variations introduced by different pitch contours, different speakers and different adjacent segments. Detection of such a phonetic equivalence class would appear as perceptual constancy across such variations in a phonetic category, within the familiarization or background stimuli and within the test stimuli. Perceptual constancy was shown in 1-4 month olds for discrimination of a vowel contrast presented with pitch contour variations (Kuhl, 1979; Kuhl & Miller, 1982). Similar perceptual constancy in discrimination of a consonant contrast across speaker variations has been found in 2 month olds (Jusczyk, Pisoni, & Mullennix, 1992), but only if there is no delay between the familiarization and testing phases. Similar memorial effects have been found in adults (Martin, Mullennix, Pisoni, & Summers, 1989). Perceptual constancy across varying phonetic contexts (e.g., /p/ across /pi/, /po/, /pu/; nasalization across /no/, /mo/, /ŋo/) has been found for both vowels and consonants by 4-6 months of age (e.g., Fodor, Garrett, & Brill, 1975; Hillenbrand, 1983, 1984; Kuhl, 1979, 1980, 1983). Thus far, only native phonetic categories have been tested with infants.
The findings summarized thus far have demonstrated little evidence of developmental changes in basic aspects of infant speech perception for native segmental contrasts, save for some signs of increased susceptibility to attentional manipulations or memorial disruptions in the first two months (Bertoncini et al., 1988; Jusczyk, Pisoni, & Mullennix, 1992). However, in the final quarter-year, there are some clearer indications that perception of native segmental patterns is beginning to be influenced by experience with the language. As discussed earlier, languages differ in both the inventories of consonants and vowels they employ, and also in their phonotactic rules regarding permissible sequencing of those elements. When 9 month olds were permitted to choose between listening to two series of unfamiliar words with English vs. Dutch segments and phonotactics, infants from each language preferred listening to the list representing their native language. Younger infants showed no preference between these prosodically-similar languages. Although English-learning infants did show a native preference
when presented with English vs. prosodically-different Norwegian, that effect was solely attributable to prosody rather than segmental and phonotactic constraints (Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993). The experiential effect on 9 month olds' preference for segmental patterns is strengthened by recent findings that Dutch infants this age prefer phonotactically permissible vs. phonotactically impermissible sequences of Dutch segments (Friederici & Wessels, in press), and that American infants prefer frequently-occurring vs. infrequently-occurring English phonotactic patterns (Jusczyk, Charles-Luce, & Luce, submitted; see Jusczyk, in press).
Infants' discovery of relations between sound patterns and meaning also begins around the last quarter of the first year, with the beginnings of word comprehension. Infants usually begin producing single words a few months later, at around 12-13 months on average, followed by the emergence of syntactic abilities with their first simple word combinations at around 18 months. A phonetic contrast that young infants discriminated in simple discrimination tests, prior to the emergence of word comprehension, may later be missed altogether as a minimal phonological contrast by the one year old whose comprehension vocabulary still lacks minimal word pairs (e.g., the /d/-/b/ contrast when it appears in dog vs. bog). This follows from the claim of child phonologists that the earliest linguistic units in the single-word period of child speech are more global than the segment (e.g., Ferguson, 1986; Ferguson & Farwell, 1965; Macken, 1992; Macken & Ferguson, 1983; McCune, 1992; McCune & Vihman, 1987; Menn, 1986; Menn & Matthei, 1992; Vihman, 1992), and that segments are gradually differentiated in both production and perception from these early, more global units (e.g., Goodell & Studdert-Kennedy, 1990; Lindblom, MacNeilage, & Studdert-Kennedy, 1983; Nittrouer, Studdert-Kennedy, & McGowan, 1989; Studdert-Kennedy, 1986, 1991) due to the pressure exerted by vocabulary expansion on the organization of the lexicon (Lindblom, 1992; Studdert-Kennedy, 1987, 1991).
Discrimination of minimal contrasts in meaningful word contexts appears to emerge around 18-19 months of age (Werker & Baldwin, 1991; see Werker & Pegg, 1992). Similar temporary dips in phonetic ability have also been noted in early word productions, where they are taken as evidence of progress in the development and systematization of phonological knowledge (e.g., Macken, 1992; Menn & Matthei, 1992).
In the next section, I will outline the perceptual learning framework for development of speech perception in infancy (and somewhat beyond). The suggested path of learning is informed, in part, by the findings summarized above, in addition to the general principles of the ecological approach to perception. It provides the backdrop for considering the research findings on adults' and infants' perception of non-native phonetic contrasts, particularly a series of studies motivated by PAM, which will be described in the subsequent section.

Perceptual learning and infant speech perception
The basic assumption of the ecological account of perceptual learning offered here is that the type of gestural information the child perceives in speech will change developmentally with increasing attunement to the ambient language. The infant will become better able with experience to detect both finer structure and more encompassing structures in native utterances. Following Eleanor Gibson's (1991) arguments about perceptual learning in general, the detection of gestural patterns in speech should become increasingly specific to the phonological categories and contrasts of the native language, there should be an increasing optimization of attention to them, and pickup of gestural information should become increasingly economical, that is, focus should shift away from irrelevant properties and sharpen for critically distinctive ones. The distinguishing features detected for discrimination should shift developmentally, showing progressive improvement in finding the critical features and in abstracting higher-order invariants, both of which reduce the number of comparisons required for discrimination (E. Gibson, 1969; 1971). These are exactly the advantages afforded to an experienced listener by the phonology of the native language. Because the language-specific phonological system reduces lower-order phonetic detail to just those distinctive features that are crucial for grammatical purposes (e.g., Archangeli, 1988) and organizes that information into superordinate structures, it allows a sensitized perceiver to take in more information within a given time frame and to minimize uncertainty about the important linguistic units. As experience with the native language optimizes and economizes information pickup, therefore, the infant begins to discover the phonological principles of that language.
This learning will, in turn, be reflected in developmental change in the infant's perception of non-native categories and contrasts. Progress in perceptual learning about the native language should result in, and be illuminated by, developmental changes in perception of non-native speech. The suggested pattern of perceptual learning about native phonological structure, and its expected effects on infants' perception of non-native categories and contrasts, are summarized in Table 3.
During about the first quarter-year of life, very young infants should have attained minimal perceptual learning of the higher-order invariants for native segmental contrasts, at best. Their experience with the native language is relatively limited, and the speech typically addressed to them generally focuses attention more on prosodic patterns than on minimal segmental contrasts. The view posited here is that infants initially detect simple differences in low-order articulatory invariants, such as the velar versus alveolar closure location for /g/-/d/, the presence versus absence of a glottal opening gesture for /p/-/b/, or the high versus slightly lower tongue position near the front of the vocal tract for "ee"-"ih". This ability should extend to simple gestural differences in both native and non-native phonetic contrasts.

Table 3. Perception of native and non-native contrasts in infancy and early childhood.

Each entry lists, for a developmental phase: the information detected; the effect on native phonetic categories; the effect on non-native phonetic categories.

1st quarter-year (0-3 months)
    Information detected: simple articulatory gestures (language universal)
    Native: discriminates any vowel & consonant difference
    Non-native: same as for native speech

    Information detected: good vs. poor exemplars of simple gestures (language universal)
    Native: prototype effects for vowels (and consonants?)
    Non-native: same as for native speech

    Information detected: invariants of simple gestures under speaker & intonation variations (language universal)
    Native: perceptual constancy for vowels and consonants
    Non-native: same as for native speech

2nd quarter-year (3-6 months)
    Information detected: continues as above (language universal)
    Native: continues as above
    Non-native: same as for native speech

    Information detected: invariants of simple gestures under phonetic context variations (language-specific?)
    Native: perceptual constancy for native categories
    Non-native: may fail with non-native categories

3rd quarter-year (6-9 months)
    Information detected: simple relational invariants for vowels (language-specific)
    Native: discriminates native vowel differences
    Non-native: fails to discriminate non-native vowels that differ from native relational invariants

    Information detected: good vs. poor vowels re: relational invariants
    Native: prototype effects for native vowel categories
    Non-native: lacks prototype effect for non-native relational invariants

4th quarter-year (9-12 months)
    Information detected: simple invariants for native gestural constellations (language-specific)
    Native: discriminates native vowel and consonant categories; prefers listening to common native syllable patterns more than non-native or uncommon native patterns
    Non-native: discriminates if able to detect different native invariants, or good vs. poor native invariant, or if no speechlike gestures at all; fails if detects a native invariant but not a goodness difference, or if detects speechlike gestures but not any native invariants

extending to 2nd year (9-17 months)
    Information detected: simple invariants for sound-meaning association re: global gestural patterns
    Native: learns to recognize simple native words and meanings
    Non-native: may have difficulty learning meaning associated with non-native global patterns

18 months
    Information detected: higher-order relational invariants for minimal contrast word pairs
    Native: detects native phonological contrasts
    Non-native: perception of non-native phonological contrasts depends on similarity to native contrast invariant

2 - 5 years
    Information detected: higher-order relational invariants among some allophones; higher-order invariants specifying morphological alternations, etc.
    Native: tendency toward perceptual equivalence among allophones of a category
    Non-native: no difference in response to non-native allophones vs. non-native phonol. contrasts

Given the assumption that they detect simple differences in low-order articulatory invariants, it should not be surprising that infants in the first quarter-year can pick up simple gestural commonalities within phonetic categories even in the face of certain category-irrelevant variations. That is, they show perceptual constancy for simple phonetic equivalence classes across non-identity-changing transformations. Because lower-order articulatory invariants of phonetic categories are not greatly affected by speaker (within a single dialect) and intonation variations, but may be affected by phonetic context variations due to coarticulation of consonants and vowels, perceptual constancy across speakers and intonation patterns may be evident earlier in development than perceptual constancy across different phonetic contexts. Thus far, the phonetic constancies demonstrated in the first quarter-year (Jusczyk, Pisoni, & Mullennix, 1992; Kuhl, 1979; Kuhl & Miller, 1982) have involved only speaker and intonation variations. Only the studies with infants in the second quarter-year (Fodor, Garrett, & Brill, 1975; Hillenbrand, 1983, 1984; Kuhl, 1980, 1983) have involved phonetic variations. In addition, given the slower, longer-lasting, more global tongue gestures associated with vowels as opposed to the more rapid and localized constriction gestures associated with consonants, perceptual constancy may appear earlier for vowels, or may simply be more easily obtained and more robust to attentional manipulations, than constancy for consonants. Again, studies of very young infants (Kuhl, 1979; Kuhl & Miller, 1982) have tended to test only vowel constancy, whereas studies with infants in the second quarter-year (Fodor et al., 1975; Hillenbrand, 1983, 1984; Kuhl, 1980, 1983) have tested for consonant constancy. The possibility of a vowel vs. consonant difference also seems compatible with the findings of Bertoncini et al. (1988) and Jusczyk et al. (1992) (cf. Jusczyk et al., 1990). However, further investigation is needed to evaluate both possibilities of early developmental changes in perceptual constancy. Regardless of these possible stimulus parameter effects on perception of phonetic equivalence classes, very young infants should show constancy equally for native and non-native phonetic categories. To the extent that phonetic categories and contextual effects differ among languages, infants should become attuned to native language patterns and we should expect to see some language-specific effects emerge later, probably around the second half-year. Thus far, however, no studies have examined phonetic perceptual constancy re: variations of speaker, intonation, or phonetic context in infants of any age.
The assumption that very young infants detect simple gestural properties of phonetic categories also admits the likelihood that they should show so-called perceptual magnet effects within the first quarter-year, at least for vowels. This is based on the reasoning that prototypes and non-prototypes differ in how well they convey the important gestural properties of a vowel category. This, in turn, would affect how easily perceivers could detect the gestural pattern of the category in the differing stimulus tokens. The notion that there is an articulatory basis for good versus poor vowels is consistent with the quantal theory of speech. The quantal theory demonstrates that certain vowel types are very stable, in that small changes in their articulatory constriction location produce minimal changes in the acoustic pattern of the vowel, whereas other constriction locations are unstable acoustically. Languages tend to avoid the latter locations for possible vowels (Stevens, 1972, 1989). Infants in the first quarter-year would be expected to show magnet effects for both native and non-native vowels, a prediction that is consistent with one recent report (Walton & Socotch, 1992).
Young infants in the first quarter-year should not yet recognize the more complex coordination or phasing required for specific native gestural constellations, e.g., syllable-initial /l/ in English has a uvular narrowing gesture which follows the tongue tip closure gesture for /l/, rather than being synchronous with it as in word-final English /l/ and in the Russian "hard" /l/, or absent as in the Russian "soft" /l/. Only as infants become attuned to detecting invariants for familiar gestural constellations in native speech should they begin to show effects of native language experience on their perception of non-native contrasts. This sort of native attunement would not be expected until at least the second quarter-year (perhaps in perceptual constancy across phonetic context variation), or more likely the following quarter-year.
By the third quarter-year (second half-year), infants should progress to discovering and attending to more economical higher-order relational invariants found in the native phonology, such as the ratio of the two portions of the vocal tract that fall on either side of the tongue constriction location for a given native vowel. These discoveries are assumed to proceed systematically from less to more encompassing and more economical invariants. Thus, the first sorts of native relational invariants
infants are likely to discover are relatively simple ones such as the ratio between the length of the vocal tract that lies before versus behind the high front tongue constriction for the vowel /i/ ("ee"). Once they detect such invariants, they should begin to show language-specific influences on perception of non-native vowel contrasts and prototypes. These older infants' abilities to discriminate non-native vowels and to perceive non-native vowel prototypes will depend on whether they can detect in those stimuli the relational invariants that they can now detect in native vowels, i.e., on whether they "assimilate" the non-native vowels to native categories. If so, performance will further depend on whether the infant assimilates the non-native vowels as good exemplars of a native category, and whether two contrasting non-native vowels are assimilated to the same native category or to different categories. However, we should not expect infants' assimilations to match those of adults completely because infants' detection of native vowel invariants is surely not as well-tuned as that of adults, and the invariants they detect may be somewhat lower-order than those of adults.
With further experience, by the last quarter of the first year, infants should also begin to recognize the higher-order invariants that specify native gestural constellations for consonants, as well as the broader phonotactic patterns of native syllables. For example, they should begin to recognize the higher-order relational invariants that specify consonantal gestural constellations in the native language, such as the precise phasing between the bilabial closure and the glottal opening gestures for English /p/ (as opposed to the different phasing for French /p/). At this point, infants' listening preferences and discrimination abilities will reflect language-specific influences on perception of non-native consonants and syllable types (re: phonotactic rules regarding how consonants and vowels may be sequenced to form syllables). Older infants' perception of these sorts of non-native gestural constellations will also depend on whether and how those patterns provide the higher-order gestural invariants they have learned to detect in native consonants and syllable types. Again, these older infants' assimilations of non-native constellations to native categories are still not expected to match adults' assimilation patterns, which derive from a much more sophisticated level of perceptual learning that incorporates minimal phonological contrast and other even more complex relations among segments in the native phonology.
At this point in development, however, infants would not necessarily perceive allophones of a given phoneme as related variants of a single segment, such as the allophonic relationship among stressed syllable-initial voiceless aspirated /p/ versus unreleased final /p/ versus voiceless unaspirated /p/ after /s/. Instead, they may detect differences among allophones simply as gestural characteristics of differing native syllable patterns. This is because they presumably would not yet have discovered the even higher-order invariants that relate allophones to common underlying phonological categories. Such abstract commonalities draw on grammatical relations among lexical items (e.g., different morphological forms of a stem word; see further discussion below), which are still beyond young infants' grasp.
Sound-meaning associations, which relate the higher-order gestural constellation of the spoken word to the confluence of contextual signs of its meaning, emerge in comprehension during the final quarter-year. Some ecological, perceptual learning accounts of this important discovery have been offered in the literature. For example, parents often repeat a key word several times to their infant under diverse spoken transformations, such as variations in prosody and sentence frame, while they concurrently engage the named object (noun) in different event transformations such as holding it out or wiggling it back and forth, or while they produce variations on the named action (verb) (Dent, 1990; Dent & Rader, 1979; Goldring Zukow, 1991; Zukow & Schmidt, 1988). The articulatory gestural component infants extract for such sound-meaning complexes is expected to be less differentiated phonetically than other gestural patterns the same infant might detect in the absence of a sound-meaning relation, because the added dimension of semantic or contextual information for words must be reconciled with the limitations of the infant's perceptual span and the need for economization of information pickup. For this reason, children's early words, in both production and perception, should be differentiated by rather holistic gestural properties and not by the finer grain of minimal contrasts (see Best, in press a). Minimal contrasts that they discriminated prior to the emergence of meaning are likely to be missed now in sound-meaning complexes. Infants at this point have still not discovered minimal phonological opposition. Discovery of phonological oppositions per se requires detection of finer-grained distinctions
between the gestural constellations of minimally contrastive, meaningful lexical items. The ability to perceive phonological contrasts as such may not be apparent until the upper edge of the infancy period. Recall that minimal contrasts are part of the phonological component of a language-specific grammar. The perception of minimal contrast in the native language, a minimum requirement of a segmental phonology, should be associated with the so-called spurt in children's productive vocabulary (>50 words), which also predicts the emergence of syntax and morphology (e.g., Macken, 1992). At that point, the comprehension vocabulary, if not also the production vocabulary, should be large enough to include minimally contrastive word pairs such as bed-bad or peas-keys. To perceive a phonological contrast, a relational invariant must be extracted: the critical segmental distinction that marks a difference in meaning between a minimal pair of words. This characterization is consistent with the earlier-summarized finding that older infants begin to detect minimal contrasts in meaningful words around 18-19 months (Werker & Baldwin, 1991; see Werker & Pegg, 1992).

Discovery of the still higher-order invariants corresponding to numerous other aspects of phonological structure awaits still more experience with the native language, some probably requiring years. For example, perceptual learning of allophonic relations should depend in part on hearing the same word produced by different speakers and with varying speech styles (e.g., casual, formal, and careful speech), as well as on hearing how morphological operations on words affect the phonetic form of the base word. To illustrate, in American English casual speech /t/ and /d/ have a number of context-conditioned allophonic variants: unreleased stops in final position (e.g., sit, dad, mad); rapid tongue taps (flaps) as onsets of non-initial unstressed syllables (sitting, daddy, kitty); glottal stops or nasal-released stops preceding unstressed syllabic /n/ (kitten versus hidden, respectively). Word pairs that young children are likely to hear could provide them with evidence of some of these phonological relations, as in the unreleased /t/ versus flap in sit-sitting, the unreleased final /d/ versus medial flap in dad-daddy, the flap versus glottal stop in kitty-kitten, and the unreleased /d/ versus nasal release in hid-hidden. In these cases morphological transformations of meaningful, known words provide a crucial link among the diverse allophones. Adults may also help clarify some allophonic relations if they "correct" their normal conversational speech patterns by repeating words in careful, precise speech to young children. To illustrate, although they pronounce kitty conversationally with a medial flap, they may at times pronounce it carefully for the child (as when correcting the child's spelling errors), with the medial /t/ as a voiceless alveolar stop (see Bernstein Ratner, 1993). An underlying gestural commonality among the diverse allophones of medial /t/-/d/ is apparent in children's productions by 20-22 months (Best, Goodell, & Wilkenfeld, in preparation; Best, in press a). More abstract phonological relations among allophones may also be highlighted later by learning to read and spell, as is the case for the flapped allophones of /t/ and /d/ (see Treiman, Cassar, & Zukowski, submitted).

Similarly, children may learn about even more abstract phonological relations through frequently used morphological operations. For example, the English voiced-voiceless alternations between /s/-/z/ in noun pluralization (e.g., cats versus dogs) and between /t/-/d/ in the past tense forms of regular verbs (e.g., walked versus climbed) covary with the voicing of the preceding segment. Morphological development during the preschool years (Berko, 1958) should aid children's discovery of related phonological alternations (see also Gerken, Landau, & Remez, 1990; Gerken & McIntosh, 1993). Other structural properties of the native phonological system that may take even longer for the child to fully apprehend in speech include some aspects of linguistic stress and intonation, for which perceptual learning may extend to as late as 7-10 years (Cruttenden, 1974).

In the next section, I will review recent data from my own and others' laboratories that pertain to the preceding account of perceptual learning about the native phonology, and its influence on perception of unfamiliar non-native phonetic contrasts. The findings will be discussed within the framework of the Perceptual Assimilation Model (PAM), though it should be noted that the work of other researchers was generally not motivated by PAM. Although much of the research has involved consonant contrasts, some more recent work focuses on vowel contrasts; these areas will be described in separate subsections below. Because PAM's assimilation and discrimination predictions were developed to account for mature listeners' perceptions of non-native phonetic contrasts, adult findings will be described first within each area.
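
As a rough illustration of the morphological voicing alternations mentioned above (plural /s/-/z/ and past tense /t/-/d/ covarying with the voicing of the preceding segment), the following Python sketch picks a suffix form from the stem-final segment. The segment classes, the ASCII segment labels, and the function names are simplifying assumptions introduced here for illustration only. The extra vowel-initial forms (after sibilants in the plural, after /t/ and /d/ in the past tense) are standard descriptions of English added for completeness; they are not discussed in the text.

```python
# Illustrative sketch only. The segment sets are simplified assumptions,
# written with ASCII labels ("sh", "zh", "ch", "j", "th") rather than IPA.
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch"}
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}


def plural_suffix(final_segment: str) -> str:
    """Choose the regular plural form from the stem-final segment."""
    if final_segment in SIBILANTS:
        return "iz"  # vowel-initial form after sibilants (bus -> buses)
    if final_segment in VOICELESS:
        return "s"   # voiceless suffix after a voiceless segment (cat -> cats)
    return "z"       # voiced suffix otherwise (dog -> dogs)


def past_suffix(final_segment: str) -> str:
    """Choose the regular past tense form from the stem-final segment."""
    if final_segment in {"t", "d"}:
        return "id"  # vowel-initial form after /t/ or /d/ (want -> wanted)
    if final_segment in VOICELESS:
        return "t"   # voiceless suffix after a voiceless segment (walk -> walked)
    return "d"       # voiced suffix otherwise (climb -> climbed, stem ends in /m/)


if __name__ == "__main__":
    print(plural_suffix("t"), plural_suffix("g"))  # cats, dogs
    print(past_suffix("k"), past_suffix("m"))      # walked, climbed
```

In both functions the suffix simply agrees in voicing with the segment before it, which is the regularity a child could in principle pick up from frequently used plural and past tense forms.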
Experimental evidence on PAM and development of perceptual learning

Consonant contrasts. PAM predicts that adults' ability to discriminate different non-native contrasts will vary depending on how they assimilate the non-native phonetic categories vis-à-vis the phonological inventory of their native language.8 The assimilation predictions presented here and elsewhere (see Best, 1993, in press a; Best et al., 1988; Best & Strange, 1992) refer specifically to adults' initial perception of unfamiliar contrasts from languages with which they have had little or no linguistic experience. However, the model could be extended, via the principles of perceptual learning outlined here, to account for changes in perception that can occur as adults learn a second language (see Best, in press b; for an alternative view, see Flege, in press). To review PAM predictions briefly (Tables 1 and 2), adults are expected to show excellent discrimination for non-native contrasts that are assimilated to two different native categories (TC assimilation type). They should show good to very good discrimination for those that are not assimilated into native phonetic space (i.e., are heard as nonspeech: NA type), or for those assimilated with differing degrees of goodness into a single native category (CG type), or for those in which one pair member is assimilated to a native category but the other is uncategorizable (UC type). Moderate to poor discrimination is expected for non-native contrasts that fall within unfamiliar phonetic space (i.e., are both heard as uncategorizable speech sounds: UU type), and poor discrimination for those assimilated as equally good exemplars of a single native category (SC type).

Earlier reports of poor discrimination of non-native consonants by adults have tended to use contrasts that were most likely assimilated as SC types or perhaps as UU types. Discrimination levels for such contrasts should indeed have been low according to the Perceptual Assimilation Model. For example, speakers of Japanese and Korean who are relatively inexperienced with spoken English have great difficulty discriminating and differentially labeling English /r/-/l/ (e.g., Gillette, 1980; Goto, 1971; Miyawaki et al., 1975; Mochizuki, 1981; Sheldon & Strange, 1982; Yamada & Tohkura, 1991). Their languages do not have an /l/ category and their /r/ is not a liquid approximant as in American English, but rather a flap more like the medial /d/ in daddy (Bloch, 1950; Price, 1981; Vance, 1987). Thus, PAM would expect monolingual Japanese to assimilate both English /r/ and /l/, maybe as poor exemplars of their flapped /r/, but more likely as poor exemplars of their approximant /w/ or as uncategorizable speech sounds. The sounds should be rather poorly discriminated by Japanese in any of these cases, although perhaps slightly above chance.

In a study conducted before the development of PAM, Kristine MacKain, Winifred Strange and I compared American and Japanese listeners' labeling and discrimination of /l/-/r/ in a computer-synthesized continuum ranging from English rock to lock in acoustically equal steps (MacKain, Best, & Strange, 1981). As expected, the American listeners strongly displayed the phenomenon of categorical perception. That is, they labeled the items at one end of the continuum very consistently as /l/ and the items at the other end as /r/, with a steep category boundary. Correspondingly, their discrimination between items that were 3 steps apart along the continuum was poor for within-category comparisons but very good for between-category comparisons, with a dramatic peak in discrimination performance at the position of the category boundary found in labeling. Japanese who had had little English conversational experience, on the other hand, showed nearly flat labeling and discrimination functions, with no category boundary effect and poor discrimination overall. Interestingly, however, a subgroup of Japanese subjects who had had some period of intensive conversational training and/or practice in English showed labeling and discrimination functions similar to the Americans', although not quite as high. Thus, the results are compatible with PAM, and in addition suggest that perception of non-native contrasts can be improved by intensive conversational experience with the language involved (see also Flege, 1989, 1991a; other training approaches may also improve discrimination: e.g., Jamieson & Morosan, 1986; Logan, Lively, & Pisoni, 1991; Pisoni, Aslin, Perey, & Hennessey, 1982; Strange & Dittmann, 1984).

Monolingual English-speaking listeners have also, of course, shown poor discrimination for a number of non-native contrasts, each of which is most likely to show SC assimilation patterns. For example, Thai voiced vs. voiceless unaspirated utterance-initial stops are both good exemplars of English voiced stops, and are difficult for English listeners to discriminate (Lisker & Abramson, 1970). Hindi voiceless unaspirated dental vs. retroflex stops, which are likely to be heard as /d/,
are quite difficult for English listeners to discriminate, as are Nthlakampx (Thompson: Interior Salish) velar vs. uvular ejective stops /k'/-/q'/, which are likely to be heard as "odd" exemplars of English /k/ (or sometimes as other English sounds) (Polka, 1991; Werker, Gilbert, Humphrey, & Tees, 1981; Werker & Lalonde, 1988; Werker & Tees, 1984a). Likewise, the Czech retroflex vs. palatal voiced fricatives are poorly discriminated by English listeners (Eilers, Gavin, & Oller, 1982; Trehub, 1976), who are likely to hear them both as "zh".

Also relevant to the perceptual learning approach more generally are several studies showing that reducing the memory demands of the discrimination task or "stripping away" all acoustic details other than the crucial difference between the contrasting non-native categories results in increased discrimination of SC type contrasts (e.g., Carney, Widin, & Viemeister, 1977; Miyawaki et al., 1975; Pruitt, Strange, Polka, & Aguilar, 1990; Werker & Logan, 1985; Werker & Tees, 1984a). Both experimental manipulations reduce the array of information within which the listener must detect the critical differences. With the acoustic manipulation in particular, in reducing or eliminating the irrelevant and redundant stimulus properties, the experimenter both picks out the distinctive features for the listener and simultaneously attenuates the speechlike properties of the stimuli, i.e., moves them toward NA assimilation types.

A more comprehensive examination of the Perceptual Assimilation Model, however, requires the comparison of discrimination levels across differing non-native assimilation types, and direct assessment of the listeners' assimilations of the non-native sounds re: their native categories. The first study on this point investigated perception of several click consonant contrasts from Zulu, a southern African Bantu language, by American English adults who were completely inexperienced with any click languages (Best, McRoberts, & Sithole, 1988). Clicks should not be assimilable as speech sounds within English phonetic space because their manner and place of articulation are different from anything in the English inventory of gestural constellations. That is, the click contrasts should produce an NA assimilation pattern for most English listeners, and should be relatively easily discriminated as nonspeech sounds. Subjects were tested with multiple natural tokens on discrimination of all minimal-feature pairings from the three by three matrix of Zulu click voicing categories (voiceless, short-lag voiceless, voiceless aspirated) and places of articulation (alveolar, lateral, palatal), which yielded 18 minimal contrasts. According to posttest questionnaires, the listeners assimilated all clicks as various nonspeech sounds (e.g., "a cork popping," "tongue clucks," "finger snaps"), except for one subject who heard some clicks as being similar to English /k/. Performance on an AXB discrimination test was quite good, ranging from 80% correct (chance = 50%) for the most difficult contrast, the alveolar vs. lateral voiceless unaspirated pair, to 85-95% correct for the others. Thus, the PAM prediction of good to very good discrimination for non-native NA contrasts was met, and performance differs substantially from that reported above for non-native SC (or UU) contrasts.

Several other non-native assimilation types have been compared in adult studies from my own and other laboratories. In a direct comparison of TC, CG and SC contrasts, I tested English listeners' discrimination with multiple natural utterances of three additional Zulu contrasts: voiced vs. voiceless lateral fricatives, voiceless aspirated vs. ejective velar stops /k/-/k'/, and plosive vs. implosive bilabial stops. A fourth non-native pair was the Tigrinya (Ethiopian) bilabial vs. alveolar ejective contrast /p'/-/t'/ (Best, 1990). The Zulu lateral fricatives were expected to assimilate to English as TC contrasts, that is, as a voiced-voiceless English fricative contrast involving the tongue tip (i.e., /z/-/s/, "zh"-"sh" or "th" in this vs. think), perhaps in combination with an /l/. The Tigrinya ejectives were likewise expected to be assimilated as a TC contrast, specifically as "odd" English /p/ and /t/. The aspirated vs. ejective velar stops were expected to assimilate as a good vs. an "odd" /k/, i.e., as a CG contrast. And the plosive vs. implosive bilabials were expected to assimilate as nearly equal English /b/s. All PAM predictions were strongly supported. Nearly all subjects assimilated the contrasts as expected, according to a posttest questionnaire that asked them to describe or give English labels to recordings of each non-native category. Moreover, the levels of AXB discrimination performance were strongly associated with their assimilation patterns. That is, the Zulu and Tigrinya TC contrasts yielded excellent, near-ceiling discrimination. The Zulu CG contrast was discriminated very well, but significantly less well than the TC contrasts. The Zulu SC contrast showed the lowest discrimination, much lower than either the TC or the CG contrasts.

Two other aspects of the results from that study were consistent more generally with perceptual
learning principles. First, a recency memory effect was found on the AXB discrimination trials only for the SC contrast (plosive-implosive bilabials). Discrimination was significantly better when X matched the B category than when it matched the A category. Second, discrimination performance on all three Zulu contrasts was significantly better for matches on the more English-like pair member. Specifically, Zulu /k/ and /b/ were perceived as more like English /k/ and /b/, respectively, than were the contrasting Zulu /k'/ and implosive bilabial, and the voiceless lateral fricative was perceived as containing an English voiceless fricative (/s/ or "sh") more consistently than the voiced cognate was perceived as containing the corresponding voiced fricative (/z/ or "zh"), even though subjects did assimilate the lateral fricatives as a TC contrast. AXB discrimination was significantly higher when the X was the more English-like /b/, /k/, or voiceless lateral fricative than when it was the less English-like implosive bilabial, /k'/, or voiced lateral fricative.

In another study, which extended the findings of MacKain, Best, and Strange (1981), we tested several PAM hypotheses by comparing categorical perception in American and Japanese listeners for three related English consonant contrasts which bear differing relations to Japanese phonology (Best & Strange, 1992). The stimuli were computer-synthesized continua for the contrasts /r/-/l/, /r/-/w/, and /w/-/y/. All three are place of articulation contrasts between approximant consonants, involving constriction gestures that are neither complete closures as in stop consonants nor critically narrow as in fricatives. The first is not a phonological contrast in Japanese, as described earlier, and was expected to show SC assimilation or UU assimilation. In the second contrast, /r/ is of course non-native for Japanese, whereas /w/ is a native category but is produced with less lip-rounding than in English. Japanese listeners should assimilate this contrast as either a CG difference within the Japanese /w/ category, or as a UC contrast with /r/ as an uncategorizable speech sound (or, less likely, as a TC contrast with a very poor Japanese /r/). The /w/-/y/ difference is a phonological contrast in Japanese as in English, although again both elements are pronounced somewhat differently in the two languages. It should therefore be assimilated as a TC contrast by Japanese listeners. Although we did not obtain posttest assimilation judgments from the Japanese listeners, the pattern of consistency in their categorization and discrimination of the three continua fits well with PAM predictions. That is, their best performance was on /w/-/y/, where they matched American listeners' performance levels; their lowest performance was on /r/-/l/, where the Americans performed as well as they did on /w/-/y/ and /w/-/r/. Those Japanese who were least experienced with English showed essentially chance performance levels on /r/-/l/ but were substantially better than chance on /w/-/r/ and especially on /w/-/y/.9 Japanese with intensive English experience performed more similarly to Americans on /r/-/l/, as summarized earlier for MacKain et al. (1981), and also on /w/-/r/; however, there was no effect of English experience on Japanese performance with /w/-/y/.

Several adult studies from other labs are also consistent with PAM predictions, although they were not designed to test PAM. Werker and Tees (1984a) tested English speakers' discrimination of Hindi breathy voiced vs. voiceless aspirated dental stops and dental vs. retroflex voiceless unaspirated stops, as well as Nthlakampx velar-uvular ejectives /k'/-/q'/. They found listeners better able to discriminate the first contrast than the other two. This finding is consistent with PAM, given that the latter two contrasts are each likely to be assimilated as an SC contrast, specifically as /d/ and /k/, respectively. The former contrast, however, is likely to be assimilated either as /d/-/t/, a TC voicing contrast, or as a CG difference in which the Hindi breathy voiced dental is heard as a deviant English /t/. The authors had undertaken the study to test whether allophonic experience in the native language may account for variations in discriminability of different non-native contrasts (see also Werker et al., 1981; Werker & Tees, 1984b). As they note, although the allophonic explanation may be compatible with good discrimination of the Hindi dental voicing contrast (English has dental /t/ allophones) and poor discrimination of Nthlakampx ejectives (English has no ejective allophones), it is inconsistent with the poor discrimination of the Hindi dental-retroflex contrast (English does have dental allophones of /d/). Interestingly, however, a separate study found that listeners who had had experience with Hindi in their first year of life were better able than those without such experience to discriminate the dental-retroflex contrast as adults (Tees & Werker, 1984).

Two other reports have explicitly evaluated PAM hypotheses against several other possible accounts for variation in perception of differing non-native speech contrasts. One focused in depth on the Hindi dental-retroflex distinction in initial
position, investigating English listeners' perception of that place of articulation contrast within each of four different voicing settings: voiced, voiceless aspirated, breathy voiced (i.e., voiced aspirated), and voiceless unaspirated (Polka, 1991). The former two voicing patterns occur for initial stops in English, whereas the latter two do not. Performance on the four place of articulation contrasts was not uniform, but rather was near chance for the former two voicing patterns, better than chance for the breathy voiced one, and better still for the voiceless unaspirated one.10 This pattern of results led Polka to reject an account based on the lack of phonological status of the dental-retroflex stop contrast in English, as well as an account based on exposure to dental allophones of /t/-/d/ in English. An account in terms of the acoustic salience of the formant transitions in the various contrasts was also inconsistent with the observed performance pattern, given that formant transitions are most salient acoustically in the voiced dental-retroflex contrast, which was the most difficult for English listeners to discriminate. However, an assimilation account seemed to work well, in that most listeners heard both members of the poorly discriminated voiced dental-retroflex contrast as /d/ and both members of the voiceless aspirated dental-retroflex contrast as /t/, i.e., as SC contrasts. But they heard the more easily discriminated voiceless unaspirated dental-retroflex contrast as "th" (this) - /d/ and the breathy voiced dental-retroflex contrast as /d/-/t/, i.e., the latter two contrasts appear to have been heard as TC contrasts.

In a related study, Polka (1992) examined English and Farsi listeners' perception of the velar-uvular stop distinction in two voicing contexts: voiced (native to Farsi only) and ejective (native to neither language). On the voiced velar-uvular contrast, English listeners perceived the uvular category as "bad" exemplars of English /g/ or as no clear English consonant, thus assimilating the contrast as a CG or UC difference, which they discriminated above chance. Most listeners in both groups performed poorly on the non-native ejective contrast, describing it either in terms corresponding to an SC assimilation pattern or a UU assimilation pattern. The few subjects in both groups who showed good discrimination described the latter sounds in terms corresponding to TC, CG or UC assimilation. A separate group of English listeners showed comparable, above-chance discrimination levels on the voiced and the ejective contrast, though with a trend toward better discrimination of the Farsi voiced contrast. They described the Farsi voiced contrast in CG or UC assimilation terms and the ejective contrast in SC or UC assimilation terms. Thus, the findings from these two studies are also generally consistent with the predictions of the Perceptual Assimilation Model.

In contrast with the evidence that adults assimilate non-native contrasts with respect to native phonological categories, young infants show little or no effect of the ambient language on their perception of non-native consonants up to about 8 months of age. A number of studies have shown, however, that language-specific influences begin to appear by 8-10 months and are well-established by 10-12 months. But how closely does the 10-12 month old's discrimination of various non-native consonant contrasts mirror the pattern found in adults? In other words, are one-year olds likely to have discovered the same higher-order invariants in native speech contrasts as adults have? Have they yet discovered even that most basic aspect of the phonological component of the grammar: phonological contrast? According to the perceptual learning account of infant speech perception developed here, the answer to the last two questions should be "no."

As with the literature on adult tests of cross-language speech perception, initial reports of a decline by 10-12 months in infants' discrimination of non-native consonants used contrasts that adults from their language community assimilate as SC types. In a conditioned head-turn procedure (see Eilers, Wilson, & Moore, 1977), Werker and colleagues found that English-learning 6-8 month olds discriminate the Hindi voiceless unaspirated dental-retroflex stops, the Hindi breathy voiced vs. voiceless aspirated dental stops, and the Nthlakampx velar-uvular ejectives /k'/-/q'/. Yet by 10-12 months of age infants have essentially ceased to discriminate the first and third of these (the latter age was not tested on the second contrast). Hindi-learning and Nthlakampx-learning infants, of course, still discriminate their native contrasts by 10-12 months (Werker et al., 1981; Werker & Tees, 1984a). Moreover, when presented with a computer-synthesized continuum ranging from /b/ to dental to retroflex stops, 6-8 month old English-learning infants, 10-12 month old Hindi infants, and Hindi adults perceive three separate categories, whereas 10-12 month old English-learning infants and English-speaking adults hear only two categories corresponding to /b/ and /d/ (Werker & Lalonde, 1988).

A recent study from my lab extended PAM directly to infants' perception of additional types
of non-native assimilation types (Best et al., 1990). In this study, 6-8 month old and 10-12 month old American English-learning infants each participated in three discrimination tests with non-native consonant contrasts from Zulu, the same ones that had been used in the adult study summarized earlier (Best, 1990): plosive vs. implosive bilabial stops, voiceless aspirated vs. ejective velar stops /k/-/k'/, and voiced vs. voiceless lateral fricatives. The infants were tested using a conditioned visual fixation habituation procedure (see Best, McRoberts, & Sithole, 1988; Horowitz, 1975; Miller, 1983). As summarized earlier, English-speaking adults assimilated the lateral fricatives as a TC contrast, the velars as a CG contrast, and the bilabials as an SC contrast. Their discrimination levels followed the order TC > CG >> SC. In the infant study, the 6-8 month olds discriminated all three contrasts. The 10-12 month olds, however, failed to discriminate any of the three Zulu contrasts, unlike both the younger infants and the adults. The most difficult contrast for them was the lateral fricative distinction. Rather than showing even a small (non-significant) fixation increase from the end of habituation to the beginning of the test phase, as they had shown for the other two contrasts, in the lateral fricative test they simply showed a further decline, or continuation of habituation.

It is noteworthy that the TC lateral fricative contrast was especially difficult for the 10-12 month olds, given that, as a TC contrast, it was the easiest of the Zulu contrasts for adults. The older infants' difficulty might be related to the fact that most adults assimilated the lateral fricatives to various consonant clusters, many of which were not phonotactically permissible in initial position in English, such as "zhl" and "shl." In other words, the adults did not find a simple segmental contrast in English, or even a pair of permissible phonotactic sequences, to which they could assimilate the lateral fricatives. Not surprisingly, then, the older infants may have been unable to consistently detect any familiar native gestural constellations in the lateral fricatives, and may have instead perceived them as a UU assimilation type, for which discrimination is expected to be poor, or perhaps as an SC assimilation type re: English (both Zulu fricatives had /l/-like properties according to many adults).11 The older infants also failed to show significant discrimination of the velar voiceless aspirated vs. ejective /k/-/k'/, which was a fairly easy CG contrast for adults. On this contrast, they showed their largest average increase in fixation during the test phase, nearly as large as that of the 6-8 month olds, but they also showed a high degree of variability. This pattern suggests two possibilities that warrant further investigation: 1) the infants may have assimilated /k/-/k'/ as a CG contrast and shown a prototype asymmetry effect (Kuhl et al., 1992; Polka & Werker, in press) in which discrimination depended on whether they habituated to the English-like Zulu /k/ or the non-prototypical /k'/; 2) some of the infants may have assimilated /k/-/k'/ as an SC contrast, failing to hear that the voicing lag in /k/ is aspirated while the lag in /k'/ completely blocks airflow (i.e., is silent), whereas others may have heard the aspiration difference and shown CG assimilation. The first possibility would result in significant test-order effects in discrimination levels, whereas the second would not.

The good discrimination of the lateral fricative and velar voicing contrasts by both 6-8 month olds and English-speaking adults, but poor discrimination by 10-12 month olds, indicates a temporary dip in development perhaps comparable to those noted earlier in the phonological properties of toddlers' single word productions and in their perception of minimal contrasts in meaningful words. Thus, it may be evidence of progress in the discovery of higher-order phonological category information in speech. To examine the time-course of the transitional period for these two contrasts, Glendessa Insabella and I tested English-speaking 4 year olds, using the same conditioned fixation habituation procedure as we had with the infants (although they had to be instructed that their fixations controlled the audio, and that they should tell us afterwards whether "the sounds changed" at some point during the test) (Insabella & Best, 1990). We had to ensure that this procedure was sensitive enough to detect discrimination for a contrast we knew they should be able to hear, so all children had to show fixation recovery on one test with English /b/-/d/. Because these older children would only tolerate two tests in a session, we gave one group the Zulu lateral fricative distinction as their second test; the other group got the Zulu velar voicing contrast as their second test. The 4-year-olds, unlike the 10-12 month olds, easily discriminated the /k/-/k'/ contrast. However, they still failed to discriminate the lateral fricative contrast. Thus, they had already come into line with adult performance on the CG contrast, but still showed depressed performance on the TC contrast, which had proven easiest of all for the adults. The reversal of the developmental dip for the CG contrast but not for this TC contrast should not be particularly surprising, given the
complexity of the adults' assimilation patterns for the latter contrast, as noted above. The prolonged difficulty with the lateral fricative contrast is to be expected according to the outline of perceptual learning discussed earlier, in that the most common assimilations for adults involved consonant clusters rather than single segments, and many of the clusters were not even permissible in initial position in English. However, adults' assimilations for /k/-/k'/ were much simpler category goodness differences for a single English segment (/k/).

It is crucial to note, in light of the preceding discussion, that 10-12 month olds do not fail with all non-native contrasts. In a follow-up study with 6-8 and 10-12 month olds, using the visual fixation habituation procedure but with a more stringent habituation criterion, infants completed three tests: the Zulu lateral fricatives, the Tigrinya ejective contrast /p'/-/t'/ that adults had assimilated as a TC contrast and discriminated quite well (Best, 1990), and an English fricative voicing contrast (/s/-/z/) (Best, 1991). The younger infants discriminated all three contrasts. This time the older group discriminated an adult TC contrast, the Tigrinya ejectives. But they still failed with the TC lateral fricative contrast. This failure could not be attributed to a general difficulty with fricative voicing distinctions, because they were well able to discriminate the native English /s/-/z/ contrast. Given that they could discriminate the TC ejective /p'/-/t'/ contrast that showed consistent, single-segment-based assimilation by adults, these findings lend strength to the interpretation given above for the difficulties 10-12 month olds and even 4-year olds have with the lateral fricatives.

Another study showed that older infants also clearly discriminate a non-native contrast that adults assimilate as an NA distinction, as predicted by PAM in concert with the perceptual learning approach (this was actually the first PAM study in chronological terms). Infants at 6-8, 8-10, 10-12, and also 12-14 months were tested on the Zulu click contrast on which adults had shown their "lowest" discrimination performance (still fairly high at 80% correct): the lateral vs. apical voiceless unaspirated clicks (Best et al., 1988). This study used the same conditioned fixation procedure as Best (1990). All infants also completed a test with English /b/-/d/. All four age groups clearly discriminated the click contrast, even though they could not have had even allophonic experience with such sounds in English utterances. Because we had used a rather different procedure than the head-turn procedure that Werker used in her earlier reports of a decline in 10-12 month olds' discrimination of several non-native consonant contrasts (e.g., Werker et al., 1981; Werker & Tees, 1984a), we conducted a follow-up study. Using our fixation procedure, we gave 6-8 and 10-12 month olds a test on the clicks, one on /b/-/d/, and one on the Nthlakampx velar-uvular ejective contrast /k'/-/q'/ used by Werker (Best & McRoberts, 1989). The procedural difference did not matter: Werker's findings of discrimination at 6-8 months and failure at 10-12 months for the /k'/-/q'/ contrast were replicated, as was our previous finding of continued discrimination for the Zulu clicks at both ages.

All told, then, the infant findings with non-native consonants suggest increasing sensitivity to native gestural constellations, which negatively influences 10-12 month olds' perception of many but not all non-native contrasts. However, the pattern of which non-native contrasts older infants discriminate, and which they do not, differs in some telling ways from that of adults in their language community. Although they discriminate two contrasts that adults discriminate fairly easily to very easily (an NA contrast and a TC contrast that adults consistently assimilate to a simple segmental contrast in the native phonology), these older infants fail to discriminate two other contrasts that adults also discriminate quite easily (a CG contrast and another TC contrast that shows a more complex and somewhat idiosyncratic assimilation pattern). These findings are consistent with the possibility that one-year olds do not recognize the higher-order gestural invariants specifying phonological relations, including minimal phonological contrasts. The infant's detection of the somewhat lower-order invariants corresponding to native phonetic categories may not mark the emergence of true segmental phonology. Rather, the infant's detection of phonological contrast per se may be crucially linked to a growing awareness of word-meaning associations (see Lloyd, Werker, & Cohen, 1993), which initially reflects gestural organization at the word or phrase level rather than the segmental level (e.g., Studdert-Kennedy, 1989, 1991). As stated earlier, perception of minimal phonological contrasts in meaningful contexts may not appear until around 18-19 months (Werker & Pegg, 1992), generally coincident with the vocabulary spurt (50+ words) and primitive syntactic constructions in productive language development.

Vowel contrasts. Much less research has examined language-specific effects on adults' or infants'
discrimination of vowel contrasts. However, the few available non-native vowel findings on adults are consistent with PAM predictions, except that thus far no vowel contrasts have met the definition of Non-Assimilable types, i.e., none are perceived as nonspeech sounds. The possibility of NA vowel contrasts, in fact, seems quite remote given the basic commonality of voicing and manner of gestures involved in vowel production. Vowels are associated with a more open vocal tract than consonants, and slower, more global gestures involving primarily the larger extrinsic muscles rather than the small intrinsic muscles of the tongue (with some concomitant jaw and lip movements) (e.g., Fowler, 1980). Vowel color is differentiated primarily by the location and height of the tongue at its closest approximation to the upper surface of the vocal tract. Vowel contrasts may also involve length (duration) and voice quality differences (e.g., creaky voice). Other differences in the production and in the phonological functions of vowels versus consonants may ultimately be important for understanding adult cross-language assimilation patterns and early developmental changes in perception of non-native contrasts (see Best, 1993). For example, vowels usually provide the sonority peaks in syllable nuclei (open airflow through vocal tract); vowels carry the prosodic properties of utterances much more than consonants do; speech errors occur among vowels or among consonants but never cross between the two classes; and articulatory movements affect the two classes in opposite manners under stress and speech rate variations (see Fowler, 1980).

Findings on English vowel perception by native Spanish-speaking adults (Flege, 1991b, in press) fit well within the PAM predictions, although the research was not motivated by the model. Spanish contains only five vowels: /i/ as in sí, /a/ as in casa (more fronted than English /ɑ/), /e/ as in mes (roughly "ay" but not diphthongized as in English), /o/ as in yo (not diphthongized as in English) and /u/ as in su (not diphthongized as in English). It does not have "eh," "ih," /æ/ as in bat, "uh," short "oo" as in book, "aw," or several other English vowels. Thus, English /ɑ/ should be assimilated by Spanish listeners as a moderately deviant exemplar of Spanish /a/. English "ih," "eh," and /æ/ should be heard as uncategorizable vowels (with respect to each other), or perhaps as poor category exemplars with respect to Spanish /i/, /e/ and /a/, respectively. That is, English "ih"-"eh" and "eh"-/æ/ should be assimilated as UU types vis-à-vis Spanish phonology, and thus should show relatively poor discrimination, whereas /ɑ/-/æ/ may show an SC or weak CG assimilation pattern and rather poor discrimination. In contrast, /i/-"eh" should show UC or TC assimilation and near-perfect discrimination, while "ih"-/i/ should likewise show UC assimilation or a strong CG difference and very good discrimination. Discrimination levels for these contrasts in a recent study by Flege (in press) are consistent with this assimilation account. All contrasts described except for /i/-"eh" were tested with native Spanish listeners. They showed very good discrimination for "ih"-/i/, and poor discrimination for the other three contrasts. The relation between discrimination performance and actual assimilation patterns cannot be determined, however, because the listeners' assimilations were not assessed. Flege accounts for the findings with his Speech Learning Model, which is concerned with whether non-native sounds are "identical," "similar," or completely "new" with respect to native phonological categories (for details, see Flege, 1991b).

Also compatible with assumptions about adults' assimilation of non-native segments to their native phonology, Rochet (in press) found differences in the assimilation of the Canadian French high front-rounded vowel /y/ by Portuguese and English listeners that corresponded to differences in productions of /i/ and /u/ in those two languages. Specifically, English listeners strongly tended to assimilate French /y/ as an /u/, whereas Portuguese listeners assimilated it as an /i/. Also, Polka (submitted) found that English listeners assimilated German high front lip-rounded /y/ and high back rounded /u/ as a strong CG difference for English short "oo," and German mid-high front rounded /Y/ vs. mid-high back rounded /U/ as a weaker CG difference for short "oo." She assessed assimilation patterns directly via a keyword identification task, in which listeners had to choose from a list of words that reflected the inventory of English vowels (e.g., hid, hoed, heed, heard, etc.) to characterize the perceived closest match for each non-native vowel. Discrimination was very good for both German contrasts, but significantly better for /y/-/u/ than for /Y/-/U/, which Polka interpreted to be consistent with PAM's predictions.

Finally, in a recent study completed in my laboratory (Best, Faber, & Levitt, in preparation), English-speaking adults were presented with three French vowel contrasts, two Norwegian contrasts, and a Thai contrast. Results for the non-native vowel contrasts were as follows. French high front-rounded /y/ vs. mid front-rounded /œ/ were
generally assimilated as the TC contrast long vs. short "oo" (boot-book), and French /œ/ vs. less rounded French schwa /ə/ were generally assimilated as the TC short "oo"-"uh." Both were discriminated very well. Similarly, the Norwegian high front in-rounded /ʉ/ and high front unrounded /i/ were assimilated unanimously as the TC contrast short "oo"-"ee" and were also discriminated perfectly. French /o/-/õ/ (nasalized "o") were assimilated as either a strong CG difference for English "o" or as a TC contrast (e.g., "o"-"aw") and were discriminated very well. Thai high back unrounded /ɯ/ and high mid-back unrounded /ɤ/ were assimilated as either a moderate CG difference for English "uh" or, for some subjects, to the TC contrast short "oo"-"uh" and were discriminated slightly less well than the other TC and CG contrasts. Finally, Norwegian high front out-rounded /y/ (which has less lip-rounding than French /y/: Linker, 1985) and /i/ were assimilated by nearly all subjects as comparably good /i/, that is, as an SC type; discrimination was much poorer for this contrast than for the others. When individual subjects' assimilations were grouped according to TC type vs. CG type vs. SC type, regardless of the specific non-native vowels involved, the results clearly upheld PAM predictions: discrimination was near ceiling for TC assimilations, very good but significantly lower for CG assimilations, and much lower for SC assimilations.

Three very recent findings with infants are relevant to understanding the course of perceptual learning for vowels, although only one explicitly evaluated PAM hypotheses. All three studies point to differences between vowels and consonants in the development of native-language effects on perception. In one study of 6 month olds, English-learning and Swedish-learning infants showed vowel prototype effects only for a native vowel and not for a non-native one (Kuhl et al., 1992). Comparison of this result to the vowel prototype effects found for both native and non-native vowels in English- versus Spanish-learning newborns (Walton & Socotch, 1993) suggests a developmental decline between birth and 6 months in detecting goodness-of-fit differences for unfamiliar vowel categories. This suggests that the invariants detected in native vowels by 6 month olds vs. younger infants are different, a possibility supported by a third recent finding. Both German CG vowel contrasts from the Polka (submitted) adult study described above were discriminated by 4-1/2 month olds, who showed no asymmetry in discrimination between the more English-like and the less English-like vowel in each pair. That is, there was no vowel prototype effect on discrimination. However, by 6 months of age, infants discriminated the German vowels only if the habituation or background stimulus was a non-prototype for English (according to the adult judgments), consistent with greater generalization to the prototype than the non-prototype. By 10-12 months, discrimination of both German contrasts failed regardless of the direction of stimulus change (Polka & Werker, in press). The results provide another example of non-native contrasts that are discriminated quite well as CG contrasts by adults in the infants' language environment but which are not discriminated by infants over a certain age, the developmental pattern that was found for discrimination of Zulu /k/-/k'/ (Best et al., 1990). Taken together, the infant vowel perception findings suggest that native language effects appear earlier for perceptual prototype effects for non-native vowels (around 6 months) than for discrimination of non-native consonant contrasts (around 10-12 months). The argument offered here is that infants discover relational invariants associated with native vowels earlier than higher-order invariants associated with native consonants.

Why do infants show changes in perception of non-native vowels earlier than consonants? Why does the emergence of native-language effects on vowel perception but not consonant perception precede infants' earliest word-meaning associations? Both observations suggest that the invariants infants first discover in native vowels are simpler and/or easier to detect than those discovered in native consonants. There are a number of possible reasons for this developmental asymmetry. Vowel invariants may be easier to discover because the slower vowel gestures are more stable within the flow of information and are evident over a longer period of time than consonants. Different gestural invariants may be extracted for the two classes because the style and complexity of articulatory movements differ. Vowels also carry the prosody of an utterance. Thus the information for vowel invariants may be salient to the young infant at the broader and more attention-getting prosodic level of sound structure in utterances.

Further work on language-specific attunement to speech

Generally, the findings on adults' and infants' perception of non-native segmental contrasts fit well with the Perceptual Assimilation Model and the basic principles of an ecological approach to
perceptual learning of the information in native speech. However, a number of important questions remain unanswered, and must be pursued in future research. For example, we still do not know how or even whether infants actually assimilate non-native sounds to native phonetic categories. Nor do we know which features or invariants they actually extract from either native or non-native speech. Generating the methodology for assessing these issues will not be easy. Ultimately, techniques will also be needed to investigate the development of perceptual sensitivity to more abstract phonological properties such as allophonic relations, allomorphy (e.g., the voiceless vs. voiced plural marker in cats vs. dogs), and grammatical effects on phonetic forms (e.g., unreleased /t/ in sit vs. flap in sitting).

Indeed, it is still largely unknown exactly what information is captured in the invariants for adult speech perception, especially the higher-order invariants, although cross-modal speech perception research indicates that the crucial information is gestural in nature, and is not specified in purely auditory terms but rather is amodal (e.g., Fowler & Dekle, 1991; Summerfield, 1978; Walton & Bower, in press). Much more work will be needed on this issue, which should benefit from the ecological approach to speech production and its phonological organization (e.g., Browman & Goldstein, 1989, 1990a, 1992a; Kelso, Saltzman, & Tuller, 1986; Saltzman & Munhall, 1989). It seems likely that characterizing the invariants in speech perception will depend on careful mathematical and physical analyses, as it has in other domains where, for example, a single parameter (termed Tau) has been mathematically determined to be the singular invariant that specifies time to contact for an observer moving toward an object (Lee, 1976; Lee, Young, & Rewt, 1992) or for a projectile moving toward an observer (Savelsbergh, Whiting, & Bootsma, 1991; see also Michaels & Oudejans, 1992), including audible but unseen objects rolling toward a listener (Shaw, McGowan, & Turvey, 1991).

In searching out the higher-order invariants for perception of native and non-native speech, it will probably be necessary also to view the native phonology as an organized system. That is, ultimately it will be important to conceive of the perceptual effects of phonological differences between languages more comprehensively, as effects of systemic differences, and not simply differences in elements or contrasts that one language has and another lacks. This caveat is motivated by proposals that phonological systems are self-organizing, and specifically that this leads to maximal dispersion among the elements of language-specific phonological inventories (Lindblom, 1992; Lindblom, Krull, & Stark, 1993; Lindblom, MacNeilage, & Studdert-Kennedy, 1983). But even that work has not addressed how the "optimization of phonetic space" by a language might be expected to affect a listener's perception of particular non-native contrasts. However, as Lindblom points out (Lindblom, Krull, & Stark, 1993), the principle of maximal dispersion would benefit the learning of the native sound system by drastically reducing the size of the phonetic space that must be explored to discover the sound patterning of the ambient language. The relationships among elements in the system would help to illuminate precisely which differences are critical in the language, and thereby reduce the information that must be picked up subsequently by the perceiver. The Perceptual Assimilation Model is quite amenable to the conception of the phonological system as an optimization of phonetic space by a given language, but further effort is obviously needed to work out the implications in detail.

CONCLUSION

What is innate about the development of the phonological component of a language's grammar? That is, what is it that provides the constraints on acquisition of possible phonological systems? By the ecological reasoning presented in this chapter, the answer is that what is innate (what provides the constraints on phonologies and their development) is the structure and dynamic possibilities of the human vocal tract. To a first approximation, this claim is in line with the underlying assumptions of Chomsky and Halle themselves, whose universal phonetic features were initially based on articulatory concepts. The point on which I disagree with them is their assumption that the constraints are specified innately in the mind. By the ecological view proposed here, the constraints are, instead, literally in the physical head, in the vocal tract itself and in the lawful physical effects that its configuration and movements have on the temporally-varying shape of its acoustic product.

Chomsky and Halle (1968) were correct in suggesting that the listener who knows a language hears the phonetic shapes made familiar by experience with that language. This claim, I have argued, can be extended even to predict that the listener hears echoes of those familiar, native phonetic shapes in the non-native sounds and contrasts of unfamiliar languages. But I part ways
with their reasoning about the causal mechanisms, and about the source of listeners' knowledge. Instead, I claim that listeners hear the phonological structure of their native language in non-native speech because they have learned to detect the gestural invariants that are directly available in the information flow from the language environment. Listeners become attuned to these gestural patterns and pick up the invariants specifying those familiar patterns wherever the stimulation provides criterial evidence for them, even in non-native sounds. This attunement to native gestural invariants begins in infancy but extends over development and into adulthood, where it should even help to account for perceptual changes during the learning of additional languages.

REFERENCES

Abbs, J., & Gracco, V. (1984). Control of complex motor gestures: Orofacial muscle responses to load perturbations of the lip during speech. Journal of Neurophysiology, 51, 705-723.
Anderson, S. R. (1985). Phonology in the twentieth century: Theories of rule and theories of representations. Chicago: University of Chicago Press.
Archangeli, D. (1988). Aspects of underspecification theory. Phonology, 5, 183-207.
Archangeli, D., & Pulleyblank, D. (in press). The content and structure of phonological representations. Cambridge, MA: MIT Press.
Aslin, R. N. (1987). Visual and auditory development in infancy. In J. D. Osofsky (Ed.), Handbook of infant development (Vol. 1, pp. 5-97). New York: Wiley.
Aslin, R. N., Pisoni, D. B., & Jusczyk, P. W. (1983). Auditory development and speech perception in infancy. In M. Haith & J. Campos (Eds.), Handbook of child psychology Vol. 2: Infancy and developmental psychobiology. New York: Wiley.
Berko, J. (1958). The child's learning of English morphology. Word, 14, 150-177.
Bernstein Ratner, N. (1984). Patterns of vowel modification in mother-child speech. Journal of Child Language, 11, 557-578.
Bernstein Ratner, N. (1986). Durational cues which mark clause boundaries in mother-child speech. Journal of Phonetics, 14, 303-309.
Bernstein Ratner, N. (1993). Interactive influences on phonological behavior: A case study. Journal of Child Language, 20, 191-197.
Bernstein Ratner, N., & Luberoff, A. (1984). Cues to post-vocalic voicing in mother-child speech. Journal of Phonetics, 12, 285-289.
Bernstein Ratner, N., & Pye, C. (1984). Higher pitch in BT is not universal: Acoustic evidence from Quiche Mayan. Journal of Child Language, 11, 515-522.
Bertoncini, J., Bijeljac-Babic, R., Jusczyk, P. W., Kennedy, L. J., & Mehler, J. (1988). An investigation of young infants' perceptual representations of speech sounds. Journal of Experimental Psychology: General, 117, 21-33.
Best, C. T. (1984). Discovering messages in the medium: Speech and the prelinguistic infant. In H. E. Fitzgerald, B. Lester, & M. Yogman (Eds.), Advances in pediatric psychology (Vol. 2, pp. 97-145). New York: Plenum.
Best, C. T. (1990). Adult perception of nonnative contrasts differing in assimilation to native phonological categories. Journal of the Acoustical Society of America, 88, S177.
Best, C. T. (1991). Phonetic influences on the perception of nonnative speech contrasts by 6-8 and 10-12 month-olds. Presented at the meeting of the Society for Research in Child Development. Seattle WA, April.
Best, C. T. (1993). Emergence of language-specific constraints in perception of non-native speech: A window on early phonological development. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. MacNeilage, & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 289-304). Dordrecht, the Netherlands: Kluwer Academic Publishers.
Best, C. T. (in press a). The emergence of native-language phonological influences in infants: A perceptual assimilation model. To appear in J. Goodman & H. C. Nusbaum (Eds.), The development of speech perception: The transition from speech sounds to spoken words. Cambridge MA: MIT Press.
Best, C. T. (in press b). A direct realist view of cross-language speech perception. To appear in W. Strange (Ed.), Speech perception and linguistic experience: Theoretical and methodological issues in cross-language speech research. Timonium, MD: York Press.
Best, C. T., Faber, A., & Levitt, A. (in preparation). Association between adults' perceptual assimilation and discrimination of diverse non-native vowel contrasts.
Best, C. T., Goodell, E., & Wilkenfeld, D. (in preparation). Phonologically-motivated substitutions in a 20-22 month old's imitations of intervocalic alveolar stops.
Best, C. T., & McRoberts, G. (1989). Phonological influences on infants' perception of two non-native speech contrasts. Presented at the Society for Research in Child Development, Kansas City, April.
Best, C. T., McRoberts, G. W., & Sithole, N. N. (1988). The phonological basis of perceptual loss for non-native contrasts: Maintenance of discrimination among Zulu clicks by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345-360.
Best, C. T., McRoberts, G. W., Goodell, E., Womer, J. S., Insabella, G., Kim, P., Klatt, L., Luke, S., & Silver, J. (1990). Infant and adult perception of nonnative speech contrasts differing in relation to the listener's native phonology. Presented at the meeting of the International Conference on Infant Studies. Montreal, April.
Best, C. T., & Strange, W. (1992). Effects of phonological and phonetic factors on cross-language perception of approximants. Journal of Phonetics, 20, 305-330.
Bialystock, E. (1988). Levels of bilingualism and levels of linguistic awareness. Developmental Psychology, 24, 560-567.
Bloch, B. (1950). Studies in colloquial Japanese IV: Phonemics. Language, 26, 86-125.
Bohannon, MacWhinney, B., & Snow, C. (1990). No negative evidence revisited: Beyond learnability or who has what to prove to whom. Developmental Psychology, 26, 221-226.
Bornstein, M. H. (1979). Perceptual development: Stability and change in feature perception. In M. H. Bornstein & W. Kessen (Eds.), Psychological development from infancy. Hillsdale NJ: Erlbaum.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.
Browman, C. P., & Goldstein, L. (1990a). Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 18, 411-424.
Browman, C. P., & Goldstein, L. (1990b). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
Browman, C. P., & Goldstein, L. (1990c). Tiers in articulatory Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971).
phonology, with some implications for casual speech. In Speech perception in infants. Science, 171, 303-306.
J. Kingston & M. E. Beckman (Eds.), Papers in laboratory Elman, J. L., Diehl, R. L., & Buchwald, S. E. (1977). Perceptual
phonology I: Between the grammar and physics of speech (pp. 341- switching in bilinguals. Journal of the Acoustical Society of
376). Cambridge, UK: Cambridge University Press. America, 62, 971-974.
Browman, C. P., & Goldstein, L. (1992a). Articulatory phonology: Faber, A. (1992). Articulatory variability, categorical perception,
An overview. Phonetica, 49, 155-180. and the inevitability of sound change. In G. W. Davis & G. K.
Browman, C. P., & Goldstein, L. (1992b). Response to Iverson (Eds.), Explanation in historical linguistics (pp. 59-75).
commentaries. Phonetica, 49, 222-234. Amsterdam: John Benjamins Publishing Co.
Carney, A. E., Widin, G. P., & Viemeister, N. F. (1977). Faber, A., Best, C. T., & Di Paolo, M. (1993). Cross-dialect
Noncategorical perception of stop consonants differing in VOT. perception of nearly-merged forms. Presented at the meeting of
Journal of the Acoustical Society of America, 62, 961-970. the Linguistic Society of America. Los Angeles, CA, January.
Catford, J. C. (1988). A practical introduction to phonetics. Oxford: Faber, A., Di Paolo, M., & Best, C. T. (submitted). Perceiving the
Clarendon Press. unperceivable: The acquisition of nearly-merged forms.
Chambers, J. K. (1992). Dialect acquisition. Language, 68, 673-705. Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge MA: early language acquisition. Language, 51, 419-439.
MIT Press. Ferguson, C. A. (1986). Discovering sound units and constructing
Chomsky, N. (1972). Language and mind. Cambridge MA: Harcourt sound systems: It's child's play. In J. S. Perkell & D. H. Klatt
Brace Jovanovich, Inc. (Eds.), Invariance and variability of speech processes (pp. 36-53).
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New Hillsdale, NJ: Erlbaum.
York: Harper & Row. Fernald, A. (1984). The perceptual and affective salience of
Christophe, A., Dupoux, E., Bertoncini, J., & Mehler, J. mother's speech to infants. In L. Feagans, C. Garvey, & R.
(submitted). Do infants perceive word boundaries? An Golinkoff (Eds.), The origins and growth of communication (pp. 5-
empirical approach to the bootstrapping problem for lexical 29). Norwood, NJ: Ablex.
acquisition. Fernald, A. (1985). Four-month-old infants prefer to listen to
Clements, G. N. (1985). The geometry of phonological features. motherese. Infant Behavior and Development, 8, 181-195.
Phonology Yearbook, 2, 225-252. Fernald, A., & Kuhl, P. K. (1982). [Discrimination of five-syllable
Clements, G. N. (1992). Phonological primes: Features or sequences (galasalaga versus galatalaga) by 6 month
gestures? Phonetica, 49, 181-193. old infants]. Unpublished findings, cited in Karzon, R. G.
Cohn, A. (1990). Phonetic and phonological rules of nasalization. (1985).
UCLA Working Papers, 76, May. Fernald, A., & Kuhl, P. K. (1987). Acoustic determinants of infant
Cooper, R. P., & Aslin, R. N. (1990). Preference for infant-directed preference for motherese speech. Infant Behavior and
speech in the first month after birth. Child Development, 61, Development, 10, 279-293.
1584-1595. Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to
Costa, P., & Mattingly, I. G. (1981). Production and perception of infants and adults. Developmental Psychology, 27, 209-221.
phonetic contrast during phonetic change. Journal of the Fernald, A., & Simon, T. (1984). Expanded intonation contours in
Acoustical Society of America, 69, S67 (abstract). mothers' speech to newborns. Developmental Psychology, 20, 104-
Cruttenden, A. (1974). An experiment involving comprehension 113.
of intonation in children from 7 to 10. Journal of Child Language, Fernald, A., Taeschner, T., Dunn, J., Papousek, M., Boysson-
1, 221-231. Bardies, B., & Fukui, I. (1990). A cross-language study of
Dent, C. H. (1990). An ecological approach to language prosodic modifications in mothers' and fathers' speech to
development: An alternative functionalism. Developmental preverbal infants. Journal of Child Language, 16, 477-502.
Psychobiology, 23, 679-703. Flege, J. E. (1984). The detection of French accent by American
Dent, C., & Rader, N. (1979). Perception, meaning and research in listeners. Journal of the Acoustical Society of America, 76, 692-707.
semantic development. In P. French (Ed.), The development of Flege, J. E. (1987). A critical period for learning to pronounce
meaning: Pedolinguistic series (pp. 178-230). Japan: Bunka foreign languages? Applied Psycholinguistics, 8, 162-177.
Hyoron. Flege, J. E. (1989). Chinese subjects' perception of the word-final
Derwing, B. (1973). Transformational grammar as a theory of language English /t/-/d/ contrast: Before and after training. Journal of
acquisition. Cambridge, UK: Cambridge University Press. the Acoustical Society of America, 86, 1684-1697.
Di Paolo, M. (1992). Hypercorrection in response to apparent Flege, J. E. (1991a). Perception and production: The relevance of
merger of /ɑ/ and /ɔ/ in Utah English. Language & phonetic input to L2 phonological learning. In T. Huebner & C.
Communication, 12, 267-292. Ferguson (Eds.), Cross-currents in second language acquisition and
Di Paolo, M., & Faber, A. (1991). Phonation differences and the linguistic theory. Philadelphia: John Benjamins.
phonetic content of the tense-lax contrast in Utah English. Flege, J. E. (1991b). Orthographic evidence for the perceptual
Language Variation and Change, 2, 155-204. identification of vowels in Spanish and English. Quarterly
Eilers, R. E. (1977). Context-sensitive perception of naturally Journal of Experimental Psychology, 43, 701-731.
produced stop and fricative consonants by infants. Journal of the Flege, J. E. (in press). Second-language speech learning: Theory,
Acoustical Society of America, 61, 1321-1336. findings, and problems. To appear in W. Strange (Ed.), Speech
Eilers, R. E., Gavin, W. J., & Oller, D. K. (1982). Cross-linguistic perception and linguistic experience: Theoretical and methodological
perception in infancy: Early effects of linguistic experience. issues. Timonium, MD: York Press.
Journal of Child Language, 9, 289-302. Flege, J. E., & Eefting, W. (1987). The production and perception
Eilers, R. E., & Minifie, F. D. (1975). Fricative discrimination in of English stops by Spanish speakers of English. Journal of
early infancy. Journal of Speech and Hearing Research, 18, 158-167. Phonetics, 15, 67-83.
Eilers, R. E., Wilson, W. R., & Moore, J. M. (1977). Developmental Flege, J. E., & Fletcher, J. (1992). Listener and talker effects on the
changes in speech discrimination in infants. Journal of Speech perception of degree of foreign accent. Journal of the Acoustical
and Hearing Research, 20, 766-780. Society of America, 91, 370-389.

Fodor, J. A., Garrett, M. F., & Brill, S. L. (1975). Pi Ka Pu: The Gibson, J. J., & Gibson, E. J. (1955). Perceptual learning:
perception of speech sounds by pre-linguistic infants. Perception Differentiation or enrichment? Psychological Review, 62, 32-41.
& Psychophysics, 18, 74-78. Gillette, S. (1980). Contextual variation in the perception of L and
Fourakis, M., & Port, R. (1986). Stop epenthesis in English. Journal R by Japanese and Korean speakers. Minnesota Papers in
of Phonetics, 14, 197-221. Linguistics and the Philosophy of Language, 6, 59-72.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic Gleitman, L., Gleitman, H., Landau, B., & Wanner, E. (1988).
timing. Journal of Phonetics, 8, 113-133. Where learning begins: Initial representations for language
Fowler, C. A. (1986). An event approach to the study of speech learning. In F. Newmeyer (Ed.), The Cambridge Linguistic Survey.
perception from a direct-realist perspective. Journal of Phonetics, Cambridge MA: MIT Press.
14, 3-28. Goldring (Zukow), P. (1991). Early steps toward language: How
Fowler, C. A. (1989). Real objects of speech perception: A social affordances educate attention. Presented at meeting of
commentary on Diehl and Kluender. Ecological Psychology, 1, the International Conference on Event Perception and Action.
145-160. Amsterdam, the Netherlands, August.
Fowler, C. A. (1991). Sound-producing sources as objects of Goldsmith, J. (1976). Autosegmental phonology. Unpublished
perception: Rate normalization and nonspeech perception. doctoral dissertation, MIT University.
Journal of the Acoustical Society of America, 88, 1236-1249. Goldstein, L., & Browman, C. P. (1986). Representation of voicing
Fowler, C. A., Best, C. T., & McRoberts, G. W. (1990). Young infants' contrasts using articulatory gestures. Journal of Phonetics, 14,
perception of liquid coarticulatory influences on following 339-342.
stop consonants. Perception & Psychophysics, 48, 559-570. Goodell, E., & Studdert-Kennedy, M. (1990). From phonemes to
Fowler, C. A., & Dekle, D. J. (1991). Listening with eye and hand: words or words to phonemes: How do children learn to talk?
Cross-modal contributions to speech perception. Journal of Presented at meeting of the International Conference on Infant
Experimental Psychology: Human Perception and Performance, 17, Studies, Montreal, April.
816-828. Goodsitt, J. V., Morse, P. A., Ver Hoeve, J. N., & Cowan, N. (1984).
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Infant speech recognition in multisyllabic utterances. Child
Implications for speech production of a general theory of Development, 55, 903-910.
action. In B. Butterworth (Ed.), Speech production (pp. 373-420). Goto, H. (1971). Auditory perception by normal Japanese adults
New York: Academic Press. of the sounds "L" and "R." Neuropsychologia, 9, 317-323.
Fowler, C. A., & Smith, M. (1986). Speech perception as "vector Grieser, D. L., & Kuhl, P. K. (1988). Maternal speech to infants in a
analysis": An approach to the problems of segmentation and tonal language: Support for universal prosodic features in
invariance. In J. Perkell & D. H. Klatt (Eds.), Invariance and motherese. Developmental Psychology, 24(1), 14-20.
variability in speech processes (pp. 123-139). Hillsdale, NJ: Grieser, D. L., & Kuhl, P. K. (1989). Categorization of speech in
Erlbaum. infants: Support for speech-sound prototypes. Developmental
Friederici, A. D., & Wessels, J. M. I. (in press). Phonotactic Psychology, 25, 577-588.
knowledge and its use in infant speech perception. Perception & Hillenbrand, J. (1983). Perceptual organization of speech sounds
Psychophysics. by infants. Journal of Speech and Hearing Research, 26, 268-282.
Gerken, L. A., Landau, B., & Remez, R. E. (1990). Function Hillenbrand, J. (1984). Speech perception by infants:
morphemes in young children's speech perception and Categorization based on nasal consonant place of articulation.
production. Developmental Psychology, 27, 204-216. Journal of the Acoustical Society of America, 75, 1613-1622.
Gerken, L., & McIntosh, B. J. (1993). Interplay of function Hirsh-Pasek, K., Kemler Nelson, D., Jusczyk, P., Wright Cassidy,
morphemes and prosody in early language. Developmental K., Druss, B., & Kennedy, L. (1987). Clauses are perceptual
Psychology, 29, 448-457. units for young infants. Cognition, 26, 269-286.
Gibson, E. J. (1963). Perceptual learning. Annual Review of Hockett, C. F. (1963). The problem of universals in language. In J.
Psychology, 14, 29-56. H. Greenberg (Ed.), Universals in language. Cambridge, MA:
Gibson, E. J. (1966). Perceptual development and the reduction of MIT Press.
uncertainty. In Proceedings of the 18th International Congress of von Hofsten, C. (1980). Predictive reaching for moving objects by
Psychology, 7-17. human infants. Journal of Experimental Child Psychology, 30, 369-
Gibson, E. J. (1969). Principles of perceptual learning and development. 382.
Englewood Cliffs, NJ: Prentice-Hall, Inc. Hohne, E. A., & Jusczyk, P. W. (1992). Allophonic variation and
Gibson, E. J. (1971). Perceptual learning and the theory of word word segmentation in infant speech perception. Paper
perception. Cognitive Psychology, 2, 351-368. presented at the International Conference on Infant Studies, Miami,
Gibson, E. J. (1977). How perception really develops: A view from FL, May.
outside the system. In D. LaBerge & S. J. Samuels (Eds.), Basic Holm, J. A. (1988). Pidgins and creoles. Volume I: Theory and
processes in Reading: Perception and Comprehension. Hillsdale, NJ: structure. Cambridge, UK: Cambridge University Press.
Erlbaum Associates. Holmberg, T. L., Morgan, K. A., & Kuhl, P. K. (1977). Speech
Gibson, E. J. (1988). Exploratory behavior in the development of perception in early infancy: Discrimination of fricative
perceiving, acting, and the acquiring of knowledge. Annual consonants. Paper presented at the meeting of The Acoustical
Review of Psychology, 39, 1-49. Society of America, Miami Beach FL (December).
Gibson, E. J. (1991). An odyssey in learning and perception. Horowitz, F. D. (1975). Visual attention, auditory stimulation, and
Cambridge, MA: Bradford Books (MIT Press). language discrimination in infants. Monographs of the Society for
Gibson, E. J., & Gibson, J. J. (1972). The senses as information- Research in Child Development, 39, Serial # 159.
seeking systems. (London) Times Literary Supplement, June 23, Inkelas, S., & Leben, W. (1990). Where phonology and phonetics
711-712. intersect. In J. Kingston & M. E. Beckman (Eds.), Papers in
Gibson, J. J. (1966). The senses considered as perceptual systems. laboratory phonology I: Between the grammar and physics of speech
Boston, MA: Houghton-Mifflin. (pp. 341-376). Cambridge, UK: Cambridge University Press.
Gibson, J. J. (1979). The ecological approach to visual perception. Insabella, G., & Best, C. T. (1990). Four-year-olds' perception of
Boston, MA: Houghton-Mifflin. nonnative contrasts differing in phonological assimilation.

Presented at meeting of the Acoustical Society of America, San Kelso, J. A. S., Saltzman, E. L., & Tuller, B. (1986). The dynamical
Diego, November. perspective on speech production: Data and theory. Journal of
Ito, J. (1986). Syllable theory in prosodic phonology. Unpublished Phonetics, 14, 29-59.
doctoral dissertation, University of Massachusetts, Amherst. Kemler Nelson, D. G., Hirsh-Pasek, K., Jusczyk, P. W., & Wright-
Jakobson, R., Fant, G., & Halle, M. (1963). Preliminaries to speech Cassidy, K. (1989). How the prosodic cues in motherese might
analysis. Cambridge, MA: MIT Press. assist language learning. Journal of Child Language, 16, 55-68.
Jakobson, R., & Halle, M. (1957). Phonology in relation to Kenstowicz, M., & Kisseberth, C. (1979). Generative phonology.
phonetics. In L. Kaiser (Ed.), Manual of phonetics (pp. 215-251). New York: Academic Press.
Amsterdam: North Holland. Kent, R. D., Carney, P. J., & Severeid, L. R. (1974). Velar
Jamieson & Morosan (1986). Training non-native speech contrasts movement and timing: Evaluation of a model for binary
in adults: Acquisition of the English /ð/-/θ/ contrast by control. Journal of Speech and Hearing Research, 17, 175-177.
francophones. Perception & Psychophysics, 40, 205-215. Krakow, R., Beddor, P. S., Goldstein, L., & Fowler, C. A. (1988).
Jeffreys, W. H., & Berger, J. O. (1992). Ockham's razor and Coarticulatory influences in the perceived height of nasal
Bayesian analysis. American Scientist, 80, 64-72. vowels. Journal of the Acoustical Society of America, 83, 1146-1158.
Jusczyk, P. W. (1992). Developing phonological categories from Kugler, P. N., Kelso, J. A. S., & Turvey, M. T. (1982). On the
the speech signal. In C. A. Ferguson, L. Menn, & C. Stoel-
Gammon (Eds.), Phonological development: Models, research, A. S. Kelso & J. E. Clark (Eds.), The development of movement
implications (pp. 17-64). Timonium, MD: York Press. control and coordination (pp. 5-78). Chichester, UK: John Wiley.
Jusczyk, P. W. (1993). Sometimes it pays to look back before you Kuhl, P. K. (1979). Speech perception in early infancy: Perceptual
leap ahead. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, constancy for spectrally dissimilar vowels. Journal of the
P. MacNeilage, & J. Morton (Eds.), Developmental neurocognition: Acoustical Society of America, 66, 1668-1679.
Speech and face processing in the first year of life (pp. 289-304). Kuhl, P. K. (1980). Perceptual constancy for speech-sound
Dordrecht, the Netherlands: Kluwer Academic Publishers. categories in early infancy. In G. H. Yeni-Komshian, J. F.
Jusczyk, P. W. (1994). Infant speech perception and the Kavanaugh, & C. A. Ferguson (Eds.), Child phonology: Vol. 2:
development of the mental lexicon. In J. Goodman & H. Perception (pp. 41-66). New York: Academic Press.
Nusbaum (Eds.), The development of speech perception: The Kuhl, P. K. (1983). Perception of auditory equivalence classes for
transition from speech sounds to spoken words (pp. 227-270). speech in early infancy. Infant Behavior and Development, 6, 263-
Cambridge, MA: MIT Press. 285.
Jusczyk, P. W., Bertoncini, J., Bijeljac-Babic, R., Kennedy, L., & Kuhl, P. K. (1987). Perception of speech and sound in early
Mehler, J. (1990). The role of attention in speech perception by infancy. In P. Salapatek & L. Cohen (Eds.), Handbook of infant
infants. Cognitive Development, perception (Vol. 2, pp. 275-382). New York: Academic Press.
Jusczyk, P. W., Charles-Luce, J., & Luce, P. A. (submitted). Infants' Kuhl, P. K. (1991). Human adults and human infants show a "perceptual
sensitivity to high frequency vs. low frequency phonetic magnet effect" for the prototypes of speech categories,
sequences in the native language. monkeys do not. Perception & Psychophysics, 50, 93-107.
Jusczyk, P. W., Copan, H., & Thompson, E. (1978). Perception by 2- Kuhl, P. K., & Miller, J. D. (1982). Discrimination of auditory
month-old infants of glide contrasts in multisyllabic utterances. target dimensions in the presence or absence of variation in a
Perception & Psychophysics, 24, 515-520. second dimension by infants. Perception & Psychophysics, 31,
Jusczyk, P. W., Cutler, A., & Redanz, N. (1993). Preference for the 279-292.
predominant stress patterns of English words. Child Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N., &
Development, 64, 675-687. Lindblom, B. (1992). Linguistic experience alters phonetic
Jusczyk, P. W., & Derrah, C. (1987). Representation of speech perception in infants by 6 months of age. Science, 255, 606-608.
sounds by young infants. Developmental Psychology, 23, 648-654. Labov, W. (1974). On the use of the present to explain the past. In
Jusczyk, P. W., Friederici, A. D., Wessels, J., Svenkerud, V. Y., & Proceedings of the 11th International Congress of Linguists (pp. 825-
Jusczyk, A. M. (1993). Infants' sensitivity to the sound structure 852). Bologna: Società Editrice Il Mulino.
of native language words. Journal of Memory and Language, 32, Labov, W., Karen, M., & Miller, C. (1991). Near mergers and the
402-420. suspension of linguistic contrast. Language Variation and Change,
Jusczyk, P. W., & Kemler Nelson, D. G. (in press). Syntactic units, 3, 33-74.
prosody, and psychological reality during infancy. To appear Labov, W., Yaeger, M., & Steiner, R. C. (1972). A quantitative
in J. L. Morgan & K. D. Demuth (Eds.), Signal to syntax. study of sound change in progress. In Report on NSF contract
Hillsdale, NJ: Erlbaum. GS-33287. Philadelphia: U. S. Regional Survey.
Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. (1992). Some Ladefoged, P. (1982). A course in phonetics. New York: Harcourt-
consequences of stimulus variability on speech processing by Brace-Jovanovich.
two-month-olds. Cognition, 43, 253-291. Leben, W. (1978). The representation of tone. In V. Fromkin (Ed.),
Jusczyk, P. W., & Thompson, E. (1978). Perception of a phonetic Tone: A linguistic survey. New York: Academic Press.
contrast in multisyllabic utterances by 2-month-old infants. Lee, D. N. (1976). A theory of visual control of braking based on
Perception & Psychophysics, 23, 105-109. information about time-to-collision. Perception, 5, 437-459.
Kahn, D. (1980). Syllable-based generalizations in English phonology. Lee, D. N., Young, D. S., & Rewt, D. (1992). How do somersaulters
New York: Garland Press. land on their feet? Journal of Experimental Psychology: Human
Karzon, R. G. (1985). Discrimination of polysyllabic sequences by Perception and Performance, 18, 1195-1202.
one- to four-month-old infants. Journal of Experimental Child Levitt, A., Jusczyk, P., Murray, J., & Carden, G. (1988). Context effects
Psychology, 39, 326-342. in two-month-old infants' perception of labiodental/interdental
Keating, P. A. (1988). The phonology-phonetics interface. In F. fricative contrasts. Journal of Experimental
Newmeyer (Ed.), Linguistics: The Cambridge survey. Vol. I: Psychology: Human Perception and Performance, 14, 361-368.
Grammatical theory (pp. 281-302). Cambridge, UK: Cambridge Liberman, A. M. (1992). The relation of speech to reading and
University Press. writing. In R. Frost & L. Katz (Eds.), Orthography, phonology,
Keating, P. A. (1990). Phonetic representations in a generative morphology, and meaning. Amsterdam: Elsevier Science
grammar. Journal of Phonetics, 18, 321-334. Publishers B. V.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of McCune, L. (1992). First words: A dynamic systems view. In C. A.
speech perception revised. Cognition, 21, 1-36. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological
Lindau, M. (1984). Phonetic differences in glottalic consonants. development: Models, research, implications (pp. 313-336).
Journal of Phonetics, 12, 147-155. Timonium, MD: York Press.
Lindblom, B. (1992). Phonological units as adaptive emergents of McCune, L., & Vihman, M. (1987). Vocal motor schemes. Papers
lexical development. In C. A. Ferguson, L. Menn, & C. Stoel- and Reports in Child Language Development, 26, 72-79.
Gammon (Eds.), Phonological development: Models, research, McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing
implications (pp. 131-164). Timonium, MD: York Press. voices. Nature, 264, 746-748.
Lindblom, B., Krull, D., & Stark, J. (1993). Phonetic systems and McLaughlin, B. (1978). Second-language acquisition in childhood.
phonological development. In B. de Boysson-Bardies, S. de Hillsdale, NJ: Erlbaum.
Schonen, P. Jusczyk, P. MacNeilage, and J. Morton (Eds.), Menn, L. (1986). Language acquisition, aphasia and phonotactic
Developmental neurocognition: Speech and face processing in the first universals. In F. R. Eckman, E. A. Moravcsik, & J. R.
year of life (pp. 399-409). Dordrecht, the Netherlands: Kluwer Wirth (Eds.), Markedness (pp. 241-255). New York: Plenum
Academic Publishers. Press.
Lindblom, B., MacNeilage, P., & Studdert-Kennedy, M. (1983). Menn, L., & Matthei, E. (1992). The "two-lexicon" account of child
Self-organizing processes and the explanation of phonological phonology: Looking back, looking ahead. In C. A. Ferguson, L.
universals. In B. Butterworth, B. Comrie, & O. Dahl (Eds.), Menn, & C. Stoel-Gammon (Eds.), Phonological development:
Universals Workshop (pp. 181-203). The Hague: Mouton. Models, research, implications (pp. 211-247). Timonium, MD:
Linker, W. (1985). A cross-linguistic study of lip position in York Press.
vowels. UCLA Working Papers in Phonetics, 51, 1-35. Michaels, C. F., & Oudejans, R. R. D. (1992). The optics and
Lisker, L., & Abramson, A. S. (1970). The voicing dimension: actions of catching fly balls: Zeroing out optical acceleration.
Some experiments on comparative phonetics. Proceedings of the Ecological Psychology, 4, 199-222.
6th International Congress of Phonetic Sciences. Prague: Academia. Miller, C. L. (1983). Developmental changes in male-female voice
Lloyd, V. L., Werker, J. F., & Cohen, L. B. (1993). Age changes in classification by infants. Infant Behavior and Development, 6, 313-
infants' ability to associate words with objects. Presented at 330.
meeting of Society for Research in Child Development. New Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A. M.,
Orleans, LA, March. Jenkins, J. J., & Fujimura, O. (1975). An effect of linguistic
Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese experience: The discrimination of [r] and [l] by native speakers
listeners to identify English /r/ and /l/: A first report. Journal of Japanese and English. Perception & Psychophysics, 18, 331-340.
of the Acoustical Society of America, 89, 874-886. Mochizuki, M. (1981). The identification of /r/ and /l/ in natural
MacKain, K. S., Best, C. T., & Strange, W. (1981). Categorical and synthesized speech. Journal of Phonetics, 9, 283-303.
perception of English /r/ and /l/ by Japanese bilinguals. Mohanan, K. P. (1986). The theory of lexical phonology. Boston: D.
Applied Psycholinguistics, 2, 369-390. Reidel Publishing Company.
Macken, M. A. (1992). Where's phonology? In C. A. Ferguson, L. Mohanan, K. P. (1992). Emergence of complexity in phonological
Menn, & C. Stoel-Gammon (Eds.), Phonological development: development. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon
Models, research, implications (pp. 249-269). Timonium, MD: (Eds.), Phonological development: Models, research, implications
York Press. (pp. 635-662). Timonium, MD: York Press.
Macken, M., & Ferguson, C. S. (1983). Cognitive aspects of Morgan, J. L. (1990). Input, innateness, and induction in language
phonological development: Model, evidence, and issues. In K. acquisition. Developmental Psychobiology, 23, 661-678.
E. Nelson (Ed.), Children's language (pp. 255-282). Hillsdale, NJ: Newmeyer, F. J. (1980). Linguistic theory in America: The first
Lawrence Erlbaum Associates. quarter-century of transformational generative grammar. New York:
Madore, B. F., & Freeman, W. L. (1987). Self-organizing structures. Academic Press.
American Scientist, 75, 252-259. Nittrouer, S., Studdert-Kennedy, M., & McGowan, R. S. (1989).
Malsheen, B. J. (1980). Two hypotheses for phonetic clarification The emergence of phonetic segments: Evidence from the
in the speech of mothers to children. In G. H. Yeni-Komshian, J. spectral structure of fricative-vowel syllables spoken by
F. Kavanaugh, & C. A. Ferguson (Eds.), Child Phonology (Vol. 2, children and adults. Journal of Speech and Hearing Research, 32(1),
pp. 173-184). New York: Academic Press. 120-132.
Mann, V. A. (1980). Influence of preceding liquid on stop- Ohala, J. J. (1990). There is no interface between phonology and
consonant perception. Perception & Psychophysics, 28, 407-412. phonetics: A personal view. Journal of Phonetics, 18, 153-171.
Mann, V. A. (1986). Distinguishing universal and language- Oller, D. K. (1980). The emergence of the sounds of speech in
dependent levels of speech perception: Evidence from infancy. In G. Yeni-Komshian, J. F. Kavanaugh, & C. A.
Japanese listeners' perception of "l" and "r." Cognition, 24, 169- Ferguson (Eds.), Child phonology, Volume 1: Production. New
196. York: Academic Press.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., Oller, D. K., & Lynch, M. P. (1992). Infant vocalization and
& Fei, X. (1992). Overregularization in language acquisition. innovations in infraphonology: Toward a broader theory of
Monographs of the Society for Research in Child Development, 57 development and disorders. In C. A. Ferguson, L. Menn, & C.
(4), Serial # 228. Stoel-Gammon (Eds.), Phonological development: Models, research,
Martin, C. S., Mullennix, J. W., Pisoni, D. B., & Summers, W. V. implications (pp. 509-536). Timonium, MD: York Press.
(1989). Effects of talker variability on recall of spoken lists. Oshika, B. T., Zue, V. W., Weeks, R. V., Neu, H., & Ambach, J.
Journal of Experimental Psychology: Learning, Memory, and (1975). The role of phonological rules in speech understanding
Cognition, 15, 676-684. research. IEEE Transactions on Acoustics, Speech and Signal
McCarthy, J. (1986). OCP effects: Gemination and antigemination. Processing, 23, 104-112.
Linguistic Inquiry, 17, 207-263. Pierrehumbert, J. B. (1990). Phonological and phonetic
McCarthy, J. (1988). Feature geometry and dependency: A review. representation. Journal of Phonetics, 18, 375-394.
Phonetica, 43, 84-108. Pierrehumbert, J. B., & Beckman, M. E. (1988). Japanese tone
McCarthy, J. (1989). Linear order in phonological representation. structure. Cambridge, MA: MIT Press (Linguistic Inquiry
Linguistic Inquiry, 20, 71-99. Monograph Series, 15).

Pierrehumbert, J. B., & Pierrehumbert, R. T. (1990). On attributing Shaw, R., McIntyre, M., & Mace, W. (1974). The role of symmetry
grammars to dynamical systems. Journal of Phonetics, 18, 465- in event perception. In R. B. MacLeod & H. L. Pick, Jr. (Eds.),
477. Perception: Essays in honor of James J. Gibson (pp. 276-310). Ithaca,
Pisoni, D. B., Aslin, R. N., Perey, A. J., & Hennessey, B. L. (1982). NY: Cornell University Press.
Some effects of laboratory training on identification and Sheldon, A., & Strange, W. (1982). The acquisition of /r/ and /l/
discrimination of voicing contrasts in stop consonants. Journal by Japanese learners of English: Evidence that speech
of Experimental Psychology: Human Perception and Performance, 8, production can precede speech perception. Applied
297-314. Psycholinguistics, 3, 243-261.
Polka, L. (1991). Cross-language speech perception in adults: Silverman, D. (1992). Multiple scansions in loanword phonology:
Phonemic, phonetic, and acoustic contributions. Journal of the Evidence from Cantonese. Phonology, 9, 289-328.
Acoustical Society of America, 89, 2961-2977. Stark, R. E. (1980). Stages of speech development in the first year
Polka, L. (1992). Characterizing the influence of native experience of life. In G. H. Yeni-Komshian, J. F. Kavanaugh, & C. A.
on adult speech perception. Perception & Psychophysics, 52, 37- Ferguson (Eds.), Child phonology: Vol. 1: Production. (pp. 73-92).
52. New York: Academic Press.
Polka, L. (submitted). Linguistic influences in adult perception of Steriade, D. (1990). Gestures and autosegments: Comments on
non-native vowel contrasts. Browman and Goldstein's "Gestures in articulatory
Polka, L., & Werker, J. F. (in press). Developmental changes in phonology." In J. Kingston & M. Beckman (Eds.), Papers in
perception of non-native vowel contrasts. Journal of laboratory phonology I: Between the grammar and physics of speech.
Experimental Psychology: Human Perception & Performance. Cambridge UK: Cambridge University Press.
Price, P. J. (1981). A cross-linguistic study of flaps in Japanese and Stevens, K. N. (1972). The quantal nature of speech: Evidence
in American English. Unpublished doctoral dissertation. from articulatory-acoustic data. In E. E. David, Jr. & P. B. Denes
University of Pennsylvania. (Eds.), Human communication: A unified view (pp. 51-66). New
Prigogine, I. (1980). From being to becoming: Time and complexity in York: McGraw-Hill.
the physical sciences. San Francisco: W. H. Freeman & Co. Stevens, K. N. (1989). On the quantal nature of speech. Journal of
Prigogine, I., & Stengers, I. (1984). Order out of chaos: Man's new Phonetics, 17, 3-45.
dialogue with nature. Toronto: Bantam Books. Strange, W., & Dittmann, S. (1984). Effects of discrimination
Prince, A., & Smolensky, P. (1993). Optimality theory, Technical training in the perception of /r-l/ by Japanese adults learning
Reports of the Rutgers University Center for Cognitive Science, English. Perception & Psychophysics, 36, 131-145.
TR-2. Studdert-Kennedy, M. (1986). Development of the speech
Pruitt, J. S., Strange, W., Polka, L., & Aguilar, M. C. (1990). Effects perceptuomotor system. In B. Lindblom & R. Zetterstrom
of category knowledge and syllable truncation during auditory (Eds.), Precursors of early speech (pp. 205-218). New York:
training on Americans' discrimination of Hindi retroflex-dental Stockton Press.
contrasts. Presented at meeting of the Acoustical Society of Studdert-Kennedy, M. (1987). The phoneme as a perceptuomotor
America. State College, PA, May. structure. In A. Allport, D. MacKay, W. Prinz, & E. Scheerer
Quine, W. V. (1960). Word and object. Cambridge MA: MIT Press. (Eds.), Language, perception and production (pp. 67-84). New
van Reenen, P. (1982). Phonetic feature definitions: Their integration York: Academic Press.
into phonology and their relation to speech. A case study of the Studdert-Kennedy, M. (1989). The early development of
feature nasal. Dordrecht: Foris Publications. phonological form. In C. von Euler, H. Forssberg & H.
Rochet, B. (in press). [Cross-language studies of vowel perception Lagercrantz (Eds.), Neurobiology of early infant behavior (pp. 287-
and production]. To appear in W. Strange (Ed.), Speech 301). Basingstoke, England: MacMillan.
perception and linguistic experience: Theoretical and methodological Studdert-Kennedy, M. (1991). Language development from an
issues. Timonium, MD: York Press. evolutionary perspective. In N. Krasnegor, D. Rumbaugh, R.
Romaine, S. (1988). Pidgin and creole languages. London: Longman. Schiefelbusch & M. Studdert-Kennedy (Eds.), Language
Rosenblum, T., & Pinker, S. A. (1983). Word magic revisited: acquisition: Biological and behavioral determinants (pp. 5-28).
Monolingual and bilingual children's understanding Hillsdale, NJ: Erlbaum Associates.
of the word-object relationship. Child Development, 54, 773- Studdert-Kennedy, M. (1993). Some theoretical implications of
780. cross-modal research in speech perception. In B. de Boysson-
Sagey, E. C. (1986). On the ill-formedness of crossing association Bardies, S. de Schonen, P. Jusczyk, P. MacNeilage, & J. Morton
lines. Linguistic Inquiry, 19, 109-118. (Eds.), Developmental neurocognition: Speech and face processing in
Saltzman, E. L., & Kelso, J. A. S. (1987). Skilled actions: A task the first year of life (pp. 461-466). Dordrecht, the Netherlands:
dynamic approach. Psychological Review, 94, 84-106. Kluwer Academic Publishers.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to Summerfield, A. O. (1978). Perceptual learning and phonetic
gestural patterning in speech production. Ecological perception. In Interrelations of the communicative senses.
Psychology, 1, 333-382. Proceedings of the NSF Conference at Asilomar. Washington, DC:
Sampson, G. (1980). Schools of linguistics. Stanford, CA: Stanford NSF Publications.
University Press. Tees, R. C., & Werker, J. F. (1984). Perceptual flexibility:
de Saussure, F. (1959). Course in general linguistics. New York: Maintenance or recovery of the ability to discriminate non-native
McGraw-Hill (translation of Cours de linguistique générale. 1916. speech sounds. Canadian Journal of Psychology, 38, 579-590.
Paris: Payot). Trehub, S. E. (1976). The discrimination of foreign speech
Savelsbergh, G. J. P., Whiting, H. T. A., & Bootsma, R. J. (1991). contrasts by adults and infants. Child Development, 47, 466-472.
Grasping tau. Journal of Experimental Psychology: Human Treiman, R., Cassar, M., & Zukowski, A. (submitted). What types
Perception and Performance, 17, 315-322. of linguistic information do children use in spelling?: The case
Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation of flaps.
in behavioral and neural systems. Science, 239, 1513-1520. Turvey, M. T. (1980). Clues from the organization of motor
Shaw, B. K., McGowan, R. S., & Turvey, M. T. (1991). An acoustic systems. In U. Bellugi & M. Studdert-Kennedy (Eds.), Signed
variable specifying time-to-contact. Ecological Psychology, 3, 253- and spoken language: Biological constraints on linguistic form (pp.
261. 41-56). Weinheim, FRG: Verlag Chemie.

Turvey, M. T. (1990). Coordination. American Psychologist, 45, 938- 2Although loan word pronunciations can be affected by spelling
953. in both donor and recipient languages, the association between
Vance, T. (1987). An introduction to Japanese phonology. Albany, NY: spelling and pronunciation is generally not arbitrary but
State University of New York Press. reflects phonological prinCiples. The degree of transparency
Vihman, (1992). Early syllables and the construction of between spelling and pronunciation differs among languages,
phonology. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon however, e.g., Spanish spelling is quite transparent while
(Eds.),Phonological development: Models, research, implications (pp. English spelling is much less so.
393-422). Timonium, MD: York Press. 3The written form is another type of direct evidence
Walton, G., & Bower, T. G. R. (1993). Amodal representation of that speaker-listeners can present to one another, but it is
speech in infants. Infant Behavior and Development, 16, 233-243. subject to at least the same limitations as the spoken form.
Walton, G., & Socotch, T. (1993). Human newborns show a Presumably, the evidence it carries about the underlying
"perceptual magnet effect" for native and non-native language grammar would also be considered inadequate. In any event,
prototypes. Presented at meeting of the Society for Research in normal children learn to read and write only after they have
Child Development. New Orleans, March. learned to talk, so the written form would generally not offer
Werker, J. F. (1989). Becoming a native listener. American Scientist, an alternative basis for language learning (see also Liberman,
77,54-59. 1992).
Werker, (1991). The ontogeny of speech perception. In 1. G. 4In fact, the relation between the individual speaker-hearer's
Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the grammatical knowledge (linguistic competence), the same
motor theory of speech perception (pp. 91-110). Hillsdale, NJ: speaker-hearer's actual language behavior, (linguistic
Erlbaum Associates. performance), and the community's shared language is a
Werker, J. F., & Baldwin, D. A. (1991). Speech perception and complex issue. Although the matter cannot be explicated here,
lexical acquisition. Presented at meeting of the Society for the reader wishing further information is referred to, e.g.,
Research in Child Development, Seattle WA, April. Chomsky (1968; 1972), Newmeyer (1980), Sampson (1980), and
Werker, J. F., Gilbert, J. H. V., Humphrey, K., & Tees, R. C. (1981). de Saussure (1959).
Developmental aspects of cross-language speech perception. 5Indeed, how could one define "similar enough" if the
Child Development, 52, 349-355. utterances that serve as the only direct interface between
Werker, J. F., & Lalonde, C. E. (1988). Cross-language speech different individuals' grammars inadequately reflect those
perception: Initial capabilities and developmental change. grammars, and thus are by definition inadequate to validate or
Developmental Psychology, 24, 672-683. reliably compare them?
Werker, J., & Logan, J. (1985). Cross-language evidence for three 6Currently, the model assumes that articulator movement is
factors in speech perception. Perception and Psychophysics, 37, modelled fairly well by the dynamic regime of a "point
35-44. attractor," or damped mass spring, model with constant mass
Werker, J. F., & McLeod, (1989). Infant preference for both male for each articulator. Such dynamic regimes characterize the
and female infant directed talk: A developmental study of pattern of movement of a physical system moving smoothly
attentional and affective responsiveness. Canadian Journal of toward a single target ("attractor").
Psychology, 43, 230-246. 7For multilingual listeners, there may also be diachronic
Werker, J. E, & Pegg, (1992). In C. A. Ferguson, L. Menn, & C. variations associated with code-switching, i.e., shifting from
Stoel-Gammon (Eds.), Phonological development: Models, research, use of one language to another may effect changes in which
implications (pp. 131-164). Timonium, MD: York Press. gestural invariants are detected in an unfamiliar phonetic
Werker, J. F., & Tees, R. C. (1984a). Phonemic and phonetic factors pattern (e.g., Elman et aI., 1977; Williams, 1977).
in adult cross-language speech perception. Journal of the 8This claim should also apply to the phonological inventories of
Acoustical Society of America, 75, 1866-1878. other languages, for fluent multilinguals who learned their
Werker, J. F., & Tees, R. C. (1984b). Cross-language speech languages during childhood. That is, childhood-onset
perception: Evidence for perceptual reorganization during the multilinguals may be able to assimilate unfamiliar non-native
first year of life. Infant Behavior and Development, 7, 49-63. sounds to categories in any of their multiple languages. Indeed,
Whalen, D. H. (1983). Vowel information in postvocalic fricative they may have greater overall sensitivity to the phonetic
noises. Language & Speech, 26,91-100. properties of unfamiliar phonological categories, to the extent
Williams, L. (1979). The modification of speech perception and that early learning of more than one language grants increased
production in second language learning. Perception & recognition of the arbitrariness of linguistic categories,
Psychophysics, 26,95-104. although this sort of metalinguistic advantage has thus far been
Yamada, R. A., & Tohkura, Y. (1991). Age effects on acquisition of argued only for semantic and syntactic knowledge, support has
non-native phonemes: Perception of English Irl and III for been mixed (e.g., Bialystock, 1988; Rosenblum & Pinker, 1983;
native speakers of Japanese. In Proceedings of the 12th see McLaughlin, 1978).
International Congress of Phonetic Sciences, Vol. 4 (pp. 450-453). 9In addition, we found that both language groups heard a third,
Aix-en-Provence, France: University of Aix Press. intermediate category between rock and wok. Tests with a
Zukow, P., & Schmidt, C. (1988). Socializing attention: Perceptual second group of American listeners confirmed our suspicion
bases for language socialization. Presented at meeting of the that this category was clearly heard as an Ill, which falls
International Conference on Infant Studies. Washington DC, April. between Iw I and "y" in place of articulation. See Best and
Strange (1992) for further discussion.
I 0It should be noted that Polka used a more sensitive
FOOTNOTES discrimination task, i.e. one with lower memory demands, than
'To appear in C. Rovee-Collier & L. Lipsitt (Eds.), Advances in had Werker & Tees (1984a), which may well account for the
infancy research. Ablex Publishers (1994). discrepancy between the two studies in listeners' difficulty
t Also Wesleyan University. with this particular contrast.
I Exceptions are extremely rare. For example, Native Hawaiian I I This is a new interpretation, which better handles the full array
lacks Itl, including instead only Ipl and Ikl for its non-nasal of findings than the preliminary interpretation offered in Best
stop consonants. (in press a).
