INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

CONNECTIONIST PHONOLOGY

by

Marc F. Joanisse

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Linguistics)

December, 2000

Marc F. Joanisse
UMI Number: 3041475

UMI Microform 3041475
Copyright 2002 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Marc F. Joanisse under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies
Date
DISSERTATION COMMITTEE

Dedication

For Suzanne, still my friend.

I've looked at life from both sides now
From win and lose and still somehow
It's life's illusions I recall
I really don't know life at all

Joni Mitchell

Acknowledgements

This work is the result of five years I spent in the Language and Cognitive Neuroscience Lab at the University of Southern California. I am deeply grateful to my advisor and mentor Mark Seidenberg, for providing me with the guidance I needed while I was there. He taught me to question everything and leave nothing to chance. Thanks Mark, I'll try to be careful. I couldn't have made it without the support of everyone else who darkened the door of Room 15 while I was at USC.
I have benefited from knowing them all, but I want to acknowledge special debts to Joe Allen, Elaine Andersen, Morten Christiansen, Suzanne Curtin, Laura Gonnerman, Todd Haskell, Jelena Mirkovic, Sarah Schuster, Robert Thornton and Jason Zevin. Thanks also to my friends at the USC Longitudinal Dyslexia Study: Carrie Bailey, Laurie Freedman and Frank Manis. Special thanks to Mike Harm, who has always been generous with his advice and opinions, and who provided me with the MikeNet code used to run all the simulations in this dissertation. Thanks also to Jason Zevin for his useful comments on an earlier draft of this work and for his valuable assistance in exorcizing it of typos and other embarrassing errors. (I bear sole responsibility for all the remaining embarrassing errors.)

In the course of writing this work, I have found no better resources than the members of my dissertation committee. So, thanks to Maryellen MacDonald, Dani Byrd, Toby Mintz and Rachel Walker for their feedback on earlier drafts of this work, and in general for always being there for me.

Portions of this work have benefitted from the input of a number of researchers in Linguistics, Psychology and Neuroscience. They include (but are not limited to) Adam Albright, Paul Boersma, Bruce Hayes, Pat Keating, James McClelland, Joe Pater, Karalyn Patterson, Matt Lambon-Ralph, David Plaut, Kevin Russell, Paul Smolensky and Donca Steriade. Thanks also to the participants of the University of Alberta Symposium on Phonology and the Lexicon (June 1999), the 5th Southwest Optimality Theory conference (May 1999), the 1998 Meeting of the Berkeley Linguistics Society, and the 2000 CUNY Conference on Human Sentence Processing.

Finally, I want to acknowledge all the support that my family has given me. Micheline, Rob and Carrie will always be my truest friends.
And thanks especially to Mom and Dad, for teaching me to be a scientist, and to take risks.

This research was funded by NSERC PGS-B 214574 to myself, by an NICHD grant to Frank Manis and Mark Seidenberg, and by NIMH grants to Mark Seidenberg.

Contents

Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract

0 Connectionist Phonology - An Overview
  0.1 Generative Phonology
    0.1.1 How Phonology is Learned: The Generative Theory
  0.2 Optimality Theory
    0.2.1 Underlying Assumptions of OT
    0.2.2 Variability Within the OT Framework
  0.3 Connectionist Phonology
    0.3.1 Some Basic Connectionist Assumptions
    0.3.2 Task-orientation
    0.3.3 Input-driven learning
    0.3.4 Learning Quasiregular Domains
    0.3.5 Interactivity
  0.4 Overview of Chapters

I Phonological Acquisition, Processing and Typologies

1 Crosslinguistic Patterns of Phonology
  1.1 Vowel Inventories
    1.1.1 Multiple Constraints on Inventories
    1.1.2 Previous Models of Contrast
    1.1.3 A Connectionist Model of Phoneme Acquisition
    1.1.4 Experiment 1: Front-Back Asymmetries
      1.1.4.1 Method & Stimuli
      1.1.4.2 Results and Discussion
    1.1.5 Experiment 2: Length and Quality Interactions
      1.1.5.1 Method & Stimuli
      1.1.5.2 Results
      1.1.5.3 Summary
    1.1.6 Discussion
  1.2 Syllable Structure
    1.2.1 Perceiving Syllables
      1.2.1.1 Model Details
      1.2.1.2 Results and Discussion
    1.2.2 Producing Speech
      1.2.2.1 Model Details
      1.2.2.2 Results and Discussion
    1.2.3 Summary: Syllable Typologies
  1.3 Conclusions

2 Phonological Acquisition in Dutch
  2.1 Dutch Stress Acquisition: Empirical Evidence
    2.1.1 Stress in Dutch Children
    2.1.2 A Closer Look at the Data
    2.1.3 Generative Accounts of Stages
    2.1.4 Critique of OT93 and Parameter Setting accounts
  2.2 A Connectionist Model of Dutch Stress
    2.2.1 Model Overview
    2.2.2 Training Set
    2.2.3 Training Procedure
  2.3 Training Results
    2.3.1 Segmental Errors
    2.3.2 Stress Errors
  2.4 Developmental Patterns in the Model
    2.4.1 Regular and Irregular Stress
    2.4.2 Error Types for Irregulars
    2.4.3 Pools of Regularity
  2.5 Stages of Acquisition in the Model
  2.6 Discussion

3 Connectionist Phonology and Optimality Theory
  3.1 Constraints and Their Sources
    3.1.1 Learning Constraints
    3.1.2 Markedness
    3.1.3 Faithfulness
      3.1.3.1 The Emergence of Faithfulness
    3.1.4 How This Relates to OT
  3.2 Learning and Encoding Grammars
    3.2.1 Quasiregular Domains
    3.2.2 Rules and Lexicons
    3.2.3 Learning Trajectories in Connectionism and OT
  3.3 Summary

II Phonology and Language Disorders

4 The Influence of Phonology on Morphology - Evidence From SLI
  4.1 Grammatical Impairments in SLI
  4.2 Perceptual Deficits in SLI
    4.2.1 Possible Bases of Perceptual Deficits
    4.2.2 How Common are Perceptual Deficits?
    4.2.3 Phonological Deficits and SLI
  4.3 Linking Phonology and Morphology
    4.3.1 Modeling Morphological Impairments
  4.4 A Connectionist Model of Morphology
    4.4.1 Model Details
      4.4.1.1 Architecture
      4.4.1.2 Training Procedure and Corpus
    4.4.2 Training Results
    4.4.3 Speech Impaired Network
    4.4.4 Discussion of Morphology Models
    4.4.5 Toward a Broader Typology of Morphological Impairments
  4.5 Phonology in SLI: Crosslinguistic Evidence
    4.5.1 Phonological Salience and Complexity
    4.5.2 Morphological Frequency and Density
  4.6 Conclusion

5 Phonology-Syntax Interactions: Evidence from SLI
  5.1 Anaphor Resolution Deficits in SLI
  5.2 Overview of Empirical Data
  5.3 Phonology and Working Memory in Syntax
    5.3.1 Working Memory Span
    5.3.2 Data from Aphasics and Normal Adults
  5.4 Working Memory Impairments in SLI
  5.5 Simulating Normal Sentence Comprehension
    5.5.1 Model Architecture and Task
    5.5.2 Training Corpus
      5.5.2.1 What the Grammar Encoded
      5.5.2.2 Obtaining a Training Set
    5.5.3 Training Results - What Did the Network Learn?
      5.5.3.1 Grammaticality Judgments
    5.5.4 Pronoun Resolution
  5.6 Simulation 2: Sentence Comprehension With Impaired Phonology
    5.6.1 Inducing a Phonological Deficit
    5.6.2 Training Results
    5.6.3 Pronoun Resolution
    5.6.4 Discussion
  5.7 Conclusions

6 Conclusion: Many Networks, One Model
  6.1 Conclusion

References

List Of Tables

2.1 Catootje at Stage 2 and 3.
2.2 Sample of errors on irregularly stressed words at Stage 4.
2.3 Model performance on irregularly stressed forms at Stage 4.
2.4 Sample of Dutch VC forms that appear to attract final stress in bisyllabic VC-VC words.
2.5 Network's performance on bisyllabic word forms with phonologically predictable final (upper) and initial (lower) stress.
2.6 Examples of errors produced by the model that were consistent with errors at Stages 2 and 3 in Fikkert (1994).
4.1 Past tense productions in language impaired children and controls.
4.2 Typology of morphological impairments.
4.3 Examples of Hebrew verb inflection.
5.1 Experimental manipulations used in comprehension task.
5.2 Comprehension model: nouns used in training.
5.3 Sentence comprehension model: overview of training stimuli.

List Of Figures

1.1 Vowel Recognition Models.
1.2 Asymmetry of back vowels over front vowels.
1.3 Schematization of vowel variability.
1.4 Training error rates for networks.
1.5 Generalization rates for networks.
1.6 Vowel inventories used in the three training sets.
1.7 Mean error rates for the three network types.
1.8 Network used in the syllable perception experiment.
1.9 Results of model indicating better identification of consonants in CV positions.
1.10 Network used to simulate the planning of speech production.
1.11 Results of syllable production experiment.
2.1 Architecture of Dutch word production model.
2.2 Dutch token frequencies, by word length.
2.3 Dutch type frequencies, by word length.
2.4 Model's performance on samples of 2- and 3-syllable words taking either regular or irregular stress.
2.5 Proportion of Stage 2 and 3 errors in the network.
2.6 Proportion of Stage 3 errors in the network.
2.7 Distribution of stress placement.
4.1 Speech categorization profiles of language impaired children.
4.2 Verb learning model.
4.3 Comparison of intact and speech impaired networks.
4.4 Past tense production data from children with SLI and two control groups.
5.1 Sentence comprehension task stimuli.
5.2 Pronoun comprehension in children with SLI.
5.3 Sentence comprehension in adult aphasics.
5.4 Sentence comprehension in normal speakers, during various consecutive tasks.
5.5 Sentence comprehension model architecture.
5.6 Grammaticality judgments 1.
5.7 Grammaticality judgments 2.
5.8 Sentence comprehension model: error over time.
5.9 Generalization in impaired and unimpaired sentence comprehension models.
5.10 Pronoun resolution in impaired and unimpaired sentence comprehension models.
6.1 The Connectionist Phonology model.
6.2 Perception models.
6.3 Production models.
6.4 Morphology model.
6.5 Sentence comprehension model.
Abstract

This dissertation develops a theory of phonological processes based on characteristics of the human auditory and articulatory systems, in tandem with the connectionist principles that govern learning and representation in neural mechanisms. This is done using a model-based approach that tests hypotheses about the nature of phonology using connectionist networks. In Part I of this work, this methodology is applied to data traditionally associated with theoretical linguistics: the typologies of vowel systems and syllable structure (Chapter 1), and stress acquisition in Dutch (Chapter 2). The relationship of this theory to other constraints-based approaches to phonology is then discussed by addressing similarities and differences between subsymbolic and symbolic theories of language (Chapter 3). In Part II, the model-based methodology is used to investigate the role of phonological processing in morphology (Chapter 4) and sentence comprehension (Chapter 5). Special attention is focused on the language abilities of children with developmental language impairments, due to the purported role of a phonological deficit in their broader language deficits. The advantages of this approach are discussed with respect to how a single model/theory of processing can be used to address a broad range of formal and psycholinguistic data.

Chapter 0

Connectionist Phonology - An Overview

Reductionism... fold[s] the laws and principles of each level of organization into those at more general, hence more fundamental levels... This transcendental world view is the light and way for many scientific materialists (I admit to being one of them), but it could be wrong. At the least, it is surely an oversimplification... That would not be all bad. I will confess with pleasure: The challenge and the crackling of thin ice are what give science its metaphysical excitement. (E.O. Wilson)

This dissertation is about why language users, even very young ones, know a great deal about the sound structure of their language, and considers the consequences of an impairment to that knowledge. None of the topics that are dealt with in this work are new ones; they concern basic linguistic phenomena that have been studied for many decades. What is new is the general approach that is taken, specifically the use of a single framework to unify the study of a broad set of facts about how language is learned and represented in the brain, in both normal and abnormal populations.

The target phenomena of this work involve phonology, the system of mapping sound to meaning in a language. The study of phonology is motivated by the observation that words are more than simply linear sequences of sounds. Instead, speakers of a language seem to have an implicit knowledge of an abstract structure within words.

Consider English phonotactics. Speakers of English agree that ptak is not an acceptable nonce word, compared to a form like stap. The source of this judgment does not come from surface observations about acceptable sequences of phonemes; for example, /pt/ occurs in many English words, including apt, kept and optimality. Likewise, it is not simply that the sequence /pt/ is too difficult to produce, since /pt/ is actually a valid cluster in many other languages. For example, 'ptak' is an actual word in Russian. Instead, the unacceptability of ptak stems from deeper principles of English phonology that prohibit sequences of two or more stop consonants in syllable onsets. This principle is itself derived from more general observations about natural classes of phonemes like stops, fricatives and vowels, and the abstract structure of syllables from which constructs like 'onsets' are derived.
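The onset restriction just described can be sketched as a toy constraint checker. This is purely illustrative: the stop inventory and the checker are my own simplification, not part of the dissertation's model.

```python
# Toy illustration of the English onset constraint described above:
# no sequence of two or more stop consonants may begin a syllable.
# The stop inventory here is a simplified assumption.

STOPS = set("pbtdkg")

def acceptable_onset(word: str) -> bool:
    """Reject words whose first two segments are both stop consonants."""
    return not (len(word) >= 2 and word[0] in STOPS and word[1] in STOPS)

# 'ptak' violates the constraint, while 'stap' (fricative + stop onset)
# and 'apt' (whose /pt/ is not an onset) do not.
```

Note that the checker inspects word-initial position only, mirroring the point that /pt/ is unremarkable word-internally (apt, kept) but impossible as an English onset.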
The fact that even linguistically naive speakers can use these facts to make acceptability judgments about novel forms indicates that these principles are implicitly learned and used, and are not merely artifacts of literacy or education.

This dissertation asks three fundamental questions about phonology: Where does phonological knowledge come from? How is this knowledge implemented in the brain? And how does it relate to other types of linguistic knowledge? I advance a model of phonology that answers these questions as follows: Speakers' phonological knowledge is acquired in the course of performing language tasks such as recognizing and repeating words, which involve learning mappings between sound and meaning. Phonology is implemented in a probabilistic, distributed neural mechanism, and reflects the characteristics of this neural substrate in non-trivial ways. Finally, the nature of this model is such that other types of linguistic knowledge, such as syntax and morphology, can be thought of as task-specific reimplementations of this same model.

At the heart of this theory are a small number of assumptions about how linguistic knowledge is acquired, and how it is represented in the brain. They include the following:

(1) Input-driven learning: Learners acquire the structure of language based on information available to them in the linguistic input to which they are exposed.

(2) Domain-generality: Language is acquired using statistical, error-driven or self-organizing learning mechanisms similar to those used in learning other types of knowledge.

(3) Functionalism: Phonological systems are shaped by characteristics of articulation, acoustics, the learning mechanism that acquires language, and interactions amongst these three factors.
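The "statistical, error-driven" learning invoked in these assumptions can be illustrated with a minimal sketch. The delta rule below is a generic stand-in, not the training procedure used in the dissertation's simulations; every name and value here is hypothetical.

```python
# Minimal sketch of error-driven learning: a single linear unit nudges its
# weights in proportion to the mismatch between its output and a target.
# This generic delta rule stands in for the richer connectionist learning
# algorithms discussed in the text; it is not the dissertation's model.

def delta_rule_step(weights, inputs, target, lr=0.1):
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output                     # the discrepancy drives learning
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(100):                            # repeated exposure to one pattern
    weights = delta_rule_step(weights, [1.0, 0.5], target=1.0)
# After training, the unit's output closely approximates the target.
```

The point of the sketch is simply that structure is extracted from the input itself: nothing about the solution is built in beyond the learning rule.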
These assumptions are tested with respect to several areas of inquiry traditionally associated with phonological research, and some that are not. In Part I of this dissertation, this approach is used to explore sets of data in which cognitive factors such as perceptibility and ease of processing serve to explain phonological patterns in child language acquisition, adult processing and cross-linguistic studies. Part II of this work extends this model of phonology by considering its influence on how other aspects of language are learned, represented and processed. This is done by considering the consequences of an impairment to phonological knowledge, and exploring the theory that certain developmental language impairments in children are the result of just such a deficit.

In this chapter, I present an overview of different approaches to phonology and language. Much of the existing work on phonology has come from the tradition of Generative Grammar, which describes phonology and other aspects of language as systems of rules or rule-like principles that operate on discrete symbols such as phonemes and features. The general assumptions of this approach are critically considered, and contrasted with those of the current theory. I then outline the argument for an alternative view of linguistics that seeks to understand language processes as probabilistic, task-oriented and interactive.

0.1 Generative Phonology

Theories of Generative Linguistics define phonology as a speaker's knowledge of words' constituents and the types of operations that can apply to them. Early phonological theories characterized this as the knowledge of abstract units like phonemes and features, and the rules that can operate on them (Chomsky & Halle, 1968; Smith, 1973; Halle & Vergnaud, 1987).
Recent theories have recast these ideas within newer formal frameworks, such as in the case of Optimality Theory, which applies mechanisms of constraint satisfaction and interaction to phonological theory (Prince & Smolensky, 1993; McCarthy & Prince, 1993; Smolensky, 1999). Similarly, other recent frameworks have expanded the domain of phonological constituency to include increasingly more abstract units, as in autosegmental phonology (Goldsmith, 1990) and feature geometry (Clements, 1985; Sagey, 1986).

In spite of their differences, all Generative approaches are faithful to the core underlying premises of Generative Grammar. These include the assumption that humans are born with some amount of innate linguistic structure that guarantees that the child can quickly and efficiently converge upon the correct target grammar. The motivation for this is known as the logical problem of language acquisition, the fact that an infinite set of grammars can be deduced from the limited language inputs that children are exposed to (Chomsky, 1965; Gold, 1967). This is restated in the poverty of the stimulus argument, which holds that the input that children are exposed to is impoverished, to the extent that it does not contain sufficient information for the child to learn the correct grammar of a language. An innate linguistic endowment (known as a Universal Grammar or UG; Chomsky, 1964) serves to constrain what is possible in a grammar, making language acquisition possible by narrowing the space of possible grammars to a degree at which grammar identification becomes tractable within a finite period of time (Berwick, 1997; Wexler & Culicover, 1980). A basic example of how this works is provided below.

A related assumption of the Generative approach is that the similarities among the world's languages can also be explained as resulting from UG.
All human languages share interesting and non-obvious commonalities. Similarly, they are argued to differ only in limited ways from one another. These observations are attributed to the fact that UG constrains the space of possible grammars to a finite set (for the sake of learnability). As such, observations that languages share certain commonalities suggest important facts about the nature of humans' innate linguistic endowment.

A third characteristic of Generative Linguistic theories is how they describe the implementation of language in the brain. Mental grammars are considered to be a set of symbolic mechanisms that apply to a discrete inventory of language tokens. Such grammars are described as deterministic rules, principles or constraints that act upon the words in the mental lexicon to produce grammatical utterances. This algebraic model of language and the mind forms the basis for most work in the Generative tradition (Fodor, Bever, & Garrett, 1974) along with early frameworks in artificial intelligence and cognitive psychology (Newell & Simon, 1972). (Not all Generative theories share this view of a separable grammar and lexicon. For example, the Distributed Morphology framework specifically argues against a lexicon, e.g. Halle & Marantz, 1993; Noyer, 1997.)

The Generative Phonology framework involves these same principles applied to phonological systems, and is based on similar observations about language. First, languages that are not genetically related tend to share a large amount of phonological structure; for example, many languages exhibit intervocalic voicing and word-final devoicing of consonants, while no language exhibits the opposite pattern of intervocalic devoicing and word-final voicing.
The nature of these patterns appears to be abstract and complex, and some have argued that they can only be derived through indirect evidence because there are no overt cues to them in the input.

0.1.1 How Phonology is Learned: The Generative Theory

By way of an example, consider the stress pattern of a language. Stress is itself empirically observable; it involves changing the relative prominence of some segmental material in a word by making it louder, longer and higher in pitch (languages vary in which combination of these three is used to this effect). The principles that underlie how stress is applied to words are proposed to involve abstract elements of phonology that are only observable through indirect evidence. This means that learning the stress pattern of a language requires children to have access to more abstract facts about languages. For instance, children must be aware of such principles as feet and prosodic words, and know how the headedness of these units is derived (e.g., words are made up of one or more feet, these feet should be binary, and they are either head-initial or head-final). In addition, children also must learn which units are weight-bearing, such as whether long vowels are heavy or light. Inferring the stress rules of a given language thus involves analyzing the syllabic and prosodic structure of the language's words. It is argued that children are either not capable of performing such feats of inferential logic, or that they are not exposed to a sufficiently rich set of linguistic inputs to do so. This in essence is the poverty of the stimulus argument as it is applied to phonology. Similarly, it can be argued that children fail to arrive at impossible hypotheses about phonological structure.
The reason why no language allows for ternary feet (feet composed of three weight-bearing units) is exactly because language learners never hypothesize such a system. The same types of principles serve to constrain all important facts about what is possible in phonological systems: which units can be weight-bearing (e.g., coda consonants, but not onset consonants), how phonemes are parsed into syllables, and so on.

0.2 Optimality Theory

Optimality Theory (OT) represents a more recent framework in the Generative tradition that accounts for phonological phenomena using constraints instead of rules (Prince & Smolensky, 1993). The grammar is seen as a specific ranking of constraints which is used to determine the optimality of possible phonological forms. The grammar assesses all possible surface forms relative to this ranking to determine the correct output.

0.2.1 Underlying Assumptions of OT

OT brings with it several underlying assumptions about the nature of grammar. Among them is the claim that these constraints are provided by UG. As such, all speakers of all languages have the same set of constraints available to them. There are a variety of different classes of OT constraints, related to the types of structure that they evaluate. Markedness constraints evaluate the degree to which a form adheres to such principles as syllabic, segmental or subsegmental well-formedness. Faithfulness constraints evaluate relationships between related forms, such as the underlying and surface forms of a word (McCarthy & Prince, 1995), or a base and reduplicant (Benua, 1997).

UG is also assumed to provide speakers with the rules governing how constraints may be ranked. In particular, it is assumed that constraints are ranked in a strict domination hierarchy.
This principle dictates that forms violating a higher-ranked constraint will always be less harmonic than ones that do not, all else being equal. Strict domination serves to limit the nature of constraint-based grammars; for example, a form that violates many lower-ranked constraints will never be less optimal than a form that violates a single high-ranked constraint.

OT accounts for language learnability thanks to the universality of constraints and the mechanism used to rank them. Language learning proceeds by finding a constraint ranking that produces correct surface forms for a language. In Chapter 3 I discuss several proposals for how best to characterize the mechanism that learns these rankings. Another major assumption of OT is that the factorial reranking of these universal constraints will generate all possible grammars, while ruling out all impossible grammars. As a result, OT can account for crosslinguistic facts about phonological systems, while not overgenerating systems that do not occur.

0.2.2 Variability Within the OT Framework

Any consideration of OT as a theoretical framework carries with it the risk of not including a variety of different modifications that have been proposed since its inception. That is, it is difficult to critique OT as a single entity, since it is a 'moving target'; not every OT account adheres to all the canonical principles set out in the original Prince & Smolensky (1993) work. For the sake of clarity, I refer to the original formulation of this theory as OT93 when discussing the differences between it and other OT approaches.

Consider one primary mechanism of OT93, the commitment to strict constraint domination. While this is considered an important component of UG in the OT framework, it is also clear that a number of recalcitrant cases exist in which constraint interaction is much more complex.
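Before turning to those cases, the baseline evaluation procedure under strict domination can be sketched as lexicographic comparison of violation profiles. This is only an illustrative sketch: the constraint names and violation counts below are hypothetical, not drawn from any particular analysis.

```python
# Sketch of OT candidate evaluation under strict domination.
# Each candidate is scored as a tuple of violation counts ordered from the
# highest-ranked constraint down; Python's lexicographic tuple comparison
# then implements strict domination directly.

def optimal(candidates, ranking):
    """candidates: {form: {constraint: violation count}};
    ranking: constraints listed from highest- to lowest-ranked."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    # The minimal profile is the most harmonic candidate.
    return min(candidates, key=profile)

# A candidate with many low-ranked violations still beats one with a
# single violation of a higher-ranked constraint.
cands = {
    "A": {"HIGH": 1, "LOW": 0},
    "B": {"HIGH": 0, "LOW": 5},
}
print(optimal(cands, ["HIGH", "LOW"]))  # B
```

Under this scheme no number of lower-ranked violations can ever outweigh a single higher-ranked one, which is exactly the property that mechanisms such as local conjunction appear to circumvent.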
For example, obligatory contour principle (OCP) violations in Japanese do not simply have an additive effect on a form's harmony. Instead, structures that incur a single OCP violation are deemed acceptable, whereas forms that incur multiple violations of this constraint tend to be ruled out. It appears to be the case that multiple violations of a constraint have a multiplicative effect, such that a double violation in a single domain is worse than the sum of two single violations in multiple domains (Ito & Mester, 1998). This is achieved in OT with the use of a locally self-conjoined constraint that is higher-ranked than its unconjoined counterpart. One perspective on local conjunction is that it violates the principle of strict domination by weighting multiple violations of a single constraint more heavily than single violations of higher-ranked constraints.

Similarly, the conjunction of two separate constraints also appears to be possible (Ito & Mester, 1998). For example, German Coda Devoicing can be explained as the result of forms that violate both the Voiced Obstruent Prohibition (VOP) and the NoCoda constraints. Forms that violate both within a local domain will also violate the conjoined constraint VOP&NoCoda, which is higher-ranked than either of its counterparts. Here again, this could be interpreted as a violation of strict constraint domination because it considers the concurrent violation of two separate constraints as having a stronger effect on the suitability of a form than a single violation of the higher-ranked constraint.

Other mechanisms have also been proposed to contend with the shortcomings of strict domination. The most notable of these is Sympathy Theory (McCarthy, 1997; Ito & Mester, 1997; Walker, 1999), in which 'sympathy' status is assigned to an output candidate.
The winning candidate is then decided in part on the basis of its faithfulness to the sympathy candidate. Sympathy is useful in resolving ranking paradoxes in several types of problems, most related to phonological opacity.

The effect of these types of mechanisms seems to be to create a system in which the basic tenets of strict domination can occasionally be overridden. That is, multiple violations of a lower-ranked constraint, or of two conjoined lower-ranked constraints, can indeed override higher-ranked constraints. Allowing for these types of complex constraint interactions strongly influences the ability to account for various behaviors within OT, to the extent that it changes the nature of what is and is not ruled out by OT grammars. For this reason, it is difficult to speculate on whether any OT-type theory could account for a given set of empirical data.

0.3 Connectionist Phonology

In this work I reconsider these types of data by exploring a radically different approach to how phonology is acquired. There are two major characteristics of this theory: that children are not born with such a complete set of linguistic knowledge (Part 1) and that phonology is not an encapsulated module of grammar but instead a component of a larger interactive system (Part 2).

This work considers the possibility that the input to which children are exposed contains a rich set of information that they are able to use in order to acquire complex linguistic behaviors. This account does not reject the possibility that human language processing calls upon complex abstract structures such as features, phonemes, syllables, feet and prosodic words. Instead, it suggests that the linguistic input to children contains sufficient information to allow them to acquire and use these concepts without the help of some innate linguistic structure.
However, this approach does reject several of the core mechanics of Generative grammars, namely the characterization of grammars as explicit and deterministic rules or constraints, and a lexicon that exists in isolation from such grammars.

The present work develops this alternative view of phonology by explaining phonological mechanisms in terms of physiological factors involved in perception and articulation, and cognitive constraints on learning and processing. Functional accounts of this type are not new (e.g., Liljencrants & Lindblom, 1972; Lindblom, MacNeilage, & Studdert-Kennedy, 1984; Stampe, 1979), although they have been highly controversial, often because of the difficulty involved in actually testing these theories. More recently, there has been renewed interest in functional theories of language, thanks in part to a better understanding of articulatory, acoustic and cognitive processes in both children and adults. The result has been a greater number of works that appeal to functional factors in explaining phonological patterns (among many others, Archangeli & Pulleyblank, 1994; Boersma, 1998a; Browman & Goldstein, 1992; Byrd, 1994; Flemming, 1995; Hayes, 1997; Kaun, 1995; Ohala, 1990; Steriade, 1994; Stevens, 1989; Walker, 1998).

Connectionist phonology merges the functionalist approach to phonology with the idea that language is represented within a neural system, and as such can be characterized as the result of the basic principles that govern such systems. The Connectionist (or 'neural network') framework has emerged in the past two decades as a way to explore how neural systems learn and represent cognitive processes (McClelland, Rumelhart, & the PDP Research Group, 1986; Seidenberg, 1993; Elman et al., 1996) and, in particular, language (see Seidenberg, 1997, for a review).
The connectionist approach extends beyond simply modeling specific tasks within connectionist networks. Rather, it reflects a more general theory of cognition in which basic properties of connectionist models are used to explain a broad range of empirical data. Connectionist networks simply provide a formal mechanism in which these theories can be implemented and studied in vitro, or perhaps more precisely in machina.

0.3.1 Some Basic Connectionist Assumptions

The field of Connectionism has grown to include many areas of inquiry, a number of which are unrelated to the brain and cognition. The sense in which I use the term 'Connectionism' in the present work is limited to the use of connectionist models that encode knowledge in a parallel and distributed mechanism to simulate cognitive tasks and knowledge representation. This approach has largely grown out of work by McClelland, Rumelhart, and the PDP Research Group (1986). Smolensky (1999) outlines the fundamental Parallel Distributed Processing (PDP) principles as follows:

(2) a. Mental representations are distributed patterns of numerical activity.
    b. Mental processes are massively parallel transformations of activity patterns by patterns of numerical connections.
    c. Knowledge acquisition results from the interaction of
       (i) innate learning rules
       (ii) innate architectural features
       (iii) modification of connection strengths with experience

A major challenge of this approach has been to arrive at a greater specification of these basic principles with respect to a given domain of inquiry. In isolation, they could describe any number of competing theories of language.
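The abstract principles in (2) can be made concrete with a minimal sketch: representations are activity vectors, processing is a parallel weighted transformation of those vectors, and learning modifies connection strengths from experience. The sketch below uses a single layer trained with the delta rule; the network size, input pattern and learning rate are arbitrary choices for illustration, not a model from this dissertation.

```python
# Minimal single-layer network illustrating PDP principles (2a), (2b) and
# (2c-iii): activity vectors, parallel weighted transformation, and
# experience-driven modification of connection strengths (delta rule).

n_in, n_out, lr = 4, 2, 0.5
W = [[0.0] * n_in for _ in range(n_out)]  # connection strengths, start at zero

def forward(x):
    # (2b): each output unit's activity is a parallel weighted sum
    # of the whole input activity pattern.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def learn(x, target):
    # (2c-iii): adjust each connection in proportion to the output error.
    y = forward(x)
    for j in range(n_out):
        err = target[j] - y[j]
        for i in range(n_in):
            W[j][i] += lr * err * x[i]

pattern, target = [1, 0, 1, 0], [1, 0]  # (2a): distributed activity patterns
for _ in range(20):
    learn(pattern, target)
print(forward(pattern))  # converges to the target pattern [1.0, 0.0]
```

Nothing in the weight matrix corresponds to a discrete symbol; the learned knowledge is distributed across the connection strengths, which is the sense of 'distributed' at issue in (2a).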
However, the connectionist approach to phonology that I advocate in this dissertation is not only an attempt to apply these theoretical commitments to phonology; it is also an attempt to better specify the set of theoretical commitments that forms its basis. The goal is to draw clear distinctions between it and other theories, when applicable. Here I briefly outline several specific assumptions that are important to this theory.

0.3.2 Task-orientation

First, it is assumed that patterns of language are acquired in the course of learning linguistic tasks. These include auditory word recognition, word production and repetition, and sentence recognition. This task-oriented approach differs from Generative approaches that consider the task of language acquisition to be one of grammar identification. In that approach, children use utterances in their environment in conjunction with innate elements of grammar to set parameters, re-rank constraints, or learn rules.

Task-orientation addresses the criticism that grammar cannot be empirically deduced by connectionist models, stemming from the observation that grammars of the type posited by Generative linguists have been shown to be unlearnable by some classes of connectionist networks (Dresher & Kaye, 1990; Dresher, 1999). Such observations are based on the assumption that language learning involves grammar selection through such processes as parameter setting; this criticism is not relevant to the approach taken here because its characterization of language learning and processing tasks is completely different from what is assumed by the competing framework. Moreover, it is argued that characterizing acquisition as grammar identification, as the Generative approach does, ignores the basic communicative function of language. In so doing, it also ignores the types of learning that can result from communicative tasks.
As I demonstrate throughout this dissertation, networks that learn language tasks, rather than grammars per se, can demonstrate human-like linguistic competence in addition to accounting for a broad range of data that are not explained by alternative theories.

0.3.3 Input-driven learning

Another important aspect of the current theory is the assumption that language learners use whatever information is available to them in order to develop internal representations of language. Learners derive a great deal of linguistic knowledge from regularities that exist in the input they are exposed to. This again contrasts with work in Generative linguistics, which assumes that the language stimuli available to a child are insufficient to support learning of useful principles of language. As stated above, the 'poverty of the stimulus' argument derives from the observation that language learners are not exposed to the entire range of grammatical utterances in their language, nor are they given explicit feedback on the utterances that they produce.

The onus seems to be on the connectionist approach to demonstrate how characteristics of a child's input can inform the language learner as to the nature of the underlying linguistic system. As such, I emphasize throughout this work how models of specific language phenomena are affected by the nature of the input that they receive. The fact that the model is itself sensitive to statistical regularities in the input and is able to maintain certain regularities (but perhaps not others) forms the basis of the learning argument in Connectionist Phonology.

A major question that arises from this claim is the extent to which this approach can also account for the special status of language in terms of how it is acquired.
For example, children appear to be highly attuned to the speech signal from a very young age (Boysson-Bardies et al., 1992; Jusczyk, 1997). The answer seems to be that, on any theory, language development relies to some extent on innate mechanisms. The present account differs from Generative theories in terms of the nature of the mechanisms that are thought to be innate. There are clearly cognitive mechanisms at play in language acquisition that serve to direct the learner's attention toward the speech signal. Similarly, other types of innate attentional and learning mechanisms help guide children in learning to categorize and name objects (Markman, 1989). Note, however, that such innate mechanisms are different in kind from what is assumed by the strong nativist view of acquisition; on that account, children require specific knowledge of language structure in order to guide the acquisition of grammar. The present work explores the alternative view that much of this type of information is actually available from the input that children are exposed to, or is contained in general constraints on how neural mechanisms learn and encode information.

In addition, much of the information that children can use to learn language can be inferred from linguistic input without the need for parents to modify it in specific ways. That is, while some aspects of the input that caretakers produce for children are arguably simplified in terms of grammatical and phonological structure (so-called motherese or caretaker speech, Gleitman, Newport, & Gleitman, 1984; Kuhl et al., 1997), many aspects of adult speech also contain statistical cues useful for acquiring speech. In the connectionist theory, this follows from the fact that the information that helps in learning language is also used to encode and process it.
That is, the statistical cues that allow children to acquire linguistic information are themselves a consequence of how neural systems encode information. As such, the existence of these cues is not a coincidence; they reflect how adult speakers generally encode language.

A strict 'motherese' account of acquisition would suggest that caretakers manipulate their speech patterns in various ways that promote acquisition. The present account is different in that it suggests that certain characteristics of children's input that promote acquisition are always present in speech, and are not limited to child-directed speech.

0.3.4 Learning Quasiregular Domains

An important benefit of connectionist models is their ability to encode patterns of knowledge in a probabilistic way. Generative linguistics has largely sought to describe language behaviors as deterministic and rule-like, if not rule-governed. In reality, however, many phonological phenomena are only partially regular. For example, compound word formation in Japanese triggers a sequential voicing rule (commonly called rendaku, Ito & Mester, 1986) that causes the initial phoneme of the second word of a compound to be voiced, except when that word already contains a voiced obstruent. Thus, nise+kane ('fake money') is produced nisegane, but onna+kotoba ('feminine wording') is produced onna-kotoba.

While the rendaku process is highly productive, idiosyncratic exceptions exist in both directions: the compound nawa+hashigo ('rope ladder') is produced nawa-bashigo even though the sequential voicing rule predicts *nawa-hashigo, while nise+satsu ('fake bill') surfaces as nise-satsu (cf. *nisezatsu). Thus, while rules can be used to account for most rendaku forms, they cannot account for the exceptional cases in which the rule is ignored.
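The division of labor just described, a productive rule plus a list of memorized exceptions, can be sketched as follows. This is purely illustrative: the romanized forms, the voicing map, and the exception list are simplifications, not a serious analysis of Japanese phonology.

```python
# Illustrative sketch of rendaku (sequential voicing) as rule plus
# exception list; romanization and the voicing map are simplifications.

VOICING = {"k": "g", "s": "z", "t": "d", "h": "b", "f": "b"}
VOICED_OBSTRUENTS = set("bdgz")

# Idiosyncratic forms must simply be listed; no rule derives them.
EXCEPTIONS = {
    ("nawa", "hashigo"): "nawabashigo",  # voicing applies despite blocking
    ("nise", "satsu"): "nisesatsu",      # voicing unexpectedly fails
}

def rendaku(first, second):
    """Compound two words, applying sequential voicing where expected."""
    if (first, second) in EXCEPTIONS:
        return EXCEPTIONS[(first, second)]
    # Blocking: no voicing if the second member already contains
    # a voiced obstruent.
    if any(c in VOICED_OBSTRUENTS for c in second):
        return first + second
    onset = second[0]
    if onset in VOICING:
        return first + VOICING[onset] + second[1:]
    return first + second

print(rendaku("nise", "kane"))    # nisegane
print(rendaku("onna", "kotoba"))  # onnakotoba (blocked: kotoba contains 'b')
print(rendaku("nawa", "hashigo")) # nawabashigo (listed exception)
```

The point of the sketch is that a symbolic account needs two separate mechanisms (a rule and a lookup table), whereas a connectionist network encodes both the regularity and its exceptions in a single set of weights.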
Connectionist models are ideally suited to explaining such quasiregular behavior, since they are able to encode both regular patterns and their exceptions within a single architecture. This has been demonstrated in depth for the case of English past tense verbs, which exhibit both a high degree of regularity (raved, baked, lived) and a number of exceptional cases (had, took, gave). Implementing the English past tense in connectionist models has given researchers new insights into how morphological systems are acquired and used (Rumelhart & McClelland, 1986; MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1993; Joanisse & Seidenberg, 1999), in addition to stimulating debates as to the validity of this approach to language (Fodor & Pylyshyn, 1988; Pinker & Prince, 1988; Pinker, 1991; Marcus et al., 1992). In a similar way, I suggest that the Connectionist Phonology theory might help in understanding phonological processes, and the degree to which they fail to demonstrate rule-like behavior.

0.3.5 Interactivity

The connectionist approach also assumes that cognitive processes are interactive. Thus, while different types of linguistic knowledge are recognized to be neurally or functionally distinct at some level, information tends to pass freely amongst them. This property is important to the present theory, which suggests that phonology is not an isolated type of linguistic knowledge, but can both influence and be influenced by other aspects of language such as semantics, syntax and morphology.

Here again, this view is at odds with earlier symbolic theories of language and the mind that view components of language and cognition as neurofunctionally modular and encapsulated from one another (Fodor, 1983). On this type of account, different aspects of grammar and cognition are independently processed and represented in the brain.
This view also assumes that different language modules cannot interact; information flow is unidirectional, such that processing in one module is not directly influenced by other levels of processing. As I demonstrate in this dissertation, the assumptions of discrete language modules and information encapsulation among them are incorrect perspectives on how language is represented in the brain. All aspects of language integrate similar basic types of knowledge, such as sound and meaning. Differences in types of linguistic knowledge result from differences among the processing tasks that call upon these types of knowledge. Interactivity is assured by the fact that similar types of knowledge are accessible to these different processes, though to different degrees.

0.4 Overview of Chapters

Connectionist Phonology involves a much broader field of inquiry than could be treated exhaustively in the present work. The purpose of this dissertation is thus to lay out a general framework and methodology in which this theory can be explored, and to suggest future avenues of inquiry. It considers data from a number of different areas of research, and argues for a common model of language that can help to explain them.

Part 1

The first half of this work applies Connectionist theory to data that are typically associated with formal grammar. In Chapter 1, I look at various aspects of phonological systems from the functional perspective, which proposes only limited innate knowledge. Two major observations about crosslinguistic patterns in phonology are considered from the connectionist perspective. I first investigate the source of vowel inventory preferences across languages, using connectionist models to evaluate the relative learnability of different inventories based on a number of their acoustic properties.
Next, I use a similar type of model to investigate the functional nature of syllable structure preferences across languages. This is done with the help of connectionist models that learn to identify and produce syllables of different configurations. The ability of these models to identify and produce consonant contrasts in onset and coda positions is investigated, to determine how syllabic position might affect the learnability and usability of different consonant types. Based on these data, it is argued that general constraints on learning and processing interact with acoustic and articulatory factors to influence the shape of the world's languages.

Chapter 2 addresses issues of phonological acquisition and representation from the perspective of one set of phenomena in a single language: Dutch stress. I review what I consider to be the two important issues of acquisition facing the Connectionist Phonology account: the complexity of what needs to be learned, and the way in which children learn it. I argue that these questions are also directly relevant to questions of how speakers represent grammar, because the same mechanisms that are used to learn language are also necessary to use language.

Empirical data on Dutch stress acquisition indicate that children learn complex phonological behavior with the help of abstract units like moras and feet. This conclusion is based on the observation that children acquiring stress in Dutch produce a variety of errors suggesting that they access abstract phonological structure from an early age. Phonological acquisition is also accomplished in spite of noisy and conflicting information about the 'default' stress rule in Dutch. Likewise, children receive at best indirect and incomplete feedback about the abstract prosodic system that underlies it.
Such data are reconsidered within the Connectionist theory of acquisition by implementing a model of Dutch word learning that produces child-like behaviors as it acquires the Dutch stress rule. In addition, this model illustrates how children are able to acquire regularities in a given language in spite of cases that fail to abide by these regularities.

The present work is not the only approach to grammar that seeks to apply Connectionist principles to the study of language. Optimality Theory (Prince & Smolensky, 1993) integrates concepts such as constraint interaction with some basic principles of the Generative tradition. Chapter 3 seeks to better elucidate the relationship between the two approaches, in order to better understand what motivates the application of model-based connectionist theory to problems of grammar. Some similarities between the two approaches are discussed in depth. I then suggest some important ways in which Connectionist Phonology can be a useful complement to Optimality Theory, by helping to better understand the bases of constraints. The chapter concludes with some discussion of how the data in Chapter 2 suggest problems with some basic assumptions of Optimality Theory, and how newer developments can address these issues by integrating more of the basic assumptions of Connectionism into Optimality-theoretic models.

Part 2

The second part of this work applies Connectionist principles to an altogether different type of linguistic phenomenon, specifically language deficits. Other types of linguistic knowledge like morphology and syntax also involve the integration of phonological and semantic information, though often in different types of tasks than are considered in Part 1. The Connectionist Phonology model suggests that these aspects of grammar are not actually independent of phonology, but are related to it in important ways.
As a result, phonological factors might interact with morphology and syntax in interesting ways. This theory is laid out in Part 2, and is specifically investigated with respect to impaired morphological and syntactic processing in developmentally language-impaired children. The general claim that I advance is that phonological deficits, perhaps due to impaired speech perception, can lead to a variety of developmental deficits in other 'modules' of grammar because of the degree to which they depend on phonological information.

Chapter 4 discusses acquisition impairments as an interesting test case for this claim. The facts I seek to account for are drawn from the literature on Specific Language Impairment (SLI), which has typically focused on the morphological and syntactic deficits in these children. The Generative approach has suggested that grammatical impairments in SLI demonstrate how UG can be interrupted in special cases, resulting in language learning disorders targeting specific aspects of grammar. The approach that I take is different in that it suggests these impairments are sequelae of a more basic impairment to speech perception.

I first review the various Generative accounts concluding that SLI involves a deficit to a specific module of morphology or syntax (Gopnik, 1997; Rice & Wexler, 1996; van der Lely, 1997). This literature is critiqued on the basis of other studies finding a broad range of language deficits in these children, including deficits to phonological processing and a number of non-linguistic impairments. Alternative approaches are presented, including work by Leonard (1998), Bishop (1997) and Tallal and Stark (1980) suggesting that some more basic deficit could be to blame for the grammatical deficits in SLI.
This approach has been criticized in the past because of the difficulty in explaining how a deficit in perceiving certain phonemes could result in the array of grammatical problems seen in SLI. The connectionist approach is used to address this issue by simulating the importance of phonological processing on the development of other grammatical abilities. Impairments to morphological processing are simulated in these models as resulting from a phonological/perceptual deficit.

Research has also shown syntactic processing deficits in SLI, for instance in resolving bound anaphora (van der Lely & Stollwerck, 1997). The suggestion from such results is that children with SLI have an impairment to specific modules of syntax. In Chapter 5 I present a critical review of these claims, including a reanalysis of the relevant data. I further suggest that syntactic comprehension impairments are best characterized as a phonological processing deficit that degrades listeners’ working memory for sentences.

Evidence for the importance of phonology and working memory in sentence comprehension is reviewed. This includes data indicating that syntactic processing ability varies in normal speakers as a function of working memory capacity, and that this capacity is closely linked to phonological encoding. In addition, studies have indicated that children with SLI have limited phonological working memory capacities compared to normally developing children, and that this deficit correlates with their sentence comprehension difficulties. I summarize newer data suggesting that SLI-like comprehension deficits can be simulated in normal adults through degraded speech. The theory that emerges from this is one in which degraded phonological representations lead to working memory limitations, which in turn result in impaired sentence comprehension.
This theory is tested using a connectionist model that is architecturally similar to the ones presented in the previous chapters. The network identifies words’ phonological forms by recognizing their meanings; sentence comprehension is simulated in this network by presenting it with sequences of words that form sentences. Inducing a phonological deficit, such as would occur in the case of a perceptual impairment, resulted in a reduced ability to resolve anaphors while leaving less syntax-intensive abilities intact.

I conclude with Chapter 6, in which I discuss how the many models presented throughout this dissertation relate to the single model/theory of Connectionist Phonology. I propose that all these simulations represent subcomponents of this broader model of phonology that incorporates articulation and acoustics along with semantics, pragmatics, orthography and other types of perceptual inputs. When put together, these models form a coherent story of how the patterns of language emerge from these more basic types of knowledge.

Part I

Phonological Acquisition, Processing and Typologies

Chapter 1

Crosslinguistic Patterns of Phonology

This chapter addresses the observation that the phonological systems of the world’s languages tend to show interesting similarities with respect to phoneme inventories and syllable structure.¹ Typological observations to this effect have been especially informative to formal theories of linguistics, because of the emphasis these theories place on understanding the universal nature of phonological knowledge.
The observation that a number of genetically distant languages show similar phonological processes or inventories suggests that these represent more than simply random patterns in language. Instead, this seems to indicate that speakers share innate predispositions toward specific phonological patterns, resulting in a greater proportion of these preferred patterns crosslinguistically.

In this chapter I consider two areas of inquiry into crosslinguistic phonology, and present a connectionist approach to understanding these data. The first of these concerns how languages with the same number of vowels tend to have the same vowel inventories. The account I propose is an extension of work by Liljencrants and Lindblom (1972), who proposed that vowel systems are influenced by a tendency to maximize the acoustic distance between phonemes in order to maximize perceptibility. I demonstrate how the addition of a connectionist learning mechanism enhances the explanatory force of this theory.

The second part of this chapter examines data pertaining to more abstract phonological constituents such as syllables. It is observed that, across languages, syllables are asymmetric in their treatment of onsets and codas. Many languages permit only onsets, few require codas, and none prohibit onsets (Jakobson, 1941/1962). I propose that phonetic and computational factors are responsible for these preferences; optimal syllables are ones that enhance production and perception, and lead to statistical preferences that favor learning certain syllable types over others.

¹ Portions of this chapter have previously appeared in (Joanisse & Seidenberg, 1998a), copyright retained by the authors, and in (Joanisse, 1999), copyright retained by the author.
I investigate this theory using connectionist models of word recognition and production, and demonstrate how constraints on learning to perceive and produce specific syllable types in these models reflect statistical preferences for these types of phoneme sequences within and across languages.

In both cases, the results are contrasted with more traditional accounts of language universals, which posit innate symbolic mechanisms to account for these types of phonological patterns. For example, Generative accounts have sought to characterize phoneme inventory preferences as a consequence of the markedness of a phoneme’s constituent features (Chomsky & Halle, 1968; Sagey, 1986; Clements & Hume, 1996), or the number of features needed to distinctively describe a phoneme (Rice & Avery, 1995). This type of account suggests that not all phonemes are equal; instead, some are preferred over others due to speakers’ universal and innate knowledge of what is and is not marked. Generative explanations of syllable structures also appeal to notions of a universal knowledge of markedness. Here again, it is assumed under this approach that speakers have an innate predisposition toward specific syllable structures, and that languages tend to reflect these tendencies.

The present work adopts an alternative approach in which these data are explained as resulting from the interaction of factors related to speech perception, speech production and cognition. This chapter does not cover the entire range of crosslinguistic phonological data. Instead, it attempts to lay out the formal mechanisms within which this alternative framework can be implemented and tested.
The general type of explanation that I propose for these crosslinguistic tendencies is similar to that of other functional accounts of phonological phenomena drawing upon perceptibility, speech production, and computational constraints on learning (Boersma, 1998a; Browman & Goldstein, 1992; Flemming, 1995; Frisch, 1996; Hayes, 1997; Lindblom, 1986; Ohala, 1990; Russell, 1994; Stampe, 1979; Steriade, 1994; Stevens, 1989). It suggests that the ability to accurately perceive and produce specific phonemes in specific contexts is universal, and plays a role in determining phonological processes across languages. The work I present in this chapter is different to the extent that it addresses these issues within a connectionist framework. Connectionist simulations are used in order to examine the learning and processing of phonological patterns. By teaching models to produce and perceive the relevant types of patterns, I am able to directly test hypotheses about their relative learnability. The advantage of this approach is that it posits a mechanism by which universal tendencies can emerge from such functional factors.

1.1 Vowel Inventories

The first study I present investigates an important class of regularities concerning the distributions of vowels.² Although the human vocal apparatus can produce many possible vowels, a large proportion of languages only use between 4 and 8 of them distinctively. In addition, languages with a given number of vowels tend to use similar sets of vowels. For example, most five-vowel languages employ the set [i e a o u]; a handful use similar five-vowel sets; and many other sets of vowels are not observed at all, such as [e y œ æ u].³
One (possibly extreme) approach within the Generative tradition has been to view these phenomena in terms of the concept of markedness: vowel features are organized into a markedness hierarchy, such that vowels incorporating more marked features are less suitable in an inventory (Chomsky & Halle, 1968; Clements, 1985). The major drawback of this approach is the lack of criteria independent of mere frequency of occurrence for determining which vowels or features are ‘marked.’

This section addresses the alternative hypothesis, that vowel inventory patterns reflect functional constraints related to perception and production; specifically, languages tend to maximize distances among vowels. I focus on languages’ tendency to maximize the acoustic distances among constituent vowels, although featural or gestural distance may also be relevant. Inventories involving acoustically well-dispersed vowels are easier to both acquire and process because they are easier to discriminate, creating a tendency for languages to recruit such inventories. In contrast, less acoustically dispersed vowel inventories are more difficult to acquire and process because of the greater probability of misperceiving a constituent vowel, leading languages to shift away from such inventories. On this view, inventories such as [i ɪ e a] do not occur because they involve smaller acoustic distances than other, attested vowel sets. This is not a new idea, and has been studied from a variety of different perspectives in the past (Flemming, 1997; Jakobson, 1941/1962; Liljencrants & Lindblom, 1972; Manuel, 1990).

² I am grateful to Kevin Russell for suggesting many of the core ideas in this section.
³ See Chapter 8 of Maddieson (1984) for an overview of the facts concerning vowel inventories.
The contribution of the present work is to apply connectionist methodologies to this theory, to better understand how functional factors influence the phonological systems of the world’s languages.

1.1.1 Multiple Constraints on Inventories

One question that arises from this is whether theories of maximal contrast would predict no contrast at all; that is, if the goal of languages is to maximize contrastiveness, why do so few languages (if any) have only one vowel? The reason seems to be that languages’ vowel systems represent a point of equilibrium across several competing constraints (Flemming, 1997):

• Maximize the contrastiveness of segments in an inventory (languages should minimize the confusability of words).

This constraint promotes smaller vowel inventories due to the impact that having fewer vowels has on words’ contrastiveness. All else being equal, having fewer vowels in an inventory will mean a smaller likelihood of there being two words that differ only with respect to an acoustically minimal vowel difference.

• Encode a large number of tokens in a language (the more words a language can create, the better).

This is kept in check by the second constraint, which encourages languages to maintain a large number of words, in order to communicate information about a large variety of concepts unambiguously. Larger phoneme inventories help to this end by providing a greater number of legal combinations of segments while minimizing the need for either homophony or longer words. For example, a language with nine consonants but only 3 vowels will allow only 300 legal monosyllables with (C)V(C) structure; adding only two more vowels to that inventory increases this number to 500.

• Minimize the length of these tokens (languages prefer shorter words because they promote faster processing).

Languages clearly seek to minimize word length. For example, the number of possible three- or four-syllable words in English is geometrically greater than that of one- and two-syllable words. Nevertheless, monosyllables comprise the highest-frequency words in English. The source of this constraint would appear to be the need to minimize the time needed to communicate a given concept; because speech is a serial channel, any utterance incurs a cost with respect to its duration
For example, the number of possible three or four syllable words in English is geometrically greater than one and two syllable words. Nevertheless monosyllables comprise the highest frequency words in English. The source of this constraint would appear to be the need to minimize the time needed to communicate a given concept; because speech is a serial channel, any utterance incurs a cost with respect to its duration 33 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. because of its impact on memory load. As such, shorter words result in shorter utterances, which in turn minimizes computational load. This constraint tends to promote larger phoneme inventories because a greater number of phonemes necessarily increases the number of possible words of any given size in a lan guage. In the present paper, I consider only the first of these constraints, by considering the effect of vowel dispersion and variability when ail other factors are held constant. It is assumed that maximizing vowel distinctiveness ultimately interacts with other sources of constraint in languages, but that it is nevertheless possible to study it in isolation in order to better understand its effect on vowel inventories. 1.1.2 Previous Models of Contrast A similar idea has been explored using mathematical models (Liljencrants & Lind blom, 1972; Lindblom, 1986; Boe, Schwartz, & Valee, 1994) in which the acoustic dispersion of vowels is expressed as a function of the Euclidean distance between their corresponding formant frequencies. Such models predict perceptually optimal inventories by maximizing the distances between all vowels in a given set. Although this approach can account for a considerable amount of data about the distributions of vowels, it is limited in several respects. 
First, it does not represent the variability with which vowels are produced; as I will demonstrate, this variability can play a role in the frequency with which a vowel occurs in an inventory. Second, this approach is not a model of why this type of distance maximization occurs: the optimization mechanism used in the Lindblom model does not appear to be tied to specific aspects of language processing and acquisition, and represents only a metric by which an inventory’s optimization can be gauged. The present work more directly ties this optimization to constraints related to learning and processing, in order to better understand the mechanism by which this optimization might occur. Finally, previous approaches do not easily allow the integration of other types of factors used to differentiate vowels, such as nasalization, diphthongization and vowel length.

1.1.3 A Connectionist Model of Phoneme Acquisition

The present work builds on Lindblom’s approach by situating it in a theory of how phonemic inventories are acquired and processed. Joanisse and Seidenberg (1997) described research in which connectionist models were trained to recognize the vowel inventories of pseudo-languages based on their acoustic representations. The approach is based on the premise that, like humans, connectionist models are not equally predisposed to learning and recognizing all types of patterns. By varying the characteristics of these vowel inventories and assessing the models’ capacities to learn them, this work generates predictions about an inventory’s relative suitability. Inventories that are easier to learn are predicted to be more likely to occur in the languages of the world. As inventories become more difficult to learn and process, their likelihood approaches zero.

The general structure of the models used in this section is illustrated in Figure 1.1. The models were trained to recognize speech sounds based on a set of examples presented to them. The input to the model was the spectrographic representation of a given speech item. Identification resulted from passing activation through two layers
The models were trained to recognize speech sounds based on a set of examples that were presented to it. The input to the model was the spectrographic representation of a given speech item. Identification resulted from passing activation through two layers 35 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. of connections to the output layer, which represented individual phonemes by assign ing a separate output unit to each possible phoneme input to the model. Learning proceeded through the adjustment of connection weights using the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986); over the course of train ing, the model adjusted these weights in ways that facilitated accurate identification of the training stimuli and, as a consequence, generalization to novel examples. The relative difficulty of learning the training set was reflected in the rate at which train ing proceeded, asymptotic levels of performance, and the number of items a trained network misclassified. Phoneme Identity hidden units Acoustic Input Figure 1.1: General structure of the vowel recognition models used in this chapter. The network leams to map an input, consisting of the spectral representation of a speech sample, to a discrete representation of that speech token. This model was used to explore two types of vowel inventory phenomena. The first related to the preference of four-vowel languages to choose front mid vowels over back mid vowels. I propose that this is related to differences in these vowels’ variability, owing to the precision with which back vowels can be produced compared to front vowels (Beckman et al., 1995; Kaun, 1995). The second set of simulations explored the interaction between the number of vowel quality contrasts in a language, and its 36 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. tendency to use contrastive length. 
It is shown that both factors affect ease of learning and processing in the model, providing an explanation for why certain inventories are preferred. 1.1.4 Experiment 1: Front-Back Asymmetries The first set of simulations concerns the observation that, in four vowel languages, there is a greater tendency for languages to use a front mid vowel than a back mid vowel, as illustrated in Figure 1.2. This is an interesting asymmetry, given that the dispersion characteristics of the two inventories are approximately equal. A simple dispersion model as in Liljencrants and Lindblom (1972) predicts there to be little difference in the occurrence of the two inventory types, but this is contradicted by the empirical data. ‘Type A’ ‘TypeB’ 10 languages 2 languages Figure 1.2: Asymmetry of back vowels over front vowels. The x and y axes represent second formant (F2) and first formant (FI) values, respectively. Note that the relative dispersion of the two sets is roughly the same. One explanation for this asymmetry derives from differences between front and back vowels with respect to their tendency to vary in production. As Figure 1.3 illus trates, there is a difference between nonlow front and back vowels in their tendency to overlap with each other. Speakers are able to produce the vowel /i/ with a better degree 37 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. of precision, by stiffening the genioglossus muscle and propping the tongue laterally against the dental ridge; this is not possible for back vowels however. In addition, the height range for back vowels is also somewhat smaller than for front vowels. These factors lead to a greater first formant (FI) variability for /u/ than for /i/, causing more overlap for the /u/ — /o/ contrast than for the complementary /i/ — Id contrast (Beckman et al., 1995; Perkell & Nelson, 1985). A second set of articulatory facts might also explain this asymmetery. 
Kaun (1995) has noted an apparent ‘articulatory antagonism’ between lip rounding and jaw lowering that is inherent to producing nonhigh vowels (such as /of). The result could be a greater degree of FI (and likely also F3) variability in mid back vowels, compared to their front counterparts. This greater variability could again lead to dispreferences for inventories like [i a o u ] , compared to [i e a u]. This type of explanation is consistent with Stevens’ Quantal Theory (Stevens, 1989), in which a phoneme’s distinctiveness is affected by nonlinearities in the re lationship between vocal tract configurations and acoustics: phonemes with quantal articulations are those that can be produced with greater precision as a result of such nonlinearities, because the articulatory target - the range within which an acousti cally appropriate production can be produced - is more easily achieved. In the present account, a vowel inventory’s frequency can be explained in part as the result of the overlap of its constituent vowels. Inventories that minimize this overlap have more discriminable vowels. As a result, it is easier for language users to process words in languages that maximize discriminability. The frequency with which these languages occur is greater because speakers are less likely to neutralize contrasts. Speakers will 38 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. be more likely to neutralize less reliable contrasts, which explains why such languages occur less frequently, if at all. Figure 1.3: Schematization of vowel variability. Note the greater tendency for nonlow back vowels to overlap, compared to their front counterparts. 1.1.4.1 Method & Stimuli To test the hypothesis that production variability has an effect on vowel inventory fre quencies, networks were trained on the invented vocabularies of one of two pseudolan guages. 
These pseudolanguages differed only with respect to their vowel inventories: the artificial vocabularies consisted of either the vowel set [i e a u] (Language A) or [i a o u] (Language B). All training vowels were synthesized using a Klatt speech synthesizer (Klatt, 1990) implemented on a Linux operating system. F1, F2 and F3 values for each vowel quality were obtained using the means reported in Lindblom (1986), representing these vowels’ usual positions in languages. Each vowel type’s F1-F3 standard deviations were obtained from analyses of the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus of spoken English, and from data in Beckman et al. (1995). Back vowels had F1 standard deviations that were greater than those of the corresponding front vowels. The result was a set of vowels with similarly spaced /i/-/e/ and /u/-/o/, but with greater variance for the back vowels.

Each synthetic vowel token was unique because the F1-F3 values for each token were randomly selected based on the observed mean and standard deviation for such vowels in actual languages. Thus, 35 unique instances of each vowel type were generated. Each vowel was synthesized as a raw waveform, which was then transformed into fifteen frames of 136 spectral coefficients (bandwidth: 62.5 Hz, frequency range: 0-8500 Hz) using the Fast Fourier transform. These values were then rescaled to the 0.0-1.0 range to be used as input vectors for the networks.

Network training consisted of 100,000 training trials. At the start of each training trial, the model was presented with the spectral coefficients of a vowel token randomly drawn from the training set. Activation first propagated forward through the network. Then, the obtained activation on the output layer was compared to the target activation.
The backpropagation algorithm was used to adjust connection weight values in order to minimize the difference between obtained and target outputs (Rumelhart et al., 1986). Over the course of many forward- and backpropagation trials, the network was expected to converge upon a near-optimal set of weights that allowed it to produce as few errors as possible; initial weights were randomized, with a range of -0.01 - 0.01; the learning rate was 0.01; the error radius (the threshold of precision to which each unit was trained) was 0.01. The network learned to identify acoustic vowels on the input by activating an out put unit corresponding to the identity of that vowel. All vowel types were presented an equal number of times over the course of training. It was hypothesized that the greater degree of overlap between nonlow back vowels compared to nonlow front 40 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. vowels will cause slower learning rates and poorer generalization in models trained on Language B, compared to Language A. The use of a supervised learning algorithm like backpropagation merits some dis cussion. This learning procedure represents a simplification of the learning task ac tually confronting children as they leam to categorize speech. In reality, children are not assumed to be innately aware of all possible phonemes that are spoken to them. Instead, phoneme categories are learned as commonalities across words. For exam ple, the /ae/ vowel is learned as the overlap between such words as ‘bat’, ‘cat’ and ‘hat’, and the difference between these words and such minimal pairs as ‘bought’, ‘caught’ and ‘hot’. A more realistic model of how phoneme categories are acquired is presented by McCandliss, Fiez, Conway, and McClelland (1999), who demonstrated phoneme learning in an unsupervised Hebbian learner that acquired categorical per ception through repeated presentation of acoustically-represented phonemes. 
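The supervised training procedure described above - forward pass through two layers of connections, one-hot phoneme targets, and backpropagation of the output error to adjust the weights - can be sketched in miniature. The two-dimensional "formant-like" inputs, cluster positions, layer sizes, and the larger learning rate below are illustrative choices so that this toy version converges quickly; the dissertation's networks used 2040-dimensional spectral inputs and a learning rate of 0.01.

```python
import numpy as np

rng = np.random.default_rng(1)

# Four toy vowel clusters in a 2-D acoustic space (illustrative values).
means = np.array([[0.2, 0.9], [0.35, 0.7], [0.5, 0.2], [0.8, 0.85]])
X = np.vstack([rng.normal(m, 0.03, size=(35, 2)) for m in means])
y = np.repeat(np.arange(4), 35)   # one output unit per vowel category
T = np.eye(4)[y]                  # one-hot target activations

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
# Two layers of weighted connections, small random initial weights.
W1 = rng.uniform(-0.01, 0.01, (2, 10)); b1 = np.zeros(10)
W2 = rng.uniform(-0.01, 0.01, (10, 4)); b2 = np.zeros(4)

def sse(X, T):
    """Sum Squared Error over the whole training set."""
    out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return float(((out - T) ** 2).sum())

initial_sse = sse(X, T)
lr = 0.5                                  # toy learning rate, not the 0.01 of the text
for trial in range(30000):
    i = rng.integers(len(X))              # random draw from the training set
    h = sigmoid(X[i] @ W1 + b1)           # forward pass: input -> hidden
    o = sigmoid(h @ W2 + b2)              #               hidden -> output
    d_o = (o - T[i]) * o * (1 - o)        # error signal at the output layer
    d_h = (d_o @ W2.T) * h * (1 - h)      # backpropagated to the hidden layer
    W2 -= lr * np.outer(h, d_o); b2 -= lr * d_o
    W1 -= lr * np.outer(X[i], d_h); b1 -= lr * d_h

final_sse = sse(X, T)
pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).argmax(axis=1)
accuracy = float((pred == y).mean())
```

On well-separated clusters like these the error drops steadily and classification becomes accurate; the interesting comparisons in the text arise when clusters are made to overlap, as the back-vowel variability manipulation does.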
The present learning scheme was used because it allowed me to directly assess how individual phoneme categories were learned in the network; unlike the McCandliss et al. model, the present network had an output layer that explicitly encoded the identities of phonemes on the input. This provided a way to determine whether acoustic inputs were being correctly classified by the network. In spite of the simplified nature of the network’s task, this model represents a useful demonstration of how a neural system learns the discriminability of the phonemes in an inventory. Differences in the relative ease of discrimination of two inventories are likely to be magnified under more realistic situations, such as the ones confronting actual language learners.

1.1.4.2 Results and Discussion

Three different networks were trained on either Language A or B, for a total of six networks; the results reported for each language are averaged across three networks. To assess overall learning of training items, networks were tested at intervals of 10,000 training trials on the complete training set. This was done by presenting a training item to the network, and calculating the resulting Sum Squared Error (SSE) on the output layer, a function of the overall correct and incorrect activations of all output units for a given target output (Rumelhart et al., 1986). The mean SSEs across all items in the training set, at each 10,000-trial interval, are plotted in Figure 1.4. Higher mean error rates for models trained on Language B indicate that these networks had more difficulty learning the training set consisting of the vowel set [i a o u] compared to those trained on [i e a u].

Figure 1.4: Sum Squared Error rates averaged over all networks trained on two pseudolanguages.
Lower rates for Language A suggest less difficulty in learning the training set.

This result is consistent with the hypothesis that the learnability of a vowel inventory affects its frequency of occurrence in the world’s languages. In addition, the networks were capable of learning both pseudolanguages, consistent with the observation that both inventories are attested, although with different frequencies. Neither inventory was completely unlearnable within this architecture (unlike more extreme cases investigated in Joanisse and Seidenberg (1997), such as [i ɪ e æ]). Instead, these results suggest that small differences in the distributions of vowels in inventories can result in processing difficulties in these networks.

To further investigate how these inventories differ in their degree of suitability, fully-trained networks were also tested on their ability to generalize to novel stimuli. This is comparable to the type of task confronting an adult language user, who must recognize novel tokens of familiar words in the course of everyday language processing. This was done by presenting the fully trained networks with new instances of vowels that were not used in the training set. To do this, five new vowels of each type were synthesized using the same parameters used for the training set. These were then presented to each network. Figure 1.5 illustrates the mean Sum Squared Error rate calculated for the testing vowels for each network, at 10,000-trial intervals. These results again indicate that networks trained on Language A had significantly better identification rates than those trained on Language B, from 60,000 to 100,000 iterations. This was confirmed by pairwise t-tests indicating significant differences between the two network types at these five points during training (p<0.05 for each).
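The Sum Squared Error measure used throughout these evaluations is simply the squared difference between obtained and target activations, summed over the output units. The four-unit output vector below is a made-up example, not a value from the reported simulations.

```python
def sum_squared_error(obtained, target):
    """SSE for one item: squared obtained-vs-target differences,
    summed over all output units."""
    return sum((o - t) ** 2 for o, t in zip(obtained, target))

# A hypothetical four-unit output layer, with target vowel /i/ = [1, 0, 0, 0]:
obtained = [0.9, 0.1, 0.2, 0.0]
target = [1.0, 0.0, 0.0, 0.0]
err = sum_squared_error(obtained, target)   # 0.01 + 0.01 + 0.04 + 0.0 = 0.06
```

A perfectly identified vowel yields an SSE of zero, and misclassified or ambiguous outputs inflate it, which is why mean SSE over a test set serves as the difficulty measure in Figures 1.4 and 1.5.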
The results of these simulations support the hypothesis that vowel systems are functionally optimized in ways that maximize the discriminability of their constituent vowels because of its effects on learning. In addition, these simulations suggest that accounting for these phenomena turns on examining stimuli that realistically represent variability in terms of production and overlap.

Figure 1.5: Mean Sum Squared Error rates for two testing sets of 20 novel vowels (5 of each type), averaged across all networks trained on two pseudolanguages. Lower rates for Language A indicate less difficulty in identifying vowels in the testing set. Error bars indicate standard error of mean.

1.1.5 Experiment 2: Length and Quality Interactions

So far I have only considered vowel inventory tendencies in terms of quality contrasts related to formant frequencies. However, many languages use other types of cues to contrast vowels. In this next experiment I consider how the cue of vowel length interacts with spectral (formant-based) cues. Maddieson observes that “The probability of length being part of the vowel system increases with the number of vowel quality contrasts” (Maddieson, 1984, p. 129). As such, 12.5% of languages with 4 to 6 vowel qualities and 24.7% of languages with 7-8 vowel qualities incorporate length contrasts in Maddieson’s UPSID database (Maddieson, 1984), compared to 53.8% of languages with 10 or more vowel qualities. Note, however, that these data are incomplete, because the UPSID database does not consider contrastive length in cases where all vowel qualities participate in length contrasts.
I propose that this pattern is not accidental, and that a more extensive survey of length in vowel systems will reveal a similar pattern for languages not included in the UPSID database. This pattern is attributed to the weak contrastiveness of durational cues for vowels, compared to spectral cues, owing to the degree of variability intrinsic to vowel length. Vowel length appears to be a useful cue in disambiguating various language contexts, for example in determining the voicing of an adjacent consonant (Chen, 1970), rate of speech (Magen & Blumstein, 1993), lexical boundaries (Davis, Marslen-Wilson, & Gaskell, 1997), and stress position. Given this tendency for vowels to vary in duration, it is plausible that adding a durational contrast would be dispreferred. This is additionally supported by the observation that length contrasts are frequently observed to accompany close quality contrasts (e.g., /iː/-/ɪ/ is more frequent than /i/-/ɪ/ and /iː/-/i/; Maddieson, 1984). This suggests that duration is a useful secondary cue for differentiating spectrally similar vowels, though on its own it may prove to be less useful than a spectral contrast. Connectionist models provide a way to explore these phenomena, because of their ability to exploit multiple, simultaneous, probabilistic regularities in the service of learning to perform a task. The relative usefulness of cues can be assessed in terms of the model's ability to reliably learn and use them. In the present simulations, connectionist networks were trained to acquire vowel inventories that use spectral or length contrasts to different degrees. It was predicted that differences in how networks learned and generalized would reflect known facts about the occurrence of these cues in vowel systems of different sizes.
1.1.5.1 Method & Stimuli

Networks were trained on one of three hypothetical vowel inventories that use length and spectral contrasts to different degrees. As Figure 1.6 shows, these inventories seek to double the number of contrasts in the familiar [i e a o u] set by adding spectral contrasts, length contrasts, or both.

Figure 1.6: Vowel inventories used in the three training sets.

Given the hypothesized differences in the contrastiveness of length and spectral cues, it was predicted that networks would have more difficulty learning Inventory 1 compared to 2 and 3, because only spectral contrasts are used, and that Inventory 3 would be easier to acquire than Inventory 2 due to the interaction of duration and spectral cues in maintaining contrast in such a relatively large and crowded inventory. This would be consistent with the observation that a contrast such as /iː/-/ɪ/ is dispreferred compared to /i/-/e/, though it might be more common than /i/-/iː/. Ultimately, /iː/-/ɪ/ seems to represent a good compromise for languages with a crowded vowel space. The architecture of the model was similar to those used in the previous simulations. The input consisted of 2,040 spectral coefficients encoding the acoustic representation of a vowel. A total of 40 vowels of each type were obtained for training purposes. Training sets consisted of synthetic vowels, made to be highly realistic by using formant means and variances drawn from observed data in the TIMIT database and data published in Beckman et al. (1995).
Contrastive vowel length was simulated by creating long vowels with a mean duration 1.66 times longer than short vowels, with a standard deviation of 0.5 for the long vowels and 0.33 for the short vowels.4

4 Variabilities are estimates drawn from recordings of Finnish speakers, where lengths of vowels spoken in similar consonantal contexts (but differing prosodic contexts) were measured and compared. Since Finnish productively contrasts long and short vowels, it is an ideal case for assessing how a length contrast varies depending on phonetic context. The measurements represent a best guess at durational contrast and variability for a language with a reliable length contrast, although languages will tend to vary along these parameters.

In this model, vowel length was implemented by varying the number of non-empty frames presented to the input layer, such that longer vowels had more 'filled' frames and shorter vowels had more empty frames. Model training proceeded similarly to the previous experiment. The network's task was to identify the vowel presented on the input by activating the appropriate output node. There were 10 output nodes for each network, each corresponding to a
The result is a running estimate of the degree to which the network is able to produce the correct outputs for items in the training set. These results indicate that networks trained on Inventory 3 learned the training set slightly better than those trained on Inventory 2, and much better than those trained on Inventory 1 . To further assess model performance, fully trained networks were tested on a set of 10 novel vowels of each type on which it was trained. This was done by extracting vowels the network had not been exposed to from the TIMIT database in the same way as described above. The spectral representations of each of these vowels was input to the network, and the resulting output was calculated. The vowel corresponding to the output node with the greatest activation was judged to be the ‘winner.’ Errors were scored when the actual winning vowel did not match the target output. The mean percent of correctly generalized items was 86.7% for Inventory 1, 62.3% for Inventory 48 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1 0 0.9 0.8 Inv 1: Qual Only — - Inv 2: Length Only — Inv 3: Length + Qual 0.7 Ui C O 0 6 C O c 05 0.4 03 0 2 0 1 5 0 250 200 10 100 150 Training Trials (x 1000) Figure 1.7: Mean Sum Squared Error rates for the three network types in this experi ment. Lower values indicate better performance. 2, and 91% for Inventory 3. A one-way ANOVA indicated a significant main effect of training language (F = 834.75 p < 0.0001). 1.1.5.3 Summary The present simulations support the hypothesis that contrastive vowel length represents a weaker cue than typical spectral differences for discriminating vowels, and affects languages’ tendency to recruit contrasts that rely solely on length distinctions. This explains the paucity of languages with smaller inventories (3-8 vowel quality contrasts) that use vowel length distinctions. 
And while frequency data are missing for languages in which all vowel qualities participate in length contrasts, these results predict that the facts should not be different for such languages. Finally, these results are consistent with observations that length contrasts tend to accompany smaller quality contrasts (e.g., /iː/-/ɪ/).

1.1.6 Discussion

In this section I have explored the idea that the vowel inventory preferences of the world's languages result from their functional optimization. The simulations described here demonstrate the utility of connectionist networks in exploring this type of hypothesis. The performance of these models is easy to assess, and allows us to directly compare results to empirical facts about the world's languages. Additional applications could include testing the contrastiveness of other vowel cues, such as diphthongization and nasality, or assessing the role of discriminability in consonant inventory frequencies. For example, languages' preference for velar and alveolar stops over palatal stops might also be a function of these phonemes' discriminability. The results of these simulations serve to explain phonological patterns based on articulatory, acoustic and computational constraints. The network discovers constraints in the course of learning to perform a task. In contrast to other approaches such as OT, constraints do not have to be specified in advance; they emerge in the course of acquisition given the nature of the architecture, the characteristics of the input, and the task being performed. On this account, the processing mechanism by which crosslinguistic preferences evolve is generalization. Optimal systems are ones that promote better generalization.
1.2 Syllable Structure

Theories of suprasegmental phonology are based on the observation that words are not merely sequences of phonemes, but are instead composed of hierarchically organized units such as moras, syllables, feet and prosodic words (Selkirk, 1980). The complexity of these structures, and the fact that all languages seem to be bound by similar constraints on them, might be interpreted to suggest that such abstract phonological units are components of an innate linguistic endowment. In this chapter, I explore an alternative theory of phonological structure, in which functional characteristics of speech perception and production constrain possible word forms, giving rise to abstract structures and the constraints that act upon them. There is an asymmetry in syllables, related to the status of onset and coda consonants. Jakobson (1941/1962) observed that languages tend to impose stronger constraints on codas than they do on onsets. That is, many languages prohibit coda consonants altogether (e.g., Hua, Fijian, Mazateco), and many others strongly limit which classes of consonants may occur in coda positions (e.g., Axininca Campa limits codas to nasals). In contrast, no language prohibits onsets and many require them (e.g., Klamath, Totonac, Arabic). This pattern has been captured in several different theoretical frameworks, most recently OT, which posits two general Markedness constraints governing the parsing of consonants in syllables: ONS states that syllables must have onsets, and NO-CODA prohibits codas. The degree to which a language enforces a strict CV syllable structure is reflected in the rankings of these two constraints relative to the class of Faithfulness constraints. A different type of explanation for the asymmetrical character of the syllable exists, however.
It is possible that preferences for onsets over codas hinge on the acoustic characteristics of pre- and postvocalic consonants. For example, stop consonants have arguably stronger acoustic cues if they immediately precede a vowel; likewise, it is possible that other classes of consonants are more salient or discriminable in onsets than in codas. On this account, certain codas are dispreferred because the ability to use them reliably and contrastively is deprecated compared to onsets (Krakow, 1999; Redford & Diehl, 1999; Wright, 2000; Steriade, 1998; Blevins, in press, among many others). I investigate this hypothesis by testing the degree to which the ability to discriminate consonant contrasts differs between onsets and codas. However, I also expand on this perceptibility hypothesis by investigating the degree to which these acoustic constraints interact with constraints on speech production. As I will demonstrate, cognitive constraints on articulatory planning might also affect the structure of syllables, by amplifying statistical preferences for certain syllable shapes in order to simplify the task of speech production. The hypothesis is that general constraints on the perception and production of speech interact to constrain how a language's phonological patterns are learned, ultimately shaping sound patterns across languages. This type of explanation is similar to other functional accounts of phonological phenomena drawing upon perceptibility (Steriade, 1994; Lindblom, 1986; Flemming, 1995) and speech production (Browman & Goldstein, 1992; Hayes, 1997; Stevens, 1989), all of which suggest that the ability to accurately perceive or produce specific phonemes in specific contexts plays a role in determining at least some phonological processes. The present work addresses these types of issues within a somewhat different framework.
Connectionist simulations are used in order to examine the learning and processing of phonological patterns. By teaching models to produce and perceive the relevant types of patterns, I test hypotheses about their comparative learnability. The first part of this section explores the effect of syllabic position on the acoustic cues of common consonant classes. A connectionist simulation was used to model the recognition of CV and VC syllables. It was hypothesized that if acoustic differences between consonants in CV and VC syllables have a real impact on learning, these differences would express themselves in the model's performance. Next, I investigate how these acoustic constraints carry over to languages that have more permissive syllable structure. A relatively simple model of speech production is used to investigate how languages that allow many syllable forms exploit the aforementioned preferences for less marked syllables. Statistical preferences to this effect are amplified by the model in order to better perform the difficult task of speech production; in so doing the model develops behavior consistent with the notion of innate syllabic constraints.

1.2.1 Perceiving Syllables

This section explores the hypothesis that preferences for onsets in syllables are due to acoustic factors. It is observed that the acoustic cues for some types of consonants are diminished in coda positions (Krakow, 1999). A survey of English reveals several instances of this, particularly in obstruents. For example, stops are always released in onsets, resulting in bursts. In contrast, stops in codas tend to be unreleased word-finally and before another stop, eliminating bursts as a useful acoustic cue in these cases (Kent, Dembowski, & Lass, 1996).
Voicing can also be more difficult to distinguish in coda stops, because of the tendency toward devoicing in non-prevocalic positions. Similarly, voiced fricatives tend to be devoiced in coda positions (Denes, 1955), diminishing the contrastiveness of voicing distinctions in such cases. Finally, place of articulation cues for nasals are diminished postvocalically. Admittedly, some languages might not follow these phonetic tendencies. However, these processes seem to occur across many languages, suggesting that there are real constraints on the ability to produce certain acoustic distinctions in word-final and preconsonantal positions. To test whether such differences affect a consonant's suitability as an onset or coda, I trained two identical connectionist models to identify either CV or VC syllables drawn from actual speech samples. It was expected that the resulting differences in the models' ability to learn to identify consonants in either onsets or codas would reflect preferences in languages.

1.2.1.1 Model Details

A syllable recognition task was implemented in a feedforward connectionist network, illustrated in Figure 1.8. Training stimuli were obtained by extracting the desired syllables from the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus of spoken English. The output of the model consisted of 28 nodes, each representing a different vowel or consonant. The model's task was thus to map acoustic input to the identity of the phonemes making up this input. This task is similar to what is assumed to be occurring when listeners attempt to map an acoustic signal to a higher-level phonological representation. It makes no commitment about the actual nature of such a higher-level
segmental representation (e.g., it does not take a strong stance on the issue of whether speakers represent words as sequences of phonemes or articulatory gestures). In addition, this model does not attempt to address issues related to whether higher-level phonological representations must be innately prespecified. However, it is notable that connectionist models have also been used to argue that such segmental representations are an emergent property of learning to map speech sounds to their articulatory analogues (Guenther, 1995; Plaut & Kello, 1999). The localist representations used on the output layer are a simplification of the task by which children infer phonological categories through the discovery of regularities in the mapping between auditory and articulatory events.

Figure 1.8: Network used in the first experiment (acoustic input mapped to 28 localist phoneme output nodes).

Two training sets were used. The first consisted of 2,661 CV syllables drawn from the TIMIT database. Consonants were stops (p, t, k, b, d, g), fricatives (f, θ, s, ʃ, v, ð, z), nasals (m, n) and liquids (l, r); vowels were drawn from the set (i, ɪ, e, ɛ, ɑ, æ, ʌ, o, ɔ, u, ʊ). There were 125 possible syllable types, representing most of the possible combinations of CV syllables in these sets. Twenty-five tokens of each syllable type were selected at random, except when fewer instances were available; in those cases all possible instances were used. The frequency with which the network was presented each CV type was held constant such that no single vowel, consonant or CV type was presented to the network more frequently than the others.
The second training set was similar to the first, but consisted of 3,966 VC syllables from the TIMIT database, made up of the same consonant and vowel types as the CV set. Care was taken to include only VC sequences that did not occur before a vowel, which guaranteed that the only consonants used in this set were codas, and not onsets of the next syllable. The training set consisted of 25 tokens of each VC type, except in cases where there were insufficient tokens in the TIMIT database. Here again, the frequency with which networks were exposed to any given VC type was held constant, so that the network would be exposed to each consonant, vowel and VC type to a roughly equal degree. Training items were obtained by transforming raw waveforms into fourteen 8 ms frames of 96 spectral coefficients (bandwidth: 31.25 Hz, frequency range: 0-3000 Hz) using a Fourier transform. The result was a set of 1,344 spectral coefficients for each syllable in the training set. Preliminary testing indicated that these parameters were not optimal for learning fricatives, due to the low frequency cutoff. To offset this, a wider bandwidth (62.5 Hz) and frequency range (0-6000 Hz) were used for obtaining training data for syllables with fricatives. Networks were trained over 200,000 training trials. Training proceeded as follows: at the beginning of each training trial, a syllable was chosen at random from the training set and presented to the model. The network was presented with an equal number of each syllable type. Activation propagated forward and the resulting output was computed. At this point, this output was compared to the target output. The backpropagation algorithm was then used to adjust connection weights in order to minimize any disparity between the actual output and target output (Rumelhart et al., 1986).
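The spectral preprocessing described above can be sketched as follows. The 16 kHz sampling rate (standard for TIMIT), Hann windowing, and the 32 ms analysis window with an 8 ms hop are assumptions chosen to reproduce the stated 31.25 Hz bin spacing and the 14 × 96 = 1,344-coefficient grid; the dissertation does not specify these details.

```python
import numpy as np

SR = 16000        # assumed sampling rate (TIMIT audio is 16 kHz)
WIN = 512         # 32 ms analysis window -> 16000/512 = 31.25 Hz bin spacing
HOP = 128         # 8 ms hop between successive frames
N_FRAMES = 14
N_COEFFS = 96     # bins 1..96 span 31.25-3000 Hz

def spectral_input(waveform):
    """Convert a raw waveform into the 14 x 96 grid of spectral
    magnitudes used as network input (a sketch; exact windowing
    is an assumption)."""
    frames = []
    for i in range(N_FRAMES):
        chunk = waveform[i * HOP : i * HOP + WIN]
        chunk = np.pad(chunk, (0, WIN - len(chunk)))   # zero-pad short tails
        spectrum = np.abs(np.fft.rfft(chunk * np.hanning(WIN)))
        frames.append(spectrum[1 : N_COEFFS + 1])      # drop DC, keep 96 bins
    return np.stack(frames)        # shape (14, 96) -> 1,344 coefficients

syllable = np.random.randn(SR // 4)   # a hypothetical 250 ms syllable token
```

Doubling the bin spacing to 62.5 Hz for fricative syllables, as described above, would amount to halving WIN while keeping N_COEFFS at 96, extending coverage to 6000 Hz.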
Other relevant network parameters: the learning rate was 0.01; the error radius was 0.01; initial weights were randomized within the range -0.01 to 0.01.

1.2.1.2 Results and Discussion

At the end of training, both network types were tested on their respective training sets, in order to determine the extent to which the models had learned the training phonemes. Testing was done by presenting each training item to a network, and determining which output vowel and consonant nodes had the greatest resulting activations. Errors were scored when the network produced an output for a consonant or vowel that did not match the target outputs for a given item. Figure 1.9 compares the models' ability to recognize the different phoneme classes in the training sets. These results suggest that the model trained on CV syllables did have an advantage in learning obstruents and nasals; no advantage was apparent for liquids in onsets.

Figure 1.9: Results of the first experiment, indicating better identification of some consonant classes (stops, fricatives, nasals, liquids) in CV positions, compared to their VC counterparts.

The results of these simulations suggest that acoustic differences between certain consonants in pre- and postvocalic positions can have an impact on learning phonemic contrasts. Speech perception is a difficult task, due to the inherent variability and noisiness of the speech signal, and thus it is not unreasonable to expect that languages will develop preferences for phoneme sequences that maximize phoneme discriminability (e.g., prevocalic stops), even when other sequences are possible (e.g., non-prevocalic stops). This would appear to explain why languages enforcing strict limitations on phoneme sequences will tend to prefer more discriminable ones.
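The winner-take-all testing procedure described above can be sketched as follows. For simplicity this version scores a single response node per item (the actual models scored consonant and vowel nodes separately), and the activation values and target labels are invented for illustration.

```python
import numpy as np

def score_generalization(activations, targets):
    """Winner-take-all scoring: the output node with the greatest
    activation is taken as the network's response; an error is
    counted whenever it does not match the target node. Returns
    percent correct."""
    winners = np.argmax(activations, axis=1)
    correct = (winners == np.asarray(targets))
    return 100.0 * correct.mean()

# Toy activations for 4 test items over 3 phoneme categories.
acts = np.array([[0.80, 0.10, 0.10],
                 [0.20, 0.60, 0.20],
                 [0.30, 0.30, 0.40],
                 [0.90, 0.05, 0.05]])
labels = [0, 1, 2, 1]   # the last item is misclassified (winner is node 0)
```

The same scoring applies to both the training-set tests here and the novel-item generalization tests reported earlier.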
Differences between the two sets were small, which is also consistent with empirical data indicating that coda consonants are not completely ruled out. The model data instead suggest that small acoustic differences can have an impact on the distribution of syllable structures across languages. I am not claiming that onsets are universally more suitable positions for all consonants. The present results suggest the possibility that some consonant types are less discriminable in codas than others. However, it is notable that many languages that place limitations on which consonants can occur in coda positions (such as Axininca Campa and Japanese) also tend to neutralize place contrasts in this position, again suggesting that discriminability plays a role in limiting how some languages use codas. This theory also leaves open the possibility that some classes of consonants might be better suited to post-vocalic positions, again due to functional constraints. For instance, Steriade (1994) has observed that some Austronesian languages show preferences for retroflexes in coda positions, but argues that this is again because of the relative discriminability of such consonants in onset and coda positions. Nevertheless, because obstruents are the most highly attested consonant type across languages (Maddieson, 1984), this is arguably the source of the more generally attested preference for onsets over codas.

1.2.2 Producing Speech

Many languages do not have strict CV syllable structure and instead allow both onsets and codas. Nevertheless, speakers of these languages show evidence of the same types of syllabic constraints discussed above. For example, given the choice between the syllabic parses V.CVC and VC.VC, speakers will tend to choose the former over the latter, even in languages in which both CVC and VC are admissible syllable types (Hammond, 1999).
Moreover, speakers of such languages also seem to have knowledge of hierarchical syllabic information, such as syllable weight, that allows them to assign stress or construct Feet and Prosodic Words. Data of this type suggest that acoustics alone are not sufficient to explain how language users acquire syllable structure. That is, even speakers of languages that are not strongly constrained in the choice of onsets over codas will nevertheless show knowledge of suprasegmental structures. Since both CV and VC are legal syllables in such languages, it is unclear how a learner of these languages will acquire a preference for one over the other. One possibility is that language learners have access to sufficient articulatory and auditory information to learn such constraints (Boersma, 1998a). However, it could also be argued that the quality of this information might not be sufficiently rich to promote the rapid acquisition of such abstract constraints. In this section, I present an alternative account of how these more complex types of phenomena might arise within a functional theory of phonology. On this account, abstract syllabic behavior is explained as the result of the statistical learning mechanism that is being used to acquire languages. Lexical preferences toward certain syllable shapes are acknowledged to be the result of the interaction between constraints on articulatory planning and perception. However, the actual constraints that govern syllable-driven behavior are learned from statistics in the lexicon. A similar argument has been made in the sentence comprehension literature. St. John and Gernsbacher (1998) argue that speakers find certain syntactic structures more difficult to process not because they necessarily involve more complex manipulations of basic phrase structure, but because they tend to occur less frequently in everyday use.
Differences in frequency can then lead to constraints against certain phrase structures in sentence comprehension. I propose that a major catalyst to learning phonotactic constraints from lexical statistics is the cognitive planning involved in speech production. I develop a connectionist model that simulates the process of articulatory planning by learning to produce ordered sequences of phonemes. The model was trained on a corpus of English syllables of various forms, and then tested in order to ascertain how well it had learned to produce the types of syllable shapes in the training corpus. An important observation about English is that while it allows a variety of syllable shapes, there is a definite preference for syllables beginning with consonants over those beginning with vowels. For example, a count of syllable types in the CELEX database shows that 45% of English words listed contain at least one CV syllable, while only 15% of words contain at least one VC syllable. This tendency appears to hold in other languages as well: 47% of the Dutch words in CELEX contained CV syllables, compared to 14% with VC syllables. As it turns out, this is not a new observation; Trubetzkoy (1939/1969) presented a survey of onsetless and codaless monosyllables in German and Czech and observed a similar set of facts. What is interesting is that theories of symbolic grammar have tended to ignore these types of statistical regularities, possibly because there is no clear need to account for them on this type of theory. Deterministic grammars turn on observations of categorical phenomena (Language X prohibits codas or Language Y neutralizes place contrasts in codas), and do not need to concern themselves with how statistical trends participate in learning or encoding deterministic grammars (an interesting exception is Boersma & Hayes, 1999).
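The kind of lexical count reported above can be sketched as follows. The tiny syllabified lexicon and the single-character consonant/vowel classifier are hypothetical; the figures in the text come from CELEX, not from this toy procedure.

```python
# Sketch of counting the proportion of words containing at least one
# syllable of a given shape (e.g., CV vs. VC) in a syllabified lexicon.
VOWELS = set('aeiou')   # crude orthographic stand-in for vowel phonemes

def syllable_shape(syl):
    """Map a syllable string to its C/V skeleton, e.g. 'ba' -> 'CV'."""
    return ''.join('V' if ch in VOWELS else 'C' for ch in syl)

def proportion_with(lexicon, shape):
    """Fraction of words containing at least one syllable of `shape`."""
    hits = sum(any(syllable_shape(s) == shape for s in word) for word in lexicon)
    return hits / len(lexicon)

# Hypothetical lexicon: each word is a list of syllables.
lexicon = [['ba'], ['ab'], ['ba', 'na', 'na'], ['it'], ['do']]
```

Run over a real syllabified lexicon, counts of this kind yield the CV-over-VC asymmetry that the production model described below is trained on.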
The present simulation investigated the possibility that statistical trends toward consonant-initial syllables can in fact affect how speech production is acquired. It was predicted that the nature of the production task would require the network to discover more abstract characteristics of syllables, such as preferences for syllables with onsets and principles of rising and falling sonority, in order to fulfill the demands of the task.

1.2.2.1 Model Details

The cognitive aspects of speech production were simulated by training a connectionist model to produce individual syllables as series of consecutive phonemes. This was done by presenting a recurrent network with a set of phonemes as input (e.g., /blæk/), and training it to output each phoneme in turn over a series of discrete time steps (e.g., [b] - [l] - [æ] - [k]). The model architecture is presented in Figure 1.10. The present implementation assumes that speech production involves organizing sequences of discrete phonological units; in the present case, this is represented as sequences of individual phonemes that do not overlap temporally. The use of multiple hidden layers largely reflects characteristics of the task that the network needs to learn. The network outputs words as temporal sequences of phonemes. The dynamics of this layer are achieved thanks to a hidden layer with another layer of units connected to and from it. The network passes activation between these two layers, allowing the output representation to unfold over time. In contrast, the network's input is a static representation of a word; it does not change over the course of a single trial. The 'static' hidden layer allows the network to encode a hidden-layer representation of a word that remains static over the course of a trial, since the only input that it receives is from a static input.
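These dynamics can be sketched as follows. The layer sizes, random weights, and logistic activation function are all assumptions for illustration; only the wiring (a static hidden layer fed by a fixed input, plus a recurrent hidden layer exchanging activation with a partner layer) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_STATIC, N_REC, N_OUT, N_STEPS = 90, 20, 20, 18, 13

# Hypothetical weight matrices (untrained, randomly initialized).
W_in = rng.normal(0, 0.01, (N_STATIC, N_IN))
W_static_rec = rng.normal(0, 0.01, (N_REC, N_STATIC))
W_rec_ctx = rng.normal(0, 0.01, (N_REC, N_REC))   # recurrent -> partner layer
W_ctx_rec = rng.normal(0, 0.01, (N_REC, N_REC))   # partner layer -> recurrent
W_out = rng.normal(0, 0.01, (N_OUT, N_REC))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run(word_input):
    """Forward pass: the static hidden layer sees the same input on
    every step, while activation cycling between the recurrent layer
    and its partner lets the output change over time."""
    static_h = sigmoid(W_in @ word_input)   # fixed for the whole trial
    ctx = np.zeros(N_REC)
    outputs = []
    for _ in range(N_STEPS):
        rec_h = sigmoid(W_static_rec @ static_h + W_ctx_rec @ ctx)
        ctx = sigmoid(W_rec_ctx @ rec_h)
        outputs.append(sigmoid(W_out @ rec_h))
    return np.stack(outputs)                # a (13, N_OUT) output trajectory
```

Training (described below) adjusts these weights so that the trajectory matches the target phoneme sequence.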
Sequential phoneme production was simulated as follows: the network was presented with all the phonemes in the word as input, using a feature-based phonemic representation scheme. A CCVCC word frame was used, which aligned consonants and vowels into specific 'slots.' Each single-phoneme slot was made up of phonological features, such that each phoneme was expressed as a vector of 18 binary bits representing its features.5 Empty slots were represented as a vector of 18 zeroes. An important consequence of this representational scheme was that a given phoneme did not vary in its featural representation based on syllabic position; a /p/ in an onset was represented the same way as a /p/ in a coda.

Figure 1.10: Network used to simulate the planning of speech production. The model was presented with a group of phonemes as input, and learned to produce the word as a sequence of phonemes on the output. (Input layer: group of phonemes; static hidden layer; recurrent hidden layer; output layer: sequence of phonemes.)

The output representation also used feature-based phonemes, but encoded the phonemes in a word as temporal sequences, rather than in a static set of slots. This was accomplished by presenting the network with an input word and allowing activation to propagate throughout the network for 13 time steps. Each output phoneme was activated for two time steps, starting at the third time step (the earliest point at which input activation could propagate to the output layer). Subsequent phonemes were activated at time steps 5, 7, 9, 11 and 13. For example, the word band was represented on the input as [_ b æ n d]; the output was /b/ at time 3, /æ/ at time 5, /n/ at time 7, and /d/ at time 9. Outputs for any remaining time steps were set to 0.
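The slot-based input and the timed output targets can be sketched as follows. The feature vectors below are toy one-hot stand-ins (the actual model used bundles drawn from the 18 phonological features listed in the footnote), and the slot names are invented labels for the CCVCC frame.

```python
import numpy as np

N_FEATURES = 18
SLOTS = ['C1', 'C2', 'V', 'C3', 'C4']          # the CCVCC word frame

def one_hot(i):
    """Toy stand-in for a real 18-feature bundle: a single bit set."""
    v = [0] * N_FEATURES
    v[i] = 1
    return v

# Hypothetical feature vectors for the phonemes of 'band'.
FEATURES = {'b': one_hot(0), 'ae': one_hot(1), 'n': one_hot(2), 'd': one_hot(3)}

def frame_input(slot_map):
    """Align a word's phonemes into the CCVCC frame. Unfilled slots
    are all-zero vectors, so every word yields a 5 x 18 = 90-bit input."""
    vec = []
    for slot in SLOTS:
        vec.extend(FEATURES[slot_map[slot]] if slot in slot_map else [0] * N_FEATURES)
    return np.array(vec)

def output_schedule(sequence):
    """Each phoneme's two-step activation begins at steps 3, 5, 7, ..."""
    return {3 + 2 * i: ph for i, ph in enumerate(sequence)}

band_in = frame_input({'C2': 'b', 'V': 'ae', 'C3': 'n', 'C4': 'd'})
band_out = output_schedule(['b', 'ae', 'n', 'd'])
```

Note that the same feature vector for /b/ would be used whether it filled an onset or a coda slot, mirroring the position-insensitivity described above.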
5 voiced, voiceless, consonantal, vocalic, obstruent, sonorant, lateral, continuant, non-continuant, ATR, nasal, labial, coronal, anterior, high, distributed, dorsal, radical.

The network was trained using the backpropagation through time algorithm (Williams & Peng, 1990), which adjusted connection weights after the forward phase of each training trial. Backprop through time is a variant of the standard backpropagation algorithm that allows networks to learn a trajectory of outputs over time, rather than a single output vector for any given input (as is the case in standard backprop).

The training set consisted of 3,122 monosyllabic English words drawn from Webster's New World Dictionary. The training probability of each word was weighted based on the word's log-transformed frequency in the Wall Street Journal corpus (Marcus, Santorini, & Marcinkiewicz, 1993). As a result, the network learned to produce a representative set of English words, with a frequency distribution similar to what English speakers produce.

The way in which speech production was modeled in this experiment was to some degree unrealistic, given that the articulatory gestures involved in producing a speech sound might vary across contexts; the phonemic scheme used here did not implement contextual effects, and instead held a phoneme's featural representation constant across all contexts. However, this scheme was faithful to the generalization that speakers represent abstract segments in a way that is insensitive to their position in a syllable. This is important because the point of this model was not to simulate how the organization of speech gestures varies across syllabic contexts. Instead, the goal was to investigate how planning to produce certain sequences of phonemes is more difficult due to computational limitations germane to models of sequential behavior.
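The frequency-weighted presentation of training trials might look like the following sketch. The mini-lexicon and its counts are invented; only the log-transformed weighting follows the text.

```python
import math
import random

random.seed(1)

# Hypothetical mini-lexicon with invented raw corpus counts; the actual model
# used 3,122 monosyllables weighted by log-transformed WSJ frequency.
raw_frequency = {"band": 5200, "mask": 870, "plinth": 4, "smelt": 21}

words = list(raw_frequency)
weights = [math.log(f + 1) for f in raw_frequency.values()]

def next_training_word():
    """Sample one training trial: frequent words are presented more often,
    but the log transform compresses the frequency range considerably."""
    return random.choices(words, weights=weights, k=1)[0]

counts = {w: 0 for w in words}
for _ in range(20000):
    counts[next_training_word()] += 1
```

Under raw counts, 'band' would be presented roughly 1,300 times as often as 'plinth'; under log weights the ratio falls to about 5:1, so rare words still contribute substantially to learning.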
It is worth noting that more sophisticated models of speech production have been developed in which the basic units of speech are not assumed to be discrete segments subdivided into features. For example, the Browman and Goldstein (1992) model of articulatory phonology implements speech production as sequences of speech gestures that are combined over time into an articulatory score. In this articulatory score, there are no discrete boundaries between groupings of gestures; the very nature of speech production requires that gestures of a single 'phoneme' will tend to be offset from each other, and the gestures of two discrete 'phonemes' can overlap, as in the case of coarticulation. (The term phoneme is applied loosely here, since no discrete notion of a phoneme is assumed in their approach.)

While the Browman and Goldstein (1992) model is much more realistic than the one used here, I argue that the basic assumptions of the current model carry over nicely to their model. First, both consider speech production to be the planning of temporal sequences of articulatory events. Second, both acknowledge that the cognitive mechanism underlying articulatory planning will have a major influence on how well the task can be carried out. As such, there should be savings in learning similar articulatory plans, compared to dissimilar ones. For example, learning to produce [bo] and [pa] will promote learning [ba] more than it will [ab], because of the importance of timing in the representation of both articulatory scores and phoneme sequences.

1.2.2.2 Results and Discussion

The network was tested on word production by presenting it with the input pattern of a word and comparing the resulting output trajectory to the target output.
At each time step, the phoneme corresponding to the network's output was determined by comparing the output to the feature vector of each possible English phoneme and finding the closest match using a Euclidean distance metric. Errors were scored when one or more of the output phonemes of an input word failed to match the target output.

At asymptote (200,000 training trials), the network had learned to accurately produce each word in the training set. The model was also tested before perfect performance was reached (after 50,000 training trials) in order to determine the types of syllables the model had more difficulty learning to produce. This was done by presenting the network with all the words in the training set, and noting those words on which the network produced errors (such as incorrect or missing phonemes). Figure 1.11 plots the network's mean performance on several syllable shapes used in the training set.

Figure 1.11: Results of the second experiment, indicating an asymmetry in the model's ability to produce consonant- and vowel-initial syllables.

These results indicate that the model was not performing identically on all syllable forms. Instead, it appears that the network was better able to produce syllables with (C)CV shapes compared to their VC(C) counterparts. This appears to be due to statistical preferences for consonant-initial syllables in the training set; because the task of producing sequences of phonemes was somewhat difficult for the model, it developed internal representations of syllabic structure that helped it to better perform the task. For example, the model carried with it specific expectations about whether syllables tend to have onsets.
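The nearest-match decoding and scoring procedure described above can be sketched as follows. The 4-bit feature vectors are invented stand-ins for the model's 18-bit phoneme vectors, and the phoneme set is truncated for brevity.

```python
import math

# Hypothetical short feature vectors for a few phonemes; the actual model
# compared outputs against 18-bit vectors for every English phoneme.
PHONEME_FEATURES = {
    "b":  [1, 1, 0, 1],
    "p":  [0, 1, 0, 1],
    "ae": [0, 0, 1, 0],
    "k":  [0, 1, 1, 1],
}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_phoneme(output_vector):
    """Decode one time step's output: pick the phoneme whose feature
    vector is nearest to the activation vector."""
    return min(PHONEME_FEATURES,
               key=lambda p: euclidean(output_vector, PHONEME_FEATURES[p]))

def word_correct(output_trajectory, target_phonemes):
    """A word is scored as an error if any decoded phoneme fails to match
    the corresponding target phoneme."""
    return [closest_phoneme(v) for v in output_trajectory] == target_phonemes

# A noisy activation vector near /b/ still decodes to 'b':
decoded = closest_phoneme([0.9, 0.8, 0.2, 0.7])  # -> "b"
```

Decoding to the nearest legal feature vector means the network's graded, continuous activations are always interpreted as some phoneme; an error arises only when that phoneme is the wrong one.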
The result is that the model performed more poorly on syllables that did not have onsets. This is in spite of the fact that the model did not have direct access to the perceptual characteristics of the words it was learning. The phonemic representation used in this simulation meant that there was no greater intrinsic 'difficulty' involved in producing a consonant in any specific position. Syllabic position preferences and dispreferences derived instead from the statistical properties of the vocabulary the network was trained on. I argue that these statistical properties are no accident, and are due to the acoustic differences between onset and coda consonants captured in § 1.2.1.

1.2.3 Summary: Syllable Typologies

In this section I have explored how functional factors can help to explain certain facts about syllables. It raises the possibility that general principles of learning interact with facts about speech perception and production planning to yield complex linguistic behavior. Connectionist networks are well-suited to investigating this type of hypothesis, because they allow the researcher to implement tasks of perception and production within a general learning system. Model performance can then be readily assessed and compared to empirical data.

This section has dealt with a very small aspect of syllable structure, namely the tendency for languages to prefer onsets and to disprefer codas. It is proposed that acoustic factors tend to favor the placement of some consonants in prevocalic positions, particularly obstruents. This tendency is not absolute: most languages do not prohibit coda consonants altogether. However, it does yield statistical preferences in lexicons, such that CV syllables will occur more frequently in words than their VC counterparts.
This statistical preference is amplified by cognitive processes, because it helps the network to better perform the difficult task of speech production. The result is that languages either prefer onsets over codas to varying degrees, or disallow codas altogether. This account makes an important point about how constraints are learned. Children do not need to have direct access to the sources of these constraints in order to learn them. Instead, learning can also be promoted by the statistical properties of the input to which children are exposed, since these statistics themselves reflect the existence of the constraints.

The production model that I present might also account for how syllable structures are used in the context of building other abstract phonological categories such as feet and prosodic words. For example, planning to produce sequences of more complex syllables might be more easily implemented if the sequences tend to be more predictable. This tends to be the case in quantity-sensitive languages, where the relative weight of a syllable can be predicted based on what is known about the word's stress pattern and the weight of the preceding or following syllables. Planning to produce a variety of syllable shapes might be easier if general constraints are imposed on the order in which these syllable shapes will tend to occur (e.g., the canonical L-H of iambs).

1.3 Conclusions

The functional framework adopted in this dissertation holds that phonological patterns can be transmitted lexically, rather than genetically. That is, it hypothesizes that knowledge of abstract phonological structure is not always necessary for learning phonological principles. This is because the vocabularies of languages are shaped by phonetic and cognitive constraints, and therefore tend to contain important clues as to the nature of the underlying phonological system.
This chapter investigated how this theory can help explain typological data about the phonological systems of the world's languages related to phoneme inventories and more abstract suprasegmental units such as syllable onsets and codas. The primary goal was not to address the entire range of crosslinguistic data pertaining to phonological systems, but instead to develop a framework within which functional explanations for these data could be implemented and tested. The results presented in this chapter suggest that connectionist models of language tasks such as phoneme perception, syllable perception, and word production are useful to this end because they provide a working model within which specific hypotheses can be tested. Results can be directly compared with empirical data, and predictions for future research can be drawn.

The OT account mentioned in the introduction differs from the present account, in that OT emphasizes the actions and interactions of constraints such as MAX-CONTRAST in vowel inventories (Flemming, 1997) and ONS and NO-CODA in syllable theory (Prince & Smolensky, 1993). Here the emphasis has been placed on specifying the source of these constraints, and the types of cognitive mechanisms giving rise to them. I propose that the present work should be viewed as complementary, rather than contradictory, to the more symbolic OT approach. I further suggest that an important component of constraints-based theories of language must be the search for an understanding of where phonological constraints come from, an argument that is further explored in Chapter 3.
Chapter 2

Phonological Acquisition in Dutch

In this chapter, the framework of Connectionist Phonology is applied to the issues of the acquisition and representation of phonology.1 I examine how acquisition data have previously been treated within Generative theories, including rule learning, parameter setting (in the theory of Principles and Parameters) and constraint ranking (in OT), and suggest important ways in which the connectionist approach can be useful for understanding these data. I use facts about Dutch stress assignment as a test case for how the Connectionist framework explains data related to phonological acquisition and representation.

The primary data focus on two sets of observations about Dutch stress. The first concerns how children learn Dutch stress, and focuses on the types of errors that these children produce as they progress through different stages of acquisition. The second set of data concerns the considerable amount of irregularity that exists in Dutch stress assignment, something that I argue is characteristic of many apparently productive linguistic systems, but that represents a serious problem for learning in Generative systems. Both these areas of inquiry raise important issues of how language can be learned, and how it is mentally represented. Dutch children acquire stress in what appears to be a specific, stage-like way. Their productions indicate that they are using complex underlying representations of phonology as they learn words.

1 Portions of this chapter also appear in Joanisse & Curtin (1999).
Accounting for these facts represents a major challenge to the Connectionist approach to phonology, which claims that a considerable proportion of speakers' underlying knowledge of phonological structure derives from the language in their environment, in contrast to the Generative claim that such phenomena derive from innate and deterministic learning strategies.

2.1 Dutch Stress Acquisition: Empirical Evidence

Children's acquisition of language is often described as stage-like. For example, English-speaking children acquire irregular morphological forms in what is often described as a U-shaped learning curve (Brown, 1973), committing overregularization errors (e.g., *taked) on forms they have previously produced correctly, and then apparently re-learning these forms. It has been argued that such behavior underlines the fact that children do not learn language through imitation and memorization, but are instead employing innate language acquisition mechanisms to acquire categorical linguistic rules (e.g., Marcus et al., 1992). Phonological acquisition in Dutch represents a similar type of case, which the present work examines by focusing on the errors children commit as they acquire main word stress in Dutch. These errors might reveal important facts about the kinds of learning mechanisms that are responsible for learning the phonological systems of language.

Dutch stress patterns can be summarized as follows: main word stress is quantity-sensitive, meaning that heavy and superheavy syllables attract stress. In addition, stress tends to fall toward the rightmost edge of a word, usually on one of the last three syllables. There are several other important generalizations about stress in Dutch:

(1) a. Syllables containing schwas are not stressed, and stress is typically assigned to the left of a schwa-syllable.
b.
The antepenult typically cannot be stressed if the penult is closed (-VC) or contains a diphthong.
c. Words with final superheavy syllables or diphthongs have final stress.
d. Words with open final syllables have penultimate stress.

Several accounts have been developed to explain these facts (Booij, 1995; Gussenhoven, 2000; Kager, 1989; Hulst, 1984), all of which utilize suprasegmental units, notions of syllabic weight, and generalizations about trochaic languages (of which Dutch is an example; Hayes, 1991). However, these explanations leave out a few important facts. First, many Dutch words do not fit this set of rules, and instead have irregular stress. For example, the city name Amsterdam receives final stress /ams-tər-dam/, even though most other trisyllabic words with a penultimate schwa receive antepenultimate (initial) stress. Irregular stress is not limited to any single class of words (e.g., loan words, city names, words ending in -dam), indicating that many of these types of words have no special status in the language beyond their irregular stress patterns.

2.1.1 Stress in Dutch Children

Fikkert (1994) investigated the acquisition of stress in 12 Dutch children, ages 1;0–2;11. Her study used a longitudinal design in which words of various lengths and stress patterns were elicited from children in a somewhat structured environment. By recording each child's speech at 2-week intervals, over the course of several months, Fikkert acquired a large corpus of Dutch children's utterances. A primary finding of her research was the observation that all the children in her sample seemed to be producing similar types of errors as they acquired words. These errors also seemed to follow a stage-like pattern, such that children tended to produce clusters of one error type at a given point of development.
Fikkert described these stages as follows:2

(2) Stage 1: Children were initially unable to accurately produce polysyllabic words, and instead tended to truncate words. This typically involved producing only the final syllable (e.g., /bɑˈlɔn/ (balloon) → [lɔn]).
Stage 2: As children began producing two- and three-syllable words, they tended to stress the initial syllable of a word, producing errors on words that should receive non-initial stress (e.g., /bɑˈlɔn/ → [ˈbo:m]).
Stage 3: Words were produced with level stress, such that more than one syllable seemed to be receiving main word stress (e.g., /bɑˈlɔn/ → [ˈbanˈdɔn]).
Stage 4: Children used adult-like stress patterns, although some phonemic errors occasionally still occurred (e.g., /bɑˈlɔn/ → [ba:ˈlɔn]).

2 Throughout this paper, I use the convention of stating adult forms in slashes (/ /) and child forms in square brackets ([ ]).

There are two interesting nuances to these observations. First, children at Stage 3 did not produce every syllable in longer words with equal stress. Instead, Fikkert observed children dividing polysyllabic words into two equally stressed feet, and assigning stress accordingly: /burdəˈrɛi/ → [ˈbujoˈjɛi] (farm).3 Second, children still produced some stress errors at Stage 4, specifically Stage 2-type errors in words with non-initial stress. For example, one child at Stage 4 produced the word /kɑpiˈtɛin/ (captain) as [ˈpa:pitɛin]. It is clear that Dutch children do not produce these types of errors through simple imitation, since they are not likely to be exposed to words with stress on the wrong syllable, and they almost certainly have never heard words with main word stress on two different syllables. Instead, the children's errors follow what appears on the surface to be an arbitrary path, suggesting deeper principles are at play.
Fikkert's account of this stage-like behavior is that it is the result of the child acquiring the use of abstract phonological knowledge. On her account, Dutch children call upon an increasingly complex set of prosodic units as they acquire stress, following the prosodic hierarchy first proposed by Selkirk (1980). Fikkert suggests that children begin acquiring main word stress using a bare minimum (CV)σ template, and progress through a quantity-insensitive binary foot (σ σ)F (Stage 3), until arriving at a final stage in which their word template is a single stress-bearing prosodic word containing several feet (Stage 4). In short, this stage-like behavior is an overt manifestation of the child learning to use increasingly more sophisticated aspects of the prosodic hierarchy.

3 A note on these transcriptions: in many cases, the transcribed adult forms in Fikkert (1994) disagree with the ones found in other primary sources used in this chapter. This problem seems to be limited to vowel quality and quantity, and is due in part to the tendency for these to vary across speakers of different regional dialects. However, the actual length of a vowel is considered important to a theoretical characterization of Dutch stress, and so some of the transcription disagreements are due to disagreements about whether certain phonetically short vowels are underlyingly long. In any case, I have tried to remain neutral in this debate by using a single primary source for adult forms (CELEX; Baayen, Piepenbrock, & van Rijn, 1993), and stating child forms exactly as they appear in Fikkert. I acknowledge that CELEX transcriptions might still differ from actual Dutch speakers' judgments, and apologize in advance for any violations of these intuitions. I am deeply grateful to Paul Boersma for lengthy discussion on this matter.
2.1.2 A Closer Look at the Data

While the facts in (2) seem to suggest a uniform set of stages that Dutch children progress through, a closer investigation of these data indicates a few complications. First, stages of acquisition tended to overlap within individual children. As Fikkert herself acknowledges, the children in her study produced errors from more than one stage during a single session. For example, Table 2.1 illustrates how the stages overlapped in one of these children. At 1;10.25 Catootje frequently produced errors consistent with Stage 2 (pipa) and Stage 3 (bo:na:n). This pattern occurred frequently: all children in the study produced errors from more than one stage during a single session. Table 2.1 also illustrates a second trend in these children, the tendency to regress from later to earlier stages from one session to the next. For example, at 1;10.25 Catootje produced mostly Stage 3 utterances, but then produced words consistent with Stage 2 at 1;11.10.

A second question about these data is whether a stage-like pattern was also exhibited for words taking irregular stress. Fikkert did not analyze any of the child data with respect to irregular forms, so it is difficult to determine whether the stage-like behavior that the children exhibited extended to such words. However, I suggest that any account of Dutch stress acquisition should be able to explain these data.

Table 2.1: A sample of Catootje's Stage 2 and Stage 3 errors (in Fikkert 1994).
Age       Stage 2 / Stage 3 error
1;10.11   konɛin → kna:
1;10.11   xitar → hi:ta:
1;10.11   olifɑnt → o:ma:
1;10.25   papir → pipa
1;10.25   bɑlɔn → bo:na:n
1;10.25   bɑlɔn → bo:noen
1;10.25   konɛin → konɛin
1;10.25   konɛin → ko:n
1;11.10   banan → bja:n
1;11.10   olifɑnt → ogam

These should not be construed as specific criticisms of Fikkert (1994); both of these issues apply to most language acquisition studies in the Generative tradition. For example, many other linguistic phenomena have been characterized as stage-like (Marcus et al., 1992), though it is also clear that there is considerable overlap among these 'stages' (Plunkett & Marchman, 1993). Similarly, accounts of rule, parameter and constraint learning abound in the Generative literature, although questions of how exceptional cases are learned, and the difficulties that they pose for any theory of acquisition, are scarce. The use of idealized data is largely responsible for these facts. A major assumption in the Generative tradition is that linguistic theory needs to contend exclusively with linguistic 'competence', the knowledge of regular grammatical processes, while abstracting away from issues of 'performance' (Chomsky, 1965). Since exceptional cases are not considered part of linguistic 'competence,' discussions of how core grammar is learned or represented are not applicable to them.
The data are considered more closely in Chapter 4; however, the debate over past tense verbs centers around the status of irregular past tenses (TAKE - TOOK, EAT - ATE) in the grammar; some accounts see irregulars as extra-grammatical, and subject to a different set of mental operations than regular past tenses (Pinker, 1999). In contrast, the connectionist viewpoint has sought to merge irregulars within the same mechanism as regulars by implementing past tense learning within a single neural network model (McClelland et al., 1986). The debate over these data has motivated a closer investigation of the entire range of speakers' knowledge of past tense (e.g., whether children and adults tend to overgeneralize regular and irregular patterns), and the empirical facts that the theories are based on (e.g., whether data in Brown (1973) and elsewhere support a theory of U-shaped learning). Overall, the debate over past tense learning has both stimulated a better understanding of how children learn a specific aspect of language, and has advanced our understanding of the types of theories that could be used to account for these data.

In the next section I review two types of generative accounts of Dutch stress acquisition, with regard to the degree to which they can account for the additional aspects of the data that are noted above.

2.1.3 Generative Accounts of Stages

Fikkert calls upon a Parameter Setting learning theory in her account of Dutch stress acquisition (Dresher & Kaye, 1990; Wexler & Culicover, 1980). In this theory, the space of possible phonological systems is limited by the parameters governing languages' stress systems. This places strong limits on the possible hypotheses that a language learner can posit, and thus guides how the learner searches through the space of possible languages (Clark, 1992).
Although theories differ in their exact characterization of these parameters, there is general agreement that they include parameters targeting the headedness of the prosodic word, foot shape, foot construction and defooting (for discussion, see Hammond, 1990).

Parameter setting is not the only type of formal theory that could account for the stage-like nature of Dutch stress acquisition. Curtin (1999) presents one possible scenario in an OT93 framework, which illustrates how strict constraint ranking and re-ranking might account for the stage-like way in which Dutch stress is acquired. The general learning procedure in all OT-based theories involves algorithmically re-ranking constraints based on surface forms perceived in the environment through constraint demotion (Tesar & Smolensky, 1996). Different behaviors emerge as these rankings change from their unmarked (default) rankings to a final stage. Curtin presents an analysis in which phonological units such as feet and prosodic words are embodied within constraints. The model learns the Dutch stress rule by ranking these constraints, and produces stage-like errors as a result of incomplete and incorrect constraint rankings.

2.1.4 Critique of OT93 and Parameter Setting Accounts

The motivation for pursuing a connectionist approach comes from the apparent difficulties that the Parameter Setting and OT approaches have in accounting for the more problematic facts discussed in section 2.1.2. My goal in this chapter is to suggest what types of changes would be necessary for the OT approach to account for Dutch stress patterns, and Dutch stress acquisition. Given this, I will discuss these issues from the perspective of OT93. For instance, Curtin (1999) is careful to point out that her account is only an incomplete model of Dutch stress acquisition, primarily because it is developed within a strict OT93 framework.
Thus, at each point in development the OT93 model produces two types of developmental patterns: errors that are discretely stage-like, such that all outputs appear to belong to a single stage, and transitional periods indicated by random alternations between two adjacent stages (e.g., half Stage 2 errors, half Stage 3 errors). In later work Curtin develops a more advanced model of these and other related data in phonological acquisition (Curtin, 2000). I return to this in the next chapter, where I consider how specific types of modifications to the OT93 approach could lead to better accounts of these data. In the meantime, the earlier OT93 account is used to underline what I see as some shortcomings in the original OT93 approach.

The first difficulty is that it is hard to reconcile the overlap among stages within the Parameter Setting and OT93 frameworks. These theories contend that children's grammars are being adjusted independently of their acquisition of lexical items. That is, words and grammar are separate symbolic entities on this account. Such theories predict that any change to the grammar (for example, re-ranking a constraint such as MORAIC TROCHEE) should have a uniform effect on all words that the child knows.4 This contrasts with the fact that stages of language acquisition do not tend to be circumscribed, as the Dutch acquisition data indicate. Instead, children produce utterances consistent with several stages at once, and can regress to earlier stages from one point in time to the next. Curtin (1999) discusses how this issue might be addressed in OT by thinking of between-stage behavior as representing points at which the relative ranking of some constraints is indeterminate. Thus, if two critical constraints have not yet established a ranking relative to one another, multiple outputs could be expected.
Less plausible, perhaps, are cases in which transitions between stages involve demoting an already ranked constraint. In such cases, the oscillation between stages would have to be the result of 'un-demoting' and 're-demoting' a constraint. However, this type of behavior is not allowed for in the standard Tesar and Smolensky (1996) constraint ranking learning algorithm, which only admits constraint demotion. Moreover, it is unclear why this behavior would need to occur in the Tesar and Smolensky (1996) learner at all, given that this learning model does not require more than one exposure to a critical form in order to discover the correct ranking. Demotion is usually presented as a major benefit of this learning theory, because of how it constrains learning. In any event, it seems clear that modifications to the constraint ranking scheme and learning mechanism in OT could help bring it into line with recalcitrant acquisition data like these. However, this could have important consequences for OT theories, because it weakens the theoretical claim that strict ranking predicts all possible grammars, and no others.

4 A notable exception would be irregular forms, which are themselves considered to be lexicalized in some generative accounts (Pinker & Prince, 1988, et seq.).

As I mentioned in the previous section, there is also a second complication that all generative accounts of Dutch stress must contend with: the fact that not every word in Dutch has regular stress. Dutch children are exposed to both regular and irregular forms, and there is no clear information that would help them discriminate regular cases from the exceptions without first learning what the regular pattern is. This then raises the question of how Generative learners are able to learn deterministic principles such as the ones stated in (1).
Linguists have proposed two solutions to this problem. On the first account, irregular cases are treated as outside the scope of rules, and are instead explicitly memorized (Aronoff, 1976; Bauer, 1983; Pinker & Prince, 1988; Pinker, 1991). This allows the remaining regular cases to be accounted for using a relatively simple set of deterministic principles (e.g., rules, parameters, constraints). The second way of accounting for such data is to incorporate both regular and irregular cases within the same framework. This has been implemented using elaborate sets of rules that vary in their specificity (Albright, 1998; Chomsky & Halle, 1968; Halle & Mohanan, 1985), with some success. The connectionist framework suggests a third alternative: rules and their analogues are in fact not the ideal framework for understanding linguistic processes. Instead, cognitive behavior is best understood from the subsymbolic perspective, where probabilistic factors such as similarity and frequency interact in complex ways to give rise to seemingly symbolic behavior (McClelland et al., 1986; MacDonald, Pearlmutter, & Seidenberg, 1994). This approach has several advantages over the Pinker & Prince dual-mechanism model. One relevant advantage is that it does not draw a categorical distinction between regular and irregular cases in language. As I discuss below, Dutch stress presents several pieces of evidence indicating that this is a false dichotomy, and that there are intermediate cases that are difficult to account for in a dual-mechanism model. This framework has a particularly relevant benefit for Dutch, since these models of language do not treat the lexicon and grammar as separate entities, and so the derivational principle that a grammar acts upon the set of lexical entries in a language becomes a non-issue.
Words can behave differently, depending on a range of factors including their frequency or their similarity to other patterns. In the case of Dutch stress acquisition this means that, as the network learns the principles underlying Dutch stress, it can potentially commit different types of errors at a single point in training. This is because the model's output for any given word is influenced by a number of factors, including a word's frequency, phonological complexity (for instance, the number of syllables and phonemes) and its similarity to other known forms.

2.2 A Connectionist Model of Dutch Stress

In this section, I develop a model of word production that is used to simulate the acquisition of stress in Dutch children. The purpose of this model is to gain a better understanding of the issues raised in the previous section, related to stagelike acquisition of stress patterns, and how regular and exceptional stress might be encoded within the same mechanism.

2.2.1 Model Overview

To simulate how Dutch children learn to produce stress, a connectionist model was trained to produce actual Dutch word forms, as sequences of consecutive syllables. This was done by presenting a recurrent network with a set of phonemes as input (e.g., /oːlifɑnt/), and training it to output each phoneme in turn over a series of discrete time steps (e.g., [oː]-[li]-[fɑnt]). The details of how this was done are explained below. First, the motivations for using this particular model architecture should be explained. The purpose of this model was to examine the extent to which phenomena related to Dutch stress acquisition can be better understood from the perspective of Connectionist Phonology. The current model focuses only on processes related to word production because the data I address deal specifically with children's productions of words.
Because this model was not intended to represent a complete account of phonological acquisition, other types of linguistic tasks (such as word recognition and sentence production) were not included in this implementation. Instead, its task was to map a structurally impoverished input of phonemes to a series of syllables, and to determine which syllable was stressed. The network architecture is presented in Figure 2.1. The model received an input that represented a word that had to be produced on the output. Input words were represented as groups of phonemes that made up that specific word; for example, the word ballon would be [balon]. Each phoneme was itself represented as a vector of 18 binary features corresponding to distinctive phonological features.5 The input layer consisted of 19 single-phoneme slots, and each word was left-aligned with these slots such that the first slot always corresponded to the first phoneme in the word, and so on. For example, the word navigator would be represented as [naaviiɣaatɔr_________], where underscores represent empty phoneme slots encoded as a sequence of 18 zeros. The choice of using discrete phonemes, as well as the specific features that were used, should not be interpreted as a commitment to a specific phonological framework. The representations that were used merely reflect the need for a consistent mechanism for encoding the phonological content of words in a way that reflects the general degrees of similarity among different phonemes.

5voiced, voiceless, consonantal, vocalic, obstruent, sonorant, lateral, continuant, non-continuant, ATR, nasal, labial, coronal, anterior, high, distributed, dorsal, radical.
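As a rough illustration of this input scheme, the following sketch encodes a word as a left-aligned, zero-padded vector of 19 phoneme slots. The feature table here is hypothetical (only a toy subset of values is filled in); the dissertation's actual 18 distinctive features are listed in footnote 5.

```python
# Sketch of the input encoding described above: 19 phoneme slots,
# each an 18-bit binary feature vector, left-aligned and zero-padded.
# The feature values below are made up purely for illustration.

N_SLOTS = 19
N_FEATURES = 18

# Hypothetical feature table: each phoneme maps to 18 binary features.
FEATURES = {
    "b": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    "a": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "l": [1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "o": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    "n": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0],
}

def encode_input(phonemes):
    """Left-align a phoneme sequence in 19 slots; empty slots are 18 zeros."""
    if len(phonemes) > N_SLOTS:
        raise ValueError("word too long for input layer")
    vec = []
    for p in phonemes:
        vec.extend(FEATURES[p])
    vec.extend([0] * N_FEATURES * (N_SLOTS - len(phonemes)))
    return vec

# 'ballon' -> [balon], as in the text
x = encode_input(list("balon"))
print(len(x))  # 19 * 18 = 342 input units
```

Under this scheme, phonemes with overlapping feature values receive similar input vectors, which is the only property the text takes to be essential.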
It is proposed that any reasonable representational scheme should yield similar results to the ones presented here, given that the facts I wish to account for are only minimally related to issues of segments and features. The model's representational scheme does preserve some types of information that are important to determining stress. These include vowel quality, since vowels vary in their tendency to be stressed, and the syllabic position of consonants, since coda consonants bear syllabic weight whereas onsets do not. However, the model does not explicitly encode other types of generalizations about syllables, including structural information about the cohesion of coda consonants and vowels into rimes, which I assumed can be learned from statistical generalizations on the input. The model's task was to determine the word's syllabic structure and stress pattern based only on its phonological structure. For this reason, it did not receive any information about either of these on the input. Instead, this information was to be produced as the network's output.

Figure 2.1: Network used to simulate word production in Dutch. The model was presented with groups of phonemes as input, and learned to produce the word as a sequence of syllables on the output.

The output layer consisted of a single CCVVCC syllable frame, made up of 6 phoneme slots identical to those on the input, within which individual syllables of words could fit.6 The model's recurrent architecture allowed it to produce sequences of output patterns for any given input. This in turn allowed us to model some of the dynamic aspects of word production, by training the network to produce sequences of syllables over several time steps.
For example, the word navigator would be produced as the syllables [_naa__], [_vii__], [_ɣaa__] and [_tɔ_r_], in that sequence. Words with fewer than four syllables were presented similarly, but were followed by the balance of empty syllables; for example ballon would be presented as [_ba___], [_lɔ_n_], [______] and [______]. Stress was assigned to a syllable by activating a single node that was used solely for this purpose. For unstressed syllables, the stress node was set to zero (inactive).7 An important feature of this system is that the syllabic position of every given phoneme on the output was not predictable from its position on the input. For example, the third phoneme in the input [balɔn______________] is the onset of the second syllable, whereas the third phoneme in the input [trɔmpɛt____________] represents the nucleus of the first syllable. As a result, the task of determining an input phoneme's syllabic and temporal position (the syllable a phoneme occurs in) was nontrivial, and required the network to acquire important syllabification principles. The task that the model learned is a somewhat artificial one. It was intended to simulate how children produce words as serial sequences of syllables, though the input to this process is arguably not a phonological form encoded as an ordered group of phonemes.

6To reduce the size of the training set and shorten training time, words with CCC clusters in either the onset or coda position were excluded. In practice, words of this type made up less than 7% of all Dutch words, and of these, only 3 words had CCC clusters in the coda position, where their effect on stress might be relevant. This suggests the impact of omitting these words from the training corpus was minimal.
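To make the output scheme concrete, here is a small sketch that builds the target sequence for a word: four CCVVCC frames of six slots each, plus a per-syllable stress bit. The helper and its slot-filling rule are my own simplification, not the original simulation code.

```python
# Sketch of the output targets described above: each word is a sequence
# of four syllable frames (CCVVCC, i.e. 6 phoneme slots), each paired
# with a stress bit. Slot assignment here is simplified and hypothetical.

N_SYLLS = 4          # maximum word length in syllables
FRAME = 6            # CCVVCC: two onset slots, two vowel slots, two coda slots
EMPTY = "_"

def make_targets(syllables, stressed_index):
    """syllables: list of (onset, nucleus, coda) strings, e.g. ('b', 'a', '')."""
    frames = []
    for i in range(N_SYLLS):
        if i < len(syllables):
            onset, nucleus, coda = syllables[i]
            frame = (onset.rjust(2, EMPTY) +    # onset fills C slots, right-aligned
                     nucleus.ljust(2, EMPTY) +  # nucleus fills V slots
                     coda.ljust(2, EMPTY))      # coda fills final C slots
            stress = 1 if i == stressed_index else 0
        else:
            frame = EMPTY * FRAME               # empty syllable: all slots off
            stress = 0
        frames.append((frame, stress))
    return frames

# 'ballon': two syllables, final stress
targets = make_targets([("b", "a", ""), ("l", "o", "n")], stressed_index=1)
for frame, stress in targets:
    print(frame, stress)   # _ba___ 0 / _lo_n_ 1 / ______ 0 / ______ 0
```

The point of the four-frame format is that shorter words end with fully empty frames, exactly as in the ballon example above.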
The motivation for using this task was not to closely simulate how children learn to produce words; ideally, this task would be implemented in a model that learns to map meaning and sound to each other. Actually implementing this task would be possible, though it would involve a much larger network than the one used here, because semantic networks require a much larger representational space. Since the resulting network would have required much more training time than the present one (due to the larger network size, and the complexity of meaning-sound mappings), I chose to simplify the network's task somewhat. I argue that this simplification preserves crucial elements of the word production task, allowing me to examine how a connectionist system learns generalizations about Dutch stress.

7The use of two stress levels is argued to be sufficient for modelling prosodic acquisition since developmental data are currently only available for primary stress acquisition. This is due in part to how notoriously difficult it is to determine secondary stress placement in utterances.

2.2.2 Training Set

The training set consisted of Dutch nouns 1 to 4 syllables long drawn from the CELEX corpus of Dutch word forms (Baayen et al., 1993). Stress assignment and syllabification were also obtained from CELEX, and were based on CELEX coders' judgments. Nouns were used for two reasons. First, Dutch stress is most predictable for nouns, and so accounts of it have tended to focus on nouns (Booij, 1995; Kager, 1989; Hulst, 1984). Second, the Dutch acquisition data also deal exclusively with nouns, which is perhaps not surprising given that children tend to acquire mostly nouns before the age of 2 (e.g., Macnamara, 1972; Gentner, 1982). CELEX lists 33,553 1- to 4-syllable nouns. However, hardware limitations made including all these forms in network training impractical.
For that reason, the training set was reduced by randomly selecting 10% of these words. In addition, a number of homophones were removed, leaving 3,324 Dutch nouns that were used in the training set. Figure 2.2 illustrates the frequency distributions of 1-, 2-, 3-, and 4-syllable Dutch nouns, for both the training set and the entire CELEX corpus. Two trends are worth noting here. First, there is an inverse relationship between words' token frequencies and the proportion of words in the training set in a specific frequency range, such that there are many more low frequency words compared to high frequency words. This trend is not unexpected, given what is known about frequency distributions of words in languages (Zipf, 1935). Second, monosyllables seem to be an exception to this trend, as evinced by the greater proportion of higher-frequency monosyllables in Dutch. Both these trends are also present in the training set, indicating that the statistical information about stress placement and word length that the network was exposed to reflected the overall statistics of Dutch. The special status of monosyllables is recapitulated in Figure 2.3, in which the type frequencies of 1-, 2-, 3- and 4-syllable Dutch nouns are illustrated. This graph illustrates that in spite of the fact that monosyllables tend to have higher token frequencies in Dutch, the language as a whole has a small proportion of monosyllabic words, compared to polysyllabic words. This fact was also captured in the training set. Aside from characterizing certain statistical tendencies in Dutch, the data in these graphs also help support the assertion that the simplifications used to derive the training set (i.e., using only nouns, reducing the vocabulary size) did not result in the inclusion of unrealistic data, or the exclusion of important characteristics of Dutch.
By definition, all models involve certain simplifying assumptions that make their implementation possible; otherwise, they would not be models. The data in these graphs indicate that a major simplifying assumption - reducing the size of the training vocabulary - did not change some important characteristics of what the model needed to learn.

Figure 2.2: Frequency distributions of words in the training set (top), compared to the entire corpus of Dutch nouns in CELEX (bottom). Histograms indicate a larger proportion of high frequency monosyllables in Dutch, compared to polysyllables.

Figure 2.3: Type frequencies of words in the training set (top), compared to the entire corpus of Dutch nouns in CELEX (bottom), broken down by number of syllables.

2.2.3 Training Procedure

At the beginning of training, all network connection weights were randomized between 0.01 and -0.01. Network training then proceeded as follows: at the beginning of each training trial, a word was selected from the training set. Selection was frequency weighted such that the probability of selecting a given word was a log10 function of the word's frequency in the CELEX corpus. Using a log frequency transform assured that
the network would receive a sufficient number of exposures to low frequency words within a reasonable number of training trials. During the forward propagation phase of the trial, activation was allowed to propagate throughout the network for 10 time steps (one time step is defined as the propagation of activation across one set of connections to an adjacent layer of units). The network learned to produce each output syllable for two time steps, starting with the third time step (this is because it took three time steps for activation to propagate to the output layer). At the end of the forward phase, connection weights were adjusted for each of these time steps using the backpropagation through time learning algorithm (Williams & Peng, 1990) and cross-entropy error calculation. A logistic activation function was used for each unit in each layer; the learning rate, which scaled the weight adjustments made during the backpropagation phase of each trial, was set to 0.001; the error radius was set to 0.1, meaning that error correction was not applied for activations within this level of tolerance.

2.3 Training Results

Training was stopped after 5 million training trials. This large number of trials was due in part to the sequential nature of the output, which required many trials to learn the task. It was also influenced by the large number of low frequency irregularly stressed words in the training set, which required many trials to expose the network to a sufficient number of them. In all of the analyses I present below, performance was assessed by presenting the network with words in the training set and comparing each phoneme in the resulting net output to the target output. A nearest-neighbor method with a Euclidean distance metric was used for this purpose.
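Returning briefly to the trial-selection step of the training procedure, the frequency-weighted sampling can be sketched as follows. The word list and frequency counts are invented placeholders, and the +1 shift inside the log is my own assumption to keep zero-frequency words samplable.

```python
# Sketch of the frequency-weighted trial selection described above:
# a word's probability of being chosen on a training trial is
# proportional to the log10 of its CELEX token frequency (shifted by 1
# here, an assumption, so that zero-count words still get sampled).
import math
import random

# Hypothetical (word, frequency) pairs, not actual CELEX counts.
lexicon = {"bal": 5000, "ballon": 300, "navigator": 4, "krokodil": 40}

words = list(lexicon)
weights = [math.log10(freq + 1) for freq in lexicon.values()]

def sample_word(rng=random):
    """Pick one training word, with frequency compressed via log10."""
    return rng.choices(words, weights=weights, k=1)[0]

# High-frequency words are still favored, but far less extremely than
# under raw-frequency sampling: log10(5001) is about 3.7 vs log10(5) about 0.7,
# a 5:1 ratio rather than the raw 1000:1.
counts = {w: 0 for w in words}
for _ in range(10_000):
    counts[sample_word()] += 1
print(counts)
```

This compression is what lets low-frequency irregular words show up often enough for the network to learn them within a practical number of trials.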
(A Euclidean distance metric computes the distance, in feature space, from a phoneme in the network's output to each possible phoneme in Dutch, and determines which is closest.) Errors were registered when the network produced an output that most closely matched a Dutch phoneme that was not the intended output (e.g., producing [g] instead of [ɣ]). For my purposes, an entire word was coded as phonologically incorrect if any of the syllables of a word contained an incorrect phoneme. Errors were also registered for cases in which the network failed to produce a phoneme at all, or when it produced a phoneme where none was expected. Stress assignment was assessed by directly measuring the activation of the stress node in the output layer while each syllable was output. A syllable was considered to be stressed if the stress node produced a value of greater than or equal to 0.5 during that syllable's time step, and unstressed if the stress node had an activation of less than 0.5. The 0.5 value was used because it represented the midpoint between the 'fully off' and 'fully on' values of 0.0 and 1.0. Although I was most interested in the patterns of behavior that the model exhibited over time, the first set of analyses focused on quantifying the network's behavior at the end of training, to assess both whether it had learned a significant portion of the training set, and also whether the errors that the model was producing were consistent with those of children. This was done by presenting each word in the training set to the model at the end of training. The network correctly produced the phonological forms of 89% of the training words, and predicted the correct stress for 94% of the training words. Below I present an overview of the types of errors the network was producing at the end of training.
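The scoring procedure just described (nearest-neighbor phoneme decoding plus the 0.5 stress threshold) can be sketched as follows. The phoneme inventory and its 4-element feature vectors are toy stand-ins for the model's 18-feature codes.

```python
# Sketch of the scoring procedure described above: each output slot is
# decoded to the nearest Dutch phoneme in feature space (Euclidean
# distance), and a syllable counts as stressed iff its stress-node
# activation is >= 0.5. Feature vectors here are hypothetical 4-bit
# stand-ins for the model's 18-feature phoneme codes.
import math

PHONEMES = {          # toy inventory with made-up feature vectors
    "b": (1, 0, 1, 0),
    "p": (1, 0, 0, 0),
    "a": (0, 1, 1, 1),
    "o": (0, 1, 0, 1),
}

def nearest_phoneme(output_vec):
    """Return the phoneme whose feature vector is closest (Euclidean)."""
    return min(PHONEMES, key=lambda p: math.dist(PHONEMES[p], output_vec))

def is_stressed(stress_activation, threshold=0.5):
    """Stress node at or above the 0.5 midpoint counts as stressed."""
    return stress_activation >= threshold

# A noisy output near /b/ still decodes to 'b'...
print(nearest_phoneme((0.9, 0.1, 0.8, 0.2)))   # -> 'b'
# ...but one closer to /p/ would be scored as a substitution error.
print(nearest_phoneme((0.9, 0.1, 0.3, 0.1)))   # -> 'p'
print(is_stressed(0.62), is_stressed(0.31))    # -> True False
```

An error is registered whenever the decoded phoneme differs from the target, which is exactly how a substitution such as [g] for [ɣ] would be caught.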
2.3.1 Segmental Errors

The words for which the network produced incorrect phonological forms seemed to be ones that represent difficult cases for language learning children, namely low frequency and longer words. The mean log frequency of the incorrect words was 1.25, indicating that most of the phonologically incorrect words had token frequencies of less than 20 per 4.2 million in the CELEX frequency counts. In addition, phonological errors tended to occur in longer words: only 22% of the phonologically incorrect words were 1 or 2 syllables long, while 37% and 41% were 3- or 4-syllable words, respectively. The types of errors the network was producing also seemed consistent with those of younger language learning children. Most of the network's phonological errors consisted of the model producing a phoneme incorrectly. For example, it output the word fransman (no translation) as [frant-man]. There also appeared to be a number of vowel substitutions (producing orator as [o-ra-tor]), which seems to reflect that the network had some difficulty with the large number of vowels in Dutch, and occasionally substituted one for another. In addition, the model committed several consonant substitutions, such as the tendency to confuse /l/ and /r/ (e.g., producing goalgetter as [goːr-gɛtər]). A final trend in the model's errors is worth noting. Just as children seem to be prone to anticipatory and perseveratory speech errors (Wijnen, 1992), the model also appears to be producing these types of errors. For example, for revolver (/rəvɔlvər/) the model instead produced [rə-vɔr-vər], anticipating the third syllable's /ər/ sequence in the second syllable.
From these results, I conclude that while the network was not trained to a point at which it was producing each training form correctly, it was nevertheless producing errors that seem consistent with what children tend to produce. It thus seems a safe generalization that the network had learned aspects of Dutch phonology in ways that were consistent with how children learn Dutch. In the next section, I further consider this possibility, by examining the types of stress errors it commits.

2.3.2 Stress Errors

Stress assignment was remarkably good by the end of training, such that only 6% of words in the training set were incorrectly stressed. Stress errors fell into three categories. In some cases, the network assigned stress to the incorrect syllable. Elsewhere, the network assigned stress to more than one syllable. Both of these error types are of interest to us, because of how they might compare to children at Stages 2, 3 and 4 in Fikkert (1994). A third type of stress error also occurred, when the network was not able to determine which syllable was stressed; in these cases the stress node was below threshold for all syllables in the word. This error type is more difficult to compare to errors in child language, because it can be interpreted in two ways. It could be that the network was applying equal stress to each syllable for whatever reason, and this resulted in no syllable receiving full stress. A second possibility is that the network
The model’s error pattern suggests that, while performance had not yet reached an adult-like level of performance when training was stopped, its performance was similar to that of children who have acquired much of Dutch phonology. It is assumed that with more training trials, a larger sample of Dutch words, and possibly a larger number of units in the hidden and cleanup layers, the model’s performance would have achieved an adult-like level of performance that approaches 100% correct. 2.4 Developmental Patterns in the Model The primary issue to be addressed concerned the patterns of behavior the network exhibited over the course of training. This was investigated by recording the network’s weights every 100,000 training trials, effectively saving a snapshot of the network’s state at that point in training. This allowed me to test the network’s performance on a variety of word types at consistent intervals during training. Testing the network over the course of development is analogous to a longitudinal study of an individual child, and allows the comparison of the network’s developmental patterns to that of children acquiring Dutch stress. One caveat should be noted however, and that is the fact that it is difficult to di rectly compare the ‘age’ of a model to that of a child. That is, it is not possible to say with any certainty that a certain number of training trials directly relates to a discrete 96 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. chronological age. Many variables contribute to the speed with which the network learns a specific task, including learning rate, the error calculation method, the number and size of hidden layers, the use of frequency compression, and the representational scheme used to encode various types of knowledge. From the perspective of develop mental data, it is also difficult to know how many ‘trials’ a child receives of any given word form. 
There is considerable inter-child variability with respect to rate of development and also the age at which certain developmental milestones are reached. For these reasons, the present work instead focuses on the different developmental stages that occur in the model, and the order in which they are arrived at, in a comparison to empirical facts about children. The model also provides an alternative explanation for the errors Dutch children produce at these stages of acquisition. The model's behavior over the course of learning was strongly influenced by characteristics of the training set to which it was exposed. Analyses of Dutch syllable and stress patterns revealed biases toward words with initial stress and words with one or two syllables. (While Figure 2.2 indicates there were relatively few monosyllabic words, Figure 2.3 shows that the model was nevertheless exposed to many such words due to the tendency for monosyllables to have higher token frequencies.) Errors at Stage 2 reflect the model's response to this bias toward initial stress. Errors at Stage 3 reflect a greater sophistication on the part of the model as it attempts to fit the broader patterns in the training set, including the tendency for each foot to receive stress (e.g., producing trompettist with stress on both feet). Errors at this stage also reflect competition between the Stage 2 tendency towards initial stress and the need to uphold generalizations about moraic trochees (resulting in errors such as [ˈsuːndaːneːs] for suːndaːˈneːs). Changes in the strategies the model uses to learn stress patterns are also similar to those of Dutch children. Learning is characterized as a gradual progression from one stage to the next, rather than a stepwise shift from one type of behavior to the next.
2.4.1 Regular and Irregular Stress

Before looking at specific developmental patterns in the network, I was interested in whether the network had indeed acquired rule-like behavior at all. It is conceivable that the network was doing nothing but 'memorizing' the stress patterns of each word it was exposed to. Because Dutch words vary in the regularity of their stress patterns, it was possible to assess whether this was in fact how the model acquired stress. This was done by comparing the network's performance on phonologically similar words with regular and irregular stress patterns, over the course of training. If the model was simply memorizing stress patterns, one would not expect to see differences between the two types of patterns. Since many different factors might tend to correlate with stress regularity (for example, frequency and word length), it was not ideal to test the network on all regularly- and irregularly-stressed Dutch words. Instead I assembled sets of 2- and 3-syllable Dutch words with syllabic structures known to attract highly regular stress patterns (Hulst, 1984). The 2-syllable word set consisted of Dutch forms containing two heavy syllables, fitting the (C)(C)VXσ.(C)(C)VXσ template.8 In addition, this set included any 2-syllable word ending in a superheavy syllable (VVC or VCC).

8While both VC and VV codas were considered heavy here, the actual weight of long vowels is frequently debated in the literature (Booij, 1995; Gussenhoven, 2000). However, statistics in CELEX and Hulst (1984) nevertheless indicate that rhymes coded as VV tend to attract stress to a similar degree as VC rhymes, and for that reason they were included in the present study.
The 3-syllable set was similar, consisting of words with three equally heavy syllables fitting the (C)(C)VXσ.(C)(C)VXσ.(C)(C)VX template (here again, syllable rhymes coded as VC and VV in CELEX were considered heavy), and 3-syllable words ending in a superheavy syllable. Words fitting the regular stress patterns ˈH.H, H.ˈH.H, and H.H.ˈHH were included in the regular testing set, while words taking any other stress patterns were included in the irregular testing set. The 2- and 3-syllable regular and irregular sets were presented to the network at 100,000-trial intervals to illustrate its performance on these types of words over the course of training. Results are illustrated in Figure 2.4. Correct stress was scored when the network indicated stress only on the correct syllable in the word; instances of multiple stressed syllables, no stress, or stress on the incorrect syllable were scored as incorrect.

Figure 2.4: Model's performance on samples of 2- and 3-syllable words taking either regular or irregular stress.

Several interesting patterns emerge from this analysis. The first is that the network was not performing similarly on words taking regular and irregular stress patterns; instead, performance on different word types seemed to increase and decrease at different points in training. This seems to suggest that the network was inferring generalizations about stress patterns in the course of training, to the detriment of certain forms. This pattern is very clear in the case of 2-syllable words, where the network performed particularly well on words taking regular stress - but very poorly on irregularly-stressed words - early in training. In addition, as irregular stress performance increased, a decrease in performance on regular stress was observed; regular performance then increased again toward the end of training. This pattern is remarkably similar to the U-shaped learning pattern observed in children learning morphology (Slobin, 1985; Bybee & Slobin, 1982; Marcus et al., 1992; Plunkett & Marchman, 1993), and suggests that the acquisition of regular and irregular stress proceeds in a different fashion in the current model. In contrast, the data from 3-syllable words are less familiar; here it appears the network initially tends toward an irregular stress pattern in 3-syllable words (in particular, word-initial stress) due to the preponderance of Stage 2-type errors present at an early stage. This is because the task of acquiring words of this length involved unlearning the irregular word-initial stress pattern for words that took the more regular word-medial stress pattern. Accounts of child language acquisition typically focus on explaining a single type of developmental profile, for instance U-shaped learning. The present results are interesting because they suggest the possibility of an increased degree of complexity in patterns of language development. As the graphs in Figure 2.4 suggest, various types of constraints on acquisition (regularity biases vs. Stage 2-type errors) can interact to produce non-obvious developmental patterns. These results make strong predictions for future research because of how they might serve to discriminate between grammar-based and connectionist models of language development.
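The regular/irregular split used for these probe sets can be sketched as follows. The string encoding of weight-and-stress profiles (apostrophe marking the stressed syllable) is my own, as is the template set, which follows the three regular patterns named in the text.

```python
# Sketch of the regular/irregular probe-set split described above. A
# word's profile is built from its syllable weights ('H' heavy, 'HH'
# superheavy) and the index of its stressed syllable; it counts as
# regular if it matches one of the three templates in the text.
# The encoding is my own illustration, not from the dissertation.

REGULAR = {"'H.H", "H.'H.H", "H.H.'HH"}

def profile(weights, stressed):
    """e.g. (['H', 'H'], 0) -> \"'H.H\" (apostrophe marks stress)."""
    return ".".join(("'" if i == stressed else "") + w
                    for i, w in enumerate(weights))

def is_regular(weights, stressed):
    return profile(weights, stressed) in REGULAR

print(is_regular(["H", "H"], 0))        # True: initial stress, two heavies
print(is_regular(["H", "H", "H"], 1))   # True: penultimate stress
print(is_regular(["H", "H", "HH"], 0))  # False: stress off the superheavy final
```

Anything that fails this check (multiple stresses, no stress, or stress in an unexpected position) lands in the irregular probe set, mirroring the scoring rule above.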
2.4.2 Error Types for Irregulars

As I noted above, there is little empirical data comparing the acquisition of regular and irregularly stressed words in Dutch, and thus it is difficult to determine at this point whether these results are actually simulating how children learn such forms. However, there is at least one aspect of Fikkert's study that might shed some light on the effects of regularity on Dutch stress acquisition.

At the end of training, the network was producing errors primarily on irregularly stressed words. It is difficult to quantify the exact proportion of regular and irregularly stressed words in the training set, since it is often difficult to determine which stress pattern is truly the regular one. However, using the sets of 2- and 3-syllable words from the previous analyses, it was found that the network was producing only 4% of the words in the regular sets incorrectly, compared to 18% of the words in the irregular sets. This seems to indicate that irregular stress remained a problem for the network at the point at which training was stopped.

The actual types of errors the network committed appear to be in keeping with what Fikkert found for children at Stage 4. While the general trend at this stage is for children to produce words with correct stress, 3- and 4-syllable words taking final stress are incorrectly produced with initial stress by children at Stage 4. Table 2.2 gives some examples of these errors, as listed in Fikkert (1994). Because the final syllable is not the typical location for stress in words of these forms (Hulst, 1984), these would appear to be overregularization errors. This seems to indicate that children have learned the regular stress pattern for such words, but that they tend to apply it to words with irregular stress.
This effect was investigated in the connectionist model by testing it on 3-syllable words with irregular stress, at a point in training consistent with Stage 4 (after 3.5 million training trials). As observed above, the model did tend to produce the correct stress for most irregular forms at this point. However, several initial-stress errors were observed on lower-frequency irregularly-stressed forms. These errors are reproduced in Table 2.3.

Table 2.2: Sample of errors on irregularly stressed words at Stage 4. (Transcriptions are degraded in this copy.)

Robin
  2;4.29   amstor dam   → apstodam
  2;4.29   pa:ra: ply:  → paloply:
  2;4.29   kro:ko: d 1  → ko:kod w
Tirza
  2;1.17   kro:ko: d 1  → ko.kalt
  2;3.27   amstor dam   → mstod m
  2;5.5    ko:n g n     → ko:n g g
Enzo
  2;2.4    kro:ko: d 1  → n k d 1
  2;3.14   bu:do r i    → budor i

Table 2.3: Model performance on irregularly stressed forms at Stage 4. (Transcriptions are degraded in this copy.)

target form    model output   frequency (/4.2 million)
di:glo: zi:    di:blo:zi:     0
ho:xer wal     ho:zorwal      0
sa:ti: n t     sa: ti:n t     0
sy:por fly:    sy:porpli:     3
tri:jar xi:    tn: jarxiy     0

These errors are significant because they again suggest that the model was producing rule-like behaviors consistent with what is observed in children. Crucially, while the model was capable of encoding most irregular forms, it was not accomplishing this solely by memorizing the training corpus; instead it learned important generalizations about how stress is applied, and committed errors that were consistent with these generalizations. And while these types of overregularization errors are not a unique characteristic of connectionist models (Marcus et al., 1992), they do indicate that the type of learning mechanism used here can help explain data related to Dutch stress acquisition.
For example, while empirical studies have tended to ignore the effects of regularity on prosodic acquisition, these results suggest that this could be an important avenue of future research. Indeed, this work makes strong predictions as to the importance of pattern regularity in the acquisition of phonological systems, and suggests that developmental profiles can extend well beyond well-researched U-shaped learning patterns.

2.4.3 Pools of Regularity

The second aspect of irregular stress I was interested in concerned words with irregular but consistent stress patterns. For example, two-syllable words in Dutch of the form (VC)σ(VC)σ tend to have initial stress, since Dutch is a trochaic language. However, it has been observed that certain non-morphological word endings tend to correlate with final stress placement in such words (Hulst, 1984; Booij, 1995), such as the 'French' word endings illustrated in Table 2.4. From this, it would appear that factors other than syllabic weight are influencing the placement of stress in Dutch.

Table 2.4: Sample of Dutch VC forms that appear to attract final stress in bisyllabic VC-VC words.

word ending    initial    final
-et            4          30
-el            0          5
-on            6          16

The tendency for (non-morphological) segmental factors to affect stress patterns is common in Dutch. In fact, it has been shown that the ability to learn Dutch stress is greatly enhanced when segmental information is taken into account. Daelemans, Gillis,
and Durieux (1994) used a similarity-based learning algorithm (a class of statistical learners similar to the type of neural networks used here) to predict Dutch stress, and found that the learner's ability to assign the correct stress to words was greatly enhanced when both segmental and metrical information was made available to the artificial learner, compared to when it was only given metrical information. A drawback of the Daelemans et al. learner was that it was not a model of development, and could not account for how children acquire stress in stage-like ways. Instead, it statistically encoded generalizations about an entire corpus of Dutch stress in a single pass.

Because the network used here had both segmental and prosodic information at its disposal during training, it is possible that it could account for similar facts as the Daelemans et al. model does, in addition to providing a model of development. This was tested by presenting the model with words containing well-known segmental cues to stress. Since the network was only trained on a sample of the words in CELEX, it was possible to test it on sets of Dutch VC-VC words that were not in the training corpus. The results of testing the network on all such words are presented in Table 2.5.

From these results, it appears that the fully trained model did indeed tend to apply final stress to words ending in -et, -el and -on. By way of comparison, I have also indicated the model's performance on various VC endings that seem to be highly correlated with initial stress (-um, -aN and -is).⁹ For these types of words, the model appears to be consistently applying initial stress. Together, these data indicate that the network has learned to keep track of several types of information relevant to Dutch stress, not

⁹The N symbol is used to indicate 'any nasal consonant.'
Table 2.5: Network's performance on bisyllabic word forms with phonologically predictable final (upper) and initial (lower) stress.

pattern      % correct
VC-<et>      65
VC-<el>      75
VC-<on>      68
VC-<um>      91
VC-<aN>      100
VC-<is>      79

merely suprasegmental factors such as syllabic weight and canonical trochaic stress patterns.

These results are significant because they underline an important difference between the connectionist approach and symbolic learning theories; whereas traditional Generative accounts of Dutch stress have focused solely on factors such as syllable weight in deriving the rules of stress, it is clear that other types of factors are also relevant. In the present example, stress in words consisting of two heavy syllables should only be predictable from suprasegmental information, on a strictly metrical account. However, phonological information such as the coda of the word's final syllable seems to modulate these factors. Connectionist networks like the one used here took advantage of these types of subregularities in the course of learning the training corpus, in order to better acquire the stress patterns of Dutch words; this in turn also allowed them to generalize irregular patterns to unfamiliar words.

Indeed, this result underlines a broader difference between the type of learning mechanism I am proposing here and the type theorized by Pinker and Prince (1988), Pinker (1991), and Marcus et al. (1992, et seq.). Generative accounts like these have typically maintained that regular patterns are acquired using linguistic rules, whereas irregular patterns are strictly memorized and are only very rarely extended to novel words. Unfortunately, little work has been done on irregular stress in Dutch adults and children, and thus it cannot be said for certain whether Dutch speakers extend these partial regularities to nonsense words.
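A similarity-based learner of the kind Daelemans and colleagues used can be sketched as a nearest-neighbour classifier over word descriptions. The feature scheme and the miniature lexicon below are invented for illustration, not Daelemans et al.'s actual materials; the point is only that adding a segmental feature (the final rime) separates words that are metrically identical.

```python
from collections import Counter

def predict_stress(features, training_data, k=3):
    """Vote among the k training items sharing the most feature values."""
    def overlap(a, b):
        return sum(1 for x, y in zip(a, b) if x == y)
    neighbours = sorted(training_data,
                        key=lambda item: -overlap(features, item[0]))[:k]
    return Counter(stress for _, stress in neighbours).most_common(1)[0][0]

# Each item: ((weight of syllable 1, weight of syllable 2, final rime), stress).
# All items are H.H, so metrical information alone cannot separate them;
# the segmental rime feature can.
train = [
    (("H", "H", "et"), "final"), (("H", "H", "et"), "final"),
    (("H", "H", "al"), "initial"), (("H", "H", "al"), "initial"),
    (("H", "H", "um"), "initial"),
]

assert predict_stress(("H", "H", "et"), train) == "final"
assert predict_stress(("H", "H", "al"), train) == "initial"
```

Unlike the network, such a learner classifies the whole corpus in a single pass and has no developmental trajectory, which is the limitation noted above.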
However, work by Zevin and Joanisse (2000) suggests that English speakers take into account a variety of factors in determining the stress patterns of polysyllabic nonwords. For instance, subjects consistently used metrical information in producing stress in such words as vonima ([ˈvanimə]), but were also sensitive to nonwords' similarity to familiar words, producing stress on syllables not predicted by the metrical factors in such words as banella ([bəˈnɛlə]). The present modeling results make the strong prediction that Dutch adults are also sensitive to a variety of sources of constraint when they produce the stress for novel words. Thus, they would tend to extend the irregular 'French' stress pattern to other words with similar endings, contrary to what is predicted in dual-systems approaches.

2.5 Stages of Acquisition in the Model

The second aspect of Dutch acquisition I was interested in concerned the stage-like way in which children appear to learn stress patterns. The network's behavior over the course of training was investigated to assess whether it was also producing the types of errors described by Fikkert (1994). This was done by presenting the network with the entire set of 2- and 3-syllable words in the training corpus, and comparing its output to the target. A syllable was considered stressed if the activation of the stress node on the output was greater than 0.5. A Stage 2 error was coded when the network incorrectly

Table 2.6: Examples of errors produced by the model that were consistent with errors at Stages 2 and 3 in Fikkert (1994). (Transcriptions are degraded in this copy.)
word          adult form     model output
Stage 2
echec         e: k           e:s k
donaat        do: na:t       do:na:t
jubee         jy: be:        jy:be:
Francaise     fr n s :so     fr ns :so
diffusor      d ffy: z r     d ffy:z r
woestijnrat   wu:s t nr t    wu:ste nr t
Stage 3
framboos      fr m bo:s      fr mbo:s
vignet        v n j t        v n j t
santon        s nt :         s n t
haverzak      ha:vorz k      ha: vorz x
huisorde      h s rdo        h s t rdo
jeugdbeleid   j0 dbol td     j0 dbo r t

produced an initial stress for a word. A Stage 3 error was coded when the network produced more than one stressed syllable (that is, the stress node was activated above threshold for more than one syllable). Examples of errors consistent with Stages 2 and 3 are listed in Table 2.6.

The proportion of errors consistent with Stage 2 and Stage 3 that were produced by the network is plotted in Figure 2.5. These results indicate that the network was producing a significant number of both types of errors during training, but that the proportions of these kinds of errors changed as training progressed. Consistent with the child language data, the model was producing many more Stage 2-type errors earlier in training, and many more Stage 3-type errors later in training. In addition, though, the behavior was not perfectly stage-like. Instead, there was a clear transition period during which the network produced a significant number of both error types.

Figure 2.5: Proportion of Stage 2 and 3 errors on 2-syllable (top) and 3-syllable (bottom) words in the network, over the course of 1 million training trials.
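The error-coding scheme just described can be sketched as a small classifier over the network's output activations. The function name and return labels are mine; the 0.5 threshold and the definitions of Stage 2 (incorrect word-initial stress) and Stage 3 (more than one above-threshold stress node) follow the text.

```python
def code_error(output_activations, target_syllable, threshold=0.5):
    """Code one output against the Fikkert-style error scheme (sketch).
    A syllable counts as stressed when its stress node exceeds threshold."""
    stressed = [i for i, a in enumerate(output_activations) if a > threshold]
    if stressed == [target_syllable]:
        return "correct"
    if len(stressed) > 1:
        return "Stage 3"   # more than one stressed syllable
    if stressed == [0]:
        return "Stage 2"   # incorrect word-initial stress
    return "other"

assert code_error([0.2, 0.8, 0.1], 1) == "correct"
assert code_error([0.9, 0.1, 0.3], 1) == "Stage 2"
assert code_error([0.7, 0.8, 0.1], 2) == "Stage 3"
```

Applying such a coder to the full 2- and 3-syllable sets at successive checkpoints would yield proportions of the kind plotted in Figure 2.5.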
Figure 2.6 illustrates how the Stage 3-type behavior began to decline later on in training, as the proportion of overall correctly stressed words reached asymptote (Stage 4). Here again, the network's performance seems to be consistent with data from Dutch children, who also demonstrated fewer Stage 3 errors as they produced more correctly stressed words in Stage 4.

Figure 2.6: Proportion of Stage 3 errors in the network, relative to percent correctly stressed words, over the course of training. (Top: 2-syllable words. Bottom: 3-syllable words.)

Given these results, it would appear that the network was learning to produce stressed words in a similar way to the Dutch children observed by Fikkert; it showed stage-like behavior with respect to the types of errors it produced, but at the same time there was a gradual transition between stages. As such, the network was producing appreciable numbers of errors consistent with Stages 2 and 3 at one point in training, and produced many errors consistent with Stage 3 at the same time as it was producing Stage 4-type behavior.

The source of the network's errors appears to be related to how it was learning the principles governing Dutch stress. Initially, the network was defaulting to initial stress on all words, because of the large proportion of initially-stressed words to which it was exposed. This is illustrated in Figure 2.7, which plots the distribution of stressed syllables in the training set, and shows a large statistical advantage for stress on the first syllable, compared to stress on any other syllable.
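The distributional bias that Figure 2.7 reports is straightforward to compute: tally the position of the stressed syllable over the training corpus. A minimal sketch, using a placeholder word list rather than the actual CELEX-derived training set:

```python
from collections import Counter

# Toy corpus of (word, index of stressed syllable) pairs; the entries are
# placeholders standing in for the real training items.
toy_corpus = [
    ("canon", 0), ("appel", 0), ("ober", 0), ("olifant", 0),
    ("konijn", 1), ("banaan", 1),
]

distribution = Counter(stress for _, stress in toy_corpus)
most_common_position = distribution.most_common(1)[0][0]
```

An initial-stress majority of this kind is the statistical pressure that pushes the network toward defaulting to word-initial stress early in training.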
Figure 2.7: Distribution of syllable stress location for words in the training set, indicating a large proportion of Dutch words taking stress on the initial syllable.

Over the course of training, the network began producing Stage 3 errors, presumably as a result of learning to assign stress based on syllabic weight. However, because the network had not yet fully acquired the principles governing this, it instead tended to stress two separate syllables in the word. These errors subsided as the network's ability to correctly stress words of different lengths reached asymptote.

2.6 Discussion

The goal of this chapter was to explore how stress is acquired in Dutch, and to explain the stage-like way in which learning proceeds in Dutch-speaking children. The motive for using a connectionist model should not be misconstrued; the current work does not dispute the existence of basic linguistic mechanisms underlying stress (e.g., suprasegmental units such as moras and feet). Indeed, it would appear that these types of principles represent crucial elements of many types of prosodic behavior, of which stress assignment is only one; the fact that the network learned stress assignment over a large corpus of words indicates that it learned higher-level generalizations about stress assignment that at the very least closely approximate abstract suprasegmental units. Nevertheless, the present work also suggests that many relevant types of information are present in the input to the model, and that children therefore do not need to explicitly encode a broad range of linguistic information in order to learn their language's grammar.
This work also suggests that previous grammar-based accounts of stress are inadequate to account for a variety of behaviors demonstrated by children acquiring Dutch, because of specific idealizations about the domain of linguistic rules and the way in which they are learned. I argue that a large proportion of Generative accounts have tended to ignore the full breadth of patterns in a language by focusing only on processes that seem regular, and relegating less clear cases to the domain of performance or memorization (cf. Clements & Sezer, 1982; Halle & Mohanan, 1985; Inkelas, Orgun, & Zoll, 1996; Ito & Mester, 1995). The present work underlines two major complications with this mind-set. First, there is in fact no non-arbitrary way to distinguish between regular and irregular processes; many exceptional cases can show clear productivity in constrained domains. And second, models of acquisition in Generative grammar (Dresher & Kaye, 1990; Tesar & Smolensky, 1996; Wexler & Culicover, 1980) fail to account for how exceptional cases are learned, and for the effect of such cases on the ability to learn regular cases.

A similar problem arises with idealizations of data pertaining to stage-like acquisition. Here again, it is possible to characterize children's grammars as undergoing radical, categorical changes within a short period of time, and symbolic grammars can easily provide an account of this. However, closer inspection of these data indicates that there are few specific points in time at which a child is clearly within a specific stage. Under most circumstances, children produce forms that are consistent with more than one stage at a single point in development.
This and other work in the connectionist framework explains gradience and overlap among stages as the consequence of grammar being learned and represented in parallel with the vocabulary that it acts upon. Thus, in the model that I present, there is no categorical distinction between learning words and learning rules. As a result, developmental changes in linguistic ability are not the result of changes to a grammar that acts equally upon all lexical forms in a language, but instead reflect the simultaneous acquisition of individual words, and generalizations across them.

Chapter 3

Connectionist Phonology and Optimality Theory

The previous two chapters investigated areas of phonological theory and phonological acquisition that have previously been addressed within Generative frameworks. The purpose was to examine how these data, and others like them, might be well-suited to an alternative connectionist framework in which linguistic patterns are learned and processed within a distributed and probabilistic mechanism. Such an account contrasts with Generative theories in several important ways, related to how symbolic grammars are learned and represented. However, the Optimality Theory (OT) approach to language is argued to represent a major step toward merging connectionist theory and Generative Grammar (Smolensky, 1999). OT is a hybrid approach that acknowledges the importance of subsymbolic computation and the utility of symbolic representations in capturing linguistic facts; it is an attempt to integrate certain features of Connectionism with the symbolic approach to language.
Given this approach, it could be claimed that Connectionist Phonology represents only a restatement of principles of OT within a different computational mechanism, in what could be termed 'Implementational Connectionism' (Marcus, 1998; Pinker & Prince, 1988). It is a central claim of the present work that Connectionist Phonology is not simply a reimplementation of OT. Instead, it is a reconsideration of what mental grammars are, how they are learned, and how they are represented, that varies in important ways from the central OT approach.

In this chapter, I seek to better address the similarities and differences between Connectionist Phonology and the original conception of Optimality Theory (Prince & Smolensky, 1993, which I will term OT93 for the sake of clarity). I argue that the data in this dissertation underline several important distinctions between the two frameworks, but that there exist several important developments in OT that represent useful steps forward in applying Connectionist assumptions within the framework of Generative Grammar.

This chapter is not intended to be a wholesale (or exhaustive) critique of OT93 or its successors. Nor is it my assertion that the integration of the Connectionist and symbolic approaches is an altogether wrong way of studying linguistic data. Instead, this chapter seeks to better explain how the Connectionist approach to phonology distinguishes itself from Grammar-based theories, and the extent to which OT93 addresses these facts. I propose that there are currently important differences between Connectionist Phonology and OT93 that extend beyond mundane implementational facts such as how one chooses to characterize constraints (as symbols or connection weights).
Below I discuss what I see as three important contributions that the Connectionist approach makes to the understanding of constraints-based grammar: constraint innateness and universality, the separability of the grammar and lexicon, and the nature of quasiregular domains.

3.1 Constraints and Their Sources

Optimality Theory makes a critical assumption about the nature of grammatical constraints: that they derive from a Universal Grammar. However, it is not clear where Universal Grammar itself is assumed to come from in OT93.

Universal Grammar provides a set of highly general well-formedness constraints. These often conflicting constraints are all operative in individual languages. Languages differ primarily in how they resolve the conflicts: in the way they rank these individual constraints in strict dominance hierarchies that determine the circumstances under which constraints are violated. A language-particular grammar is a means of resolving the conflicts among universal constraints. On our view, Universal Grammar provides not only the formal mechanisms for constructing particular grammars, it also provides the very substance that grammars are built from. (Prince & Smolensky, 1993, p. 3)

Note that nowhere in the above statement is the word 'innate' used. It is only assumed that all constraints are available to all language users. A frequent assumption is that because constraints derive from Universal Grammar, they encode linguistic knowledge that children possess from birth. To most, this is the gist of the strong innateness assumption in the Generative approach.

The Connectionist approach suggests an appreciably weaker definition of innateness, by recasting what is meant by 'universality' and 'Universal Grammar.' This view could nevertheless be thought of as consistent with OT as a whole.
For example, Smolensky suggests that "constraints are the same in all human grammars... [which] corresponds to a strong restriction on the content of the constraints, presumably to be explained eventually by the interaction of certain innate biases and experience" (Smolensky, 1999, p. 598).

I suggest that this is also a good summary of what is meant by innateness in the Connectionist viewpoint (see also Elman et al., 1996), and while it represents a departure from mainstream thinking in Generative linguistics, it does not involve rethinking certain basic formal mechanisms in OT. I suggest that Chapters 1 and 2 in this work represent alternative ways of thinking about innateness and universality, by helping to understand the interactions between the effects of innate biases and experience. Below I consider how this work addresses the issue of universality without recourse to strong innateness, and how OT could begin to accommodate it.

3.1.1 Learning Constraints

A major claim of Connectionist Phonology is that constraints driving linguistic processes can be learned from two sources: the input that children are exposed to, and the neural, auditory and articulatory mechanisms that children possess. Simply put, constraints such as ONS, NOCODA and MAX are emergent characteristics of what children get 'for free' while learning language.

In this account, Markedness constraints are explained as reflecting characteristics of linguistic performance such as speech perception and articulation. Faithfulness constraints reflect general computational constraints that serve to simplify the types of linguistic operations performed by language users, a type of data compression that is enforced by the computational limitations of a broad class of statistical learning mechanisms such as neural networks.
3.1.2 Markedness

The work on syllables presented in Chapter 1 illustrates how this theory applies to principles of markedness in phonology. The notion of markedness derives from early work in phonology (Jakobson, 1941/1962; Trubetzkoy, 1939/1969), which was based on observations about the frequency with which certain structures or processes occurred across languages. The sources of these constraints were not always clear, and could be assumed to be arbitrary. The present work sees markedness as anything but arbitrary; it is instead the result of functional constraints on languages, deriving from multiple sources. This idea is not new; it forms the basis of many functional approaches to phonology (among many others, Boersma, 1998a; Lindblom, 1986; Flemming, 1995; Frisch, 1996; Browman & Goldstein, 1992; Hayes, 1997; Steriade, 1994; Stevens, 1989; Wright, 2000). The contribution of the present work is to suggest a possible mechanism by which functional facts about language are encoded in grammars.

The case of syllable typologies in Chapter 1 provides a useful illustration of this. The markedness of codas relative to onsets is explained as a consequence of facts about articulatory processes (that consonants, especially stops, are produced differently pre- and postvocalically; Krakow, 1999) and auditory signal detection (that these differences affect the relative ease with which a hearer can identify a consonant in the context of a syllable; Redford & Diehl, 1999; Wright, 2000). The use of a connectionist model is strictly academic; one could imagine using any other Bayesian mechanism to derive the same conclusions about how the recovery of a phoneme from an auditory signal can vary by syllabic context. The main point is that the human cognitive apparatus
is able to learn statistical generalizations from probabilistic information (e.g., Saffran, Aslin, & Newport, 1996), and that one can quantify these effects.

The crucial next step is to explain how phonetic constraints such as these become 'phonologized.' That is, how do such facts become part of a grammar? The answer seems to lie in how the nature of a language's vocabulary influences what is learned from it. Consider a hypothetical language in which all syllable shapes are possible. This language's vocabulary contains no useful cues to the existence of syllable markedness constraints. However, phonetic constraints on syllable production and perception will make some words more fragile than others, based on whether they contain more difficult phoneme sequences. A word's fragility is simply defined as the likelihood that it will be incorrectly identified by a listener, either during language acquisition or use.

Through a process of natural selection, these more fragile words will tend to be modified to decrease their fragility (in this case by shifting, changing or deleting stop consonants in codas to increase their perceptibility). Over many generations of speakers, this hypothetical language will begin to lexicalize phonetic constraints on syllable structure by reducing the number of words that contain codas (as in English), by limiting the types of phonemes that can occur in codas (as in Japanese), or even by eliminating codas altogether (as in Klamath). In all three of these scenarios, a language's vocabulary begins to encode a great deal of statistical information about syllable markedness as more marked forms are banished from the vocabulary.
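The fragility-driven selection process described above can be caricatured with a tiny iterated-transmission simulation. Everything here — the misperception rate, the restriction to stop codas, the word list — is an invented illustration, not a claim about actual rates of change.

```python
import random

def transmit(lexicon, coda_error_rate=0.3, generations=50, seed=0):
    """Each generation, a word ending in a stop coda is sometimes misheard
    without that coda, and the misheard form is what gets learned next."""
    rng = random.Random(seed)
    for _ in range(generations):
        lexicon = [
            word[:-1] if word[-1] in "ptk" and rng.random() < coda_error_rate
            else word
            for word in lexicon
        ]
    return lexicon

# Over enough generations the stop codas erode, so the lexicon itself
# comes to carry the statistical signature of the phonetic dispreference.
eroded = transmit(["bat", "see", "lip", "go", "tok"])
```

With a nonzero misperception rate, the proportion of coda-final words can only decrease over generations, which is the sense in which phonetic pressure becomes lexicalized.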
I argue that the 'lexicalization' or 'phonologization' of phonetic factors plays a major role in how a language learner approaches the structure of syllables in a language, due to the statistical properties it imparts on the lexicon. As the lexicon begins to reflect phonetic factors, their apparent regularity increases. As such, even when a language fails to prohibit codas altogether, learners become aware of statistical dispreferences for such forms. The simulation in § 1.2.2 illustrates how statistical learning itself imposes a new type of constraint on the language, in which certain patterns become easier to process based solely on their greater (type) frequency in the language. This process is referred to as frequency boosting (Singleton & Newport, 1993).

Work by Frisch (1996) is further suggestive of this, indicating that statistical information of this type abounds in many aspects of phonology, and that speakers have access to this type of information as they process language. I suggest that these statistical regularities can encode information about all markedness constraints derived from functional considerations. What is innate in this sense is a sensitivity to probabilistic information in languages. The universality of such constraints derives from shared cognitive, auditory and articulatory characteristics among speakers of all spoken languages.

3.1.3 Faithfulness

Constraints on Faithfulness have not been directly considered in the present work. However, I argue that there is reason to think that these too can be explained as arising from functional factors in language processing, on the premise that computational systems contain within them strong biases toward structure preservation. In lieu of
an exhaustive consideration of how this might emerge, I consider in general terms how it applies to three major classes of Faithfulness constraints. The constraints in (1) all enforce the correspondence between a word's underlying form and its surface manifestation.

(1) MAX: All material in a word's underlying form should be represented in its output.
    DEP: Only the material in a word's underlying form should be represented in its output.
    CONTIG: The linear order of segments in a word's surface form should correspond to that of the word's underlying form.

The source of these constraints is proposed to derive from the need to preserve a consistent relationship between the internal representation of a word and how it is actually produced. This is typically described as the relationship between a word's underlying form (how it is represented in the mental lexicon) and its surface form (how it is output by the grammar). However, because Connectionist Phonology makes a strong commitment against a distinct mental lexicon, there needs to be some explication of what the term underlying representation means in the present theory.

In Connectionist Phonology, words' underlying representations are derived from the mapping between meaning and sound. In Chapter 4, I present an example of a network that acquires a language by learning to produce appropriate phonological forms for a semantic pattern and vice versa. As with children, such a model is exposed to an appropriate sample of a language, containing many different derived forms, though
This account is fairly similar to how roots and affixes could be learned in a symbolic theory; it deduces generalizations from a representative sample of a language. The discriminating factors here are that words are not said to be stored in a discrete symbolic lexicon, and they are not represented separately from the language’s grammar. That is, symbolic approaches assume that words and affixes are stored as explicit lexical entries that are independent of the grammar that operates on them. This account is different in that it posits that both grammatical and lexical knowledge are encoded as patterns of connection weights in a massively parallel neural mechanism that maps sound and meaning.

3.1.3.1 The Emergence of Faithfulness

There are two critical components of the present theory of Faithfulness: a set of morpho-phonological phenomena involving small modulations in a word’s phonological forms under specific circumstances, and a mechanism that is posited to learn these processes. Given these, we can explore the types of computational factors that constrain learning in such a mechanism.

A model that maps meaning and sound for a variety of forms is required to encode word-specific information within a finite number of weighted connections. These mappings are often required to extend across a variety of surface forms. For example, English verbs can occur as unmarked (bake), past tense (baked), present progressive (baking) and third person singular (bakes). An underlying form encodes the commonalities across all its surface manifestations, such as the unmarked form of a verb (bake) and the morphemes that modify it (-t, -s, -ing).

In OT, violations of Faithfulness constraints occur when an underlying form is distorted in an output candidate. Here again, English regular past tenses provide a useful example.
Verbs that end in an alveolar stop trigger a vowel epenthesis (insertion) process. This serves to break up an undesirable consonant cluster, as in the forms tasted [tejstəd] and brooded [brudəd] (cf. [*tejstt] or [*brudt]/[*brudd]). Certain syllable structure constraints in OT would act against illegal consonant clusters like these, for example *tt]σ, *dt]σ and *dd]σ. In English, these Markedness constraints would be said to be ranked above the DEP Faithfulness constraint that would otherwise prevent epenthesis.

In Connectionist Phonology, Faithfulness constraints reflect the tendency of the learning mechanism to limit the degree to which any learned morpheme must be distorted under any given situation. The source of this derives from characteristics of distributed systems that limit the degree to which they will accept distortions in a learned form. It is much simpler for such a system to learn a one-to-one mapping between meaning and sound because it is not required to also learn the principles that condition its alternations.

Though it was not the primary goal of his work, Hoeffner (1996) has explored how such constraints limit the types of morphological systems that exist in the world’s languages. This work was largely a response to the (incorrect) claim that connectionist models are implausible because they are capable of learning a much broader set of morphological patterns than are crosslinguistically attested. Pinker and Prince (1988) posited that connectionist models have no intrinsic constraints on what types of mappings they can learn, and therefore incorrectly predict a very broad typology of morphological systems. For example, no language marks a morphological variant by producing a stem backwards; the Generative approach suggests that this type of marking is prohibited because no linguistic parameter allows for it.
To illustrate why this is not the case, Hoeffner implemented this hypothetical system in a connectionist model. He trained the network to produce the phonology of English verbs in either the usual order or in reverse, depending on the state of an input node (indicating the presence of some morphological condition). He found that learning in this network was considerably worse than in a similar model that learned to produce present and past tense verbs using the usual system of -ed suffixation. The importance of this work is that it clearly illustrates how CONTIG Faithfulness constraints emerge from more general properties of how distributed connectionist models represent and process linguistic information. A distributed neural system that has learned a morpheme’s phonological form will tend to be resistant to distortions of this form due to general computational constraints on how this type of system encodes information.

3.1.4 How This Relates to OT

In this section I have suggested an important contribution that the Connectionist approach can make to phonology, namely the specification of a Universal Grammar as a consequence of functional factors. I propose that an important shortcoming of OT93 has been the statement that constraints derive from a Universal Grammar without a clear specification of what UG derives from. Much current work in linguistics seeks to better understand UG by suggesting it derives from functional factors such as acoustic and articulatory properties of speech. The contribution of the present work is to demonstrate how a Connectionist approach might help to put a finer point on such theories, by demonstrating how phonological information is acquired, represented and processed in neural systems.
I suggest this approach is not incompatible with assumptions of universality in OT93, and merely present an alternative viewpoint as to the nature of innateness in linguistic systems.

3.2 Learning and Encoding Grammars

A major goal of Chapter 2 was to address shortcomings of constraints-based approaches to phonology in accounting for two types of data related to grammar and acquisition. First, it appears that different stages of acquisition can overlap over the course of development. This seems inconsistent with a theory of ranking and demoting constraints, which predicts much more categorical changes in a child’s behavior over the course of acquisition. The second type of data relates to cases of irregularly stressed words, and how they are learned.

It appears that both of these represent problematic issues within OT93; it is unclear how overlapping stages of acquisition can be accounted for in the standard OT learning account (Tesar & Smolensky, 1996). Likewise, it is also unclear whether such a learning mechanism can acquire both regular and irregular linguistic patterns, and show the types of generalization behaviors often observed in native speakers. In this section, I suggest that these reflect important distinctions between the OT93 and Connectionist approaches to acquisition. However, I also discuss newer Optimality-theoretic approaches that show some promise in addressing these concerns.

3.2.1 Quasiregular Domains

A distinguishing aspect of connectionist models is how they encode linguistic grammars. Generative approaches to grammars treat them as discrete sets of rules or constraints that act upon an equally discrete lexicon. Connectionism is different in that it encodes a grammar as a set of probabilistic generalizations drawn from regularities in a language that can then be applied to familiar and novel forms.
An important advantage of this approach is its ability to encode cases that fail to conform to the regular pattern within the same mechanism as those that do follow the regular pattern. This is particularly helpful in instances where it is difficult to determine which cases are the regular ones. For example, the Italian infinitive marker surfaces in different forms depending on the verb it is affixed to (e.g., sed-are, led-ere, sed-ere, sped-ire). The first ending, -are, is described as the productive ending, as it is typically applied to loanwords and nonce forms. Thus it is assumed to be the default case or rule, while all other cases are seen as non-productive irregulars. However, it is also observed that all four verb classes show some internal consistency, such that class members tend to bear some phonological resemblance to one another. For instance, words taking the -ere ending tend to share similar prosodic and phonological structure (Davis & Napoli, 1994). Albright (1998) has shown that native Italian speakers are sensitive to the neighborhoods of regularity formed by Italian verb classes, and as a result show preferences for the appropriate irregular ending for nonwords that are similar to other irregular verbs (e.g., given a nonce stem like /aduŋ-/, speakers tend to prefer [adduŋg-ere] over the putative default [adduŋg-are]). Such cases are problematic for dual-mechanism type explanations that posit a single grammatical rule and a list of exceptions, since it is difficult to determine which instance actually represents a rule, and why the exceptional cases show such a high degree of productivity. A similar problem arises in the case of Dutch, because of the degree to which its stress patterns deviate from the default pattern.
As I investigated in Chapter 2, there appear to be tradeoffs between different types of regular patterns, such that the broader pattern of metrically-driven stress is overridden by segmental factors (as in the case of words ending in -et). These so-called pools of regularity are easily explained in the Connectionist Phonology framework; they stem from how such a learning mechanism allows for multiple forms of constraint (e.g., MacDonald et al., 1994), and from how such mechanisms encode generalizations probabilistically, not as explicit rules.

In many ways, these data could also be compatible with ideas from OT93 that allow for complex interactions of constraints within a single mechanism. On such an account, constraints deriving stress from a word’s metrical structure could be outranked by constraints that force irregular stress patterns in words containing specific segments; given sufficient constraints, the default regular pattern could be overruled by the irregular pattern in certain cases. A second type of explanation for such data has also been proposed in which different word classes are targeted by different phonological processes (e.g., cophonologies: Inkelas et al., 1996; Itô & Mester, 1995; Pater, 1995; see also Zuraw, 2000). On this account, the lexicon is divided into predictable strata that encode specific classes of words, such as loan words, native words, and words with a certain metrical or segmental structure. Separate sets of constraint rankings coexist in the grammar for each lexical stratum. This type of system would allow for separate productive processes to occur in a single language, though it does not allow for what are called static processes, patterns that are not productive. As in other Generative accounts, these patterns are simply memorized, and are not governed by grammatical principles.
I argue that neither of these is sufficient to account for the broader range of facts about Dutch. First, not all words taking irregular stress do so predictably; words such as tonsil and vernis (‘varnish’) have irregular stress, but do not appear to belong to a predictable class of irregularly stressed Dutch words. Second, not all words with stress-attracting endings take irregular stress; for example, the irregular -et stress pattern seems to be overruled in some words (e.g., cricket, Tibet, ticket have regular stress). For both types of words, any Generative system would have to specify these words as memorized (or lexicalized) irregulars. OT93 grammars use a single constraint ranking, and would therefore incorrectly produce irregular stress on words such as sorbet (a word that takes regular stress, but which is phonologically similar to the -et class of irregulars).

The cophonology solution might be more capable of accommodating such facts, by arbitrarily assigning each word in the lexicon to a specific cophonology without regard to how these cophonologies might be organized. However, creating arbitrary groups of lexical items presents a special problem for this type of account, because it abandons the principle that lexical strata are organized by some metric of similarity (be it semantic or phonological). It then becomes difficult to explain why words that are similar along these dimensions tend to pattern together in some cases, but not in others. A learning mechanism designed to take advantage of these regularities in assigning words to a cophonology would fail in instances where this is not the case. A mechanism that is designed to ignore these regularities will similarly fail to capture instances where helpful regularities do in fact exist.
In addition, such an account cannot explain the strong effect of similarity on the generalization of apparently static irregular patterns to novel words — Inkelas et al. (1996) suggest such patterns are not suitable for a grammar-based account and should instead be memorized. However, this fails to explain why English speakers prefer the irregular ending in some nonwords (e.g., English past tenses: spling-splang). Albright’s data on Italian infinitives are also informative in this regard (Albright, 1998).

I argue that any mechanism that seeks to capture quasiregular domains must be capable of using multiple sources of constraint in probabilistic ways. The connectionist model described in the preceding chapter represents one type of system that can encode such properties of grammars. Deterministic systems like OT93 appear to be ill-suited for this purpose, even when one allows for stratified grammars or cophonologies. Further below I describe some newer OT approaches that appear to overcome this problem by abandoning some central tenets of OT93.

3.2.2 Rules and Lexicons

As I have mentioned throughout this work, a defining feature of Connectionist Phonology is the absence of a discrete grammar and explicit lexicon. This means there is no need to posit a pattern as memorized or rule-governed; likewise, the Prasada and Pinker (1993) distinction between rule application (blick-ed) and irregular generalization (splang) is unnecessary. In fact, the connectionist framework allows for a continuum between highly regular and highly irregular linguistic patterns.
For instance, the consistency of stress patterns varies cross-linguistically: Russian stress is lexical, meaning stress is assigned idiosyncratically and is thus highly unpredictable; English and Dutch have more regular stress patterns but also many exceptions (which can themselves be quasi-predictable; e.g., English bisyllabic verbs often take word-final stress); and Finnish and French have (almost) perfectly predictable stress. On the present account, all these types of systems are simply points along a continuum in which generality trades off with word-specificity, and all such systems can be encoded within a connectionist architecture.

Similarly, the consistency of linguistic patterns also varies within languages. As discussed in the previous section, morphological and phonological paradigms can consist of different patterns that generalize to different degrees. This is problematic for accounts using deterministic grammars that are discrete from the lexicon. The connectionist framework provides an alternative account of this in which all degrees of regularity can be encoded within a single neural mechanism. It encodes the regularity of morphological and phonological patterns as a result of learning the words in a language. Because the network architecture is not sufficient to overtly memorize each form that it is exposed to, the network must instead encode the regularities that it encounters in the input, including the circumstances under which specific regularities fail to occur. The range of statistical regularities that the network can encode is fairly broad; the class of networks used in this dissertation is in theory capable of fitting a wide variety of statistical regularities.
Networks are limited only by the computational capacity imposed by such factors as the number of connections they have at their disposal and the nature of the inputs they receive. This explains why networks are not limited to a single source of constraint; segmental and metrical information are both available to the network, and both appear to play roles in its behavior. The application of one constraint over another is determined by the statistical probabilities that the network has inferred from its input.

Boersma (1998b) offers an interesting illustration of how these general principles could be applied to an OT model. Once again, this relates to the case of English past tense verbs, and how speakers might learn the different degrees of regularity in this system. While the -ed morpheme seems to be the most productive past tense marker in English, other processes also seem to be at least partially acceptable in phonologically constrained circumstances. For example, adult speakers tend to accept splung as an acceptable past tense of spling based on the neighborhood effects generated by such forms as wrung, hung, stung, swung and strung.

These types of generalizations are incorporated into an Optimality-theoretic model by admitting a much broader set of constraint types than is usually the case. Specifically, constraints are allowed to refer to individual morphemes, in what could be referred to as paradigm-specific or arbitrary constraints. Such constraints are learned as morpho-phonological generalizations across some mapping.
For example, a learner can produce several generalizations about the mapping between sing and sang:

(2) /sɪŋ/ → /sæŋ/
ɪ → æ / _ŋ]σ (change ɪŋ endings to æŋ)
ɪ → æ / _ŋ (change any ɪŋ to æŋ)
ɪ → æ / _[+NASAL] (ɪ becomes æ before any nasal)
ɪ → æ / _[+VELAR] (ɪ becomes æ before any velar)
low F1 → high F1 (lower the vowel)
high F2 → low F2 (back the vowel)

These constraints vary from highly item-specific (mapping sink to sank) to highly general (lowering and backing a vowel). Generalization also applies across different grains of analysis: pseudo-morphemic groupings (ink → ank), natural classes of phonemes (Velars, Nasals), vowel quality, and presumably many others. An important aspect of using constraints that span different grains of analysis is that the model does not tend to incorrectly overgeneralize: it can learn brought, whereas a more narrow set of constraints would have difficulty blocking *brang. The model also generalizes well: it can produce regular nonce words such as wugged while at the same time producing the past tense of spling as splang. A further appeal of this approach is that it allows for high-ranked word-specific constraints to account for cases in which the more general [ing → ang] constraint does not apply (e.g., brought vs. *brang).

These constraints can be ranked in an OT tableau based on their applicability across the entire range of past tense forms. The result is a mechanism that is able to generalize both the rule-like default -ed ending and the many quasiregular patterns also present in this system. A portion of the resulting grammar is illustrated below, simplified from Boersma (1998b):

bring + PAST  | bring → brought | ɪ → æ/_ŋ | lower F2 | Verb → Verb+(ə)d
a. ☞ brought  |                 |    *     |          |        *
b.   brang    |       *!        |          |          |        *
c.   bringed  |       *!        |    *     |    *     |
The similarity of this approach to Connectionist Phonology should be clear; both use the same principles of generalization to account for quasiregular behavior in phonology: the language learner acquires constraints across multiple grains of generalization, and assigns weights (or rankings) to these constraints in order to arrive at an optimal level of generality. However, in OT this can only be accomplished by setting aside strong assumptions of constraint universality, since the theory instead assumes that the language learner can acquire the content of constraints based on patterns present in the input available to it. The learner can then use the usefulness of the many resulting constraints to determine their place in the grammar. I argue that the application of connectionist principles to the OT framework is a good thing, as it further serves to reinforce the neural underpinnings of the Optimality framework. In addition, it moves the OT framework forward by allowing it to account for a broader set of data than was previously possible.

That said, I would argue that there is at least one important distinction between the connectionist approach and the one in Boersma (1998b), related to how constraints are learned. While the number of plausible constraints to derive English past tenses might seem limited, the mechanism by which this model actually arrives at a narrow set of possibilities for a given form, as in (2), remains unresolved. There are in fact many different generalizations that could be inferred from even a single alternation.
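The idea that a single alternation supports generalizations at several grains can be sketched in a few lines of code. The following is my own toy illustration, not Boersma’s procedure: given one present/past pair, it proposes three candidate rules, from a whole-word mapping down to a context-free segment change, using a shared-prefix/suffix heuristic invented for the example.

```python
# Toy sketch of multi-grain rule induction from one alternation
# (cf. sing -> sang). An illustration only, not Boersma's (1998b)
# or Albright & Hayes's (1999) actual learning procedure.

def propose_rules(present, past):
    """Propose generalizations from item-specific to context-free."""
    rules = [f"{present} -> {past}"]           # grain 1: whole word
    # Grain 2: strip the longest shared prefix and suffix to isolate
    # the changing material, keeping the shared edges as context.
    i = 0
    while i < min(len(present), len(past)) and present[i] == past[i]:
        i += 1
    j = 0
    while (j < min(len(present), len(past)) - i
           and present[len(present) - 1 - j] == past[len(past) - 1 - j]):
        j += 1
    focus_in = present[i:len(present) - j]
    focus_out = past[i:len(past) - j]
    prefix = present[:i]
    suffix = present[len(present) - j:]
    rules.append(f"{focus_in} -> {focus_out} / {prefix}_{suffix}")
    rules.append(f"{focus_in} -> {focus_out}")  # grain 3: context-free
    return rules

print(propose_rules("sing", "sang"))
# ['sing -> sang', 'i -> a / s_ng', 'i -> a']
```

Run on bake/baked, the same function returns ['bake -> baked', ' -> d / bake_', ' -> d'], expressing plain suffixation as a rule with an empty focus.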
For example, a learner that encounters bake → baked can infer many different constraints for deriving past tenses:

(3) bake → baked
ejk → ejkt
Vk → Vkt
k → kt
V → baked
[vel] → [vel]-t
[lab][-hi][obs,vel] → [lab][-hi][obs,vel]-t
[-hi -low][obs,vel] → [-hi -low][obs,vel]-t

This is only a small set of the many possible constraints that can be inferred from this single present tense-past tense pair. As the learner is exposed to a greater number of forms, the list of constraints continues to increase (an inevitable pitfall of allowing for item-specific constraints). In addition, the learner cannot discard specific constraints without a clear mechanism by which this could be done. Since all constraints are intrinsically violable, putting aside ones that are often violated is inadvisable.

An important consequence of allowing the set of constraints to grow is that the process of learning a ranking will tend to become increasingly difficult. This is because constraint ranking mechanisms learn by making adjustments to each constraint’s ranking on a form-by-form basis, given the violations that a specific form incurs. When the number of constraints that need to be ranked is very large, this task becomes increasingly complex due to the amount of time and computational resources involved in learning it. That is, the number of constraints that a mechanism can learn from a representative sample of a language can quickly become extremely large; this has a negative impact on a learning mechanism that requires all constraints to be evaluated with respect to a given form in order to converge upon a correct grammar (e.g., Tesar & Smolensky, 1996).

Connectionist Phonology avoids the problem of ‘too many constraints’ by implementing constraint satisfaction in a markedly different type of mechanism.
Rather than explicitly encoding each constraint, neural networks use connection weights that implicitly encode generalizations made by the network as it learns a given task. The network acquires these generalizations as it is exposed to different forms in a language. However, the range of what the network can learn can be limited by the information that it has at its disposal and its computational capacity. An interesting consequence of this is that it can explain why not every type of generalization will occur in these models - only ones that tend to be reliable. That is, constraints that apply frequently and that do not involve many exceptions are more easily represented in a set of connection weights. Conversely, networks disprefer statistics that are less general or reliable; a statistic such as words that are three syllables long, begin with [k], end with [st], and have at least two non-low vowels take medial stress will not tend to be as reliable as stress the rightmost superheavy syllable, because the former will not hold for many words, and might tend to have many counterexamples. Such constraints are therefore less likely to be inferred by the network, which learns by making small changes to connection weights over the course of training based on exposures to different inputs. Statistics that are not consistently supported by the words in the network’s input will not tend to be learned.

Work by Albright and Hayes (1999) shows promise in adapting this connectionist type of learning to the symbolic approach. They propose that a rule-learning mechanism that has access to an extremely broad set of generalizations can nevertheless remain tractable using a mechanism they call ‘impugning.’ As their model learns a ranked set of rules, it also sets aside ones that are deemed to be less useful by constantly reassessing their generality across known forms.
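To make the idea concrete, here is a deliberately simplified sketch of reliability scoring over a toy lexicon. The lexicon, the rule representation (rewrite a word-final string), and the impugning threshold are all invented for the example, and are much cruder than Albright and Hayes’s actual model.

```python
# Toy illustration of reliability scoring and 'impugning', loosely
# after Albright & Hayes (1999). Lexicon, rules, and threshold are
# invented for this example.

lexicon = {  # present -> attested past
    "walk": "walked", "talk": "talked", "jump": "jumped",
    "sing": "sang", "ring": "rang", "bring": "brought",
}

def apply_rule(rule, present):
    pattern, change = rule            # rewrite a word-final pattern
    return present[:len(present) - len(pattern)] + change

def reliability(rule):
    """Hits divided by scope: how often the rule gets the past right."""
    pattern, _ = rule
    scope = [w for w in lexicon if w.endswith(pattern)]
    hits = [w for w in scope if apply_rule(rule, w) == lexicon[w]]
    return len(hits) / len(scope)

rules = [
    ("", "ed"),        # default: suffix -ed to anything
    ("ing", "ang"),    # sing/ring-type vowel change
    ("ing", "ought"),  # fits only bring -> brought
]

THRESHOLD = 0.5
ranked = sorted(rules, key=reliability, reverse=True)
impugned = [r for r in ranked if reliability(r) < THRESHOLD]

print(ranked[0], impugned)
# ('ing', 'ang') [('ing', 'ought')]
```

In this toy grammar the bring → brought rule accounts for only one of the three forms in its scope and is impugned, while the sing/ring pattern, reliable within its neighborhood, outranks the default.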
This is done by assigning each rule a score based on the number of alternations that it can account for; this score can be weighted using statistical information such as type and token frequencies. Rules that score very low are ‘impugned’, or sent to the bottom of the ranking where they are least likely to be applied.

While not an OT model per se, this mechanism does point to how arbitrary morphology could be learned in OT, and how the problem of constraint learning could be kept tractable. Here again, I suggest that this is a useful area of inquiry, given its faithfulness to how distributed neural mechanisms learn constraints. While it does violate principles of constraint universality established by OT93, it could serve to strengthen the theoretical relationship between OT and Connectionism.

3.2.3 Learning Trajectories in Connectionism and OT

The results of Chapter 2 pertaining to stages in acquisition suggest another benefit of the Connectionist approach over more deterministic learning mechanisms, which is the ability to account for a broader range of language acquisition data. Generative theories of language acquisition suggest that stages of acquisition are the result of qualitative changes to the child’s grammar that apply equally to all words in their lexicon. However, the data from Fikkert (1994) suggest that this does not accurately describe how Dutch children actually learn phonology, given the gradience in children’s transitions from one stage of acquisition to the next. The present theory suggests such patterns result from how learning tends to proceed in neural systems, typically in a graded and smooth fashion (Rumelhart & McClelland, 1986; MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1993).
The probabilistic learning that occurs in connectionist models seems to conflict with Generative grammar learning accounts that assume stage-like changes in children’s productions reflect sudden and drastic changes in their grammar. This seems particularly clear in OT, where changes in a given constraint ranking tend to result in appreciable changes in the forms that are output by the grammar. The prevailing theory of learning in OT characterizes acquisition as the process of constraint demotion based on the surface forms that the learner is exposed to. I argue that the result is a system that cannot produce the types of probabilistic behaviors discussed in this work, and which has no clear mechanism for accounting for the range of quasiregularity that exists in languages. In this section I focus on how connectionist learning can be used to inform issues of acquisition in OT, and I discuss newer models of OT learning that implement subsymbolic learning principles within the symbolic framework.

One issue with regard to learning in the Optimality framework is the system of strict domination that OT93 uses to encode grammars. Such a system uses a single set of constraints with a specific ranking to assess the grammaticality of an output candidate. The way in which Dutch is acquired casts some doubt on whether strict ranking can be used to describe child grammars; while it can be shown that the individual stages with which Dutch is acquired can be captured using such a grammar, it cannot completely account for the way in which these stages tend to overlap (Curtin, 1999, 2000). The way in which learning in the OT93 model is proposed to occur (Tesar & Smolensky, 1996) could allow for some periods of uncertainty when the rankings of crucial constraints are indeterminate, but it cannot explain the tendency toward regressions to earlier stages.
It also predicts that between-stage transitions will be characterized by a random oscillation between the two types of behavior, which again does not seem to be borne out by the empirical data.

More recent developments in OT might be better suited to addressing these issues. Boersma and Hayes (1999) have proposed an alternative to the strict ranking principles in OT93 in which constraint rankings are proposed to be probabilistic rather than deterministic. The actual ranking of constraints is expressed as areas of probability, allowing relative rankings to be indeterminate to varying degrees. Learning in this model is different from other OT learning accounts to the extent that it involves making small incremental adjustments to constraint rankings using a ‘Gradual Learning Algorithm’ (Boersma, 1998a).

This approach allows actual constraint rankings to shift from one utterance to the next, but also bounds the degree to which these shifts can occur. The results are behaviors consistent with different types of utterances (e.g., stress errors consistent with Stages 2 and 3) emerging from a single grammar. A major benefit of this approach is the ability to capture the relative frequency with which different alternate behaviors tend to occur. The Gradual Learning Algorithm also results in a smooth, gradual transition from one developmental stage to the next. This has been applied to Dutch stress acquisition and truncations by Curtin (2000).

I argue that such a model represents an important advance in how grammars can be conceptualized in OT, exactly because of its similarity to how connectionist networks learn and represent grammars, as a statistically-driven constraint satisfaction system.
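The mechanics of the Gradual Learning Algorithm can be made concrete with a toy sketch. The following is my own minimal illustration of stochastic evaluation plus gradual reranking, not the grammar used by Boersma and Hayes or Curtin: the two constraints (NoCoda, Max), the candidate set, the noise level and the plasticity value are all invented for the example. On each learning step, Gaussian noise is added to the ranking values before evaluation; when the learner’s winner mismatches the adult form, constraints violated by the wrong winner are promoted and constraints violated by the adult form are demoted by a small plasticity step.

```python
import random

# Toy Gradual Learning Algorithm (loosely after Boersma, 1998a;
# Boersma & Hayes, 1999). Constraints, candidates, and all numeric
# settings are invented for this illustration.

rankings = {"NoCoda": 100.0, "Max": 100.0}   # start with a tie
PLASTICITY = 0.1    # size of each ranking adjustment
NOISE_SD = 2.0      # evaluation noise (stochastic ranking)

# Candidates for input /bat/, with constraint-violation counts.
candidates = {
    "bat": {"NoCoda": 1, "Max": 0},   # faithful: keeps the coda
    "ba":  {"NoCoda": 0, "Max": 1},   # unfaithful: deletes it
}

def evaluate(rng):
    """Winner under noisy rankings: the worst violation decides."""
    noisy = {c: r + rng.gauss(0.0, NOISE_SD) for c, r in rankings.items()}
    def cost(cand):
        return sorted((noisy[c] for c, n in candidates[cand].items() if n),
                      reverse=True)
    return min(candidates, key=cost)

def learn(adult_form, rng):
    """One GLA step: promote/demote constraints on a mismatch."""
    winner = evaluate(rng)
    if winner != adult_form:
        for c in rankings:
            diff = candidates[winner][c] - candidates[adult_form][c]
            rankings[c] += PLASTICITY * diff

rng = random.Random(0)
for _ in range(2000):
    learn("bat", rng)   # the adult language permits codas
```

Because the two ranking values start tied and separate only gradually, early evaluations alternate between [bat] and [ba], with [bat] becoming steadily more frequent over training; this yields the kind of smooth, overlapping stage transition described above, rather than an abrupt reranking.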
This model seems to successfully merge OT constraint mechanisms with characteristics of how distributed neural systems learn and represent grammars, and in so doing broadens the conceptualization of linguistic grammars.

3.3 Summary

Optimality Theory is seen as a model of grammar that incorporates certain connectionist principles into a symbolic mechanism. A major claim of OT has been that this hybrid system allows for the coexistence of the two frameworks, because of how it integrates symbolic grammars with principles of subsymbolic, constraints-based systems. This allows linguists to continue to think about grammars as generative algebraic symbol manipulators, while being more faithful to how neural systems represent information. Whether this is in fact the case remains to be seen, however. In this chapter I have discussed a number of phenomena that appear to be consistent with a distributed connectionist framework but are nevertheless difficult to encode within an OT93 grammar. Several newer proposals have been put forward to account for these and other related phenomena, and I have sought to discuss these with respect to their relationship to connectionist grammars.

In this dissertation, I have taken an approach to this that is somewhat complementary to OT, by thinking about symbolic grammar as an emergent characteristic of connectionist systems. It takes a hard stance against its characterization as ‘implementational connectionism,’ which suggests such models are merely implementing symbolic rules within a neural mechanism, rather than doing away with rules altogether (Marcus, 1998; Pinker & Prince, 1988). It suggests instead that characterizations of grammars as algebraic and deterministic might in fact reflect idealizations, at the symbolic level, of data that are better described using characteristics of connectionist systems.
Optimality Theory is different in that it attempts to remain faithful to relevant subsymbolic principles while providing linguists with symbol-based descriptions of grammar. The present work supports the OT framework, to the extent that it attempts to understand properties of language as the consequence of connectionist principles.

In this work I have attempted to highlight areas of inquiry that appear difficult to reconcile with OT93. The areas that I have focused on are ones in which it would benefit from further accommodating some basic principles of Connectionism. These are statistical learning, probabilistic constraint interactions, and the accommodation of quasiregular domains. I have investigated how these characteristics play out in a connectionist framework, with respect to data from crosslinguistic studies and child language acquisition. Next I have turned my attention to work that has made progress addressing these issues within symbolic grammars. The use of probabilistic constraint rankings (and rule orderings), the Gradual Learning Algorithm, and statistical learning help to recharacterize grammars not as deterministic symbol manipulation devices, but rather as probabilistic constraint satisfaction mechanisms that encode linguistic knowledge in much more complex ways.

Connectionist Phonology differs from some OT claims with respect to where constraints come from. Constraints are not innate, at least not in the strong nativist sense that they are symbols present in the mind/brain before language learning takes place.
Instead, constraints emerge from: a) functional characteristics of the human auditory and articulatory systems; b) the cognitive substrate that is used to learn the mappings between them, and the mappings between sound and meaning; and c) the effect that these first two factors have on the statistics of the language input that children receive. This is perhaps not a popular view in the Generative tradition, but is one that is becoming more widely accepted. For example, Boersma (1998a, p. 461) states:

Constraints are learned... not innate. Children start with empty grammars. Every time a new category emerges, the relevant faithfulness constraints come into being; each time that the child learns to connect an articulatory gesture to a perceptual result, constraints against such gestures come into the picture.

To this end, Boersma illustrates at great length how Markedness constraints can be derived from facts about the mapping between articulatory and perceptual drives. He also rigorously shows how a grammar consisting of learned constraints can be learned by merging statistical mechanisms with the deterministic formalisms of OT (§3.2.3). However, he acknowledges that the issue of how constraints themselves emerge remains partially unresolved:

While I have proved the learnability of the structure of the grammar (constraint rankings)... the model would gain credibility if it could solve the bootstrapping problem of learning the contents (the constraints themselves). I expect that an adequate model will result from marrying the gradual learning algorithm with a neural categorization model that is supervised by the semantics and pragmatics of the communicative situation. (Boersma, 1998a, p. 461)

The present work contributes to this issue by elucidating how such neural mechanisms go about learning constraints.
Markedness derives from functional factors related to perception and articulation, but also from characteristics of the distributed neural mechanism that encodes these relationships. Specifically, not all Markedness constraints need to be discovered by the language learner; instead, statistical information that exists in the language being learned can also be used to derive putatively universal constraints. Similarly, Faithfulness constraints can derive from preferences in the types of mappings that Connectionist architectures can learn to encode.

Part II

Phonology and Language Disorders

Chapter 4

The Influence of Phonology on Morphology - Evidence From SLI

In this chapter I investigate how the Connectionist approach to phonology can be informative as to the relationship between phonological and morphological processing.¹ The Generative account has tended to treat morphology as either an independent module of grammar (Aronoff, 1976; Bauer, 1983) or as a subcomponent of a syntactic module (Borer, 1998). The alternative theory that I explore here is one in which morphology is seen as an emergent property of interactions among more basic types of linguistic knowledge, specifically phonology and semantics (Gonnerman, 1998). This theory holds that morphology is not an isolated module of grammar but instead represents the intersection of regularities in correspondences between words' sounds and meanings. Whereas this type of mapping is typically considered to be arbitrary (but see Kelly, 1992), morphology represents an instance in which alternations in a word's phonological form consistently signal a change in its meaning. For example, adding /s/, /z/ or /əz/ to the end of a noun signals plurality.

¹Portions of this chapter are reprinted from Joanisse, M.F. & Seidenberg, M.S., Specific Language Impairment: A deficit in grammar or processing?, Trends in Cognitive Sciences, 2, pp. 240-247, Copyright (1998), with permission from Elsevier Science.

In this chapter I explore how the connectionist theory of phonology can lend useful insights to this account of morphology. The primary source of data to this effect comes from the study of Specific Language Impairment (SLI). It is proposed that deficits in grammatical morphology in these children are the result of a phonological deficit, and are explained by the importance of phonology in the development of morphological representations.

SLI is observed in children who fail to acquire age-appropriate language skills despite being apparently normal in other respects. By definition, children with SLI have no obvious hearing, cognitive, or neurological deficits, yet they learn to talk relatively late. When they do begin to talk they produce fewer utterances than expected for their age and intelligence, and they exhibit deficits in several aspects of language including phonology, morphology, and syntax. The fact that these children are also impaired in comprehending language suggests that their problem is not merely a peripheral one related to the production of speech.

SLI has attracted considerable attention as a source of evidence about the biological and genetic bases of grammar. This is because a central issue in language acquisition research is explaining how a child can acquire language in a relatively short period of time, given the complexity of language and the nature of the input to which children are exposed (Chomsky, 1965; Gold, 1967; Chomsky, 1986; Pinker, 1989).
As discussed in Chapter 2, the Generative view is that the child is exposed to impoverished inputs, and therefore language is only learnable because knowledge of grammatical structure is innate. Some researchers have thus taken SLI as evidence that specific components of this innate grammatical capacity can be damaged. For example, the fact that the children's use of past tense morphology is impaired is attributed to a deficit in the morphological component of grammar (Gopnik & Crago, 1991). Likewise, the fact that SLI has a heritable component has prompted further speculation that components of grammar may have specific genetic encodings (Pinker, 1989; Gopnik & Crago, 1991; Gopnik, 1997). Pinker, for example, has suggested that "the syndrome shows that there must be some pattern of genetically guided events in the development in the brain...that is specialized for the wiring in of linguistic computation." (Pinker, 1991, p. 324)

I argue that a main question about SLI is whether it is, in fact, a grammar-specific impairment. An alternative view is that the grammatical deficits in SLI are sequelae of processing deficits that interfere with language learning and use. In particular, there is good evidence that SLI involves an impairment in processing speech. I advance a theory in which poor speech perception affects the development of phonological representations, and these degraded phonological representations are the proximal cause of deviant acquisition of morphology and syntax, in virtue of their roles in learning and working memory. This view differs from the generative interpretation of SLI, but is consistent with an older clinical tradition in which developmental language impairments have been recognized as dysphasias that are often accompanied by deficits in perception and learning (Benton, 1964; Ewing, 1930).
4.1 Grammatical Impairments in SLI

Generative accounts of SLI have focused on deficits in morphology and syntax in these children. For instance, children with SLI have difficulty producing and comprehending morphologically complex words, such as past tenses and plurals in English (e.g., baked, books). These children do understand the concepts of pastness and plurality, but their ability to express these concepts using grammatical morphemes is impaired. This phenomenon is not limited to English; work in the past 20 years has revealed that SLI speakers of other languages exhibit similar impairments in using other aspects of morphology. These include case marking in Hebrew (Dromi, Leonard, & Shteiman, 1993; Leonard & Dromi, 1994; Rom & Leonard, 1990), grammatical aspect in Japanese (Fukuda & Fukuda, 1994), compound words in Greek (Dalalakis, 1994), and agreement in Italian (Leonard, Bortolini, Caselli, McGregor, & Sabbadini, 1992).

Several accounts have been proposed to explain the morphological deficit in SLI. Rice and Wexler (1996) suggest that these children are missing the abstract grammatical principle of inflection, which is necessary for determining linguistic relationships such as subject-verb agreement and grammatical case assignment. As a result, these children fail to proceed beyond an early 'optional infinitive' stage in acquisition, during which the application of inflectional rules is not obligatory. On this view, their errors follow from a lack of knowledge that morphological marking is obligatory. To support this theory, Rice and Wexler have put forward evidence indicating that children with SLI produce morphological errors that reflect a failure to apply morphological features like Tense and Aspect.
As such, children learning English tend to produce infinitival verbs in inappropriate environments (*Yesterday I eat two cookie) and never produce the opposite pattern (*Tomorrow I ate a cookies). Children with SLI produce these same types of errors, even later in development. Rice and Wexler (1996) suggest that this delay actually reflects a deviant developmental trajectory in which children with SLI fail to develop past the 'optional infinitive' stage of acquisition.

A different account of this morphological deficit was proposed by Pinker and Gopnik, who assert that morphological impairments in SLI derive from an inability to learn the rules of inflectional morphology (Pinker, 1991; Gopnik, 1997). Because they lack the capacity to formulate rules, SLI children can only learn morphological marking through rote learning of individual inflected words. This account is consistent with the observation that children with SLI occasionally produce correctly-inflected forms for familiar words (such as baked) as well as irregular forms (such as took) but perform poorly when asked to generate inflected forms for novel words (such as wug) (Gopnik & Crago, 1991). On this account, SLI provides evidence that language involves rules, that this rule-forming capacity can be congenitally impaired, and that the deficit may be genetically transmitted.

All these accounts share the idea that SLI involves impaired grammar. However, there is disagreement among them concerning the nature of the impairment, specifically the incidence of different types of grammatical deficits, their relative frequencies and how often they co-occur, and whether other aspects of language are also affected.
SLI is said to involve selective impairments to specific components of grammar, but few studies have looked equally carefully at a broad range of linguistic and non-linguistic abilities in the same subjects.

In this chapter, I consider the issue of morphological impairments in SLI. In particular, I explore the theory that impaired phonological processing can explain these deficits, based on the role of speech perception in phonological development, and the importance of phonology in acquiring and using morphological alternations. In Chapter 5, I will explore how this same theory can be applied to the syntactic impairments in SLI, based on the role of phonology and speech perception in processing sentences.

4.2 Perceptual Deficits in SLI

It is clear that SLI children's behavioral impairments extend well beyond grammar. In particular, there is considerable evidence that they have subtle impairments in speech perception. In several studies, they performed poorly on tasks that require discriminating phonological features such as consonant voicing and place of articulation (Elliott, Hammer, & Scholl, 1989; Stark & Heinz, 1996; Sussman, 1993). Children with SLI fail to show the normal categorical perception effects associated with such stimuli.

For example, Joanisse, Manis, Keating, and Seidenberg (2000) compared 8-year-old children with SLI to normal controls on two separate speech continua. The first continuum manipulated voice onset time (VOT) by cross-splicing the words dug and tug, such that words with shorter VOTs should be perceived as dug whereas longer VOTs would be perceived as tug. A second continuum manipulated the perceived place of articulation (POA) of the words spy and sky by manipulating the onset frequency of the second formant (F2) after the /s/ consonant.
Words with lower F2 onsets should be perceived as spy, while ones with higher F2 onsets should be perceived as sky.

Figure 4.1 illustrates that normally-developing children demonstrated clearly categorical identification profiles on this task, grouping both endpoint and midpoint stimuli within a single speech category. In contrast, the language impaired children showed clearly less categorical behavior on the same stimuli, producing fewer consistent responses at both endpoints (as in the POA task).

Whereas grammar-based approaches treat this deficit as unrelated to the children's language impairments (Gopnik, 1997), the alternative account holds that it is their proximal cause: SLI children learn language deviantly because they misperceive speech. I explore in this chapter why such a perception deficit might result in the types of morphological deficits that are frequently observed in these children.

4.2.1 Possible Bases of Perceptual Deficits

The basis for a speech perception deficit in SLI is unclear. I argue that the present approach to understanding SLI does not turn on the exact specification of a speech perception deficit, and that it is entirely possible that different language impaired children will show slightly different types of perceptual impairments. This is perhaps for the best, given that there is still a good deal of disagreement among researchers as to the specification of a speech perception deficit in SLI, which I briefly review below.

[Figure 4.1 appears here: two identification curves plotting proportion of responses against F2 onset frequency (Hz) and VOT (ms), for the Language Impaired and Normal Readers groups.]

Figure 4.1: Speech categorization profiles of language impaired children, compared to normal controls of the same age, on two speech continua. Data are from Joanisse et al. (2000).

Tallal and colleagues have proposed that the impairment involves the processing of rapid, sequential information (Tallal & Piercy, 1974; Tallal, 1990; Tallal et al., 1996; Merzenich et al., 1996). Spoken language involves perceiving a complex, rapidly changing and fast-fading auditory signal, and thus an impaired capacity to resolve temporal aspects of this signal would greatly interfere with learning language. Tallal's theory predicts selective impairments in perceiving speech sounds that rely on short (less than 50 ms), transient acoustic cues such as the VOT cue for voicing, and rapid formant transition cues for consonant POA. It also predicts that speech sounds that are discriminated by longer acoustic cues (longer than 100 ms), such as vowels and fricatives, should be unimpaired.

A second aspect of this theory is that this is not a speech-specific deficit, but instead reflects a specific neurological impairment that is generalized to many areas of perception. To this end, studies have identified impairments in perceiving rapid stimuli in the visual and tactile modalities in language impaired children (Tallal, Miller, & Fitch, 1995). In addition, work in this area has suggested that the language abilities of children with SLI can be improved by adaptively training them to perceive rapid and sequential auditory signals, including speech and non-speech sounds (Tallal et al., 1996).

Research by Tallal and colleagues has generated considerable interest, but it has also raised many methodological and theoretical questions and it continues to be the
focus of intensive investigation. There is little consensus as to the exact characterization of this perceptual deficit, and there may be considerable variability within the SLI population with regard to it (for a review, see Chapter 3 of Bishop, 1997). In addition, processing deficits similar to those described by Tallal have been observed in children whose language is not impaired; Kraus et al. (1996) showed that both a group of children with SLI and a group of learning-impaired children with no language difficulties had aberrant evoked response potentials (ERPs), recorded from scalp electrodes, consistent with a deficit in perceiving rapid sensory information. Similarly, Ludlow, Cudahy, Bassich, and L. Brown (1983) observed a deficit in perceiving rapid auditory information in both children with SLI and hyperactive children who had no observable language impairment. Thus, if this deficit causes SLI, it is unclear why some children who have it do not develop impaired language.

Another challenge for the neural timing hypothesis is evidence that SLI children are also impaired in discriminating speech sounds that are not differentiated by rapidly changing acoustic cues, such as vowels and fricatives (Stark & Heinz, 1996). This suggests that they have problems perceiving acoustic differences between sounds rather than processing short rapid stimuli.

4.2.2 How Common are Perceptual Deficits?

Some researchers have failed to observe abnormal speech perception in children with SLI, raising further questions about its relevance to their language impairments. Such null results need to be interpreted cautiously, however, given that tasks for assessing speech perception vary in their sensitivity. A serious concern is whether the tasks that yielded null results provided adequate tests of the children's perceptual capacities.
For example, Gopnik and Goad (1997) tested language impaired children's speech discrimination abilities and found no difference between them and an unimpaired cohort. The problem with this study is that it only investigated subjects' abilities to discriminate and repeat minimal pairs of words (bat and bad) that were spoken to them by the tester. This task does not capture much of the complexity of perceiving continuous speech and may have been simple enough for even perceptually impaired children to perform. The theory that children with SLI have speech perception deficits does not turn on their ability to identify or repeat short (1- and 2-syllable) words that are spoken to them. Instead, I argue that their sensitivity to subtle speech contrasts is weaker than that of normally-developing children, and that as a result they tend to develop aberrant phonological representations for these words. As I will discuss below, difficulty with nonword repetition tasks involving longer (3- and 4-syllable) words has been shown to be a good indicator of SLI in children (Bishop, North, & Donlan, 1996).

There is an extensive literature on speech perception impairments in SLI using tasks that provide sensitive measures of subtle aspects of auditory processing (Elliott et al., 1989; Tallal, 1990; Ludlow et al., 1983; Stark & Heinz, 1996; Bernstein & Stark, 1985). These tasks directly manipulate aspects of speech, such as VOT or the duration and onset frequencies of formant sweeps. Studies of this sort have found that, while normally-developing children and adults are able to categorize and discriminate speech sounds along a continuum, language impaired children show deviant profiles.

Some studies of this type have revealed apparently normal auditory perception in some children with SLI, which again could be cause for concern.
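The qualitative difference between a categorical and a deviant identification profile can be illustrated with a toy logistic model of a continuum task; the boundary and slope values below are invented, not fitted to any data.

```python
import math

def identification_curve(steps, boundary, slope):
    """Proportion of (say) /t/ responses along a synthetic VOT continuum,
    modeled as a logistic function of stimulus step. A steep slope gives
    the categorical profile of typical listeners; a shallow slope gives
    a more gradient, less consistent profile."""
    return [1.0 / (1.0 + math.exp(-slope * (s - boundary))) for s in steps]

steps = range(1, 8)                                   # a 7-step continuum
typical = identification_curve(steps, boundary=4, slope=3.0)
impaired = identification_curve(steps, boundary=4, slope=0.7)

# The steep curve sits at floor and ceiling at the continuum endpoints,
# while the shallow curve stays well away from both.
print([round(p, 2) for p in typical])
print([round(p, 2) for p in impaired])
```

Identification functions of this kind are what Figure 4.1 plots empirically; the steepness of the fitted curve is one conventional way of quantifying how categorical a listener's responses are.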
However, here again the results must be interpreted cautiously. For example, Bernstein and Stark (1985) re-examined language impaired children who had demonstrated abnormal auditory perception in an earlier study (Tallal, Stark, Kallman, & Mellits, 1980). Their study found that some of these children's perception impairments had resolved to the point where they were performing at normal levels on perception tasks measuring sensitivity to rapid auditory signals. In spite of this, the grammatical language deficits in these children persisted. The authors suggested that a language deficit could result from a perceptual deficit occurring at a critical point in language development, even though it would not necessarily be present at a later stage in development.

4.2.3 Phonological Deficits and SLI

Granting that at least some language-impaired children have abnormal speech perception, how can these deficits be related to their impaired grammar? I propose that the link between the two is provided by phonology. Normally-developing children learn the phonological properties of their target language at a very young age. For example, Jusczyk (1997) describes a wide variety of experiments in which infants show significant sensitivity to their language's phonemic inventory, phonotactics and prosodic structure well before 12 months of age. They appear to be learning this phonological information by attending to the complex properties of speech, using statistical information such as phoneme and syllable co-occurrence in connected speech (Saffran et al., 1996).

Impaired perception of speech could therefore interfere with the development of phonological representations, by reducing children's ability to learn meaningful generalizations about the phonological properties of their target language.
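The kind of distributional information at issue can be made concrete with a small sketch of syllable-to-syllable transitional probabilities, the statistic highlighted by Saffran et al. (1996); the 'words' and the stream below are invented for illustration.

```python
from collections import Counter

def transitional_probs(syllables):
    """P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# A toy stream made by concatenating the invented words 'bidaku' and 'golabu'.
stream = "bi da ku go la bu bi da ku bi da ku go la bu go la bu bi da ku".split()
tps = transitional_probs(stream)

print(tps[("bi", "da")], tps[("da", "ku")])   # within-word: both 1.0
print(tps[("ku", "go")], tps[("bu", "bi")])   # across word boundaries: lower
```

A learner tracking these probabilities can posit word boundaries wherever the transitional probability dips, segmenting the stream without any acoustic cue to word edges; degraded phonological input would blur exactly these statistics.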
Such degraded phonological learning in turn affects other aspects of grammatical development, due to the importance of phonological information in learning other aspects of language. This idea is not new, and bears some resemblance to Leonard and colleagues' 'surface hypothesis,' which maintains that children's ability to learn grammatical morphemes is impaired by difficulties in consistently perceiving them in less salient contexts (e.g., Leonard & Eyer, 1996).

Consistent with this account, many language impaired children, particularly those who also manifest syntactic difficulties (Rapin & Allen, 1983), exhibit abnormal phonology as revealed by a variety of tasks. Most saliently, children with SLI tend to misarticulate or delete phonemes from words (Leonard, 1982). In addition, many studies have shown that these children are poor at repeating nonsense words, particularly as word length increases (Bishop et al., 1996; Edwards & Lahey, 1998; Gathercole & Baddeley, 1990; Kamhi, Catts, Mauer, Apel, & Gentry, 1988; Montgomery, 1995).

Language impaired children also show deficits that do not seem to be related to speech production, as in so-called phonemic awareness tasks. For example, they have difficulty identifying words with similar phonemes in a word game, such as deciding whether Sam is more like sock or ball (Bird & Bishop, 1992). Other studies have measured their ability to analyze a word into its constituent segments. Joanisse et al. (2000) tested 8-year-old language impaired children on a phoneme deletion task, in which they were required to repeat a word presented with a phoneme removed (e.g., saying split without the /p/). Results of this study showed poorer performance in these
(Importantly, these are the same subjects whose speech perception impair ment is illustrated in Figure 4.1). Other studies have obtained similar results in other types of phoneme segmentation tasks (Kamhi & Catts, 1986; Kamhi et al., 1988). 4.3 Linking Phonology and Morphology The remaining sections of this chapter consider how an impairment in phonological representation could yield the particular kinds of grammatical impairments observed in SLI. It focuses in particular on a well-studied aspect of inflectional morphology, namely the English past tense. As I will demonstrate, morphology involves the inte gration of a variety of types of linguistic knowledge, including phonology. The theory is that a deficit in phonology, brought about by a speech perception impairment, can lead to the types of morphological impairments that are typically observed in SLI. I im plement verb learning in a connectionist model that learns morphology in the service of acquiring meaning-sound relationships. I then demonstrate how learning morpho logical processes can be subverted by a deficit in phonological representation. Past tense formation in English is typically described as a set of three related rules, as follows: If the final phoneme of a present tense verb is a voiceless consonant, then add /t/; if it is a voiced consonant or a vowel, then add /d/; and if it is an alveolar stop (/t/ or /d/) insert an unstressed vowel as well as /d/. (1) a. bake — ► baked /beykt/ rip — ►ripped /r pt/ 157 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. b. try — ► tried /trajd/ file — ► filed /fajld/ c. bait — ► baited /bejt d/ seed — ► seeded /sid d/ The past tense rule illustrates the fact that many morphological processes have im portant phonological components; they do not merely involve concatenating an affix to a base form. 
There are three allomorphs of the English past tense morpheme, and deciding which form is appropriate for a given verb is determined entirely by the iden tity of the final phoneme. In order to leam and use this rule, children must be able to phonologically analyze the alternation and the conditions under which particular forms occur. Performing this analysis would clearly be more difficult in the face of a percep tual impairment like the one demonstrated in Figure 4.1, based on two major factors. First, this morpheme is perceptually non-salient, because it involves stop consonants that tend to be devoiced and unreleased wordfinally. Second, ill-formed phonological representations resulting from a perceptual deficit would weaken the ability to analyze and leam how subtle aspects of phonology such as the abstract notions of alveolar and continuant features govern the realization of the past tense inflection. Over the past decade, morphological deficits have become the sine qua non of SLI. More recently, research has sought to characterize the extent to which this impairment affects specific types of morphological processes. Earlier work indicated that children with SLI showed clear deficits in using the regular past tense, but much better per formance on irregular forms (Gopnik & Crago, 1991). This was taken as evidence of an impairment that specifically targets morphological rules while sparing other related 158 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. processes like the memorization of irregulars. A major confounding factor in compar ing performance on regular and irregular morphology is that other factors tend to vary across these items, including frequency and phonological complexity. Studies that have better controlled for these factors have in fact found poor perfor mance on regular and irregular morphology in SLI. Data to this effect are presented in Table 4.1. 
This table also illustrates a second pattern in the morphology of children with SLI, which is that they tend to be especially impaired at producing the past tenses of nonwords, compared to their performance on familiar verbs taking the regular past tense ending.

Table 4.1: Percentage correct past tense productions for language impaired children and controls.

Group        YN     CA     SLI
Regulars     89.4   95.6   22.2
Irregulars   79.2   88.2   17.6
Nonwords     91.6   94.2    6.8

Note. Data are from van der Lely and Ullman (1997) and Clahsen and Almazan (1998). YN: younger normal comparison group; CA: chronological-age-matched normal comparison group; SLI: specific language impaired.

The picture that emerges from these data is one in which children with SLI demonstrate worse than expected performance on all types of past tense verbs. They are especially poor on novel forms, suggesting their impairment is at least partially due to poor generalization. Performance on familiar forms appears to be somewhat better, and is roughly equal for regulars and irregulars. This suggests that these children are producing familiar past tenses from memory, rather than using a rule-like procedure as normally-developing children appear to be doing.

4.3.1 Modeling Morphological Impairments

One line of evidence consistent with this account comes from a connectionist model of morphological impairments. Hoeffner and McClelland (1993) used a connectionist model to examine the effects of a perceptual impairment on learning English past tense verbs. Their model learned to map the semantics of a verb to its phonological form; the authors then studied the effects of a phonological deficit on accurately producing past tenses.
The perceptual weakness of word-final stops was simulated in the network by weakening the connections to the relevant phonemes in the phonological layer, for example, the final alveolar stop in baked. This weakness was not limited to past tense forms. Uninflected words ending in similar phonological sequences also had less salient endings, as in fact. The authors found that the network was able to learn past tense verbs to a high level of precision, even when word endings were less salient. The results were different, however, when SLI was simulated in this network by degrading the input to the phonological layer as a whole (simulating generally poor perception). The authors found that, like children with SLI, this impaired network had difficulty applying the past tense rule to verbs, due to the differential effect that degraded phonological representations had on the network's ability to produce inflected forms. The network's deficit appeared to be morphological in nature, since it was less impaired at producing words like fact, which were phonologically similar to past tenses. The authors argued that this was due to the fact that there is competition between related present and past tense forms such as bake and baked, making it more likely that an impaired network will tend to omit a past tense ending. This will not occur for a word like fact, since fack is not a competing word in the model's vocabulary. As such, there is no tendency for the network to default to fack as a result of occasionally misperceiving the ending of fact. This explanation fits well with the observation that language impaired children tend to produce many morphological omission errors (producing bake for baked), especially when producing the past tense of a nonword, whereas they do not tend to omit phonemes from the ends of uninflected words.
The Hoeffner & McClelland simulation also supports the theory that a phonological impairment can be severe enough to interfere with the task of verb inflection while allowing for better performance on less demanding tasks such as word repetition. Another important observation from this model was that it did acquire some knowledge of past tense alternations; it could produce some appropriate forms. Similarly, children with SLI are able to produce some morphologically inflected words; they are simply worse at it than normally-developing children. This is an important difference between the present approach and rule-based theories of SLI (e.g., Gopnik, 1990; Rice & Wexler, 1996); under the latter, there is no clear mechanism by which a past tense rule can be impaired while preserving some past tense forms. If an element of symbolic grammar is impaired, it should be either unusable or completely absent. Of course, some correct past tense forms in SLI could be attributed to alternative strategies like memorization; these children might have explicit knowledge that baked is the past tense form of bake (Ullman & Gopnik, 1994). However, this cannot be the whole story. Studies have also tested children with SLI on the past tenses of nonwords like wug, blick and glorp, and have found that, while these children are much worse than controls on such a task, most are still able to produce correct past tenses for some of these items. This again suggests that language impaired children are able to show some amount of linguistic competence in the domain of morphology, albeit to a lesser degree than normally-developing children. The Hoeffner and McClelland (1993) model illustrates how this could occur under the alternative view. In this account, language impaired children's knowledge of a morphological alternation is imperfect, rather than absent altogether.
For example, the Hoeffner & McClelland model tended to produce errors of omission (failing to produce a form where appropriate) rather than errors of commission (producing a form where it is not appropriate), consistent with the behavior of SLI children. It thus produced many errors in which it defaulted to the more basic, uninflected form (e.g., saying walk instead of walked). It also tended to produce a disproportionate number of overgeneralization errors (eated rather than ate), which are occasionally produced by children with SLI (Clahsen, 1987; Leonard, 1993). Here again, the network's error pattern seems to reflect the fact that it has not failed to learn a rule altogether, but has instead developed aberrant representations of this 'rule.' One criticism of the Hoeffner and McClelland (1993) simulation is that it tended to produce more errors on regular past tenses, compared to irregulars. As discussed above, earlier work on SLI had suggested that these children are worse on regular forms, compared to irregulars (Gopnik & Crago, 1991). However, more recent research seems to suggest that these effects have been due to coincidental differences between regular and irregular verbs. For example, irregular past tenses tend to have higher token frequencies than most regulars (the ten most frequent verbs in English are irregular (Francis & Kucera, 1982)). In addition, articulatory difficulty might also play a role in subjects' performance, given that regular past tenses tend to involve word-final clusters (e.g., baked), while irregular past tenses do not (e.g., took). It appears that children with SLI are more or less equally impaired on regular and irregular forms, and are instead much worse at inflecting nonwords (e.g., wugged, blicked, glorped).
Since nonwords were not implemented in the Hoeffner & McClelland simulation, it is difficult to assess how it would have treated novel forms. In addition, it is also unclear whether the perceptually impaired model's difficulty with past tenses was attributable to the perceptual weakness of word-final stops. Would a perceptual deficit, in the absence of this reduced salience, still result in a morphological deficit?

4.4 A Connectionist Model of Morphology

In this section I expand on the theory that morphological knowledge arises as a consequence of mappings between meaning and sound. I develop a model of English past tense verbs that acquires morphological structure in the process of learning general linguistic tasks like word production and recognition. This model is then used to address important issues about generalization and the nature of a perceptual deficit in SLI. The model that I develop is similar to the one presented in Joanisse and Seidenberg (1999), which was developed to address data from aphasia. Previous work has indicated morphological deficits similar to SLI in aphasics with damage to either Broca's area or regions of the basal ganglia projecting onto it (Ullman et al., 1997). The Joanisse and Seidenberg model explored the possibility that a phonological impairment could lead to impairments in morphological generalization by training a connectionist model on English past tense formation, and simulating the effects of damage to brain areas responsible for phonological processing. Damage to phonological representations had a larger impact on generalization than on learning individual verbs, a result that is consistent with the morphological deficits observed in aphasic patients.
The result of this model illustrates the importance of phonological representations in using morphology, and also suggests that a similar deficit could also impair morphological development. To address the question of whether the morphological deficits in SLI are also caused by a phonological deficit, I will investigate the impact of a perceptual deficit on the model's ability to learn past tense verbs.

4.4.1 Model Details

4.4.1.1 Architecture

Verb learning was implemented in a connectionist model illustrated in Figure 4.2. Words were represented in different ways in the model; the Phonology layer represented the sounds of words using a distributed phonological representation. These representations employed a CCVVCCC-VC template (C = consonant, V = vowel), in which each phoneme was represented using 18 binary phonological features (voiced, voiceless, consonantal, vocalic, obstruent, sonorant, lateral, continuant, non-continuant, ATR, nasal, labial, coronal, anterior, high, distributed, dorsal, radical), where an activation value of 1.0 represented the presence of a feature, and 0.0 represented the absence of a feature.

[Figure 4.2: Model architecture used to learn verbs. The figure shows a Semantics layer (600 word units plus 1 past tense unit) and a Phonology layer (distributed features over the CCVVCCC-VC template; 162 units), each connected to a cleanup layer.]

Words were aligned with the template as follows: Each word's vowel was aligned with the first V-slot (VV was used for diphthongs such as /aj/ (buy)). Onsets were aligned with the C-slots from right to left, and coda consonants were aligned with C-slots left to right. The final VC was used to represent the /əd/ syllable in words such as tasted. Empty slots were represented by setting each feature to 0.0.
So, for example, the word pasted was represented as _pe_st_əd. In humans, speech input consists of acoustic patterns and speech output consists of sequences of articulatory gestures. The present network abstracts away from this important distinction in the interest of simplicity, given the complications involved in accurately modeling how humans learn to map acoustics to articulation (Guenther, 1995; Plaut & Kello, 1999). It is acknowledged that more accurate representations of acoustics and articulation could allow for a better model of how speech is learned, and could also lead to a better specification of how speech is impaired in SLI. The distributed phonological representations used here allowed the model to represent degrees of phonological similarity between words, an important property that facilitates generalization to other word forms. The development of higher-order phonological generalizations was further assisted by connecting the Phonology layer to and from a set of 'cleanup' units (Hinton & Shallice, 1991; Plaut & Shallice, 1993). These allowed the network to represent non-linearly separable dependencies in phonology (such as the obstruent/sonorant distinction) and made the computation of speech output a dynamic process in which the model settled into a pattern over a series of time steps (Rumelhart, McClelland, & the PDP Research Group, 1986; Plaut, McClelland, Seidenberg, & Patterson, 1996; Harm & Seidenberg, 1999). The purpose of the Semantics layer was to uniquely represent each verb in the model's vocabulary. To do this, each verb was represented by a unique node in the Semantics layer. One additional unit was used to represent present/past tense semantics.
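The slot alignment just described can be sketched as follows. This is an illustrative sketch only: the feature vectors are toy three-feature stand-ins for the model's 18 binary features, "@" is an ASCII stand-in for schwa, and the function names are mine rather than the original implementation's.

```python
# Toy feature vectors standing in for the model's 18 binary features per
# phoneme; "_" marks an empty slot (all features set to 0).
FEATURES = {"p": [1, 0, 0], "e": [0, 1, 0], "s": [1, 0, 1],
            "t": [1, 1, 0], "@": [0, 1, 1], "d": [1, 1, 1],
            "_": [0, 0, 0]}

def align(onset, vowel, coda, suffix=("_", "_")):
    """Place phonemes into the 9-slot CCVVCCC-VC template: onsets are
    right-aligned against the vowel; vowels, codas, and the final -VC
    (/@d/) syllable are left-aligned within their slot groups."""
    slots = ["_"] * 9
    slots[2 - len(onset):2] = onset       # CC slots, filled right to left
    slots[2:2 + len(vowel)] = vowel       # VV slots
    slots[4:4 + len(coda)] = coda         # CCC slots
    slots[7:9] = list(suffix)             # -VC slots for the /@d/ syllable
    return slots

def encode(slots):
    """Concatenate each slot's feature vector into one input pattern."""
    return [f for s in slots for f in FEATURES[s]]

# 'pasted' = onset /p/, vowel /e/, coda /st/, plus the /@d/ syllable:
pasted = align(["p"], ["e"], ["s", "t"], ("@", "d"))
# pasted spells out the template string _pe_st_@d
```

With the full 18-feature inventory, the 9 slots yield the 162 phonological units shown in Figure 4.2.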
This type of localist representation fails to capture semantic similarities between verbs that might be crucial for other phenomena (e.g., Gleitman, 1990; Allen, 1997); however, it is adequate for the task at hand, which simply requires the network to identify a verb independently of its phonological form. For example, it represents the two senses of the phonological form bound as either the past tense of bind or the present tense of bound. As with the Phonology layer, the Semantics layer was connected to a cleanup layer of 20 units, again allowing it to dynamically settle into a pattern over several time steps.

4.4.1.2 Training Procedure and Corpus

The training methodology used in this model reflected the fact that speakers acquire knowledge of language in the course of using it for different purposes. For instance, a task such as learning to recognize spoken words can affect the ability to perform other tasks, such as producing spoken words. This aspect of language was simulated by training the network on four tasks that were randomly interleaved over the course of training. The Speaking trials involved activating the semantic representation of a present or past tense verb and producing its phonological form on the Phonology layer. Hearing involved taking the word's phonological form and activating its representation on the Semantics layer. Repeating involved taking a word on the Phonology layer and maintaining its activation over 7 time steps. Transforming was a special case that involved taking a present tense verb's phonological form, and activating the past tense node on the Semantics layer; the network then generated the verb's past tense on the Phonology layer. Because training on all four tasks was interleaved, the model had to find a set of connection weights that would allow it to perform all of these tasks accurately.
The training set for the model consisted of a set of 600 monosyllabic verbs, randomly selected from the set of approximately 1,200 monosyllabic English verbs. Of these, 64 had irregular past tenses, roughly preserving the ratio of regular-to-irregular verbs in English. How well the model performs depends partially on how well it can represent English phonology. Since a child acquires this knowledge in the course of learning all the words in a language, and not merely learning present and past tense verbs, it is important to expose the network to as rich a source of phonological information as possible. To do this, the present and past tense forms of an additional 594 English verbs were also used for the Repeating trials. This provided the network additional exposure to English phonology; while verbs were used in this case, other word types (e.g., nouns and adjectives) could also have been used for this purpose. Words were presented to the network at random, but the probability of presenting a given word was frequency-weighted, based on its log₁₀ frequency in Francis and Kucera (1982). In addition, the frequency with which the network was exposed to a given task was also probability-weighted, to simulate differences in how often humans perform these tasks. Task probabilities were: speaking=20%, hearing=40%, repeating=30%, present-past transformation=10%. Pilot simulations indicated that the model's performance did not vary significantly based on the exact proportions of trials of each type. Initial weights were randomized with a range of −0.01 to 0.01. A logistic activation function was used for all units in the network (range: 0.0 to 1.0). Training was accomplished using the backpropagation through time algorithm (Williams & Peng, 1990). At the beginning of each trial, an item and task were randomly selected, and the appropriate input was presented.
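The interleaved trial-selection scheme can be sketched as below. Only the task probabilities and the log₁₀ frequency weighting come from the text; the corpus entries and their counts are hypothetical placeholders (the model used Francis and Kucera (1982) counts), and the function name is illustrative.

```python
import math
import random

# Task probabilities from the text; corpus counts are hypothetical.
TASK_PROBS = {"speaking": 0.20, "hearing": 0.40,
              "repeating": 0.30, "transforming": 0.10}
CORPUS = {"bake": 120, "walk": 510, "take": 2400}  # raw frequency counts

def sample_trial(rng):
    """Draw one training trial: first a task, then a word whose
    presentation probability is weighted by its log10 frequency."""
    tasks = list(TASK_PROBS)
    task = rng.choices(tasks, weights=[TASK_PROBS[t] for t in tasks])[0]
    words = list(CORPUS)
    word = rng.choices(words,
                       weights=[math.log10(CORPUS[w]) for w in words])[0]
    return task, word

rng = random.Random(0)
trials = [sample_trial(rng) for _ in range(10_000)]
```

Over many draws, Hearing trials occur roughly four times as often as Transforming trials, mirroring the 40%/10% split described above.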
During the forward phase, activation propagated throughout the network for 7 time steps; at the end of this phase, connection weights were adjusted based on the discrepancy between observed and expected patterns, using a cross-entropy error measure (Hinton, 1989) and a learning rate of 0.005. It is worth noting that, because of the nature of the different types of training trials, the input layer for some trials was actually the output layer for others. For instance, the input layer for Speaking trials was the Semantics layer, whereas this same layer was considered an output layer for the Hearing trials. However, all connections were used for the forward and backward propagation phases of all trial types. For example, activation propagated along connections both to and from the Semantics layer during all trial types. This was also the case for error backpropagation; error signal was passed along all connections in the network for all trial types.

4.4.2 Training Results

The network was trained for 1.7 million trials, at which time performance had reached asymptote. The network was then tested on the training sets used in each task it learned. Performance for each word was scored phoneme-by-phoneme, using a Euclidean distance metric to determine the phoneme closest to the network's output. If any winning phoneme differed from a target phoneme (e.g., producing /bawnt/ when the target was /bawnd/), the word was scored as incorrect. This included producing an empty slot instead of a phoneme, or vice-versa. To assess the computed Semantics for the Hearing task, an incorrect output was scored when the most active unit did not correspond to the target unit for the verb in question. Semantics errors were also scored when the Past Tense unit was incorrectly activated (activation equal to or greater than 0.5) or deactivated (activation below 0.5).
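The nearest-phoneme scoring procedure can be sketched as follows. The phoneme inventory here is a toy stand-in (three features rather than 18), and the function names are illustrative assumptions rather than the original implementation.

```python
import math

# Toy feature vectors; the model compared each slot's output against
# every phoneme (plus the empty slot "_") over its 18 binary features.
PHONEMES = {"d": [1.0, 1.0, 0.0], "t": [1.0, 0.0, 0.0],
            "n": [0.0, 1.0, 1.0], "_": [0.0, 0.0, 0.0]}

def nearest_phoneme(output):
    """The 'winning' phoneme: minimal Euclidean distance to the output."""
    return min(PHONEMES, key=lambda p: math.dist(output, PHONEMES[p]))

def word_correct(outputs, targets):
    """A word counts as correct only if every slot's winning phoneme
    matches the target, including empty slots."""
    return all(nearest_phoneme(o) == t for o, t in zip(outputs, targets))
```

For example, an output of [0.9, 0.8, 0.1] lies closest to /d/ in this toy inventory, so a target of /t/ in that slot would make the whole word score as incorrect.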
Based on these criteria, accuracy on the training sets was: 99.8% on the Speaking task, 99.5% on the Hearing task, 98.2% on the Repeating task, and 98.5% on Transformation. These results suggest that the network was at near-human performance on the training set. Next, generalization was tested using the 20 nonsense verbs used in Ullman et al. (1997). The model was input the phonological code of the nonword verb and the past tense semantics bit was activated; all other Semantic values were 'unclamped', allowing their activation to be set by the network. The conjunction of these two types of information provided the basis for generating novel output. Using the same scoring criteria used above, the network produced correct past tenses for 85% of the nonword items. For the three incorrect items, the network repeated the uninflected stem, a type of error that human subjects also tend to produce, especially children (Clahsen & Almazan, 1998). This again suggested that while the network's performance was not perfect, it was nevertheless very similar to how younger humans perform on this type of task.

4.4.3 Speech Impaired Network

To investigate the role of phonological deficits in morphological development, a phonological deficit was implemented in a model identical to the one described in §4.4.1. This network was trained with identical parameters to the first, with the exception that a speech perception deficit was simulated in it. This was achieved by adding noise to the activation values in the Phonology layer, by multiplying the activation of each node in the layer by a different random value (mean = 1.0, S.D. = 0.1). The effect of this deficit was to make it more difficult for the network to build consistent representations of words, due to small perturbations in words' phonological forms from one instance to the next.
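The noise manipulation amounts to a few lines; the function name is illustrative, and the 162-unit layer size follows the phonological template described earlier.

```python
import random

def perturb_phonology(activations, sd=0.1, rng=random):
    """Sketch of the simulated perception deficit: multiply each unit's
    activation by an independent Gaussian factor (mean 1.0, s.d. 0.1),
    so the same word's form differs slightly on every presentation."""
    return [a * rng.gauss(1.0, sd) for a in activations]

clean = [0.5] * 162                            # one presentation of a word
noisy = perturb_phonology(clean, rng=random.Random(7))
```

Because a fresh random factor is drawn per unit per presentation, the network never sees exactly the same phonological pattern for a word twice, which is the property that disrupts learning consistent representations.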
At a very general level, this is how one would characterize the effect of a speech perception deficit on phonological representations. This model was also trained until asymptote, which was reached at 2.5 million training trials. This network was tested in the same way as the unimpaired network, as specified in the previous subsection. At asymptote, the network was performing approximately as well as the intact network on the Hearing and Speaking trials (99.1% and 99.5% accuracy, respectively). However, performance was much worse on Repetition trials, where it produced only 82% correct forms. The speech-impaired network's performance on the Transformation task was also markedly worse than the intact network, producing 82.2% correct items. Generalization to nonwords was also quite poor, with only 60% correct on the same 20 nonwords used above. These results underline two general patterns in the speech impaired network. First, it would seem that this network showed a general developmental delay, thus taking many more training trials to reach asymptotic performance. In addition, however, the network showed a deviant pattern of development with respect to tasks that place a heavy load on phonological processing. Specifically, it was much worse than expected on repeating a verb and transforming it from present to past tense. Finally, it was much worse than expected at generalization, producing many incorrect past tenses for nonwords. Perhaps not surprisingly, these are exactly the types of deficits that children with SLI demonstrate. To better characterize how the network learned morphology, the intact and speech impaired networks were tested on a transformation task using three test sets: 20 regular verbs, 16 irregular verbs and 20 nonwords.
These items have previously been used to test children with SLI and adult aphasics in van der Lely and Ullman (1997) and Ullman et al. (1997). To simulate child performance on this task, the networks were tested at earlier points in development, before reaching asymptotic performance. The speech impaired network was tested after 1.5 million training trials, an arbitrary point at which it appeared to be performing well above floor, but also below ceiling, on all three conditions. To obtain a normal comparison group, the intact model was tested at two points during training. The first corresponded to a chronological age comparison, and was obtained by also testing the intact network at 1.5 million trials. The second comparison was the intact model at 900,000 training trials, and corresponded to a 'younger normal' condition. The results of all three are illustrated in Figure 4.3. By way of comparison, Figure 4.4 replots the data from Table 4.1 for children with SLI, normal adults, and language-age matched controls on the same task (van der Lely & Ullman, 1997).

[Figure 4.3: Comparison of intact and speech impaired networks on regular, irregular, and nonword past tenses. The speech impaired network was tested at 1.5 million training trials. The Control 1 condition corresponded to a same-age control group (tested at 1.5 million training trials), whereas the Control 2 condition corresponded to a younger-control group (900,000 training trials).]

[Figure 4.4: Past tense production data (percent correct on regular, irregular, and nonword items) from children with SLI and two control groups (from van der Lely & Ullman, 1997).]

These figures show several interesting patterns. First, the normal developmental trajectory
for regulars is slightly different than for irregulars; as such, both normally developing children and the intact network perform better on regulars compared to irregulars. This pattern is well attested in the literature, and corresponds to the tail end of a U-shaped learning curve in which performance on irregulars lags behind generalization because of the relative difficulty of learning irregulars on a case-by-case basis, compared to regular past tenses that can be produced through a combination of memorization, analogy to similar forms, and generalizations of the -ed rule. The network that is developed in the present work demonstrates a similar profile, suggesting that it learned the English past tense in a similar way to children. The SLI pattern is somewhat similar to this; while performance on familiar forms is much worse than either normal comparison group, the basic pattern of better regulars than irregulars appears to hold. One possibility is that these children have in fact learned the past tense rule to some small degree, and are simply unable to use it to the extent that normally-developing children can. In contrast, nonword performance in SLI shows a deviant developmental profile, demonstrated by their very poor performance in generalizing past tense to nonwords. The results of the speech impaired network simulate this pattern because of the importance of phonological processes in developing morphology. It is notable that the network is impaired to a large extent on all three types of forms. This is due to the fact that there are no independent mechanisms for processing regulars and irregulars, or familiar words and nonwords. A single mechanism is involved in processing all of these. However, there is a division of labor between phonological and semantic knowledge in this network, and verbs do differ in the extent to which they load on phonological knowledge. Specifically, nonword forms
are unfamiliar to the network, and thus rely heavily on phonology, the locus of generalization. A phonological impairment, such as would result from a speech perception deficit, results in poor generalization to nonword forms.

4.4.4 Discussion of Morphology Models

The models presented in the previous section address a more general theoretical issue of how regular and irregular morphological forms are learned, represented and processed. As I have discussed throughout this dissertation, there is at least one theory that holds that regular grammatical morphology is processed using a rule-like mechanism. This mechanism is categorically different from the one responsible for processing exceptional forms, which are encoded in a type of associative/declarative memory (Pinker, 1991, 1999; Marcus et al., 1992; Ullman et al., 1997). On this account, SLI is explained as a deficit in processing morphological rules, such that generalization is impaired while memorized forms are left intact. Earlier SLI data indicating worse performance on regulars than irregulars seemed especially compatible with the words and rules account. Children produced more errors on regular forms because they were unable to apply grammatical rules to words; irregulars were less impaired because they were not rule-derived, and as such these children were not predicted to have difficulties with such forms (Gopnik & Crago, 1991). In contrast, newer data have indicated there is no regular/irregular dissociation in SLI, and that earlier discrepancies could be due to poorly controlled stimuli. Such data might appear to be problematic for the words and rules account, which has in the past suggested that regular forms are always processed using rules (Pinker & Prince, 1988; Pinker, 1991), and as such should be more impaired than exceptional forms.
More recently, however, this account has been revised to include the possibility of memorization of any familiar inflected form. That is, this account might also allow for speakers to explicitly encode words like baked and walked, in the same way that they encode took and slept. This allows the words and rules theory to explain why children with SLI have less difficulty producing past tenses for familiar regular verbs than for novel verbs. They are memorizing as many familiar past tense forms as they can encode, to compensate for the inability to derive these forms from a rule. What I am proposing here is that this is not quite an accurate interpretation of the facts about SLI. There do indeed seem to be two types of knowledge that enter into morphological processing. The first is phonological knowledge, which allows speakers to analyze relationships between the phonological structures of related forms (how does baked differ from bake?) and across unrelated forms (in what ways are the past tense forms baked and snowed similar?). The second type of knowledge is semantics, which again allows speakers to understand relationships between forms (what is the difference in meaning between bake and baked? What is similar about the meanings of baked, walked and snowed?). The model that is presented in this section encodes both these types of knowledge. Phonological similarity is represented using phonemes made up of distributed features; overlap in the semantics of forms is represented in a basic localist scheme that also encodes the idea of IN THE PAST using a single unit in the Semantics layer. The network learned to map the phonology and semantics of verbs by adjusting the weights of two sets of connections between them.
These, in concert with two 'cleanup' layers, encoded both the idiosyncratic mappings between meaning and sound (what phonological form is activated by a given set of semantics), and the relationship between words' present and past tense forms. For most forms, this second mapping was straightforward, and corresponded to a morphological rule such as the one in (1). As a result, the network learned to generalize this mapping, a way of compressing what is learned to maximize its computational resources. One major characteristic of this is that the network was able to generalize the past tense pattern to novel forms like blick. In contrast, some forms failed to fit this generalization, namely irregular forms, and for these cases it was instead necessary for the network to explicitly encode the mapping between a verb's past tense semantics and its past tense phonological form. This connectionist approach differs from the words and rules theory in important ways. The main difference is that it does not posit discrete 'routes' or 'mechanisms' that encode different forms in the network. Instead, differences in how forms are computed are due to the degree to which they rely on different types of knowledge encoded in the network. Rule-like behavior is promoted by phonology because of how it allows the network to generalize across forms based on phonological similarity (words ending in specific classes of phonemes take a specific -ed allomorph). 'Memorization' of individual forms is promoted by weights connecting the semantics and phonology layers, allowing the network to override the generalization process when a specific familiar form is encountered. Note that a form does not need to have an irregular past tense to be encoded in this way; the network will tend to explicitly encode forms to the fullest extent that its computational resources allow.
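The rule-like mapping the network extracts amounts to the allomorph conditioning described at the start of this chapter: the allomorph is selected by the stem's final phoneme. A minimal sketch, using ASCII stand-ins for phoneme symbols and a simplified phoneme classification that I am assuming for illustration:

```python
ALVEOLAR_STOPS = {"t", "d"}
# Simplified set of word-final voiceless sounds (ASCII stand-ins:
# "S" = esh, "tS" = the 'ch' affricate, "T" = theta).
VOICELESS = {"p", "k", "f", "s", "S", "tS", "T"}

def regular_past(stem, final_phoneme):
    """Attach the -ed allomorph conditioned by the stem's final phoneme."""
    if final_phoneme in ALVEOLAR_STOPS:
        return stem + "@d"    # syllabic /-@d/, as in 'tasted'
    if final_phoneme in VOICELESS:
        return stem + "t"     # voiceless /-t/, as in 'baked'
    return stem + "d"         # voiced /-d/, as in 'snowed'
```

The point of the connectionist account is that nothing like this explicit conditional is stored anywhere; the same behavior falls out of generalization over phonologically similar forms.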
Now, consider how the same network learned this task when words' phonological forms were distorted, as would occur in the case of a speech perception deficit. Under these circumstances, the network had a reduced ability to compute phonological similarities between present and past tense forms of a word, and across all past tense forms. This was due to the fact that a word's phonological form was slightly different each time it was presented to the network, making it difficult to learn generalizations about past tense morphology. An important observation about the resulting deficit is that the network was not specifically impaired in any single aspect of morphology; the speech impaired network was not at ceiling or at floor on regular, irregular or nonword forms. All forms were impaired to some extent, relative to the intact network. However, nonword forms were the most impaired, because they relied exclusively on phonological knowledge to transform a nonword into its past tense. Regulars and irregulars were less impaired because the network had the opportunity to develop word-specific representations of these forms. This seemed to be especially the case for more frequent forms, due to the importance of frequency in the development of arbitrary (non-generalizable) information in distributed networks like this one.

4.4.5 Toward a Broader Typology of Morphological Impairments

In this section I have presented evidence for the importance of phonological representations in the development of grammatical morphology. In addition, however, the approach that I advocate raises several important observations about the nature of morphological and perceptual impairments in children with SLI by tying it to a broader
Different morphological impairments are possible in this model, based on both the nature of the deficit, and the cognitive centers that it (primarily) affects. A basic typology is illustrated in Table 4.2. In this chapter I have considered only the case of morphological deficits in SLI, which are proposed to be caused by a phonological impairment. However, other modeling work on morphological deficits also seems to bear out this typology. Below, I discuss these data with respect to how they are also informative to the connectionist model of morphology presented here.

Table 4.2: Typology of morphological impairments.

group            population       etiology               morphology profile
SLI              developmental    phonology/perception   poor generalization
PD, Broca’s      adult aphasia    phonology              poor generalization
AD, Wernicke’s   adult aphasia    semantics              poor irregulars

Note. AD: Alzheimer’s Disease; CVA: cerebrovascular accident; PD: Parkinson’s Disease; SLI: Specific Language Impairment.

Morphological deficits found in some patients with Broca’s aphasia and Parkinson’s disease seem to correspond in interesting ways to those found in SLI. As I discussed in the previous section, Ullman et al. (1997) found that these types of patients had a greater difficulty generating past tenses for nonwords compared to irregulars. The authors interpreted their results in terms of damage to memory systems subserving grammatical rules. However, Joanisse and Seidenberg (1999) explored the alternative view in which patterns of morphological impairment are due to the effect of difficulties processing phonology. To this end, we used a connectionist model that learned
to produce and recognize present and past tense verbs by mapping their sounds and meanings, similar to the one in Figure 4.2 but with separate Speech Input and Speech Output layers in order to better simulate the separability of articulatory and auditory representations.

Broca’s aphasia and Parkinson’s disease were simulated in this network by introducing artificial lesions into the fully trained model. These lesions were obtained by adding Gaussian noise to the network’s Speech Output layer and removing a proportion of the connections between the Speech Output and cleanup layers. The result was a phonological deficit that impaired the network’s ability to generate a past tense verb from its present tense form. This deficit seemed to be most severe for nonwords, because they did not have semantics to support them. Irregular verbs were much less impaired (though not completely intact). Regular past tenses were slightly worse than irregulars, though this difference was not significant.

Together with the model presented in the previous section, these results suggest that impairments to morphological generalization in adult aphasics and developmentally language impaired children are the result of impairments to similar cognitive abilities. The nature of the deficit is similar across the two populations; in developmental cases it seems to result from a perceptual deficit, or else a processing limitation that is specific to phonology. In the case of aphasia, the deficit is the result of cerebral insult or neurodegeneration affecting brain regions known to be involved in phonological processing.

The model of morphology that I propose also predicts that semantic deficits will have an effect on morphology, though on different forms from those affected by a phonological problem.
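The lesioning procedure described above, and the semantic-layer lesion discussed below, both reduce to two simple operations: adding Gaussian noise to a layer’s activations and removing a proportion of the connections feeding it. The following is a minimal sketch of these two operations; the layer sizes, noise level, and removal proportion are illustrative stand-ins, not the parameters of the actual Joanisse and Seidenberg (1999) simulations.

```python
import random

def add_noise(activations, sd=0.2, rng=None):
    """Degrade a layer's activation pattern with Gaussian noise
    (an illustrative stand-in for the noise lesion)."""
    rng = rng or random.Random(0)
    return [a + rng.gauss(0.0, sd) for a in activations]

def remove_connections(weights, proportion=0.25, rng=None):
    """Zero out a random proportion of the weights between two layers,
    e.g. between the Speech Output and cleanup layers."""
    rng = rng or random.Random(0)
    return [[0.0 if rng.random() < proportion else w for w in row]
            for row in weights]

# Toy example: a 4 x 5 weight matrix, lesioned only after training.
weights = [[0.5] * 5 for _ in range(4)]
lesioned = remove_connections(weights, proportion=0.25,
                              rng=random.Random(42))
noisy_output = add_noise([1.0, 0.0, 1.0, 0.0], sd=0.2,
                         rng=random.Random(42))
```

Because the lesion is applied only after training, the network’s learned knowledge remains in the weights but is noisily expressed, which is consistent with nonwords suffering most: they cannot fall back on semantic support.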
As I indicated above, semantic information is especially important in identifying forms that take irregular inflections. A semantic deficit could therefore result in a deficit in processing irregular verbs. Data from Ullman et al. (1997) are again informative in this regard. These authors studied patients with Alzheimer’s disease and Wernicke’s aphasia who tended to show semantic difficulties such as anomia. Such patients demonstrated a different pattern of impairment that specifically targeted irregular verbs, while leaving both regulars and nonwords somewhat intact.

The present account explains this pattern as the result of semantic impairments that interfere with the ability to recognize verbs that take irregular past tenses. Since generalization does not tend to be as important to these words, phonology is less useful to them compared to regulars and nonwords. As a result, patients with semantic deficits tend to have difficulty producing irregular past tenses. This was captured in the Joanisse and Seidenberg (1999) model by lesioning the network’s semantic layer (simulated by adding Gaussian noise to its activations). Applying this semantic deficit to the fully trained network resulted in poorer than normal performance on all verb types, but had a much stronger impact on irregulars compared to regulars and nonwords.

It is unclear what population represents the developmental corollary to AD and Wernicke’s aphasia. It has been suggested that Williams Syndrome (WS) represents such a case, though as I discuss below, this remains unclear. WS is a genetic disorder that affects brain development due to abnormal calcium metabolism (Bellugi, Lichtenberger, Mills, Galaburda, & Korenberg, 1999). Affected individuals exhibit IQ
scores in the range of 50-70, and typically show profound cognitive impairments in their visuospatial abilities, numerical cognition and problem solving. In spite of this, individuals with WS tend to have mental-age appropriate language processing abilities. Most notably, WS patients show relatively good use of grammatical structure and good processing of complex syntactic structure. Instead, the primary language deficit in WS appears to be in the domain of semantic or lexical processing. With respect to morphology, it has been observed that individuals with WS have impaired development of irregular past tense verbs, while demonstrating good regular morphology (Clahsen & Almazan, 1998).

Nevertheless, there are problems with this account. It is unclear whether WS does in fact represent a semantic impairment. Studies investigating lexical and semantic processing in WS have tended to find that these individuals have semantic representations that are appropriate for their general cognitive and linguistic development (Thomas & Karmiloff-Smith, 2000). This suggests that language difficulties in WS are not caused by a core semantic deficit, but instead by a more general developmental delay. This is further supported by the finding that the development of irregular verbs (along with regulars and nonwords) in WS appears to closely follow that of normally developing children (Thomas et al., in press). As I discussed in §4.4.3, performance on irregulars tends to lag behind that of regulars. This finding suggests that poor performance on irregulars in WS does not represent a deviant developmental trajectory, but is instead the consequence of a general delay in acquiring language.

The question then remains whether any developmental profile exists in which the semantic impairment hypothesis can be tested.
One possible population is children with comprehension-specific reading deficits. These children demonstrate poor comprehension of written language in spite of age-appropriate decoding skills (Stothard & Hulme, 1994). This is a different pattern than is found in dyslexics, whose comprehension deficits seem to be subordinate to their poor decoding skills (Stanovich, 1988). One hypothesis is that poor comprehenders’ reading problems result from semantic memory impairments that limit their comprehension ability. Nation and Snowling (1999) tested this hypothesis using a task that primed the semantics of category coordinates (CAT-DOG, PLANE-TRAIN) and words related through function (BROOM-FLOOR, SHAMPOO-HAIR). Their results indicated that poor comprehenders had worse than expected priming for some category pairs. They interpreted their results as indicating that these children have disordered semantics compared to normally developing children.

A prediction from the present model of morphology is that these children might also demonstrate problems with irregular past tenses. This is due to the importance of accurate semantic representations in the identification of verbs that take idiosyncratic past tenses. To my knowledge, no studies have been done to test this hypothesis. However, given children with a sufficiently strong semantic deficit, it is a good possibility that they will demonstrate this type of morphological impairment.

4.5 Phonology in SLI: Crosslinguistic Evidence

The theory of SLI that is presented here is by no means a novel one; many other researchers have also sought to implicate perception and phonology in the grammatical morphology deficits observed in these children.
The novelty of this approach is instead the use of a connectionist model of language learning to better understand how deficits to perception and phonology interfere with learning a language’s morphological patterns. One emerging area of inquiry into SLI is the crosslinguistic approach, which affords us deeper insights into how a language’s use of morphological systems can affect the nature and extent of grammatical deficits in children. In this section I consider how these crosslinguistic data lend further support to the idea that morphological deficits in language impaired children actually derive from phonological difficulties.

4.5.1 Phonological Salience and Complexity

One prediction of the phonological deficit approach is that not all morphemes are equally susceptible to impairment. Instead, certain ones will be made more vulnerable by the fact that learning and using them relies to a greater extent on perceptual salience and phonological analysis. There is some evidence to support this assertion.

In one study, Italian children with SLI were tested on their ability to use four different inflectional morphemes that varied in their perceptual salience (Leonard et al., 1992). The third person plural (3pp) agreement marker tends to occur in word-final, unstressed positions, meaning that it has reduced perceptual salience. Performance on this morpheme was compared to other Italian morphemes that occur in acoustically more salient positions, like the plural, third person singular (3ps) and adjectival markers. As predicted, children with SLI showed much poorer performance on the less salient 3pp morpheme (50%, compared to 82% correct for MLU controls), whereas performance on the other three morphemes was nearly identical to MLU controls (plurals: 87% SLI, 89% MLU; 3ps: 93% SLI, 93% MLU; adjectival: 97% SLI, 99% MLU).
The same effect was found for articles and clitics in Italian. These free-standing grammatical morphemes also tend to occur in non-salient positions, because they do not tend to be stressed, and their vowels are relatively short in duration. Thus, Leonard et al. (1992) found much poorer performance in the SLI group for both articles and clitics (41% and 26% correct, respectively), compared to MLU controls (83% and 66% correct).

In a different study, Leonard (1987) investigated how English and Italian children with SLI differ in the extent of their problems with inflectional morphology. The authors studied grammatically similar (but phonologically different) morphemes in the two languages, including the plural, third person singular, and past tense markers. Rather than eliciting individual forms from children, they studied how these children used these morphemes in spontaneous speech. A primary finding from this study was that English and Italian children differed in the extent to which they were able to use equivalent morphemes in their language. For instance, Italian children were more likely to correctly use plural nouns (94% correct) and third person singular verbs (92% correct) in obligatory contexts, compared to English speakers (63% and 8% correct, respectively).

The authors suggest that these effects owe to differences in the way in which the two languages use allomorphy. There are a number of surface forms for English plurals (/s/ as in trucks, /z/ as in cars, and /ɪz/ as in busses), and they are triggered by the phonological characteristics of the noun’s word-final phoneme. Regular plurals in Italian involve a somewhat simpler system in which the noun’s final vowel is changed, such that masculine -o becomes -i, feminine -a becomes -e, and masculine or feminine -e becomes -i. So, for example, libro (book) becomes libri, and palla (ball) becomes palle.
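The contrast between the two plural systems just described can be made concrete with a toy sketch. The phoneme classes below are the standard textbook ones for English plural allomorphy, and the ASCII transcriptions are hypothetical stand-ins; none of this code comes from the studies discussed here.

```python
# English plural allomorphy is conditioned on the phonological features of
# the noun's final segment; Italian regulars simply swap the final vowel.
# ASCII stand-ins: S/Z/C/J = "sh"/"zh"/"ch"/"j", T = voiceless "th",
# and "@" marks the reduced vowel of the /Iz/ allomorph.

SIBILANTS = {"s", "z", "S", "Z", "C", "J"}
VOICELESS = {"p", "t", "k", "f", "T"}

def english_plural(stem):
    """Pick the English plural allomorph from the stem's final phoneme."""
    final = stem[-1]
    if final in SIBILANTS:
        return stem + "@z"   # bus -> busses: syllabic allomorph after sibilants
    if final in VOICELESS:
        return stem + "s"    # truck -> trucks: /s/ after voiceless non-sibilants
    return stem + "z"        # car -> cars, dog -> dogs: /z/ elsewhere

VOWEL_CHANGE = {"o": "i", "a": "e", "e": "i"}

def italian_plural(stem):
    """Italian regular plurals: replace the word-final vowel."""
    return stem[:-1] + VOWEL_CHANGE[stem[-1]]
```

Note the asymmetry: english_plural must inspect phonological features of the final segment (voicing, sibilance), whereas italian_plural needs only the identity of the final vowel. This is exactly the extra phonological analysis that, on the present account, is difficult for a child with a phonological deficit.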
What makes regular Italian plurals easier to learn and use than their English equivalents is that their allomorphy does not involve a specific phonological process. Speakers need only identify the consistency with which a noun’s word-final vowel is changed when it is pluralized. English plural allomorphy is more complicated because it requires the user to analyze the phonological structure of the final phoneme of a singular noun and assess the commonalities between nouns. For example, the nouns dogs and bows both take the /z/ allomorph, but end in very different phonemes (a voiced velar stop, and a vowel, respectively). Learning how plurals are used in English thus involves inferring a phonological rule that derives allomorphs. A child with a phonological deficit would clearly have more difficulty learning a morpheme in this situation.

4.5.2 Morphological Frequency and Density

The studies by Leonard and colleagues provide strong evidence that children with SLI are impaired in learning aspects of morphology that lack perceptual salience. However, it is clear that other factors must be relevant as well. Consider, for example, the -s morpheme in English, which is used to mark both plural nouns (cat-s) and third person singular verbs (bake-s). Leonard (1987) found that children with SLI were much better at producing it as a plural noun marker (79% correct) than as a third person verb marker (7% correct). This effect cannot be solely due to perceptibility because the two morphemes are phonologically identical and occur in similar phonological contexts. However, the two do differ greatly in terms of how often they occur in everyday usage. Plural marking in English is more frequent than 3ps agreement, occurring 26.7% of the time in adult language.
Because children with SLI have many more exposures to the plural than the third person marker, their ability to learn these forms is enhanced.³

An interesting difference between SLI in Italian and English is the fact that English speakers were much worse at third person singulars (English SLI: 7%; Italian SLI: 92%; Leonard, 1987). The source of this difference appears to be a special case of frequency, specifically what I would call the relative density of Italian morphology compared to English. Under most circumstances, English verbs are unmarked for agreement, with the exception being 3ps (I bake, you bake, he/she bakes, we bake, you (pl.) bake, they bake). As a result, any agreement morpheme is only rarely encountered by the language learner, some 4.3% of the time in adult speech.

³These frequency data are drawn from Francis and Kucera (1982), and represent how frequently the plural or third person (-s) form occurs, as a percentage of overall noun or verb frequencies in the database.

In contrast, Italian subject-verb agreement is obligatory, forming a dense morphological paradigm in which all six possible number-person combinations take a specific verb ending (e.g., ‘to sleep’: dormo, dormi, dorme, dormiamo, dormite, dormono). As a result, Italian language learners have an increased opportunity to learn about the existence of morphological patterns. Unlike English, a verb’s ending changes consistently with grammatical context, offering a stronger signal to the existence of a morphological paradigm, and presenting the child with more opportunities to learn it.

Data from Hebrew lend further support to this argument, because of how broadly it uses inflectional morphology. Hebrew morphology is especially dense because of how inflection is realized in words.
Unlike languages that use concatenation (adding prefixes and suffixes to roots) to form morphologically complex words, Semitic languages represent roots as patterns of consonants to which sequences of vowels are inserted to form an inflected word. It is not possible to realize an uninflected root on its own; Hebrew phonology requires vowels in syllable nuclei. As such, language users are consistently using and hearing morphologically inflected forms. Another characteristic of Hebrew is that there are a number of different consonant and vowel patterns (or binyans) that a given root can belong to. A number of these are illustrated in Table 4.3.

Table 4.3: Examples of Hebrew verb inflection patterns for three binyans (from Leonard, 1998).

                    Pa’al Pattern   Pi’el Pattern   Hitpa’el Pattern
                    r-x-v ‘ride’    b-š-l ‘cook’    l-v-š ‘wear’
Present
  Masc. Sing.       roxev           mevašel         mitlabeš
  Masc. Plu.        roxvim          mevašlim        mitlabšim
  Fem. Sing.        roxevet         mevašelet       mitlabešet
  Fem. Plu.         roxvot          mevašlot        mitlabšot
Past
  3 Masc. Sing.     raxav           bišel           hitlabeš
  3 Fem. Sing.      raxva           bišla           hitlabša
  3 Plu.            raxvu           bišlu           hitlabšu

Because it is so highly inflectional, Hebrew would seem to be an ideal test case for morphological deficits in SLI. Leonard and colleagues have studied the spontaneous use of such forms in Hebrew speaking children with SLI, both in experimental and naturalistic settings. An interesting result was that these children failed to show profound difficulties with such forms, compared to MLU controls (Dromi et al., 1993; Leonard & Dromi, 1994). Instead, they performed above 90% correct on most forms. Here again, a conclusion that can be drawn from these data is that the statistical characteristics of a language’s morphological paradigm can influence how easily it is learned.
Denser paradigms provide language learners with a greater exposure to the evidence they need to acquire a language’s morphological structure. This learnability of a morphological paradigm based on its density speaks directly to the connectionist approach to morphology. On this account, a morphological alternation is learned by mapping consistent modulations in sound and meaning, including more grammatically-oriented ones like noun-verb agreement. The transparency of this mapping is influenced by a variety of factors, including the frequency with which a marker occurs.

Gopnik (1997) has challenged the claim that perceptual salience is relevant, citing the case of an apparently acoustically salient grammatical morpheme that children with SLI still find difficult. Japanese marks the honorific past tense with -mashita, which is more salient than English past tense morphology. Japanese SLI children were claimed to be just as impaired on this form as on less salient morphemes. However, the study cited by Gopnik tested eight Japanese SLI children on only two instances of the -mashita morpheme, and failed to apply the proper controls to determine whether such a deficit represents a deviant pattern in the development of Japanese; thus, the study’s authors acknowledge that it should be treated as preliminary (Fukuda & Fukuda, 1994). Nevertheless, the case of -mashita is a useful illustration of the non-obvious complications governing the acquisition and use of many morphemes. As in English, the regular (non-honorific) Japanese past tense morpheme exhibits allomorphy, surfacing as either -ta or -da, as illustrated below. Also as in English, the perceptibility of this morpheme is weak, because of its duration and word-final position.
(2) kai-ta (write; past tense)
    yon-da (read; past tense)

(3) kaki-mashi-ta (write; honorific past tense)
    yomi-mashi-ta (read; honorific past tense)

Comparing the cases in (2) to the honorific past tense versions of the same words in (3) reveals that although -mashita is highly perceptible in isolation, the verb stems kai and yon change to kaki and yomi when followed by -mashi-. It might therefore be difficult for a child with disordered phonology to segment the -mashi morpheme, which requires recognizing the commonalities between yon-da and yomi-mashita, in order to determine where the verb root ends and the grammatical morpheme begins.

4.6 Conclusion

Morphology has traditionally been treated as an independent module of grammar. I have presented an alternative model in which morphology is seen as an emergent property of the mappings between words’ phonological forms and their meanings. Morphological alternations involve small modulations in word meanings (including number, agreement, aspect and changes in grammatical category) accompanied by consistent changes in the word’s phonological form. This approach has been thoroughly investigated with respect to derivational morphology in Gonnerman (1998). The present work reconsiders this model with respect to inflectional morphology. It focuses on the importance of phonological representations in the development of inflection, and specifically on how poor phonological representations can lead to specific types of morphological deficits. SLI appears to be an appropriate case of this, given evidence of perceptual and phonological deficits accompanying these children’s poor morphological generalization.

Recent interest in SLI by linguists has greatly increased our knowledge of the grammatical deficits in language impaired individuals.
However, there is still little agreement about the basis for these impairments. Some have assumed that these grammatical impairments must result from genetic and neurobiological anomalies that affect the development of Universal Grammar, the innate grammatical module of the brain. They have further assumed that the other deficits exhibited by these children are unrelated co-occurring symptoms. In this chapter I have discussed some of the kinds of evidence that suggest how linguistic impairments could follow from more basic speech processing deficits, given the importance of phonological information in the model of morphology that I have presented. The challenges that confront this approach are to gain a better understanding of the nature of perceptual deficits in SLI and how they could lead to the specific problems in learning language that have been described in linguistic research.

Chapter 5

Phonology-Syntax Interactions: Evidence from SLI

The previous chapter investigated how deficits to phonology can lead to morphological impairments. These results speak to a general theory that considers grammatical morphology an emergent property of a model that maps phonology and semantics, rather than being a monolithic module of grammar. On this theory, morphological deficits in specific language impairment (SLI) do not represent evidence for a neurally distinct morphological processing capacity, but are instead the consequence of a basic deficit to phonology. This chapter addresses a similar issue related to sentence comprehension deficits in SLI. As in Chapter 4, I put forward the theory that these impairments result from the interaction of linguistic and cognitive factors.
On the surface, this chapter might seem out of step with the general topic of this work, because it does not deal specifically with phonological phenomena. Instead, it focuses on syntax, an area of linguistic inquiry that has classically been treated separately from phonology.¹ As I will argue, however, these data fit well with the alternative view of language envisioned in this dissertation, in which linguistic processes are seen as the interaction of phonological and semantic knowledge. An important prediction of this theory is that phonological processing will influence how other aspects of language are used. In this chapter, I explore this prediction with reference to syntax by investigating how a deficit in phonological representation and processing, as would emerge from a speech perception impairment, might influence sentence recognition. This hypothesis is tested with the help of a connectionist model that learns sentence comprehension either under normal conditions, or in the context of a speech perception impairment.

The focus of this chapter is on syntactic comprehension deficits in children with SLI. These children demonstrate deficits in processing a variety of syntactic constructions, including resolving bound anaphora and reversible passives. van der Lely and Howard (1993) have shown that children with SLI have difficulty comprehending the difference between sentences like the ones in (1), even when compared to younger normally developing children.

¹Pace work on the ‘syntax-phonology interface’, which often seeks to illustrate the divisibility of syntax and phonology into separate modules (de Jong, 1989; Inkelas & Zec, 1995; Selkirk, 1984).
Similarly, van der Lely and Stollwerck (1997) have found that children with SLI had a greater difficulty resolving reflexive pronouns (himself, herself, itself) in sentences like the ones in (2) compared to a number of younger normal children.

(1) John was kissed by Mary.
    John kissed Mary.

(2) Mowgli says Balloo is tickling himself.
    Mowgli says Balloo is tickling him.

These types of results are important because they illustrate that the language deficits in children with SLI extend beyond morphology. Instead, language impaired children also appear to have difficulty with the configural (hierarchical, rather than linear) aspects of syntax governing how anaphors relate to their antecedents, and how subjectival noun phrases can be shifted to the beginning of a sentence to form passives. Deficits of this type also present a challenge to the theory of a core phonological deficit in SLI. This is because it is difficult to imagine, on the traditional modular view of grammar (Fodor, 1983; Chomsky, 1965), how subtle impairments to phonological processing and speech perception can lead to deficits in understanding sentences.

5.1 Anaphor Resolution Deficits in SLI

This chapter focuses specifically on the case of pronouns and reflexives, and how they are impaired in SLI. In generative syntax, anaphoric reference is governed by three Binding Conditions (Chomsky, 1981):

(3) Principle A: Reflexives are bound in their governing category.
    Principle B: Pronouns are free in their governing category.
    Principle C: Referential (unbound) pronouns are free everywhere.

The first and second of these conditions are the important ones here, as they describe how syntactic structure determines the behavior of pronouns and reflexives.
A language learner who is learning the use of these forms is confronted with the problem of discovering what constitutes a ‘governing category’, something that varies from one language to the next. Government and binding theory (Chomsky, 1981) describes these categories using such structural properties as c-command and m-command.

This chapter explores one theory of how these types of structural properties of language can be learned within a model of language processing similar to the ones used in previous chapters. Deficits in these processes are considered the result of a phonological deficit, much as morphological deficits are. On the account I propose, the knowledge of syntactic principles in children with SLI is not altogether deviant, but is instead similar to that of normally developing children. Their sentence comprehension deficits are explained as the result of a processing deficit that limits their ability to use this syntactic knowledge. This account appeals specifically to how a speech perception deficit could lead to a reduced ‘phonological working memory capacity’. Because normal sentence comprehension relies on this working memory, such an impairment could lead to reduced processing abilities in children with such a deficit.

The idea that working memory capacity influences sentence comprehension is not a new one (Daneman & Tardif, 1987; Gathercole & Baddeley, 1990; King & Just, 1991; Montgomery, 1995, among many others). The novelty of the present work is the claim that impaired phonology leads to reduced working memory capacity in SLI, which in turn leads to syntactic deficits in these children. This chapter reviews evidence supporting the theory that phonological working memory influences sentence comprehension ability.
It also considers evidence of working memory impairments in both children with SLI and adult aphasics known to have sentence comprehension deficits. To further investigate how a perceptual deficit might influence working memory and sentence comprehension, I develop a model of syntactic processing in which the phonological forms of words in a sentence are mapped to their corresponding semantics. Syntactic structure is learned as a result of the serial nature of this task, and is directly affected by the quality of the phonological inputs the model receives. The results of this modeling work suggest an important link between phonology and sentence processing.

5.2 Overview of Empirical Data

van der Lely and Stollwerck (1997) investigated anaphoric processing in 12 children with what they characterized as grammatical SLI. They were selected as having impairments to grammar, specifically inflectional morphology deficits, while having unimpaired nonverbal cognitive abilities. These children were compared to three control groups of younger children; the first group was matched for performance on two standardized language measures, the Test of Reception of Grammar (TROG; Bishop, 1989) and the Grammatical Closure subtest of the Illinois Test of Psycholinguistic Abilities (ITPA). The second control group was matched on a vocabulary measure (the British Picture Vocabulary Scale or BPVS); the third control group was matched on a similar task, the Naming Vocabulary subtask of the British Ability Scales (BAS). The use of three separate control groups allowed the authors to control for the influence of either grammatical or vocabulary development on syntactic development.

In two experiments, the authors assessed children’s ability to resolve anaphoric pronouns by reading them a sentence paired with a picture.
Children were asked to determine whether the sentence matched the picture by providing a yes or no answer. For example, children were shown pictures like the one in Figure 5.1 and read a sentence like Is Mowgli tickling himself? or Every monkey is tickling him. Chien and Wexler (1990) have argued that this type of sentence-picture pairing procedure can help elicit syntactic judgments from children without recourse to more artificial tasks such as grammaticality judgments. The authors tested children on a variety of conditions, as indicated in Table 5.1. These factors were the type of pronoun (regular or reflexive), and whether a quantifier-noun phrase was used as an antecedent (Every monkey... vs. The monkey...), both of which require the use of syntactic knowledge to determine the correct antecedent.

Figure 5.1: Example of visual stimulus used in van der Lely and Stollwerck (1997), reproduced from Bishop (1997).

In many cases, pronouns can be resolved non-syntactically, based on either pragmatic or contextual knowledge. For instance, the antecedent of himself in (4-a) can be inferred from the fact that Bob is the only male entity mentioned in the sentence; however, this is not the case in (4-b), where himself can only be resolved syntactically. To determine whether children with SLI perform better in cases where pronouns can be resolved non-syntactically, subjects were tested on both types of cases.

Table 5.1: Experimental manipulations used in van der Lely and Stollwerck (1997).

Condition       Examples
Sentence Type   simple        Is Mowgli tickling himself?
                subordinate   Mowgli says Baloo Bear is tickling himself.
Pronoun Type    plain         Is Mowgli tickling him?
                reflexive     Is Mowgli tickling himself?
Subject NP      name          Is Mowgli tickling him?
                quantifier    Is every monkey tickling him?
Gender          match         Is Mowgli tickling himself?
                mismatch      Is Mowgli tickling her?
Note. NP: noun phrase

(4) a. Does Sally think Bob likes himself?
    b. Does George think Bob likes himself?

van der Lely and Stollwerck (1997) found poorer overall performance in the SLI group, compared to all three control groups, on sentences where syntactic knowledge was necessary to resolving the antecedent of a pronoun. In addition, they found poorer performance in the SLI group for complex sentences, compared to simple sentences, though again only when syntactic knowledge was necessary. The authors concluded that SLI involves a specific deficit to the processing of syntactic relationships, and leaves lexical and pragmatic knowledge intact. Based on this conclusion, they further suggested that this deficit indicates that grammatical processing is neurally distinct from other types of linguistic and non-linguistic knowledge.

One complication to this conclusion involves the extent to which the SLI group performed at chance on sentences requiring the use of syntactic knowledge. Figure 5.2 illustrates the SLI children’s performance on four sentence types: simple and complex sentences containing either plain or reflexive pronouns. The noun phrase (NP) and quantifier phrase conditions are conflated for the sake of clarity. The comparison group is a group of younger normal children matched for grammatical language achievement (the LA-1 group in the van der Lely and Stollwerck (1997) study). The SLI group performed worse than controls on all four conditions, but also showed greater impairment on complex sentences and sentences containing reflexives.

Figure 5.2: Comparison of SLI and MLU controls on a pronoun resolution task, from van der Lely and Stollwerck (1997). Bars show the simple-pronoun, simple-reflexive, complex-pronoun and complex-reflexive conditions.
Quantifier and name-NP conditions have been conflated in the interest of clarity. Error bars are standard error estimates based on the mean standard deviations for each condition as reported in the original paper.

Another important observation about these data is that the SLI group was not performing at chance either overall, or on any single condition. This suggests that these children were using some amount of syntactic information in this task, but were nevertheless not as proficient as younger normals. van der Lely and Stollwerck (1997) differ in their interpretation of these data; in their analyses they presented performance on the picture match and mismatch conditions separately, and found that subjects were at chance for five of 18 subtasks. However, this occurred only in sentence mismatch conditions where the correct answer was ‘no,’ suggesting that this result reflected a tendency toward false positive responses, rather than a tendency toward incorrect responses for a specific syntactic condition. When actual detection rates are considered (hits and correct rejections, relative to all responses), as in Figure 5.2, it is clear that SLI subjects responded better than chance on all conditions.

This result is important because it suggests an alternative view of syntactic deficits in SLI. Specifically, it indicates that these children are in fact able to process syntactic relationships, but that a processing deficit is hindering their ability to do so. In Chapter 4, I proposed that the morphological deficits in SLI could be attributed to a more basic deficit in phonological processing that impaired the ability to acquire and use grammatical affixes. It is possible that this same type of deficit is also responsible for difficulties in syntax, due to the role of phonological processing in sentence recognition.
The remainder of this chapter explores this possibility in depth, by first reviewing the literature on phonology and working memory in sentence processing, and then developing a model that explores the effects of a phonological impairment on syntactic processing.

5.3 Phonology and Working Memory in Syntax

There is recent research indicating that working memory represents an important link between phonology and syntax, based on the role of phonological working memory in sentence comprehension. Sentence recognition typically involves maintaining the phonological form of a sentence in working memory in order to determine the critical relationships between its constituents, and to resolve possible ambiguities (Waters & Caplan, 1985; MacDonald & Christiansen, in press). Additionally, there is evidence to suggest that phonological working memory capacity most directly influences language processing (e.g., Daneman & Tardif, 1987).

There is some degree of disagreement about the exact definitions of ‘short-term memory’ and ‘working memory’, making it difficult to discuss the exact nature of such a deficit in SLI. However, for the purpose of the present discussion I adopt the terminology used in Baddeley’s multicomponent model of working memory (Baddeley, 1986). In this model a central executive mechanism controls the use of more specific subsystems such as an articulatory loop and a visuospatial sketchpad. The notion of a phonological working memory derives from the idea of an articulatory loop used to maintain the phonological forms of words in memory during language processing. This model represents an older approach to short-term memory in which language processes and working memory are considered to be separable.
The model that I present in §5.5 reflects a more recent way of thinking that does not consider working memory to be a separate component of a cognitive process (MacDonald & Christiansen, in press). However, I adopt Baddeley’s terminology for the sake of making contact with previous work in this domain, rather than as a firm commitment to that model of short-term memory.

5.3.1 Working Memory Span

The influence of working memory on sentence comprehension has been investigated in experiments that assess sentence processing in normal adults. One methodology has been to group normal subjects based on their reading span, a generally accepted quantification of working memory capacity involving the ability to maintain a list of words in memory while reading sentences (Daneman & Carpenter, 1983). Groups of high and low span readers are then compared on the ability to comprehend different sentence types. For instance, King and Just (1991) measured reading times of high and low span subjects on a variety of sentences in a self-paced reading task. Both groups showed similar reading times on simple sentences like The reporter admitted the error. However, low span readers showed slower reading times for the main verbs of object-relative sentences (The reporter that the senator attacked admitted the error) compared to subject-relative sentences (The reporter that attacked the senator admitted the error). In contrast, high span readers showed similar reading times for both constructions. In a similar experiment, Daneman and Carpenter (1983) studied the relationship between working memory span and the ability to resolve pronouns. The authors found that higher span subjects were better able to determine the referents of pronouns farther back in a sentence than lower span subjects.
This finding was interpreted to suggest that listeners maintain a sentence’s constituents in memory for the purpose of determining a pronoun’s antecedent; as a result, listeners with greater working memory capacity perform better on tasks requiring them to resolve more distant antecedent-pronoun relationships.

These experiments are important in understanding the relationship between working memory and sentence comprehension because they illustrate how sentences differ in the demands they place on working memory. MacDonald and Christiansen (in press) discuss several factors that might influence sentence comprehension ability. These include the length of a sentence, as well as its structural complexity (for example, the number of embedded clauses, and how these clauses are arranged). In addition, however, experience with specific sentence constructions might also play a role in comprehension, since exposure to more sentences of a given type seems to influence the ability to process them (St. John & Gernsbacher, 1998). For example, comprehending object relative clauses should be easier for listeners who are exposed to more of this type of construction. Similarly, more experience with printed text should influence working memory span for written sentences, and could thus contribute to better performance in sentence processing experiments involving visual sentence comprehension.

Another factor influencing sentence comprehension is the quality of the phonological codes that are used to maintain sentences in working memory. For example, sentences containing many words with the same initial phoneme (commonly known as tongue twisters) have been shown to be more difficult to comprehend (McCutchen, Dibble, & Blount, 1994).
I propose that phonological processing is directly influencing working memory in SLI, and that this is the source of these children’s sentence comprehension deficits. On this account, speech perception deficits lead to impaired phonological representations that in turn have a negative impact on the capacity to maintain sentences in working memory.

5.3.2 Data from Aphasics and Normal Adults

One piece of evidence for the importance of phonological codes in working memory and sentence processing comes from studies of syntactic processing following brain injury. Agrammatic aphasics have been shown to demonstrate difficulties in syntactic processing, especially sentence comprehension. It has been claimed that these deficits derive from an impairment to core grammar (Grodzinsky, 1990, 1995), similar to what is claimed with respect to syntactic impairments in SLI.

Dick, Bates, Wulfeck, Utman, and Dronkers (1999) investigated an alternative theory in which sentence comprehension deficits in aphasia derive from a processing deficit that inhibits the use of syntactic knowledge. Their study assessed syntactic comprehension in both aphasic and normal adults in a similar way to the van der Lely and Stollwerck (1997) study, by measuring their ability to match auditorily presented sentences to pictures. The authors tested subjects on four types of transitive sentences: actives (The dog is biting the cow), passives (The cow is bitten by the dog), subject clefts (It is the dog that is biting the cow) and object clefts (It is the cow that the dog is biting). These sentences all encode the same general meaning; the difference between them relates strictly to syntactic structure. In generative terms, these sentences all have the same underlying structure, but have had their subject or object NPs shifted in various ways.
These different sentence types were administered to patients with a variety of brain injury types, including Broca’s, Wernicke’s, conduction and anomic aphasics. The results of this experiment showed that aphasic patients had the greatest difficulty on object cleft sentences, and were slightly better on passive sentences; performance was best for subject cleft sentences and active sentences (Figure 5.3). An important discovery of this experiment was that, on average, aphasics did not tend to show dissociations in the types of sentences they were impaired on. Instead, specific sentence types seemed to be more difficult for all aphasics.2

Figure 5.3: Sentence comprehension profiles for aphasic and normal elderly adults (Dick et al., 1999). Percent correct is plotted by sentence type (active, subject cleft, object cleft, passive) for the control, anomic, conduction, Broca’s and Wernicke’s groups. The results of this study revealed different comprehension profiles for different sentence types, regardless of aphasic profile.

The authors also found that normal elderly adults had slight difficulties with object cleft and passive sentences, suggesting that normal listeners might also be susceptible to difficulties in processing certain sentence constructions, and that some structures are intrinsically more difficult than others. These results suggest that aphasics have greater difficulty with some sentences than with others due to a deficit in processing, rather than an impairment to specific syntactic processes.

2 A possible exception is the Wernicke’s aphasic group, which showed equally poor performance on all four sentence types due to a floor effect (chance level was 50%).
A possible explanation for such deficits is that they result from processing sentences under less than optimal conditions; differences in processing specific syntactic structures reflect the degree to which they place demands on systems that subserve sentence comprehension, such as working memory. To test this hypothesis, Dick et al. investigated sentence comprehension in normal controls under degraded listening conditions, to determine whether processing impairments similar to aphasics’ can occur in any listener under stress. The authors tested normal college-aged adults on the same task and stimuli as above, but under a number of different stress conditions: (1) listening to auditory stimuli that were acoustically compressed such that the speech rate was 50% faster than normal; (2) listening to low-pass filtered speech; (3) listening to noise-embedded speech; (4) simultaneously performing an auditory digit memory task in which 6 digits were presented auditorily before the beginning of each sentence comprehension trial, and then deciding whether it was the same as a second series of 6 digits presented at the end of the trial; (5) performing a visual digit memory task, similar to the auditory digit task, but with visually presented digits. A control condition was also used to determine baseline performance.

The results of this experiment are reproduced in Figure 5.4. They indicate that specific stress conditions resulted in clear sentence comprehension difficulties, especially the noise-embedded and compressed speech conditions. In addition, the actual pattern of difficulty closely matched that of aphasics, though to a lesser degree of severity. These results suggest two important facts about syntactic comprehension difficulties.
First, they indicate that some language impairments are the result of degraded inputs to the language processing system, rather than a deficit to core grammar. For instance, normal college-aged listeners could recognize all four sentence types at near-perfect levels of proficiency under some conditions, but performed much more poorly on the same sentences when their auditory forms were distorted or masked.

Figure 5.4: Sentence comprehension profiles for normal college-aged adults, under a variety of degraded listening conditions (Dick et al., 1999). Percent correct is plotted by sentence type (active, subject cleft, object cleft, passive) for the control, auditory digits, visual digits, low-pass filter, compressed and noise conditions. Results revealed a pattern of impairment similar to aphasics under conditions that distorted sentences’ auditory forms.

Second, the pattern of impairment observed by Dick et al. indicates that sentence constructions vary in their susceptibility to a comprehension impairment. Since all four sentence types used in this study had similar underlying forms, differences in subjects’ performance must lie in their ability to process the syntactic structure of these sentences, rather than their meanings per se. It is also not the case that acoustically modifying these sentences distorted them beyond recognition, since listeners were clearly able to comprehend active voice and subject cleft sentences even under the most extreme conditions. Instead, these conditions made processing difficult for sentences that placed a heavier load on working memory, such as object cleft and passive sentences.
These results lend support to the theory that not all sentence structures are equally easy to process, and that syntactic comprehension deficits can result from degraded speech inputs. They further support the theory that working memory for sentence comprehension relies on a speech-specific code, such that degraded auditory information had especially strong effects on sentence comprehension.

5.4 Working Memory Impairments in SLI

Thus far, I have reviewed data suggesting that phonological working memory span influences how normal and aphasic adults process sentences. The literature seems to suggest that sentences differ in the extent to which they rely on working memory; for instance, reduced working memory capacity has a detrimental effect on listeners’ ability to comprehend some sentence types, but not others. In addition, sentence comprehension deficits similar to aphasics’ can be simulated using acoustically degraded speech, suggesting that sentences are represented in working memory using a speech-like code. These facts suggest that syntactic deficits in SLI can be explained as resulting from the influence of phonology on working memory for sentence comprehension. On this account, the source of syntactic deficits in SLI lies in speech perception, which leads to the development of poor phonological representations and makes it more difficult for listeners to maintain adequate representations of sentences in working memory in order to comprehend them.

There is good evidence to support the notion of working memory deficits in SLI. A growing body of research has shown that language impaired children have problems on many tasks that draw on short-term memory (STM; Kirchner & Klatzky, 1985; Gathercole & Baddeley, 1990; Montgomery, 1995; Curtiss & Tallal, 1991).
Gathercole and Baddeley (1990) investigated the hypothesis that children with SLI are specifically impaired in phonological working memory by testing a group of six children with SLI on a nonword repetition task. The use of this task is based on the theory that, as words get longer, the ability to accurately repeat them relies increasingly on a temporary mental storage space. On this account, a phonological working memory deficit should impair the processing of longer words (3- and 4-syllable), relative to shorter ones (1- and 2-syllable). Gathercole and Baddeley (1990) found that children with SLI were significantly worse than two control groups, matched for either chronological age or language achievement, on longer nonwords, but not shorter ones. They interpreted these results as indicative of a phonological working memory capacity limitation.

This result has been replicated in a variety of other studies, indicating that this is a reliable characteristic of SLI (Bishop et al., 1996; Edwards & Lahey, 1998; Kamhi et al., 1988; Montgomery, 1995). To investigate the relationship between these impairments and syntactic processing difficulties in SLI, Montgomery (1995) compared children with SLI to younger normals at the same level of grammatical development on tasks of nonword repetition and sentence comprehension. As in previous studies, Montgomery found effects of word length in nonword repetition such that children with SLI were much worse at repeating 3- and 4-syllable words, compared to shorter words. In addition, this study found significantly worse sentence comprehension in children with SLI, and a strong positive correlation between subjects’ performance on the two tasks (r = .62, p < .001).
Further analyses also revealed that the SLI group had greater difficulty comprehending longer sentences like The big brown furry dog is quickly chasing the little yellow and black cat, compared to shorter, syntactically similar sentences like The big brown dog is chasing the little cat. In contrast, controls showed very little difference on longer and shorter sentences like these.

These results suggest that phonological working memory plays a causal role in sentence comprehension impairments, given that children with SLI have problems with phonological working memory, and given the reliable correlation between these working memory deficits and difficulties in sentence comprehension. The theory that emerges from this is that children with SLI represent an extreme lower end of a continuum of working memory span, resulting in especially poor sentence comprehension abilities, and that the source of this deficit is an impairment in perceiving speech that leads to poorly developed phonological representations.

In the next section I develop a model of sentence comprehension that incorporates phonology, working memory and sentence comprehension. The purpose of this model is to investigate the theory that a deficit in perceiving speech can explain sentence comprehension deficits in children, because of the importance of phonological representations in working memory.

5.5 Simulating Normal Sentence Comprehension

This section describes a connectionist model of sentence recognition that learns to resolve bound pronouns and reflexives. The primary purpose of this model is to develop a simulation of normal sentence comprehension that can also be used to investigate
The model simulates several important aspects of sentence comprehension by learning to map the phonolog ical forms of words in a sentence to their meanings. It implements pronoun resolution by learning to bind the meanings of pronouns to their antecedents. The dynamics of this task are accomplished using an architecture that encodes various characteristics of grammatical sentences within its connection weights. Working memory is simulated by implementing a discrete feedback loop that allows the network to maintain internal representations as it is exposed to successive inputs. This model does not seek to simulate the entire task of syntactic processing, which surely involves a great deal more knowledge and structure than is implemented here. Instead, the model implements some important aspect of sentence comprehension, specifically the ability to recognize words in sequence, and to discover higher-level (syntactic) structure in the sequences that are input to it in order to resolve long distance syntactic dependencies. 5.5.1 Model Architecture and Task The model used in this study is illustrated in Figure 5.5. Each unit in the network used a logistic activation function (range: 0.0 — 1.0). The input layer consisted of 108 units that represented the phonological features of 6 phonemes, fitting a CVCCVC frame. Each phoneme slot consisted of 18 features, identical to those used in Chapter 4. The resulting scheme was able to represent a number of 1- and 2-syllable English words 211 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. with simple onsets and codas.3 For example, the words John and Mary were encoded as [ j a n ] and [ m _ ri_ ]. hidden units cleanup units Phonological Input Semantics Output Figure 5.5: Network used to simulate syntactic processing. Inputs represented of words’ phonological forms; outputs encoded word meanings. 
The network learned syntactic dependencies through exposure to grammatical sentences. The output layer represented the semantics of words using a system of 98 distributed semantic features taken from the WordNet database (Miller, 1990), which represents word meanings using a feature-based scheme. The features that were used were limited to those needed to distinguish all the words in the training set, rather than all the possible features used to describe these words in WordNet, which frequently exceeded 20 features per word. The resulting feature scheme nevertheless retained important characteristics of distributed semantic representations, such as the ability to represent general overlap in word meanings. For example, the word cat was represented by activating the output nodes corresponding to the features [NEUTRAL-GENDER], [ANIMAL] and [FELINE]; dog was represented as [NEUTRAL-GENDER], [ANIMAL] and [CANINE]. Verbs were represented similarly; for example, surmise was represented as [INFER], [SPECULATE], [EXPECT] and [JUDGE]. As such, words with similar grammatical roles tend to resemble each other, a tendency that might represent a useful cue to learning such aspects of grammar as ‘noun’ and ‘verb’ (Seidenberg, 1997). Since the network did not have access to discrete grammatical categories, providing it with this type of information might be useful in learning syntactic principles operating on such categories.

The network learned to recognize sentences by mapping sequences of phonologically encoded words to semantically encoded outputs. The training procedure was as follows: at the start of each training trial, a sentence was chosen at random from the corpus of training sentences, described in §5.5.2.

3 Words that did not fit this frame were truncated, usually by deleting one of the consonants in a cluster.
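As a concrete illustration of the slot-based input and feature-based output coding described above, the following sketch builds a 108-unit phonological vector (6 CVCCVC slots, 18 features each) and a small distributed semantic target. The feature names and phoneme bundles here are hypothetical placeholders, not the inventories actually used in the model; only the dimensionality and the general coding scheme follow the text.

```python
# Illustrative sketch of the coding schemes: 6 phoneme slots x 18 features
# = 108 phonological input units, plus binary semantic output features.
# All feature names and bundles below are made up for demonstration.

PHON_FEATURES = [f"phon_feat_{i}" for i in range(18)]  # 18 features per slot

# Toy feature bundles for a few phonemes; '_' marks an empty slot.
PHONEMES = {
    "j": {"phon_feat_0", "phon_feat_2", "phon_feat_5"},
    "a": {"phon_feat_1", "phon_feat_3"},
    "n": {"phon_feat_0", "phon_feat_4", "phon_feat_6"},
    "_": set(),
}

def phonological_input(slots):
    """Map a 6-slot CVCCVC frame to a 108-unit binary input vector."""
    assert len(slots) == 6
    vec = []
    for ph in slots:
        active = PHONEMES[ph]
        vec.extend(1.0 if f in active else 0.0 for f in PHON_FEATURES)
    return vec

# Semantic outputs: each word activates a few of the feature units, so that
# related words (cat/dog) share components of their target vectors.
SEM_FEATURES = ["NEUTRAL-GENDER", "ANIMAL", "FELINE", "CANINE"]
WORD_SEMANTICS = {
    "cat": {"NEUTRAL-GENDER", "ANIMAL", "FELINE"},
    "dog": {"NEUTRAL-GENDER", "ANIMAL", "CANINE"},
}

def semantic_target(word):
    active = WORD_SEMANTICS[word]
    return [1.0 if f in active else 0.0 for f in SEM_FEATURES]

john = phonological_input(["j", "a", "n", "_", "_", "_"])
print(len(john))  # 108 input units

overlap = sum(a * b for a, b in
              zip(semantic_target("cat"), semantic_target("dog")))
print(overlap)  # 2.0 shared semantic features
```

The dot product of the cat and dog targets is nonzero because both words activate [NEUTRAL-GENDER] and [ANIMAL], which is the sense in which this coding captures overlap in word meanings.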
Each word in the sentence was presented to the network for two time steps, where a single time step is defined as the propagation of activations across a single set of connections. The activation of each word propagated to the output layer, at which point the resulting output pattern was compared to the target pattern for that word. Connection weights were then adjusted using a variation on the backpropagation through time learning algorithm (Williams & Peng, 1990), as described below. The learning rate was set to 0.005; the error radius (the tolerance within which the network calculated error) was 0.1. Each training trial ended when every word in the sentence had propagated through the network.

The input and output layers were connected through a set of 150 hidden units. These were in turn connected to and from 100 cleanup units (Plaut & Shallice, 1993), creating a recurrent architecture in which information could be maintained within the network by propagating activations back and forth between the two layers. The ability to feed back information to the hidden layer from one time step to the next was important because it allowed the network to maintain a representation of previous states. In very broad terms, these can be considered memory traces of previous inputs that can be used in processing the current input. For example, when the network was presented with the sentence Linda took the ball from Barney, the network was able to retain an internal trace of the verb took and use it further on in the sentence to better recognize the preposition from, based on having been exposed to other similar sentences. In the present simulation, this ability is used to simulate working memory in sentence comprehension.
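The trial structure just described (two time steps per word, target comparison after each word, and an error radius of 0.1) can be sketched as follows. The network itself is abstracted into a placeholder `step` function, so this is only the trial bookkeeping, not the backpropagation-through-time weight updates themselves.

```python
# Sketch of one training trial: each word is presented for two time steps,
# the output is compared to the word's semantic target, and per-unit error
# within the 0.1 error radius is treated as zero. 'step' is a hypothetical
# stand-in for one propagation of activation across one set of connections.

LEARNING_RATE = 0.005
ERROR_RADIUS = 0.1
STEPS_PER_WORD = 2

def radius_error(output, target, radius=ERROR_RADIUS):
    """Per-unit error signal; differences inside the radius count as zero."""
    return [0.0 if abs(t - o) <= radius else t - o
            for o, t in zip(output, target)]

def run_trial(sentence, targets, step):
    """Present each word for STEPS_PER_WORD time steps, collecting the
    error signal that would drive the weight updates."""
    errors = []
    for word, target in zip(sentence, targets):
        for _ in range(STEPS_PER_WORD):
            output = step(word)
        errors.append(radius_error(output, target))
    return errors

# Demo with a dummy network that outputs 0.5 on every unit.
errs = run_trial(["John", "runs"], [[1.0, 0.0], [0.0, 1.0]],
                 step=lambda w: [0.5, 0.5])
print(errs)  # [[0.5, -0.5], [-0.5, 0.5]]
```

Note how an output of 0.95 against a target of 1.0 would produce zero error under this scheme, since the 0.05 difference falls inside the tolerance radius; only larger discrepancies contribute to learning.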
The network was trained to identify the antecedents of pronouns and reflexives in sentences by outputting the semantics of an entity when a bound pronoun or reflexive was present on the input. For instance, the word himself in John says Bill appreciates himself was encoded as [MALE] [REFLEXIVE-PRONOUN] [HUMAN] [BILL], whereas in the sentence Bill says John appreciates himself it was encoded as [MALE] [REFLEXIVE-PRONOUN] [HUMAN] [JOHN]. The same encoding was used for non-reflexive pronouns that were bound to a noun in the sentence, as in John says Bill appreciates him.

Unlike simple recurrent networks (SRNs; Elman, 1993), the recurrence in the present simulations was not the result of maintaining an exact copy of the network’s hidden units from previous states. Instead, the network’s state in previous iterations was maintained using distributed connections within the network’s architecture. This
\[ x_i(t) = \sum_j w_{ji}\, y_j(t-1) \tag{5.1} \]

The equation in (5.2) modifies this by adding the notion of a time delay \(\tau_{ji}\), signifying the delay along the connection from unit \(j\) to \(i\):

\[ x_i(t) = \sum_j w_{ji}\, y_j(t-\tau_{ji}) \tag{5.2} \]

This additional term modifies the behavior of the network by allowing the activation of neuron \(i\) at time \(t\) to be influenced not only by the state of neuron \(j\) at time \(t-1\), but also at times \(t-2, \ldots, t-\tau_{ji}\). In essence, delay lines allow nodes in a network to be more directly sensitive to inputs from layers at time steps earlier than \(t-1\). A major limitation of standard backprop through time networks is the fact that activation propagating through the network at time \(n\) can tend to obliterate activation from earlier time steps \(n-1, n-2, \ldots\), which makes it difficult for such networks to encode temporal dependencies spanning many time steps. As a concrete example, it is difficult for a recurrent network to maintain the referent of it in a sentence like Walk to the table and put the book on it. Delay lines help to offset this problem by providing a set of connections within the network that are sensitive to earlier network states, making it more likely that the network can maintain an antecedent over a greater number of time steps. Delay lines were implemented by randomly assigning \(\tau\) values between 1 and 10 to each weight in the Hidden→Cleanup and Cleanup→Hidden connection sets; all other connection weights were assigned a delay value of 1, effectively mimicking standard backprop through time activation propagation.
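The delayed-activation rule just described can be sketched directly. The code below is an illustrative reimplementation under my own assumptions (an explicit activation history and a dense matrix of integer delays), not the simulator's actual training code.

```python
import numpy as np

def delayed_net_input(history, W, tau, t):
    """Net input x_i(t) = sum_j W[i, j] * y_j(t - tau[i, j]), where
    history[s] holds the source layer's activation vector y(s) and
    tau[i, j] is the integer delay on the j -> i connection."""
    n_to, n_from = W.shape
    x = np.zeros(n_to)
    for i in range(n_to):
        for j in range(n_from):
            s = t - tau[i, j]
            if s >= 0:  # delays reaching back before t = 0 contribute nothing
                x[i] += W[i, j] * history[s][j]
    return x

# Two source units, one receiving unit; the second connection has delay 2,
# so at t = 2 it reads the source activation from t = 0.
history = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
W = np.array([[1.0, 1.0]])
tau = np.array([[1, 2]])
x = delayed_net_input(history, W, tau, t=2)  # 3.0 (from t=1) + 2.0 (from t=0)
```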
The use of delay lines within the hidden layer of the network allowed the network to better encode long-distance dependencies in sentences by allowing it to simultaneously maintain representations of several previous states in its 'working memory.' This memory is not an exact copy of previous inputs, but is instead a distributed representation of previous hidden layer states, and thus falls much more into line with the general connectionist assumption that memory is represented as distributed patterns of activation in a network. That is, the network cannot use a specific set of connections to literally access an input at time t - n, since the actual delay connections were limited to those between the Hidden and Cleanup layers.

5.5.2 Training Corpus

A simplified English grammar was created by selecting a set of 23 verbs with various subcategorization structures.4 Sentences were created by algorithmically assigning each of the nouns in Table 5.2 to each of these verbs' thematic roles and adding any necessary prepositions. (For the sake of simplicity, determiners and tense markers were omitted from sentences, though there is nothing in the present methodology that precludes the network from having learned them.) In some cases, verbs could take more than one type of complement (e.g., Bob laughs at Harry, Bob laughs with Harry, Bob laughs). To represent this, the verbs sleep, take, give, throw, run and walk were used in a variety of different subcategorization structures. Sentences with pronouns were created similarly, by adding the pronouns him, her, it, himself, herself and itself as direct and indirect object complements of verbs. The result was a set of 396,191 active voice, declarative simple sentences, some of which contained pronouns.
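A toy version of this cross-product sentence construction might look like the following sketch. The noun and verb lists here are placeholders (the model used the 23 verbs listed in footnote 4 and the nouns in Table 5.2), and excluding sentences whose subject equals their object is my own assumption.

```python
from itertools import product

NOUNS = ["bob", "mary", "joe"]       # placeholder nouns
TRANSITIVES = ["likes", "grabs"]     # placeholder transitive verbs

def simple_transitives():
    """Assign every noun to the subject and object roles of every
    transitive verb, skipping sentences whose subject equals the object."""
    return [f"{s} {v} {o}"
            for s, v, o in product(NOUNS, TRANSITIVES, NOUNS) if s != o]

sentences = simple_transitives()  # 3 subjects x 2 verbs x 2 distinct objects = 12
```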
A training set of complex subordinate sentences was also created by adding a subject noun and psych verb (think, guess, say and surmise) to a simple sentence, as in Bob says Mary put (the) tree on (the) island.5 The result was an additional 1,265,019 sentences in the training set, some of which contained pronouns.

4The following verbs were used: cfj, sing, hit, console, threaten, grab, mock, tickle, slap, bite, approach, touch, hold, sleep, take, give, put, throw, dance, look, run, step, walk.

5Parentheses here indicate determiners that were not included in the training sentences, but are provided for the sake of clarity.

Table 5.2: Nouns used in constructing network training sentences.

Male:    Abe, Bill, Chuck, Tony, Yoshi, Bailey, Pablo, Bob, Joe, Mark, Dan, Mike, David
Female:  Dot, Emma, Fran, Kim, Carrie, Laura, Kate, Laurie, Mary, Karen, Suzanne, Elaine, Alex
Neutral: Goose, Hat, Island, Turkey, Dog, Cat, Cow, Turtle, Tree

5.5.2.1 What the Grammar Encoded

The resulting sentences represented a grammar that captured several important aspects of English syntax by encoding structural dependencies beyond simple linear relationships. In order to properly learn this grammar the network had to acquire generalizations about grammatical classes of words, and facts about the structural organization of words in sentences. For instance, words group naturally into grammatical categories such as nouns and transitive verbs. The result is that members of the same category are interchangeable within a given syntactic construction. So, for example, a learner that is exposed to sentences like Bob likes Harry, Bob likes Mary and Mary likes Bob should also be able to recognize Mary likes Harry as grammatical, in spite of having never seen that specific sentence before. Likewise, the learner should not overgeneralize all members
of a broader category to a context that requires a narrower one; the learner should know that not all verbs take double-objects, and thus sentences like *Tony looked at Yoshi to Emma are ungrammatical. Additionally, the grammar required the network to learn structural relationships between words extending beyond linear dependencies (e.g., what word can or cannot directly follow another). For example, the current grammar encodes the fact that certain verbs require more than one noun phrase or prepositional phrase complement, as in Bob put Mike on (the) cat (vis. *Bob put Mike). These types of dependencies extend beyond bigram rules; they require the learner to encode configural patterns such as "VP: V(ditransitive)+NP+PP." With respect to pronouns and reflexives, I also argue that the sentences in the training set encoded a fairly complex grammar similar to what children know about antecedent-anaphor relationships. It was not sufficient for the network to learn n-gram rules such as 'bind a pronoun to the fifth previous word' for a sentence like (the) dog guesses (the) cow danced with it, because this rule would not hold for sentences like (the) turkey said Ellen mocked it. Instead, the relationships between pronouns and their referents were determined by statistics holding over abstract grammatical structures such as the non-governing subject of a verb phrase. Reflexives also rely on structural (rather than linear or n-gram) dependencies. As such, while the generalization 'bind the first word in a sentence to a reflexive' may hold for some cases (Mike put himself on (the) dog), it was not useful for other sentences (Abe said Mike put himself on (the) dog). A rule such as 'bind a reflexive to the second previous word' was also not sufficient; while this would hold for Dot showed Sally
to herself and Dot showed herself to Sally, it would not for the equally grammatical sentence Dot showed Sally herself or Dot gave (the) turkey to herself. I suggest that a learner that has successfully learned to recognize the sentences in the training set, and that is able to accurately generalize to novel sentences similar to the ones in this set, has arguably learned a grammar that reflects many of the important facts about human grammars in general.

5.5.2.2 Obtaining a Training Set

Training the network on 1.5 million different sentences was not desirable for several reasons. First, it is clear that language learners are not exposed to all possible sentences in a grammar. Instead, children learn to generalize syntactic structures from exposure to a small subset of the possible sentences in their language (Chomsky, 1965; Gold, 1967). Second, hardware limitations made it difficult to create a pattern file containing the semantic and phonological representations of all of these sentences, since the amount of RAM and hard disk space required would have exceeded what was available by several orders of magnitude. Instead, a sample of 40,000 sentences was randomly selected from this larger set, as indicated in Table 5.3.

Table 5.3: Breakdown of sentence types used in network training.

                        Pronoun Status
Sentence Type           no pronouns   pronoun   reflexive
simple                  10,000        5,000     5,000
complex-transitive      5,000         2,500     2,500
complex-ditransitive    5,000         2,500     2,500

Sentences were presented to the network at random, using a probability weight for each sentence. A raw probability estimate was calculated by summing the frequency of each word in the sentence (Francis & Kučera, 1982) and dividing it by the number of words in the sentence. For simple sentences, this value was then multiplied by 10. Finally, the presentation probabilities were obtained using the log of the raw probability.
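The weighting scheme just described can be sketched as follows. The frequency counts here are placeholders standing in for the Francis & Kučera (1982) norms, and the function name is my own.

```python
import math

# Hypothetical frequency counts standing in for the Francis & Kučera norms.
FREQ = {"bob": 30, "likes": 250, "mary": 100}

def presentation_weight(sentence, simple):
    """Raw weight = summed word frequency divided by sentence length;
    simple sentences are boosted tenfold; the final weight is the log."""
    words = sentence.split()
    raw = sum(FREQ[w] for w in words) / len(words)
    if simple:
        raw *= 10.0
    return math.log(raw)

w = presentation_weight("bob likes mary", simple=True)
```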
The purpose of frequency weighting was twofold: first, it increased the network's exposure to higher frequency words, and second, it increased the frequency with which the network was exposed to simpler sentence structures. While it is not obvious that either of these factors is absolutely necessary to normal learning of sentence structure, there is evidence to suggest that networks learn better when exposed to language data with realistic frequency-weighting (Daugherty & Seidenberg, 1992). This scheme might also effectively simulate the idea of 'starting small' in learning syntax (Newport, 1990), and has been shown to be useful for connectionist networks learning syntax (Elman, 1993, but see Rohde & Plaut, 1999).

5.5.3 Training Results - What Did the Network Learn?

The network was trained on the corpus of 40,000 sentences until training error reached asymptote, at about 3 million training trials. The network was evaluated by presenting sentences to it as in training, and allowing it to output the appropriate semantic output for each word. The network's ability to compute the correct outputs for the words in a sentence was assessed using a nearest-neighbor criterion. This was accomplished by computing the Euclidean distance between the network's output and each word in the network's vocabulary; the word with the smallest Euclidean distance was determined to be the winner. Errors were scored when there was a mismatch between any word in the actual sentence and the nearest-neighbor output by the network. For instance, an error would be scored if the network was presented the sentence Harry danced with Karen, and calculated the output semantics best matching the sentence Harry danced with Carrie.
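The nearest-neighbor criterion can be sketched in a few lines. The toy semantic vectors below are placeholders for the model's distributed semantic patterns.

```python
import numpy as np

# Toy semantic vectors; the model's real vectors were distributed patterns.
VOCAB = {
    "karen":  np.array([1.0, 0.0, 1.0]),
    "carrie": np.array([1.0, 1.0, 0.0]),
}

def nearest_word(output):
    """Pick the vocabulary item whose semantic pattern lies at the
    smallest Euclidean distance from the network's output pattern."""
    return min(VOCAB, key=lambda w: np.linalg.norm(output - VOCAB[w]))

best = nearest_word(np.array([0.9, 0.1, 0.8]))  # closest to 'karen'
```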
5.5.3.1 Grammaticality Judgments

The concept of grammaticality is central to the standard approach to linguistics, and is used to describe whether given utterances conform to the hypothesized underlying grammar. The primary way to assess a sentence's grammaticality is through the use of grammaticality judgments (Chomsky, 1961), in which speakers are asked to judge the acceptability of a given utterance. The primary assumption is that language is learned by acquiring a grammar, and that the ability to demonstrate the knowledge of a grammar can be assessed by judging the grammaticality of sentences. The present work diverges from many assumptions of the generative view, including the notion of an underlying symbolic grammar. However, it is clear that speakers are able to judge the grammaticality of a sentence like *the boy fell the chair, and it is thus important to demonstrate similar competence in any model of syntactic knowledge. The methodology that I use in the present model was first proposed by Allen and Seidenberg (1999), who trained a network similar to this one on a variety of English sentence types. Allen & Seidenberg then tested their network's ability to compute the meanings of novel grammatical and ungrammatical sentences by presenting them as phonological forms to the network, and observing the accuracy with which it could produce the correct semantic forms for them. The authors observed important differences in the network's output in the two conditions; when the network was presented with a novel grammatical sentence, it computed the meanings of the words in the sentence to a high degree of accuracy. In contrast, the network performed much more poorly on ungrammatical sentences, computing outputs that did not correspond as closely to the input.
For example, their network produced significantly higher degrees of error for sentences like *he left to my house at noon than for a similar sentence he left my house at noon. Allen & Seidenberg interpreted this result as indicating that the network had learned important generalizations about English word order, and that ungrammatical sentences violated the expectations about sentence structure that the network had developed over the course of training. The present model was tested in a similar way, by presenting it with novel sentences that were either grammatical or ungrammatical. Grammatical sentences were obtained by randomly selecting 20 sentences that were not used in training from the corpus of simple transitive sentences described in §5.5.2. A set of 20 ungrammatical sentences was obtained by modifying the verb or a verb complement in each novel grammatical sentence, so that it was no longer grammatical (e.g., Dot took Joe from Emma was changed to *Dot look Joe from Emma). Both sets of sentences were presented to the fully-trained network. Its ability to compute the meanings of the sentences was assessed by measuring the mean sum-squared error (SSE) of the output across all words in the sentence, as defined in (5.3), where SSE is a measure of the degree to which the network's output \(y\) for unit \(k\), as a function of input \(x\), matches the target output \(t\) for unit \(k\).

\[ E(w) = \tfrac{1}{2} \sum_k \left( y_k(x; w) - t_k \right)^2 \tag{5.3} \]

The mean SSE for the sentences in the grammatical testing set was a very low 0.0644 (s.d. 0.1284), compared to the ungrammatical sentences, which yielded a mean SSE of 0.2346 (s.d. 0.5026). An independent-samples t-test confirmed that this difference was significant (t(19) = 6.78, p < 0.025).
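The quantity in (5.3) can be computed directly; the function name below is my own, and the vectors are illustrative.

```python
import numpy as np

def sse(output, target):
    """Sum-squared error of equation (5.3): E = 1/2 * sum_k (y_k - t_k)**2."""
    return 0.5 * float(np.sum((output - target) ** 2))

e = sse(np.array([1.0, 0.0]), np.array([0.0, 0.0]))  # 0.5
```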
This result seems to suggest that the network was less likely to compute accurate meanings for sentences that did not conform to the grammar that it was exposed to during training. This was investigated more closely by examining network error as it processed each word in a grammatical and an ungrammatical sentence. Figure 5.6 compares how the network recognized the sentences Bob gave Bill to Carrie and *Bob thinks Bill to Carrie, using a Euclidean distance metric to compare the actual and expected network outputs as each word in the sentence was recognized. The region of interest is the point at which the network computed the output for to. In the grammatical sentence, the network produced a very low error across each word; in contrast, the ungrammatical use of the to Carrie prepositional phrase (PP) resulted in a higher error rate, suggesting that the network found the ungrammatical sentence more difficult. Figure 5.7 illustrates this effect for another case of verb-complementizer mismatch. In this case, the sentence pair (The) cat gave (the) turkey to Elaine is contrasted with *(The) cat gave (the) turkey at Elaine. Here again, the network showed higher than expected error at the point of ungrammaticality at, compared to to.

Figure 5.6: Comparison of output error for two sentences (Bob gave/*thinks Bill to Carrie). When the verb thinks is used, the network has more difficulty computing the correct semantics of the ungrammatical preposition to. Error is calculated as Euclidean distance to target.

In the traditional generative account, language acquisition and processing proceed by mapping utterances to an underlying symbolic grammar in order to derive the meanings of words and sentences.
Grammaticality is judged by comparing a given sentence to what is known about the legal combinations of symbols in a grammar. The present results suggest a different view of grammar, in which children learn what is and is not grammatical as a result of learning to understand the meanings of sentences. The grammaticality judgment results indicate that the network learned to use higher-order information about sequences of words in order to better perform this task. For example, it learned that certain verbs allow specific types of noun and prepositional phrases in complementizer positions. An important side-effect of this type of higher-order generalization is that it can easily accommodate novel but grammatical sentences. However, when the network was presented with sequences of words that did not fit these generalizations, the accuracy with which it recognized them was decreased.

Figure 5.7: Comparison of output error for two sentences ((the) cat gave (the) turkey to/*at Elaine). Because at is not a grammatical complementizer to the verb gave, the network has more difficulty computing the correct semantics for it. Error is calculated as Euclidean distance to target.

It is also worth noting that concepts such as noun and verb were not immediately available to the network; no single semantic bit (or group of bits) represented this distinction. Instead, the network has learned that certain words tend to occur in similar syntactic contexts, and has used generalizations about these word groupings to arrive at notions roughly corresponding to Noun, Verb, Pronoun and so on. An important consequence of this is that the network would readily accept the word Bob in the context of Sally likes ___, even if it had never seen constructions like Sally likes Bob, or even likes Bob.
Based on these results, it seems clear that the network learned several important aspects of syntactic behavior: notions of grammatical category, generalization to novel sentences, and the tendency to reject ungrammatical sentences. However, it is important to acknowledge several limitations of this model. First, it did not learn all the syntactic structures that are possible in English. Second, its vocabulary was much more limited than that of even very young children. Third, the feedback that the network received during training was much better than what children receive, since it was assumed that the network had access to the correct meanings of all the words in the sentences that were input to it. Based on these (and possibly other) limitations, it could be argued that the same network would not have been able to acquire similar behaviors for a larger grammar that included many more words and sentence types, or in the presence of degraded feedback about the meanings of the words that make up the sentences to which it was exposed. I would agree that this is an empirical issue that needs to be tested. However, there is reason to expect that the current results will tend to 'scale up' to more realistic conditions. First, other networks have been trained on more complete grammars, and have produced similar results. Allen and Seidenberg (1999) trained a network similar to this one using a broader vocabulary and grammar, and found results very similar to the present ones in terms of the network's ability to judge grammaticality. Harm, Thornton, and MacDonald (2000) have developed a larger scale version of this model which encodes a large number of words (over 8,000) into a set of 20,000 simple and complex English sentences of many types.
Similarly, Rohde (1999) has modeled the comprehension and production of a large variety of sentence types, and has used the resulting model to simulate how the two interact in syntactic acquisition. The goal of these simulations was to address questions pertaining to sentence processing and syntactic acquisition. However, their successes indicate that there is no reason to think that grammars and vocabularies on the order of what children are able to learn would present a special challenge to recurrent connectionist networks. Another possible objection to this architecture stems from the fact that the network was trained under idealized circumstances. However, it is widely acknowledged that connectionist networks are highly resilient to noisy, imperfect and incomplete information (e.g., Rumelhart et al., 1981). Because information is learned and represented as statistical generalizations within a distributed architecture, a network is able to learn from incomplete, stochastic, and even incorrect information. Thus, one could imagine retraining the present network in less than ideal circumstances in which the task were made noisier in various ways. These could include occasionally exposing the network to ungrammatical sentences; presenting it with words for which there is no semantic target; presenting it with words for which there is an incomplete semantic pattern; or presenting it with words for which there is an incorrect semantic pattern. Even under these circumstances, it should be possible for the network to extract the relevant information from what is available to it in order to properly learn the task at hand. This is because connectionist networks do not require ideal training situations to learn important information about a given task.
5.5.4 Pronoun Resolution

To test the network's ability to resolve pronouns, a set of testing sentences was devised that contained bound and reflexive pronouns under two different syntactic conditions. The first set contained 24 transitive sentences like (the) cat threw itself at Bob, and tested the network's ability to determine the referent of a reflexive in a simple sentence. The second set consisted of 48 complex-subordinate clause sentences containing both reflexives and bound pronouns, as in Bob thinks Stan likes him. These sentences tested the network's ability to resolve the meanings of bound pronouns in cases where contextual information like gender could not be used. All sentences in the testing sets were novel, to the extent that the network was not exposed to them during training. However, all sentences contained words from the vocabulary that the network had been trained on. These two sets contained the same general sentence types used in van der Lely and Stollwerck (1997), allowing us to directly compare model performance to that of the children in their study. The network was tested by presenting these sentences to it, and comparing the output of the network to the target output. Outputs were compared on a word-by-word basis using a nearest-neighbor decision method and a Euclidean distance metric. Sentences containing one or more incorrect words were scored as incorrect. The results indicated that the network was performing at a high degree of proficiency on both testing sets, scoring no errors on either. This result indicates that the network is able to correctly resolve the antecedents of bound pronouns in novel sentences at an adult-like level. I will return to these results in the next section, which considers the impact of a phonological deficit on pronoun resolution.
5.6 Simulation 2: Sentence Comprehension With Impaired Phonology

The previous section presented a model of sentence comprehension that used a recurrent connectionist network to acquire a number of principles of English syntax, including pronoun resolution. Children with SLI have been shown to have an impairment in resolving pronouns using only syntactic (and not contextual) information, and I have hypothesized that this impairment could be explained as a consequence of a phonological impairment. The general claim is that impaired phonological processing will result in weaker working memory representations, known to be important to sentence processing. It is hypothesized that subtle distortions in this network's phonological input will have a negative impact on its ability to perform this task, because the network's ability to maintain representations of the words in a sentence over time depends on the quality of these inputs. In contrast, such a deficit should not have a significant impact on abilities that are not working memory-intensive. For instance, it should display relatively normal auditory word recognition and grammaticality judgments.

5.6.1 Inducing a Phonological Deficit

The same network architecture and training set described in the previous section were used in this simulation. A phonological deficit was simulated in the model by adding Gaussian noise to the phonological inputs presented to it during training. This was done by adding random noise to each input unit as it was presented to the network; the noise had a Gaussian distribution with a standard deviation of 0.4. Any resulting activation values that fell outside the range of the logistic activation function were trimmed such that negative values were set to 0.0 and values greater than 1.0 were set to 1.0.
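The noise-injection procedure can be sketched as follows. The function name and the fixed random seed are my own; the simulation itself drew fresh noise on every presentation of a word.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def degrade(phon, sd=0.4):
    """Add zero-mean Gaussian noise (s.d. 0.4) to each phonological input
    unit, trimming values to the logistic range: below 0 -> 0, above 1 -> 1."""
    noisy = phon + rng.normal(0.0, sd, size=phon.shape)
    return np.clip(noisy, 0.0, 1.0)

noisy = degrade(np.array([0.0, 0.5, 1.0]))
```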
The effect of the Gaussian noise was to expose the network to a slightly different phonological form each time a given word was presented. I will argue that this variation was not typically so great as to change the identity of a word's phonological form, but it did make it more difficult for the model to develop consistent phonological representations, since it had to learn word forms that changed from one exposure to the next. In addition, this variability was not systematic in the way that it is in spoken language; the network was not being exposed to systematic variability the way normal listeners are. Much of the variability intrinsic to speech sounds is due to context. For example, vowels tend to be longer before voiced stops (Chen, 1970), stops tend to be less voiced word-finally, and velar stops will tend to be fronted before front vowels. Listeners seem to be aware of this variability, and are able to accommodate it; for instance, they tend to adjust the perceptual boundaries between phoneme categories to compensate for coarticulatory effects (e.g., Mann & Repp, 1981). Connectionist networks are similarly able to learn phonological representations in the presence of contextual effects, by recovering them from incomplete or perturbed inputs (McClelland & Elman, 1986). In contrast, the noise that the present network was exposed to was not context-driven. It was thus more difficult for it to learn to ignore variations in how phonemes and words were realized. Of course, all speech sounds are produced with some naturally occurring variation not due to context. However, the quantity of random Gaussian noise used in training the current network was appreciably greater than what is typically encountered by listeners.
The degree of noise added to each phoneme in a word was great enough to occasionally push it into the domain of another phoneme's feature space; for example, there was sufficient noise to change the [+lateral] feature to [-lateral] approximately 15% of the time, effectively changing an /l/ to an /r/. It seems safe to assume that this is a greater rate of variability than what normal speakers would be expected to produce, or what normal hearers would (mis)perceive. There is reason to believe that children with SLI also tend to misperceive phonemes at this rate. Evidence of perceptual deficits abounds in studies of language impaired children, which find poorer than expected identification and discrimination of speech sounds in tests of categorical perception (Joanisse et al., 2000; Elliott et al., 1989; Tallal & Piercy, 1974; Stark & Heinz, 1996; Sussman, 1993; Thibodeau & Sussman, 1979). The effect of weak categorical perception is an increased tendency to miscategorize speech sounds, resulting in the development of imperfect or incomplete phonological representations of phonemes and words. The present model attempts to simulate this weakness in categorical perception.

5.6.2 Training Results

The phonologically impaired network was trained in the same manner as the previous network using the same training set, for 5 million trials. To assess the network's performance on the training sentences, the network's mean squared error over the course of training is plotted in Figure 5.8, along with that of the intact network. This figure illustrates how the phonologically impaired network had greater difficulty learning the sentences in the training set, as indicated by higher mean squared error over the course of training, a slower descent toward zero, and a higher asymptote.

Figure 5.8: Mean squared training error for the impaired and unimpaired sentence comprehension models, over the course of training. Higher values in the impaired model indicate a greater degree of inaccuracy in producing the correct semantic forms of the sentences in the training set.

Next I investigated the impaired network's ability to recognize novel sentences. This was done by testing the network on the set of 20 simple transitive sentences used for grammaticality judgments in section 5.5.3.1. Comprehension was assessed using a Euclidean distance metric with a nearest-neighbor criterion. The unimpaired network was able to accurately recognize all words in all testing sentences, thereby scoring zero errors on the testing set. In contrast, the impaired network produced two incorrect sentences, resulting in a score of 90% correct. The actual errors that the network produced were interesting. First, the errors indicated that the network was misidentifying words for ones that were phonologically similar (i.e., cat for Kate and give for grab). This result is similar to what is observed in actual children with SLI on word repetition tasks; language impaired children occasionally produce word repetition errors that closely resemble the target form, and these errors tend to be real words, irrespective of whether the target is a familiar word or a nonword (Bishop et al., 1996; Edwards & Lahey, 1998). It is also interesting that the two errors that the network produced reflected a knowledge of grammatical structure in the network. That is, the network's misidentifications involved confusing one noun for another, or one verb for another.
This suggests that the network was able to use contextual information to process sentence meanings, such that even incorrectly identified sentences were syntactically correct.

The very small number of errors produced by the impaired network makes it difficult to determine just how indicative these patterns were of its processing abilities. However, it does indicate that training a network of this kind with moderate amounts of Gaussian noise will yield only small deficits in comprehending typical sentences.

5.6.3 Pronoun Resolution

The impaired model’s performance on sentences containing bound pronouns was of primary interest here. The impaired and unimpaired networks were tested on two sets of sentences containing regular and reflexive pronouns that were not included in the training set. The first set consisted of 48 simple transitive sentences. Half of these sentences contained a bound reflexive in the object or indirect object position ((The) goose danced with itself), and the other half contained an unbound pronoun of the same gender as the verb’s subject (Sally danced with her). The second testing set consisted of 48 complex sentences, half of which contained bound pronouns in a direct or indirect object position (Bill says Yoshi likes him), and half containing bound reflexives in similar positions (Bill says Yoshi likes himself). All testing sentences were taken from the set of sentences not used in training.

The two networks were tested for pronoun errors after 3.5 million training trials, the point at which the unimpaired model had reached asymptote. Pronoun errors were scored when the network misidentified the referent of a specific pronoun in a given sentence. The results indicated that the unimpaired model was performing at near-perfect levels for both sentence conditions, and for both the reflexive and bound pronoun conditions (Figure 5.9).
Figure 5.9: Comparison of the phonologically impaired and unimpaired sentence comprehension networks, on novel sentences. Results are broken down by sentence type (simple and complex-embedded) and pronoun type (reflexives and bound pronouns).

In comparison, the speech-impaired network produced a higher level of errors overall, and was also worse in both complex-embedded conditions. The quality of these results is similar to that of children with SLI in two major respects. First, the model performed above chance levels on all conditions, suggesting that it had learned important generalizations about pronoun reference in English sentences. Second, the speech-impaired network found pronoun reference more difficult for longer sentences, compared to shorter ones.

One question is whether the speech-impaired model was simply showing a developmental delay relative to the unimpaired model. A major claim about SLI is that it does not represent a delay in normal development. Instead, it is an aberrant developmental profile in which normal levels of grammatical performance are never attained. To investigate whether this was the case here, the two networks were tested for pronoun errors on both sets of testing sentences at 100,000-trial intervals, over the course of training. Results are illustrated in Figure 5.10, and reiterate the finding that the impaired network did have more difficulty with pronouns compared to the intact network, and that this difficulty was greater in the case of pronouns in complex sentences.
In addition, these graphs indicate that the speech-impaired network never reached perfect, or near-perfect, performance, and that it settled into a pattern of producing errors for 5-10% of bound pronouns in simple sentences, and 30% of bound pronouns in complex sentences.

Figure 5.10: Comparison of the phonologically impaired and unimpaired sentence comprehension networks on 20 simple (top) and complex (bottom) novel sentences.

These results were confirmed in a two-way ANOVA performed for the effects of model type (impaired, unimpaired) and sentence type (simple, complex), with training iteration as the random variable. The dependent variable was the proportion of correct sentences for the model at a given time in training, for a specific sentence type. Main effects were found for model type (F = 99.923, p < 0.001) and sentence type (F = 553.298, p < 0.001). In addition, a significant Model × Sentence type interaction was observed (F = 9.179, p < 0.005), which indicated that the impaired network was producing a greater proportion of errors for the complex sentences over the course of training.

5.6.4 Discussion

In this section I investigated how a network trained on noisy phonological word forms learned to comprehend sentences. Several important patterns emerged from testing this network on novel sentences. First, it is clear that the network was not so severely impaired that it could not perform the basic task of sentence comprehension.
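For readers unfamiliar with the two-way ANOVA reported above, the computation for a balanced two-factor design with an interaction term can be sketched directly in NumPy. This is a generic illustration with synthetic data, not the dissertation's analysis script; the cell values are invented and chosen only so that the impaired model is worse overall and disproportionately worse on complex sentences.

```python
import numpy as np

def two_way_anova(cells):
    """F statistics for a balanced two-factor design with interaction.
    cells[i][j] holds the scores for level i of factor A (model:
    intact/impaired) and level j of factor B (sentence type:
    simple/complex). Returns (F_A, F_B, F_AxB)."""
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    data = np.array(cells, dtype=float)          # shape (a, b, n)
    grand = data.mean()
    mA = data.mean(axis=(1, 2))                  # factor A level means
    mB = data.mean(axis=(0, 2))                  # factor B level means
    mAB = data.mean(axis=2)                      # cell means
    ss_a = b * n * ((mA - grand) ** 2).sum()
    ss_b = a * n * ((mB - grand) ** 2).sum()
    ss_ab = n * ((mAB - mA[:, None] - mB[None, :] + grand) ** 2).sum()
    ss_e = ((data - mAB[:, :, None]) ** 2).sum()
    ms_e = ss_e / (a * b * (n - 1))              # error mean square
    return (ss_a / (a - 1) / ms_e,
            ss_b / (b - 1) / ms_e,
            ss_ab / ((a - 1) * (b - 1)) / ms_e)

# Synthetic proportion-correct scores (10 training checkpoints per cell).
rng = np.random.default_rng(0)
intact = [0.98 + rng.normal(0, 0.01, 10), 0.95 + rng.normal(0, 0.01, 10)]
impaired = [0.90 + rng.normal(0, 0.01, 10), 0.70 + rng.normal(0, 0.01, 10)]
f_model, f_stype, f_inter = two_way_anova([intact, impaired])
```

With these synthetic numbers, all three F values come out large, mirroring the qualitative pattern reported in the text: both main effects and the Model × Sentence type interaction are reliable.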
The network generally produced correct outputs for novel sentences, and in the few cases where it did misidentify one word for another, those words were of the correct grammatical class and shared similar phonology with the target forms. This result indicates that a deficit in perceiving words’ phonological forms had only a minimal impact on recognizing them, and that listeners can frequently use top-down information such as sentence context to recover from noisy phonological inputs.

However, this same impairment had a significant impact on comprehending more abstract aspects of sentences, such as pronoun reference. Thus, the phonologically impaired network showed a significant deficit in producing a pronoun’s referent (e.g., knowing when him refers to Bill, and not Harry), especially as sentence length increased. The explanation for this type of deficit is that pronoun resolution is a dynamic task that relies on the retention of previous words in a sentence in working memory. The network used the recurrent connections between the hidden and cleanup layers to encode these previously-presented words. However, the task of maintaining a working memory representation of previously-presented words is difficult, because of how connectionist architectures represent knowledge using neural activations and connection weights. Rather than having explicit access to the input representations of previously-presented words, the network’s ‘working memory’ encoded intermediate results in an abstract and distributed system. An important consequence of this was a certain degree of fragility in the network’s working memory representations. The addition of noise to the network’s phonological inputs weakened its ability to maintain accurate working memory representations of words over time, due to the nature of how the network implemented working memory.
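The kind of recurrent dynamics described here, in which the hidden state serves as a distributed working memory and Gaussian input noise degrades it over time, can be sketched with a minimal Elman-style recurrent network. This is an illustrative toy, not the dissertation's implementation: the weights are random and untrained, the layer sizes are arbitrary, and the sketch only shows how noise on the "phonological" input perturbs the hidden state that later processing must rely on.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Minimal Elman-style recurrent network: the previous hidden
    state feeds back as extra input, acting as a distributed
    'working memory' of earlier words in the sequence."""
    def __init__(self, n_in, n_hid, n_out):
        self.W_in = rng.normal(0, 0.5, (n_hid, n_in))
        self.W_rec = rng.normal(0, 0.5, (n_hid, n_hid))
        self.W_out = rng.normal(0, 0.5, (n_out, n_hid))
        self.h = np.zeros(n_hid)

    def step(self, x, input_noise_sd=0.0):
        # Gaussian noise on the phonological input simulates the
        # perceptual deficit; with sd=0 the input is intact.
        if input_noise_sd > 0:
            x = x + rng.normal(0, input_noise_sd, x.shape)
        self.h = sigmoid(self.W_in @ x + self.W_rec @ self.h)
        return sigmoid(self.W_out @ self.h)

# Feed the same 'sentence' (a sequence of feature vectors) to an
# intact copy and a noisy copy of the same network, then compare
# their final hidden states: the input noise drives the working
# memory state away from the intact trajectory.
sentence = [rng.normal(0, 1, 8) for _ in range(5)]
intact, noisy = SRN(8, 12, 6), SRN(8, 12, 6)
noisy.W_in, noisy.W_rec, noisy.W_out = intact.W_in, intact.W_rec, intact.W_out
for x in sentence:
    intact.step(x)
    noisy.step(x, input_noise_sd=0.5)
drift = np.linalg.norm(intact.h - noisy.h)
```

Because the hidden state is the only record of earlier words, any drift it accumulates cannot be corrected by re-reading the input, which is one way to picture why noisy phonological inputs would hurt pronoun resolution in long sentences more than word identification itself.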
5.7 Conclusions

This chapter investigated syntactic comprehension impairments in language impaired children. Research has found deficits in pronoun resolution in these children, which have been taken to indicate that a module of linguistic competence is either impaired or missing in these children (van der Lely, Rosen, & McClelland, 1999). This in turn has lent support to the notion that an innate linguistic mechanism is used to learn and process language, and that normal language acquisition and processing is impossible in its absence (Chomsky, 1965; Pinker, 1989).

The competing view of grammar suggests the task of language acquisition is not one of grammar identification, but that grammatical structure is acquired in the process of learning to comprehend utterances. An important aspect of this task is its dynamical nature, whereby syntactic comprehension involves the maintenance of a sentence’s entire form in order to properly interpret its meaning. For example, small differences in the sentences in (5) signal a crucial difference in meaning.

(5) The man was bitten by the dog
    The man was biting the dog

The acquisition of syntactic structure was modeled using a connectionist network that encoded grammar as statistical regularities that it learned as it performed a sentence recognition task. This model dealt with only one aspect of syntactic structure, specifically how the interpretation of pronouns depends on syntactic context. However, the fact that the model was able to acquire these principles leaves open the possibility that other syntactic dependencies are also learnable through similar means. For example, recent work in this domain suggests that such a model can learn broader, more complex aspects of syntax (Rohde, 1999; Harm et al., 2000).
The results presented in this chapter suggest that processing syntactic information depends on apparently non-syntactic factors, such as the impact of phonological input on working memory capacity. Work by Daneman and Carpenter (1983), King and Just (1991), MacDonald and Christiansen (in press) and others suggests that variability in normal listeners’ putative working memory span can predict their ability to process more complex syntactic structures. For instance, normal adults with a higher working memory span tend to perform better on sentence comprehension than those with lower spans. This work extends this notion to children with specific language impairments, suggesting that these children are at the lower extreme of this continuum. As a result, they show a particularly weak sentence comprehension profile.

In a broader sense, these results are important because they indicate that phonology plays an important role in the acquisition and use of syntactic structure. Traditional generative models of grammar have suggested that modules of grammar like syntax and phonology represent distinct processing capacities that do not tend to influence one another. The present work suggests a different model of grammar and processing in which aspects of language interact in complex ways. In the case of children with SLI, this interaction involves the influence of phonological representations on sentence comprehension, by virtue of the influence of speech processing on working memory and sentence recognition.

Chapter 6

Conclusion: Many Networks, One Model

This dissertation presents a connectionist approach to language that seeks to capture a broad range of phonological phenomena using a simple model that maps sound and meaning.
This model is illustrated in Figure 6.1, and represents an abstract characterization of what a fairly complete implementation would look like. The relationship of this model to the theory of Connectionist Phonology is straightforward; the model is the theory. It embodies all the basic PDP assumptions laid out in the Introduction, and includes characterizations of all the types of cognitive capacities that are proposed to penetrate phonological knowledge, including auditory perception, articulatory planning and control, orthography, semantic memory, and pragmatics. A central claim of this theory is that various types of linguistic knowledge derive from how these abilities interact within the mind/brain.

Figure 6.1: The Connectionist Phonology model.

These include the types of phenomena that are more commonly instantiated in formal theories of phonology: crosslinguistic trends in phoneme inventories and syllable structure, abstract segmental and suprasegmental representations in languages, and patterns of language acquisition. However, these also include phenomena usually treated in the psycholinguistic literature, most notably data from language disorders. A main point of this dissertation is that there is no qualitative distinction between the importance of these data to a general theory of language and the mind. They are all important to understanding language at the formal and neural levels of analysis.

Throughout this dissertation I have developed networks that represent discrete subcomponents of this larger model. Various simplifying assumptions have been made in each case in order to better focus on the issues at hand while abstracting away from other elements that appear to be less crucial.
The use of simplified models allows me to address specific issues in phonology within a narrower context, while setting aside factors that are less directly involved in the processes being simulated. This use of simplifying assumptions is part and parcel of any modeling account. By definition, all models of cognitive processes involve reducing a problem to a more tractable size; otherwise they would not be models. It is simply assumed that these simplifications are not changing the character of the results in a way that makes them less valid to our understanding of the more complete system.

In Chapter 1, I developed models of speech perception that learned to recognize speech sounds from their spectral information (Figure 6.2). This task is nontrivial, given the extent of within- and between-speaker variation in speech. One interesting characteristic of this is that the extent of this variability differs from one speech sound to the next. As such, some vowels are more likely to be produced with greater variability (§1.1). Likewise, this tendency varies with syllabic context, such that place of articulation in stop consonants is more difficult to identify in coda position than in onset position (§1.2). Savings in learning and increased generalization in these models reflected crosslinguistic tendencies, supporting the assertion that general characteristics of how speech is produced and perceived can influence learning and processing of phonological systems.

Figure 6.2: Perception models.

Chapter 1 also developed a simple model of how speech production is planned (Figure 6.3).
The purpose of this model was to explore how learning temporal tasks could contribute to the ‘phonologization’ or ‘grammaticization’ of phonetic principles into more categorical behaviors, due to the influence of statistical preferences that exist in the language input available to the language learner. A model of speech production was developed that learned to produce a variety of English syllable shapes, segment-by-segment. Statistical regularities in English influenced how well the model was able to perform this task, as indicated by slower learning on more marked sequences like VC and VCC, compared to the more typical CV and CVC shapes. This was in spite of the fact that the network’s representation of onset and coda consonants was identical. What differed between them was the frequency with which it was exposed to specific sequences of phonemes. Here again, a small implementation of a linguistic task was informative in understanding how basic computational characteristics of neural systems influence phonological systems.

Figure 6.3: Production models.

Chapter 2 used an amplified version of this production model to explore stage-like errors in Dutch stress acquisition (Figure 6.3). This model was used to investigate learning the task of producing words’ phonological forms, including stress. Over the course of training, the model produced different types of errors that were consistent with what is observed in children learning a first language. These errors seem to reflect how the model is learning the complex principles that govern stress assignment, but derive in particular from the temporal nature of the task it had to learn.
The models used in the second part of this work integrated not only how speech information is perceived and produced, but also how information about meaning interacts with it. One interesting possibility is that morphology is the result of this interaction between sound and meaning (Gonnerman, 1998). This was explored in Chapter 4, with respect to how deficits in morphological processing could derive from a more basic impairment in perceiving and processing speech. The model in Figure 6.4 acquired English verb morphology by mapping verbs’ meanings and sounds in various ways, in order to simulate production, repetition, comprehension and the transformation of present tense verbs into past tenses. When a speech processing deficit was simulated in the same model, a morphological impairment was observed that specifically affected generalization to novel forms.

Figure 6.4: Model of morphology.

A similar architecture was used to address a completely different type of phenomenon in Chapter 5 (Figure 6.5). A model of sentence processing was obtained by adding a type of working memory that allowed the network to maintain abstract representations of its previous states over multiple iterations of activation propagation. As a result, the network was able to map meaning to sound not only for individual words, but for sequences of words. Some aspects of normal sentence comprehension were simulated in this network, including the development of thematic roles, and anaphora resolution.

Figure 6.5: Model of sentence comprehension.

The importance of phonological representations to this model was illustrated by inducing a phonological deficit in it. The resulting model still performed adequately on sentence comprehension, as it did not tend to misidentify individual words in sentences.
However, its ability to maintain working memory representations of sentences for the purpose of resolving bound pronouns and reflexives was impaired.

Results from the impaired and intact networks speak to theories of language modularity, suggesting that phonological and syntactic information are not separately encoded in the mind/brain. They are instead intertwined within a single model of language processing. The speech-impaired network illustrates one way in which this plays out. Difficulties in accurately perceiving speech translate into subtle syntactic comprehension deficits because of the role that phonological representations play in maintaining sentences in working memory.

6.1 Conclusion

A primary goal of any formal theory of language is to understand linguistic behaviors as the consequence of some basic set of broadly-applying principles. In that sense, the Connectionist Phonology framework is no different from theories that have come before it. What sets it apart is how it applies model-based Connectionism to a set of linguistic phenomena in order to arrive at a better understanding of how the principles of Parallel Distributed Processing determine language behaviors.

The range of data considered in this work is intentionally broad, because a basic claim of this approach is that it is relevant to many types of data. A consequence of this is that it cannot be thought of as a complete treatment of all these areas of inquiry. Clearly, there are many types of phonological phenomena that remain to be worked out in this framework. In this work I have laid out the general mechanisms by which this framework can be used to address this larger set of problems: the use of model-based Connectionism to simulate how linguistic knowledge is acquired incidentally in the course of learning the mappings between sound, articulation and meaning.
References

Albright, A. (1998). Phonological subregularities in productive and unproductive inflectional classes: Evidence from Italian. Unpublished master’s thesis, UCLA, Los Angeles, CA.

Albright, A., & Hayes, B. (1999). Burnt & splang: Some issues in morphological learning theory. Talk presented to USC Language and Cognitive Neuroscience Group.

Allen, J. (1997). Probabilistic constraints in acquisition. In A. Sorace, C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA ’97 conference on Language Acquisition (p. 300-305). Edinburgh, Scotland: University of Edinburgh.

Allen, J., & Seidenberg, M. S. (1999). The emergence of grammaticality in connectionist networks. In B. MacWhinney (Ed.), The emergence of language (p. 115-152). Mahwah, NJ: Lawrence Erlbaum Associates.

Archangeli, D., & Pulleyblank, D. (1994). Grounded phonology. MIT Press.

Aronoff, M. (1976). Word formation in generative grammar. Cambridge, MA: MIT Press.

Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX lexical database (CD-ROM) (Tech. Rep.). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.

Baddeley, A. (1986). Working memory and comprehension. In D. Broadbent, J. McGaugh, E. Tulving, & L. Weiskrantz (Eds.), Working memory (p. 54-108). New York: Oxford University Press.

Bauer, L. (1983). English word-formation. Cambridge, UK: Cambridge University Press.

Beckman, M. E., Jung, T.-P., Lee, S., de Jong, K., Krishnamurthy, A. K., Ahalt, S. C., & Cohen, K. B. (1995). Variability in the production of quantal vowels revisited. Journal of the Acoustical Society of America, 97, 471-490.

Bellugi, U., Lichtenberger, L., Mills, D., Galaburda, A., & Korenberg, J. R. (1999).
Bridging cognition, the brain and molecular genetics: Evidence from Williams syndrome. Trends in Neurosciences, 22(5), 197-207.

Benton, A. (1964). Developmental aphasia and brain damage. Cortex, 1, 40-52.

Benua, L. (1997). Transderivational identity: Phonological relations between words. Unpublished doctoral dissertation, University of Massachusetts.

Bernstein, L. E., & Stark, R. E. (1985). Speech perception development in language-impaired children: A 4-year follow-up study. Journal of Speech and Hearing Disorders, 50, 21-30.

Berwick, R. C. (1997). Syntax facit saltum: Computation and the genotype and phenotype of language. Journal of Neurolinguistics, 10(2/3), 231-249.

Bird, J., & Bishop, D. V. M. (1992). Perception and awareness of phonemes in phonologically impaired children. European Journal of Disorders of Communication, 27, 289-311.

Bishop, D. V. M. (1989). Test for reception of grammar (2nd ed.). University of Manchester: Age and Cognitive Performance Center.

Bishop, D. V. M. (1997). Listening out for subtle deficits. Nature, 387, 129-130.

Bishop, D. V. M., North, T., & Donlan, C. (1996). Nonword repetition as a behavioral marker for inherited language impairment: Evidence from a twin study. Journal of Child Psychology and Psychiatry, 37, 391-403.

Blevins, J. (in press). The independent nature of phonotactic constraints: An alternative to syllable-based approaches. In C. Fery & R. van der Vijver (Eds.), The syllable in optimality theory. Cambridge University Press.

Boe, L.-J., Schwartz, J.-L., & Valee, N. (1994). The prediction of vowel systems: Perceptual contrast and stability. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition (pp. 185-213). John Wiley & Sons.

Boersma, P. (1998a). Functional phonology: Formalizing the interactions between articulatory and perceptual drives. The Hague: Holland Academic Graphics.

Boersma, P. (1998b, June). Typology and acquisition in functional and arbitrary phonology.
(Presented at Utrecht Phonology Workshop)

Boersma, P., & Hayes, B. (1999). A ranking algorithm for free variation. Talk presented at the 73rd Annual Meeting of the Linguistic Society of America.

Booij, G. E. (1995). The phonology of Dutch. Oxford, UK: Oxford University Press.

Borer, H. (1998). The morphology-syntax interface. In A. Spencer & A. Zwicky (Eds.), Morphology. Blackwell.

Boysson-Bardies, B. de, Vihman, M. M., Rough-Hellichius, L., Durand, C., Landberg, I., & Arao, F. (1992). Material evidence of infant selection from target language: A crosslinguistic study. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research, implications (p. 369-391). Timonium, MD: York Press.

Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3-4), 155-180.

Brown, R. (1973). A first language: The early stages. London: George Allen and Unwin.

Bybee, J., & Slobin, D. (1982). Rules and schemas in the development and use of the English past. Language, 58, 265-289.

Byrd, D. (1994). Articulatory timing in English consonant sequences. Unpublished doctoral dissertation, UCLA.

Carpenter, A. F., Georgopoulos, A. P., & Pellizzer, G. (1999). Motor cortical encoding of serial order in a context-recall task. Science, 283, 1752-1757.

Chen, M. (1970). Vowel length as a function of the voicing of the consonant environment. Phonetica, 22, 129-159.

Chien, Y.-C., & Wexler, K. (1990). Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition, 1, 225-295.

Chomsky, N. (1961). Some methodological remarks on generative grammar. Word, 17, 219-239.

Chomsky, N. (1964). Syntactic structures. The Hague: Mouton.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1981). Lectures on government and binding.
Dordrecht: Foris.

Chomsky, N. (1986). Knowledge of language: Its nature, origin and use. New York: Praeger.

Chomsky, N., & Halle, M. (1968). The sound pattern of English. MIT Press.

Clahsen, H. (1987). The grammatical characterization of developmental dysphasia. Linguistics, 27, 897-920.

Clahsen, H., & Almazan, M. (1998). Syntax and morphology in Williams syndrome. Cognition, 68, 167-198.

Clark, R. (1992). The selection of syntactic knowledge. Language Acquisition, 2, 83-149.

Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225-252.

Clements, G. N., & Hume, E. V. (1996). The internal organization of speech sounds. In J. Goldsmith (Ed.), The handbook of phonology. Blackwell.

Clements, G. N., & Sezer, E. (1982). Vowel and consonant disharmony in Turkish. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations, part II (p. 213-255). Dordrecht: Foris.

Curtin, S. (1999). Size restrictors in the acquisition of prosodic structure. Tucson, AZ. (To appear in Proceedings of WCCFL 18, University of Arizona)

Curtin, S. (2000). Explaining overlapping stages in prosodic development. Chicago, IL. (Talk delivered at the 74th Annual Meeting of the Linguistic Society of America)

Curtiss, S., & Tallal, P. (1991). On the nature of the impairment in language-impaired children. In J. Miller (Ed.), Research on child language disorders. PRO-ED.

Dalalakis, J. (1994). Developmental language impairment in Greek. In McGill working papers in linguistics (Vol. 10, p. 216-227). Montreal, Canada: McGill University Department of Linguistics.

Daelemans, W., Gillis, S., & Durieux, G. (1994). The acquisition of stress: A data-oriented approach. Computational Linguistics, 20(3), 420-451.

Daneman, M., & Carpenter, P. A. (1983). Individual differences in integrating information between and within sentences.
Journal of Experimental Psychology: Learning, Memory and Cognition, 9, 561-584.

Daneman, M., & Tardif, T. (1987). Working memory and reading skill re-examined. In M. Coltheart (Ed.), Attention and performance XII (p. 491-508). Lawrence Erlbaum Associates.

Daugherty, K., & Seidenberg, M. S. (1992). Rules or connections? The past tense revisited. In Proceedings of the fourteenth annual conference of the Cognitive Science Society (p. 259-264). Hillsdale, NJ: Lawrence Erlbaum Associates.

Davis, M. H., Marslen-Wilson, W. D., & Gaskell, M. G. (1997). Ambiguity and competition in lexical segmentation. In Proceedings of the nineteenth annual conference of the Cognitive Science Society (Vol. 19, p. 167-172). Mahwah, NJ: Lawrence Erlbaum.

Davis, S., & Napoli, D. J. (1994). A prosodic template in historical change: The passage of Latin second conjugation into Romance. Torino: Rosenberg & Sellier.

de Jong, D. (1989). The syntax-phonology interface and variable data: The case of French liaison. In K. Hall, M. Meacham, & R. Shapiro (Eds.), Proceedings of the fifteenth annual meeting of the Berkeley Linguistics Society (p. 37-47). Berkeley Linguistics Society.

Denes, P. (1955). Effect of duration on the perception of voicing. Journal of the Acoustical Society of America, 27, 761-764.

Dick, F., Bates, E., Wulfeck, B., Utman, J., & Dronkers, N. (1999). Language deficits, localization, and grammar: Evidence for a distributed model of language breakdown in aphasics and normals (Tech. Rep.). La Jolla, CA: UCSD Center for Research in Language.

Dresher, B. E. (1999). Charting the learning path: Cues to parameter setting. Linguistic Inquiry, 30(1), 27-67.

Dresher, B. E., & Kaye, J. D. (1990). A computational learning model for metrical theory. Cognition, 34, 137-195.

Dromi, E., Leonard, L., & Shteiman, M. (1993).
The grammatical morphology of Hebrew-speaking children with specific language impairment: Some competing hypotheses. Journal of Speech and Hearing Research, 36, 760-771.

Edwards, J., & Lahey, M. (1998). Nonword repetitions in children with specific language impairment: Exploration of some explanations for their inaccuracies. Applied Psycholinguistics, 19, 279-309.

Elliott, L. L., Hammer, M. A., & Scholl, M. E. (1989). Fine-grained auditory discrimination in normal children and children with language-learning problems. Journal of Speech and Hearing Disorders, 54, 112-119.

Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71-99.

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.

Ewing, A. W. G. (1930). Aphasia in children. Oxford University Press.

Fikkert, P. (1994). On the acquisition of prosodic structure. HIL dissertations.

Flemming, E. (1995). Auditory representations in phonology. Unpublished doctoral dissertation, UCLA, Los Angeles, CA.

Flemming, E. (1997). Phonetic optimization: Compromise in speech production. In V. Miglio & B. Moren (Eds.), Proceedings of the Hopkins Optimality Theory Workshop. Baltimore, MD.

Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Special issue: Connectionism and symbol systems. Cognition, 28, 3-71.

Fodor, J. D., Bever, T., & Garrett, M. F. (1974). The psychology of language. New York: McGraw-Hill.

Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage. Boston: Houghton-Mifflin.

Frisch, S. (1996). Similarity and frequency in phonology.
Unpublished doctoral dissertation, Northwestern University, Evanston, IL. Fukuda, S., & Fukuda, S. (1994). Developmental language impairment in Japanese: A linguistic investigation. In J. Matthews (Ed.), McGill working papers in lin guistics (Vol. 10). McGill University Department of Linguistics. Gathercole, S. E., & Baddeley, A. D. (1990). Phonological memory deficits in lan guage disordered children: Is there a causal connection? Journal o f Memory and Language, 29, 336-360. Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity ver sus natural partitioning. In S. A. Kuczaj (Ed.), Language development (Vol. 2: Language Thought and Culture, p. 301-334). Lawrence Erlbaum Associates. Gleitman, L. (1990). The structural sources of verb meanings. Language Acquisition, 1, 3-55. Gleitman, L. R., Newport, E. L., & Gleitman, H. (1984). The current status of the motherese hypothesis. Journal o f Child Language, 11, 43-79. Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447-74. 254 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Goldsmith, J. (1990). Autosegmental and metrical phonology. Cambridge MA: Basil Blackwell. Gonnerman, L. M. (1998). Morphology and the lexicon: Exploring the semantics- phonology interface. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA. Gopnik, M. (1990). Feature-blind grammar and dysphasia. Nature, 344, 715. Gopnik, M. (1997). Language deficits and genetic factors. Trends in Cognitive Sci ences, 1(1), 5-9. Gopnik, M L , & Crago, M. (1991). Familial aggregation of a developmental language disorder. Cognition, 39, 1-50. Gopnik, M., & Goad, H. (1997). What underlies inflectional errors in SLI? Journal o f Neurolinguistics, 10, 129-137. Grodzinsky, Y . (1990). Theoretical perspectives on language deficits. Cambridge MA: MIT Press. Grodzinsky, Y . (1995). 
A restrictive theory of agrammatic comprehension. Brain and Language, 50, 27-51. Guenther, F. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, 594- 621. Gussenhoven, C. (2000). Vowel duration, syllable quantity and stress in Dutch (Rut gers Optimality Archive No. ROA-381-02100). University of Nijmegen. Halle, M., & Marantz, A. (1993). Distributed Morphology and the pieces of inflection. In K. Hale & S. J. Keyser (Eds.), The view from building 20 (p. 111-176). MIT Press. Halle, M., & Mohanan, K. P. (1985). Segmental phonology of Modem English. Linguistic Inquiry, 16(\), 57-116. Halle, M., & Vergnaud, J. R. (1987). An essay on stress. Cambridge, MA: MIT Press. Hammond, M. (1990). Parameters of metrical theory and leamability. In I. Roca (Ed.), Logical issues in language acquisition (p. 47-62). Dordrecht: Foris. Hammond, M. (1999). The phonology o f english. Oxford UK: Oxford University Press. 255 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading, and dyslexia: Insights from connectionist models. Psychological Review, 163(3), 491-528. Harm, M. W., Thornton, R„ & MacDonald, M. C. (2000). A distributed, large scale connectionist model of the interaction o f lexical and semantic constraints in syn tactic ambiguity resolution. Poster presented at the 13th Annual CUNY Confer ence on Human Sentence Processing, La Jolla, CA. Hayes, B. (1991). A metrical theory o f stress. Chicago IL: Chicago University Press. Hayes, B. (1997). Phonetically driven phonology: The role of Optimality Theory and inductive grounding. In Proceedings o f the 1996 Milwaukee Conference on Formalism and Functionalism in Linguistics. Milwaukee WI. Hinton, G. E. (1989). Connectionist learning procedures. Artificial Intelligence, 40, 185-234. Hinton, G. E., & Shallice, T. (1991). 
Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 95(1), 74-95. Hoeffner, J. J. (1996). Are rules a thing o f the past? A single mechanism account o f English past tense acquisition and processing. Unpublished doctoral disserta tion, Carnegie Mellon University, Pittsburgh, PA. Hoeffner, J. J., & McClelland, J. L. (1993). Can a perceptual processing deficit explain the impairment of inflectional morphology in development dysphasia? A com putational investigation. In E. Clark (Ed.), Proceedings the 25th annual Child Language Research Forum (Vol. 25, p. 38-49). Stanford CA: Center for the Study of Language and Information. Hulst, H. van der. (1984). Syllable structure and Dutch. Dordrecht: Foris. Inkelas, S., Orgun, C., &Zoll, C. (1996). Exceptions and static phonological patterns: cophonologies vs. prespecification. (Unpublished Manuscript: U.C. Berkeley and University of Iowa) Inkelas, S., & Zee., D. (1995). Syntax-phonology interface. In J. Goldsmith (Ed.), The handbook o f phonological theory. Oxford: Blackwell. Ito, J., & Mester, A. (1986). The phonology of voicing in Japanese. Linguistic Inquiry, 77,49-73. Ito, J., & Mester, A. (1995). Japanese phonology. In J. Goldsmith (Ed.), The handbook o f phonological theory. Cambridge MA: Blackwell. 256 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Ito, J., & Mester, A. (1997). Sympathy theory and german truncations. In V. Miglio & B. Moren (Eds.), Selected papers from h-ot-97 (p. 117-138). University of Maryland Working Papers in Linguistics. Ito, J., & Mester, A. (1998). Markedness and word structure: OCP effects in Japanese (Tech. Rep. No. ROA: 255-0498). UC Santa Cruz. Jakobson, R. (1941/1962). Kindersprache, aphasie und allgemeine lautgesetze. In Selected writings I (p. 328-401). The Hague: Mouton. Joanisse, M. E (1999). Exploring syllable structure with connectionist networks. 
In Proceedings o f the XlVth International Congress of Phonetic Sciences. San Francisco CA. Joanisse, M. F , & Curtin, S. (1999). OT and connectionist approaches to Dutch stress acquisition. In Proceedings o f the 5th annual southwest optimality theory conference. UC San Diego: Linguistic Notes from La Jolla. Joanisse, M. F., Manis, F., Keating, P., & Seidenberg, M. (2000). Language deficits in dyslexic children: Speech perception, phonology and morphology. Journal of Experimental Child Psychology, 77( 1), 30-60. Joanisse, M. F., & Seidenberg, M. S. (1997). [i e a u] and sometimes [o]: Perceptual and computational constraints on vowel inventories. In Proceedings o f the nine teenth annual conference o f the Cognitive Science Society (Vol. 19, p. 331-336). Mahwah, NJ: Laurence Erlbaum. Joanisse, M. F., & Seidenberg, M. S. (1998a). Functional bases of phonological uni versal: A connectionist account. In Proceedings o f the 24th meeting o f the Berkeley Linguistics Society (p. 335-345). Berkeley, CA: University of Califor nia, Berkeley. Joanisse, M. F., & Seidenberg, M. S. (1998b). Specific Language Impairment: A deficit in grammar or processing? Trends in Cognitive Sciences, 2(7), 240-247. Joanisse, M. F., & Seidenberg, M. S. (1999). Impairments in verb morphology follow ing brain injury: A connectionist model. Proceedings of the National Academy o f Sciences, USA, 96(13), 7592-7597. Jusczyk, P. (1997). The discovery o f spoken language. Cambridge MA: MIT press. Kager, R. (1989). A metrical theory o f stress and destressing in English and Dutch. Dordrecht: Foris. Kamhi, A., Catts, H., Mauer, D., Apel, K., & Gentry, B. (1988). Phonological and spatial processing abilities in language- and reading-impaired children. Journal o f Speech and Hearing Disorders, 53, 316-327. 257 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Kamhi, A. G., & Catts, H. W. (1986). 
Toward an understanding of developmental lan guage and reading disorders. Journal o f Speech and Hearing Disorders, 5/(4), 337-347. Kaun, A. (1995). The typology of rounding harmony: An optimality theoretic ap proach. Unpublished doctoral dissertation, UCLA. Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99(2), 349-364. Kent, R. D., Dembowski, J., & Lass, N. J. (1996). The acoustic characteristics of American English. In N. J. Lass (Ed.), Principles o f experimental phonetics. Mosby. King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal o f Memory and Language, 30, 580-602. Kirchner, D., & Klatzky, R. L. (1985). Verbal rehearsal and memory in language- disordered children. Journal o f Speech and Hearing Research, 28, 556-565. Klatt, D. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. The Journal o f the Acoustical Society o f America, 87, 820-57. Krakow, R. A. (1999). Physiological organization of syllables: A review. Journal o f Phonetics, 27(1), 23-54. Kraus, N., McGee, T. J., Carrell, T. D., Zecker, S. G., Nicol, T. G., & Koch, D. B. (1996). Auditory neurophysiologic responses and discrimination deficits in chil dren with learning problems. Science, 273, 971-973. Kuhl, P. K., Andruski, J. A., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg, U., & Lacerda, F. (1997). Cross language analysis of phonetic units in language addressed to infants. Science, 277, 684-686. Leonard, L. (1993). The use of morphology by children with specific language im pairment: Evidence from three languages. In R. Chapman (Ed.), Processes in language acquisition and disorders. St Louis: Mosby-Yearbook. Leonard, L. B. (1982). Phonological deficits in children with developmental language impairment. 
Brain and Language, 16, 73-86. Leonard, L. B. (1987). Specific language impairment in children: A cross-linguistic study. Brain and Language, 32, 233-252. 258 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Leonard, L. B. (1998). Children with Specific Language Impairment. Cambridge, MA: MIT Press. Leonard, L. B., Bortolini, U., Caselli, M., McGregor, M., & Sabbadini, L. (1992). Morphological deficits in children with specific language impairment: The status of features in the underlying grammar. Language Acquisition, 2, 151-179. Leonard, L. B., & Dromi, E. (1994). The use of Hebrew verb morphology by children with specific language impairment and children developing language normally. First Language, 14, 283-304. Leonard, L. B., & Eyer, J. A. (1996). Deficits of grammatical morphology in children with Specific Language Impairment and their implications for notions of boot strapping. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax (p. 233-248). Mahwah, NJ: Lawrence Erlbaum and Associates. Liljencrants, J., & Lindblom, B. (1972). Numerical simulation of vowel quality sys tems: The role of perceptual contrast. Journal o f Phonetics, 48(4), 839-862. Lindblom, B. (1986). Phonetic universals in vowel systems. In J. Ohala (Ed.), Exper imental phonology. Academic Press. Lindblom, B., MacNeilage, P., & Studert-Kennedy, M. (1984). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie, & O. Dahl (Eds.), Explanations fo r language universals. Berlin: Mouton. Ludlow, C. L., Cudahy, E. A., Bassich, C., & L. Brown, G. erald. (1983). Auditory processing skills of hyperactive, language-impaired and reading-disabled boys. In E. Lasky & J. Katz (Eds.), Central auditory processing disorders: Problems o f speech, language and learning (ip. 163-184). Baltimore: University Park Press. MacDonald, M. C., & Christiansen, M. H. (in press). 
Individual differences without working memory: A reply to Just & Carpenter and Waters & Caplan. (In press, Psychological Review) MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676-703. Macnamara, J. (1972). Cognitive basis of language learning in infants. Psychological Review, 79, 1-13. MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations: Revising the verb learning model. Cognition, 29, 121-157. Maddieson, I. (1984). Patterns o f sounds. Cambridge: Cambridge University Press. 259 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Magen, H. S., & Blumstein, S. E. (1993). Effects of speaking rate on the vowel length distinction in Korean. Journal o f Phonetics, 21, 387-409. Mann, V. A., & Repp, B. H. (1981). Influence of preceding fricative on stop consonant perception. Journal o f the Acoustical Society of America, 69, 548-558. Manuel, S. Y . (1990). The role of contrast in limiting vowel-to-vowel coarticulation in different languages. Journal o f the Acoustical Society o f America, 88, 1286- 1298. Marcus, G., Ullman, M., Pinker, S., Hollander, M., Rosen, T. J., & Xu, F. (1992). Overregularization in language acquisition. Monographs o f the Society fo r Re search in Child Development, 57. Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37, 243-282. Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 79,313-330. Markman, E. M. (1989). Categorization and naming in children. Cambridge, MA: MIT Press. McCandliss, B. D., Fiez, J. A., Conway, M., & McClelland, J. L. (1999). Eliciting adult plasticity for Japanese adults struggling to identify english /r/ and /l/: In sights from a hebbian model and a new training procedure. 
Journal o f Cognitive Neuroscience, Suppl. S, 53. McCarthy, J. (1997). Sympathy and phonological opacity (Tech. Rep. No. ROA 252- 0398). University of Maryland. McCarthy, J. J., & Prince, A. S. (1993). Prosodic morphology I: Constraint interac tion and satisfaction, (ms: University of Massachusetts, Amherst and Rutgers University) McCarthy, J. J., & Prince, A. S. (1995). Faithfulness and reduplicative identity. In J. Beckman, L. W. Diskey, & S. Urbanczyk (Eds.), University o f massachusetts occasional papers in linguistics 18: Papers in optimality theory (p. 249-384). Amherst MA: GLSA. McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 78(1-86). McClelland, J. L., Rumelhart, D. E., & the PDP Research Group (Eds.). (1986). Par allel distributed processing: Explorations in the microstructure o f cognition. Volume 2: Psychological and biological models. Cambridge, MA: MIT Press. 260 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. McCutchen, D., Dibble, E., & Blount, M. M. (1994). Phonemic effects in reading- comprehension and text memory. Applied Cognitive Psychology, 8(6), 597-611. Merzenich, M. M., Jenkins, W. M., Johnston, P., Schreiner, C., Miller, S. L., & Tallal, P. (1996). Temporal processing deficits of language-learning impaired children ameliorated by training. Science, 271, 77-81. Miller, G. A. (1990). WordNet: An on-line lexical database. International Journal o f Lexicography, 3, 235-312. Montgomery, J. W. (1995). Sentence comprehension in children with specific language impairment: The role of phonological working memory. Journal o f Speech and Hearing Research, 38, 187-199. Nation, K., 8c Snowling, M. J. (1999). Developmental differences in sensitivity to se mantic relations among good and poor comprehenders: evidence from semantic priming. Cognition(70), B1-B13. Newell, A., & Simon, H. A. (1972). Human problem solving. 
Englewood Cliffs, NJ: Prentice-Hall. Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14, 11-28. Noyer, R. (1997). Features, positions and affixes in autonomous morphological struc ture. New York: Garland Publishing. Ohala, J. J. (1990). There is no interface between phonology and phonetics: A personal view. Journal o f Phonetics, 18, 153-171. Pater, J. (1995). On the nonuniformity o f weight-to-stress and stress preservation effects in English. (Unpublished Manuscript, McGill University) Pearlmutter, B. A. (1995). Gradient calculation for dynamic recurrent neural networks: a survey. IEEE Transactions on Neural Networks, 6(5), 1212-1228. Perkell, J. S., & Nelson, W. L. (1985). Articulatory targets and pseech motor control: A study of vowel production. In S. Grillner, A. Persson, B. Lindblom, & J. Lubker (Eds.), Speech motor control (p. 187-204). New York: Pergamon. Pinker, S. (1989). Leamability and cognition: The acquisition o f argument structure. Cambridge, MA: MIT Press. Pinker, S. (1991). Rules of language. Science, 253, 530-535. Pinker, S. (1999). Words and rules. Perseus Books. 261 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1- 2), 73-193. Plaut, D. C., & Kello, C. T. (1999). The emergence of phonology from the interplay of speech comprehension and production: A distributed connectionist approach. In B. MacWhinney (Ed.), The emergence o f language (p. 381-415). Mahwah, NJ: Lawrence Erlbaum Associates. Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996). Un derstanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 705(1), 56-115. Plaut, D. C., & Shallice, T. (1993). 
Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology, 10(5), 377-500. Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Ac quiring verb morphology in children and connectionist nets. Cognition, 48( 1), 21-69. Prasada, S., & Pinker, S. (1993). Generalisation of regular and irregular morphological patterns. Language and Cognitive Processes, 5(1), 1-56. Prince, A., & Smolensky, P. (1993). Optimality Theory: Constraint interaction in generative grammar (Tech. Rep.). Rutgers University. Rapin, I., & Allen, D. (1983). Developmental language disorders: Nosologic consid erations. In U. Kierk (Ed.), Neuropsychology o f language, reading, and spelling (p. 155-184). New York: Academic Press. Redford, M. A., & Diehl, R. L. (1999). The relative perceptual distinctiveness of initial and final consonants in CVC syllables. Journal o f the Acoustical Society o f America, 106(3), 1555-1565. Rice, K., & Avery, P. (1995). Variability in a deterministic model of language acqui sition: A theory of segmental elaboration. In J. Archibald (Ed.), Phonological acquisition and phonological theory. Mahwah NJ: Lawrence Erlbaum Asso ciates. Rice, M. L., & Wexler, K. (1996). A phenotype of specific language impairments: Extended optional infinitives. In M. L. Rice (Ed.), Toward a genetics o f language (p. 215-237). Mahwah, NJ: Lawrence Erlbaum Associates. Rohde, D. (1999). A connectionist model o f sentence comprehension and production. (Unpublished PhD dissertation proposal, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.) 262 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Rohde, D. L., & Plaut, D. C. (1999). Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition, 72(1), 67-109. Rom, A., & Leonard, L. B. (1990). 
Interpreting deficits in grammatical morphology in specifically language-impaired children: Preliminary evidence from Hebrew. Clinical Linguistics & Phonetics, 4, 93-105. Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representa tions by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the mi crostructure o f cognition. Volume I: Foundations. Cambridge, MA: MIT Press. Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models. Cambridge, MA: MIT Press. Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1981). An interac tive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88(5), 375-407. Russell, K. (1994). Morphemes and candidates in Optimality Theory. In Proceedings o f the 1994 Annual Conference of the Canadian Linguistics Association. Toronto Working Papers in Linguistics. Saffran, J., Aslin, R., & Newport, A. (1996). Statistical learning by 8-month old infants. Science, 274. Sagey, E. (1986). The representation offeatures and relations in nonlinear phonology. Unpublished doctoral dissertation, MIT. Seidenberg, M. S. (1993). Connectionist models and cognitive theory. Psychological Science, 4(4), 228-235. Seidenberg, M. S. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science, 275, 1599-1603. Selkirk, E. (1980). Prosodic domains in phonology: Sanskrit revisited. In M. Aronoff & M.-L. Kean (Eds.), Juncture. Saratoga, CA: Anma Libri. Selkirk, E. (1984). Phonology and syntax: The relation between sound and structure. Cambridge MA: MIT Press. Singleton, J. L., & Newport, E. L. (1993). 
When learners surpass their models: The acquisition o f American Sign Language from impoverished input. (Unpublished Manuscript, University of Illinois at Urbana-Champaign) 263 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Slobin, D. I. (1985). Crosslinguistic evidence for the language-making capacity. In D. Slobin (Ed.), The crosslinguistic study o f language acquisition (Vol. 2, p. 1157-1249). Hillsdale, NJ: Lawrence Erlbaum Associates. Smith, N. (1973). The acquisition o f phonology: A case study. Cambridge UK: Cambridge University Press. Smolensky, P. (1999). Grammar-based connectionist approaches to language. Cogni tive Science, 25(4), 589-613. St. John, M. F., & Gemsbacher, M. A. (1998). Learning and losing syntax: Practice makes perfect and frequency builds fortitude. In A. F. Healy & J. L. E. Bourne (Eds.), Foreign language learning: Psycholinguistic experiments on training and retention (p. 231-255). Mawah, NJ:: Erlbaum. Stampe, D. (1979). A dissertation on natural phonology. Unpublished doctoral dis sertation, University of Chicago. Stanovich, K. E. (1988). The right and wrong places to look for the cognitive locus of reading disability. Annals o f Dyslexia, 38, 154-177. Stark, R., & Heinz, J. M. (1996). Perception of stop consonants in children with expressive and receptive-expressive language impairments. Journal o f Speech and Hearing Disorders, 39, 676-686. Steriade, D. (1994). Positional neutralization and the expression o f contrast, (ms. UCLA) Steriade, D. (1998). Alternatives to syllable-based accounts of consonantal phonotac- tics. In O. Fujimura, B. D. Joseph, & B. Palek (Eds.), Proceedings o fL P ’98 (p. 205-245). Charles University in Prague: The Karolinum Press. Stevens, K. N. (1989). On the quantal nature of speech. Journal o f Phonetics, 17, 3-45. Strothard, S. E., & Hulme, C. (1994). 
Reading comprehension difficulties in children: The role of language comprehension and working memory skills. Reading and Writing, 245-256. Sussman, J. E. (1993). Perception of formant transition cues to place of articulation in children with language impairments. Journal of Speech and Hearing Research, 36, 1286-1299. Tallal, P. (1990). Fine-grained discrimination deficits in language-learning impaired children are specific neither to the auditory modality nor to speech perception. Journal o f Speech and Hearing Research, 33(3), 616-617. 264 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Tallal, P., Miller, S., & Fitch, R. H. (1995). Neurobiological basis of speech - a case for the preeminence of temporal processing. Irish Journal o f Psychology, 16, 194-219. Tallal, P., Miller, S. L., Bedi, G., Vyma, G., Wang, X., Nafarajan, S. S., Schreiner, C., Jenkins, W. M., & Merzenich, M. (1996). Language comprehension in language-learning impaired children improved with acoustically modified speech. Science, 272, 81-84. Tallal, P., & Piercy, M. (1974). Developmental aphasia: Rate of auditory processing and selective impairment of consonant perception. Neuropsychologia, 12, 83-94. Tallal, P., & Stark, R. E. (1980). Speech perception of language-delayed children. In G. H. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child phonology (Vol. 2: Perception, p. 155-171). New York: Academic Press. Tallal, P., Stark, R. E., Kallman, C., & Mellits, D. (1980). Perceptual constancy for phonemic categories: A developmental study with normal and language im paired children. Applied Psycholinguistics, I, 49-64. Tesar, B., & Smolensky, P. (1996). Leamability in Optimality Theory (Tech. Rep. No. 96-3). Johns Hopkins University Department of Cognitive Science. Thibodeau, L. M., & Sussman, H. M. (1979). Performance on a test of categorical perception of speech in normal and communication disordered children. 
Journal o f Phonetics, 7, 379-391. Thomas, M. S. C., Grant, J., Gsodl, M., Laing, E., Barham, Z., Lakusta, L., Tyler, L. K., Grice, S., Paterson, S., & Karmiloff-Smith, A. (in press). Can atypical phenotypes be used to fractionate the language system? The case of Williams syndrome. Language and Cognitive Processes. Thomas, M. S. C., & Karmiloff-Smith, A. (2000). Modelling language acquisition in atypical phenotypes. (Manuscript submitted for publication) Trubetzkoy, N. (1939/1969). Principles o f phonology. Berkeley, CA: University of Califonia Press. (Translation from German) Ullman, M., & Gopnik, M. (1994). The production of inflectional morphology in hereditary Specific Language Impairment. In J. Matthews (Ed.), McGill working papers in linguistics (Vol. 10). McGill University Department of Linguistics. Ullman, M. T., Corkin, S., Coppola, M., Hicock, G., Growdon, J. H., Koroshetz, W. J., & Pinker, S. (1997). A neural dissociation within language: Evidence that the mental dictionary is part of declarative memory and that grammatical rules are 265 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. processed by the procedural system. Journal o f Cognitive Neuroscience, 9(2), 266-276. van der Lely, H. K. J. (1997). Narrative discourse in grammatical Specific Language- Impaired children - A modular language deficit. Journal o f Child Language, 24( 1), 221-256. van der Lely, H. K. J., & Howard, D. (1993). Children with specific language impair ment: Liguistic impairment or short-term-memory deficit. Journal o f Speech and Hearing Research, 36(6), 1193-1207. van der Lely, H. K. J., Rosen, S., & McClelland, A. (1999). Evidence for a grammar- specific deficit in children. Current Biology, 8(23), 1253-1258. van der Lely, H. K. J., & Stollwerck, L. (1997). Binding theory and grammatical Specific Language Impairment in children. Cognition, 62(1), 245-290. van der Lely, H. K. J., & Ullman, M. (1997). 
Past tense morphology in specifically language impaired and normally developing chidren. (Unpublished manuscript) Walker, R. (1998). Nasalization, neutral segments, and opacity effects. Unpublished doctoral dissertation, UC Santa Cruz. Walker, R. (1999). Reinterpreting transparency in nasal harmony. In Proceedings of the hil phonology conference 4. Leiden. Waters, G. S., & Caplan, D. (1985). The capacity theory of sentence comprehension: Critique of Just and Carpenter (1992). Psychological Review, 103, 761-772. Wexler, K., & Cullicover, R (1980). Formal principles o f language acquisition. Cam bridge, MA: MIT Press. Wijnen, F. (1992). Incidental word and sound errors in young speakers. Journal of Memory and Language, 31, 734-755. Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 2, 490-501. Wright, R. (2000). Perceptual cues in contrast maintenance. In E. Hume & K. Johnson (Eds.), The role o f speech perception phenomena in phonology. San Diego, CA: Academic Press. Zevin, J., & Joanisse, M. (2000). Stress assignment in nonword reading, poster presented at the Cognitive Neuroscience Society. San Francisco, CA. Zipf, G. K. (1935). The psycho-biology o f language: An introduction to dynamic philology. Boston, MA: Houghton Mifflin. 266 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Zuraw, K. R. (2000). Exceptions and regularities in phonology. Unpublished doctoral dissertation, UCLA. 267 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.