CONNECTIONIST INFLECTIONAL MORPHOLOGY: A NETWORK-BASED ACCOUNT OF THE PAST TENSE

by Kim Gary Daugherty

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

March 1994

Copyright 1994 Kim Gary Daugherty

UMI Number: DP22880. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Kim Gary Daugherty under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies

DISSERTATION COMMITTEE
Chairperson

DEDICATION

I dedicate this dissertation to my wife, whose love and support I could not have done without. I love you Susan, and I hope that I can help you to accomplish your goals as much as you help me to fulfill mine.

ACKNOWLEDGMENTS

I thank God for giving me the ability and persistence to succeed in my academic endeavors. I thank my wife for her unconditional support. I thank my advisor, Mark Seidenberg, for teaching me to be something that I've always wanted to be—a scientist. I thank the other members of my committee, Michael Arbib and Elaine Andersen, for prompting me to provide the clearest possible exposition of my research in this dissertation. I thank my parents and sister for their constant encouragement and pride in my efforts. I thank Hughes Aircraft Company for their generous financial support in the Hughes Doctoral Fellowship program. I thank my manager Dave Hall for fully sponsoring me in the fellowship program and for never failing to support me as a student/employee. I thank my office mate and friend, Alan Petersen, for being my intellectual sounding board and for sharing all my ups and downs. I thank others in the lab, Joe, Bea, Maryellen, Dave, and Mike, for creating an enjoyable working environment, and for contributing to and challenging this research during our many lab meetings. I thank my friends and family for providing welcome diversions and encouragement when things got too serious at school, and for asking me time after time, "Are you almost finished?" Finally, I can say "Yes!"

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Language knowledge
1.2 The English past tense
1.3 The connectionist network approach
1.4 What is at stake?
1.5 Thesis overview
CHAPTER 2: THE TRADITIONAL ACCOUNT OF THE PAST TENSE
2.1 What linguists agree on
2.2 English inflectional morphology
2.3 The traditional theory
2.4 Discussion
CHAPTER 3: A REVIEW OF COMPUTATIONAL MODELS OF THE PAST TENSE
3.1 The connectionist approach
3.2 The first generation connectionist model
3.3 Response of the linguistic community
3.4 Second generation connectionist models
3.4.1 Plunkett & Marchman model
3.4.2 MacWhinney & Leinbach model
3.4.3 Cottrell & Plunkett model
3.5 The current debate
3.6 A computational AI solution
3.7 Discussion
CHAPTER 4: THE IMPORTANCE OF ACCURATELY MODELING THE PHENOMENON
4.1 Motivation to model
4.2 Choice of architecture and learning algorithm
4.3 Model implementation details
4.3.1 Phonological representation
4.3.2 Network architecture
4.3.3 Training and scoring output
4.4 Simulation 1: Initial corpus and results
4.4.1 Training corpus
4.4.2 Performance on training set verbs
4.4.3 Performance on novel verbs
4.4.4 Summary of results
4.5 Hints for a better simulation
4.6 Simulation 2: A more realistic corpus and results
4.6.1 Training corpus
4.6.2 Performance on training set verbs
4.6.3 Performance on novel verbs
4.6.5 Summary of results
4.7 Analysis of the phonological components
4.8 Analysis of the hidden units
4.9 Discussion
CHAPTER 5: FREQUENCY AND CONSISTENCY
5.1 Accounting for the data—The frequency by regularity interaction
5.2 Separating the theories—The consistency effect
5.3 Detailed analysis of the model
5.4 Discussion
CHAPTER 6: RELATING ERROR SCORE TO REACTION TIME
6.1 Background
6.2 Attractor networks
6.2.1 Plaut model of orthography to phonology
6.2.2 Kawamoto & Zemblidge model of homograph pronunciation
6.2.3 The relationship between error score and iterations to settle
6.3 A recurrent version of the past tense model
6.3.1 Network architecture
6.3.2 Training
6.3.3 Performance on training set verbs
6.3.4 Performance on novel verbs
6.3.5 Analysis of the attractors
6.3.6 Analysis of the hidden units and clean up units
6.3.7 The time course of processing
6.3.8 Frequency and consistency effects
6.4 Discussion
CHAPTER 7: THE ROLE OF SEMANTICS
7.1 Background
7.2 Behavioral data
7.2.1 Method
7.2.2 Results and discussion
7.3 A connectionist model with semantic distance
7.3.1 Architecture
7.3.2 Training set
7.3.3 Training
7.3.4 Results
7.4 Discussion
CHAPTER 8: IS ENGLISH THE EXCEPTION TO THE RULE?
8.1 The problem of the low frequency default
8.2 The connectionist account
8.3 The Hare and Elman model
8.4 A new connectionist model
8.4.1 Architecture of the model
8.4.2 Training
8.4.3 Results
8.5 Discussion
CHAPTER 9: CONSISTENCY REVISITED
9.1 Consistency and the modified traditional theory
9.2 Consistency within the irregulars
9.3 A new analysis of the past tense model
9.3.1 Training
9.3.2 Results
9.4 Discussion
CHAPTER 10: GENERAL DISCUSSION
10.1 Accounting for the data
10.2 Identifying new phenomena
10.3 What is wrong with the traditional theory?
10.4 Conclusion
APPENDIX A: BACK-PROPAGATION
A.1 The input to a unit
A.2 The activation of a unit
A.3 The computation of the unit error
A.4 The determination of the error signal
A.5 The updating of the weights
APPENDIX B: BACK-PROPAGATION THROUGH TIME
B.1 The units
B.2 The forward pass
B.3 The error function
B.4 The backward pass
B.5 Weight updating
APPENDIX C: TRAINING, GENERALIZATION, AND TEST SETS FOR SIMULATIONS IN CHAPTERS 4, 5, AND 6
C.1 Training set 1
C.2 Training set 2
C.3 Generalization set 1
C.4 Generalization set 2
C.5 Test set 1
C.6 Test set 2
APPENDIX D: EXPERIMENTAL DATA AND TRAINING SET FOR SIMULATION IN CHAPTER 7
D.1 Experimental data
D.2 Stimuli for experiment
D.3 Training set 1
APPENDIX E: TRAINING AND GENERALIZATION SETS FOR SIMULATION IN CHAPTER 8
E.1 Training set 1
E.2 Generalization set 1
E.3 Generalization set 2
E.4 Generalization set 3
APPENDIX F: TRAINING AND TEST SETS FOR SIMULATION IN CHAPTER 9
F.1 Training set 1
F.2 Test set 1
F.3 Test set 2
BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: Traditional theory
Figure 2: Modified traditional theory
Figure 3: Example decision tree
Figure 4: Phonological representation of a syllable
Figure 5: Architecture of the model
Figure 6: Generalization errors vs. number irregulars in training set
Figure 7: Breakdown of generalization errors
Figure 8: Frequency and regularity effects
Figure 9: Performance on matched subsets of items
Figure 10: Activation flow of feed-forward network
Figure 11: Activation flow of recurrent network
Figure 12: Multi-dimensional landscape with attractors
Figure 13: Architecture of the recurrent model
Figure 14: Time course of processing for selected verbs
Figure 15: Frequency and regularity effects in recurrent model
Figure 16: Performance on matched subsets of items in recurrent model
Figure 17: Past tense depends on distance from central meaning
Figure 18: FLY [off handle] is related to FLY [aggressive motion], not FLY [airborne movement]
Figure 19: BRAKE (v) is related to BRAKE (n) which is unrelated to BREAK
Figure 20: Architecture of the semantic distance model
Figure 21: Example network classifier
Figure 22: Architecture of the low frequency default model
Figure 23: Phonological space of rime for class items
Figure 24: Enhancements to modified traditional theory
Figure 25: Matched sets of irregular verbs with and without regular neighbors
Figure 26: Matched sets of irregular verbs with regular neighbors

LIST OF TABLES

Table 1: Wickelfeature representation for kAm
Table 2: Articulatory features for phonological segments
Table 3: Average error score in phonological clusters with partial inputs
Table 4: Number of shared hidden units for each cluster
Table 5: Time step 2 average error score in phonological clusters with partial inputs
Table 6: Time step 12 average error score in phonological clusters with partial inputs
Table 7: Number of shared hidden units for each cluster in recurrent model
Table 8: Number of shared clean up units for each cluster out of the most active 25 units
Table 9: Semantic distance measurements
Table 10: Training set classes
Table 11: Performance on generalization set
Table 12: Performance on verbs with the novel stem vowel

ABSTRACT

How do people learn natural language? There has long existed a controversy over exactly how much language knowledge is innately endowed to humans and how much knowledge can be learned. Traditional linguists have long proposed that linguistic knowledge is represented by innate rule mechanisms and principles that are specific to language. Elaborate systems of symbolic rules, architectures, and operations account for adult competence in language. Furthermore, inflectional morphology is considered to be a good example of a system governed by rules. Thus we focus on past tense inflection as a phenomenon that is representative of symbolic rule-based language systems.

An alternative view of language knowledge is provided by connectionist theories and models. These systems do not use rules or symbolic processing. Rather, large numbers of highly connected, simple processing units learn and compute complex input to output mappings in parallel. In this thesis, we contrast the traditional theory of the past tense with the connectionist view. A connectionist model is presented that learns the English past tense for a representative sample of verbs. This model not only accounts for data previously thought to implicate a rule-based theory, but also accounts for generalizations that the rule-based theory misses. Additional simulations address a number of past tense phenomena that have been collected from psycholinguistic experiments. The model is extended with semantic information to account for people's past tense preferences with verb usages that are very distant from their central meaning. The model is also trained on Old English verbs, which have a very different pattern of past tense inflection than modern English. This addresses concerns that the connectionist theory can only account for the idiosyncratic paradigm of the modern English past tense, which is not typical in morphological systems.

The results of this thesis show that a connectionist theory can indeed rival the traditional theory in accounting for a representative linguistic system—the past tense. Furthermore, these results demonstrate how linguistic knowledge can be accounted for by the general learning and processing principles that are inherent in the connectionist theory.

CHAPTER 1: INTRODUCTION

1.1 Language knowledge

How do people learn natural language? The answer to this question is of great interest to linguists, psychologists, and cognitive scientists. Although language is innately endowed to humans, there remains a question of exactly what is innate. On one extreme is Chomsky's (1977) Universal Grammar and parameter setting theory, which states that much of grammatical knowledge, including syntactic constraints and grammatical categories, is innate. The language learner must discover the correct parameters that govern his or her language. Once these are set, appropriate language rules will be in effect.
Chomsky's views have had an enormous influence on linguistic theories because of his extensive formalism and commitment to theories that not only describe language competence of an idealized adult speaker (descriptive adequacy), but also account for the fact that knowledge of language is acquired on the basis of the evidence available (explanatory adequacy). In other words, these theories differentiate between aspects of language knowledge that must be innate (e.g. Universal Grammar) and knowledge that can be learned (e.g. parameter settings).

In direct contrast are language learning theories which are based on empirical observations of the manner that people acquire language. These theories acknowledge that acquisition must be constrained by the input the child receives, and allow that much of language knowledge is learned. For example, Hill (1983) and Arbib, Conklin, & Hill (1987) present a computational model that learns language at the two-year-old stage by generalizing from example sentences. Input to the model is adult sentences plus semantic context. Hill's Classification by Word Use hypothesis is presented to categorize words without any prior notion of grammatical category or formal syntax. This model does not address acquisition of phonological forms and assumes that the concept of a word is already known.

Another example of non-Chomskyan theories of language acquisition is provided by connectionist theories, which seem to imply that language can be learned entirely without rules. Connectionist networks differ from conventional computer programs in that a large number of neuron-like processing units cooperate in parallel to solve a task. Mappings from "input" to "output" are learned rather than pre-defined or specified. Although connectionism is in its infancy, it has already begun to address such complex elements of linguistic knowledge as syntax (Berg, 1991), concept acquisition (Schynns, in press; Nenov & Dyer, 1988), and sentence processing (St. John & McClelland, 1988). Yet each of these models resolves to explain aspects of language processing from entirely different viewpoints. A coordinated effort to account for linguistic phenomena within a unified framework has yet to be developed in the connectionist community.

A tactic taken by other connectionist researchers has been to account for very simple processing mechanisms at the single word level. Seidenberg & McClelland (1989) developed a connectionist model of spelling to sound correspondences, while Rumelhart & McClelland (1986) focused on explaining the acquisition of the English past tense. This view of language attempts to demonstrate how very simple and general processing mechanisms can account for phenomena thought to be governed by language-specific rules. In this framework, the focus is on determining the minimum requirements that are necessary to account for word level processing, such as representing a word's sound and spelling patterns, representing a word's meaning, and inflecting a word to create another form. A successful connectionist account of word-level processing can provide a unified foundation from which higher-level models of language can be developed. This foundation can address such sentence level issues as syntactic ambiguity resolution or subcategorization by explaining what information is necessary for a word representation (i.e. the lexicon in linguistic theories) as well as the manner in which this information is learned and stored.
But although connectionist models of word reading have enjoyed considerable success, their account of the past tense has been highly criticized and has undermined the emergence of connectionism as a widely accepted theory of language processing.

1.2 The English past tense

Linguists studying English have traditionally believed (Berko, 1958) that root forms of verbs are stored in a mental lexicon while most past tense forms are generated by an "add -ed" rule (e.g. WALK-WALKED, BAKE-BAKED). This regular marking also applies when generating the past tense for novel verbs. Verbs that are handled in this manner are known as regular verbs. Alternately, some verb past tenses are not generated by rule and must be learned by rote (e.g. TAKE-TOOK, GO-WENT, SEE-SAW). These are known as exception or irregular verbs.

This dual-process account proposes entirely separate pathways for generating regular and irregular past tenses, and is a recurring theme in linguistic theory. For example, morphology is laden with rules to form such inflections as plurals ("add -s", e.g. DOG-DOGS) or comparatives ("add -er for single syllable adjectives or adjectives ending in a vowel sound", RED-REDDER), but is also full of exceptions (e.g. MAN-MEN or REAL-MORE REAL). Another example is that syntactic rules determine the grammaticality of sentences, but speakers often accept as grammatical sentences that are agrammatical according to the rules. Traditional theories propose that the rule-governed items are marked to be processed by the default pathway, and the exceptions are marked to be processed by a different pathway. Novel items are considered to have a default marking.

Yet with the past tense, behavior in people departs from the behavior predicted by this account. For example, children and adults occasionally overregularize irregular verbs with double past tense markings (Bowerman, 1982; Brown, 1973) such as SAWED or WENTED, indicating that there is some interaction between regular and irregular past tense generation. Thus, an understanding of the past tense may provide insight into explaining elements of linguistic knowledge that are currently accounted for with rule-based theories, such as phonology, morphology, and syntax.

1.3 The connectionist network approach

On the other hand, the connectionist approach claims to illustrate a new way of explaining linguistic phenomena by accounting for the acquisition of regular and irregular past tenses in a single mechanism.

    We have, we believe, provided a distinct alternative to the view that children learn the rules of English past-tense formation in any explicit sense. We have shown that a reasonable account of the acquisition of past tense can be provided without recourse to the notion of a "rule" as anything more than a description of the language .... A uniform procedure is applied for producing the past-tense form in every case. (Rumelhart & McClelland, 1986, p. 246)

Connectionists claim a simple learning apparatus can account for these facts by merely associating inputs (present tenses) with outputs (past tenses), and by not making special allowances for exceptions. All verbs are handled within a single pathway without special rules or knowledge that are specific for language learning. A variety of verb related phenomena that have been taken as evidence for the past tense rule are purportedly accounted for by the connectionist approach as well.
But although this general purpose approach has succeeded in other cognitive domains, its success in accounting for language has been challenged since its introduction. This approach stands in stark contrast with the traditional notion of language learning, and the supporters of the traditional theories have been the most vocal opponents.

1.4 What is at stake?

As we will see in this thesis, a number of researchers have chosen the past tense as a battleground to either discredit or legitimize the connectionist theory as a means of contributing to an understanding of language knowledge. A wealth of data exists on this seemingly minor aspect of language, which has been studied for decades. If no connectionist account can satisfactorily address this data, then the value of connectionism as an explanatory tool could be seriously challenged with respect to language. Traditional accounts which have stood the test of time would continue to be the best means of accounting for language knowledge.

However, a successful connectionist account of the past tense would have serious repercussions for the study of language knowledge. Linguistic rules, which are the very foundation of linguistic theories, would be called into question along with an enormous body of linguistic research. In such a scenario, rule-based theories could be thought of as general descriptions of more complex underlying phenomena which are modeled by connectionist theories. Innate mechanisms thought to extract regularities from the language and set parameters in innate rules might be replaced with an innate architecture that merely maps inputs to outputs without the direction of a rule. A connectionist account would prompt language theorists to reconsider what is thought to be innate and what is thought to be learned.

What is at stake in this debate is a potential revelation in understanding how language knowledge is learned, stored, and processed. Connectionism provides tools to model language phenomena that have not existed until very recently. Thus, considerable enthusiasm exists in determining if a connectionist approach can account for a minor, albeit representative, example of traditional language knowledge—the past tense. What we amongst many other researchers strive to know, simply, is which theory provides the best account of the data?

1.5 Thesis overview

It is the goal of this thesis to identify and explain the relevant issues that distinguish a traditional dual-process theory from a connectionist theory of past tense generation. A connectionist model has been constructed to capture key behavioral phenomena related to the past tense. This model deals with a broad set of issues including accounting for phenomena currently taken as evidence for the dual-process theory. Behaviors that are predicted by this model but not by a dual-process model will be identified and shown to exist in people as observed in psycholinguistic experiments. It will be demonstrated that the connectionist account is not merely an alternative to a dual-process account, but is preferred because it captures generalizations that dual-process accounts miss.

This thesis is organized as follows:

Chapter 2: A full exposition of the traditional account of the past tense is presented. The common threads between competing linguistic accounts will be described, as well as the role of the past tense in the broader realm of inflectional morphology.
Chapter 3: A number of previous connectionist models of the past tense will be reviewed and evaluated, including Rumelhart & McClelland's landmark effort which began the debate. Computational models based on more traditional techniques from Artificial Intelligence (AI) will also be reviewed. The chapter concludes with a description of the most recent modifications to what has become known as the modified traditional theory of the past tense, which has been inspired by these connectionist efforts.

Chapter 4: A new connectionist model of the past tense is presented that addresses many concerns over previous efforts. An initial and marginally successful version is thoroughly analyzed, and shown to be deficient in the training regime. A second version remedies this problem, and shows the importance of training connectionist models as accurately as possible with respect to the phenomena being modeled.

Chapter 5: A frequency by regularity interaction observed in subjects in psycholinguistic experiments has previously been taken as evidence for the traditional account. We will show that this data is also commensurate with a connectionist account. Furthermore, a consistency effect observed among different classes of regular verbs in experiments will be presented that is addressed by the connectionist account, but not by the traditional account. This is because the traditional account assumes that all regular verbs are generated by a rule and are therefore not affected by similarity to other items.

Chapter 6: In the model developed in the previous two chapters, error score of the generated output was shown to relate quite closely with reaction time (RT) in psycholinguistic experiments. However, the relation of error score to RT is certainly not intuitive. This chapter describes the work of several connectionist researchers in modeling RT with both the error score of a generated output in feed-forward networks as well as the number of cycles to settle to a stable output in recurrent networks. A reimplementation of the model in the previous two chapters as a recurrent network demonstrates that both error score and number of cycles to settle are accurate methods to model RT.

Chapter 7: The one component clearly lacking in the previous model is semantics. People use more than the sound pattern of a verb to determine its past tense (e.g. He flew/*flied off the handle vs. Wade Boggs flied/*flew out to center field). It is unclear whether the information necessary to differentiate between these past tense forms is due to knowledge of formal grammatical categories (e.g. noun, verb, adjective) or merely due to the different meanings. A psycholinguistic experiment is presented that suggests knowledge of formal grammatical categories is not necessary to decide on the correct form. The connectionist model is augmented to include semantic knowledge, and is shown to produce similar results to subjects in the experiment.

Chapter 8: It has often been speculated that connectionist networks can only demonstrate rule-like behavior if the tokens affected by the rule are a majority of the language. English verbs are approximately 95% regular, so critics have stated that connectionist networks will not be able to address inflectional morphology in other languages that have a minority of rule-governed regular items. This phenomenon is known as the low frequency default rule.
The same connectionist model developed in the previous chapters is shown to account for the low frequency default past tense rule in Old English verbs, which only affects approximately 20% of the items. However, the irregular inflections of the remaining 80% are highly predictable from phonological cues. The topology of this system differs considerably from the modern English past tense and is very similar to the modern Arabic broken plural inflection. Thus connectionist networks are demonstrated to be adequate mechanisms to model a variety of phenomena within inflectional morphology.

Chapter 9: Whereas connectionist models of the past tense have been shown to account for a variety of human behavior, much of this data is also accounted for by the traditional theory. A new phenomenon is described that clearly differentiates between the two competing theories, and is shown to directly result from a connectionist architecture. The consistency effect described in Chapter 5 will be shown to also affect different classes of irregular verbs in the model. Thus we show that both regular and irregular verbs are affected by similarity to other items.

Chapter 10: The final chapter summarizes the thesis by reiterating the results of the described research. The role of the past tense in accounting for language phenomena in a connectionist framework is discussed. It is concluded that connectionist models can indeed account for data that has previously been taken as evidence for traditional rule-based theories, and can furthermore account for data that cannot be explained by these theories. We find that connectionism provides unique insight into certain language processes.

CHAPTER 2: THE TRADITIONAL ACCOUNT OF THE PAST TENSE

The English past tense is a linguistic system that has been exhaustively studied by researchers from a myriad of disciplines. It is often cited as a typical example of a formal grammatical system, governed by rules and subject to language specific structures and mechanisms. But linguists have often disagreed on how the past tense system is organized and how it functions. Amidst this controversy, a detailed theory has been developed to account for the wide variety of data that has been collected on the past tense, and has come to be known as the traditional theory.

2.1 What linguists agree on

One thing that most linguists have agreed on is that linguistic knowledge is represented by rules and principles. Whereas other cognitive tasks such as vision cannot be adequately accounted for by symbolic serial processing systems, the traditional view is that language is different:
Whereas there is an ever increasing agreement on linguistic systems that account for syntax, there has been much less of a consensus on less-studied systems that are rampant with exceptions and irregularities, such as inflectional morphology. 2.2 English inflectional morphology English inflectional morphology is considered by linguists to be a good example of a system governed by rules (for a review see Spencer, 1991). Thus much work has focused on past tense inflection as representative of symbolic language systems. There exists controversy even within the rule-based accounts of the past tense. Halle & Mohanan (1985) and Kiparsky (1982) propose phonological rules to derive nearly all irregular verbs in English, suggesting that there are hardly any 13 truly “irregular” verbs. For example, they would classify both FLING-FLUNG and CLING-CLUNG as being generated by a very specific phonological rule, rather than as irregular verbs. Similarly, SING-SANG and RING-RANG would be generated by another rule that is sensitive to different phonological information than the previous rule. Clearly, many rules are necessary in this system to account for the sub-regular classes that exist in the traditionally irregular verbs. On the other hand, Anderson (1982) and Spencer (1991) suggest that it is unclear whether past tense inflection lies in a phonological component, a morphological component, or is part of syntax. This uncertainty stems from the lack of agreement on inflected forms from Government-Binding (GB) theory (Chomsky, 1981). Although GB does state that the lexicon contains at least partial phonological strings, grammatical categories, some semantics, and argument structure, it does not specify a means to inflect forms. Some GB theorists believe that an inflectional component derives inflected forms, some suggest that inflected forms are in the lexicon, and many others feel that inflectional morphology is outside the realm of GB theory. None of these views has been fully developed into a detailed theory that can account for the English past tense. What has emerged is a very detailed and highly specified theory that has built upon a mass of behavioral data from child- language experiments, grammaticality judgment tasks, and a number of other psycholinguistic experiments— the traditional theory. 14 2.3 The traditional theory The theory presented by Pinker & Prince (1988) has provided the most serious challenge to connectionist theories of the past tense. This theory takes the traditional GB view of the lexicon and proposes that lexical items are separate from syntax. Past tense inflection is an isolable subsystem in which grammatical mechanisms can be studied in detail, without complex interactions with the rest of language. It is computed independently of syntax, the subsystem that defines the form of phrases and sentences: The syntax of English forces its speakers to mark tense in every sentence, but no aspect of syntax works differently with regular and irregular verbs. (Pinker, 1991, p. 531) Morphological rules, which describe the formation of words, transform these lexical items into their past tense forms. These rules remain distinct from phonological rules which govern the sound of words. The lexical component feeds into the morphological component, which feeds the phonological component. Thus, past tense forms are not stored in the lexicon, but are generated from a lexical specification. 
Although the traditional theory does not provide a detailed account of the acquisition of the past tense, the following claims are made. Utterances are heard by the child and lexical entries are developed and stored, based on phonological, semantic, syntactic, and pragmatic information. The relationship between present and past tense is gradually observed by the child and the past tense regular rule (add -ed) is induced by the morphological component. The regular rule is further refined by the phonological component in selecting the proper allophone rule to apply—/t/, /d/, or /ɪd/. A specific rule must be learned for each irregular verb by rote memorization. Figure 1 shows the two pathways to generate a past tense in the traditional theory.

[Figure 1: Traditional theory. The lexicon (phonological code, semantics, grammatical category, regular/exception flag) feeds two pathways: the regular rule pathway and an exception lookup table.]

The simplicity of this theory accounts for much of the observed behavior of people when they utter the past tense. Regular inflection can be applied to any present tense that has been marked with the regular flag, as well as novel verbs (e.g. WUG-WUGGED) which are given the regular flag by default, making it fully productive and capable of predicting the appropriate past tense. Both adult and child speakers can correctly inflect an unlimited number of verbs. On the other hand, the past tense of irregular verbs cannot be predicted by a simple phonological rule, and must be memorized individually as specific rules.

Normally, when an irregular past tense is generated, the regular rule is blocked. But sometimes, especially in children, a conflict arises when both the regular and specific rule are applied to an irregular verb. Overregularization errors (Bowerman, 1982; Brown, 1973) occur when the regular rule is not blocked (e.g. GO-GOED). A different kind of overregularization error can occur when the default rule is applied to an inflected form. In this case, a doubly-marked form can be generated, such as GO-WENTED.

In its simplest form, the traditional theory accounts for many of the observed phenomena related to the past tense. Furthermore, it is a formalization of a system often casually described by people in two steps: 1) Add -ed to form the past tense but 2) memorize exceptions. Although inadequate in itself, the traditional theory is a foundation from which a comprehensive rule-based theory has been developed that will be described in the following chapters.

2.4 Discussion

The traditional theory's characterization of the past tense is typical of linguistic rule systems. A complex phenomenon is decomposed into components that interact with one another. Specific rules apply to each component, transforming the input into potentially competing representations which give rise to the observed complexity. Competition can arise when different components act on the same input, such as when the regular rule is not blocked for an irregular verb and an overregularized past tense is generated. In syntax, competition can be manifested by the many structures that are generated and considered when parsing a sentence. The competing syntactic structures are resolved due to the rules that apply to each structure. In this view, rules provide the necessary foundation and explanation of phenomena which cannot be adequately accounted for in a single pass of mapping inputs to outputs.
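The two pathways in Figure 1 can be made concrete with a small sketch. This is an illustration only, not an implementation from this dissertation; the exception table is a tiny hypothetical sample, and the function names are invented for the example.

```python
# Illustrative sketch of the traditional theory's two pathways (Figure 1).
# Real lexical entries carry far more information (phonology, semantics,
# grammatical category, regular/exception flag); here a verb is just a string.
EXCEPTIONS = {"go": "went", "take": "took", "see": "saw", "hit": "hit"}

def regular_past(stem):
    # Default "add -ed" rule; the phonological component would then select the
    # proper form (/t/, /d/, or /ɪd/) from the stem's final segment.
    return stem + ("d" if stem.endswith("e") else "ed")

def past_tense(stem):
    # The exception pathway blocks the regular rule when a stored form exists;
    # anything else, including a novel verb, receives the default regular marking.
    return EXCEPTIONS.get(stem, regular_past(stem))

print(past_tense("bake"))  # baked
print(past_tense("go"))    # went
print(past_tense("wug"))   # "wuged": the sketch ignores spelling-level consonant doubling
```

In these terms, an overregularization such as GO-GOED corresponds to the exception lookup failing to block the default pathway for an irregular verb.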
Linguists who follow GB rarely agree on the details of inflecting lexical items. Notwithstanding this disagreement, the traditional theory has provided an elegant manner of producing the inflected forms necessary for proper syntax while remaining separate from the syntactic component of the language system. Thus, although the past tense is representative of the language system, it can largely be studied in isolation from the rest of the system.

The traditional theory is only the cornerstone of the most current rule-based account of the past tense. As we will see in the following chapters, the traditional theory has been refined and expanded to account for a wider set of phenomena related to the past tense. To a large degree, connectionist models have been the catalyst for many of these changes. In the next chapter, we will discuss a number of computational models of the past tense, including both connectionist and traditional AI models.

CHAPTER 3: A REVIEW OF COMPUTATIONAL MODELS OF THE PAST TENSE

A number of computational models have been developed that generate the past tense from a present tense stem. Up until recently, all of these models were developed using a connectionist architecture. This chapter begins with a description of the motivation to use connectionist networks to account for the past tense, and how this account differs from the traditional theory. A survey of the contributions of key connectionist models follows. The linguistic community has responded to the connectionist challenge with the modified traditional theory, which will be discussed in detail. This theory has been recently implemented in a computational model that has been developed based on techniques from AI. A description and comparison of this model with connectionist models concludes the chapter.

3.1 The connectionist approach

Modelers of cognitive processes have long used rule-based symbolic computers as tools of their trade. These systems execute symbolic rules from a stored program in a serial fashion. The fields of linguistics, psychology, and artificial intelligence have embraced this methodology as a means of describing and defining high level cognitive functions such as language.

Recently, connectionist models have provided an alternative set of tools. These systems do not have rules and do not operate serially. They use large numbers of highly connected, simple processing units to solve complex input to output mapping tasks. Connections between units are weighted by positive and negative real number values. The input to a unit is the sum of the weighted activations from all connected units. The activation of a unit, typically a real number value between 0.0 and 1.0, is a non-linear function of its input. Knowledge in the network is stored in the weights between units. Learning in the network consists of modifying these weights to alter the computed outputs of units.

The input to the system is a specified pattern of activation across a designated set of "input" units. The output of the system is a generated pattern of activation across a designated set of "output" units. Any units that are not either input or output units are known as "hidden" units, because they are not specified by or visible to the learning environment.

The simplest kind of connectionist network is known as a feed-forward network. In these, connections between units are very restricted. The output of any unit cannot feed back into itself, either directly or indirectly via other units. Typically, connections flow from the input units to the hidden units and then to the output units.
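To make this computation concrete, the following minimal sketch (not part of the original thesis; the layer sizes, the random initial weights, and the logistic squashing function are illustrative assumptions) computes one forward pass of such a feed-forward network:

```python
import math
import random

def activation(net_input):
    # Logistic squashing function: one common non-linearity that keeps a unit's
    # activation between 0.0 and 1.0.
    return 1.0 / (1.0 + math.exp(-net_input))

def unit_output(weights, inputs):
    # The input to a unit is the sum of the weighted activations feeding into it.
    return activation(sum(w * x for w, x in zip(weights, inputs)))

def forward_pass(pattern, weights_in_to_hid, weights_hid_to_out):
    # Activation flows from the input units to the hidden units to the output units.
    hidden = [unit_output(w, pattern) for w in weights_in_to_hid]
    return [unit_output(w, hidden) for w in weights_hid_to_out]

# Tiny assumed example: 4 input units, 3 hidden units, 2 output units,
# initialized with random weights (so the output is arbitrary until trained).
random.seed(0)
w_ih = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
w_ho = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
print(forward_pass([1.0, 0.0, 1.0, 0.0], w_ih, w_ho))
```

Learning, as described above, would consist of adjusting the values in w_ih and w_ho whenever the generated output pattern differs from the desired pattern.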
Recurrent networks allow feedback connections between units 20 and are typically capable of learning more complex tasks than feed-forward networks. Training (or teaching) the network consists of presenting a set of input patterns, one at a time, to the network and evaluating the generated output patterns. If an output pattern differs from the desired (or target) output pattern, any of a number of learning algorithms can be used to modify the weights, so that the next presentation of the input pattern will be more likely to better approximate the appropriate output pattern. Training progresses until an error criterion, specified by the modeler, is achieved for all generated output patterns. Once training is complete, novel input patterns can be presented to the network and the generated output patterns can be evaluated. In domains with a predictable topography, if the training set patterns are truly representative of the phenomenon being modeled, novel input patterns should generate reasonable output patterns. Feed-forward networks have been well suited for tasks such as pattern recognition (Gorman & Sejnowski, 1988) and text-to-speech generation (Sejnowski & Rosenberg, 1987; Seidenberg & McClelland, 1989). Recurrent networks have achieved considerable success in addressing a variety of cognitive tasks such as object recognition (Hummel & Biederman, 1990), motor control (Arbib, 1989), and learning formal grammars (Cleermans, Servan-Schreiber, & McClelland, 1989). Connectionist networks can be contrasted with symbolic systems as follows (Dyer, 1988): 21 Symbolic Memory is static/structured Separate data and processing Rules are declarative Connectionist Basic categories supplied Fragile to noise/damage Programmed Memory is dynamic/reconstructive Data and processing merged Rules emerge statistically Basic categories emerge statistically Robust/lesionable Trained The connectionist approach does not equate a theory with a model. The theory must, of course, stand on its own and provide both descriptive and explanatory adequacy of a phenomenon without being dependent on a model. However, an understanding of the properties of connectionist networks may inspire or influence the development of a theory. This is done in much the same way that computational linguists have developed theories for parsing natural language based on the properties of symbolic-processing, serial computers (see Allen, 1987 and Powers & Turk, 1989, for a review). In this sense, connectionist theories are certainly more closely coupled to computational models than traditional theories from such fields as linguistics and psychology, in which the theories are developed and never intended to be implemented within a model. 3.2 The first generation connectionist model The first connectionist model of acquiring the past tense was developed by Rumelhart & McClelland (1986). Their theory stated that "lawful behavior and 22 judgments may be produced by a mechanism in which there is no explicit representation of the rule .... mechanisms that process language and make judgments of grammaticality are constructed in such a way that their performance is characterizable by rules, but that the rules themselves are not written in explicit form anywhere in the mechanism" (Rumelhart & McClelland, 1986, p. 217). Simply stated, they proposed that both regular and irregular verbs could be generated in a single mechanism. Their ambitious goal was to create a model that learned to accomplish this task. 
This was achieved using a simple, fully-connected, two layer connectionist network without hidden units that learned to relate the phonological form of a verb stem to the phonological form of its past tense. For example, if BAKE was presented on the input units, BAKED should have been generated on the output units (we use orthography here to represent these phonological codes for typographical convenience). The phonological representation of the stem and past tense forms in the model was composed of triples of values known as Wickelphones (Wickelgren, 1969). These triples consist of context-sensitive phoneme units including the phoneme itself, its predecessor, and its successor. A phoneme occurring at the beginning of a word is preceded by a special word boundary marker. Likewise, a phoneme occurring at the end of a word is succeeded by a word boundary marker. For example, BAKED consists of the phoneme string /bAkt/, which is comprised of the following Wickelphones where # corresponds to one of the word boundary markers: #bA, bAk, Akt, kt#. 23 Each phoneme can further be described as activations over the following articulatory constraints or features: interrupted, continuous, vowel, stop, nasal, fricative, sonorant, high, low, front, middle, back, long, short, voiced, and voiceless. The actual input and output units of the model correspond to Wickelfeatures, each of which is the conjunction of a single articulatory feature from the central phoneme, its predecessor, and its successor within a single Wickelphone. As an example, consider CAME which includes the Wickelphone kAm. /k/ has the features interrupted, back, stop, and unvoiced. /A/ is represented by vowel, front, low, and long, /m/ includes interrupted, front, nasal, and voiced. Thus, kAm is represented by the following Wickelfeatures (taken from Rumelhart & McClelland, 1986, p.237) in Table 1: Table 1: Wickelfeature representation for kAm Feature Preceding Central Following 1 interrupted vowel interrupted 2 back vowel front 3 stop vowel nasal 4 unvoiced vowel voiced 5 interrupted front vowel 6 back front front 7 stop front nasal 8 unvoiced front voiced 9 interrupted low interrupted 1 0 back low front 11 stop low nasal 1 2 unvoiced low voiced 13 interrupted long vowel 14 back long front 15 stop long nasal 16 unvoiced long voiced 24 By activating all of the Wickelfeatures for a word, a very distributed and unique representation can be created that allows similar sounding words to have similar representations. Also note that the ordering of the phonemes is implicit in this scheme, based on the context sensitive encoding of the Wickelphones. During training, each present tense was activated on the input units and its generated past tense was evaluated. Initially, the network contained random weights and thereby generated random past tenses. Weight correction proceeded according to the perception convergence procedure (Rosenblatt, 1959). For each set of input unit activations (corresponding to an individual verb stem), the weights to each output unit were raised or lowered incrementally to adjust the unit to generate its target value of either 0.0 or 1.0. Presentation and weight correction of each verb in the training set comprised one epoch. After approximately 200 epochs of training, the correct past tense was generated for most verb stems. Presentation of novel stems that the model had not been trained on resulted in correct past tenses being generated in most cases as well. 
The three stages of acquiring the English past tense were also modeled as described by Bowerman (1982) and Brown (1973). In the first stage, children produce both regular and irregular past tenses correctly. During the second stage, irregular verbs are “regularized”, and forms such as WENTED and GOED are produced. Both regular and irregular past tenses are again correctly produced by the third stage. This phenomenon is known as “u-shaped” learning and was accomplished in the model by manipulation of the training corpus. For the first 10 epochs, the 25 training set consisted of a majority of irregular verbs. At this point, both regular and irregular verbs were generated correctly by the model. During the intermediate epochs, the training set was adjusted to have a majority of regular verbs. This resulted in an overregularization of the irregular verbs that had been correctly generated during the first stage, since the model became overly productive of the regular marking with the large influx of regular items. The final epochs used a training set with an even larger majority of regular verbs. The model again produced both regular and irregular verbs correctly by learning that the regular marking does not apply to all verbs. This model was admittedly not intended to be a complete account of learning the past tense. It was a first generation connectionist model that used the simplest possible architecture and demonstrated a proof-of-concept that a single mechanism could exhibit “rule and rote” like behavior. In spite of being subjected to painstaking analysis and criticism as described in the next section, this model has provided the foundation for a connectionist theory of language and has inspired several other connectionist models that have moved the debate forward. 3.3 Response of the linguistic community The Rumelhart & McClelland (1986) model was not received well by the linguistic community. Responses ranged from suggesting that the model actually encoded rules to claiming that the English past tense is an uninteresting phenomenon to model. The most prominent critics were Pinker & Prince (1988) 26 and Lachter & Bever (1988) who leveled 13 specific criticisms against the model, the most serious of which are described below. Both Pinker & Prince (1988) and Lachter & Bever (1988) criticized Rumelhart & McClelland's (1986) account of u-shaped learning by claiming that the input to the child is not varied in the way they describe. The phonological representation was also criticized as being unrealistic and predefined to solve the task at hand (e.g. selecting only the 260 Wickelfeature units that were relevant for English). Positional information of segments was not encoded, which is crucial in capturing phonological regularities in irregular patterns (e.g. RING-RANG, SING-SANG). Furthermore, the model only encoded a phonology to phonology mapping of stem to past tense. Homophonous words such as RING and WRING would have the same stem representation but different past tense representations. This was clearly an impossible task for the model. The shortcomings pointed out by Pinker & Prince (1988) and Lachter & Bever (1988) were flaws that had to be addressed if connectionism was to be seriously considered as an explanation of linguistic phenomena. They felt that these criticisms in particular were serious enough to discount connectionism as having any kind of explanatory power with regard to language. 
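The corpus manipulation described above amounts to a staged presentation schedule. The sketch below is only a schematic illustration of that regime; the verb lists, class labels, and epoch boundaries are invented placeholders, not the actual Rumelhart & McClelland corpus.

```python
# Schematic of a staged training regime: a small, irregular-heavy vocabulary
# at first, then a large influx of regular verbs. All items are placeholders.
stage_one_verbs = ["go", "take", "come", "give", "get", "make", "look", "need"]
later_regulars = ["walk", "jump", "bake", "play", "call", "wash"] * 60  # many regular tokens

def corpus_for_epoch(epoch):
    # Epochs 0-9: mostly irregular, high-frequency verbs; both classes produced correctly.
    if epoch < 10:
        return stage_one_verbs
    # Later epochs: the same verbs plus a large majority of regulars, which
    # drives the overregularization (stage two) before recovery (stage three).
    return stage_one_verbs + later_regulars
```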
Pinker & Prince (1988) argued that the flaws in the model's performance could only be remedied by implementing a symbolic theory. 27 3.4 Second generation connectionist models 3.4.1 Plunkett & Marchman model In order to address Pinker & Prince’s (1988) and Lachter & Bever’s (1988) criticisms, Plunkett & Marchman (1991) developed a 3 layer, feed-forward connectionist model. This model addressed problems with the Wickelphonology representation of the Rumelhart & McClelland (1986) model by creating a new phonological representation for the input and output to the model. Plunkett & Marchman limited the input and output of this model to phonemic strings up to a maximum length of five. They also created an artificial language that was representative of English verbs so that they could tightly control the phonological representation of the verbs within a limited number of units. In this representation, each phoneme is specified by its activations over articulatory features which are similar to those used in the Rumelhart & McClelland model. Units in the input and output layers corresponded to each of these features and were repeated for each of the five possible phonemes. Thus the word CAME, which consists of the phoneme string /kAm/ would be represented by activating the interrupted, back, stop, and unvoiced features for the first phoneme; the vowel, front, low, and long features for the second phoneme; and the interrupted, front, nasal, and voiced features for the third phoneme. None of the features for the fourth or fifth phonemes would be activated. By using this representation, two specific criticisms of the Wickelfeature representation were addressed. First, the positional information of each phoneme 28 is explicitly rather than implicitly encoded in order to capture phonological regularities in irregular verb patterns. Second, the representation is not limited to English. It can represent strings of any phonemes including non-words. Training proceeded according to the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986). The training set was designed to accurately reflect the proportions of the different kinds of English past tense mappings. The following types of past tense mappings were learned by the model: regular (e.g. BAKE-BAKED), identity (e.g. HIT-HIT), vowel change (e.g. RING-RANG), and arbitrary (e.g. GO-WENT). The frequency that each past tense was presented to the model during training was based on the frequency of usage in children (Brown, 1973). U- shaped learning for individual irregular patterns was observed without manipulating the input corpus. However, macro u-shaped learning for all irregular patterns did not occur. This result runs counter to the data reported by Bowerman (1982) and Brown (1973), in which u-shaped learning was suggested to be a macro phenomenon affecting all irregular verbs. On the other hand, u- shaped learning for individual verbs without the presence of a macro phenomenon is reported in an overregularization study conducted by Marcus, Pinker, Ullman, Hollander, Rosen, & Xu (1993). Several simulations varied the type and token ratios of the regular and irregular classes in the corpus. Type frequency refers to the number of items in each class and token frequency refers to the number of times items from each class are presented to the network. For example, regular verbs may comprise 95% 29 of the types in the corpus, but only 50% of the tokens if regular verbs are not presented as often as irregular verbs. 
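The type/token distinction can be made concrete with a small sketch; the verbs and counts below are invented purely for illustration.

```python
from collections import Counter

# Hypothetical presentation log: one (verb, class) entry per training trial.
presentations = (
    [("bake", "regular")] * 2 + [("walk", "regular")] * 2 +
    [("play", "regular")] * 2 + [("call", "regular")] * 2 +
    [("go", "irregular")] * 20
)

type_counts = Counter(cls for _, cls in set(presentations))    # one count per distinct verb
token_counts = Counter(cls for _, cls in presentations)        # one count per presentation

print(type_counts)    # regulars dominate the types (4 regular vs. 1 irregular)
print(token_counts)   # irregulars dominate the tokens (8 regular vs. 20 irregular)
```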
Ratios for each verb class that were closest to the structure of English were learned best. The vowel change class, which is by far the largest irregular class, was highly disruptive to learning the dominant regular class. This effect was lessened considerably when phonological sub-regularities were allowed in the corpus. This fact makes a strong prediction about the acquisition of irregulars in other languages. A clean partitioning of the mapping space is assisted by properties that are predictive of class membership at the surface level. Plunkett and Marchman (1991) predict that languages which have more irregular than regular patterns (e.g. Arabic and German plurals) will be characterized as having highly productive cues for irregular items within a class. One such cue is phonological similarity, which occurs in the semi-productive sub-regular classes of English irregular verbs (e.g. RING-RANG, SING-SANG).

What remained to be seen was whether a connectionist model could replicate these results using a real English corpus. Perhaps the results were merely an artifact of using a highly controlled and restricted corpus. Although considerable effort was expended to make the artificial language representative of English, the possibility of successfully scaling up the model was unclear and thought to be a non-trivial task.

3.4.2 MacWhinney & Leinbach model

MacWhinney & Leinbach (1991) also addressed many of Pinker & Prince's (1988) and Lachter & Bever's (1988) criticisms with a connectionist model that implemented past tense learning using a real English corpus. The model built upon Rumelhart & McClelland's (1986) original model and replicated many of Plunkett & Marchman's (1991) findings. This was achieved with a 4-layer feed-forward architecture trained with the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986).

The phonological representation of the input and output units expanded on Plunkett & Marchman's encoding of phoneme strings. Words up to four syllables in length were represented within two phonological templates of their consonant (C) and vowel (V) positions as follows:

[Template illustration: the full-word template CCCVVCCCVVCCCVVCCCVVCCC holds a left-justified encoding of a word such as UNDERSTAND or SEE, while the smaller VVCCC template holds a right-justified encoding of its final vowel-consonant cluster.]

The VV representation allows for diphthongs and the CCC encoding allows for the maximum consonant cluster in English. Both consonants and vowels are left-justified within their respective positions of the larger template, which encodes the entire word. The smaller template holds a right-justified representation of the last VVCCC cluster only, in order to capture the similarities of the endings of verbs, such as OVERSEE-OVERSAW and SEE-SAW. Each slot of the template is represented with a set of articulatory features that were activated corresponding to the phoneme that resides in the slot. If the slot is empty, no features are activated. The training set for the model consisted of approximately 3000 English verbs, of which 180 were irregular. The verbs were presented to the model according to their Francis & Kucera (1982) word count frequencies, which are based on a written corpus of nearly one million words of English sentences. After training, the model was able to successfully address all but two of the 13 specific criticisms Pinker & Prince (1988) and Lachter & Bever (1988) aimed against the Rumelhart & McClelland (1986) model. All regular forms and 91% of the irregular forms were correctly generated by the model.
These results were due to the improved phonological representation that could represent a much larger training corpus and explicitly encode positional information. Even though the input corpus was static, overregularizations of individual verbs appeared after irregular forms had first been produced correctly. The overregularized forms were correctly produced again after additional training. These results support Plunkett & Marchman’s (1991) findings as well as the Marcus, Pinker, Ullman, Hollander, Rosen, & Xu (1993) study of overregularization in child acquisition. The model was also augmented with a set of semantic feature units to the input layer in order to distinguish between homophonous forms. Activation of semantic features along with the phonological features gave each homophone a unique input representation. For example, the network learned the past tense of RING conjoined with the semantic features of action, auditory-result, high-pitch, object-thing, and sharp-onset is RANG and the past tense of WRING conjoined 32 with action, durative, object-flexible, object-permeable, remove-liquid, use-of- hands, and torque is WRUNG. The success of this model showed that the improvements over the Rumelhart & McClelland (1986) model achieved by Plunkett & Marchman (1991) were not mere artifacts of an artificial language. It was shown that results with small-scale models that are representative of English might indeed scale up to complete models that operate on English in its entirety. This was encouraging because in the past tense domain, large-scale models such as MacWhinney & Leinbach’s (1991) are computationally intensive and require lengthy training times. Smaller models are much more reasonable and tractable given limited computer resources, and are justifiable as long as care is taken to make the training corpus truly representative of English. 3.4.3 Cottrell & Plunkett model One criticism not addressed by any of the above models is the direct access of past tense forms. That is, although the present tense is clearly used in learning the past tense, people can utter the past tense form of a verb without overtly accessing its present tense phonological form. Direct access would be a problem for the connectionist models described above, since the present tense form is must be used to generate the past tense form. This problem was addressed by the Cottrell & Plunkett (1991) model which generated both present and past tense forms of verbs from their meaning. Thus, this model learned a mapping from semantics (meaning) to phonology (present and past tense) rather than 33 previous models which learned a mapping from phonology (present tense) to phonology (past tense). A new potential problem not faced by the connectionist models previously described is the arbitrary relationship between input (meaning) and output (sound). Models that map from the present tense to the past tense benefit from the fact that the present tense is a very good indicator of the past tense because of their similarity. On the other hand, Pinker & Prince (1988) point out that semantics is not a good indicator of the past tense mapping: HIT, STRIKE, and SLAP are all similar in meaning but map to HIT (identity), STRUCK (vowel change), and SLAPPED (regular) respectively. They suggest that spurious generalizations will develop over the semantic units. Thus, in order to be correct, the model must not map similar inputs to similar outputs as previous past tense models have done. 
The inventory of verbs for the model was identical to the artificial language corpus used in Plunkett & Marchman (1991). A simple recurrent network (Elman, 1990) was used to generate a sequence of phonemes given an artificial semantic representation as input. The semantic representation for the verbs was generated from distortions on prototype vectors that were randomly generated. That is, verbs with similar meanings only varied from a prototype vector by a few units, thereby sharing most units of their representation. Two additional units on the input layer specified whether the stem or past tense should be generated. During training, the semantic representation and the tense units (present or past) were activated on the input units and the appropriate sequence of segments was generated one at a time. The phonological representation of the 34 output was created by Plunkett & Prince and combined articulatory features with an indication of a sonority hierarchy over 15 units. The model was able to directly generate either a present or past tense form from semantics. Since the same semantic pattern generated both the present and past tense, the model learned to relate both forms to each other based on the semantic representation alone. That is, neither the present or past tense was overtly accessed to generate the other form. Furthermore, verbs with similar meanings were trained to map to different past tense classes (i.e. regular, identity, and vowel change), thus showing that unwanted generalizations will not develop over the semantic units and that verbs with similar meaning need not map to similar past tense classes. During training, the model was only required to produce the past tense forms of half the corpus and the present tense forms of the other half. The remaining present and past tense forms were removed from the training set to test the model’s ability to generalize. After training, the semantics of each verb was activated on the input units along with a representation of the tense that the model was not trained on. Given that a vast majority of the verbs in the training corpus were regular, the regular past tense was generated for most verbs when the model was trained on its present tense form. Likewise, the model correctly produced the present tense of most verbs by removing the regular affix when its past tense form was known. 35 3.5 The current debate With the advent of the models described above, a connectionist theory of learning the past tense had evolved to challenge the traditional theory. The success of the original Rumelhart & McClelland (1986) model in providing a single mechanism account of learning regular and irregular past tenses was largely overshadowed by its inadequate phonological representation and its inaccurate account of overregularization. Plunkett & Marchman (1991) demonstrated a model that manipulated an artificial language similar to English. Not only did it address earlier criticisms, but it provided the interesting result that irregular English past tense classes could best be learned by a connectionist model if their type and token ratios approximated those in English. This result countered many critics’ claims that connectionist models are too powerful and that they can learn any mappings, including unrealistic ratios of irregular verbs, equally well as the actual ratios of English verb classes. 
MacWhinney & Leinbach (1991) updated the Rumelhart & McClelland (1986) model and addressed nearly all of its criticisms including the phonological representation, overregularization, and homophony. Cottrell & Plunkett (1991) solved the final remaining problem of direct access. Yet the connectionist theory is far from complete. No single existing model can address all the issues described above. But even if one did, it is not sufficient for a model to merely account for the data observed in people if the data are commensurate with both a connectionist theory and the traditional theory. Issues must be identified that distinguish between the theories. This was the thrust of Pinker & Prince's (1988) and Lachter & Bever's (1988) critiques of the original Rumelhart & McClelland (1986) model. Their objective was to demonstrate that a connectionist theory could not address key issues in past tense acquisition. Now that these issues have been successfully addressed, new issues have been identified to support the traditional theory.

During the time that the connectionist theory was evolving, the traditional theory was also evolving. Pinker & Prince (1988) formalized the traditional theory and Pinker (1991) presented a modified traditional theory which retains the lexicon and rule-generated regular past tenses, but replaces the rote memory system of irregulars with an associative memory that relates irregular stems with their past tenses, as shown in Figure 2.

[Figure 2: Modified traditional theory. A lexical entry (phonological code, semantics, grammatical category, regular/exception flag) feeds either the regular rule pathway or the exception associative memory.]

The associative memory is described as and behaves similarly to a connectionist network, in that all irregular past tenses are generated from their stems within a single mechanism. This modification was meant to account for certain phenomena regarding irregulars that are not explained with a rote memory system. For example, in a rote memory system there is no reason for irregular verbs to have similar stem and past tense forms, or for families of irregulars to develop (e.g. BLOW-BLEW, GROW-GREW or SING-SANG, RING-RANG). An associative memory allows such similar forms to attract and support one another, thereby simplifying the learning of irregular verbs. In this scheme, generation of an irregular past tense form is affected by the frequency of its stem, since more frequent verbs will be reinforced in the memory. Also, generation of any irregular can be assisted by similar sounding irregulars within its family. On the other hand, generation of a regular past tense is not based on frequency or similarity to other forms because all regular past tenses are generated by a separate rule mechanism.

Another reason for the associative memory is that it explains the fact that people sometimes generate irregular past tenses for novel verbs, such as SPLING-SPLUNG, because of the attraction to similar sounding irregular verbs, such as STRING-STRUNG. A rote memory system predicts that all novel forms would be regularized (e.g. SPLING-SPLINGED). It is important to note that the manner or degree by which the associative memory attracts other verbs is not described in detail or formalized in the modified traditional theory. Therefore, within this theory, predictions on the difficulty of generating past tense forms for irregular and novel verbs are hard to justify.
Marcus, Pinker, Ullman, Hollander, Rosen, & Xu (1993) show how this theory accounts for overregularization of irregular verbs. In addition to a detailed description of the overregularization phenomenon, they describe how overregularization occurs when the regular rule is not blocked. Families of similar sounding irregular verbs are less likely to be overregularized than isolated irregular verbs. They also claim that an irregular verb with many similar sounding regular verbs is no more likely to be overregularized than an irregular verb with no similar sounding regular verbs, thus suggesting that regular verbs do not form attracting clusters and are handled by a different mechanism than irregular verbs.

Kim, Pinker, Prince, & Prasada (1991) present evidence on subjects' well-formedness judgments of past-tense verbs that are derived from nouns and adjectives. They claim that judgment of such forms as "The batter flied/*flew out during his last at bat" is explained by the formal grammatical theory, which states that a regular/irregular feature exists for each verb that has been learned by the lexicon. If a verb is derived from a different grammatical category, then even if it is homophonous with an irregular verb, its proper past tense form is regular since the irregular feature does not exist for items that are not formally verbs. This explanation supports the modified traditional theory (Pinker, 1991) which proposes a lexicon containing grammatical category information.

Pinker (1991), Marcus, Pinker, Ullman, Hollander, Rosen, & Xu (1993), and Kim et al. (1991) retain the idea that a proper account of the past tense will have to include a rule governing regular forms such as LIKE-LIKED. This conclusion is based on a mass of evidence thought to implicate this rule. Insofar as connectionist models do not, by definition, incorporate this type of knowledge representation, connectionism cannot provide a complete account of the past tense. Pinker (1991) therefore opts for a mixed model employing both a rule and an associative network for irregulars. It would be important to determine whether a connectionist network can explain all of the relevant facts about the past tense or whether, as Pinker & Prince (1988) suggest, it will have to be supplemented by a rule, since these represent very different claims about linguistic knowledge.

3.6 A computational AI solution

Until recently, a distinct advantage of the connectionist theory was the capability to implement the theory in a computational model. This fact seemed somewhat odd, given that the traditional theory manipulates symbols, and that mainstream AI research within the field of Computer Science implements symbolic processing models. Ling & Marinov (in press) present a general purpose symbolic pattern associator (SPA) that learns the English past tense. Their model directly builds upon the ID3 algorithm (Quinlan, 1986), which is in the class of top-down induction of decision tree (TDIDT) learning systems made popular within AI. ID3 implements supervised learning and is not application specific. Knowledge is represented as decision trees which are easily converted into production rules. Previous successful applications of ID3 include weather prediction and medical diagnosis.

Given a training set, ID3 attempts to build a decision tree by recursively applying a divide and conquer strategy to the input space. Training exemplars consist of vectors of attributes v = [v1, v2, v3, ..., vn] and their classification.
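Before the worked example below, the following minimal sketch shows the kind of decision tree such a TDIDT learner induces. It is a generic illustration rather than Ling & Marinov's implementation, and it chooses the splitting attribute with a crude purity measure instead of ID3's information-gain criterion.

```python
from collections import Counter

def build_tree(examples, attributes):
    """examples: list of (attribute_dict, label) pairs; returns a nested dict tree."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                     # pure node: make a leaf
        return labels[0]
    if not attributes:                            # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def mixed_branches(attr):                     # crude stand-in for information gain
        branches = {}
        for features, label in examples:
            branches.setdefault(features[attr], set()).add(label)
        return sum(len(label_set) > 1 for label_set in branches.values())

    best = min(attributes, key=mixed_branches)
    subtree = {}
    for value in {features[best] for features, _ in examples}:
        subset = [(f, lab) for f, lab in examples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        subtree[value] = build_tree(subset, remaining)
    return {best: subtree}
```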
For a real example from Ling & Marinov, consider 4 attributes as follows: v1 = outlook {sunny, overcast, rain}, v2 = temperature {cool, mild, hot}, v3 = humidity {high, normal}, and v4 = windy {true, false}. Each vector corresponds to the weather conditions for a given day. The training set consists of weather conditions for a number of days and a classification of each day as being either suitable for playing golf (P) or unsuitable (N).

The divide and conquer strategy works as follows. ID3 picks an attribute to start building the tree. Assume that the attribute is "outlook", and that all training set vectors with a value of "overcast" for this attribute are classified as P. Then ID3 generates a branch labeled "overcast" that goes to a leaf with the value of P. Next, the algorithm will generate a branch labeled after one of the remaining values of the attribute "outlook", namely "sunny" or "rain". If training set exemplars that contain this value are not all classified the same, a different attribute is chosen to begin a subtree and the algorithm is recursively applied. The procedure continues until all training set exemplars can be classified by the resulting decision tree. Figure 3 gives one possible decision tree.

[Figure 3: Example decision tree. The root tests outlook; the sunny branch leads to a test of humidity (high, normal) and the rain branch leads to a test of windy (true, false).]

The goals are to use the tree to classify all examples in the training corpus as well as novel examples that may be encountered in the future. Small decision trees, which generalize best, are ensured by selecting the most informative attributes of the input to start building the tree.

One serious shortcoming of ID3 is that it is limited to the classification of exclusive patterns that do not overlap. The past tense is clearly a system with many overlapping patterns that must be classified together in order to properly generalize to novel occurrences. The SPA implemented by Ling & Marinov addresses this limitation with a clever combination of individual decision trees into a joint set of trees that can learn to associate present tenses with past tenses. Each tree merely generates a single phoneme of the past tense, given all the phonemes of the present tense as input. The input and output of the model are represented by left-to-right phonetic codes in the UNIBET format (MacWhinney, 1990), which allows single ASCII characters to represent each phoneme, rather than the distributed activations of features that are common to connectionist implementations.

During training, verb tokens were sampled according to their Francis & Kucera (1982) frequency estimates. The training set was gradually increased during learning to simulate the child's vocabulary increase. U-shaped learning of the irregular past tense was exhibited as described by Marcus, Pinker, Ullman, Hollander, Rosen, & Xu (1993). Upon completion of training, the model was able to produce the correct past tense for 99.2% of the regular and irregular verbs in the training set. A variety of novel regular and irregular verbs were tested on the trained model with very good results—approximately 90% were generated correctly.

Ling & Marinov present a lengthy comparison between the generalization ability of their SPA and the latest version of the MacWhinney & Leinbach (1991) connectionist model. Although this demonstration is meant to show the superiority of the SPA, it is problematical in the following manner. The connectionist model that is implemented for testing purposes is not identical to the one described by MacWhinney.
MacWhinney specifies that the input and output shall be represented by both left and right justified templates. This ensures that the model is given similar representations for similar sounding verbs— a crucial and necessary issue for connectionist models of the past tense. Ling & Marinov omit the right justified template “... since this information was redundant, and further, the left and right justified templates could contradict to each other in the output (p. 47).” It seems reasonable that if a head-to-head comparison is to be made between two models, the models should be implemented exactly as the authors specify. The very element that is omitted in the connectionist 43 implementation could lead to much better generalization performance, which MacWhinney reports. While ignoring MacWhinney’s model specifications, Ling & Marinov go on to state the best version of the SPA used letter rather than feature encoding, and did not use templates at all, but rather a left-to-right representation of letters. They conclude that distributed representations are therefore bad and that phonetic templates are not necessary in modeling the past tense. In the next chapter a new connectionist model will be presented that performs generalization tasks as well as the SPA. This model uses a distributed representation of phonetic features within a template specification. Yet even if a connectionist implementation can perform as well as the SPA, Ling & Marinov state that the SPA is superior because knowledge can be represented in explicit form; namely in production rules. This claim is puzzling, because it contradicts the entire debate between the traditional theory and the connectionist theory. If a model is superior merely because its knowledge can be explicitly represented, they why does the debate exist? The assertion that knowledge is explicitly represented in production rules is the very claim that is being challenged by connectionists. On the other hand, one might speculate that the decision tree generated by an SPA and a connectionist network are equivalent solutions to the past tense problem. After all, the goal of these implementations is to generate the appropriate past tense for both training set exemplars and novel verbs. It is important to note that the solutions are very different. Production rule systems are invariant to the frequency of occurrence of each exemplar. Furthermore, the 44 generation of rule-governed items is not affected by rules applying to similar sounding inputs. In Chapter 5, we will demonstrate that these characteristics of rule systems prevent them from accounting for certain behavioral data regarding the past tense. Connectionist networks rely on both frequency of occurrence and consistency of input to output mappings to learn the training set, and are well- suited for accounting for these data. O f course, we realize that the addition of probabilistic constraints to a production rule system could allow it to more closely mimic the behavior of a connectionist network. However, the current implementation of the SPA does not allow this. 3.7 Discussion A number of connectionist models have emerged to provide an alternative account to the traditional theory of the past tense. Although no single model is complete in itself, they all address certain criticisms and shortcomings of the connectionist account. The successes of these models have prompted Pinker and his colleagues to modify the traditional theory to account for a variety of past tense related phenomena. 
Indeed, to the connectionist community, the inclusion of an associative memory to handle irregular verbs appears to be a step toward a single-route theory that is similar to the connectionist account. Throughout the debate, Pinker and his colleagues have had the luxury of presenting a theory that was not implemented and thereby not testable to the same rigorous standards as connectionist models. The connectionist challenge to them was to develop a model of their theory. Ling & Marinov (in press) have answered 45 this challenge with an impressive first attempt at a symbolic computational model of the past tense. Their model leams a representative training set and generalizes well to novel verbs. They make strong claims about the superiority of their solution over all connectionist implementations. We believe their enthusiasm is premature. There are a variety of other means to evaluate models of the past tense. A wealth of data has been collected on human performance with respect to the past tense. Much of this data consists of reaction time measurements, indicating that certain past tenses are harder to generate than others. Ling & Marinov suggest a means to implement real-time processing within their model, but have not done so yet. The remainder of this thesis will present a variety of past tense phenomena that have been collected from subjects in psycholinguistic experiments. In many cases, the data have been taken to support the traditional theory. But we will develop several connectionist models that account for these data as well. Furthermore, we will present data that are supported by the connectionist account, but are difficult to account for with the traditional theory in its present form. 46 CHAPTER 4: THE IMPORTANCE OF ACCURATELY MODELING THE PHENOMENON This chapter introduces a new connectionist model of the past tense that is the foundation of the research in this thesis (see also Daugherty & Seidenberg, 1992; in press). An account of our motivation to develop a connectionist model begins the chapter, followed by a description of two sets of simulations which show the importance of accurately modeling the phenomenon one wishes to explain. 4.1 Motivation to model Our motivation to model the past tense is prompted by our understanding of the capabilities of connectionist networks. We view a connectionist theory of the past tense as the most parsimonious account of this phenomenon, in which both regular and irregular verbs are accounted for in a single, very general learning mechanisms. Furthermore, the modified traditional theory's (Pinker, 1991) concession of an associative network for irregular verbs seems to be a step toward a connectionist account and away from the traditional notion of a dual 47 process account. If the associative network could be expanded to also account for items that have previously been thought to be rule-govemed, at what point will the rule-govemed pathway no longer be necessary? This is the express purpose of developing connectionist networks— to show that the past tenses for all verbs can be generated in a single mechanism. The development of a connectionist model gives us an explicit means of formulation and allows us to empirically test the theory that both regular and irregular past tenses are generated in a single mechanism. As the model is altered to account for the phenomena (e.g. different training corpus, different architecture, varying degree of innate constraints, etc.), a better understanding of the theory may be achieved. 
4.2 Choice of architecture and learning algorithm

Although we are aware of many alternatives, we chose to use a multi-layer perceptron architecture with feed-forward connections [1] and the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986; see also Appendix A) in our model, for reasons described below. The single-layer perceptron was shown to have severe computational limitations by Minsky & Papert (1969), while Plunkett & Marchman (1991) demonstrated that it had inadequate resources to learn the past tense mapping problem. Multi-layer perceptrons trained with backpropagation have been demonstrated to capture regularities within the hidden units such as sensitivity to input features (Hinton, 1986) and acquisition of principal components (Cottrell, Munro, & Zipser, 1987). [1 In Chapter 6 we implement the model as a recurrent network, and show that the results between the feed-forward network and the recurrent network are equivalent.]

Limiting the connections between units to be strictly feed-forward allows an output to be generated in the model by calculating the activations of all units only once during a single iteration. This is computationally much less intensive than architectures with less-restrictive recurrent connections such as Elman's simple recurrent network (1990), Jordan's motor control network (1986), or Wang's temporal sequence learning network (1991), which must compute an output or sequence of outputs over several iterations of the network. On the other hand, recurrent networks allow the development of attractors which enable the network to more easily learn arbitrary associations (Hinton & Shallice, 1989; Plaut, 1991). These networks accept an input and iteratively compute an output, refining and correcting it, until the output stabilizes over time. The activations of every unit must be calculated during each iteration of the network. The computed output of a backpropagation network can be thought of as the first iteration of an attractor network.

We typically evaluate our networks in two ways. First, we judge how well the network learns a training corpus, and how well it applies the knowledge it has learned to a generalization task. At this level of analysis, it is sufficient to observe the correct and incorrect responses the network makes, and compare them to human responses. But some of our analyses require a more refined evaluation of the generated output. For instance, some past tenses are harder for people to generate than others, as measured by their response latencies in psycholinguistic experiments. Thus, a second method we use to evaluate the difficulty of a generated past tense is to measure the sum of squared error between the output units of the generated past tense and the ideal, or target, past tense. A large error corresponds to a long reaction time in a human subject. Although we choose to use a feed-forward network for the results in this chapter and Chapter 5, we show in Chapter 6 that our claim of using error score to predict reaction time is valid—we implement this model as a recurrent network and show that the number of cycles to settle to a response corresponds to the naming latency in human subjects.

A recent, alternative view to a single connectionist network is Jacobs, Jordan, & Barto's (1991) competitive modular architecture. In this system, associative and competitive learning is combined to learn task decomposition using multiple networks.
Thus, with this architecture, the regular mapping could be learned by one component and the irregular mapping by a different component. Although it has shown promise in some domains, we do not implement a past tense model with this architecture. Proponents of the traditional theory have already implied that a connectionist model of this kind would merely be an implementation of the traditional theory (Marcus, Brinkman, Clahsen, Wiese, Woest, & Pinker, 1993; Ling & Marinov, in press). However, they assume that the tasks will be decomposed according to the regular and irregular pathways of the traditional theory. It is conceivable that a different partitioning of the training set will emerge. The backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986) is perhaps the most common procedure in connectionist modeling to train feed-forward networks2. The performance of the network is evaluated by 2 See A ppendix A for a form al description of the backpropagation learning algorithm. 50 summing the squared difference between each output unit activation and its target activation, for every training set pattern. The larger the sum of squared error, the worse the model’s performance in generating the correct output values. Since the weights between units determine the generated output, the learning algorithm must incrementally adjust the weights to reduce the error upon the next presentation of the input. In this algorithm, each weight change calculation is computed assuming that it is the only weight that will be changed. As long as the weight change is small enough, the combined changes of all weights will yield the desired improvement in network performance. The partial derivative of the error with respect to weight is explicitly calculated. This derivative is then propagated backward to proportionally adjust the weights according to the chain rule. Although not biologically plausible because of its dependence on backward information flow, backpropagation can be viewed as one of many techniques for performing gradient descent learning. In this manner, it is claimed that backpropagation is an efficient means to develop representations and exhibit properties that would develop with a more plausible learning procedure. One such procedure is contrastive Hebbian learning, which is implemented on a Deterministic Boltzman Machine (Peterson & Anderson, 1987; Hinton, 1989a), or DBM. In this architecture, sets of input and output units are specified in which every connection is bi-directional and each weight is symmetric. As a set of input unit activations is clamped, the activations of the output units are repeatedly updated, and the network eventually settles into a state that is the minimum of a specified energy function. If two units have a positive 51 weight between them and they both have positive activations, the network energy is decreased. If the activations have opposite signs, the network energy will be increased. The process of settling to a good minimum is ensured by a procedure known as simulated annealing (Kirkpatrick et al., 1983), in which only the units that are most strongly constrained to have positive or negative states become active early in the settling process. As the network settles, units requiring less input are allowed to become active. The net effect is that the network becomes progressively more sensitive to subtle constraints between the input and output units during settling. . 
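For concreteness, here is a minimal sketch of the gradient-descent weight update that backpropagation performs, as described earlier in this section. It handles a single pattern in a three-layer network with sigmoid units; it is illustrative only and is not the BP simulator used for the simulations reported later.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, target, W1, W2, lr=0.1):
    """One forward/backward pass; returns updated weights and the summed squared error."""
    hidden = sigmoid(W1 @ x)                     # forward pass: input -> hidden
    output = sigmoid(W2 @ hidden)                # forward pass: hidden -> output

    error = output - target
    sse = float(np.sum(error ** 2))              # performance measure described in the text

    # Error derivatives propagated backward through the chain rule
    delta_out = error * output * (1.0 - output)
    delta_hidden = (W2.T @ delta_out) * hidden * (1.0 - hidden)

    # Small, independent changes in the direction of the negative gradient
    W2 = W2 - lr * np.outer(delta_out, hidden)
    W1 = W1 - lr * np.outer(delta_hidden, x)
    return W1, W2, sse
```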
Weight correction in the DBM is specified by contrastive Hebbian learning, in which the network generates two phases for each input. The negative phase is the settling process described above when only the input activations are clamped. The positive phase operates in the same way, except the output unit activations are clamped as well. Weights are incrementally changed to minimize the difference in the unit activations between the positive and negative states. These changes make the network more likely to generate the correct output unit activations on the next presentation of the input. Contrastive Hebbian learning is thought to be more biologically plausible than backpropagation because it does not depend on backward information flow of output unit error derivatives. Target activations for the output units are propagated throughout the network in the same manner as input unit activations. The symmetric connections between units do not correspond to actual neuronal connections, but rather correspond to reciprocal pathways between different brain 52 areas. Furthermore, this algorithm is based on Hebbian learning (Hebb, 1949), which has been shown to exist in biological neural systems (Lynch, McGaugh, & Weinberger, 1984). Plaut (1991) conducted extensive comparisons of learning algorithms on a network that modeled deep dyslexia. This network learned arbitrary mappings from orthography to semantics to phonology. For the initial results, the model was trained with the backpropagation learning algorithm. A later version was trained using contrastive Hebbian learning. Plaut found that the learning algorithm had little effect on the model’s ability to master the required mappings. Both networks could account for the data of interest and behaved similarly, although the network trained with contrastive Hebbian learning required considerably more computational resources. We believe a feed-forward network trained with the backpropagation learning algorithm can provide an approximate solution to the past tense mapping problem. Its low computational demand, proven ability to extract regularities, success in many cognitive domains, and established ability to implement the connectionist theory of the past tense make it a good candidate for providing the set of tools for our initial foray into connectionist modeling. 4.3 Model implementation details The model presented in this chapter and the next is a simple 3 layer feed forward network that accepts the phonological form of the present tense of English monosyllable verbs and produces the phonological form of their 53 respective monosyllable past tenses. We do not claim that people necessarily access the present tense in an overt manner when uttering a past tense in normal speech. This model accounts for very specific behavioral data obtained from a psycholinguistic experiment in which a subject is given a present tense verb on a computer screen and is asked to utter the past tense as quickly as possible (Seidenberg & Bruck, 1990). People make systematic mistakes in generating the past tense. Also, certain past tenses take longer for subjects to generate than others. These are precisely the data to be addressed in the model. 4.3.1 Phonological representation The phonological representation is similar to one used by MacWhinney, Leinbach, Taraban, & McDonald (1989) and Cottrell & Plunkett (1991). 120 units on the input layer encode a CCCVVCCC syllabic template, which is the maximum structure an English monosyllable can take (i.e. 
a tri-consonant onset, a diphthong nucleus, and a tri-consonant coda). Each phonemic segment is represented by an activation over 15 binary articulatory feature units: back, tense, labial, coronal, velar, nasal, sibilant, voiced, and 7 units to encode a sonority hierarchy ranging in value from 1 to 7. This scheme was developed by Plunkett and Prince and modified by Hare, and represents a plausible compromise among various proposals within phonetics. If a feature exists for a segment, its value is set to 1.0; if not, its value is set to 0.0. The phonemic segments are represented as shown in Table 2.

Table 2: Articulatory features for phonological segments

ae  voiced, sonority 7                     ng  velar, nasal, voiced, sonority 3
A   back, voiced, sonority 7               f   labial, sonority 2
a   back, tense, voiced, sonority 7        v   labial, voiced, sonority 2
e   tense, voiced, sonority 6              s   coronal, sibilant, sonority 2
E   voiced, sonority 6                     S   coronal, sibilant, sonority 3
o   back, tense, voiced, sonority 6        z   coronal, sibilant, voiced, sonority 2
O   back, voiced, sonority 6               Z   coronal, sibilant, voiced, sonority 3
i   tense, voiced, sonority 5              th  coronal, sonority 2
I   voiced, sonority 5                     TH  coronal, voiced, sonority 2
u   back, tense, voiced, sonority 5        p   labial, sonority 1
U   back, voiced, sonority 5               b   labial, voiced, sonority 1
y   voiced, sonority 4                     t   coronal, sonority 1
w   back, voiced, sonority 4               tS  coronal, sibilant, sonority 1
h   sonority 4                             d   coronal, voiced, sonority 1
r   coronal, voiced, sonority 4            dZ  coronal, sibilant, voiced, sonority 1
l   coronal, voiced, sonority 3            k   velar, sonority 1
m   labial, nasal, voiced, sonority 3      g   velar, voiced, sonority 1
n   coronal, nasal, voiced, sonority 3

The phonological representation is centered on the nucleus of the syllable as shown in Figure 4. Hence, the vowel and final consonant clusters, or rimes, of the words BLACK and BACK receive the same representation. Aligning the representations on the rimes (i.e. nucleus and coda combination) was thought to be desirable because of the perceptual salience of these units in English. Pinker & Prince (1988) noted that most irregular patterns are shared across several rhyming or near-rhyming stems (e.g. RING-RANG, SING-SANG and BLOW-BLEW, GROW-GREW). The Seidenberg & McClelland (1989) model of word pronunciation analyzed the role of rimes in the consistency of spelling-sound correspondences. By centering the phonological representation on the nucleus, similar sounding words have similar representations. For unused segments in a word representation, the units are set to 0.0.

[Figure 4: Phonological representation of a syllable. BLACK and BACK are aligned on the CCCVVCCC template so that their shared rime (vowel plus final consonant) occupies the same slots.]

One advantage of using this representation is that consonants and vowels are encoded in the same manner. In other representations, different sets of features are used for consonants and vowels, in effect building knowledge into the network by restricting different kinds of phonemes to specific positions within the syllable. In keeping with the central theme of this thesis, we do not wish to build too much knowledge into the network to assist the learning process. This was a major criticism of the Wickelfeature representation in Rumelhart & McClelland's (1986) original model. We are interested in addressing the past tense with a very general representation, network architecture, and learning algorithm.
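A minimal sketch of how a monosyllable might be laid out on the CCCVVCCC template using the feature inventory of Table 2. The exact slot alignment (onset right-justified against the nucleus, coda left-justified) and the one-of-seven sonority coding are assumptions made here for illustration; only the feature labels themselves come from Table 2.

```python
FEATURES = ["back", "tense", "labial", "coronal", "velar", "nasal", "sibilant", "voiced"]

SEGMENTS = {                       # small subset of Table 2: (features, sonority)
    "b": ({"labial", "voiced"}, 1),
    "l": ({"coronal", "voiced"}, 3),
    "ae": ({"voiced"}, 7),
    "k": ({"velar"}, 1),
}

def segment_units(seg):
    feats, sonority = SEGMENTS[seg]
    units = [1.0 if f in feats else 0.0 for f in FEATURES]
    units += [1.0 if i == sonority - 1 else 0.0 for i in range(7)]   # assumed one-of-7 sonority code
    return units                                                      # 15 units per slot

def encode(onset, nucleus, coda):
    """Fill the CCCVVCCC template, aligning words on the nucleus."""
    empty = [0.0] * 15
    slots = [empty] * (3 - len(onset)) + [segment_units(s) for s in onset]       # onset
    slots += [segment_units(s) for s in nucleus] + [empty] * (2 - len(nucleus))  # nucleus
    slots += [segment_units(s) for s in coda] + [empty] * (3 - len(coda))        # coda
    return [unit for slot in slots for unit in slot]                             # 8 x 15 = 120 units

# BLACK and BACK share every unit from the nucleus onward (the rime).
black = encode(["b", "l"], ["ae"], ["k"])
back = encode(["b"], ["ae"], ["k"])
assert black[45:] == back[45:]
```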
This number was arrived at by running a number of simulations that varied the number of hidden units. Fewer hidden units allowed the training set to be learned, but proper generalization to novel forms suffered. More hidden units caused the training set to not be learned as well, but generalization was good. 200 units in the hidden layer provided a compromise in performance between learning the training set and generalization to novel forms. Every unit in the input layer is connected to every unit in the hidden layer making the layers fully connected. 120 units in the output layer have an identical interpretation to the input layer and are fully connected to the hidden layer. 4.3.3 Training and scoring output The BP program developed by McClelland & Rumelhart (1988) was used for all our simulations which were run on a SUN Microsystems IPX. The phonological form of a present tense stem (e.g. BAKE) is activated on the input units and the model's task is to generate the phonological form of the past tense (e.g., BAKED) on the output units. We do not address the direct access problem 57 in this preliminary model. In Chapter 7, however, we will demonstrate a model that uses both phonological and semantic information to generate the past tense. input layer hidden layer output layer OOP ■ ■ OOP Figure 5: Architecture of the model During the training phase, the model was presented with verb stems. The frequency of exposure for each stem was determined by the logarithm of the verb’ s Francis & Kucera (1982) frequency, which are based on a written corpus of normal adult sentence usage that is over one million words in length. We chose to compress the frequencies because the Francis & Kucera word count frequencies for verbs range from 1 to approximately 25,000. If we used the raw frequencies in the simulation, we would have to present the most frequent verbs 25,000 times more often than the least frequent verbs during training. A logarithmic compression allows us to convert the frequencies into a more manageable range, while preserving the distinction between low frequency items. One other reason for compressing the frequency range is that we wanted the model’s training regime to approximate the early stages of children learning 58 the past tense, even though the model’s performance is to account for data from adults. Children begin to learn the past tense mapping during the early stages of language learning, at which time the relative frequencies of the words in a child’s vocabulary span a much narrower range than adult vocabularies. Compressing the frequencies in adult usage allows the model to approximate this range during the time that is critical for learning the past tense. The generated past tense phonological form on the output layer was compared to the desired or target past tense on a unit by unit basis. In scoring the model's performance, we determined for each phonemic segment on the output layer whether the best fit to the computed output was provided by the correct target segment. The output pattern was scored as correct only if the correct target segments provided the best fit for all segments in a word. We also calculated the mean error score for all units in the output as a measure of goodness of fit. Training progressed until the number of correctly generated past tense outputs reached asymptote. 4.4 Simulation 1: Initial corpus and results Two simulations using the above architecture but with different training sets were performed. 
The input training corpus was extracted from an on-line computerized dictionary. In the first simulation, we wished to include as many verbs as possible to realistically represent an adult speaker's vocabulary of verbs. 59 4.4.1 Training corpus Since the model was limited to a single syllable representation, all monosyllabic present-past tense pairs with Francis and Kucera frequencies greater than 1 were selected from the on-line dictionary and converted into the phonological representation described above. This included 309 verbs with regular past tenses and 104 verbs with irregular past tenses. 112 verbs with frequency = 1 were reserved for testing the trained network's capacity to generalize on novel items. See Appendix C for a complete list of the training set (section C .l) and generalization set (section C.3) for this model. These verbs were all regular, which is to be expected since there are very few low frequency irregular verbs. We wish to point out that since this version of the model only allows a single syllable input and output, monosyllabic verb stems ending in It/ or /d/ could not be included in the training set, because their past tense is a two syllable verb (e.g. PAINT-PAINTED, RID-RIDDED). We felt that this was not a serious omission because it only affects very few regular verbs and would not drastically alter their number in the training set. Also, in Chapter 7, we extend the model to allow two syllable outputs, and replicate the results in this chapter The present/past pairs in the training corpus were probabilistically presented during training according to the logarithm of their frequency. Training was done in epochs, during which each item of the training set was considered and potentially presented to the model, depending on its frequency. The most frequent pairs were presented once per epoch; the least frequent once per 100 60 epochs. The model was trained on this corpus for 400 epochs, at which point learning approached asymptote. The weights were frozen and the training phase was completed. The results below were averaged over three training sessions with random initial weights. 4.4.2 Performance on training set verbs Each of the present tense verb stems in the training corpus was presented to the trained model and a past tense form was generated. This past tense was compared to the target or correct past tense and only complete segment-by - segment matches were considered to have been learned. The model learned to correctly produce all 309 (100%) of the regular past tenses and 86 of 102 (84%) of the irregular forms. Errors on the irregulars included regularization errors (FALL-FALLED), no change errors (GET-GET, analogous to HIT-HIT), and vowel errors (HIDE-HED). All of these error types are very common in children and have been observed in adults during psyeholinguistic experiments. 4.4.3 Performance on novel verbs Next, the 112 additional regular present tense verb stems were presented to the model to assess its capacity to generalize. None of these forms had been seen by the model during training. The generated past tense forms were compared to their respective target past tense forms. The model produced the regular past tense for 84 of 112 (77%) of these items. The two most frequent errors were no 61 change (PEEK-PEEK) and assimilation with phonologically-similar irregular past tenses in the training set (e.g., SEEP-SEPT, which is similar to SWEEP-SWEPT, and WRITHE-WROTHE, which is similar to WRITE-WROTE). 
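The segment-by-segment best-fit scoring described in Section 4.3.3, and used to classify the responses above, might be sketched as follows. The function names and the inventory argument are assumptions for illustration; the inventory should include an all-zero entry for empty slots.

```python
def nearest_segment(slot_units, inventory):
    """Return the segment whose 15-unit feature vector best fits one output slot."""
    def distance(units):
        return sum((o - u) ** 2 for o, u in zip(slot_units, units))
    return min(inventory, key=lambda seg: distance(inventory[seg]))

def word_correct(output_units, target_segments, inventory, slot_size=15):
    """A response counts as correct only if the best fit matches the target in every slot."""
    slots = [output_units[i:i + slot_size] for i in range(0, len(output_units), slot_size)]
    produced = [nearest_segment(slot, inventory) for slot in slots]
    return produced == target_segments
```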
4.4.4 Summary of results

To summarize, the results of the initial simulation were mixed. The model learned to correctly produce all of the regular items in the training set and a high proportion of the irregular items; it produced plausible errors as compared to errors subjects make during experiments, and correct output on most generalization trials. However, the number of errors made on both the irregulars in the training set and in the generalization trials is much higher than that observed in people during psycholinguistic experiments (Bybee & Moder, 1983). In the experiments, subjects are given time pressure and asked to utter the past tense when a verb stem appears on a computer screen. Although people sometimes utter an incorrect past tense for either kind of verb, it is a rare occurrence. Also, people overwhelmingly prefer to utter the regular past tense for novel verbs.

4.5 Hints for a better simulation

At this point, we noticed several hints that the deficiencies in the model's performance appeared to be principally due to the large number of irregular items in the training corpus. The model failed to master all of the irregulars in the training corpus. Moreover, many of the errors on the generalization trials seemed to occur because the model was affected by phonologically-similar irregular past tenses in the training corpus. In order to assess the effects of the proportion of irregular items on performance, we conducted a series of simulations in which we varied the number of irregular verbs in the training set from 0 to 104, while keeping the number of regular verbs constant at 309. Each simulation was performed three times and the results were averaged, as shown in Figure 6.

[Figure 6: Generalization errors vs. number of irregulars in training set. The number of errors on the 112 novel regular verbs (y-axis, 0 to 40) rises as the number of irregular verbs in the training set (x-axis, 0 to 120) increases.]

The 112 novel regular verbs were presented to the trained simulations as generalization trials. As seen in the figure, the number of errors was related to the number of irregular verbs in the training set. This indicates that irregular verbs which have been learned create interference when generating regular past tenses.

The stimuli used in the generalization tests all require the regular past tense; however, some of them are entirely regular with respect to the training corpus whereas some are regular but inconsistent. Of the 112 novel regular verbs used for the generalization trials, the number of entirely regular verbs vs. the number of regular inconsistent verbs varies depending on how many irregulars are in the training set. For instance, when there are no irregulars in the training set, all 112 novel verbs are entirely regular. But, when there are 104 irregulars in the training set, 23 of the 112 novel verbs are regular inconsistent and 89 are entirely regular. Figure 7 shows the percentage of errors for both entirely regular and regular inconsistent generalization trials. The percentage of errors on entirely regular novel verbs remained largely invariant as the number of irregular verbs was increased. However, the percentage of errors on regular inconsistent novel verbs was affected by the number of irregular verbs in the training set. This is reasonable because regular inconsistent verbs have irregular neighbors that map to non-regular past tenses.
As more irregular verbs are present in the training set, they increasingly attract similar-sounding regular inconsistent verbs and interfere with the generation of the regular past tense. Entirely regular verbs are unaffected because they do not have irregular verb neighbors. Together, these findings suggested that the large number of irregular verbs in the training set was adversely affecting performance.

[Figure 7: Breakdown of generalization errors (x-axis: number of irregulars in the training set, 0-120; y-axis: percentage of errors, plotted separately for entirely regular and regular inconsistent novel verbs).]

4.6 Simulation 2: A more realistic corpus and results

We then compared the type and token frequencies in our corpus to those in the language at large. Type frequency refers to the number of items in each class (i.e., regular vs. irregular) and token frequency refers to the number of times items from each class appear in a corpus. An analysis of the Francis & Kucera (1982) sample revealed that irregular verbs comprise 5% of all verb types listed there and 22% of the verb tokens. In the corpus employed in the first simulation, 25% of the verb types were irregular and they accounted for 65% of the tokens presented during training. Thus, irregular items were overrepresented in our training corpus compared to the language as a whole. Other factors contributed to the overrepresentation of irregulars as well. The model's architecture only permitted the representation of monosyllabic words, and the proportion of irregular verbs is higher among monosyllabic words than among polysyllabic words (because most irregular verbs are monosyllabic). Finally, regular verbs predominate in the lower frequency range; the training corpus was restricted to items with frequency > 1, meaning that the many regular but very low frequency verbs were excluded. In sum, the large number of irregular items in the training corpus had a negative impact on the model's performance, and these items were overrepresented relative to the language.

4.6.1 Training corpus

A new training set was then constructed with the goal of maintaining more realistic proportions of regular and irregular verbs. We also attempted to represent the different classes of irregular verbs accurately. Pinker & Prince (1988) identified 25 classes and sub-classes of irregular verbs, which we collapsed into 5 major classes reflecting the most important subtypes. The classes were no change (HIT-HIT), internal vowel change (MEET-MET), internal vowel change plus consonant (LEAVE-LEFT), suppletion (GO-WENT), and consonant change (SEND-SENT). We then devised the training set so that the proportions of these subtypes matched the proportions in the Francis & Kucera (1982) corpus. The new training set included 309 regular verbs and 24 irregular verbs. See Appendix C, section C.2 for a complete list of these training set items. The number of irregular verbs had to be relatively small in order to maintain the correct proportions while keeping the overall size of the training corpus within manageable limits.

4.6.2 Performance on training set verbs

The model was trained as in the previous simulation for 500 epochs, at which point performance approached asymptote. The following results reflect averages over three training sessions with random initial weights. All regular verbs in the training set were learned, as before. 22 of the 24 irregular verbs in the training set (92%) were learned, better than in the previous simulation.
The 2 errors on irregular verbs were FALL-FELLED, an overregularization error, and WIN-WAN, a vowel error. We note that FALL-FELLED is actually a doubly marked past tense, indicating that the model was able to learn the appropriate irregular past tense but applied the regular marking as well. We attribute this error to the strength of the regular neighbors of FALL, such as CALL-CALLED, STALL-STALLED, and MAUL-MAULED, in addition to the lack of any irregular neighbors. The WIN-WAN vowel error can be attributed to the attraction of similar-sounding regular verbs such as WARN-WARNED, WALK-WALKED, and WATCH-WATCHED. We discuss the errors made by the model further in the general discussion section.

4.6.3 Performance on novel verbs

To test the model's performance on novel verbs, we used the same generalization set as in the previous simulation (see Appendix C, section C.3 for a complete list of these items). On these trials, 106 of 112 regular past tenses (95%) were correctly generated, an improvement over the first simulation and a rate that compares well with that of people. The 6 past tenses that were incorrectly generated were MERGE-MERGT, BROOK-BROOK, WHINE-WHOUND, CLINK-CLANGT, WANE-WONE, and MEW-VIEW. The first reflects a substitution of /t/ for /d/ (i.e., incorrect voicing) and the second is a no-change error, which can be attributed to the similarity between /k/ and /t/ and the fact that most no-change verbs end with /t/ (e.g. HIT, CAST, BEAT). The model appears to be overgeneralizing its knowledge of no-change verbs in this case. The other errors are a variety of vowel, consonant, and no-change errors. As before, some of these can be described as assimilation with irregular verbs in the training set. WHINE-WHOUND is similar to WIND-WOUND, CLINK-CLANGT is similar to CLING-CLANG, and WANE-WONE is similar to WAKE-WOKE. Subjects in behavioral experiments produce many of these responses as well (e.g., Bybee & Moder, 1983). We discuss the errors made by the model further in the general discussion section.

In addition to novel regular verbs, we tested the model on a set of 46 novel irregular verbs (see Appendix C, section C.4 for a complete list of these items). Although these verbs are irregular, the expected result is that they be regularized, since the regular past tense is the dominant mapping. On these trials, 36 of 46 irregular past tenses (78%) were correctly generated as regular past tenses. The 10 past tenses that were incorrectly generated were BRING-BRANG, STINK-STANK, SHRINK-SHRANKED, THINK-THANKED, STRING-STRUNG, DRAW-DREW, CLING-CLINGED, SLING-SLENGED, SWING-SWENG, and SHINE-SINED. Note that 5 of these carry the regular marking, and are therefore doubly marked past tenses.

Many of the errors can be attributed to assimilation with families of irregulars in the training set. BRING-BRANG, STINK-STANK, SHRINK-SHRANKED, and THINK-THANKED are phonologically very similar to the irregular cluster of verbs comprising RING-RANG, SING-SANG, DRINK-DRANK, and SPRING-SPRANG. Likewise, STRING-STRUNG is very similar to the irregular cluster comprising SPIN-SPUN and WIN-WON, and DRAW-DREW is very similar to the irregular cluster comprising BLOW-BLEW, GROW-GREW, and THROW-THREW. Some of the other errors can be attributed to blends between the correct past tense and a similar-sounding irregular cluster.
CLING-CLENG, SLING-SLENGED, and SWING-SWENG are all close to the RING-RANG, SING-SANG, DRINK-DRANK, SPRING-SPRANG cluster, and the generated vowel 'E' in the past tense is a plausible blend between the 'I' vowel of the expected regular past tense and the 'A' vowel of the past tenses of the verbs in the cluster. The remaining error, SHINE-SINED, is merely a featural error in the generated past tense, since 'S' differs from 'SH' by only a single feature.

4.6.5 Summary of results

Changing the corpus so that it better reflected the facts about the distribution of regular and irregular types and tokens in the language yielded better simulation results. The model continued to master the regular items and over 90% of the irregulars, and there was better generalization on novel verbs.

4.7 Analysis of the phonological components

Although we have demonstrated that a connectionist model can learn both regular and irregular past tenses in a single mechanism and can generalize its knowledge to novel inputs, we do not yet have a clear understanding of how the model accomplishes these tasks. Intuitively, learning the regular past tense appears to be a much simpler task than learning the irregular past tense. Each phoneme from the onset, vowel, and coda clusters of the present tense is merely copied to the same position in the output and the regular affix is appended. Thus, the task could be described as componential in nature, since phonemes from specific clusters in the input provide an unambiguous cue for the correct phonemes in the output. For irregular verbs, however, the task appears to be much more difficult. Many irregular past tenses change the vowel from the input to the output. Thus, the task is less componential because a conjunction of the onset, vowel, and coda clusters in the input provides the information necessary to generate the correct past tense.

We predict that this difference in componentiality between regular and irregular verbs can be observed in the performance of the model. Following an idea proposed by Plaut & McClelland (1993), we compose test items of regulars and irregulars from the training set that are missing either their onset, vowel, or coda cluster. We accomplish this by setting the values of all units in the affected cluster to zero in the input. When these items are presented to the trained model, we examine the error score of the onset, vowel, and coda clusters of the generated past tenses relative to the expected target values. Table 3 displays the average error scores for regular and irregular verbs in the training set.

Table 3: Average error score in phonological clusters with partial inputs

                                         Past tense cluster
  Present tense cluster removed       Onset    Vowel    Coda
  Regular verbs
    Onset                              5.33     0.09    0.02
    Vowel                              0.18     1.94    0.12
    Coda                               0.02     0.38    5.53
  Irregular verbs
    Onset                              5.93     1.07    0.11
    Vowel                              0.10     1.06    0.19
    Coda                               0.00     1.33    3.34

The high error scores down the diagonals of the table for both regular and irregular verbs are to be expected, indicating that each phonological cluster in the past tense depends on the corresponding cluster in the present tense. We additionally observe that for regular verbs, the clusters in the past tense are largely insensitive to non-corresponding clusters in the present tense, although there is a very slight dependency between the vowel and the coda. These facts support our notion of the componential mapping of the regular verbs. The only information necessary to generate an onset, vowel, or coda in the regular past tense is the corresponding cluster in the present tense.
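A minimal sketch of the partial-input probe just described is given below, under the assumption that inputs and targets are numpy arrays and that the onset, vowel, and coda occupy contiguous slices of the input and output vectors. The slice boundaries and the model.forward call are illustrative placeholders rather than the actual encoding or simulator interface.

    import numpy as np

    CLUSTERS = {"onset": slice(0, 45), "vowel": slice(45, 60), "coda": slice(60, 120)}
    # The slice boundaries above are assumptions for illustration; the real
    # layout depends on the phonological representation described earlier.

    def cluster_error(model, stems, targets, removed):
        # Zero out one input cluster, run the network, and accumulate the
        # summed squared error separately for each output cluster (cf. Table 3).
        errors = {name: 0.0 for name in CLUSTERS}
        for x, t in zip(stems, targets):
            x = x.copy()
            x[CLUSTERS[removed]] = 0.0           # delete the onset, vowel, or coda
            y = model.forward(x)                  # hypothetical forward pass
            for name, sl in CLUSTERS.items():
                errors[name] += np.sum((y[sl] - t[sl]) ** 2)
        return {name: e / len(stems) for name, e in errors.items()}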
Thus, even though novel verbs have not been trained on the network, the regular past tense can be correctly generated because each individual cluster of the novel verb has been exposed to the network during training. The situation for irregular verbs differs greatly from that of regular verbs. In the table, we observe that the vowel in the past tense is adversely affected by the absence of either the onset, vowel, or coda cluster in the present tense. Thus, the network must have information from the entire present tense to correctly generate the vowel in irregular past tenses. This fact is interesting given that most irregular past tenses involve a vowel change.

Given these results, the network appears to dedicate resources to both kinds of mappings. Rather than individually learning the past tense of each regular verb, the network takes advantage of the componentiality of the regular mapping and develops a mechanism to copy the individual clusters of the regular verbs to the output. It is then a simple matter to append the regular affix to the coda. For irregular verbs, the network must consider the entire input when generating the vowel cluster of the past tense. Resources must be allocated to override the network's tendency to copy the vowel cluster of the present tense to the past tense. Thus, the network largely takes advantage of the componential nature of the mapping, but relies on non-componential information when necessary. The errors generated by the model on both training set and novel items can be thought of as an inability of the network to separate the relevant kinds of information for an individual verb.

4.8 Analysis of the hidden units

Although the analysis in the previous section establishes the componential nature of the mapping from present to past tense, it does not tell us how the mapping is accomplished in the model. It is possible that our model is merely an implementation of the traditional theory, with two sub-networks separately generating the past tenses of regular and irregular verbs. If this were the case, we would expect that some of the 200 hidden units would contribute to the production of regular past tenses but not irregular past tenses, while others would contribute to irregular but not regular past tenses.

In order to evaluate the nature of the hidden units, we present both intact and partial present tense inputs to the trained model and compare the activations of these units. We then determine which hidden units are most active for each phonological cluster of the input. For example, for each regular verb, we compare the hidden unit activations when the entire present tense is activated on the input units, and when the present tense with the onset units zeroed out is activated. The hidden units whose activations differ by 0.5 or more (since activations range from 0.0 to 1.0) between the two inputs are noted as contributing to the mapping of the onset for regular verbs. We then perform the same analysis for all irregular verbs. If two separate pathways were implemented in the model, we would not expect much overlap between regular and irregular verbs in the hidden units that contribute most to the onset, vowel, and coda clusters. Table 4 shows the number of hidden units contributing most to each cluster that are shared between the regular and irregular verbs.
Table 4: Number of shared hidden units for each cluster

  Units considered    Onset       Vowel       Coda
  Top 50              35 (70%)    38 (76%)    35 (70%)
  Top 100             87 (87%)    57 (57%)    66 (66%)

As seen in the table, there is considerable overlap between regular and irregular verbs in the hidden units that are most active for each phonological cluster. When we examine the 50 hidden units that contribute most to the onset, 35 of them are shared between regular and irregular verbs. Likewise, for the 50 units contributing most to the vowel cluster, 38 of 50 are shared, and 35 of 50 units are shared for the coda cluster. If we examine the 100 most active hidden units for each cluster, a similar proportion of units overlaps between regular and irregular verbs. Thus, the network does not appear to have partitioned itself into two sub-networks that generate the regular and irregular past tenses independently of each other. The same hidden units contribute to the production of both kinds of verbs.

4.9 Discussion

The success of the second simulation shows the importance of correctly modeling the phenomenon of interest. We initially thought that a large training corpus was the most appropriate means to capture the English past tense paradigm. But a careful analysis of all English verbs demonstrated that irregular verbs are greatly overrepresented with this method of sampling. The model clearly could not learn both a largely regular mapping and numerous exceptions in a single mechanism. But this is not the structure of English: there are relatively few exceptions compared to the large number of regular verbs. It should be noted that other connectionist models of the past tense reviewed in a previous chapter did not accurately represent the type and token frequencies of regular and irregular verbs in their training sets.

Our analysis of the phonological components of the network provides some insight into how the model accomplishes the task of learning the past tense. Resources are allocated to take advantage of the componential nature of the mapping between the present and past tense for regular verbs. These resources allow proper generalization of the regular mapping to novel verbs as well. The network also avoids the tendency to apply the regular affix to irregular verbs by dedicating resources to learn the non-componential mapping for these items. Our analysis of the hidden units further demonstrates that the network is not a direct implementation of the traditional theory. We showed that many of the same hidden units contribute to the mapping of both regular and irregular past tenses.

One area that requires further discussion is the errors that were produced on the training set items and on novel items. These consisted of overregularization errors, vowel errors, and consonant errors, which we largely attribute to assimilation with similar-sounding training set items. But what causes assimilation errors? It is important to note that the task of the model is much more difficult than the task faced by a child when learning the past tense. The model must concurrently learn both the phonotactics of the language and the mapping of the present to the past tense. Children clearly become familiar with the phonological patterns of a language well before they learn morphological transformations. Thus, a more realistic model of child language acquisition could be trained in this stepwise manner as well.
In combining the tasks, we made certain choices in the phonological representation that we believe contributed toward the errors produced by the model. Since our feature set encompasses both vowel and consonant features, our representation is necessarily more sparse than an encoding that uses separate features for consonants and vowels. On average, only 5 of the 15 features are active for a single phoneme. Given this sparse representation, phonemes that do not appear in the training corpus often are not reinforced as much as common phonemes. The result appears to be blends of features for certain infrequent phonemes. For example, the vowel in the past tense WON is one of the least frequent in the training corpus. The model generates WIN-WAN as the past tense instead. The vowel in WAN is a plausible featural blend between vowels in such common words as WALK-WALKED, SHOW-SHOWED, and LOOK-LOOKED. 76 W e speculate that the sparse representation coupled with either infrequent phonemes or infrequent combinations of phonemes gives the model a tendency to assimilate the input into a better known output that is more common in the training set. In Chapter 6, we find evidence to support this claim when an entirely different architecture using the same phonological representation is shown to generate errors very similar to our feed-forward implementation. If we suspect the phonological representation as being the cause of the errors, then why do we not abandon it in favor of a better representation? In short, we have not found a better representation. In earlier versions of the model, we tried a variety of phonological encodings that were used in other connectionist models of the past tense. Many of these provided separate sets of features for consonants and vowels. The performance of these models did not approach that of our current model, particularly in the generalization trials. Although our current phonological representation is not perfect, it does perform on a level similar to that observed in people. Furthermore, the general approach taken in this thesis toward accounting for the past tense is supported by our choice of a phonological encoding that does not give special treatment to either consonants or vowels. W e acknowledge that with considerable effort, a better encoding might be developed to provide better performance. But the results thus far demonstrate that our representation is adequate for the tasks we wish to model. The failure of the first simulation in this chapter provided an important lesson that is key to many of the simulations that follow in this thesis. Phenomena that had previously been thought to be indicative of rule-based theories, or thought to be impossible for connectionist models to address, will be 77 shown to be adequately accounted for by connectionist models. Connectionist models can succeed when the modeler has a thorough understanding of the phenomena to model. 78 CHAPTER 5: FREQUENCY AND CONSISTENCY Learning the training set and generalizing to novel occurrences are only the coarsest level of analyzing a connectionist model of the past tense. There are a number of psycholinguistic experiments that attempt to find data to support either the traditional dual-route theory or a connectionist single-route theory. This chapter relates the performance of the past tense model in the previous chapter with two of these experiments. 
It will be demonstrated that data that had previously been taken as evidence for the traditional theory can in fact be captured by a connectionist network. Furthermore, data that support the connectionist theory, but not the traditional theory, will be presented.

5.1 Accounting for the data—The frequency by regularity interaction

Many of the kinds of data that Pinker (1991) sees as evidence for a rule of past tense formation may reflect very simple properties of connectionist networks. As such, these phenomena cannot be taken as uniquely compatible with the rule-based account. We have identified one such phenomenon: the frequency by regularity interaction. Prasada, Pinker & Snyder (1990) observed that the frequency of a past tense form (how often it is used in the language) affects the generation of irregular past tenses, but not regulars. RANG, for example, is lower in frequency than TOOK and takes longer for subjects to generate. However, there is no frequency effect for regular past tenses; BIKED (low frequency) is as easy to generate as LIKED (high frequency).

Pinker (1991) interprets this pattern as follows. Regular past tenses are generated by rule; hence they are not affected by frequency. All that matters is how long it takes to recognize the present-tense stem and apply the rule. Irregular past tenses are different, however. Either they have to be looked up in a list (traditional theory) or generated by an associative network (modified traditional theory). Both processes are thought to be affected by frequency. Thus, the interaction between frequency and regularity of the past tense was thought to implicate two separate mechanisms, a rule and a network.

We thought it likely, however, that our network would also produce this interaction, mainly because we observed the same effect in the Seidenberg & McClelland (1989) model of word reading. In that model, frequency has a bigger effect on words with irregular pronunciations (e.g., DEAF, SHOE) than on words with regular, rule-governed pronunciations (LIKE, MALE). The explanation for this effect is simple. Regular, "rule-governed" words contain patterns that occur repeatedly in the corpus. The weights reflect exposure to all these patterns. Learning a rule-governed instance does not depend very much on its frequency because performance benefits from exposure to neighbors that contain the same pattern. Learning an irregular instance, however, is highly sensitive to frequency; performance on DEAF (irregular pronunciation) or TAKE-TOOK (irregular past tense) depends on how often the model is exposed to these patterns, because the correct output cannot be derived from exposure to neighbors. Thus, we expected that at least one of the behavioral phenomena that Pinker takes as evidence for a rule would be exhibited by our model.

5.2 Separating the theories—The consistency effect

But accounting for the data is only one motivation to model. The connectionist theory can also make predictions about the past tense that are not consistent with the traditional (Pinker & Prince, 1988) or modified traditional (Pinker, 1991) linguistic approaches. One such prediction concerns consistency effects, which have been identified in previous work on spelling-sound correspondences and were simulated in the Seidenberg & McClelland (1989) model of word pronunciation. Briefly, networks trained using backpropagation (Rumelhart, Hinton, & Williams, 1986) pick up on the consistency of the mapping between input and output codes.
The mapping between the present and past tenses is highly consistent in English because most past tenses obey the regular rule. However, the mapping is not entirely predictable because of irregular cases such as TAKE-TOOK and SIT-SAT. Standard accounts such as Pinker & Prince's (1988) distinguish between rule-governed cases and exceptions. Connectionist models, on the other hand, predict the existence of intermediate cases, so-called "regular but inconsistent" patterns such as BAKE-BAKED and FLIT-FLITTED, which obey the rule but have inconsistent rhyming "neighbors" (Seidenberg, in press). So even though BAKE-BAKED is rule-governed, performance may be impaired because the model must also encode the neighbors MAKE-MADE and TAKE-TOOK, which have irregular past tenses. Specifically, the model should perform worse on such items than on a completely regular pattern such as LIKE-LIKED (all of the -IKE verbs have regular past tenses). Thus, the standard theory predicts that BAKE-BAKED should be no more difficult to generate than LIKE-LIKED, because both are rule-governed, while the connectionist account predicts that the network will perform more poorly on inconsistent items than on completely regular items. Moreover, these behaviors are also observed in psycholinguistic studies of people (Seidenberg, in press; Seidenberg & Bruck, 1990). The subjects in their experiment (college students) were presented with a present tense stem on each trial and had to generate the past tense. Response latencies were as follows: Irregular >> Regular Inconsistent > Entirely Regular. Therefore, the network model is not merely an alternative to a rule-based account; rather, it is to be preferred because it captures generalizations that rule-based accounts miss.

5.3 Detailed analysis of the model

One assumption in evaluating the response of the model requires explanation. We relate mean error score to response latencies. In our backpropagation model, a verb stem input generates a past tense output in a constant amount of time, regardless of how difficult the past tense is to generate. We propose that the larger the mean error score for a past tense output, the longer the past tense will take to generate. This is motivated by attractor networks, which iteratively settle into an output over a number of cycles. Plaut (1991) shows that when backpropagation networks, including the Seidenberg & McClelland (1989) model, are converted into attractor networks, the mean error score for generated outputs in the backpropagation network relates closely to the number of iterations the attractor network takes to settle for a given input. We validate our claim in Chapter 6 by implementing our model as a recurrent network and showing that the number of iterations to settle into an output relates to the error score in our feed-forward network.

We re-examined our connectionist model of the past tense described in the previous chapter. For the frequency by regularity interaction, we constructed sets of the 10 highest frequency regular verbs, 10 lowest frequency regular verbs, 10 highest frequency irregular verbs, and 10 lowest frequency irregular verbs from the training set. Only verbs that generated the correct past tenses were selected (see Appendix C, section C.5 for a complete list of these items). The model's performance on these verbs is shown in Figure 8. As predicted, frequency has little effect on performance for regular verbs, since both high and low frequency regulars were produced equally well.
For irregular verbs, performance is better on high frequency items than on low frequency items. This is the pattern that was reported by Pinker (1991) and Prasada et al. (1990) and taken as evidence for a rule-based mechanism.

[Figure 8: Frequency and regularity effects. Mean error score (0.00-0.05) for low and high frequency items, plotted separately for exceptions and regulars.]

In order to test our prediction of consistency effects, we constructed sets of 20 entirely regular, 20 regular inconsistent, and 20 irregular verbs from the training set, equated in terms of frequency. Only verbs that generated the correct past tenses were selected (see Appendix C, section C.6 for a complete list of these items). The model's performance on the three types is given in Figure 9. The model showed the graded effect of the consistency of the mapping between present and past tense that is not predicted by rule-based accounts. Irregular items are more difficult to generate than regular inconsistent items, which in turn are more difficult to generate than entirely regular items.

[Figure 9: Performance on matched subsets of items. Mean error score for regular inconsistent, entirely regular, and irregular verbs.]

5.4 Discussion

Our results, like those of MacWhinney & Leinbach (1991), Plunkett & Marchman (1991), and Cottrell & Plunkett (1991), suggest that connectionist models can exhibit certain phenomena that Pinker & Prince (1988) see as central to an understanding of the past tense. While connectionist models can continue to address criticisms from traditional linguistics, they can also provide the means to forge ahead and predict phenomena that distinguish the connectionist theory from the traditional theory. As we achieve a better understanding of the capabilities of connectionist models, we can predict performance differences that can be empirically tested in people.

Two aspects of our models contributed to their relatively better performance. First, the phonological representation that we employed addresses many of the concerns that Pinker & Prince (1988) expressed concerning the Wickelphonology that Rumelhart & McClelland (1986) had used. Our representation of segments is motivated by articulatory constraints. The slot positions in the representation are motivated by independent evidence concerning the salience of the rime, as observed in spelling-to-sound correspondences (Seidenberg & McClelland, 1989) and in the composition of the sub-regular classes of irregular verbs (Pinker & Prince, 1988). Verbs that are perceived to sound similar are therefore guaranteed to have similar representations. This encoding is by no means complete; however, it uses plausible featural, segmental, and syllabic representations, and avoids some of the problems with earlier approaches. Second, the simulations highlight the importance of using a realistic training regime (see also Hetherington & Seidenberg, 1989). Our first simulation clearly overrepresented the number of irregular types and tokens in the training set and its performance was inadequate. Once the training set was modified to have more realistic proportions, performance improved greatly. This result supports Plunkett & Marchman's (1991) finding that for English, the many classes of irregular past tenses are learnable only if they are limited in number. It is crucial that a simulation of this kind realistically represents the phenomena that it attempts to model.
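The analyses summarized in Figures 8 and 9 amount to grouping correctly produced items into conditions and averaging their output error. A minimal sketch is given below; the decode function, which maps output activations to a phoneme string, and the model.forward call are hypothetical stand-ins for the actual simulator interface.

    import numpy as np

    def mean_error_by_condition(model, items):
        # `items` is a list of (input_pattern, target_pattern, condition)
        # triples, where condition is a label such as 'high_freq_irregular'
        # or 'regular_inconsistent'. Returns the mean squared error score per
        # condition, computed only over correctly produced items, as in the
        # analyses above.
        scores = {}
        for x, t, cond in items:
            y = model.forward(x)                     # hypothetical forward pass
            if decode(y) != decode(t):               # keep correctly produced items only
                continue
            scores.setdefault(cond, []).append(np.mean((y - t) ** 2))
        return {cond: float(np.mean(v)) for cond, v in scores.items()}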
By understanding the nature of the input representation, the learning algorithm, the phenomena we were trying to capture, and the architecture of the model, we were able to make predictions about the difficulty of learning different types of verbs. The frequency effects (i.e., the fact that frequency only affects irregular past tenses, not regular past tenses) indicate that one kind of phenomenon that Pinker (1991) cites as evidence for a rule may be simply captured within connectionist networks. Further work may determine whether all of the phenomena he cites can be accommodated in a similar way. In the second simulation, we observed the expected consistency effects. These effects have also been observed in experiments with human subjects and replicate results that have been obtained in another domain (spelling-sound correspondences; Seidenberg & McClelland, 1989). The consistency effects are not predicted by the earlier theories and strongly implicate the connectionist alternative.

Connectionist models have been able to account for a wide range of phenomena previously thought to implicate a rule-based theory. We have only recently begun to look for phenomena that are predicted by a connectionist theory but inconsistent with the traditional theory. We suggest that consistency effects are one such phenomenon. Of course, it will be necessary to develop our models further, as we do in the remainder of this thesis, to account for a wider range of verb-related phenomena.

CHAPTER 6: RELATING ERROR SCORE TO REACTION TIME

A recurring criticism of connectionist models that propose to account for reaction time in people is their method of doing so. Typically, these models are implemented as feed-forward architectures. The error score between the generated output for a given word and its target output is compared to the reaction time for people to use the word in a psycholinguistic experiment. Yet how does one make the non-intuitive case for relating error score to reaction time?

In this chapter, we first review connectionist network implementations that account for reaction time data from several domains. We present demonstrations of feed-forward networks that have been extended with recurrent connections and describe empirical data from these simulations showing the relationship between naming latencies in people, error scores in the feed-forward networks, and the number of iterations to settle in the recurrent networks. From these data, we present an informal analysis of the relationship between feed-forward networks and recurrent networks that account for reaction time. We then implement the feed-forward network of the previous chapter as a recurrent network. The number of iterations for the recurrent network to settle to a stable output will be analyzed, and it will be shown that the frequency by regularity interaction and the consistency effect of the previous model are not artifacts of a feed-forward architecture. Our claim relating error score to reaction time will be justified by showing how error score in a feed-forward network relates to a recurrent network architecture in which an output is computed over many time steps rather than in a single step.

6.1 Background

Until recently, connectionist models of language processing have been more concerned with demonstrating that a particular task could be solved than with showing how the performance of the model matches the performance of humans.
In addressing the latter concern, models would have to account for the differences in performance on certain tasks, which is often measured as reaction time (RT) data in psycholinguistic experiments. Seidenberg & McClelland (1989) claim that feed-forward networks provide an adequate means to account for this data. In their spelling-to-sound model, they show that the error score of the generated output is closely related to the RT for subjects to name the same items. They describe a theory of word naming in which the phonological output of the naming process is recoded into a set of articulatory motor commands. Differences in RT result from differences in the quality of the computed phonological output. The model implements only the first step of this theory by generating a phonological output. “A word that the model ‘knows’ well produces phonological output that more clearly specifies its 89 articulatory-motor program than a word that is known less well (p. 531).” Thus, the error score of the output of the model can be directly compared to RT for subjects to name words. Several connectionist models have proposed methods to augment feed forward networks to account for RT in a more direct manner. In a straightforward extension to the spelling-to-sound model above, Lacouture (1988) shows that a decoding module can be added to the output layer in order to “clean-up” noisy outputs. The decoding module consists of an additional layer of completely interconnected units which are attached one-to-one to the output layer. This new layer is separately trained from the feed-forward network as a Brain-State-in-the- Box (BSB) model (Anderson, Silverstein, Ritz, & Jones, 1977; Anderson, 1983). A noisy phonological representation of each word in the training set is activated on the layer, and the module is allowed to modify unit activations for 10 iterations. Weight connections between units are adjusted according to the delta rule (Golden, 1985), in order to generate unit activations that are more similar to the idealized target. Lacouture shows that the decoding module will take the generated output of the feed-forward network as its starting set of activations and iteratively settle to the correct output. When the units in the decoding module stop changing activations, the output is considered to have settled. Lacouture demonstrates that the number of iterations to settle to a stable output is closely related to the error score of the generated output from the feed-forward portion of the model. Furthermore, both of these measurements are shown to relate well to RT for subjects to name selected groups of words. 90 In an entirely different approach, Cohen, Dunbar, & McClelland (1990) demonstrate that the cascade model (McClelland, 1979) can provide a mechanism to measure the time of processing for input patterns, and they relate this measurement to RT data. In the cascade model, a time-averaging version of the logistic unit activation function allows a unit’s activation to build up over time, rather than in a single step. Each unit is assured to reach an asymptotic activation level based solely on the input pattern and the connection weights. This particular demonstration accounts for RT data from the Stroop task (Stroop, 1935) in which subjects are asked to either name a word or name the ink color of written words. When asked to name the word, ink color has no effect on performance. But when asked to name the ink color, subjects are consistently slower when the word names a different color (e.g. 
the word GREEN written in red ink) than in the control task (e.g. a row of Xs written in red ink). In the model, outputs that take longer to reach stable activation levels relate well to subject responses that have longer RTs.

The most promising approach to accounting for reaction time is not a simple extension to the feed-forward network, but a different kind of architecture altogether. These are known as recurrent networks, and they encompass any architecture that allows unrestricted connections between units. We are particularly interested in implementations that add recurrent connections to a feed-forward network, because it is then a simple matter to compare the performance of the two. Since these kinds of networks are informally described as "attracting" noisy outputs to stable patterns, they are often called attractor networks. We now provide a thorough description of these networks.

6.2 Attractor networks

Attractor networks refer to a class of models in which connections between units are not restricted to a feed-forward flow. Rather, activation may flow freely between different layers, or even between units within a layer. Thus, unlike feed-forward networks that generate an output in a single pass, these networks accept a fixed input and iteratively compute an output, refining and correcting it, until the output stabilizes over time. Because of this, recurrent networks are more psychologically plausible than feed-forward networks in accounting for language phenomena that occur over time.

For a comparison between a simple feed-forward network and a simple recurrent network, see Figures 10 and 11. In these networks, it is assumed that the values of the weights between the input, hidden, and output units are all 1.0. However, the recurrent network in Figure 11 has an additional connection from the output unit to itself with a value of -0.5. It is further assumed that the networks are fully trained and that the units implement a linear activation function. In the feed-forward network of Figure 10, an input value of 1.0 generates a hidden unit activation of 1.0 and an output unit activation of 1.0 in a single pass. The state of each unit only has to be calculated once, because the flow of activation is strictly feed-forward. The output of any unit cannot affect the input of a previous unit.

[Figure 10: Activation flow of feed-forward network. A single input, hidden, and output unit shown at time steps t=0 and t=1; all weights are 1.0.]

Figure 11 shows a recurrent network that is very similar to the feed-forward network of Figure 10. The only difference is an additional connection from the output unit to itself. In this network, an input value of 1.0 generates an output value that continues to change over the course of 4 time steps. From this simple illustration, it is clear that recurrent networks differ considerably from feed-forward networks. Since connections can exist between any two units, the input activations to a given unit may change from iteration to iteration. In Figure 11, we note that during time step 2, the input to the output unit consists of 1.0 from the hidden unit, giving the output unit a value of 1.0. But at time step 3, the inputs to the same unit consist of 1.0 from the hidden unit and -0.5 from the output unit's own value at time step 2, giving it a value of 0.5. By time step 4, the input is again 1.0 from the hidden unit, but only -0.25 from its own value at time step 3, giving it a value of 0.75. It is evident that each unit must compute a new state at each iteration.
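The toy example of Figures 10 and 11 can be traced with a few lines of code; this is a minimal sketch with linear units, where the step indexing differs slightly from the figures because the hidden and output units are updated in the same iteration.

    def recurrent_trace(steps=6, w_in=1.0, w_hid=1.0, w_rec=-0.5):
        # Trace the toy network of Figure 11: one linear input, hidden, and
        # output unit, with a self-connection of -0.5 on the output unit.
        # With w_rec = 0.0 the trace reduces to the feed-forward case of
        # Figure 10 (the output reaches 1.0 in one update and stays there).
        x, h, y = 1.0, 0.0, 0.0
        trace = [y]
        for t in range(steps):
            h = w_in * x                   # hidden unit sees the clamped input
            y = w_hid * h + w_rec * y      # output unit sees the hidden unit plus itself
            trace.append(y)
        return trace

    # recurrent_trace() -> [0.0, 1.0, 0.5, 0.75, 0.625, 0.6875, 0.65625]
    # The values 1.0, 0.5, 0.75 correspond to time steps 2-4 in Figure 11;
    # the output gradually approaches a stable value of about 0.67.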
[Figure 11: Activation flow of recurrent network. The same units shown at time steps t=0 through t=4; weights between the input, hidden, and output layers are 1.0, and the recurrent weight on the output unit is -0.5.]

The goal of a recurrent network is to learn a set of weights so that it can accept a set of fixed input unit activations and gradually settle to a set of output unit activations that corresponds to the correct target values for that input. Plaut (1991) conceptualizes this as movement through a multi-dimensional space that has one dimension for the state of each unit. At any time, the current unit activations represent a point in this state space. The first iteration of the network in which activation reaches the output units is considered the starting point. As the iterations of the network progress, the point in state space settles to a final point that corresponds to the correct output activations for the given input. This point is called an "attractor" in state space. (In this thesis, we discuss recurrent networks that develop point attractors, which correspond to outputs that stabilize to a static value. Recurrent networks can also develop "limit cycle" attractors (Pearlmutter, 1989) as well as "chaotic" attractors (Skarda & Freeman, 1987).) Furthermore, there exists a region around the attractor such that if the starting point falls within the region, the network will settle to the attractor. This region is called the "basin of attraction". Figure 12 illustrates this entire concept.

[Figure 12: Multi-dimensional landscape with attractors.]

In the figure, the surface represents the entire state space. Each dip in the surface can be considered a basin of attraction. The lowest point in each basin is an attractor. The operation of the network can be visualized as dropping a marble on the surface within a basin of attraction. This location represents the output generated by the network during the first iteration. The closer the marble is placed to the bottom of the basin, the less distance it must travel before it reaches the attractor, which represents a stable output of the network. Depending on the size and shape of the basin of attraction, the values of the output units will differ by varying degrees from the desired target values during the initial time steps, and will become closer after several iterations of the network.

This illustration raises a possible shortcoming of the recurrent network approach. What if the starting point in the state space is not within a basin of attraction? This might occur if the network is given a novel input that is sufficiently different from all training set exemplars. For example, if the spelling-to-sound model is implemented as a recurrent network, how would the network respond to a nonword input? The appropriate output should not be a training set item, but should be an entirely novel phonological representation. The answer depends on the kinds of attractors that develop during training. If the attractors correspond to individual patterns of target activations from the training set (e.g. words), then the basins of attraction will not adequately cover the state space, and a novel input may not be correctly processed. If, however, attractors develop that correspond to subsets of patterns (e.g. phonemes), then novel inputs may generate reasonable outputs, provided the attractors span the entire state space. In the recurrent network we develop later in this chapter, we address the issue of generalization in an attractor network.
We next look in detail at two recurrent networks that address a domain very similar to our work with the past tense: spelling-to-sound correspondences. In these models, the number of time steps to settle to a stable output is carefully analyzed for different sets of spelling patterns and is shown to correspond well with people's naming latencies. This analysis compares favorably with earlier feed-forward networks that address the same phenomenon by relating error score to reaction time for the same sets of words.

6.2.1 Plaut model of orthography to phonology

In Seidenberg & McClelland's (1989) spelling-to-sound correspondence model, a connectionist network was shown to account for both regular and irregular word pronunciations in a single network. The most common pronunciations of spelling patterns are considered regular (e.g. GAVE, PAVE, RAVE) while uncommon pronunciations are considered irregular (e.g. HAVE). Once the model was fully trained, it was observed that regularly spelled words, which have shorter naming latencies in people, generated outputs that were closer to their desired target outputs than irregularly spelled words, which have longer naming latencies. Although this correlation was demonstrated to be significant, there was no method to test the time course of processing for a given pronunciation, since the model used a feed-forward architecture.

In a series of simulations, Plaut (1991) reimplemented their model as an attractor network. The input, hidden, and output layers were preserved and activation flowed in a feed-forward manner between these layers. Recurrent connections were used on the output layer only, such that each output unit was connected to every other output unit. (Although recurrent connections were allowed on the output units only in this model, Plaut (1991) performed numerous simulations on an attractor network that mapped orthography to semantics to phonology. He systematically varied the location and number of recurrent connections in the model, and found that similar results were achieved for all versions. He reports that the existence of any recurrent connections allows the appropriate basins of attraction to develop, although some versions of the model were trained much faster than others.) The input to the network was an orthographic representation of a monosyllabic word and the output was the phonological representation of the word. Weight correction was accomplished by a variation of the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986) known as backpropagation through time (see Appendix B), so that the desired pronunciation of each word would become more likely to be generated as the spelling patterns were learned. During training, unit activations were allowed to iterate for eight time steps for each input pattern.

Once trained, each word of the training set was presented to the network and allowed to generate an output for up to seven iterations. As is typical in these networks, the output activations during the initial time steps were quite different from the desired target activations, but became closer as the time steps progressed. At each iteration the generated output was compared to that of the previous time step to determine whether the output activations had become closer to the target activations. If not, the output was considered to have settled to a stable state.
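The settling criterion just described can be expressed as a short sketch. The model.reset and model.step calls are hypothetical single-iteration interfaces to a recurrent network, and the tolerance value is an assumption; the logic simply counts iterations until the output stops moving toward the target.

    import numpy as np

    def iterations_to_settle(model, x, target, max_steps=7, tol=1e-4):
        # Run a recurrent network on a clamped input and count the iterations
        # until the output no longer gets closer to the target, following the
        # criterion described above.
        model.reset(x)                          # clamp the input pattern
        prev_dist = float("inf")
        for t in range(1, max_steps + 1):
            y = model.step()                    # one iteration of activation flow
            dist = np.sum((y - target) ** 2)    # distance from the target pattern
            if prev_dist - dist < tol:          # no longer approaching: settled
                return t
            prev_dist = dist
        return max_steps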
It was demonstrated that words with short naming latencies in people took fewer iterations to settle to the desired output than words that have longer naming latencies. These results are very similar to Seidenberg & McClelland’s demonstration in which words with short naming latencies had lower error scores than words with longer naming latencies. Thus in the domain of spell to sound correspondences, it was demonstrated that both error score in a feed-forward network and number of iterations to settle in a recurrent network are related to the reaction time in naming words for people. 98 6.2.2 Kawamoto & Zemblidge model of homograph pronunciation In the Seidenberg & McClelland (1989) model, there is no way to examine the time course of processing in word pronunciation. Also, the model cannot distinguish between different pronunciations of homographs such as BASS (musical instrument vs. fish) since the input to the network consists of only the orthographic representation of a word. For all homographs, there is both a regular and irregular pronunciation. The regular pronunciation (e.g. BASS - fish) corresponds with the most common pronunciation of words with similar spelling patterns (e.g. PASS, MASS, LASS). Kawamoto & Zemblidge (1992) report that during psycholinguistic experiments, the mean latency to name the regular pronunciation of homographs is faster than the mean latency of the irregular pronunciation, despite the fact that the irregular pronunciation is much more frequent in the language. Kawamoto & Zemblidge augmented the Seidenberg & McClelland model to include the part of speech and the meaning for each word as well as its orthography as the input to the network. The input, hidden, and output layers were preserved from the original model and recurrent connections were added to the output units to allow the pronunciation of the word to be iteratively generated over time. A corpus of both homographs and non-homographs comprised the training set. During training, activations were allowed to settle for twelve time steps. Irregular pronunciations of the homographs were presented much more frequently than their regular pronunciations. 99 After training, it was demonstrated that the regular pronunciation of homographs (e.g. BASS - fish) took fewer iterations to settle than the irregular pronunciation (e.g. BASS - musical instrument), even though the irregular pronunciation is much more frequent in the training corpus. Upon analysis of the time course of processing, the pronunciation of a word appeared to be initially affected by the spelling to sound correspondence, but is subsequently refined by other factors, such as its meaning and part of speech. Since spelling to sound correspondences are largely very consistent, the “regular” pronunciation of homographs is easier and quicker to activate. These results can be compared to the Seidenberg & McClelland feed forward model in which low frequency irregular words had higher error scores than low frequency regular words. The homographs used in the Kawamoto & Zemblidge model were relatively low in frequency for both the regular and irregular pronunciations. Furthermore, the irregular pronunciation of a homograph took more iterations to stabilize than its regular pronunciation. Thus, both the number of iterations to settle for low frequency words in the recurrent network as well as the error score of low frequency words in the feed-forward network was demonstrated to relate well to naming latencies for people pronouncing these same words. 
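Using the iterations_to_settle sketch above, the homograph comparison could be rendered as follows. The input and target patterns are assumed to be arrays in the model's orthographic/semantic and phonological codes; each reading combines the shared spelling with a different meaning and part-of-speech code, and the specific argument names are illustrative only.

    def compare_homograph(model, input_fish, input_instrument,
                          target_fish, target_instrument):
        # For a homograph such as BASS, measure settling time separately for
        # the regular reading (fish) and the irregular reading (instrument).
        return {
            "regular (fish)": iterations_to_settle(model, input_fish, target_fish),
            "irregular (instrument)": iterations_to_settle(model, input_instrument,
                                                           target_instrument),
        }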
6.2.3 The relationship between error score and iterations to settle We have presented a variety of demonstrations of specific connectionist models that account for reaction time, showing that feed-forward networks and 100 recurrent networks are related. In the spelling-to-sound task, the model was required to map similar inputs to similar outputs for the regular words, as well as similar inputs to different outputs for the exception words. At this time, the relationship has only been empirically established through the performance of both feed-forward and recurrent networks on the same task. In this section, we present an informal analysis of this association. A feed-forward network can be intuitively thought of as a special case of the recurrent network, in which only one iteration of processing is allowed. In the feed-forward case, each unit can compute its activation state only once for a given input pattern. This creates a conflict for mapping similar inputs to different outputs. Feed-forward networks are biased toward “averaging” across all input to output mappings, thus having a tendency to generate similar outputs for similar inputs. This tendency can be overcome with either many layers of non-linear units or with very large weights between units, coupled with lengthy training times. However, the consequence in most feed-forward implementations, including the spelling-to-sound model and the past tense model, is that similar inputs which map to very different outputs interfere with each other, and generate outputs that are distant from their target values. This distance is often reported as the sum of squared error. In a recurrent network, the non-linear units contribute to each iteration of processing, in effect behaving as if they comprise many layers . Similar input patterns will initially generate similar start states. But as long as these start states lie within different basins of attraction, the small differences in the inputs can be magnified as the start states approach different attractors. The net effect is that 101 similar input patterns can eventually generate very different output patterns. The number of iterations for the start state to reach its attractor clearly depends on the distance between the two. Since the feed-forward network is equivalent to a recurrent network that processes for only a single iteration, the generated output of the feed-forward network can be considered the start state in the recurrent network. As the recurrent network continues to process the input, the start state will approach its attractor. The sum of squared error is a quantitative measurement of the distance between the start state and its attractor, and provides a good indication of the number of iterations the recurrent network requires to settle to a stable set of activations for a given input. O f course, this informal description requires a great deal of work before a formal analysis can be provided, which is beyond the scope of this thesis. Our goal is merely to provide the basis for an intuitive grasp on the relationship between error score and number of iterations to settle. 6.3 A recurrent version of the past tense model We have seen empirical evidence that the error score in the feed-forward version of the spelling-to-sound model is a good indication of the number of iterations to settle in the recurrent version. Therefore, we felt it important to demonstrate that the past tense model developed in chapters 4 and 5 could exhibit similar results as a recurrent network. 
Our goals in this model are threefold. First, we will test the ability of the recurrent network to learn the same training set as the feed-forward network. Second, the recurrent network's ability to generalize by generating the past tense of novel inputs will be tested. And last, we will determine whether there is a relationship between the frequency and consistency effects described in Chapter 5 and the number of iterations to settle to a stable output in a recurrent network. If the recurrent network shows that the classes of verbs which have longer past tense latencies for people and larger error scores in the feed-forward model also take longer to settle, then our claim that error score in a feed-forward network is related to reaction time in people would be justified.

6.3.1 Network architecture

The architecture used for the feed-forward version of the model was modified to create a recurrent version, as seen in Figure 13. The connections between the input, hidden, and output layers were preserved from the original model. A new layer of 50 clean-up units receives activation from the output layer and passes activation back to the output layer. We do not claim that these particular recurrent connections are optimal or even psychologically valid; we merely chose this structure as a simple extension to the feed-forward architecture of Chapter 4.

[Figure 13: Architecture of the recurrent model. The input layer feeds the hidden layer, which feeds the output layer; a layer of clean-up units both receives activation from and sends activation back to the output layer.]

6.3.2 Training

The RBP (recurrent backpropagation) simulator obtained from McClelland was used to train the model on the same corpus as the feed-forward version (see Appendix C, section C.2 for a complete list of these items) using the backpropagation through time learning algorithm (see Appendix B). This algorithm is a direct extension of the backpropagation learning algorithm (Rumelhart, Hinton, & Williams, 1986). Error derivatives are calculated during each iteration of the network in which target activations are specified. Weight changes are then calculated for each pattern in the training set to reduce the error during these iterations. The accumulated weight changes for all patterns are applied after each epoch of training.

During training, each verb was allowed to iteratively generate an output for 12 time steps. The present tense is clamped on the input units during the entire 12 iterations. Activation from the input layer reaches the hidden layer during step 1, and is passed to the output layer during step 2. Thus, target activations are only specified during iterations 2 to 12, and weight updating takes place during these steps. The clean-up units first receive information from the output units by step 3, and pass it back to the output units by step 4. All units contribute to the generation of the output activations from iterations 4 to 12. As in the previous version, the present/past pairs were probabilistically presented during training according to the logarithm of their frequency. The learning rate was set to 0.001 and the momentum to 0.9. The model was trained on this corpus for 2700 epochs, at which point learning approached asymptote. The weights were frozen and the training phase was completed. The results below were averaged over three training sessions with random initial weights.
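A minimal sketch of the unrolled forward pass of this architecture is given below. The weight-matrix names, the absence of bias terms, and the use of synchronous layer updates are simplifying assumptions of the sketch rather than properties of the RBP simulator; with synchronous updates, input influence reaches the output layer at step 2 and the clean-up loop first affects the output at step 4, matching the schedule described above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def unrolled_forward(x, W_ih, W_ho, W_oc, W_co, steps=12):
        # Input clamped for all 12 steps: input -> hidden -> output, plus a
        # clean-up layer that receives from and projects back to the output layer.
        hidden = np.zeros(W_ih.shape[1])
        output = np.zeros(W_ho.shape[1])
        cleanup = np.zeros(W_oc.shape[1])
        history = []
        for t in range(1, steps + 1):
            new_hidden = sigmoid(x @ W_ih)                        # updated from the clamped input
            new_output = sigmoid(hidden @ W_ho + cleanup @ W_co)  # hidden plus clean-up feedback
            new_cleanup = sigmoid(output @ W_oc)                  # lags the output by one step
            hidden, output, cleanup = new_hidden, new_output, new_cleanup
            history.append(output.copy())                         # targets apply to steps 2-12
        return history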
6.3.3 Performance on training set verbs

Each of the present tense verb stems in the training corpus was presented to the trained model, and a past tense form was generated over 12 iterations of the network. The past tenses during time steps 2 to 12 were compared to the target, or correct, past tense. Only verbs that generated complete segment-by-segment matches by the twelfth time step were considered to have been learned. As in the feed-forward version of the model, all 309 (100%) of the regular past tenses were learned. In addition, 22 of the 24 irregular verbs (92%) were learned, which was one less than the feed-forward model. The 2 errors on irregular verbs were the same as those made in the feed-forward version of the model: FALL-FELLED, an overregularization error, and WIN-WAUN, a vowel error. Possible explanations for these errors are given in Chapter 4. As in the feed-forward model, these errors are typical of the kinds observed in children during naturalistic studies and in adults during psycholinguistic experiments.

It is interesting that any errors occurred on these items at all. A priori, one might speculate that the computational power of attractor networks, resulting from applying iterations of non-linearities, would allow them to learn all training set items. This was not the case. The one common factor between the recurrent network implementation and the feed-forward version is the phonological representation. In Chapter 4, we speculated that many of the errors appeared to be due to a deficiency in this representation. Since the errors on training set and generalization items (as described in the next section) in the recurrent network are very similar to those in the feed-forward network, we have additional evidence that this representation, rather than our architectural choice, is a contributing factor in many of the past tense errors. However, we still believe that this representation is adequate for our purposes, as we describe in the general discussion of Chapter 4.

6.3.4 Performance on novel verbs

We can examine the kinds of attractors that develop during training by testing the model's performance on novel verbs. If the network has developed word-level attractors, it will be unable to produce the correct output for novel verbs, but will instead generate the closest past tense form from the training set. If, however, the network develops attractors for sub-units of words, such as phonemic attractors, then it should be able to correctly inflect novel verbs. We used the same generalization set of regular verbs as in the previous simulation (see Appendix C, section C.3 for a complete list of these items). On these trials, 106 of 112 regular past tenses (95%) were correctly generated, which is the same performance as the feed-forward version of the network. 4 of the 6 past tenses that were incorrectly generated were identical to the feed-forward version and are described in more detail in Chapter 4: MERGE-MERGT, CLINK-CLANGT, WANE-WONE, MEW-VIEW. The 2 additional incorrectly generated past tenses were LINK-LANGT and PEEK-PICKED. As before, these can be described as assimilation with verbs in the training set: LINK-LANGT is similar to CLING-CLANG, and PEEK-PICKED is similar to PICK-PICKED. These errors are also similar to the kinds observed in people. In addition to novel regular verbs, we tested the model on the same set of 46 novel irregular verbs as in the previous simulation (see Appendix C, section C.4 for a complete list of these items).
The results on these items were identical to the previous simulation: 36 of 46 irregular past tenses (78%) were correctly generated as regular past tenses. The 10 incorrectly generated past tenses are described in detail in Chapter 4. The successful performance of the model on generalization trials indicates that an attractor network can indeed generalize properly to novel verbs in a past tense model. Rather than merely learning the exact patterns in the training set, the network learns the mappings of individual phonemes from present to past tense.

The fact that errors do occur in the generalization trials indicates that the attractors cannot be characterized as purely phonemic attractors. The novel words presented to the model all conform to well-formedness standards for English, and thereby contain phonemic substrings that are familiar to the model. Since many of the novel word errors appear to be due to assimilation with training set items, the attractors appear to be sensitive to conjunctions of phonemes that occur in the training set. In the next section, we discuss the nature of the attractors that develop in this model.

6.3.5 Analysis of the attractors

As we described in Chapter 4, the mapping of the regular past tense is much more componential than the irregular past tense. In our analysis of the feed-forward model, we demonstrated that for regular verbs the network can partially map the individual phonological clusters of the present tense to the past tense, but must consider the entire present tense of irregular verbs when generating their past tense. We now relate these results to the attractor network developed in this chapter.

As in the analysis of the feed-forward model, we compose test items of regular and irregular training set verbs that are missing either their onset, vowel, or coda. When these items are presented to the trained model, we examine the error score of the onset, vowel, and coda of the generated past tenses. Table 5 displays these average error scores for regular and irregular verbs in the training set during time step 2, which is the earliest iteration at which targets are specified for the outputs.

Table 5: Time step 2 average error score in phonological clusters with partial inputs

                                      Past tense cluster
Missing present tense cluster      Onset    Vowel    Coda
Regular verbs
    Onset                          5.22     0.09     0.02
    Vowel                          0.44     1.99     0.14
    Coda                           0.02     0.13     5.55
Irregular verbs
    Onset                          5.78     1.09     0.08
    Vowel                          0.33     1.09     0.18
    Coda                           0.00     1.38     2.75

The results are very similar to those in the feed-forward network. The high error scores down the diagonals of the table for both regular and irregular clusters indicate that each phonological cluster in the past tense is dependent on the corresponding cluster in the present tense. For regular verbs, the clusters in the past tense are largely insensitive to non-corresponding clusters in the present tense, although there is a very slight dependency between the vowel and the coda. For irregular verbs, the vowel in the past tense is adversely affected by the absence of either the onset, vowel, or coda clusters in the present tense.
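A minimal sketch of this partial-input probe follows, assuming the onset, vowel, and coda occupy fixed slices of the phonological vectors; the slice boundaries, the function names, and the assumption that the input and output use the same template are our own, not details of the model.

```python
import numpy as np

# Hypothetical slot boundaries for the three clusters within a phonological vector.
CLUSTERS = {"onset": slice(0, 6), "vowel": slice(6, 10), "coda": slice(10, 16)}

def cluster_errors(run_model, present, target, remove):
    """Delete one cluster from the present tense, run the trained network,
    and report the sum of squared error within each cluster of the output."""
    probe = present.copy()
    probe[CLUSTERS[remove]] = 0.0                 # missing onset, vowel, or coda
    output = run_model(probe)                     # e.g. output activations at time step 2
    return {name: float(np.sum((target[s] - output[s]) ** 2))
            for name, s in CLUSTERS.items()}
```

Averaging these scores over the regular and irregular training items, with each cluster removed in turn, yields tables of the form shown above.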
Table 6 displays the average error scores during time step 12, at which time the output activations have reached their final state. We observe that the network becomes slightly more sensitive to the componential structure of the regular verbs, as seen in the lower error scores of the vowel-onset and onset-coda entries. Likewise, the non-componential nature of the irregular verbs is reinforced, as seen in the higher error scores of the onset-vowel and coda-vowel entries.

Table 6: Time step 12 average error score in phonological clusters with partial inputs

                                      Past tense cluster
Missing present tense cluster      Onset    Vowel    Coda
Regular verbs
    Onset                          5.27     0.09     0.11
    Vowel                          0.40     2.06     0.14
    Coda                           0.02     0.11     5.73
Irregular verbs
    Onset                          5.82     1.15     0.11
    Vowel                          0.33     1.15     0.17
    Coda                           0.00     1.41     2.80

Taken together, these results suggest that two kinds of attractors develop to generate the past tense. For regular verbs, componential attractors become sensitive to the individual phonological clusters of the present tense. Thus, separate attractor basins develop to generate the onset, vowel, and coda clusters of the past tense. Since the attractors are not sensitive to entire words, the individual clusters of novel verbs will fall into these attractor basins, thereby generating an appropriate past tense output. For irregular verbs, non-componential attractors develop that are sensitive to the entire present tense. Basins develop for each irregular verb in the training set, and override the tendency to partially map individual clusters to the output. The errors generated by the model on both training set and novel items can be thought of as a failure of the network to partition the state space into attractor basins such that an individual verb falls into the appropriate basin(s).

6.3.6 Analysis of the hidden units and clean up units

As we did with the feed-forward network, we evaluate the nature of the 200 hidden units to determine the degree of overlap between units that contribute to the production of regular and irregular past tenses. If the network merely implemented the separate pathways for regular and irregular verbs posited by the traditional theory, a large degree of overlap would not be expected. Both intact and partial present tense inputs are presented to the trained model. We then determine which hidden units are most active for each phonological cluster of the input. Table 7 shows how many of the units that contribute most to each phonological cluster are shared between the regular and irregular verbs.

Table 7: Number of shared hidden units for each cluster in the recurrent model

Number of most active units    Onset       Vowel       Coda
50                             37 (74%)    40 (80%)    33 (66%)
100                            76 (76%)    60 (60%)    42 (42%)

As in the feed-forward network, there is considerable overlap in the most active hidden units for each phonological cluster in the regular and irregular verbs. When we examine the 50 hidden units that contribute most to the onset, 37 of these are shared between regular and irregular verbs. Likewise, for the 50 most active units contributing to the vowel cluster, 40 of 50 are shared, and 33 of 50 units are shared for the coda cluster. If we examine the 100 most active hidden units for each cluster, a similar proportion of units overlaps between regular and irregular verbs. Thus, the network does not appear to have partitioned itself into two sub-networks that generate the regular and irregular past tenses independently of each other. The same hidden units contribute to the production of both kinds of verbs.
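The overlap counts above, and the clean up unit counts that follow, use the same simple bookkeeping. A minimal sketch, assuming unit activations can be read out from the trained model; the helper names and the use of mean activation as the ranking statistic are our own assumptions.

```python
import numpy as np

def most_active(acts, k):
    """Indices of the k units with the highest mean activation over a set of
    items (acts is an items x units array of activations)."""
    return set(np.argsort(acts.mean(axis=0))[-k:])

def shared_count(regular_acts, irregular_acts, k=50):
    """How many of the k most active units for a given cluster are shared
    between the regular items and the irregular items."""
    shared = most_active(regular_acts, k) & most_active(irregular_acts, k)
    return len(shared), 100.0 * len(shared) / k
```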
With the recurrent network, we can also examine the contribution of the 50 clean up units with respect to the individual phonological clusters of the input. Table 8 shows how many of the most active clean up units for each phonological cluster are shared between the regular and irregular verbs. The 25 most active units for each cluster are considered, and results are shown for time steps 3 through 12, during which the clean up units receive activation.

Table 8: Number of shared clean up units for each cluster out of the 25 most active units

Iteration    Onset      Vowel       Coda
3            6 (24%)    7 (28%)     2 (8%)
4            6 (24%)    7 (28%)     2 (8%)
5            6 (24%)    10 (40%)    4 (16%)
6            5 (20%)    10 (40%)    4 (16%)
7            5 (20%)    11 (44%)    5 (20%)
8            5 (20%)    11 (44%)    5 (20%)
9            5 (20%)    11 (44%)    5 (20%)
10           5 (20%)    11 (44%)    5 (20%)
11           5 (20%)    11 (44%)    5 (20%)
12           5 (20%)    11 (44%)    5 (20%)

The degree of overlap among the clean up units is much less than that of the hidden units, indicating that they are more sensitive to the differences between regular and irregular verbs. These units appear to enhance the differences between the two kinds of mappings over the time course of processing. In effect, the clean up units can be thought of as guiding regular and irregular verbs toward different basins of attraction.

6.3.7 The time course of processing

Once the training set was adequately learned, we observed the time course of processing for each verb in the training set by activating the present tense on the input units and generating the past tense over 12 iterations. The mean error score for time steps 2 through 12 during the generation of selected verbs is shown in Figure 14. The pattern observed here is typical of all verbs. The error score for steps 2 and 3 is initially high, and decreases sharply by step 4. This is due to the activation from the clean up units, which first affects the output during this iteration. The error score then continues to decrease slowly until it flattens out completely sometime before step 12. Since this kind of pattern was observed for all verbs, we thought it reasonable to count the number of iterations until the output stabilized for each verb and relate it to reaction time. Our criterion for considering an output to be stable is that the error score must not decrease for 3 consecutive time steps. According to this strict criterion, the verbs BUY-BOUGHT, CREEP-CREPT, WALK-WALKED, and ACHE-ACHED settle in 8, 10, 6, and 6 iterations respectively. However, if we relax the requirement for settling, the number of iterations can be considerably different. For example, in BUY-BOUGHT the error score changes by only a very small amount from step 5 to step 6. If we allow a delta parameter by which the error score must change before it is considered to have decreased, the output patterns may settle more quickly. In the results of the next section, we vary the settling criterion by using different delta parameters and compare the results.

[Figure 14: Time course of processing for selected verbs. Four panels plot mean error score against time step (2 through 12) for BUY-BOUGHT (8 steps), CREEP-CREPT (10 steps), WALK-WALKED (6 steps), and ACHE-ACHED (6 steps).]
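A minimal sketch of this settling criterion, assuming the per-step error scores have already been collected into a mapping from time step to error; the function name and the exact bookkeeping are one reading of the criterion just described.

```python
def iterations_to_settle(errors, delta=0.0, window=3):
    """Return the step at which the output counts as settled: the last step
    whose error still dropped by more than delta, once it is followed by
    `window` consecutive steps with no such drop. `errors` maps time step
    (2 through 12 here) to sum of squared error."""
    steps = sorted(errors)
    run = 0
    for i in range(1, len(steps)):
        if errors[steps[i - 1]] - errors[steps[i]] > delta:
            run = 0                                # still decreasing
        else:
            run += 1
            if run == window:
                return steps[i - window]           # step before the flat run began
    return steps[-1]                               # criterion never met within 12 steps
```

With delta = 0.0 this is the strict criterion; raising delta treats very small decreases as noise, which is how the relaxed criteria compared in the next section are obtained.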
6.3.8 Frequency and consistency effects

We next determine if the recurrent model has the expected frequency by regularity interaction exhibited by the feed-forward model in Chapter 5 and observed in people. We used the same sets of the 10 highest frequency regular verbs, 10 lowest frequency regular verbs, 10 highest frequency irregular verbs, and 10 lowest frequency irregular verbs from the training set. Only verbs that generated the correct past tenses were selected (see Appendix C, section C.5 for a complete list of these items). The number of iterations for the model to settle on these items is shown in Figure 15, according to a variety of settling criteria.

[Figure 15: Frequency and regularity effects in the recurrent model. Four panels plot the mean number of time steps to settle for regular and exception verbs at high and low frequency, under settling criteria of delta = 0.000, 0.001, 0.010, and 0.100.]

As in the feed-forward model, frequency has little effect on performance for regular verbs, since on average both high and low frequency regular verbs took the same number of steps to settle. For irregular verbs, high frequency items are quicker to settle to stable patterns compared to low frequency items. Although the frequency by regularity interaction can be observed regardless of the settling criteria, a delta value of 0.001 provided the largest effect.

Our final test is to determine if the consistency effects exhibited by the feed-forward model in Chapter 5 and observed in people also exist in the recurrent network. We use the same sets of 20 entirely regular, 20 regular inconsistent, and 20 irregular verbs from the training set, equated in terms of frequency. Only verbs that generated the correct past tenses were selected (see Appendix C, section C.6 for a complete list of these items). The number of iterations for the model to settle on these three types is shown in Figure 16, according to a variety of settling criteria.

As in the feed-forward version of the model, the recurrent network shows the graded effects of the consistency of the mapping between the two kinds of "regular" verbs and irregular verbs. Irregular verbs take longer on average to settle than regular inconsistent verbs. Entirely regular verbs are generated even more quickly. Varying the settling criteria for these verbs did make a difference. With a delta value of 0.10, the consistency effect is not observed in the model. This is most certainly due to a floor effect. The lowest possible number of iterations to settle in the model is 4, since target values are only specified from time step 2, and the output must be non-decreasing for 3 consecutive time steps to be considered as settled. Using a delta value of 0.10, both regular inconsistent and entirely regular verbs settle in the shortest possible number of iterations, thereby not exhibiting the consistency effect.

[Figure 16: Performance on matched subsets of items in the recurrent model. Four panels plot the mean number of time steps to settle for irregular, regular inconsistent, and entirely regular verbs, under settling criteria of delta = 0.000, 0.001, 0.010, and 0.100.]

6.4 Discussion

Connectionist models of word processing are typically evaluated in two ways. First, it is determined whether a training set that represents the phenomenon of interest can be learned, and second, whether the knowledge in the trained model can be applied to novel inputs. In the past tense domain, feed-forward architectures have almost exclusively been used for this level of analysis.
Since the output of the 118 model is merely judged as either correct or incorrect, it has not been necessary to study the time course of processing. Recently, these models have been required to account for behavior in people that cannot be described as simply correct or incorrect. Certain words take longer for people to say than others, but are still said correctly. Feed-forward models compute all outputs in a fixed amount of time. However, it has been demonstrated empirically in a number of simulations that the output of a feed forward network has a relationship to the number of iterations to settle in a recurrent network. Our previous model developed in Chapter 5 showed that error score in a feed-forward network relates well to reaction time in people to generate past tenses. The goal of this chapter has been to show that the results of the feed forward network hold true in a recurrent version of the same network. Our reimplementation of the past tense model as a recurrent network was successful in a number of ways. The model learned to correctly generate the past tense for a vast majority of the training set verbs. It also generalized appropriately to novel verbs, thereby addressing the kinds of attractors that will develop in a past tense model. An analysis of the network reveals that componential attractors which are sensitive to phonological clusters generate the regular past tense for both known and novel verbs, while non-componential attractors which are sensitive to entire words generate the irregular past tense. We also observed that although many of the same hidden units participate in the generation of both regular and irregular past tenses, the clean up units are partitioned to guide regular and irregular verbs toward different basins of attraction. Although the recurrent network architecture is more proficient than the feed-forward architecture at 119 learning complex input to output mappings, our new model did not perform better than the original model. We deduce that the errors made by both models are due to a deficiency in the phonological representation, rather than our choice of network architecture, as we first noted in Chapter 4. As a psychological model of generating the past tense, the recurrent network can be related to human performance in the following way. Consider that a person will not utter an inflected form until it has been computed by a language module. This computation is a dynamic process and remains in effect until the generated form stabilizes. The longer the past tense form takes to stabilize, the longer the RT in the subject when generating the past tense. Our recurrent network is a direct implementation of this idea. An output is not considered to have stabilized until its error from the target activations is no longer decreasing. The frequency by regularity interaction and the consistency effect are phenomena observed in people that can be addressed by a connectionist model that accounts for RT. Both of these are demonstrated by our recurrent network. The effects are robust and can be observed with a variety of settling criteria in the model, although the least strict criteria subjects the results to a floor effect. We have shown in our model that the number of iterations to settle to stable past tense outputs relates well to the RT measured in subjects uttering the same past tense forms. 
Given the successful account of both the frequency by regularity interaction and the consistency effect in this chapter, we feel justified in using feed-forward networks to model reaction time data in people. Although recurrent 120 networks appear to be more psychologically plausible in accounting for the time course of processing, we have observed very similar results by evaluating the error score in the feed-forward models. Additionally, a feed-forward model is less computationally intensive than an equivalent recurrent network by at least a factor of 10. The recurrent network has the additional parameters of the number of clean up units and the number of iterations to allow the network to settle for each pattern, which adds a degree of complexity to determining the optimum architecture of the model. Our limited computational resources make the feed forward option much more appealing. The remainder of the thesis will describe several more connectionist models that are extensions of our original model in which a variety of past tense related phenomena will be addressed. Some of these models will be evaluated by the number of correct and incorrect outputs, while others will relate their outputs to reaction time. Due to the results of this chapter and our limited computational resources, all extensions of our original models will be implemented as feed forward networks. 121 CHAPTER 7: THE ROLE OF SEMANTICS In response to the many criticisms of Pinker and Prince (1988), several connectionist models have been developed that compute a mapping between a present tense phonological form of a verb to a past tense phonological form. Most of these models cannot distinguish between homophones such as FLY-FLEW and FLY-FLIED (as in “flied out”), where one homophone is derived from a noun meaning (denominal) and the other homophone is derived from a verb meaning (deverbal). Kim, Pinker, Prince, & Prasada (1991) have suggested that the addition of semantic information to such nets will not provide an adequate solution to this homophony problem. They show evidence that English speakers are aware of the grammatical category a novel or nonstandard verb is derived from (i.e. its derivational status) and use this information to generate the appropriate past tense form. Their data consists of past tense preference ratings for uncommon verb usages that are derived from either nouns (denominals) or verbs (deverbals). They further demonstrate that semantic information is not an adequate substitute for derivational status information. This chapter provides evidence that contradicts their account (see also Daugherty, MacDonald, Petersen, & Seidenberg, 1993). We conduct a 122 psycholinguistic experiment in which subjects’ rated preferences for past tense forms are predicted by semantic measures; moreover, we demonstrate a simulation model which shows that semantic distance provides a basis for learning the alternative past tenses for words such as FLY. We suggest a reconciliation of the two theories in which knowledge of “derivational status” arises out of semantic facts in the course of learning. 7.1 B ackground Pinker (1991) and Kim et al. (1991) describe some past tense phenomena that remain problematical for connectionist models. These concern homophonous verbs with different past tense forms. For example, the past tense of the verb FLY (meaning "airborne movement") is FLEW. There is an alternative, baseball-related sense of FLY, the past tense of which is FLIED ("the batter flied out to center"). 
Connectionist models that map from the phonological form of the present tense to the phonological form of the past tense (such as Rumelhart & McClelland's) cannot learn such alternative forms (see, however, MacWhinney & Leinbach, 1991; Hoeffner, 1992). This limitation has been repeatedly mentioned as a failing of connectionist models of the past tense (see Pinker & Prince, 1988; Marcus, Pinker, Ullman, Hollander, Rosen, & Xu, 1993; Pinker, 1991). The standard account in the traditional theory is that FLY-FLEW is a known irregular verb and the irregular flag is associated with its lexical entry. Therefore, FLY will generate the appropriate irregular past tense. On the other 123 hand, FLY-FLIED is derived from the noun “pop fly” from baseball jargon. When a verb is derived from a noun, it cannot have an irregular flag associated with it, since the irregular flag would have no meaning for a noun. If no irregular flag is present, the regular rule is applied to a lexical entry, thereby generating the regular past tense. One obvious suggestion for connectionist models to solve the homophony problem is to introduce semantic information. For example, the conjunction of the phonological form FLY and the meaning "airborne movement" would indicate that the past tense is FLEW, whereas the conjunction of FLY and the meaning "creating a fly ball" would indicate FLIED. Generation of the past tense would be treated as a mapping problem in which there are partial cues from phonology, meaning, and possibly other sources. Pinker (1991) and Kim et al. (1991) suggest that this solution will not work, however. They observe that the semantics of verbs are not very good predictors of past tense morphology. Pinker (1991) believes that a network encoding relationships between meaning and phonology will necessarily tend to form the same kind of past tense for semantically-related verbs. It is well known, however, that nets with attractors (e.g., Hinton & Shallice, 1991; Hoeffner, 1992) can learn to map similar inputs onto dissimilar outputs without massive interference or over generalization. A more serious problem is that Kim et al. provide evidence that the derivational status of a verb determines past tense morphology, not semantics. Subjects were asked to rate their preferences for regular vs. irregular forms of novel and nonstandard verbs that were derived from either nouns (denominals) or verbs (deverbals). Consider FLY again. According to Kim et al., the sense of 124 FLY in (la) is derived from the noun FLY [fly ball]. The sense of FLY in (2b) is said to derive from the verb FLY [airborne movement]. In general, subjects preferred the regular past tense for denominals and the irregular past tense for deverbals (see Appendix D, section D. 1 for a complete list of subject ratings for past tense forms). Thus, derivational status apparently determined the formation of the past tense. Denomial: 1. Wade Boggs has a bad habit of hitting fly balls into center field. a. Yesterday, he got one hit, and then flied out twice. b. Yesterday, he got one hit, and then *flew out twice. Deverbal: 2. The math professor flies off the handle at the slightest things. a. Last week, he * flied off the handle when one student talked during class. b. Last week, he flew off the handle when one student talked during class. An alternative hypothesis (Lakoff, 1987) is that past tense preferences are determined by the distance between the meaning of the derived verb and the central meaning of the existing irregular verb (see Figure 17). 
The past tenses of FLY [out to center] and FLY [off the handle] are determined by their distances from the central meaning of FLY [airborne movement]. Kim et al. obtained subject ratings on the distance of each verb sense from the central meaning of the verb (see Appendix D, section D.1 for a complete list of subject ratings on semantic distance). Subjects preferred FLEW [off the handle], which they also rated as close to the central meaning of FLY, and FLIED [out to center], which they rated to be more distant. However, Kim et al.'s data only partially supported this account. Rated distance from the central meaning was correlated with past tense preferences; however, there were residual effects attributable to derivational status. Thus, the authors concluded that the facts cannot be explained entirely in terms of semantic distance.

[Figure 17: Past tense depends on distance from central meaning. The deverbal FLY [off the handle] (past tense FLEW) lies close to the central meaning FLY [airborne movement] (past tense FLEW), while the denominal FLY [out to center field] (past tense FLIED) lies farther away.]

Of course, there is nothing about connectionist models that precludes encoding derivational status as a constraint on past tense formation. Nonetheless, we thought it might be premature to abandon the semantic distance hypothesis. There are two principal issues. The first is that there is some question about the relevant measure of semantic distance. Kim et al., following Lakoff's informal suggestion, assessed distance from the central meaning. However, FLY has several secondary meanings: "to rush; to run;" "to flee; to try to escape;" "to react explosively; to burst." We will collectively refer to these as the "aggressive motion" sense of FLY, all of which take the irregular past tense. The fact that the past tense of FLY [off the handle] is FLEW would be explained by its relative proximity to FLY [aggressive motion]. The fact that the past tense of FLY [out to center] is FLIED follows from the fact that it is more distantly related to either primary sense of the verb FLY (Figure 18).

[Figure 18: FLY [off handle] is related to FLY [aggressive motion], not FLY [airborne movement]. The deverbal FLY [off the handle] (past tense FLEW) lies close to the nearest meaning FLY [aggressive motion] (past tense FLEW), while the denominal FLY [out to center field] (past tense FLIED) is more distant from both.]

Harris (1992) obtained a measure of the distance of a derived meaning from the closest existing verb meaning, rather than the "central" meaning. These ratings are reported in Appendix D, section D.1. This semantic distance measurement was again correlated with past tense preferences. However, derivational status still accounted for a significant portion of the variance in her data. Hence, Harris suggested that both semantic distance from existing meanings and derivational status are relevant.

In the example in the figure, we must also consider the historical implications of when the term was first coined. "Pop fly" is common parlance in baseball jargon. When a batter hits a pop fly to right field, a fan, sports columnist, or sportscaster can say either that the batter "FLEW out to right field" or that the batter "FLIED out to right field". Either usage is probably acceptable to a listener who is familiar with the sport.
But to a naive listener, the first usage may convey the highly unlikely and unintended meaning that the batter somehow propelled his body through the air to a location on the playing surface. The proximity of this usage to the central meaning of FLY [airborne movement] most certainly influenced the person who first coined the phrase to choose "FLIED out", to ensure that the proper meaning is conveyed. It is interesting to note that in an informal survey, speakers who are completely unfamiliar with baseball jargon usually choose "FLEW out", whereas baseball fans choose "FLIED out". This uncertainty about the preferred past tense form is reflected in the subject ratings, with "FLIED out" being chosen over "FLEW out" by a very narrow margin.

A second problem concerns the derivational status factor itself. Pinker (1991) and Kim et al. (1991) assume that grammatical category, whether a word is a noun or verb, determines the derivation of the past tense. Derivational status is quite confounded with semantic distance from existing verb meanings, however. In general, deverbals are closer in meaning to existing meanings than are denominals. Deverbals such as "break in a new employee" or "fly off the handle" typically overlap with or metaphorically extend an existing meaning. Denominals, however, are derived from a noun that happens to sound like an existing verb but can be completely unrelated in meaning to it. For example, Kim et al. compared preferences for the deverbal BREAK (he breaks/broke in the new employees) and the denominal BRAKE (he brakes/braked for animals; see Figure 19).

[Figure 19: BRAKE (verb) is related to BRAKE (noun), which is unrelated to BREAK. The noun BRAKE [stopping device] is unrelated to the central meaning BREAK [smash] (past tense BROKE). The deverbal BREAK [to break in] takes BROKE, while the denominal BRAKE [apply brakes] takes BRAKED.]

Although BREAK and BRAKE are homophonous, the regular past is preferred for the denominal and the irregular past for the deverbal. BRAKED, however, is derived from the noun BRAKE, which is wholly unrelated to any meaning of BREAK. In contrast, deverbal BREAK is semantically related to an existing sense of BREAK. Harris' (1992) data indicate this clearly. She obtained ratings of the distance of Kim et al.'s verbs from their nearest homophonous irregular past tense. Denominals were rated as being further from an existing meaning (mean = 4.75 on a 6-point scale) than deverbals (mean = 2.35). This difference is highly reliable, F(1,36) = 10.4, p < .001. For this example, the difference in spelling between BREAK and BRAKE provides the speaker with an additional cue that the words are unrelated. Because of this, the speaker will not be influenced to apply the irregular past tense to BRAKE.

In sum, there is a confound between derivational status and distance from an existing irregular verb in the Kim et al. materials. One way to avoid this would be to include an equal number of deverbals derived from wholly unrelated verb homophones. For example, WRITE-RIGHT (as in "righted the boat") would be analogous to BREAK-BRAKE. Kim et al. instead dealt with the confound statistically, performing regression analyses indicating that derivational status accounted for unique amounts of variance after semantic distance was partialed out. This analysis cannot be taken as definitive, however.
The factor labeled "derivational status" could simply have been coding other aspects of semantic distance not captured by their other measure. It is important to note that in their account, Kim et al. contend that derivational status is the single predictive factor that accounts for unique variance of the past tense ratings. They entirely discount semantic factors and do not consider additional cues such as spelling differences or the speaker’s desire to not confuse the listener. We find that semantic factors can encompass these additional cues as well. In the semantic distance ratings, denominals which are spelled differently than homophonous deverbals are reliably rated as very distant from any existing verb meaning. Similarly, a speaker’s desire to not convey an unintended meaning can be accounted for by the proximity of the chosen past tense form (e.g. irregular vs. regular) to existing meanings. We explored these issues further in the research described next. We first obtained a second measure of semantic distance, providing further evidence concerning the relevance of this factor. We also conducted simulations which 130 addressed whether a connectionist model could learn the past tenses of homophonous verbs using semantic distance as a cue. 7.2 Behavioral data Pinker and colleagues' theory elegantly suggests that a single factor, derivational status, should predict past tense preferences: Irregular forms will be used for deverbals and regular forms for denominals. Our view is that verb preferences are based on the distance between the meaning of a verb and the meaning of a homophonous irregular verb. BROKE, for example, cannot be the past tense of BRAKE because it is dissimilar in meaning to BREAK. Subjects' ratings in the Kim et al. study departed from what the simple theory predicts. Verbs varied greatly in the degree to which the regular past tense was preferred over the irregular past. For example, whereas subjects greatly preferred BRAKED (not BROKE) as the past tense of BRAKE, there was only a small advantage for FLIED (over FLEW) as the past tense of "fly out." Moreover, for several denominals, the irregular pasts were actually preferred overall5. These deviations from the predicted patterns were attributed to subjects' "uncertainty" about the derivational status of individual items. This uncertainty was not independently assessed, however. Our view is that subjects' preferences are based on the distance from existing irregular verb meanings. Kim et al. partitioned this distance into two components: "derivational status" (denominals are more distant than deverbals) and "uncertainty" (which reflects the relative 5These items are BROADCAST, THREE-HIT, OUT-BLEW, and OUT-FLUNG See Kim et al. (1991) for the context in which these forms were presented to the subject. 131 semantic distance from existing meanings). Thus, although they admit that semantic factors contribute to ratings preferences, they maintain that derivational status is the primary and significant predictor and that semantic factors alone cannot account for the data. We examined these issues by obtaining a second measure of semantic distance. For all denominal verbs used by Kim et al. we had subjects rate their distance from the source noun. The hypothesis was that this distance would account for variability in subjects’ responses that Kim et al. attributed to "uncertainty" over derivational status. 7.2.1 Method Subjects. Fifteen native English-speaking USC undergraduates volunteered to participate in the experiment. 
Materials. The 37 present tense denominal passages from Kim et al. were presented as in the example below: The general is going to order his artillery to form a ring around the city. But if he rings the city with artillery, then a battle is certain. Procedure. Subjects were told to rate the similarity of the meaning of the verb in bold to the meaning of the noun homophone on a 6 point scale ( 1 = very similar; 6 = very dissimilar). In the above example, most subjects rated the distance to be 1 . 132 7.2.2 Results and discussion Multiple regression analyses were performed on Kim et al.'s preference ratings (preference for regular over irregular past tense) for the 37 denominals in their experiment. The mean semantic distance to the nearest homophonous verb (from Harris, 1992) and the mean semantic distance to the homophonous noun (from this experiment) were the predictor variables. See Appendix D, section D .l for a complete list of the subject ratings. See also Appendix D, section D.2 for the stimuli and data from our experiment. Distance to the nearest verb uniquely accounted for 20.5% of the variance in preference ratings, F (l,34) = 9.002, p < .01. Distance to the closest noun accounted for an additional 20% unique variance, F (l,34) = 8.599, p < .01. These results strongly indicate that subjects' past tense preferences depend on semantic factors. The regular past is preferred when the intended meaning (e.g., past tense of BRAKE) is far from an existing irregular verb and close to the source noun. The irregular past is preferred when the distances are in the opposite directions. These data indicate that variability that Kim et al. attributed to "uncertainty" over derivational status is instead due to semantic distance. For the deverbals, the semantic distance measure also correlated with subjects preference ratings (r = .26). Because deverbals are derived from verbs, not nouns, there were no data concerning their distance from a "source noun." In keeping with the hypothesis that derivational status merely indicates semantic distance, we conducted an omnibus analysis of both types of verbs in which deverbals were assigned the maximally unrelated score on the "distance from 133 noun" measure (thus, for example, BREAK was rated as unrelated to BRAKE). Derivational status and distance to the noun were correlated -.84, because denominals (coded 1 ) were closer to the noun and deverbals (coded 0 ) were farther. Derivational status and distance to the verb were correlated .75, because denominals were further from the verb and deverbals closer. These data are consistent with the hypothesis that derivational status merely encodes semantic distance. In the multiple regression, the relationships between the predictor variables and the past tense preference ratings were as follows. All three measures were highly intercorrelated: noun distance and derivational status = -.84; noun distance and verb distance = -.61; verb distance and derivational status = .75. None of the predictor variables by itself significantly accounted for unique variance in the past tense preference ratings. The confounded effect of the three predictors, however, is highly significant, F(l,72) = 74.774,p < .001). These results indicate that the predictor variables are capturing the same information, which can be termed distance between the verb's meaning and the meanings of homophonous words. These results differ from Kim et al.’s, which showed that derivational status accounted for unique variance in the ratings. 
Once the second measure of semantic distance was included, however, the unique effects of derivational status were removed. In summary, these data suggest that preferences concerning the past tense can be explained in terms of semantic distance, provided that it is measured appropriately, obviating the role of derivational status.

7.3 A connectionist model with semantic distance

We then explored how connectionist models might deal with these phenomena. This work builds on our connectionist model developed in Chapter 4.

7.3.1 Architecture

We modified the output layer to allow for bisyllabic as well as monosyllabic verbs. We chose to allow bisyllabic outputs so that present/past tense pairs such as PUNT-PUNTED could be represented. We selected the syllable boundary in bisyllabic words to be similar to PUN-TED rather than PUNT-ED. This choice was made arbitrarily, in light of a lack of consensus in dictionaries and other sources of phonemic representations. As in our previous model, the phonological representations are centered on the nucleus of the syllables, as shown in Figure 20. There are 150 hidden units in the model. During training, the phonological form of a regular, exception, deverbal, or denominal verb is activated on the input units along with an encoding of its semantic distances to the closest verb and noun definitions. The task of the model is to generate the phonological form of the past tense on the output units.

[Figure 20: Architecture of the semantic distance model. The input units encode the present tense phonology (a CCCVVCCC template) plus two additional fields encoding the distance to the closest verb meaning and the distance to the closest noun meaning. These project to the hidden units, which project to output units encoding the past tense phonology as two CCCVVCCC syllables.]

The model encodes the two measurements of semantic distance by augmenting the input layer with two additional sets of input units. One set represents the distance of a present/past tense pair to the closest verb definition, and the other represents the distance to the closest noun definition. Each set of units represents a numerical value ranging from 1 (closely related) to 6 (not related). The representation is a bar encoding of this range in which the semantic distance values are rounded to the nearest 0.5. It is implemented as an activation of 5 units out of 15 total units, as follows: 1.0 is represented by an activation of the leftmost 5 units, and each increment of 0.5 slides the bar of active units one position to the right. Thus, 1.5 is represented by activating units 2 to 6, 3.5 is represented by units 6 to 10, and 6.0 is represented by the rightmost 5 units. Five units seemed a reasonable number to represent each semantic distance measurement, given that the phonological representation is encoded as an average of 15 active units out of 120. This kind of bar encoding has been shown to be successful in other models that represent ranges of numerical values (McCloskey & Lindemann, 1992; Viscuso, Anderson, & Spoehr, 1989; Anderson, Spoehr, & Bennett, 1991).

The theory here is that people are able to judge semantic distances and that this information enters into the computation of the past tense. We have not attempted to simulate the similarity-judgment process, however. As in the other version of the model, deverbals were trained to produce the exception past tense form and denominals were trained to produce the regular past tense form. A few examples are shown in Table 9.

Table 9: Semantic distance measurements

Present Tense   Sem. Dist. to Verb / Noun   Past Tense   Used in Context
BAKE            0.0 / 6.0                   BAKED        John BAKED a pie. (regular)
BREAK           0.0 / 6.0                   BROKE        Sally BROKE the vase. (exception)
FLY             2.0 / 6.0                   FLEW         He FLEW off the handle. (deverbal)
FLY             2.5 / 1.7                   FLIED        He FLIED out to center field. (denominal)
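Before turning to the training set, here is a minimal sketch of the bar encoding of semantic distance described above. The function name is our own, as is the decision to clamp out-of-range values (such as the 0.0 entries in Table 9) to the nearest end of the scale.

```python
def bar_encode(distance, n_units=15, bar_width=5, lo=1.0, hi=6.0):
    """Encode a semantic distance as a sliding bar of active units.
    1.0 activates the leftmost 5 of 15 units; each 0.5 step slides the bar
    one position to the right, so 6.0 activates the rightmost 5 units."""
    distance = min(max(distance, lo), hi)      # clamp out-of-range values
    start = round((distance - lo) / 0.5)       # round to the nearest 0.5 step
    units = [0.0] * n_units
    units[start:start + bar_width] = [1.0] * bar_width
    return units

# bar_encode(1.5) activates units 2-6; bar_encode(3.5) activates units 6-10.
print(bar_encode(1.5))
print(bar_encode(3.5))
```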
7.3.2 Training set

As seen in the table, regular verbs like BAKE-BAKED and irregular verbs like BREAK-BROKE are presented with a semantic distance of 0.0 to their closest verb definition and a semantic distance of 6.0 to their closest noun definition, indicating that BAKE and BREAK correspond to central, not extended, verb meanings. Deverbals and denominals are presented with their closest semantic distance to a homophonous exception verb, as reported in Harris (1992). An encoding of their closest semantic distance to a homophonous noun, as reported in this paper, is also included. In the examples above, the distance of the deverbal FLY-FLEW to the exception verb FLY-FLEW was rated by subjects to be 2.0, and its distance to the closest noun homophone was set to 6.0, since deverbals are not derived from nouns. The distance of the denominal FLY-FLIED to the exception verb FLY-FLEW was rated by subjects to be 2.5, and its distance to the noun FLY (ball) was rated to be 1.7.

The training set consisted of all 367 regular verbs with a Francis & Kucera (1982) frequency greater than or equal to 2, a monosyllabic present tense, and either a monosyllabic or bisyllabic past tense. We also selected 20 irregular verbs from the Kim et al. data set, as well as both their denominal and deverbal forms. See Appendix D, section D.3 for a complete list of the training set items.

7.3.3 Training

During training, all regular and exception verbs were probabilistically presented to the model according to the logarithm of their Francis & Kucera frequencies. Denominals and deverbals were probabilistically presented during 10% of the epochs. Weight correction was by standard back-propagation (Rumelhart, Hinton, & Williams, 1986). In scoring the performance of the model, we compared the generated output for each segment to an inventory of known segment representations. The output of the model was considered correct only if the target output segments provided the best fit for all generated segments. We also calculated the total sum of squared error for all output units as a measure of goodness of fit.

For the training set, all 367 regular monosyllabic verbs with a Francis & Kucera frequency greater than 1 were chosen. An analysis of the Francis & Kucera corpus revealed that exception verbs comprise 5% of all listed verb types and 22% of the verb tokens. Thus, we selected 20 exception verbs from the Kim et al. data to maintain the correct relative verb type proportion. Exception verb classes and subclasses, as identified by Pinker & Prince (1988), were represented within the training set by selecting verbs with the appropriate token frequencies from these classes. Each exception verb was also represented both as a deverbal and as a denominal in the training set by encoding semantic distances.

7.3.4 Results

Training progressed for 700 epochs, at which point performance approached asymptote. The following results reflect averages of three training sessions with random initial weights. All 367 of the regular verbs were learned (100%). 18 of the 20 exception verbs were learned (90%). The errors were LIGHT-LET, a vowel feature error, and RING-RANGED, an overregularization error. 18 of 20 deverbals were learned (90%) and 20 of 20 denominals (100%).
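These accuracy figures rest on the best-fit scoring procedure described above. The sketch below is one reading of it, assuming the output vector is divided into fixed-width segment slots and that an inventory of known segment patterns is available; the slot width and names are our own assumptions.

```python
import numpy as np

def score_output(output, target_segments, inventory, slot_width=15):
    """Mark a generated past tense correct only if, for every segment slot,
    the target segment is the closest pattern in the inventory of known
    segment representations (by sum of squared differences)."""
    n_slots = len(output) // slot_width
    for i in range(n_slots):
        generated = output[i * slot_width:(i + 1) * slot_width]
        best = min(inventory, key=lambda name: np.sum((generated - inventory[name]) ** 2))
        if best != target_segments[i]:
            return False
    return True
```

The total sum of squared error used as the goodness-of-fit measure is simply the squared difference between the full output and target vectors, summed over all output units.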
We performed a simple regression using the error score for generated deverbals and denominals in the model as the predictor variable and past tense preference ratings as the predicted variable. We found that by training a model on 139 only the phonological form of verbs and an encoding of their semantic distances to the closest noun and verb definitions, the model’s performance accounts for a significant amount (2 1 .8 %) of the variance in people’s preference ratings, F(l,16) = 9.462,/? < .01. 7.4 Discussion Pinker (1991) and Kim et al. (1991) theorize that derivational status determines the past tense of verbs that sound like existing irregular verbs. This places the explanation for the FLY/FLEW/FLIED facts at a morphological level of representation that governs the organization of the mental lexicon. In generating past tenses for homophones, people are thought to follow a simple rule: if the verb is derived from a noun, use the regular past; if derived from an existing irregular verb, use its irregular past tense. Deviations from the predictions of this rule are explained in terms of uncertainty about derivational status. We have explored an alternative hypothesis, which holds that past tense preferences are subject to semantic constraints. The way in which the past tense of a novel verb is realized depends on the relationship between the meaning of the new verb and the meanings of the noun or verb from which it is derived. If the novel verb is similar in meaning to an existing irregular verb, the latter's past tense form can be used. If the novel verb is dissimilar in meaning to an existing irregular verb (because, for example, it is derived from a semantically-unrelated noun, as in BREAK-BRAKE), this contradicts using the existing verb’s past tense. 140 Thus, BROKE cannot be recruited for the past tense of BRAKE because it already has the meaning "past tense of BREAK". We also acknowledge that additional cues may assist in determining the relatedness of verbs. In particular, homophones that are spelled differently are certain to be semantically unrelated. Past tense preferences depend on the degree of semantic distance, rather than the deverbal-denominal dichotomy. FLY is especially complex because, as the ratings indicate, the baseball sense of flying out is semantically related to both the source noun (fly ball) and an existing irregular verb (fly-airborne motion). That is why subjects sometimes say "flew out to center field" even though the derivational theory predicts that it should always be "flied". This account explains the phenomena in terms of the communicative consequences of using an existing irregular form as the past tense of a novel verb. In a sense, it describes an on-line monitor in the language system that influences a speaker’s choice of the past tense form. When we use a novel verb, we judge its similarity in meaning to existing forms. Clark’s theory of contrast claims that humans endeavor to find a one-to-one correspondence between spoken forms and meanings. Thus, an irregularly marked form can only be used if its meaning is intended. If its meaning is not intended— as in the case of a semantically unrelated homophone— a different form must be used instead. The regular form is used to distinguish the meaning of the novel form from that of the homophonous irregular verb. Performance then depends on the degree to which a novel verb sense is judged to be related to existing noun and verb senses. 
Similarity to existing forms act as soft constraints pulling subjects' preferences either toward or away from a given past tense form. 141 Our behavioral data and simulations are consistent with this semantically- based account. The data indicate that derivational status is confounded with semantic distance. Both distance from existing irregular verb and distance from source noun affect subjects' preferences concerning the past tenses of denominals. The same factors also apply to deverbals. A model that encodes these measures of semantic distance is able to perform at a high level and comparably to people in generating the past tense. We suggest that the two theories of the past tense can be reconciled by considering how people acquire knowledge of a word's "derivational status." This information derives from facts about how words are used and what they mean. Pinker (1984) and Hill (1983), among others, has suggested that knowledge of a word's syntactic category arises out of facts about lexical semantics (the so-called "semantic bootstrapping hypothesis" of syntactic category learning). Our models can be taken as showing how such categories arise. They arise, for example, out of observations of semantic similarity and dissimilarity— the "distances" measured by our ratings. Looking down at the model and attempting to formulate a high-level description of what it had learned, one could say that it had captured the distinction between denominal and deverbal verbs. Importantly, it did so on the basis of semantic information, rather than a morphological representation. Thus, where Pinker's treatment of the past tense takes notions such as "derivational status" as primitive, we consider it to be secondary to facts about semantic knowledge. The morphological theory therefore provides an approximate, folk-psychological description of what our nets achieve. 142 CHAPTER 8: IS ENGLISH THE EXCEPTION TO THE RULE? The past tense debate has largely been focused on the English language to date. The English past tense is unique in that the default marking occurs in a vast majority of the language. Prasada & Pinker (1993) and Marcus, Brinkman, Clahsen, Wiese, Woest, & Pinker (1993) speculate that connectionist models will not be able to address inflectional morphology in other languages because they do not exhibit this trait. In this chapter, we apply our connectionist model to the question of low-frequency default inflection, which is the application of a regular marking when the default category does not outnumber the other categories (see also Daugherty & Hare, 1994; Hare, Daugherty, & Elman, in press). An account of this phenomenon exists in the traditional approach, and we will show that a connectionist account is possible as well. 8.1 The problem of the low frequency default Cross-linguistically it is possible to find languages where, unlike English, a default class is not significantly larger than its competitors. One frequently cited 143 example is the Arabic plural system, in which the "sound" plural, the default, is of relatively low type and token frequency compared to the many non-default, or "broken" plural classes (Plunkett & Marchman, 1991). The traditional theory of the past tense can be taken as a more general theory of inflectional morphology and can easily account for this data. Rule-governed items have a “regular” flag associated with their lexical entry, and are not affected by the number of irregular items. 
Novel items that are similar to an irregular class may be analogized as members of that class. Dissimilar novel items take the regular flag, and thereby productively apply the default marking. In relying on modem English data, connectionist network accounts may have given the erroneous impression that in the models a class can become the default only as a consequence of its superior size. Critics of the connectionist approach have therefore speculated that superior class size is necessary to achieve default behavior. Prasada and Pinker (1993) and Marcus, Brinkman, Clahsen, Wiese, Woest, & Pinker (1993) contend that networks can generalize only on the basis of frequency and surface similarity. If this were true, then a novel form could be treated as a member of a default class only if it resembled a previously learned member of that class. To get true default behavior, a network would need a default category that was extremely well-populated, for its members would have to span the entire phonological space of the language. Prasada and Pinker show that the original Rumelhart and McClelland model does indeed generalize on the basis of surface similarity, and conclude that the network was able to produce what looked like default behavior only because of the great difference in size between the regular and irregular classes. 144 Prasada and Pinker take this as evidence that this shortcoming is equally true of all connectionist networks. In the rest of this paper we will argue that this claim is unjustified, since it takes an overly simplistic view of network dynamics, on the one hand, and of the linguistic facts to be accounted for, on the other. We will show instead that under specific circumstances a simple feed-forward network learns a default classification without relying on superior class size or complete coverage of the phonological space. Significantly, the necessary circumstances are also those that are found in the documented real-language examples. 8.2 The connectionist account In addressing this issue, a number of points can be made. First, the Rumelhart & McClelland model was a two-layer perceptron, capable only of learning a linearly separable mapping between input and output. Current models can take advantage of an intermediate processing layer, and this leads to a significant difference in the dependence on input similarity. The second point has to do with the data, specifically with the structure of the non-default classes. Plunkett and Marchman (1991) raise this issue in their discussion of the Arabic plural system, where they suggest that the solution lies in the fact that "the numerous exceptions to the default mapping .. . tend to be clustered around sets of relatively well-defined features." In other words, while the sound plural encompasses an essentially arbitrary and phonologically disparate group of nouns, the various broken plural classes exhibit clear phonological cues to class 145 membership. Items that accept the default mapping can be characterized as not having these phonological cues. Given this description of a language that exhibits the low-frequency default rule, it appears that this phenomenon shares properties of both a pattern associator task and a pattern classification problem. The third point we address is some well understood properties of connectionist networks. In addition to their proficiency as pattern associators (e.g. spelling-to-sound model, past tense model), these networks have a long history as pattern classifiers as well. 
Selfridge (1959) introduced the Pandemonium model as a paradigm for learning to classify novel patterns. He proposed that a number of modules, one corresponding to each possible class, can simultaneously and independently examine an input pattern. These modules provide a graded response corresponding to the similarity of the input to the class they represent. The module that is activated to the highest level classifies the input. He goes on to specify that with a "hill climbing" learning algorithm (which is often used in traditional AI learning systems), the modules will become sensitive to both the combinations of features and the absence of features in the input patterns by adjusting the weights on connections between the modules. Although this is not strictly a connectionist system, it displays many properties that have been shown to exist in connectionist classifiers.

A number of cognitive domains have been addressed by network implementations of pattern classifiers. Fukushima, Miyake, & Ito (1983) developed the Neocognitron model based on the anatomy and physiology of the visual system. This complex model learned to recognize handwritten characters presented in any visual field location, even if they were badly distorted. Gorman & Sejnowski (1988) present a feed-forward network that classifies sonar patterns as either rocks or mines. The performance of the network closely resembled that of Naval sonar operators, who were unable to list the criteria by which they made their choice. Shanks (1991) conducted a series of behavioral experiments in which subjects were asked to memorize a list of hypothetical diseases and their associated symptoms. The subjects were then asked to diagnose novel groups of symptoms as one of the diseases. He found that the certainty of the diagnosis depended on the predictiveness of the symptoms. A feed-forward network implementation of this domain performed similarly to the subjects' responses.

In pattern classifiers, it is obvious that classes will develop based on the presence of certain features in the input. What may not be obvious is that classification may be assisted by, or even based on, the absence of features as well. In the rock/mine classifier, an analysis of the connection weights revealed that the absence of certain frequencies in the sonar signature provided crucial information for the network's performance. We can observe this more clearly in the simple example shown in Figure 21. The task of this network is to classify all binary vectors of length 2 according to these rules: if the first element of the vector is on, the vector belongs to class 1 and should activate output unit 1; if the first element is off, the vector belongs to class 2 and should activate output unit 2.

[Figure 21: Example network classifier. Two input units feed two output units; output unit 1 has bias 0.0, output unit 2 has bias 0.5, both output units have a threshold of 0.5, and the connection weights shown are -1.0, -1.0, 2.0, and 0.5.]

In this network, output unit 2 has a bias of 0.5, which means that it will have an activation level of 0.5 even if it does not receive any input activations (a linear activation function is assumed in this example). Furthermore, both output units have a threshold of 0.5, which the total unit activation must reach before the unit will fire. The generated output for each possible input pattern is as follows: [1 0] -> [1 0], [1 1] -> [1 0], [0 1] -> [0 1], [0 0] -> [0 1]. It is a simple matter to extend this model to allow a large number of classes which are sensitive to either the presence or absence of a feature.
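The example of Figure 21 can also be written out directly. The sketch below is illustrative: the assignment of the four weight values to particular connections is our reading of the figure, and we assume a linear activation function and that a unit fires when its net input is at least its threshold. Under those assumptions the code reproduces the four input-output pairs listed above.

```python
# Minimal version of the Figure 21 classifier: output unit 1 detects the
# presence of the first input feature, output unit 2 detects its absence.
# The weight assignment is one plausible reading of the figure; the firing
# criterion (net input >= threshold) is an assumption.

WEIGHTS = {                      # (input unit, output unit) -> weight
    (1, 1): 2.0, (1, 2): -1.0,
    (2, 1): -1.0, (2, 2): 0.5,
}
BIAS = {1: 0.0, 2: 0.5}
THRESHOLD = {1: 0.5, 2: 0.5}

def classify(pattern):
    """pattern is a pair of 0/1 values for input units 1 and 2."""
    output = []
    for out_unit in (1, 2):
        net = BIAS[out_unit] + sum(
            pattern[in_unit - 1] * WEIGHTS[(in_unit, out_unit)]
            for in_unit in (1, 2))
        output.append(1 if net >= THRESHOLD[out_unit] else 0)
    return output

for p in ([1, 0], [1, 1], [0, 1], [0, 0]):
    print(p, "->", classify(p))
# [1, 0] -> [1, 0]   first feature present: class 1
# [1, 1] -> [1, 0]
# [0, 1] -> [0, 1]   first feature absent: class 2
# [0, 0] -> [0, 1]   class 2 even with no active features at all
```

Output unit 2 responds precisely because of what is absent from the input: its positive bias is cancelled only when the first feature is present.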
Our operating assumption in the connectionist theory is that phonological information is what allows the regular-irregular patterns to be learned in a language that exhibits a low-frequency default. In learning to produce the non-default classes, the net naturally learns to respond to the phonological characteristics that are cues to each class. If a network is taught a series of such mappings, and generalizes on the basis of the shared regularities, then novel patterns lacking those regularities cannot be adopted into the class. If, in addition, the inflectional system includes a class whose membership is not keyed to phonological generalizations, this class can become productive since it is capable of accepting members that do not fit elsewhere.

8.3 The Hare and Elman model

To demonstrate the connectionist account of the low-frequency default on a real language example, Hare and Elman (1992) taught a network a simple categorization task with Old English verbs. In this paradigm there are five strong, or irregular, past tense classes, which are based on the following phonological cues within the rime of the verb:

1. i + any one consonant
2. e + one stop or fricative
3. e + a consonant cluster
4. i + a nasal + stop cluster
5. a + any one consonant

The sixth class is considered to be the default class since it has no class characteristics, but instead can contain any other VC or VCC string. Given this topology of classes, the Old English past tense is much more similar to the Arabic broken plural than it is to the modern English past tense.

The model used a feed-forward, three-layer network, trained with the backpropagation algorithm. Input to the net was a set of 50-element vectors, each representing a word. A subpart of each word was a particular vowel or vowel + consonant pattern defined over distinctive features, while the rest was a unique random pattern. On the output layer there were six nodes, one for each of the six categories. The task of the network was to categorize each input by activating the appropriate category label on the output. The goal of the simulation was to show that the net would learn to take the feature combinations in the first five classes as predictive of class membership. As a result, it would generalize novel items displaying those characteristics into the appropriate classes, and treat any novel item not displaying those characteristics as a member of the sixth class.

The training set consisted of 32 randomly generated members of each of the six classes. After 20 passes through the data set, performance on the training items was essentially perfect. In the first testing phase, the trained net was shown 32 randomly generated patterns that matched the criteria for each of classes 1-5, and all were categorized correctly. It was then shown 63 novel patterns that did not precisely match any subtype seen in the training phase. In classifying these data the net was clearly influenced by similarity to learned patterns. If a novel item differed from a learned class by only one or two features, it was placed in that class. Test patterns that shared features of two classes ambiguously activated both class nodes to an intermediate degree. Similarity cannot explain the entire set of results, however, since the majority of patterns were devised to be dissimilar to any training exemplars. As expected, these dissimilar patterns were all placed in Class 6.
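For concreteness, the categorization set-up just described can be sketched as follows. This is not the Hare and Elman implementation; the input and output sizes follow the description above (50-element word vectors, six category nodes, 32 items per class, 20 passes through the data), but the hidden-layer size, the learning rate, and the way the cue and noise portions of each vector are generated are our own illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_HID, N_OUT = 50, 20, 6        # 50-element inputs, six category nodes
                                      # (the hidden-layer size is an illustrative guess)

# Mimic the described training items: classes 1-5 each carry a fixed "cue"
# pattern in one part of the vector while the rest is unique random noise;
# class 6 items share no cue at all.
cues = rng.integers(0, 2, size=(5, 10)).astype(float)
items, labels = [], []
for cls in range(6):
    for _ in range(32):                                   # 32 items per class
        vec = rng.random(N_IN)
        if cls < 5:
            vec[:10] = cues[cls]                          # phonological cue region
        items.append(vec)
        labels.append(np.eye(N_OUT)[cls])
X, T = np.array(items), np.array(labels)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.1, size=(N_IN, N_HID))
W2 = rng.normal(scale=0.1, size=(N_HID, N_OUT))
for _ in range(20):                                       # "20 passes through the data set"
    H = sigmoid(X @ W1)                                   # hidden layer
    Y = sigmoid(H @ W2)                                   # activation of the six class nodes
    d2 = (Y - T) * Y * (1 - Y)                            # back-propagated error signals
    d1 = (d2 @ W2.T) * H * (1 - H)
    W2 -= 0.1 * H.T @ d2
    W1 -= 0.1 * X.T @ d1

# A test item is assigned to whichever category node it activates most.
novel = rng.random(N_IN)                                  # carries no cue: a candidate for Class 6
print("category:", int(np.argmax(sigmoid(sigmoid(novel @ W1) @ W2))) + 1)
```

In the actual simulation, part of each input vector carried the class's phonological cue defined over distinctive features; the analyses that follow ask how the trained network treats novel vectors, including vectors that carry no cue at all.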
One possibility is that the network was still relying on similarity in generalizing to Class 6 , even though the "most similar" training pattern was very distant from the test item. To eliminate this possibility the authors computed the distance between all test items and all members of the test set. The results show that novel items placed in classes 1-5 always had a learned class member as their closest match, while many of the items placed in Class 6 had a closest match in some other category. Furthermore, for classes 1-5 the network exhibited the sort of graded response shown by English speakers in the Bybee and Modor (1983) experiment: a novel item strongly activated the category node for these classes only if it closely matched a training exemplar. As the match became more distant category activation decreased, and as the distance became too great the item was placed in Class 6 . For Class 6 , on the other hand, there was no such effect of distance: category node activation remained high regardless of the distance between test and training exemplars, or the presence of closer targets from some other class. This suggests, again, that while resemblance to a learned member was crucial for generalization to Classes 1-5, it was irrelevant for generalization to Class 6 . These results are consistent with an account which says that the network normally generalized on the basis of phonological similarity, but treated Class 6 as a default for patterns not fitting one of the learned prototypes. As such, they 151 demonstrate that a network is capable of developing a default category without the benefit of superior frequency. However, past tense inflection is not simply a categorization task, but is also a pattern association task in which a present tense must be transformed into a past tense. The task facing the Hare and Elman network was overly simple. On the input layer, the phonological string that was intended as the basis for generalization was clearly presented to the model in each of the five predictable classes. On the output layer, the model was compelled to accept one of the six categories offered, eliminating the possibility of a no-response or entirely novel response to a test item. By presenting the problem as an overt categorization task, instead of requiring the network to categorize implicitly by producing correctly inflected forms, this network cannot be considered as an actual model of verb production. 8.4 A new connectionist model In the remainder of this chapter we will replicate the earlier results with a more realistic task, in order to demonstrate that the categorization performance of the earlier model could be implicitly accounted for in a pattern associator network. This second simulation differs from the first in two ways. First, the input is "words" represented as phonological strings, forcing the model to decide what the relevant generalizations are for each class. Second, the task of the model is to produce not a decision about the correct inflectional category, but an 152 inflected version of the input string. The choice of inflection, in this case, can be taken as an indication of how the network categorizes each input. The underlying assumptions with which we approach the model are the same as for the previous model. As in the first simulation, we expect that the net will extract relevant generalizations about the structure of the phonologically defined classes, and inflect novel items in the same way if they fit those generalizations. 
As in the earlier set of results, test items that are a close but not exact fit to a certain class can be expected to be placed in that class, and items that equally well match two or more predictable classes should be placed ambiguously between the two. Test items that differ sufficiently from the training exemplars of the defined classes should be placed in the default class, regardless of whether they match any learned member of that class. The model should be able to learn a training set that is representative of a real language and generalize properly to novel verbs. As in the earlier simulation, the language chosen for the data set is Old English. By Early OE (ca. 870) the equivalent of the modem regular verbs already vastly outnumbered any irregular forms. In earlier stages of the language, however, the suffixed past appears to have been the default despite its small size relative to other classes of past tense inflection. For this reason we will use the earlier stage as an example of a low- frequency default. 153 8.4.1 Architecture of the model This work builds on the connectionist model described in chapter 4, as seen in Figure 22. There are 175 hidden units in the model. During training, the phonological form of a present tense verb is activated on the input units. The task of the model is to generate the phonological form of the past tense on the output units. cccvvccc □□BSDSDD □□□□□□□□□□ w □□SBDBiD CC CV VC CC In p u t U n its H id d en U n its O u tp u t U n its Figure 22: Architecture of the low frequency default model 8.4.2 Training A training set was selected based on Old English strong verb classes as defined in Table 10. 25 items from each class were chosen for the training set. In classes that did not have 25 actual words in Old English, we created additional 154 verbs to extend the set in order to maintain equal proportions of items in each class, keeping with the Hare & Elman model. See Appendix E, section E. 1 for a complete list of the training set items. The strong verbs form their past tense by changing the present tense vowel. In our set the past tense vowel is predictable from the present tense rime. Note that the stem vowel alone does not provide enough information to predict the past tense form. Only the Class IV past tense is predictable based on the stem vowel (/a/ goes to /of). Table 10: Training set classes Class Stem Vowel Coda Changed Vowel Exam ple I i {d, t, g, p, k, b} a bid -> bad n e {T, w, s, f, v} ea dref -> dreaf n il e {rst, rS, zd} as kerst -> kasrst m.2 i (n, m, N} + C u SriNk -> SruNk IV a {r, 1, k} o brak -> brok V any any none kark -> karkt In addition to the 125 strong verbs, there were 25 verbs in the training set that take the regular past tense suffix /t/ or /d/, depending on the voicing of the final consonant. We define these verbs as Class V. Verbs in this class can have any stem vowel, including the vowels used by verbs in Classes I to IV. Furthermore, these verbs can have rimes that cause them to fall into one of the 155 strong classes. For example, /kark/ -> /karkt/ is a member of Class V, even though its rime is phonologically identical to members of Class I. The introduction of Class V items in the training set poses an interesting constraint for the model, since it cannot solely count on the consistency of the strong classes to predict the past tense of any verb. Figure 23 shows the phonological space of the present tense verbs in the data set. 
Note that verbs from Classes I to IV are phonologically consistent, while Class V verbs can occur anywhere in the space. When we compared the phonological representations of each training set item, there did not exist a single feature whose presence or absence would predict Class V membership.

[Figure 23: Phonological space of rime for class items. Items from Classes I, II, III.1, III.2, and IV form distinct clusters, while Class V items are scattered throughout the space.]

When both regularly governed and exception items are learned by a single mechanism, the frequency of occurrence and consistency of mapping of the classes play an important role in the learnability of the training set and in proper generalization to novel items. In the current model we have made the simplifying assumption that all classes are equally frequent, and only the strong Classes I to IV have consistent mappings between their present tense and past tense. Thus, novel verbs that are phonologically similar to one of the strong classes would be expected to generalize naturally and take the predicted past tense for that class. On the other hand, novel verbs that are phonologically distant from any strong class would be expected to generalize to the default or regular past tense. Note that Class V does not contain a majority of items from the training set, nor does it cover the phonological space of the language, yet it is still expected to behave as the default class.

The model was trained using the standard back-propagation learning algorithm (Rumelhart, Hinton, & Williams, 1986), with a learning rate of 0.001 and momentum of 0.9. Each item in the training set was presented during each epoch. Training progressed for 2000 epochs, at which point performance on the training set reached asymptote. The results below were averaged over three simulation runs with random initial weights.

8.4.3 Results

In scoring the model's performance, we determined for each phonemic segment whether the best fit to the computed output was provided by the correct target. The output pattern was scored as correct only if the correct targets provided the best fit for all segments in a word. The total sum of squared error was also calculated as a measure of goodness of fit.

The model learned 148 of the 150 words (99%) in the training set. Both items that were not learned were Class V verbs: /spar/ -> /spor/ instead of /spard/, and /war/ -> /wor/ instead of /ward/. Note that these verbs were generalized into Class IV, since they differ only by a single feature from the Class IV verbs /spal/ and /wal/. Given the consistency of Class IV, the model found it easier to assimilate /spar/ and /war/ into Class IV rather than devote the necessary resources to learning their idiosyncratic past tense mappings.

Three generalization sets were created to test the model's response to new words. The first set contained new verbs that were perfect examples of the strong classes. Five verbs were created to exactly match the definition of each of the five strong classes (see Appendix E, section E.2 for a complete list of these items). The expected past tense was generated for 21 out of 25 (84%) of these. The following verbs generated past tenses that were not expected:

Class I: /lig/ -> /log/ instead of /lag/ (vowel feature error)
Class IV: /far/ -> /ford/ instead of /for/ (overregularization error)
Class II: /smev/ -> /smev/ instead of /smeav/ (no-change error)
Class II: /keT/ -> /keas/ instead of /keaT/ (consonant feature error)

Each of these verbs was then examined in detail.
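As an aside, the segment-level scoring procedure described at the start of this subsection can be summarized in a short sketch. The representation details here are assumptions for illustration: each word is taken to be a sequence of feature vectors, one per phonemic segment, and "best fit" is taken to mean that the target phoneme is the closest member of the phoneme inventory to the computed output for that segment.

```python
import numpy as np

def score_word(output_segments, target_segments, phoneme_inventory):
    """Return (correct, sum_squared_error) for one word.

    output_segments / target_segments: arrays of shape (n_segments, n_features).
    phoneme_inventory: array of shape (n_phonemes, n_features) holding the
    feature vector of every phoneme the model could have produced.
    """
    sse = float(np.sum((output_segments - target_segments) ** 2))
    for out_seg, tgt_seg in zip(output_segments, target_segments):
        # distance from the computed output to every phoneme in the inventory
        dists = np.sum((phoneme_inventory - out_seg) ** 2, axis=1)
        best = phoneme_inventory[np.argmin(dists)]
        # the word is wrong as soon as any segment's best fit is not the target
        if not np.allclose(best, tgt_seg):
            return False, sse
    return True, sse
```

A word counts as correct only if every one of its segments passes this test; the summed squared error provides the graded measure of goodness of fit used in the analyses that follow.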
In three cases, the output can be explained by competing pulls from two possible outcomes, /lig/ is equally 158 close to a Class I item in the training set (/dig/ -> /clag/) and a Class V item (/tig/ -> /tag/). Assuming both these items affect generalization, the generated output /log/ is a plausible blend between the two outputs /lag/ and /ligd/. Likewise, /far/ is equally close to a Class IV item (/bar/) and a Class V item (/war/). The generated output is an overregularization, indicating a blend between the two possible outputs of /for/ and /fard/. The no change error of /smev/ is probably due to conflicting constraints between the two strong classes that use /e/ as the stem vowel, Classes II and ELI. /keT/ did generalize properly, except for a single feature error on the coda (/s/ is one feature from /T/). The second generalization set consisted of 27 new verbs that did not exactly match the definitions of the strong classes. These verbs varied in the number of features of the rime that differed from the closest verbs in the training set (see Appendix E, section E.3 for a complete list of these items). Table 11 describes the performance of the model on this generalization set. The network showed interesting performance on these verbs. As in the earlier model, the network extracted information about the phonological definition of the first four classes, and used these when generalizing to novel verbs. In the test set, 14 of the 27 novel verbs are "good" examples of a training class: they differ from training set exemplars by only 1 to 3 features, and do not also resemble members of other classes. As expected, these verbs were given the past tense inflection of the class whose members they most closely resemble7. 7 T he one exception to this is /slelp/ -> /slol/ w hich should have been /slealp/ according to the vowel. H ow ever, the closest m atch in the training set was the w ord /torf/, a Class V verb, which could explain the choice o f output vowel. 159 Table 11: Performance on generalization set Closest Class Num Cor rect Error Types Dist Comments I 6 4 Blend (2): /lif/-> /lef/ (instead of /laf/) 1 Equidistant to Classes I, II, V II 8 3 Blend (1): /slelp/ -> /slol/ (instead of /slealpt/) No Change (4): /trek/ -> /trek/ (instead of /treak/) 3 1 Closest to Class V Closest to Class I m .2 2 2 None IV 11 5 No Change (6): /dat/ -> /dat/ (instead of /dot/) 2 or 3 Closest to Class IV The remaining verbs can all be considered "bad" examples of the classes. Many of the novel verbs matched the characteristics of 2 or more classes equally well. In these cases the conflict was resolved in two different ways. In the first case, the verbs generated past tenses that were blends of the competing class outcomes. For example, /lif/ generated /lef/ instead of /laf/. Note that /lif/ is equally close to members of Classes I, II and V in the training set. On our representation, the /e/ in /lef/ can result from a blend of the vowels /a/, /ea/ and /I/, which would be predicted by generalization to the three competing classes. In the second case there was no change between the present and past tenses. There were two kinds of verbs that demonstrated this behavior, /trek/ represents one of several verbs whose stem vowels match Class II, but that are actually closest to a member of Class I. /dat/ is one of several Class IV verbs that 160 are closest to a member of Class IV, but do not meet the strict coda constraints of that class. In both these conditions, the network has conflicting constraints. 
In the first, both Classes I and II exert influence on the novel verb, while in the second the novel verb is not a sufficiently good example of the nearest class. The no-change response has two possible explanations. On the one hand the network could simply be unable to respond due to conflicting constraints. This is not an implausible outcome, since people in similar situations can also find themselves unable to chose an alternative (Prasada & Pinker, 1993). Since the network does not have the option of refusing to respond, a no change response is perhaps most similar to a non-response. Alternately, the no change response may actually be an affixing error. A look at the items in question offers a reason why this may occur. The two /a/- stem verbs that undergo no change at all are /dat/ and /stad/, both ending in consonants identical to the Class V suffix. Experimental evidence shows that children who are aware of the regular past suffix often avoid applying it to verbs that end in /t/ or /d/ (Bybee & Slobin, 1982) in order to avoid redundantly marking the past tense. This phenomenon is also observed in naturalistic work with child language (Andersen, 1992). It is conceivable that the network is doing the same as children in not double marking the regular affix. This suggestion is strengthened by the network's response to two other /a/-stem verbs, /glaS/ and /slaz/. In both cases the vowel remains /a/, but the final consonant is changed to /T/ and /D/, respectively. By our code, this final consonant is a combination of the original fricative and the alveolar stop of the Class V affix, suggesting that the output error is due to an incomplete attempt to suffix these items. 161 The final generalization set consisted of 11 verbs with the novel vowel /A / in the stem (see Appendix E, section E.4 for a complete list of these items). By our hypothesis these verbs should be placed in Class V, on the assumption that generalization to Class V does not require similarity to trained patterns. And, as Table 12 shows, this was overwhelmingly the case. As we did with the training set items, we compared the phonological representations of each of these verbs and verified that Class V membership could not be predicted by the presence or absence of a single feature. Only one word, /trA v/ -> /trav/, takes a vowel change without affixation. All others generated the Class V past tense even though most of these items were closest to training set verbs from some other class. Table 12 organizes the /A / stem verbs by their closest class, and as it shows, the novel vowel items generalize into Class V with no regard for whether their closest match was also a Class V item. For some of these items, /A / was changed to /a/ in the past tense. Since /A / never appears in the training set, it is not unreasonable for the model to assimilate /A / into another similar vowel (/A / only differs from /a/ by one feature). Finally, one verb that did not use the /A / stem vowel was regularized, the verb /keb/. This item is equidistant to training set verbs from both Classes I and II, and we believe that regularization is the manner the model chose to resolve the competing attraction to the two classes. 
162 Table 12: Performance on verbs with the novel stem vowel Closest Class Num Cor rect Generated Past Tense Dist Comments II 1 1 /grA S/ -> /grA Tt/ 1 1 feature off IV 6 6 /glA g/ ->/glagd/ 2 /a/ instead of /A / /bA g/ -> /bagd/ 2 /a/ instead of /A / /stA d/ -> /stadd/ 3 /a/ instead of /A / /slA z/ -> /slA zd/ 3 /mA d/ -> /mA dd/ 3 /lA w/ -> /loyt/ 3 3 features off V 4 3 /stA th/ -> /statht/ 1 /a/ instead of /A / /stA T/ -> /staTt/ 2 /a/ instead of /A / /trA v/ -> /trav/ 2 Error (instead of /trA vd/) /grA sh/ -> /grA TTt/ 3 4 features off In summary, the model was able to learn the distinct classes of the training set. and generalize properly to novel items. New verbs were adopted into a phonologically-based class if they were good examples of that class. If the verb was an equally good match to two or more classes, either a blended past tense or a no change past tense would be produced. Finally, if the verb was very different from the training set, the Class V past tense would be produced regardless of similarity to other Class V items. 163 8.5 Discussion Default categorization has generally been considered a hallmark characteristic of a rule-based account of inflectional morphology, and the fact that some languages exhibit default inflectional categories of relatively low frequency has been taken as evidence of the inadequacy of the connectionist approach to morphology. Our results suggest that connectionist networks can indeed model true default behavior, including the low frequency default class. Even though the model was trained on the Old English past tense, we have demonstrated how this paradigm is quite different from the modem English past tense, and in fact more similar to the Arabic broken plural. Following Plunkett and Marchman’s unverified claims, we demonstrate that the crucial aspect of modeling default behavior is the structure of the data. Proper generalization of novel items is not strictly dependent on similarity to known items: if there is sufficient structure to the non-default classes, default generalization can be influenced by the absence of similarity to known items. The account given in this chapter relies on certain parameters: an input representation that reflects the phonological information in the data set, a network architecture and learning rule that permit the model to generalize on grounds other than input similarity, and a data set based on the systematic structure that real- language examples of the phenomenon appear to require. In combination, these allow us to predict the model’s responses to novel items, and gave results that closely matched both the predictions and the observed data in the inflectional systems of real languages. While the traditional rule-based theory is a concise and 164 elegant means of describing the data at a high level, its success does not entail that default behavior cannot also be explained by a connectionist net. Although the performance of this network was predicted by an understanding of the capabilities of connectionist networks, it nevertheless provides an important demonstration of solving a real language task. The connectionist theory embodies well documented properties that have been exhibited in networks learning complex input to output mappings. By thoroughly understanding these properties, we were able to predict that the performance of our model would embody attributes of both pattern associator and pattern classification networks. 
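The claim that the structure of the data, rather than class size, is what matters can be made concrete by spelling out how such a training set is put together. The sketch below builds a toy corpus in the spirit of Table 10: several strong classes whose past tense is predictable from the rime, plus a suffixing class that is no larger than the others and is free to occupy any part of the space. The class inventory, vowels, and codas are simplified from Table 10, and the pseudo-phonological strings are invented for illustration.

```python
import random

random.seed(0)
ONSETS = ["b", "d", "k", "sk", "st", "tr"]

# Simplified cue-based classes (cf. Table 10): the rime predicts the vowel change.
STRONG = {
    "I":     {"vowel": "i", "codas": ["d", "t", "g", "k"], "past_vowel": "a"},
    "III.2": {"vowel": "i", "codas": ["nk", "nd", "mp"],   "past_vowel": "u"},
    "IV":    {"vowel": "a", "codas": ["r", "l", "k"],      "past_vowel": "o"},
}
VOWELS = ["i", "e", "a", "o", "u"]
CODAS = ["d", "t", "g", "k", "nk", "nd", "mp", "r", "l", "s", "f", "rst"]

def strong_item(cls):
    spec = STRONG[cls]
    onset, coda = random.choice(ONSETS), random.choice(spec["codas"])
    present = onset + spec["vowel"] + coda
    past = onset + spec["past_vowel"] + coda         # vowel change, no suffix
    return present, past

def default_item():
    # Class V: any rime at all, past tense formed by suffixation.
    onset, vowel, coda = (random.choice(x) for x in (ONSETS, VOWELS, CODAS))
    present = onset + vowel + coda
    suffix = "t" if coda[-1] in "ptkfs" else "d"     # voicing of the final consonant
    return present, present + suffix

training_set = [strong_item(c) for c in STRONG for _ in range(25)]
training_set += [default_item() for _ in range(25)]  # the default class is NOT larger
print(training_set[:3], training_set[-3:])
```

Nothing about this default class is frequent or phonologically coherent; what makes it behave as the default is that the other classes are defined by positive cues while it is not.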
We suggest that the two theories of the past tense can be reconciled by considering that they each address different levels of the phenomena. Looking down on our model and attempting to formulate a high-level description of its performance, one might say that a “rule” to generate the past tense is applied when there are no compelling reasons to choose a strong class inflection. But if one wished to know how the rule-like behavior arises, an understanding of the connectionist account is in order. 165 CHAPTER 9: CONSISTENCY REVISITED Much of the past tense research in the connectionist community has been in response to specific criticisms of previous connectionist models by Pinker and his colleagues. The goal of the linguistic community has been to present data that supports the traditional or modified traditional theories, but cannot be accounted for by the connectionist theory. Time after time, connectionist models have risen to the challenge and have shown that connectionism can account for these phenomena within the broader scope of inflectional morphology. Yet is this the only way the debate can progress? The connectionist account differs from the traditional account in its very foundation. Certainly, connectionist models can predict past tense behavior that would not be supported by the traditional account. Within this chapter, we present and describe an extension of the consistency effect we observed in Chapter 5. We make a prediction that certain irregular verbs will be more difficult to produce than others, and claim that the consistency effect is not restricted to regular verbs. This effect cannot be accounted for by either the traditional or modified traditional theories in their present form. 166 9.1 Consistency and the modified traditional theory The foundation of the traditional theory relies on two distinct pathways to generate the past tense: a regular, rule-governed pathway and a lookup table for the irregular verbs. In order to account for the sub-regular clusters within the irregulars, the modified traditional theory replaces the lookup table with an associate memory, which is described as something very similar to a connectionist network. After our presentation of the consistency effect as described in Chapter 5, Pinker further enhanced the modified traditional theory to account for these data as well. He now allows for connections between the regular rule pathway and the exception associative memory, as seen in Figure 24. With these new connections the generation of regular past tenses can be attracted to similar sounding irregulars and effectively slowed down. For example, even though BAKE-BAKED is regular, it is adversely affected by the similar sounding irregular neighbors MAKE-MADE and TAKE-TOOK. Regular verbs with many or high frequency irregular neighbors would be more affected than regular verbs with few or low frequency irregular neighbors. 167 LEXICON Regular rule pathway Exception Associative Memory Figure 24: Enhancements to modified traditional theory The connectionist model described in Chapter 5 accounted for this data by merely generating all past tenses in a single pathway. Since an overwhelming number of the tokens in the training set were regular, the tendency of the network to regularize the past tense dominated. However, similar sounding irregulars would also interfere with the production of certain regular past tenses. 
This interference produced higher error scores in the regular verbs with irregular neighbors than in regular verbs with no irregular neighbors, which translated into longer latencies in human subjects. Although the enhancements to the modified traditional theory can account for the consistency data observed in humans, one must ask about the motivation to make these enhancements. Other than accounting for the data, is there any other reason to suspect that these additional connections might exist? It has long been purported that the strength of the traditional theory is its regular pathway. The Phonological code Semantics Grammatical category Reg/Exception flag 168 regular past tense is .. a system that is modular, independent of real-world meaning, nonassociate (unaffected by frequency and similarity), [and] sensitive to abstract formal distinctions (for example, root versus derived, noun versus verb) . . . (Pinker, 1991, p 534). If the regular pathway is subject to influence from the exception associative network, its strength is undermined. What other factors might also influence the previously autonomous rule? 9.2 Consistency within the irregulars The consistency effect is accounted for in the connectionist theory with no additional assumptions or enhancements. When a number of items are processed by a single mechanism, it is natural to assume that each item will influence every other item. Therein lies a means to differentiate between the two theories of the past tense. If the modified traditional theory allows connections from the exception associative memory to the regular pathway, how does it differ, at least in architecture, from the connectionist theory? Pinker (personal communication) states that although irregular past tenses may affect regular past tenses, the converse is not true— there is no connection from the regular pathway to the exception associate memory. Therefore, the modified traditional theory would not predict a consistency effect within the irregulars, such as a group of regular verbs “regularizing” the generation of a similar sounding irregular past tense. The connectionist theory predicts the existence of this very case. Since regular and irregular verbs are generated by the same mechanism, one would 169 expect the consistency effect to apply to all verbs. Based on the model we developed in Chapter 5, we next develop a new connectionist model that will test for the consistency effect that we predict to occur within the irregular verbs. 9.3 A new analysis of the past tense model If a consistency effect exists in the irregulars, we should be able to observe the effect of similar sounding regulars on the generation of irregular past tenses. That is, irregular verbs with many regular neighbors should be harder to generate than irregular verbs with no or few regular neighbors. In order to test this, we use the same model that we introduced in Chapter 5, in which we demonstrated the consistency effect within the regulars. 9.3.1 Training Although we could use the same model architecture as in Chapter 5, we had to choose a different training set to carefully control the number of regular neighbors for each irregular verb. As in the previous model, we wished to maintain the correct type and token frequencies for the regular and irregular verbs. Therefore, we used the same 309 regular verbs, but selected 24 new irregular verbs. 
Since our goal was to vary the number of regular neighbors for each irregular verb, we selected irregular verbs that did not have irregular neighbors. For example, we did not select both SING-SANG and RING-RANG to be in the training set. We did this to eliminate the possibility of irregular neighbors affecting the generation of irregular past tenses. See Appendix F, section F.1 for a complete list of these training set items.

As in the previous simulation, we used the BP simulator and performed training with the back-propagation learning algorithm. Present/past tense pairs were again presented during training according to the logarithm of the Kucera & Francis frequency of each past tense. The generated output pattern was scored as correct only if the correct target segments provided the best fit for all segments in a word.

9.3.2 Results

The model was trained for 1000 epochs, at which point learning approached asymptote. The weights were then frozen and the training phase was completed. The results below were averaged over three training sessions with random initial weights.

The model learned all 309 (100%) of the regular past tenses and 20 of 24 (80%) of the irregular forms. Errors on the irregulars included DRAW-DREWED and FALL-FELLED, both overregularization errors, as well as SHOOT-SHOTE and FLY-FLO, both vowel errors. The relatively large number of errors on the irregulars is probably due to the lack of similar sounding neighbors, or sub-regular pools, within the irregulars. It was determined in our previous model that accurately modeling these pools is crucial to learning the irregulars. Since this model explicitly eliminates irregular neighbors, sub-regular pools cannot be included.

The remaining 20 irregulars that were generated correctly were divided into two groups of 10 verbs each, depending on their number of regular neighbors. In group 1, none of the verbs had regular neighbors. In group 2, each verb had an average of 3.8 regular neighbors. See Appendix F, section F.2 for a list of the verbs in each group. The model's performance on each of these groups is shown in Figure 25.

[Figure 25: Matched sets of irregular verbs with and without regular neighbors. Mean error score for the group with 0.0 regular neighbors (n = 10) versus the group with 3.8 regular neighbors (n = 10); the error axis runs from 0.00 to 0.12.]

Note that verbs with regular neighbors are harder to generate (mean error 0.104) than verbs with no regular neighbors (mean error 0.015), as we predicted. We can discount the possibility that the frequency of individual verbs affected these results, because care was taken to equate both groups in terms of frequency.

As a further demonstration of the consistency effect within the irregulars, we divided group 2, the verbs with regular neighbors, into two groups of 5 verbs each, which we will call groups 2a and 2b. Group 2a had an average of 2.8 regular neighbors, and group 2b had an average of 4.8 regular neighbors. See Appendix F, section F.3 for a list of the verbs in each group. The model's performance on these groups is shown in Figure 26.

[Figure 26: Matched sets of irregular verbs with regular neighbors. Mean error score for group 2a with 2.8 regular neighbors (n = 5) versus group 2b with 4.8 regular neighbors (n = 5); the error axis runs from 0.04 to 0.18.]

As predicted by the connectionist theory, the performance on the irregulars appears to be affected by the number of regular neighbors. The past tenses of irregulars with fewer regular neighbors (mean error 0.047) are clearly easier to
9.4 Discussion Connectionist networks have often been used to account for past tense data that is observed in people. Many times, this data can be accounted for by two very different theories: the modified traditional theory which is built upon grammatical rules and symbolic processing, and the connectionist theory which presumes no special mechanisms, rules, or structures. The past tense debate to date has often been driven by advocates of some variation of the traditional theory, who present evidence meant to support a rule-governed pathway and to present difficulty for a connectionist implementation. Connectionist modelers in kind respond with a model that accounts for this evidence. And the cycle repeats. We suggest another approach could be informative to the debate. Connectionist networks need not only be tools of implementation, but may also be used to inspire and drive the development of a theory. With the knowledge that all items are processed through the same network, it is natural to make certain assumptions. First, each item in the training set can exert an influence on other similar items. Second, when frequency of exposure to items is held constant, consistency of mapping is the most important factor in determining the output of a connectionist network. Irregulars tend to be very inconsistent in their past tenses except for the cases of sub-regular pools (e.g. SING-SANG, RING-RANG). When irregulars do not benefit from the support of pools of similar sounding 174 irregulars, a single path theory would predict that interference from regular neighbors affects the generation of irregulars. The consistency effect within the irregulars is a prediction of human behavior based on the connectionist theory. The connectionist account has already demonstrated its ability to explain a wide variety of past tense data in a much different manner than the modified traditional theory. It remains to be seen in psycholinguistic experiments whether this effect actually exists in people. If so, it will be interesting to observe the enhancements that will be required of the modified traditional theory to account for this data. 175 CHAPTER 10: GENERAL DISCUSSION Our stated goal at the beginning of this thesis was that we simply wish to know the best account of the data concerning the past tense. Thereafter, we have provided evidence supporting a novel approach toward understanding how this small, albeit representative language system can be learned— the connectionist account. Over the last few decades, a tremendous body of data has been collected on the past tense system. Until recently, the best accounts of the data have been variations on one of the most common themes in linguistics, the dual-route theory. Pinker and his colleagues have put forth considerable effort to develop this into the modified traditional theory, which is considered by many to be the most comprehensive theory of the past tense. The connectionist account is a radical departure from the traditional theory. Gone are the language-specific rules and structures that have previously been postulated from the data, to be replaced by very general learning principles. Indeed, it is hard to imagine a simpler account of learning the past tense than a system that associates inputs to outputs within a single mechanism. This account need not make any specific claims about the underlying architecture or the exact implementation of past tense learning. 
The central claim is merely that all verbs 176 must be processed in a single mechanism that is sensitive to the frequency of occurrence and the consistency of mapping between present and past tense forms. In the following sections, we summarize our rationale for making this claim. 10.1 Accounting for the data On first impression, the past tense appears to be a very simple inflectional process that generates both regular and irregular past tenses. Initially, proponents of the traditional theory underestimated the capabilities of the connectionist approach and challenged its ability to demonstrate such phenomena as u-shaped learning and proper generalization to novel verbs. Once these challenges were met, a broad range of additional behavioral data was presented as evidence to support their theory. The frequency by regularity interaction, the past tense preferences for denominals and deverbals, and the low-frequency default of the Old English past tense are the data specifically addressed by this thesis. There are two views toward a connectionist account of this diverse set of data. Proponents of the traditional theory such as Pinker and his colleagues have stated that the connectionist theory cannot provide a proper account. They contend that the frequency by regularity interaction is due to the autonomy of the rule-governed pathway and that a connectionist network would show a frequency effect with regular verbs as well as irregular verbs. They claim that a demonstration of denominal and deverbal past tense preferences would require knowledge of grammatical category derivation, which entails language-specific grammatical structures which cannot be provided by semantic knowledge alone. 177 They further state that connectionist networks are too powerful at similarity-based learning and that a network account could only display default behavior if the items in the default category are a majority of the training corpus. On the other hand, connectionist researchers have speculated that these phenomena would not be overly difficult for a connectionist account. The frequency by regularity interaction is expected when the topology of the past tense is considered. The extremely consistent mapping of many regular present tenses to their past tenses overcomes any tendency to of the model generate high frequency regulars better than low frequency regulars Additionally, Pinker and his colleagues’ criticism of connectionist networks being too dependent on similarity is clearly unfounded. An earlier connectionist model of the past tense demonstrated that the network would not generalize past tense forms based strictly on similar semantics. Furthermore, connectionist classification networks have been shown to develop classes based on either the presence or absence of individual features. This suggested that the development of a low-frequency default rule based on conjunctions of features could be possible as well. In light of these contrasting views, we feel our success in accounting for these data demonstrates an important point. Our choice of connectionist architecture and learning algorithm have remained constant in all of our simulations. In fact, these choices are the same as those used in nearly all connectionist accounts of the past tense. Certainly, better performance might have resulted with the optimum architecture for each phenomenon. Also, a stronger claim could be made regarding the psychological validity of the connectionist theory had we chosen a biologically plausible learning algorithm. 
178 But in making the choices that we did, we can demonstrate a very strong claim— the wide range of past tense behavioral data can be addressed by the simplest and most general connectionist architecture and learning algorithm. 10.2 Identifying new phenomena But accounting for the existing data is only half of the story. By understanding the learning principles inherent in the theory, we were able to make predictions about the past tense that could be tested on the model. It is well known that connectionist models take advantage of the consistency of input to output mappings during learning. We predicted and observed a consistency effect in our network in which regular verbs with no irregular neighbors are easier to learn than regular verbs with irregular neighbors, because their input to output mapping is more consistent. Our prediction has been observed in behavioral data as well. This effect cannot be accounted for by the traditional theory, which predicts that all regular verbs are processed by the same rule and are invariant to frequency and consistency effects. We felt that since the consistency effect applies to regular verbs, it should affect irregular verbs as well. Our model predicts that the past tense of irregular verbs with many regular neighbors will be more difficult to generate than irregular verbs with no regular neighbors. Future research efforts will inform us if this prediction can be observed in people as well. By making the most basic assumptions when developing our model, we are able to draw upon well-understood concepts when predicting its performance. 179 Since we do not develop a different network for each phenomenon of interest, we need not be concerned that the model’s accomplishments are due to artifacts of a specific architecture. As our model accounted for more and more data, we only made very minor modifications to our original network. Our initial operating assumptions have proved sufficient throughout all simulations. 10.3 What is wrong with the traditional theory? The traditional theory of the past tense has greatly evolved from its simplest “rule and rote” account. But one must question the motivation for these changes and ask if they depart from very foundation of this theory— the past tense rule. Do the modifications have any basis other than to account for the data? What limitations can be placed on these changes and how can the theory be tested? One approach has been to implement the traditional theory in the SPA model. Although it is a noble attempt at a computational solution, it falls short in accounting for the difficulty of producing different classes of past tenses. At this stage in its development, the SPA does not provide a comprehensive account of the data, although we understand that probabilistic constraints on the production rules could embellish this account. Thus, if there is no implementation of the traditional theory, proponents are free to make wanton changes without the rigorous and time-consuming task of verification within a model. Replacing the rote memory system of irregulars with an associative memory certainly gave a better account of the facts about the irregulars and novel verbs, but it also opened the door to groundless speculation 180 on how frequency and consistency affect the learning and performance on these items. In the connectionist account, this data is the result of well-understood and very general principles upon which precise predictions can be made. 
Likewise, the traditional theory describes that past tense preferences of denominals and deverbals are due to knowledge of grammatical category derivation. However, denominal preferences that are not predicted by their theory are due to an “uncertainty” of derivational status, possibly based on semantic factors. But “uncertainty” clearly diminishes the power of a rule-governed system. Furthermore, if semantics can affect the notion of derivational status, then perhaps semantic knowledge alone determines these past tense preferences. This is the premise behind the connectionist account of these facts. However, the most interesting modification to the traditional theory has been a proposed connection between the irregular associative memory and the rule-governed pathway. This is meant to account for the consistency effect within the regulars. We take this to be a serious blow to their theory. The traditional notion of a rule is that it is not sensitive to the frequency or consistency of the items that it affects. All items that are processed by the rule are done so in the same manner. This additional connection undermines the very foundation of the traditional theory, but is necessary to account for the data. On the other hand, the connectionist theory provides an account of the consistency effect for free. If a consistency effect is shown to exist in the irregulars as well, the traditional theory will find it necessary to add another connection between the rule-governed pathway and the associative memory, making it indistinguishable from the connectionist account. 181 10.4 Conclusion We believe that the past tense can be characterized at its highest level by a rule-based description. But a thorough understanding of behavioral data in people departs from a rule-based theory. At best, the traditional theory can provide a folk-psychological account of the data. We have demonstrated in this thesis that a very simple connectionist model can provide a comprehensive account of the past tense. Our modest network is able to address a broad range of behavioral data without departing from the basic tenet of the connectionist theory; namely, a single-pathway of processing that is affected by both frequency and consistency of mapping. Additionally, we have demonstrated that our account is not limited to the idiosyncratic topology of the modem English past tense. We used our original model to successfully account for the Old English past tense, which is very similar to the topology of other inflectional systems such as the Arabic broken plural and the German plural. This result lends credence to the view that the connectionist theory has the capability to account for a variety of inflectional systems. A successful connectionist account of a simple system such as the past tense provides a foundational understanding for the kinds of data necessary in basic language processing. We have demonstrated a representation of phonological and semantic information that are the bare minimum to account for inflectional processes. W e have further shown how the connectionist theory can provide an explanatory account based on the well-understood principles of these 182 models. As the connectionist theory evolves, accounts of higher-level processes can build upon this foundation. Taken together, we view the evidence in this thesis as a significant step along the route toward an explanatory computational theory of language processing. 
APPENDIX A: BACK-PROPAGATION

This appendix describes the mathematical details of the backpropagation learning algorithm, and is based in part on Seidenberg & McClelland (1989). For a complete derivation of the equations, see Rumelhart, Hinton, & Williams (1986).

A.1 The input to a unit

For each unit j in the network, a quantity called the net input is computed. This value is the activation of each input unit i times the weight w_{ji} on the connection from i to j. The net input is given as:

net_j = \sum_i w_{ji} a_i    (A.1)

A.2 The activation of a unit

The activation a_j of a unit j is determined from the net input using a nonlinear function called the logistic function:

a_j = f_j(net_j) = \frac{1}{1 + e^{-net_j}}    (A.2)

A.3 The computation of the unit error

For each unit j that has a specified target value t_j, the difference between the correct or target activation and its actual activation a_j is computed as follows:

d_j = (t_j - a_j)    (A.3)

A.4 The determination of the error signal

The error signal \delta_j for output unit j is computed as:

\delta_j = d_j f'_j(net_j)    (A.4)

The error signal \delta_j for hidden unit j, for which there is no target value, is determined recursively in terms of the error signals \delta_k of the units to which it directly connects and the weights w_{kj} of those connections:

\delta_j = f'_j(net_j) \sum_k \delta_k w_{kj}    (A.5)

A.5 The updating of the weights

The strength of a weight w_{ji} on each line in the network is adjusted in proportion to the extent to which this change will reduce the error on the unit receiving input on that line:

\Delta w_{ji} = \epsilon \delta_j a_i + \alpha \Delta w'_{ji}    (A.6)

Here \epsilon is a learning rate parameter and \alpha is a momentum term between 0 and 1, which specifies the magnitude by which the previous weight change \Delta w'_{ji} affects the current weight change \Delta w_{ji}.

APPENDIX B: BACK-PROPAGATION THROUGH TIME

This appendix is reprinted with permission from Plaut (1991), and gives the mathematical details of the back-propagation through time learning algorithm. For other work on this algorithm, see also Williams & Peng (1990).

B.1 The units

Let x_j^{(t)} be the total input of unit j at time t, and let y_j^{(t)} be its output. Then if w_{ij} is the weight on the connection from unit i to unit j,

x_j^{(t)} = \sum_i w_{ij} y_i^{(t-1)}    (B.1)

y_j^{(t)} = \frac{1}{1 + e^{-x_j^{(t)}}}    (B.2)

B.2 The forward pass

The network runs for a fixed number of iterations t_{max} (12 in our simulations). The input is presented to the network by setting y_i^{(t)} for each input unit i and every t as specified by the input. The outputs of the remaining units are initialized to some constant value (0.2 in our simulations). Then, for t = 1 to t_{max}, unit inputs and outputs are calculated according to Equations B.1 and B.2, respectively.

B.3 The error function

In addition to the states of the input units, the environment specifies the desired states d_j^{(t)} of each output unit j for some times t (typically the last 10 iterations in our simulations). The error for time t, called the cross-entropy (Hinton, 1989b), is defined over output units j to be

E^{(t)} = -\sum_j \left( d_j^{(t)} \log y_j^{(t)} + (1 - d_j^{(t)}) \log(1 - y_j^{(t)}) \right)    (B.3)

where the total error is E = \sum_t E^{(t)}.

B.4 The backward pass

The backward pass calculates the derivatives of the error with respect to the states and weights in the network. The error derivative of a unit's state has two components: the derivative of the "external" error function (which is 0 for non-output units and for iterations without desired states) and the derivative of the error caused by the unit's influence on other units. The error derivatives for weights have two corresponding terms.
Specifically, for t = tmax to 1, the 188 derivatives of the error at time t with respect to the states and weights of each unit are calculated according to the following equations: dEw 1 ~ d f d f dyf ' 1 - y f ' y f d£0) _ V ..?*L.. _ y d£it) y^(\ -y«\ dy\*~l) y dyf dxf d y y dyf y‘ 1 y' ' v dE(,) dEw dyf dxf dEw d y f x ) dxf~x ) dwi} ~ dyf dxf dwtj ' dy{ ;~l) dx{‘-X ) dwi; dEW dE{,) (,)/ (0V (f_i, dE(t) (r-D/j y H ) U - 2 ) dwif dyf }y‘ d yyn y' ' ^ dE _ y dE{,) dwi} y dwtj (B.4) B.5 Weight updating The procedure defined by Equation B.4 is applied to each example in turn, accumulating error derivatives for the weights. At this point, each weight in the network is changed according to: ( V"1 Aw\f = - e — — + aAw\n ~ l] (B.5) 189 where e determines the overall learning rate (0 . 0 0 1 in our simulations), a is a momentum term that causes weight changes to be similar to previous weight changes (0 . 9 in our simulations), and n is the number of “sweeps” through the examples so far. 190 APPENDIX C: TRAINING, GENERALIZATION, AND TEST SETS FOR SIMULATIONS IN CHAPTERS 4, 5, AND 6 The format of all items in these lists is as follows: orthographic form of present tense, phonological representation of present tense, present tense raw frequency, orthographic form of past tense, phonological representation of past tense, past tense raw frequency. The ' character in the phonological representation is a stress marker. C .I Training set 1 This training set contains all verbs with KF frequency greater than or equal to 2 that have monosyllabic present and past tense forms. It is composed of 104 irregular verbs followed by 309 regular verbs. Irregular verbs: beat b' it 66 bend b ' End 50 bite b ’Yt 26 bleed bl1 id 18 blow bl’ o 52 beat b' it 12 bent b ’ Ent 14 bit b' It 7 bled bl 'Ed 2 blew bl 'u 12 191 bound b 1 Wnd 13 bound b ' Wnd 5 break br 1 ek 228 broke br' ok 66 bring br 1IG 488 brought br' ct 133 build b ' lid 249 built b ' lit 21 burst b ' Rst 37 burst b ' Rst 11 buy b'Y 162 bought b ' ct 32 cast k ' 6st 28 cast k ' @st 4 catch k'@C 146 caught k' ct 54 choose C'uz 177 chose C'oz 37 cling kl • IG 30 clung kl' AG 13 come k' Am 1561 came k ' em 618 creep kr 1 ip 27 crept kr'Ept 9 cut k'At 245 cut k ' At 25 deal d' il 124 dealt d'Elt 8 dig d' ig 32 dug d ' Ag 7 draw dr ' c 222 drew dr' u 63 drink dr'IGk 93 drank dr'0Gk 19 drive dr' Yv 203 drove dr ' ov 58 eat 1 it 122 ate ' et 16 fall f'cl 239 fell f'El 87 feel f' il 643 felt f'Elt 302 fight f1 Yt 155 fought f • ct 23 find f' Ynd 1033 found f' Wnd 268 fling fl' IG 17 flung fl ' AG 9 fly f 1 ' Y 92 flew fl 'u 27 get g'Et 1486 got g ' at 338 give g' Iv 1264 gave g' ev 285 go g ' o 1844 went w' Ent 508 grind gr1Ynd 26 ground gr'Wnd 4 grow gr 1 o 300 grew gr' u 65 hang h' @G 131 hung h' AG 53 hear h' Ir 433 heard h'Rd 129 hide h'Yd 61 hid h’ Id 6 hit h' It 126 hit h' It 38 hold h ' old 509 held h'Eld 125 keep k'ip 523 kept k' Ept 115 kneel n'il 21 knelt n'Elt 7 know n 1 o 1473 knew n' u 394 lay 1 'e 138 laid 1' ed 24 leave 1 * iv 650 left 1 'Eft 157 lend 1 'End 29 lent 1 'Ent 3 lie 1' Y 211 lay 1 'e 81 lose 1' uz 274 lost 1' cst 49 make m ' ek 2312 made m ' ed 466 mean m ' in 376 meant m ' Ent 70 meet m ' it 339 met m'Et 80 pay p ' e 325 paid p ' ed 50 put p'Ut 513 put p'Ut 130 192 read r1 id 274 read r'Ed 36 ride r'Yd 126 rode r' od 40 ring r T IG 39 rang r' @G 21 rise r 1 Yz 199 rose r' oz 60 run r ' ~n 431 ran r' @n 134 say s 1 e 2765 said s 'Ed 1748 see s 1 i 1513 saw s' c 337 seek s 1 ik 179 sought s ' ct 35 sell s ’El 128 sold s' old 20 send s ’End 
253 sent s ' Ent 69 set s ’Et 372 set s 'Et 71 shake S 1 ek 107 shook S 'Uk 57 shed S 'Ed 12 shed S 'Ed 3 shine S ’ Yn 32 shone S 'on 5 shoot S 1 ut 117 shot S' at 18 shut S’"t 50 shut S'"t 7 sink s ’ IGk 40 sank s ' @Gk 18 sit s ’ It 314 sat s ' Qt 139 slide si ’Yd 43 slid si' Id 24 smell sm1 El 43 smelt sm'Elt 3 speak sp ’ ik 274 spoke sp' ok 86 spend sp’End 194 spent sp'Ent 40 spin sp ’ In 31 spun sp' '"n 14 spit sp 1 It 21 spat sp' It 3 split spl’It 26 split spl'It 5 spread spr’Ed 90 spread spr'Ed 18 spring spr’IG 30 sprang spr'@G 13 stand st’@nd 468 stood st 'Ud 198 steal st ’ il 39 stole st' ol 10 stick st ’ Ik 50 stuck st' "k 13 strike str’Yk 108 struck str 1 ''k 40 strive str’Yv 18 strove str'ov 4 swear sw’ §r 33 swore sw' or 14 swim sw’ Ira 55 swam sw' @m 6 swing sw’ IG 77 swung sw' AG 43 take t ’ ek 1575 took t 'Uk 426 teach t ’ iC 153 taught t ' ct 19 tear t'Er 58 tore t ' or 15 tell t 'El 759 told t ' old 286 think T' IGk 982 thought T'ct 340 throw Tr' o 150 threw Tr' u 46 thrust Tr'Ast 23 thrust Tr'Ast 9 wake w ’ ek 45 woke w ' ok 14 wear w' @r 174 wore w ' or 65 weave w ’ ev 15 wove w ' ov 3 weep w ’ ip 28 wept w' Ept 7 win w' In 159 won w 1 ^n 45 wind w'Ynd 2 9 wound w'Wnd 7 write r'Yt 561 wrote r'ot 179 Regular verbs: aim 1 em 42 ask ' @sk 612 bar b 1 ar 17 beg b 1 Eg 34 belch b'EIC 5 blame bl1 em 32 blaze bl1 ez 11 blush bl * AS 12 bob b ' ab 5 bore b 1 or 26 bounce b ' Wns 28 breathe br' iD 31 bribe br' Yb 4 brush br' AS 38 bump b ' Amp 9 burn b'Rn 103 buzz b' Az 9 cease s 1 is 32 change C'enJ 225 charge C • ar J 82 chew C 'u 16. chill C' 11 12 choke C 1 ok 22 claim kl' em 99 clap kl1 0p 9 climb kl1 Ym 65 clip kl1 Ip 3 close kl' os 174 clutch kl ' AC 17 cock k 1 ak 6 cook k'Uk 50 cough k'cf 8 crack kr' @k 41 crash kr' @S 23 cringe kr'InJ 5 croon kr' un 3 crouch kr 'WC 22 crush kr' AS 17 curl k'Rl 17 dance d ' @ns 59 aimed 1 emd 10 asked ' @skt 300 barred b'ard 2 begged b'Egd 13 belched b'ElCt 4 blamed bl'emd 5 blazed bl1ezd 2 blushed bl'ASt 4 bobbed b ' abd 2 bored b 1 ord 3 bounced b 'Wnst 13 breathed br1iDd 9 bribed br'Ybd 2 brushed br'ASt 14 bumped b 1Ampt 2 burned b'Rnd 15 buzzed b 1 Azd 2 ceased s' ist 8 changed C1enJd 26 charged C1arJd 17 chewed C'ud 4 chilled d i d 2 choked C ' okt 4 claimed kl'emd 25 clapped kl1Qpt 4 climbed kl'Ymd 41 clipped kl'Ipt 2 closed kl'ost 39 clutched kl'ACt 5 cocked k 1 akt 4 cooked k'Ukt 2 coughed k'cft 2 cracked kr1@kt 11 crashed kr•0St 7 cringed kr'InJd 2 crooned kr'und 2 crouched kr'WCt 10 crushed kr'ASt 2 curled k 1 Rid 6 danced d’@nst 8 194 dash d' 0S 14 dashed d 1 0St 4 deem d ' im 17 deemed d1 imd 3 dip d' Ip 6 dipped d 1 Ipt 3 dodge d' aJ 8 dodged d' a Jd 2 doze d 1 oz 8 dozed d ' ozd 4 drain dr * en 16 drained dr 1 end 3 drape dr1 ep 9 draped dr 1ept 3 dress dr 'Es 67 dressed dr'Est 10 drip dr' Ip 14 dripped dr'Ipt 5 drown dr' Wn 14 drowned dr'Wnd 4 duck d' rtk 15 ducked d' Akt 5 dump d ' /ymp 14 dumped d'^mpt 7 earn 'Rn 45 earned ' Rnd 9 fail f ' el 142 failed f'eld 52 fan f 1 @n 13 fanned f' 0nd 4 file f ' Y1 87 filed f' Yld 12 fill f ' Il 184 filled f' lid 31 fix f' Iks 109 fixed f'Ikst 12 flash fl ' @S 28 flashed fl'0St 12 flex fl'Eks 3 flexed f1'Ekst 2 flip fl' Ip 8 flipped fl'Ipt 3 flock fl' ak 4 flocked fl'akt 2 flog f 1' ag 3 flogged f1'agd 2 flop f 1' ap 7 flopped f1'apt 6 force f 1 ors 124 forced f'orst 19 frame f r' em 23 framed fr'emd 2 frown fr'Wn 22 frowned fr'Wnd 7 gain g' en 77 gained g ' end 18 gape g'ep 5 gaped g'ept 3 gasp g1 @sp 11 gasped g ' 0spt 5 gaze g1 ez 21 gazed g ' ezd 7 glance gl'@ns 43 glanced gl' 
0nst 25 grab gr' @b 37 grabbed gr'0bd 19 grasp gr10sp 23 grasped gr'0spt 5 grip gr' Ip 19 gripped gr'Ipt 9 groan gr' on 4 groaned gr'ond 3 grope gr' op 12 groped gr'opt 7 growl gr 1 Wl 5 growled gr'Wld 4 guess g' Es 77 guessed g ' Est 7 gulp g' Alp 3 gulped g'^Ipt 3 gush g' '■S 5 gushed g' ^ St 5 help h ' Elp 352 helped h 'Elpt 40 hiss h ' Is 4 hissed h' 1st 2 hop h * ap 10 hopped h ' apt 5 hope h ' op 164 hoped h ' opt 33 hug h' ~g 11 hugged h 1 ""gd 2 hurl h'Rl 12 hurled h ' Rid 3 jab J' 0b 4 jabbed J' Qbd 2 195 jerk J'Rk 16 jerked J'Rkt 12 join J1 On 139 joined J'Ond 33 judge J' AJ 42 judged J’ AJd 3 jump J’ Amp 58 jumped J'Ampt 32 kill k' 11 153 killed k ' lid 34 kiss k 1 Is 31 kissed k 11st 15 knock n'ak 47 knocked n 1 akt 17 lack 1 • @k 70 lacked 1' 0kt 15 lapse 11 @ps 7 lapsed 1'0pst 2 lash 1' 0S 9 lashed 1 • 0St 3 laugh 1 * 0f 89 laughed 1' 0ft 46 launch 1' cnC 31 launched 1'cnCt 3 learn 1 'Rn 254 learned 1 'Rnd 54 loathe 1' oD . 5 loathed 1' oDd 4 look l'Uk 910 looked 1 'Ukt 327 loom 11 um 15 loomed 1' umd 3 lounge 1'WnJ 8 lounged 1'WnJd 3 lug 1' Ag 6 lugged 1' Agd 3 lunge 1' AnJ 5 lunged 1'AnJd 4 lurch 1 'RC 7 lurched 1 'RCt 5 lurk l'Rk 8 lurked 1 'Rkt 3 march m 1 arC 37 marched m 1arCt 6 mark m 1 ark 126 marked m 'arkt 15 miss m' Is 95 missed m ' 1st 17 mourn m 1 orn 12 mourned m 1 ornd 2 move m ' uv 447 moved m ' uvd 138 nudge n ' A J 4 nudged n' A Jd 2 pass p ' 0s 298 passed p 1 0st 91 peck p 1 Ek 3 pecked p 'Ekt 2 pile p 1 Y1 26 piled p 1 Yld 7 pinch p' InC 11 pinched p 1InCt 2 pitch p' IC 20 pitched p ' ICt 4 please pi' iz 91 pleased pi'izd 11 pluck pi' Ak 6 plucked pi'Akt 4 plump pi'Amp 2 plumped pi1Ampt 2 plunge pi' AnJ 20 plunged pi1AnJd 10 poise p 1 Oz 12 poised p 1 Ozd 2 poke p 1 ok 13 poked p ' okt 3 pop P 1 ap 17 popped p ' apt 6 pose p 1 oz 20 posed p ' ozd 3 pour p 1 Ur 48 poured p ' Urd 21 praise pr' ez 21 praised pr1ezd 8 press pr' Es 82 pressed pr'Est 12 prove pr' uv 156 proved pr1uvd 48 puff p' Af 8 puffed p' Aft 2 pull p'Ul 145 pulled p 1 Uld 54 pump p 1 Amp 12 pumped p 1Ampt 2 push p'US 102 pushed p'USt 31 196 race r' es 30 raced r 1 est 11 raise r'ez 188 raised r' ezd 42 rap r ’ 0p 6 rapped r' 0pt 2 rip r' Ip 14 ripped r * Ipt 5 roar r * or 27 roared r' ord 18 rob r ' ab 15 robbed r ’ abd 2 rock r’ak 20 rocked r' akt 7 roll r ’ ol 88 rolled r' old 34 rouse r' Wz 5 roused r 1 Wzd 2 rub r' Ab 34 rubbed r,Abd 13 rush r' AS 42 rushed r 1 ASt 20 sag s ' @g 11 sagged s ' 0gd 3 save s ' ev 121 saved s ' evd 11 scan sk' @n 17 scanned sk'0nd 9 scowl sk 1W1 6 scowled sk'Wld 4 scrape skr'ep 18 scraped skr1ept 6 scream skr1im 40 screamed skr1imd 14 search s 'RC 41 searched s 'RCt 7 seem s ! 
im 831 seemed s ' imd 311 seize s 1 iz 33 seized s ' izd 12 serve s ' Rv 300 served s ' Rvd 52 shape S 1 ep 34 shaped S ' ept 3 shave S ' ev 23 shaved S ' evd 4 shock S 1 ak 23 shocked S ' akt 2 shove S ' Av 16 shoved S 1 Avd 8 shrill Sr' 11 3 shrilled Sr'lid 2 shrug Sr ’ Ag 18 shrugged Sr'Agd 18 sip s' Ip 10 sipped s 1 Ipt 2 skip sk 1 Ip 17 skipped sk1Ipt 6 slash si' 0S 18 slashed si’0St 6 slip si1 Ip 47 slipped si1Ipt 26 slump si'Amp 11 slumped si'Ampt 6 smack sm' 0k 7 smacked sm"0kt 2 smash sm1 0S 18 smashed sm'0St 11 smile sm' Y1 122 smiled sm'Yld 68 snap sn ' 0p 38 snapped sn'0pt 17 snarl sn'arl 11 snarled sn'arid 8 sniff sn 1 If 10 sniffed sn'Ift 6 soar s 1 or 9 soared s ' ord 3 soothe s ' uD 8 soothed s ' uDd 2 spill sp111 9 spilled sp’lid 2 splash spl10S 7 splashed spl10St 3 squeeze skw1iz 30 squeezed skw'izd 8 stain st1 en 45 stained st'end 4 stalk st1 ck 9 stalked st1ckt 6 stamp st10mp 16 stamped st'0mpt 2 stem st1 Em 22 stemmed stfEmd 3 stir st 'R 39 stirred st' Rd 7 197 stoop st' up 11 stooped st'upt 3 Stop st' ap 240 stopped st'apt 103 strain str'en 20 strained str'end 8 stretch Str'EC 61 stretched str'ECt 21 stroll str'ol 8 strolled str'old 4 suck s ' Ak 18 sucked s ' Akt 5 sue s' u 19 sued s 1 ud 2 sulk s • Alk 3 sulked s'Alkt 2 surge S'RJ 10 surged s 'RJd 7 swerve sw' Rv 5 swerved sw'Rvd 2 swoop sw' up 9 swooped sw'upt 4 talk t ' ck 275 talked t 1 ckt 41 tap t1 @p 19 tapped t 1 0pt 2 tease t 1 iz 8 teased t 1 izd 2 thrash Tr' @S 4 thrashed Tr'0St 2 thrill Tr' 11 4 thrilled Tr'lid 2 throb Tr' ab 5 throbbed Tr'abd 3 tip t 1 Ip 7 tipped t ' Ipt 4 tire t 1 Yr 46 tired t ' Yrd 4 top t ' ap 13 topped t ' apt 3 toss t1 cs 41 tossed t ' est 22 touch t ' AC 91 touched t ' A ct 24 trace tr' es 36 traced tr'est 2 trail tr 'el 17 trailed tr'eld 6 tramp tr'@mp 2 tramped tr'0mpt 2 trip tr' Ip 6 tripped tr'Ipt 2 trudge tr tAJ 4 trudged tr1AJd 4 tuck t ' Ak 9 tucked t ' Akt 4 tug t ' Ag 4 tugged t ' Agd 2 turn t 'Rn 566 turned t ' Rnd 253 twitch tw' IC 6 twitched tw'ICt 4 urge 'RJ 64 urged 'RJd 21 wag w ' 0g 4 wagged w ' 0gd 2 wail w' el 8 wailed w ' eld 3 walk w' ck 287 walked w ' ckt 143 warn w ' crn 62 warned w ' crnd 14 wash w 1 cS 83 washed w ' cSt 10 watch w 1 cC 209 watched w ' cCt 68 wave w ' ev 30 waved w ' evd 16 wax w ' 0ks 6 waxed w ' 0kst 3 whack hw 1 0 k 2 whacked hw'0kt 2 whip hw' Ip 24 whipped hw'Ipt 7 whirl hw' Rl 17 whirled hw'Rid 6 wince w ' Ins 5 winced w 'Inst 4 wipe w'Yp 35 wiped w ' Ypt 11 wish w' IS 161 wished w' ISt 52 wrap r' 0p 23 wrapped r' 0pt 2 wrench r'EnC 4 wrenched r'EnCt 2 198 yank y' @Gk 7 yanked y '0Gkt 4 yearn y 1 Rn 4 yearned y' Rnd 2 ache ' ek 11 ached ' ekt 3 bang b'0G 10 banged b ' 0Gd 4 bawl b 1 cl 3 bawled b'cld 2 blink bl'IGk 13 blinked bl'IGkt 6 bow b ' o 13 bowed b 1 od 6 call k'cl 627 called k'cld 165 crawl kr' cl 37 crawled kr'cld 17 creak kr 1 ik 11 creaked kr'ikt 6 cry kr'Y 64 cried kr' Yd 25 die d' Y 183 died d'Yd 63 dine d' Yn 32 dined d 1 Ynd 2 dive d'Yv 11 dived d1 Yvd 4 drawl dr' cl 4 drawled dr'cld 3 drum dr' / 'm 6 drummed dr' ' ’ md 2 flare fl' @r 9 flared fl'0rd 3 flee fl' i 40 fled fl' id 22 flick fl1 Ik 6 flicked fl'Ikt 5 flow fl1 o 40 flowed fl ’ od 4 fuse f y ' uz 6 fused fy'uzd 2 glare gl1 0r 13 glared gl10rd 5 glow gl' o 19 glowed gl' od 6 grin gr' In 38 grinned gr'Ind 29 haul h' cl 17 hauled h'cld 3 heal h' il 11 healed h1 ild 3 heave h' iv 10 heaved h ' ivd 4 kick k' Ik 34 kicked k' Ikt 10 kneel n' il 21 kneeled n 1 ild 2 leak 1 • ik 5 leaked 11 ikt 4 lean 1 • in 61 leaned 1' ind 37 leap 1' ip 33 leaped 1' ipt 18 leer 11 Ir 
7 leered 1' Ird 3 lick 11 Ik 14 licked 1' Ikt 7 lie 11 Y 211 lied 1'Yd 5 like 1 1 Yk 294 liked 1 ' Ykt 45 live 1' Iv 472 lived 1' Ivd 72 match m ' 0C 77 matched m ' 0Ct 2 muse my' uz 5 mused my'uzd 4 owe ' o 34 owed ' od 12 peel p ' il 14 peeled p'ild 2 pick p' Ik 143 picked p ' Ikt 51 pray pr' e 30 prayed pr' ed 8 preach pr' iC 26 preached pr'iCt 6 rake r' ek 6 raked r' ekt 3 reach r' iC 324 reached r' iCt 106 rear r'lr 14 reared r' Ird 7 reel r' il 3 reeled r' ild 2 199 scare sk 1 Er 26 scratch skr'@C 22 scrawl skr'cl 5 screech skr1iC 12 share S 'Er 105 show S’o 640 shriek Sr' ik 5 sigh s ' Y 28 size s 1 Yz 5 skim sk' Im 8 smell sm 1 El 43 snatch sn' @C 17 sneak sn1 ik 11 spare sp' @r 19 sprawl spr'cl 18 squeak skw'ik 2 stall st ’ cl 6 stare st1 @r 95 stay st1 e 195 steer st 1 Ir 16 sway sw1 e 13 swell sw 1 El 20 thrive Tr'Yv 11 tick t • Ik 5 try tr'Y 472 veer v 1 Ir 8 weigh w 1 e 33 wink w 1 IGk 18 yell y 'El 31 scared sk'Erd 3 scratched skr'@Ct 4 scrawled skr1cld 3 screeched skr'iCt 5 shared S 'Erd 19 showed S ' od 138 shrieked Sr'ikt 4 sighed s ' Yd 22 sized s ' Yzd 2 skimmed sk'Imd 2 smelled sm'Eld 15 snatched sn’@Ct 9 sneaked sn'ikt 4 spared sp'0rd 3 sprawled spr'cld 2 squeaked skw'ikt 2 stalled st ' cld 3 stared st'@rd 58 stayed st' ed 60 steered st'Ird 4 swayed sw' ed 7 swelled sw'Eld 3 thrived Tr'Yvd 4 ticked t ' Ikt 2 tried tr' Yd 120 veered v ' Ird 3 weighed w ' ed 11 winked w'IGkt 7 yelled y'Eld 21 200 C.2 Training set 2 This training set contains all 309 regular verbs with KF frequency greater than or equal to 2 that have monosyllabic present and past tense forms. It also contains a set of 24 irregular verbs that have been selected to maintain the ratios of the type and token frequencies of the sub-regular classes of irregulars. The irregular verbs precede the regular verbs in the list. Irregular verbs: blow bl 'o 52 blew bl 'u 12 break br 1 ek 228 broke br' ok 66 build b' Ild 249 built b' lit 21 buy b'Y 162 bought b ' ct 32 creep kr 1 ip 27 crept kr'Ept 9 drink dr 1IGk 93 drank dr'@Gk 19 fall f' cl 239 fell f'El 87 fly fl ' Y 92 flew fl 'u 27 grow gr 1 o 300 grew gr' u 65 keep k1 ip 523 kept k' Ept 115 lie 1 1 Y 211 lay l'e 81 ring r 1 IG 39 rang r' @G 21 sink s ' IGk 40 sank s ' @Gk 18 spin sp 1 In 31 spun sp' An 14 spring spr'IG 30 sprang spr'0G 13 steal st' il 39 stole st' ol 10 stick st' Ik 50 stuck st' ^k 13 swear sw 1 @r 33 swore sw' or 14 teach t ' iC 153 taught t ' ct 19 throw Tr' o 150 threw Tr' u 46 wake w 1 ek 45 woke w ' ok 14 wear w ' @r 174 wore w ' or 65 weep w ' ip 28 wept w' Ept 7 win w ' In 159 won w' An 45 201 Regular verbs: Identical to the regular verbs in section C.l 202 C.3 Generalization set 1 This generalization set contains all verbs with KF frequency equal to 1 that have monosyllabic present and past tense forms. All 112 verbs in the list are regular. 
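As an illustration of how entries in this six-field format can be processed, the following short Python sketch (not part of the original simulations) tallies the type and token frequencies of the kind used to balance Training set 2. It assumes each entry has been normalized to six whitespace-separated fields, with the stress marker written as ' and no internal spaces in the phonological forms; the function and variable names are illustrative only and do not come from the dissertation.

    # Illustrative sketch only: parse normalized entries of the form
    #   present_orth present_phon present_freq past_orth past_phon past_freq
    # and tally type frequency (number of distinct verbs) and token
    # frequency (summed raw KF counts of the present-tense forms).

    def parse_entry(line):
        po, pp, pf, qo, qp, qf = line.split()
        return {"present": po, "present_phon": pp, "present_freq": int(pf),
                "past": qo, "past_phon": qp, "past_freq": int(qf)}

    def type_and_token_counts(entries):
        types = len(entries)                                # type frequency
        tokens = sum(e["present_freq"] for e in entries)    # token frequency
        return types, tokens

    # Two entries copied (in normalized form) from Training set 1 above:
    sample = ["beat b'it 66 beat b'it 12",
              "blow bl'o 52 blew bl'u 12"]
    print(type_and_token_counts([parse_entry(s) for s in sample]))  # (2, 118)

The generalization items listed below follow the same six-field format.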
bask b ' @sk 3 basked b '@skt 1 bathe b 1 eD 26 bathed b ' eDd 1 boil b'Ol 2.7 boiled b ' Old 1 boom b 1 um 2 boomed b ' umd 1 brook br'Uk 1 brooked br1Ukt 1 cash k'@S 3 cashed k' @St 1 chop Cap 9 chopped C' apt 1 clang kl' @G 1 clanged kl'@Gd 1 clash kl' @S 1 clashed kl1 @ St 1 clench kl'EnC 7 clenched kl'EnCt 1 clinch kl'InC 3 clinched kl'InCt 1 cluck kl' "k 3 clucked kl'Akt 1 coax k 1 oks 5 coaxed k'okst 1 coil k 1 01 2 coiled k' Old 1 couch k 'WC 2 couched k' WCt 1 crave kr' ev 5 craved kr1evd 1 croak kr' ok 2 croaked kr'okt 1 cure ky 'Ur 20 cured ky'Urd 1 dab d' @b 3 dabbed d ' @bd 1 daub d ' cb 1 daubed d' cbd 1 daze d ' ez 4 dazed d' ezd 1 deign d ' en 1 deigned d' end 1 dole d' ol 2 doled d' old 1 douse d' us 1 doused d'ust 1 droop dr' up 3 drooped dr'upt 1 drowse dr' Wz 2 drowsed dr'Wzd 1 dub d ' ~b 4 dubbed d' ^bd 1 err 'Er 5 erred ' Erd 1 etch 'EC 2 etched 'ECt 1 filch f' I1C 1 filched f 1 n e t 1 flounce f11Wns 1 flounced f1'Wnst 1 flush fl' 8 flushed fl'Ast 1 furl f ’R1 1 furled f'Rid 1 gag g' @g 2 gagged g' 0gd 1 gnaw n ' c 6 gnawed n' cd 1 graze gr' ez 9 grazed gr'ezd 1 grease gr' is 2 greased gr'ist 1 203 hang h' @G 131 hanged h ' @Gd 1 hatch h' @C 7 hatched h ' @Ct 1 hike h'Yk 3 hiked h ' Ykt 1 howl h 1W1 5 howled h 1 Wld 1 hunch h' "'nC 3 hunched h 'AnCt 1 jam J' @m 10 jammed J1 0md 1 jar J1 ar 2 jarred J' ard 1 lag 1 ' @g 5 lagged 1' @gd 1 lap 11 6p 6 lapped 1' @pt 1 limp 1 ’ Imp 5 limped 1'Impt 1 lodge 11 a J 7 lodged 1 'aJd 1 merge m'RJ 20 merged m 1 RJd 1 mew my 1 u 2 mewed my' ud 1 mix m 1 Iks 56 mixed m 'Ikst 1 mock m 1 ak 11 mocked m ' akt 1 moo m ' u 1 mooed m ' ud 1 moor m ' Ur 3 moored m 1 Urd 1 nap n 1 @p 2 napped n ' @pt 1 nip n ' Ip 2 nipped n ' Ipt 1 parch p ' arC 2 parched p 'arCt 1 pave p ' ev 9 paved p 1 evd 1 peek p' ik 3 peeked p ' ikt 1 pen p 1 En 5 penned p ' End 1 perch p'RC 5 perched p'RCt 1 pierce p ' Irs 7 pierced p 1Irst 1 plop pi1 ap 1 plopped pi'apt 1 pore p ' or 1 pored p ' ord 1 prowl pr ' W1 4 prowled pr'Wld 1 punch p' AnC 3 punched p 'AnCt 1 purl p'Rl 3 purled p'Rld 1 rasp r' 6sp 2 rasped r1@spt 1 roam r' om 10 roamed r' omd 1 rove r' ov 4 roved r 1 ovd 1 scorch sk'crC 2 scorched sk'crCt 1 scrub skr,Ab 9 scrubbed skr'Abd 1 shun S ' ""n 5 shunned S ' And 1 singe s ' InJ 1 singed s'InJd 1 smirk sm' Rk 2 smirked sm'Rkt 1 sneer sn 1 Ir 3 sneered sn'Ird 1 sneeze sn' iz 3 sneezed sn'izd 1 snuff sn ’ Af 1 snuffed sn'^ ft 1 soak s ' ok 18 soaked s ' okt 1 sob s ' ab 3 sobbed s' abd 1 sock s ' ak 2 socked s ' akt 1 solve s ' civ 49 solved s'clvd 1 spark sp1 ark 7 sparked sp'arkt 1 spell sp 1 El 14 spelled sp'Eld 1 splay spl' e 1 splayed spl'ed 1 204 squirm skw'Rm 3 squirmed skw'Rmd store st1 or 47 stored st'ord stray str'e 5 strayed str1ed stun st1 "n 8 stunned st'And tack t ' @k 4 tacked t ' 0kt tag t ' 0g 7 tagged t ' 0gd tax t 1 0ks 27 taxed t 10kst thresh Tr 1 0S 2 threshed Tr'0St thump T' Amp 6 thumped T ' ''mpt trill tr 111 1 trilled tr'Ild twig tw1 Ig 1 twigged tw'Igd wage w 1 e J 11 waged w' eJd wane w 1 en 4 waned w ' end wheeze hw1 iz 2 wheezed hw'izd whine hw' Yn 9 whined hw'Ynd writhe r' YD 8 writhed r' YDd yelp y' Elp 2 yelped y'Elpt zoom z ' urn 3 zoomed z 1 umd bake b ' ek 15 baked b ' ekt blare bl' 0r 2 blared bl'0rd clink kl'IGk 1 clinked kl1IGkt fin f 1 In 1 finned f • Ind link 1 * IGk 25 linked 1'IGkt seep s ' ip 6 seeped s 1 ipt squeal skw'il 3 squealed skw"ild tow t ' o 1 towed t ’ od wake w 1 ek 45 waked w ' ekt 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 205 C.4 Generalization set 2 This generalization set contains 
irregular verbs that were not in the training set and that have monosyllabic present and past tense forms. All 46 verbs in the list are irregular. bring br' IG 488 bringed br'IGd 133 catch k ' @C 146 catched k ' @Ct 54 choose C ' uz 177 choosed C 'uzd 37 cling kl ’ IG 30 dinged kl'IGd 13 come k ' Am 1561 corned k ' Amd 618 deal d' il 124 dealed d' ild 8 dig d' ig 32 digged d ' Igd 7 draw dr' c 222 drawed dr' cd 63 drive dr' Yv 203 drived dr'Yvd 58 dwell dw ' El 15 dwelled dw'Eld 1 feel f 1 il 643 feeled f' ild 302 fling fl * IG 17 flinged fl'IGd 9 freeze f r' iz 53 freezed fr'izd 1 give g' Iv 1264 gived g' Ivd 285 go g' o 1844 goed g' od 508 hang h' @G 131 hanged h ' @Gd 53 hear h' Ir 433 heared h ' Ird 129 know n ' o 1473 knowed n' od 394 leave 1' iv 650 leaved 1' ivd 157 lose 1' uz 274 * ■ losed 1' uzd 49 make m ' ek 2312 maked m ' ekt 466 mean m ' in 376 meaned m ' ind 70 rise r'Yz 199 rised r' Yzd 60 run r' An 431 runned r' And 134 say s' e 2765 sayed s ' ed 1748 see s ' i 1513 seed s ' id 337 seek s ' ik 179 seeked s ' ikt 35 sell s 'El 128 selled s ' Eld 20 shake S 'ek 107 shaked S ' ekt 57 shine S ' Yn 32 shined S ' Ynd 5 shrink Sr'IGk 12 shrinked Sr'IGkt 1 sling si' IG 3 slinged si'IGd 1 smell sm' El 43 smelled sm'Eld 3 sneak sn' ik 11 sneaked sn'ikt 1 speak- sp' ik 274 speaked sp'ikt 86 sting st' IG 6 stinged st'IGd 1 stink st'IGk 4 stinked st'IGkt 1 206 strike str'Yk 108 strive str'Yv 18 swim sw1 Im 55 swing sw1 IG 77 take t 1 ek 1575 tear t ’Er 58 tell t 'El 759 think T' IGk 982 weave w' iv 20 wring r' IG 3 striked str1Ykt 40 strived str1Yvd 4 swimmed sw'Imd 6 swinged sw'IGd 43 taked t' ekt 426 teared t' Erd 15 teiled t 'Eld 286 thinked T'IGkt 340 weaved w ' ivd 3 wringed r' IGd 1 207 C.5 Test set 1 This test set contains selected groups of verbs from the training set that were generated correctly by the trained model. The 10 highest frequency regular verbs, 1 0 lowest frequency regular verbs, 1 0 highest frequency irregular verbs, and 1 0 lowest frequency irregular verbs are listed. 
High Frequency Regular Verbs: brush br' " S 38 brushed br•"St 14 burn b 1 Rn 103 burned b 1 Rnd 15 grab gr 1 @b 37 grabbed gr'@bd 19 kill k' 11 153 killed k' Ild 34 raise r' ez 188 raised r' ezd 42 smile sm1 Yl 122 smiled sm1Yld 68 stop st' ap 240 stopped st'apt 103 touch t 1 "C 91 touched t ’ "Ct 24 walk w ' ck 287 walked w ' ckt 143 watch w ' cC 209 watched w ' cCt 68 Low Frequency Regular Verbs: blame bl1 em 32 blamed bl'emd 5 breathe br'iD 31 breathed br'iDd 9 crack kr ’ @k 41 cracked kr'@kt 11 dress dr 1 Es 67 dressed dr'Est 10 flash fl * OS 28 flashed fl’0St 12 sip s 1 Ip 10 sipped s ' Ipt 2 splash spl'0S 7 splashed spl'0St 3 swerve sw' Rv 5 swerved sw'Rvd 2 thrill Tr 1 11 4 thrilled Tr'Ild 2 wipe w'Yp 35 wiped w ' Ypt 11 208 High Frequency Irregular Verbs: break br'ek 228 broke br' ok 66 build b ' l l d 249 built b' lit 21 drink dr'IGk 93 drank dr'0Gk 19 fly fl'Y 92 flew fl 'u 27 grow gr1o 300 grew gr 1 u 65 keep k 'ip 523 kept k 'Ept 115 ring r' IG 39 rang r' 0G 21 teach t 1iC 153 taught t ' ct 19 throw Tr1o 150 threw Tr' u 46 wear w ’@r 174 wore w ' or 65 Low Frequency Irregular Verbs: blow bl' o 52 blew bl 'u 12 creep kr'ip 27 crept kr'Ept 9 sink s1IGk 40 sank s ' 0Gk 18 spin sp"In 31 spun sp' An 14 spring spr'lG 30 sprang spr10G 13 steal st'il 39 stole st' ol 10 stick st' Ik 50 stuck st' Ak 13 swear sw'0r 33 swore sw' or 14 wake w'ek 45 woke w 1 ok 14 weep w'ip 28 wept w ' Ept 7 209 C.6 Test set 2 This test set contains selected groups of verbs from the training set that were generated correctly by the trained model. Each list is matched in terms of average frequency for all of its members. 2 0 entirely regular verbs, 2 0 regular inconsistent verbs, and 2 0 irregular verbs are listed. Entirely Regular Verbs: blame bl ’ em 32 blamed bl1emd 5 breathe br 1 iD 31 breathed br1iDd 9 brush br' "S 38 brushed br'"St 14 burn b ' Rn 103 burned b 1 Rnd 15 crack kr' @k 41 cracked kr1@kt 11 dress dr' Es 67 dressed dr'Est 10 flash fl ' QS 28 flashed f i ' e s t 12 grab gr ' @b 37 grabbed gr1@bd 19 kill k' 11 153 killed k' Ild 34 raise r'ez 188 raised r 1 ezd 42 sip s ’ Ip 10 sipped s 1 Ipt 2 splash spl'@S 7 splashed spl'0St 3 stir st 'R 39 stirred st' Rd 7 stop st' ap 240 stopped st'apt 103 swerve sw' Rv 5 swerved sw1Rvd 2 thrill Tr ' 11 4 thrilled Tr'Ild 2 touch t ' "C 91 touched t ' "Ct 24 walk w ' ck 287 walked w ' ckt 143 watch w ' cC 209 watched w ' cCt 68 wipe w ' Yp 35 wiped w 1 Ypt 11 Regular Inconsistent Verbs: ache ' ek 11 ached 1 ekt 3 blink bl•IGk 13 blinked bl'IGkt 6 bow b 1 o 13 bowed b 1 od 6 crawl kr' cl 37 crawled kr'cld 17 cry kr ■ Y 64 cried kr 1 Yd 25 210 die d'Y 183 died d1 Yd 63 flare fl' 0r 9 flared f1'0rd 3 flow fl'o 40 flowed fl'od 4 glare gl' @r 13 glared gl10rd 5 glow gl' o 19 glowed gl' od 6 grin gr 1 In 38 grinned gr1Ind 29 heal h' il 11 healed h' ild 3 kick k 1 Ik 34 kicked k' Ikt 10 leap 11 ip 33 leaped 1' ipt 18 owe 'o 34 owed 1 od 12 pick p 1 Ik 143 picked p ' Ikt 51 rake r 1 ek 6 raked r 1 ekt 3 reach r' iC 324 reached r 1 iCt 106 show S 1 o 640 showed S ' od 138 stare st1 0r 95 stared st1@rd 58 Irregular Verbs: blow bl 'o 52 blew bl 'u 12 break br 1 ek 228 broke br' ok 66 build b' Ild 249 built b 1 lit 21 creep kr 1 ip 27 crept kr'Ept 9 drink dr'IGk 93 drank dr 1@Gk 19 fly fl ' Y 92 flew fl 'u 27 grow gr 1 o 300 grew gr 1 u 65 keep k ' ip 523 kept k 'Ept 115 ring r' IG 39 rang r *0G 21 sink s 1 IGk 40 sank s ' @Gk 18 spin sp 1 In 31 spun sp1 ~n 14 spring spr1IG 30 sprang spr10G 13 steal st' il 39 stole st' ol 10 stick st1 Ik 50 stuck st' ~k 13 swear sw 1 @r 
33 swore sw1 or 14 teach t * iC 153 taught t' ct 19 throw Tr ' o 150 threw Tr' u 46 wake w ' ek 45 woke w' ok 14 wear w ' @r 174 wore w' or 65 weep w ' ip 28 wept w ' Ept 7 211 APPENDIX D: EXPERIMENTAL DATA AND TRAINING SET FOR SIMULATION IN CHAPTER 7 D .l Experimental data The following table contains ratings preferences from subjects for 37 denominal and deverbal pairs (denominals occur before deverbals in each pair). See Kim et al. (1991) for the context in which these forms were presented to the subjects. The second column contains the preferred past tense form of each item (regular vs. irregular), while the third and fourth columns contain the mean subject ratings for the regular and irregular forms respectively. These ratings range from 1.0 (very unnatural sounding) to 7.0 (very natural sounding). The fifth column contains the mean subject rating for the distance of the denominal or deverbal from the “central” meaning of the irregular verb, as reported in Kim et al. The six column contains an alternate mean subject rating for the distance of the item from the “closest” meaning of the irregular verb, as reported in Harris (1992). The seventh column contains the mean subject rating for the item from the “closest” meaning of the noun that the denominal is derived 212 from, as reported in the experiment in Chapter 7. Section D.2 gives the experimental stimuli used to obtain this column of ratings. Item Preferred Past REG IRREG Dist to Dist to Dist to Num Tense Form Verb 1 Verb 2 Noun la flied 4.25 3.93 5.75 2.286 1.7 lb flew 1.813 6.87 5.375 1.75 6 2a grandstanded 4.5 1.8 6 2.25 4.1 2b withstood 1.75 6.75 5.875 1 6 3a broadcast 3.937 6.063 5.875 1.75 2.2 3b cast 3.062 6.93 3.25 3.375 6 4a steeled 5.43 1.37 7.625 5.706 3.5 4b stole 1.62 6.93 4.5 1 6 5a lied 7 1 4.5 5.444 1 5b lay 2.12 5.62 1.125 2.886 6 6a ringed 5.06 2.62 4.125 5.312 1.5 6b rang 1.75 6.93 3.75 2.111 6 7a braked 5.87 1.18 7 5.062 1.6 7b broke 1.56 6.5 5.5 4.556 6 8a righted 5.81 1.37 8 5.857 2.5 8b wrote 1.06 6.81 4.375 1 6 9a spitted 3.75 2.5 6.5 6 2.8 9b spat 2.25 5.81 4.625 1.688 6 10a sinked 2.81 2.5 5.5 4.889 3.5 10b sank 2.06 6.56 3.875 1.25 6 11a reeded 4.125 1.06 7.625 5.375 3.8 lib read 1.06 7 3.125 3.353 6 12a out-Go'd 3.5 1.45 6.25 4.171 5.2 12b out-went 1.56 4.13 4.875 3.75 6 13a waked 4.875 2.312 7.125 5.111 2.9 13b woke 2 6.125 3.75 1.812 6 14a byed 4.7 1.9 7.875 4.125 2.1 14b bought 1.25 6.93 4.375 2.5 6 15a shedded 4.68 2.688 6 4.625 3.2 15b shed 2.31 5.563 4.125 1.429 6 16a drinked 2.06 1.75 3.875 2.375 3.8 16b drank 1.65 6.75 4.375 1.5 6 17a high-sticked 5.812 1.938 5.75 4.25 1.6 17b re-stuck 1.625 5.875 2.625 1 6 18a interleaved 5.186 1.688 7.625 5.375 1.3 18b over-left 1.313 3.313 4.5 1.071 6 213 Item Preferred Past Num Tense Form REG IRREG Dist to Dist to Dist to Verb 1 Verb 2 Noun 19a out-Big- Sleeped 2.94 2.563 7.875 5.75 5.2 19b out-overslept 1.63 5.625 4.25 1.889 6 20a three-hit 3.125 4.438 5.875 3.444 4.5 20b underhit 1.625 5.563 3.375 1.812 6 21a two-setted 4.313 3.188 7.25 6 3.9 21b unset 1.75 5.125 3.75 1.938 6 22a Lucky-Striked 3.37 2.875 6.125 5.111 4.4 22b understruck 2.313 5.312 4 2.02 6 23a out-Hurted 3.813 3.563 8 4.114 5.3 23b out-hurt 1.87 3.688 4.375 2.889 6 24a out-blew 2.813 3 6.125 4.5 3.7 24b outblew 2.188 6.438 2.5 1.929 6 25a William-Telled 5 1.5 7.75 5.5 4.5 25b story-told 1.625 2.625 3 1.071 6 26a out-flung 3.313 3.625 5.625 4.824 3.5 26b out-flung 3.75 4.875 3.25 1.89 6 27a double-taked 3.81 3.06 6.5 5.667 1.9 27b double-took 1.5 3.94 3.625 3.125 6 28a lighted-out 3.813 2.25 
7.625 5.571 5 28b lit-out 2.625 3.625 6 3 6 29a out-meaned 3.18 1.44 5.875 5.5 4.5 29b out-meant 1.37 3.81 5.375 1.857 6 30a out-shrinked 3.875 2.563 6.625 5.625 4.5 30b out-shrank 2.5 3.688 5 4.214 6 31a line-drived 5.563 2.938 6 2.375 1.9 31b line-drove 2 4.125 4.25 1.75 6 32a no-d 5.5 1.19 8 5.75 2.2 32b knew 3.19 4 3.5 2.222 6 33a shaked-out 4.75 2 6.125 5.778 4.1 33b shaked-out 4.06 4.06 3.625 2.812 6 34a de-flea'd 5.56 1.56 8 5.75 3.2 34b re-fled 1.81 3.19 4.75 1.667 6 35a meeted 4 1.31 5.5 4.75 4.5 35b met 2.38 4.19 4.625 3.444 6 36a de-beeted 4.938 2.5 7.625 5 3.5 36b re-beat 1.5 3.875 3.25 1.333 6 37a splitted 2.75 2.65 6.5 4.714 4.5 37b split 3.437 4.25 3.75 4.787 6 214 D.2 Stimuli for experiment Following are the stimuli for the experiment described in Chapter 7, in which subjects were asked to rate how similar or dissimilar denominals are to the noun they are derived from. The similarity ratings are given before the definitions below and range from 1 (very similar) to 6 (very dissimilar). 1. Wade Boggs is at the plate. As you may know, he has a bad habit of hitting fly balls into center field .... It’s a hit — he flies out to center field! 1.7 fly: a baseball hit high into the air 2. The quarterback has a bad habit of trying to impress the crowd in the grandstand rather than concentrating on the game. If he grandstands to the crowd too often, he'll get sacked. 4.1 grandstand: a roofed stand for spectators at a racecourse or stadium 3. Dan Rather usually does the broadcast for CBS on weekdays. He broadcasts the news every night I watch the news. 2.2 broadcast: the transmission of sound or images by radio or television 4. Brian will need nerves of steel to face the ordeal. But if he steels himself for it, he should be able to make it. 3.5 steel: a commercial iron 5. Sam always tells lies when he wants people to think he's better than he really is. For example, he lies about being a good golfer. 1.0 lie: an assertion or something known or believed by the speaker to be untrue with intent to deceive 215 6. The general is going to order his artillery to form a ring around the city. But if he rings the city with artillery, then a battle is certain. 1.5 ring: an encircling arrangement 7. Truck drivers often need to apply their brakes suddenly to avoid an accident. If the driver in front of you brakes suddenly, watch out! 1.6 brake: a device for arresting the motion of a mechanism 8. After repairing your boat, she set it upright. She usually rights the boat when she's done, but with mine she couldn't. 2.5 upright: the state of being perpendicular 9. He always puts the pig on spit to roast it over a fire. After he spits the pig, he begins husking the corn. 2.8 spit: a slender pointed rod for holding meat over a fire 10. When guests come, I hide the dirty dishes by putting them in boxes or in the empty sink. If Bob and Margaret come early, I'll quickly box the plates and sink the glasses. 3.5 sink: a stationary basin connected with a drain and usually a water supply for washing and drainage 11. Gilligan tied the posts together with a reed. The reason he reeds the posts together is in order to build a raft. 3.8 reed: any of various tall grasses with slender often prominently jointed stems that grow especially in wet areas 12. There is a board game in Japan called "Go" which is very famous and popular. But next year, if chess becomes so popular, it may out-Go Go. 5.2 Go: Japanese game played with stones on a board 216 13. 
Funeral directors often have to choose whether to conduct funerals, wakes, or memorial services when families cannot decide. Although this year they still funeraled most of the dead, next year I predict they will wake a larger number than ever before. 2.9 wake: a watch held over the body of a dead person prior to burial and sometimes accompanied by festivity 14. The pennant winners sometimes don't have to play in the first round of the playoffs; they get a bye into the second round. But often, when they are byed into the second round, the spectators get mad. 2.1 bye: the position of a participant in a tournament who has no opponent after pairs are drawn and advances to the next round without play 15. The farmer put all his equipment in the shed for the winter. He sheds equipment so that it doesn't get rusty from the snow. 3.2 shed: low-slung building or lean-to 16. It's always a good idea to relax your clients by making sure that are supplied with food and drink at all times. That why when MacTavish arrives, I immediate snack him, drink him and feed him. 3.8 drink: liquid suitable for swallowing 17. Gretzky got a penalty for hitting the goalie with a high stick. The next time Gretzky high-sticks the goalie, he'll be side-lined for the season. 1.6 high-stick: a penalty called in ice hockey when a player carries the blade of the stick at an illegal height 18. The best way to make lasagna is to interleave the noodles and the spinach leaves. You'll be sure to like the lasagna if I interleave the noddles and spinach carefully. 1.3 interleave: to arrange in or as if in alternate layers 217 19. Though the Big Sleep is a very popular cult movie, Citizen Kane has been accumulating quite a cult following of its own. Citizen Kane may even out-Big- Sleep the Big Sleep! 5.2 Big Sleep: the movie “Big Sleep” 20. Pitcher Roger Clemens allowed the Orioles only three hits in the entire game. But what else is new — he constantly three-hits the Orioles! 4.5 hit: a blow striking an object aimed at 21. Martina Navratilova beat Chris Evert in two sets. The next time she two- sets Chris, we're going to see one angry lady on the court. 3.9 set: a division of a tennis match won by the side that wins at least six games beating the opponent by two games or by winning a tiebreaker 22. These billboards advertising every brand of cigarettes, from Marlboros to Lucky Strikes, have been in our faces the whole trip. W e’ ve been Lucky-Striked so many times, we know the ad by heart. 4.4 Lucky Strike: a brand of cigarettes 23. The actor William Hurt has a reputation for attracting the most female autograph-seekers on the set during shooting, but Robert Redford also attracts large crowds. I bet that soon Redford might even out-Hurt Hurt. 5.3 William Hurt: the actor 24. Both boxers managed to land heavy blows on each other. But Tyson usually out-blows his opponents, and this time was no exception. 3.7 blow: a forcible stroke delivered with a part of the body 25. He put an apple on his son's head, and got ready to pull a W illiam Tell. If he William-Tells that apple without touching a hair, I'll faint. 4.5 William Tell: the folk hero 218 26. Janet was fed up with her husband Sam's recurring flings with various women, four at last count. For revenge she got a job where could meet lots of men and immediately began to try to out-fling that husband of hers. 3.5 fling: an extra-marital affair 27. In that movie, Charlie Chaplin does the best double-takes I've ever seen. He double-takes every time the cop comes over to him. 
1.9 double-take: a delayed reaction to a surprising or significant situation 28. I've had so many light beers I'm sick of them. I don't think I could possibly drink another one. As far as light beers are concerned, I'm totally ttghted-out. 5.0 light beer: beer with a low alcohol content 29. The best football teams are those that are meaner on the field than their opponents. The Dolphins are undefeated because they always out-mean the rest of the teams in the NFL. 4.5 mean: characterized by malice; harassing 30. Sam is always acting like a shrink, psychoanalyzing half the people at the table. But tomorrow night we're having Jonathan over, and I bet he'll analyze ALL the people at the table. Yep, he'll out-shrink even Sam. 4.5 shrink: psychiatrist 31. The new player hit a line drive to center field. The next time he line- drives in this game, he’ ll be off the team. 1.9 line-drive: a batted baseball hit in a nearly straight line usually not far above the 32. My 6 -year-old son will yell "no" at me 10 or 20 times when I try to put him to bed. But to my wife, he "no's" all day long to everything she asks him. 2 . 2 no: an instance of refusing or denying by the use of the word no 219 33. I've had so many milkshakes, thickshakes, and chocolate shakes I couldn't have another shake of any kind. I’ m completely shaked-out (or shook-out). (Select and judge whichever of these two you feel is most appropriate for the sentence.) 4.1 shake: a milk shake 34. When the dog came around scratching incessantly in the house, Jim decided to get rid of the dog’s fleas once and for all. But when he tried to de-flea the dog, the dog ran across the street and was hit by a truck. 3.2 flea: a wingless, bloodsucking insect 35. I’ ve been to so many track-meets, I couldn't stand the thought of another one. I'm completely meeted-out (or met-out). (Select and judge whichever of these two you feel is most appropriate for the sentence.) 4.5 track-meet: an assembly of competitive sports 36. There’s a trick to making beet stew. In order to make a perfect beet stew, you have to pick out all the beets before you serve it. So if you don't de-beet your stew, you'll have a lumpy mess. 3.5 beet: a swollen root used as a vegetable 37. I've had a banana split every day this week and I couldn't possibly eat another one. I'm completely splitted-out (or split-out). (Select and judge whichever of these two you feel is most appropriate for the sentence.) 4.5 banana split: an ice cream dessert made with bananas, nuts, and syrups 220 D.3 Training set 1 The format of all items in this lists is as follows: orthographic form of present tense, phonological representation of present tense, present tense raw frequency, orthographic form of past tense, phonological representation of past tense, past tense raw frequency, distance to closest verb definition, and distance to closest noun definition. The ' character in the phonological representation is a stress marker and the - character is a syllable boundary. The training set consists of 20 irregular verbs, 20 denominals, 20 deverbals, and 367 regular verbs: Irregular verbs: hit h' It 126 hit h' It 38 1. 0 6 . 0 split spl'It 26 split spl'It 5 1.0 6.0 meet m' it 339 met m ' Et 80 1.0 6.0 light 1' Yt 9 lit 1' It 9 1. 0 6.0 wake w * ek 45 woke w 1 ok 14 1.0 6.0 break br1 ek 228 broke br' ok 66 1.0 6.0 shrink Sr'IGk 12 shrank Sr'§Gk 2 1.0 6.0 sink s ' IGk 40 sank s ' @Gk 18 1 . 0 6.0 ring r' IG 39 rang r' @G 21 1.0 6 . 0 know n 1 o 1473 knew n ' u 394 1.0 6.0 blow bl 'o 52 blew bl 'u 12 1.0 6 . 
0 fly fl1 Y 92 flew fl 'u 27 1.0 6.0 write r' Yt 561 wrote r' ot 179 1. 0 6.0 drive dr 1 Yv 203 drove dr' ov 58 1.0 6.0 stand st' @nd 468 stood st' Ud 198 1.0 6.0 lie 1' Y 211 lay 1 'e 81 1. 0 6.0 leave 1' iv 650 left 1 'Eft 157 1. 0 6 . 0 sleep si' ip 18 slept si'Ept 18 1.0 6.0 mean m 1 in 376 meant m ' Ent 70 1.0 6.0 tell t 'El 759 told t ' old 286 1.0 6.0 221 Denominal verbs: hit h 1 It 126 hittedDN h'I-t |d 5 3.5 4 . split spl'It 26 splittedDN spl'I-t|Id 5 4.5 4 . meet m 1 it 339 meetedDN m 'i-t|d 5 5.0 4 . light 1 1 Yt 9 lightedDN 1'Y-t|d 5 5.5 5 . wake w ' ek 45 wakedDN w ' ekt 5 5 . 0 3 . break br ' ek 228 breakedDN br1ekt 5 5 . 0 1 . shrink Sr'IGk 12 shrinkedDN Sr'IGkt 5 5.5 4 . sink s ' IGk 40 sinkedDN s'IGkt 5 5.0 3. ring r' IG 39 ringedDN r' IGd 5 5.5 1 . know n' o 1473 knowedDN n ' od 5 6.0 2. blow bl 'o 52 blowedDN bl 'od 5 4.5 3. fly fl ' Y 92 flyedDN fl 'Yd 5 2.5 1 . write r 1 Yt 561 writedDN r'Y-t | d 5 6.0 2 . drive dr' Yv 203 drivedDN dr'Yvd 5 2.5 2. stand st'0nd 468 standedDN st'0n-d| d 5 2.5 4. lie 1' Y 211 liedDN 1 'Yd 5 5.5 1 . leave 1' iv 650 leavedDN 1' ivd 5 5.5 1 . sleep si1 ip 18 sleepedDN si'ipt 5 6.0 5. mean m 1 in 376 meanedDN m ' ind 5 5.5 4 . tell t 'El 759 telledDN t 'Eld 5 5.5 4 . Deverbal verbs: hit h' It 126 hitDV h' It 5 2.0 6. split spl'It 26 splitDV spl'It 5 5.0 6. meet m ' it 339 metDV m'Et 5 3.5 6. light 1' Yt 9 litDV 1 ' It 5 3.0 6. wake w ' ek 45 wokeDV w ' ok 5 2 . 0 6. break br ' ek 228 brokeDV br ' ok 5 4.5 6. shrink Sr'IGk 12 shrankDV Sr'0Gk 5 4 . 0 6. sink s ' IGk 40 sankDV s ' 0Gk 5 1.5 6. ring r' IG 39 rangDV r' 0G 5 2 . 0 6 . know n ' o 1473 knewDV n ' u 5 2 . 0 6 . blow bl 'o 52 blewDV bl 'u 5 2.0 6. fly fl' Y 92 flewDV fl 'u 5 2 . 0 6. write r' Yt 561 wroteDV r' ot 5 1.0 6. drive dr ' Yv 203 droveDV dr' ov 5 2.0 6. stand st'0nd 468 stoodDV st' Ud 5 1. 0 6. lie 1' Y 211 layDV 1 'e 5 3.0 6 . leave 1' iv 650 leftDV 1 'Eft 5 1.0 6 . sleep si' ip 18 sleptDV si'Ept 5 2.0 6. 5 5 5 0 0 5 5 5 5 0 5 5 5 0 0 0 5 0 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 222 mean m 1 in 376 meantDV m ' Ent 5 2.0 6 . tell t 'El 759 toldDV t ' old 5 1.0 6. Regular ache verbs: ' ek 11 ached ' ekt 3 1. 0 6 . blink bl'IGk 13 blinked bl'IGkt 6 1.0 6. bow b 1 o 13 bowed b ' od 6 1.0 6. cite s' Yt 44 cited s'Y-t|d 11 1. 0 6. cry kr'Y 64 cried kr' Yd 25 1.0 6 . die d'Y 183 died d'Yd 63 1.0 6 . dive d' Yv 11 dived d ' Yvd 4 1. 0 6. flow fl'o 40 flowed f 1' od 4 1.0 6 . glow gl1 o 19 glowed gl' od 6 1.0 6. greet gr' it 28 greeted gr'i-t|d 15 1.0 6. heave h 1 iv 10 heaved h'ivd 4 1.0 6. lean 1' in 61 leaned 1' ind 37 1. 0 6. leap 11 ip 33 leaped 1' ipt 18 1.0 6. owe ' o 34 owed ' od 12 1. 0 6. rake r 1 ek 6 raked r' ekt 3 1. 0 6. show S'o 640 showed S 'od 138 1.0 6. sigh s ' Y 28 sighed s ' Yd 22 1.0 6. smell sm1 El 43 smelled sm'Eld 15 1. 0 6. swell sw' El 20 swelled sw'Eld 3 1. 0 6. thrive Tr ' Yv 11 thrived Tr'Yvd 4 1. 0 6. treat tr' it 122 treated tr'i-tId 11 1.0 6. try tr 1 Y 472 tried tr' Yd 120 1. 0 6. wink w ' IGk 18 winked w 'IGkt 7 1.0 6. yell y'El 31 yelled y 'Eld 21 1.0 6. add ' @d 291 added ' @-dId 81 1. 0 6. aid 1 ed 46 aided 1 e-dId 2 1.0 6. aim ' em 42 aimed ' emd 10 1.0 6. ask 1 @sk 612 asked ' @skt 300 1.0 6. bang b' 0G 10 banged b ' @Gd 4 1.0 6. bar b ' ar 17 barred b ' ard 2 1.0 6. bawl b'cl 3 bawled b'cld 2 1. 0 6. beg b ' Eg 34 begged b 1 Egd 13 1. 0 6. belch b'EIC 5 belched b'ElCt 4 1. 0 6. blame bl' em 32 blamed bl'emd 5 1. 0 6. blaze bl 'ez 11 blazed bl'ezd 2 1. 0 6. blend bl'End 9 blended bl'En-dId 2 1. 0 6. blurt bl 'Rt 3 blurted bl'R-tId 2 1. 0 6. 
blush bl * "S 12 blushed bl' "St 4 1. 0 6. bob b' ab 5 bobbed b ' abd 2 1. 0 6. bolt b ' olt 10 bolted b'ol-tId 3 1. 0 6. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 223 boost b ' ust 11 boosted b 'us-tId 2 1.0 6.0 bore b ' or 26 bored b ' ord 3 1. 0 6.0 bounce b ' Wns 28 bounced b 'Wnst 13 1.0 6.0 bound b ' Wnd 13 bounded b 'Wn-dId 2 1. 0 6.0 breathe br' iD 31 breathed br'iDd 9 1.0 6.0 bribe br' Yb 4 bribed br'Ybd 2 1. 0 6.0 brush br 1 AS 38 brushed br'ASt 14 1. 0 6.0 bump b 1 Amp 9 bumped b 1Ampt 2 1. 0 6.0 burn b'Rn 103 burned b ' Rnd 15 1.0 6.0 buzz b* Az 9 buzzed b 1 Azd 2 1.0 6.0 call k' cl 627 called k'cld 165 1. 0 6.0 cease s 1 is 32 ceased s 1 ist 8 1. 0 6.0 change C 1 en J 225 changed C'enJd 26 1. 0 6.0 chant C'Qnt 9 chanted C'0n-t|d 3 1. 0 6.0 charge C 1 ar J 82 charged C'arJd 17 1. 0 6.0 chat C'@t 6 chatted C'0-t|d 2 1. 0 6 . 0 chew C'u 16 chewed C'ud 4 1. 0 6 . 0 chill C ' 11 12 chilled C' lid 2 1. 0 6.0 choke C 'ok 22 choked C ' okt 4 1.0 6.0 claim kl1 em 99 claimed kl1emd 25 1. 0 6.0 clap kl' 0p 9 clapped kl'0pt 4 1. 0 6.0 climb kl' Ym 65 climbed kl'Ymd 41 1. 0 6.0 clip kl' Ip 3 clipped kl'Ipt 2 1. 0 6.0 close kl' os 174 closed kl'ost 39 1. 0 6.0 clutch kl' AC 17 clutched kl'ACt 5 1. 0 6.0 cock k'ak 6 cocked k' akt 4 1. 0 6.0 cook k'Uk 50 cooked k'Ukt 2 1.0 6.0 cough k 1 cf 8 coughed k1 eft 2 1.0 6.0 count k 1 Wnt 65 counted k'Wn-tId 11 1. 0 6.0 crack kr' @k 41 cracked kr'0kt 11 1. 0 6.0 crash kr' @S 23 crashed kr'0St 7 1. 0 6.0 crawl kr' cl 37 crawled kr'cld 17 1. 0 6.0 creak kr' ik 11 creaked kr'ikt 6 1. 0 6.0 cringe kr1 InJ 5 cringed kr'InJd 2 1. 0 6.0 croon kr' un 3 crooned kr'und 2 1. 0 6.0 crouch kr'WC 22 crouched kr1WCt 10 1. 0 6.0 crowd kr 1 Wd 39 crowded kr'W-d|d 8 1. 0 6.0 crush kr' AS 17 crushed kr1ASt 2 1. 0 6.0 curl k'Rl 17 curled k'Rld 6 1. 0 6.0 dance d1 @ns 59 danced d ' 0nst 8 1. 0 6.0 dash d' @S 14 dashed d' 0St 4 1. 0 6.0 deem d1 im 17 deemed d' imd 3 1.0 6 . 0 dine d' Yn 32 dined d ' Ynd 2 1.0 6.0 dip d' Ip 6 dipped d' Ipt 3 1.0 6.0 dodge d' a J 8 dodged d 1 a Jd 2 1.0 6.0 doubt d ' Wt 28 doubted d'W-t|d 9 1.0 6.0 doze d 1 oz 8 dozed d 1 ozd 4 1. 0 6.0 drain dr' en 16 drained dr 1 end 3 1.0 6.0 224 drape dr 1 ep 9 draped dr'ept 3 1.0 6.0 drawl dr' cl 4 drawled dr 1cld 3 1.0 6.0 dress dr 1 Es 67 dressed dr'Est 10 1. 0 6.0 drip dr 1 Ip 14 dripped dr'Ipt 5 1.0 6.0 drown dr'Wn 14 drowned dr 1Wnd 4 1. 0 6.0 drum dr 1 ^m 6 drummed dr'^md 2 1. 0 6 . 0 duck d' Ak 15 ducked d' "kt 5 1.0 6.0 dump d ' / 'mp 14 dumped d '^mpt 7 1.0 6.0 earn 'Rn 45 earned ' Rnd 9 1. 0 6.0 fade f'ed 24 faded f'e-d|d 8 1. 0 6.0 fail f'el 142 failed f'eld 52 1. 0 6.0 fan f 1 @n 13 fanned f' ©nd 4 1. 0 6.0 file f' Y1 87 filed f' Yld 12 1. 0 6.0 fill f' 11 184 filled f * lid 31 1 . 0 6.0 fix f' Iks 109 fixed f'Ikst 12 1 . 0 6.0 flare fl1 @r 9 flared fl'0rd 3 1. 0 6.0 flash fl1 0S 28 flashed fl'0St 12 1 . 0 6.0 flaunt fl'cnt 3 flaunted f1 * cn-t1d 2 1.0 6.0 flex fl'Eks 3 flexed fl'Ekst 2 1 . 0 6.0 flick fl' Ik 6 flicked fl'Ikt 5 1.0 6.0 flip fl' Ip 8 flipped f1'Ipt 3 1. 0 6.0 flock fl 'ak 4 flocked f1'akt 2 1. 0 6.0 flog f 1' ag 3 flogged f1'agd 2 1. 0 6.0 flop fl 'a p 7 flopped f1'apt 6 1. 0 6.0 fold fold 20 folded f'ol-d|d 5 1.0 6 . 0 force f' ors 124 forced f'orst 19 1. 0 6.0 frame f r' em 23 framed fr'emd 2 1. 0 6.0 frown fr'Wn 22 frowned fr'Wnd 7 1.0 6.0 fuse f y' uz 6 fused fy'uzd 2 1. 0 6.0 gain g' en 77 gained g' end 18 1. 0 6.0 gape g'ep 5 gaped g' ept 3 1. 
0 6.0 gasp g' 0sp 11 gasped g'0 spt 5 1.0 6.0 gaze g'ez 21 gazed g' ezd 7 1.0 6.0 glance gl10ns 43 glanced gl'0nst 25 1. 0 6.0 glare gl1 0r 13 glared gl'0rd 5 1. 0 6.0 glint gl'Int 7 glinted gl'In-t|d 2 1. 0 6.0 gloat gl' ot 3 gloated gl1o-t|d 2 1.0 6.0 grab gr' 0b 37 grabbed gr'0bd 19 1.0 6.0 grant gr'0nt 78 granted gr'0n-t|d 7 1.0 6.0 grasp gr'0sp 23 grasped gr10spt 5 1.0 6.0 grin gr' In 38 grinned gr'Ind 29 1.0 6.0 grip gr' Ip 19 gripped gr'Ipt 9 1. 0 6.0 groan gr' on 4 groaned gr'ond 3 1. 0 6.0 grope gr' op 12 groped gr'opt 7 1.0 6.0 growl gr 'W1 5 growled gr'Wld 4 1. 0 6 . 0 grunt gr'^nt 11 grunted gr 1 ' vn-t I d 9 1. 0 6.0 guess g ' Es 77 guessed g ' Est 7 1. 0 6.0 gulp g' "lp 3 gulped g'''Ipt 3 1.0 6.0 225 gush g AS 5 gushed g1 ASt 5 1.0 6.0 haul h cl 17 hauled h 1 cld 3 1. 0 6.0 haunt h cnt 13 haunted h'cn-t d 2 1.0 6.0 heal h il 11 healed h'ild 3 1.0 6.0 help h Elp 352 helped h 1Elpt 40 1. 0 6.0 hiss h Is 4 hissed h ' 1st 2 1. 0 6.0 hop h ap 10 hopped h' apt 5 1. 0 6.0 hope h op 164 hoped h ' opt 33 1. 0 6.0 hug h Ag 11 hugged h1 Agd 2 1. 0 6.0 hunt h Ant 44 hunted h 'An-t d 2 1. 0 6.0 hurl h R1 12 hurled h 1 Rid 3 1.0 6.0 jab J @b 4 jabbed J' 0bd 2 1. 0 6.0 jerk J Rk 16 jerked J'Rkt 12 1. 0 6.0 join J On 139 joined J'Ond 33 1. 0 6.0 judge J AJ 42 judged J1 A Jd 3 1.0 6.0 jump J Amp 58 jumped J1Ampt 32 1. 0 6.0 kick k Ik 34 kicked k' Ikt 10 1. 0 6.0 kill k 11 153 killed k 1 lid 34 1. 0 6.0 kiss k Is 31 kissed k' 1st 15 1. 0 6.0 kneel n il 21 kneeled n * ild 2 1. 0 6.0 knock n ak 47 knocked n * akt 17 1 . 0 6.0 lack 1 @k 70 lacked 1' 0kt 15 1. 0 6.0 lapse 1 §ps 7 lapsed 1'0pst 2 1 . 0 6.0 lash 1 @S 9 lashed 1 ' 0St 3 1. 0 6.0 last 1 0st 48 lasted 1'0s-t d 11 1 . 0 6.0 laugh 1 0 f 89 laughed 1 1 0ft 46 1 . 0 6 . 0 launch 1 cnC 31 launched 11cnCt 3 1. 0 6.0 leak 1 ik 5 leaked 1' ikt 4 1. 0 6.0 learn 1 Rn 254 learned 1 'Rnd 54 1. 0 6.0 leer 1 Ir 7 leered 11 Ird 3 1. 0 6.0 lick 1 Ik 14 licked 1 ' Ikt 7 1. 0 6.0 lift 1 Ift 69 lifted 1'If-t d 34 1. 0 6.0 like 1 Yk 294 liked 1' Ykt 45 1. 0 6.0 list 1 1st 59 listed 1•Is-t d 11 1. 0 6.0 live 1 Iv 472 lived 1' Ivd 72 1. 0 6 . 0 loathe 1 oD 5 loathed 1 'oDd 4 1. 0 6.0 look 1 Uk 910 looked 1 'Ukt 327 1. 0 6.0 loom 1 urn 15 loomed 1' umd 3 1.0 6.0 lounge 1 WnJ 8 lounged 11WnJd 3 1. 0 6.0 lug 1 Ag 6 lugged 1' Agd 3 1.0 6.0 lunge 1 AnJ 5 lunged 1'AnJd 4 1. 0 6.0 lurch 1 RC 7 lurched 1 'RCt 5 1. 0 6.0 lurk 1 Rk 8 lurked 1 'Rkt 3 1.0 6.0 march m arC 37 marched m 'arCt 6 1. 0 6.0 mark m ark 126 marked m 'arkt 15 1. 0 6.0 match m 0C 77 matched m ' 0Ct 2 1. 0 6.0 melt m Elt 32 melted m'El-t d 2 1. 0 6.0 miss m Is 95 missed m' 1st 17 1.0 6.0 mount m ' Wnt 62 mounted m 1Wn-t|d 13 1.0 6.0 mourn m ' orn 12 mourned m 1ornd 2 1.0 6.0 move m ' uv 447 moved m 1 uvd 138 1.0 6.0 muse my ' uz 5 mused my * uzd 4 1.0 6.0 nod n ' ad 62 nodded n 'a-d|d 49 1.0 6.0 note n ' ot 165 noted n 'o-t|d 27 1. 0 6.0 nudge n 1 "J 4 nudged n' "Jd 2 1. 0 6.0 opt ' apt p ' ent 2 opted 'ap-t|d 2 1.0 6.0 paint 95 painted p 'en-t|d 9 1. 0 6.0 pass p ' @s 298 passed p ' @st 91 1.0 6.0 peck p'Ek 3 pecked p'Ekt 2 1.0 6.0 peel p' il 14 peeled p' ild 2 1.0 6.0 pick p' Ik 143 picked p ’ Ikt 51 1. 0 6.0 pile p' Y1 26 piled p ' Yld 7 1. 0 6.0 pinch p 1 InC 11 pinched p 1 met 2 1. 0 6.0 pitch p 1 IC 20 pitched p ' ICt 4 1. 0 6.0 plant pi'@nt 18 planted pi'@n-t|d 5 1.0 6.0 plead pi' id 24 pleaded pi'i-d|d 7 1.0 6.0 please pi ' iz 91 pleased pi1 izd 11 1. 0 6.0 pluck pi' "k 6 plucked pi1"kt 4 1. 0 6.0 plump pi1 "mp 2 plumped pi'"mpt 2 1. 0 6 . 0 plunge pi'"nJ 20 plunged pi'"n Jd 10 1.0 6 . 
0 poise p 1 Oz 12 poised p ' Ozd 2 1.0 6.0 poke p ' ok 13 poked p 1 okt 3 1. 0 6 . 0 pop p 1 ap 17 popped p 1 apt 6 1. 0 6.0 pose p ' oz 20 posed p 1 ozd 3 1. 0 6.0 post p ' ost 13 posted p 1os-tId 3 1.0 6.0 pound p 1 Wnd 11 pounded p 1Wn-d|d 4 1.0 6.0 pour p 1 Ur 48 poured p'Urd 21 1.0 6.0 praise pr ' ez 21 praised pr'ezd 8 1. 0 6.0 pray pr 1 e 30 prayed pr' ed 8 1. 0 6.0 preach pr ' iC 26 preached pr1iCt 6 1. 0 6.0 press pr' Es 82 pressed pr'Est 12 1. 0 6.0 prompt pr1ampt 11 prompted pr1amp-tId 3 1. 0 6.0 prove pr 1 uv 156 proved pr'uvd 48 1. 0 6.0 puff p' "f 8 puffed p' "ft 2 1. 0 6.0 pull p'Ul 145 pulled p 'Uld 54 1. 0 6.0 pump p ' "mp 12 pumped p 1"mpt 2 1. 0 6.0 push p'US 102 pushed p 'USt 31 1. 0 6.0 quote kw' ot 48 quoted kw'o-tId 8 1. 0 6.0 race r 1 es 30 raced r' est 11 1.0 6.0 raise r'ez 188 raised r' ezd 42 1. 0 6.0 rap r' @p 6 rapped r' @pt 2 1. 0 6.0 reach r 'iC 324 reached r'iCt 106 1. 0 6.0 rear r'lr 14 reared r' Ird 7 1. 0 6.0 reel r'il 3 reeled r 'ild 2 1. 0 6.0 rest r' Est 77 rested r'Es-tId 12 1.0 6.0 rip r' Ip 14 ripped r' Ipt 5 1.0 6.0 227 roar r' or 27 roared r' ord 18 1.0 6.0 rob r' ab 15 robbed r' abd 2 1.0 6.0 rock r ' ak 20 rocked r' akt 7 1.0 6.0 roll r'ol 88 rolled r' old 34 1.0 6.0 rouse r 1 Wz 5 roused r’Wzd 2 1 . 0 6.0 rub r - -b 34 rubbed r' "bd 13 1. 0 6.0 rush r' AS 42 rushed r ’ ASt 20 1. 0 6.0 sag s ' 0g 11 sagged s 1 0gd 3 1. 0 6.0 save s ' ev 121 saved s 1 evd 11 1.0 6.0 scan sk' @n 17 scanned sk'0nd 9 1. 0 6.0 scare sk' Er 26 scared sk'Erd 3 1. 0 6.0 scowl sk 1W1 6 scowled sk'Wld 4 1.0 6.0 scrape skr'ep 18 scraped skr'ept 6 1. 0 6.0 scratch skr'@C 22 scratched skr10Ct 4 1 . 0 6.0 scrawl skr'cl 5 scrawled skr'cld 3 1. 0 6.0 scream skr'im 40 screamed skr'imd 14 1. 0 6 . 0 screech skr•iC 12 screeched skr1iCt 5 1.0 6.0 search s 'RC 41 searched s 'RCt 7 1. 0 6.0 seem s ' im 831 seemed s 1 imd 311 1 . 0 6.0 seize s ' iz 33 seized s ' izd 12 1 . 0 6.0 serve s ' Rv 300 served s ' Rvd 52 1 . 0 6.0 shape S ' ep 34 shaped S ' ept 3 1 . 0 6.0 share S 'Er 105 shared S 'Erd 19 1 . 0 6.0 shave 5' ev 23 shaved S ' evd 4 1.0 6.0 shift S' Ift 47 shifted S'If-t1d 12 1.0 6.0 shock S'ak 23 shocked S 1 akt 2 1. 0 6.0 shout S 'Wt 77 shouted S'W-t | d 36 1. 0 6.0 shove S ' ~v 16 shoved S 1 Avd 8 1. 0 6.0 shriek Sr' ik 5 shrieked Sr•ikt 4 1. 0 6.0 shrill Sr' 11 3 shrilled Sr'Ild 2 1. 0 6.0 shrug Sr' /sg 18 shrugged Sr'~gd 18 1. 0 6.0 sift s ' Ift 3 sifted s ' If-t1d 2 1. 0 6.0 sip s ' Ip 10 sipped s ' Ipt 2 1. 0 6.0 size s ' Yz 5 sized s 1 Yzd 2 1. 0 6.0 skim sk' Im 8 skimmed sk1Imd 2 1. 0 6 . 0 skip sk 1 Ip 17 skipped sk'Ipt 6 1.0 6 . 0 slash sl'@S 18 slashed si•0St 6 1. 0 6.0 slip si' Ip 47 slipped si'Ipt 26 1.0 6.0 slump si''"mp 11 slumped si'Ampt 6 1.0 6.0 smack sm' @k 7 smacked sm10kt 2 1. 0 6.0 smash sm' @S 18 smashed sm10St 11 1.0 6.0 smile sm1 Y1 122 smiled sm'Yld 68 1.0 6.0 snap sn 1 0p 38 snapped sn10pt 17 1.0 6.0 snarl sn1arl 11 snarled sn'arid 8 1.0 6.0 snatch sn 1 0C 17 snatched sn10Ct 9 1. 0 6 . 0 sneak sn' ik 11 sneaked sn'ikt 4 1. 0 6.0 sniff sn1 If 10 sniffed sn'Ift 6 1. 0 6.0 snort sn1crt 6 snorted sn'cr-t|d 4 1. 0 6 . 0 228 soar s ' or 9 soared s ' ord 3 1 . 0 6.0 soothe s ' uD 8 soothed s ' uDd 2 1 . 0 6.0 spare sp' 0r 19 spared sp'0rd 3 1.0 6.0 spill sp 111 9 spilled sp'Ild 2 1. 0 6.0 splash spl'0S 7 splashed spl'0St 3 1. 0 6.0 spout sp' Wt 4 spouted sp'W-t|d 2 1. 0 6 . 0 sprawl spr'cl 18 sprawled spr'cld 2 1.0 6.0 sprint spr'Int 2 sprinted spr'In-t|d 2 1. 
0 6.0 squat skw'at 12 squatted skw'a-tId 4 1.0 6.0 squeak skw1ik 2 squeaked skw'ikt 2 1.0 6.0 squeeze skw'iz 30 squeezed skw'izd 8 1. 0 6.0 stain st' en 45 stained st'end 4 1.0 6.0 stalk st! ck 9 stalked st'ekt 6 1. 0 6.0 stall st' cl 6 stalled st'cld 3 1. 0 6.0 stamp st'0mp 16 stamped st'0mpt 2 1. 0 6.0 stare st1 0r 95 stared st'0rd 58 1. 0 6.0 start st'art 386 started st'ar-tId 139 1. 0 6.0 stay st' e 195 stayed st' ed 60 1. 0 6.0 steer st' Ir 16 steered st!Ird 4 1.0 6.0 stem st' Em 22 stemmed st'Emd 3 1. 0 6.0 stir st 'R 39 stirred st ' Rd 7 1. 0 6 . 0 stoop st1 up 11 stooped st'upt 3 1. 0 6 . 0 stop st ' ap 240 stopped st'apt 103 1. 0 6.0 strain str1en 20 strained str'end 8 1. 0 6.0 stretch str'EC 61 stretched str'ECt 21 1.0 6.0 stroll str'ol 8 strolled str'old 4 1. 0 6.0 strut str'"t 4 strutted str'"-t|d 2 1. 0 6.0 suck s ’ "k 18 sucked s ' "kt 5 1.0 6.0 sue s' u 19 sued s ' ud 2 1.0 6.0 sulk s ' "lk 3 sulked s'"Ikt 2 1.0 6.0 surge s'RJ 10 surged s 'RJd 7 1.0 6.0 sway sw1 e 13 swayed sw' ed 7 1. 0 6.0 swerve sw1 Rv 5 swerved sw'Rvd 2 1. 0 6.0 swoop sw1 up 9 swooped sw'upt 4 1. 0 6.0 talk t 1 ck 275 talked t ' ekt 41 1. 0 6.0 tap t 1 0p 19 tapped t ' 0pt 2 1. 0 6.0 taste t 1 est 22 tasted t'es-t1d 7 1. 0 6 . 0 tease t 1 iz 8 teased t ' izd 2 1.0 6.0 tend t 'End 104 tended t'En-d1d 15 1. 0 6.0 test t 'Est 67 tested t 'Es-tId 3 1.0 6.0 thrash Tr ' 0S 4 thrashed Tr'0St 2 1.0 6.0 thrill Tr' 11 4 thrilled Tr1Ild 2 1.0 6.0 throb Tr' ab 5 throbbed Tr'abd 3 1.0 6.0 tick t' Ik 5 ticked t 1 Ikt 2 1. 0 6.0 tilt t1 lit 17 tilted t'Il-t|d 6 1. 0 6.0 tip t' Ip 7 tipped t ' Ipt 4 1. 0 6.0 tire t' Yr 4 6 tired t' Yrd 4 1. 0 6.0 top t' ap 13 topped t' apt 3 1. 0 6.0 229 toss t ' cs 41 tossed t ' est 22 1. 0 6.0 touch t ' AC 91 touched t' Act 24 1. 0 6.0 trace tr' es 36 traced tr'est 2 1.0 6.0 trail tr 1 el 17 trailed tr 1 eld 6 1. 0 6.0 tramp tr10mp 2 tramped tr'0mpt 2 1. 0 6.0 trip tr 1 Ip 6 tripped tr'Ipt 2 1. 0 6.0 trot tr' at 13 trotted tr'a-tId 5 1.0 6.0 trudge tr 1 "J 4 trudged tr' Jd 4 1.0 6.0 tuck t r Ak 9 tucked t 1 ''kt 4 1. 0 6.0 tug t ' ‘ "'g 4 tugged t'^gd 2 1. 0 6.0 turn t fRn 566 turned t 'Rnd 253 1. 0 6.0 twist tw*1st 34 twisted tw1Is-t|d 12 1. 0 6 . 0 twitch tw' IC 6 twitched tw1ICt 4 1.0 6.0 urge 'RJ 64 urged 1 RJd 21 1.0 6.0 veer v' Ir 8 veered v'Ird 3 1.0 6.0 wade w ' ed 4 waded w 1e-d|d 2 1.0 6.0 wag w 1 @g 4 wagged w ' 0gd 2 1.0 6.0 wail w ' el 8 wailed w ' eld 3 1.0 6.0 wait w ' et 263 waited w 'e-t | d 68 1. 0 6.0 walk w 1 1 ck 287 walked w ' ekt 143 1. 0 6.0 want w1 cnt 631 wanted w 'cn-t|d 204 1. 0 6.0 warn w' crn 62 warned w ' crnd 14 1. 0 6 . 0 wash w' cS 83 washed w ' cSt 10 1.0 6 . 0 watch w ' cC 209 watched w ' cCt 68 1. 0 6.0 wave w 1 ev 30 waved w ' evd 16 1.0 6.0 wax w 1 6ks 6 waxed w '0kst 3 1. 0 6.0 weigh w 1 e 33 weighed w 1 ed 11 1. 0 6 . 0 whack hw' 0 k 2 whacked hw'0kt 2 1. 0 6.0 whip hw' Ip 24 whipped hw'Ipt 7 1.0 6 . 0 whirl hw' R1 17 whirled hw * Rid 6 1. 0 6.0 wield w ' ild 4 wielded w 1il-dId 2 1. 0 6.0 wince w 1 Ins 5 winced w 1 Inst 4 1.0 6.0 wipe w' Yp 35 wiped w 1 Ypt 11 1.0 6.0 wish w' IS 161 wished w 1 ISt 52 1. 0 6.0 wrap r' 0p 23 wrapped r' 0pt 2 1.0 6 . 0 wrench r'EnC 4 wrenched r'EnCt 2 1. 
0 6.0 yank y' 0Gk 7 yanked y'0Gkt 4 1.0 6.0 yearn y ’ Rn 4 yearned y' Rnd 2 1.0 6.0 yield y 1 ild 41 yielded y'il-dld 7 1.0 6.0 230 APPENDIX E: TRAINING AND GENERALIZATION SETS FOR SIMULATION IN CHAPTER 8 The format of all items in these lists is as follows: orthographic form of present tense, closest OE verb class, phonological representation of present tense, present tense raw frequency, orthographic form of past tense, phonological representation of past tense, past tense raw frequency. Note also that the /=/ character in the phonological representation is meant to encode the /ea/ diphthong. E.I Training set I The training set consists of 25 items for each of the 5 OE irregular verb classes as described in Chapter 8 . These are followed by 25 regular verbs. Class I verbs: bid I bid 100 bad bad 100 glid I glid 100 glad glad 100 rid I rid 100 rad rad 100 slid I slid 100 slad slad 100 strid I strid 100 strad strad 100 bit I bit 100 bat bat 100 slit I slit 100 slat slat 100 231 smit I smit 100 smat smat 100 wit I wit 100 wat wat 100 wlit I wlit 100 wlat wlat 100 writ I writ 100 wrat wrat 100 d ig I klig 100 clag klag 100 drig I drig 100 drag drag 100 grip I grip 100 grap grap 100 rip I rip 100 rap rap 100 hrip I hrip 100 hrap hrap 100 ship I Sip 100 shap Sap 100 sip I sip 100 sap sap 100 stric I strik 100 strac strak 100 swic I swik 100 swac swak 100 lit I lit 100 lat lat 100 writ I writ 100 wrat wrat 100 snit I snit 100 snat snat 100 rib I rib 100 rab rab 100 spik I spik 100 spak spak 100 Class II verbs • breth II breT 100 breath br=T 100 f leth II fleT 100 fleath f 1=T 100 zeoth II zeT 100 zeath z=T 100 greth II greT 100 greath gr=T 100 hleth II hleT 100 hleath hl=T 100 reth II reT 100 reath r=T 100 sheth II SeT 100 sheath S=T 100 brew II brew 100 breaw br=w 100 chew II Cew 100 cheaw C - w 100 hrew II hrew 100 hreaw hr=w 100 ches II Ces 100 cheas C=s 100 dres II dres 100 dreas dr=s 100 f res II fres 100 f reas fr=s 100 les II les 100 leas l=s 100 hres II hres 100 hreas hr=s 100 shes II Ses 100 sheas S = s 100 cref II kref 100 creaf kr=f 100 dref II dref 100 dreaf dr=f 100 zef II zef 100 zeaf z=f 100 drev II drev 100 dreav dr=v 100 lev II lev 100 leav l=v 100 clef II kief 100 cleaf kl=f 100 rew II rew 100 reaw r=w 100 232 sheth II SeT 100 sheath S=T 100 bew II bew 100 beaw b=w 100 Class III.l verbs: berst III. 1 berst 100 baerst b@rst 100 cerst III. 1 kerst 100 caerst k@rst 100 sterst III. 1 sterst 100 staerst st@rst 100 merst III. 1 merst 100 maerst m§rst 100 terst III. 1 terst 100 taerst t@rst 100 nerst i ii.I nerst 100 naerst n@rst 100 lerst III. 1 lerst 100 laerst l@rst 100 therst III. 1 Terst 100 taerst T@rst 100 thersh III. 1 TerS 100 thaersh T@rS 100 fersh III. 1 ferS 100 faerst f @rS 100 lersh III.l lerS 100 laersh 10rS 100 sersh III. 1 serS 100 saersh s@rS 100 hersh III. 1 herS 100 haersh h@rS 100 bersh III. 1 berS 100 baersh b0rS 100 wersh III. 1 werS 100 waersh w0rS 100 persh III. 1 perS 100 paersh p0rS 100 brezd III. 1 brezd 100 braezd br0 zd 100 strezd III. 1 strezd 100 straezd str0zd 100 mezd III. 1 mezd 100 maezd m0 zd 100 cezd III. 1 kezd 100 caezd k0 zd 100 thezd III. 1 Tezd 100 thaezd T0zd 100 gezd III. 1 gezd 100 gaezd g0 zd 100 dezd III. 1 dezd 100 daezd d0zd 100 strezd III. 1 strezd 100 straezd str0zd 100 bezd III. 1 bezd 100 baezd b0zd 100 Class III.2 verbs: bind III .2 bind 100 bund bund 100 find III. 
2 find 100 fund fund 100 grind III .2 grind 100 grund grund 100 wind III .2 wind 100 wund wund 100 mind III .2 mind 100 mund mund 100 climb III .2 klimb 100 dumb klumb 100 cling III .2 kliG 100 clung kluG 100 sing III .2 siG 100 sung suG 100 sting III .2 stiG 100 stung stuG 100 233 swing III. 2 swiG 100 swung swuG 100 bing III .2 biG 100 bung buG 100 ling III. 2 liG 100 lung luG 100 dringc III. 2 driGk 100 drungc druGk 100 shringc III .2 SriGk 100 shrungc SruGk 100 singe III .2 siGk 100 sungc suGk 100 slingc III .2 sliGk 100 slungc sluGk 100 stingc III .2 st iGk 100 stungk stuGk 100 swinge III .2 swiGk 100 swungc swuGk 100 ringc III .2 riGk 100 rungk ruGk 100 wingc III .2 wiGk 100 wungk wuGk 100 winn III .2 winn 100 wunn wunn 100 simm III .2 simm 100 summ summ 100 f rim III .2 f rim 100 f rum f rum 100 swimm III .2 swimm 100 swumm swumm 100 bin III .2 bin 100 bun bun 100 Class IV verbs bar IV bar 100 bor bor 100 thwar IV Twar 100 thwor Twor 100 tar IV tar 100 tor tor 100 car IV kar 100 cor kor 100 mar IV mar 100 mor mor 100 lar IV lar 100 lor lor 100 spar IV spar 100 spor spor 100 war IV war 100 wor wor 100 zar IV zar 100 zor zor 100 cwal IV kwal 100 cwol kwol 100 hal IV hal 100 hoi hoi 100 stal IV stal 100 stol stol 100 mal IV mal 100 mol mol 100 swal IV swal 100 swol swol 100 ral IV ral 100 rol rol 100 spal IV spal 100 spol spol 100 wal IV wal 100 wol wol 100 gal IV gal 100 gol gol 100 brae IV brak 100 broc brok 100 mac IV mak 100 moc mok 100 dac IV dak 100 doc dok 100 spac IV spak 100 spoc spok 100 wac IV wak 100 woe wok 100 hac IV hak 100 hoc hok 100 cac IV kak 100 coc kok 100 234 Class V verbs: baern V b@rn 100 baerned b@rnd 100 cemb V kemb 100 cembed kembd 100 draef V dr@f 100 draefed dr@ft 100 f Eg V fEg 100 fEged fEgd 100 gleng V gleG 100 glenged gleGd 100 hip V hip 100 hiped hipt 100 laef V l@f 100 laefed 10ft 100 maen V m0n 100 maened m@nd 100 nEth V nET 100 nEthed nETt 100 swevv V swevv 100 swevved swevvd 100 fremm V fremm 100 fremmed fremmd 100 wegg V wegg 100 wegged weggd 100 bath V bath 100 bathed batht 100 care V kark 100 carced karkt 100 hors V hors 100 horsed horst 100 lang V lang 100 langed langd 100 spar V spar 100 spared spard 100 warn V warn 100 warned warnd 100 gin V gin 100 ginned gind 100 tig V tig 100 tigged t igd 100 ful V ful 100 fulled fuld 100 hang V hang 100 hanged hangd 100 rln V rln 100 rlnned rind 100 war V war 100 warred ward 100 torf V torf 100 torfed torft 100 235 E.2 Generalization set 1 Following are 25 novel verbs using vowels from the training set, which are considered to be “good” examples of the 5 strong classes. Since these are novel verbs, the predicted past tense, according to the closest class, is given in columns 5 and 6 : plib I plib 100 plab plab 100 trip I trip 100 trap trap 100 lig I lig 100 lag lag 100 stik I stik 100 stak stak 100 sit I sit 100 sat sat 100 keT II keT 100 keaT k=T 100 mew II mew 100 meaw m=w 100 ses II ses 100 seas s=s 100 gef II gef 100 geaf g=f 100 smev II smev 100 smeav sm=v 100 derst III. 1 derst 100 daerst d0rst 100 drerS III. 1 drerS 100 draerS dr@rS 100 kerst III. 1 kerst 100 kearst k0rst 100 stezd III. 1 stezd 100 steazd st0zd 100 lerS III. 1 lerS 100 learS 10rS 100 brind III .2 brind 100 brund brund 100 limb III .2 limb 100 lumb lumb 100 fliG III .2 fliG 100 fluG fluG 100 spinn III .2 spinn 100 spunn spunn 100 bimm III. 
E.3 Generalization set 2

Following are 27 novel verbs using vowels from the training set, which are considered to be “bad” examples of the 5 strong classes. Note the new column 3, which indicates the number of features by which the verb differs from its closest class. Since these are novel verbs, the predicted past tense, according to the closest class, is given in columns 6 and 7:

trek I 1 trek 90 trak trak 90
lif I 1 lif 90 laf laf 90
stiT I 1 stiT 90 staT staT 90
bep I 1 bep 90 bap bap 90
gleg I 1 gleg 90 glag glag 90
met I 1 met 90 mat mat 90
det I 1 det 90 dat dat 90
fligb I 3 fligb 90 flagb flagb 90
keb II 1 keb 90 keab k=b 90
steS II 1 steS 90 steaS st=S 90
meS II 1 meS 90 meaS m=S 90
riv II 1 riv 90 reav r=v 90
griS II 2 griS 90 greaS gr=S 90
trav II 2 trav 90 treav tr=v 90
glaS II 3 glaS 90 gleaS gl=S 90
grish II 5 grish 90 greash gr=sh 90
sirt III.2 3 sirt 90 surt surt 90
pilp III.2 3 pilp 90 pulp pulp 90
bag IV 1 bag 90 bog bog 90
tap IV 2 tap 90 top top 90
dat IV 2 dat 90 dot dot 90
slaz IV 2 slaz 90 sloz sloz 90
stad IV 2 stad 90 stod stod 90
maT IV 2 maT 90 moT moT 90
law IV 2 law 90 low low 90
mab IV 3 mab 90 mob mob 90
slelp V 3 slelp 90 slelpt slelpt 90

E.4 Generalization set 3

Following are 11 novel verbs using the /A/ vowel, which did not appear in the training set. All verbs are expected to be Class V. Note the new column 3, which indicates the number of features by which the verb differs from its closest class. Since these are novel verbs, the predicted past tense, according to the closest class, is given in columns 6 and 7:

grAS II 1 grAS 90 greaS gr=S 90
glAg IV 2 glAg 90 glog glog 90
bAg IV 2 bAg 90 bog bog 90
slAz IV 3 slAz 90 sloz sloz 90
stAd IV 3 stAd 90 stod stod 90
mAd IV 3 mAd 90 mod mod 90
lAw IV 3 lAw 90 low low 90
stAth V 1 stAth 90 stAtht stAtht 90
stAT V 2 stAT 90 stATt stATt 90
trAv V 2 trAv 90 trAvd trAvd 90
grAsh V 3 grAsh 90 grAsht grAsht 90

APPENDIX F: TRAINING AND TEST SETS FOR SIMULATION IN CHAPTER 9

The format of all items in these lists is as follows: orthographic form of present tense, phonological representation of present tense, present tense raw frequency, orthographic form of past tense, phonological representation of past tense, past tense raw frequency. The ' character in the phonological representation is a stress marker.
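For readers who want to work with these lists programmatically, the following is a minimal parsing sketch (Python; not part of the dissertation). It assumes the plain whitespace-separated layouts described for Appendix E (seven fields) and Appendix F (six fields); the field names are illustrative, and the extra feature-distance column used in generalization sets 2 and 3 is not handled.

from dataclasses import dataclass
from typing import Optional

@dataclass
class VerbItem:
    present_orth: str
    present_phon: str
    present_freq: int
    past_orth: str
    past_phon: str
    past_freq: int
    oe_class: Optional[str] = None   # present only in the Appendix E lists

def parse_item(line):
    """Parse one whitespace-separated item from the lists in these appendices."""
    fields = line.split()
    if len(fields) == 7:             # Appendix E: class code follows the present form
        orth, oe_class, phon, freq, p_orth, p_phon, p_freq = fields
    elif len(fields) == 6:           # Appendix F: no class code
        orth, phon, freq, p_orth, p_phon, p_freq = fields
        oe_class = None
    else:
        raise ValueError("unexpected number of fields: " + line)
    return VerbItem(orth, phon, int(freq), p_orth, p_phon, int(p_freq), oe_class)

# Examples: parse_item("bid I bid 100 bad bad 100")
#           parse_item("beat b'it 66 beat b'it 12")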
F.1 Training set 1

The training set consists of 24 irregular verbs that were selected to have no irregular neighbors. 309 regular verbs that are identical to the regular verbs in Appendix C, section C.1, complete the training set:

Irregular verbs:
beat b'it 66 beat b'it 12
build b'Ild 249 built b'Ilt 21
burst b'Rst 37 burst b'Rst 11
buy b'Y 162 bought b'ct 32
catch k'@C 146 caught k'ct 54
choose C'uz 177 chose C'oz 37
draw dr'c 222 drew dr'u 63
eat 'it 122 ate 'et 16
fall f'cl 239 fell f'El 87
fight f'Yt 155 fought f'ct 23
flee fl'i 22 fled fl'Ed 22
fly fl'Y 92 flew fl'u 27
lie l'Y 211 lay l'e 81
lose l'uz 274 lost l'cst 49
meet m'it 339 met m'Et 80
put p'Ut 513 put p'Ut 130
ride r'Yd 126 rode r'od 40
seek s'ik 179 sought s'ct 35
set s'Et 372 set s'Et 71
shoot S'ut 117 shot S'at 18
speak sp'ik 274 spoke sp'ok 86
steal st'il 39 stole st'ol 10
stick st'Ik 50 stuck st'*k 13
teach t'iC 153 taught t'ct 19

Regular verbs:
Same as Appendix C, section C.1

F.2 Test set 1

The irregular verbs in the training set were divided into two groups, based on their number of regular neighbors, as follows:

Easy (average number of regular neighbors is 0.0):
beat b'it 66 beat b'it 12
build b'Ild 249 built b'Ilt 21
burst b'Rst 37 burst b'Rst 11
eat 'it 122 ate 'et 16
fight f'Yt 155 fought f'ct 23
flee fl'i 22 fled fl'Ed 22
meet m'it 339 met m'Et 80
put p'Ut 513 put p'Ut 130
ride r'Yd 126 rode r'od 40
set s'Et 372 set s'Et 71

Hard (average number of regular neighbors is 3.8):
buy b'Y 162 bought b'ct 32
catch k'@C 146 caught k'ct 54
choose C'uz 177 chose C'oz 37
lie l'Y 211 lay l'e 81
lose l'uz 274 lost l'cst 49
seek s'ik 179 sought s'ct 35
speak sp'ik 274 spoke sp'ok 86
steal st'il 39 stole st'ol 10
stick st'Ik 50 stuck st'Ak 13
teach t'iC 153 taught t'ct 19

F.3 Test set 2

The hard irregular verbs from section F.2 were further divided into two groups, based on their number of regular neighbors, as follows:

Easy (average number of regular neighbors is 2.8):
catch k'@C 146 caught k'ct 54
choose C'uz 177 chose C'oz 37
lose l'uz 274 lost l'cst 49
steal st'il 39 stole st'ol 10
teach t'iC 153 taught t'ct 19

Hard (average number of regular neighbors is 4.8):
buy b'Y 162 bought b'ct 32
lie l'Y 211 lay l'e 81
seek s'ik 179 sought s'ct 35
speak sp'ik 274 spoke sp'ok 86
stick st'Ik 50 stuck st'^k 13
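The easy/hard split above is defined by the average number of regular neighbors per item. The dissertation's exact neighborhood definition is not reproduced in this appendix; the sketch below (Python; illustrative only) assumes one common operationalization, namely that a regular verb counts as a neighbor when its present-tense phonological form shares the item's rime (the first vowel symbol and everything after it). The vowel inventory and all names are assumptions for illustration.

VOWELS = set("aeiouAEIOUY@=Rc^*")   # assumed vowel symbols for this coding

def rime(phon):
    """Everything from the first vowel symbol onward, ignoring the stress mark."""
    s = phon.replace("'", "")
    for i, ch in enumerate(s):
        if ch in VOWELS:
            return s[i:]
    return s

def regular_neighbor_count(item_phon, regular_phons):
    """Count regular verbs whose present-tense form shares the item's rime."""
    target = rime(item_phon)
    return sum(rime(r) == target for r in regular_phons)

# Under this (assumed) definition, sorting the irregular verbs by
# regular_neighbor_count and comparing group means would reproduce the kind of
# easy/hard contrast reported above (0.0 vs. 3.8 in F.2, 2.8 vs. 4.8 in F.3).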