AGE-LIMITED LEARNING EFFECTS IN READING AND SPEECH
PERCEPTION
by
Jason D. Zevin
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(NEUROSCIENCE)
August 2003
Copyright 2003 Jason D. Zevin
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 3116813
UMI Microform 3116813
Copyright 2004 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90089-1695
This dissertation, written by
under the direction of his dissertation committee, and
approved by all its members, has been presented to and
accepted by the Director of Graduate and Professional
Programs, in partial fulfillment of the requirements for the
degree of
DOCTOR OF PHILOSOPHY
Date August 12, 2003
Dissertation Committee
Chair
Acknowledgments
I can’t thank my parents enough, and not just for the obvious stuff. They
inspired my intellectual curiosity and encouraged me to follow it in whatever
direction it took me.
I think it caught them off guard, however, when this curiosity took me to
St. Louis, Missouri. There, Dave Balota provided me with truly fundamental
training. The years I spent working in his lab really prepared me for a career in
science. I also thank my colleagues in the lab, Mike Cortese, Dan Spieler, and
Jason Watson, who treated me as a colleague from the start. St. Louis is also
where I met Annie Shaw. Annie is the best.
The next stop on what is turning into a clockwise tour of the U.S. was Los
Angeles, where I joined Mark Seidenberg’s lab. More about Mark in a moment.
First, I’d like to thank my colleagues in the lab. And since I’m kind of going in
chronological order here, I’ll start with Mike Harm, whose patience for my initial
fumblings with the technical side of modeling was matched only by his impatience
with the unsynchronized stoplights on La Brea. Marc Joanisse, Robert Thornton,
Todd Haskell, Mike Grammer, Jelena Mirkovic and James Keidel have all been
there for me in ways that go above and beyond the call of duty for labmates.
Jelena and James provided helpful comments on early drafts of the chapters, and
James’ collaboration on the speech modeling presented in Chapter 4 has been
indispensable. It has also been fun.
I’d also like to thank the faculty at USC. I have been blessed with a committee
that represents an unusual breadth and depth of expertise as well as a uniform
commitment to excellence. Working with Sarah Bottjer on studies of birdsong
has been an incredible experience. Sarah’s passion and rigor are contagious. Dani
Byrd and Elaine Andersen have also been enormously helpful and encouraging
throughout. Since moving to Wisconsin, I have also had the enormous good
fortune of working closely with Keith Kluender.
Finally, I would like to thank Mark Seidenberg for his unflagging support,
intellectual vigor, visionary perspective and consistently excellent advice on fine
dining establishments. Thanks, Mark.
Research reported in this dissertation was supported by NIMH grants MH64445
and MH58723 and NICHD grant MH29891 to Mark Seidenberg and NIDCD grant
DC04072 to Keith Kluender.
Contents
Acknowledgments
List Of Tables
List Of Figures
Typographic Conventions
Abstract
1 Two Kinds of Age-limited Learning
1.1 A critical period in visual cortex: Plasticity of ocular dominance columns
1.1.1 Formation of ODCs
1.1.2 The critical period for ODC plasticity
1.1.3 ODC plasticity as a model critical period
1.1.4 Inhibition and the organization of mouse binocular cortex
1.2 Barrel fields in rodent somatosensory cortex
1.3 A critical period for tuning of multimodal maps in barn owls
1.4 A behavioral sensitive period in songbirds
1.4.1 Defining the sensitive period for birdsong learning
1.4.2 Evidence for plasticity after the sensitive period
1.5 Language Acquisition
1.5.1 First language learning
1.5.2 Second language learning
1.5.2.1 Syntax
1.5.2.2 Speech
1.5.2.3 Age of Acquisition effects in lexical tasks
1.6 Conclusions
2 Age-limited Learning Effects in Reading I: Modeling
2.1 Previous Studies
2.2 Theoretical Issues
2.2.1 Connectionist modeling
2.3 Simulation 1
2.3.1 Methods
2.3.1.1 Architecture
2.3.1.2 Corpus and Training
2.3.2 Results and Discussion
2.4 Simulation 2
2.4.1 Methods
2.4.2 Results
2.4.3 Discussion
2.5 Simulation 3
2.5.1 Methods
2.5.2 Results and Discussion
2.6 Simulation 4
2.6.1 Methods
2.6.2 Results and Discussion
2.7 General Discussion
2.7.1 Conditions That Create Age of Acquisition Effects
2.7.2 Which Types of Knowledge Yield Age of Acquisition Effects?
2.7.3 Conclusions
3 Age-limited Learning Effects in Reading II: Empirical Studies
3.1 Experiment 1
3.1.1 Methods
3.1.1.1 Stimuli
3.1.1.2 Subjects
3.1.1.3 Procedure
3.1.2 Results
3.1.3 Discussion
3.2 Experiment 2
3.2.1 Methods
3.2.1.1 Stimuli and Design
3.2.1.2 Subjects
3.2.1.3 Procedure
3.2.2 Results
3.2.3 Discussion
3.3 General Discussion
4 A model of cross-linguistic speech perception
4.1 Existing models
4.1.1 Flege’s SLM
4.1.2 Kuhl’s NLM
4.1.3 Best’s PAM
4.1.4 McClelland et al.’s Hebbian Model
4.2 The Current Model
4.2.1 Target Phenomena
4.2.1.1 Naïve L2 speech perception: Results from Best et al. (2001)
4.2.1.2 Failure to acquire novel contrasts with extended exposure
4.3 Methods
4.3.1 Stimuli
4.3.2 Model architecture and training procedure
4.3.3 Testing procedures
4.3.3.1 Identification
4.3.3.2 Discrimination
4.4 Results
4.4.1 English Speech Sounds
4.4.1.1 Categorical Perception
4.4.1.2 Confusion Matrices
4.4.2 Zulu Speech Sounds
4.4.2.1 Identification
4.4.2.2 Discrimination
4.4.3 Long-term training
4.5 Discussion
4.5.1 Weaknesses of the model
4.5.1.1 Normalization
4.5.1.2 Limits on realism
4.5.2 Conclusions
5 General Conclusions
5.1 Structural limits on language learning
5.2 Future Directions
5.2.1 Generalization and frequency trajectory effects in lexical processing
5.2.2 Developmental studies of entrenchment
5.2.3 Age-limited plasticity and specific grammatical structures
5.3 Final thoughts
References
List Of Tables
2.1 Properties of the Stimuli Used in Previous Studies of Effects of Age of Acquisition and Frequency
2.2 Various Frequency Measures as Predictors of Naming Latency in Large-Scale Studies
2.3 Frequency and Age of Acquisition as Predictors of Naming Latencies
2.4 Correlations Among 6 Standard Lexical Measures and AoA
2.5 Unique Variance Accounted for by Frequency and AoA Independent of Other Lexical Variables
2.6 Unique Variance Accounted for by AoA with Different Subsections of the WFG Norms Used as Predictors
2.7 Correlation Between AoA and WFG Frequency at Different Grade Levels
3.1 Descriptive statistics for norms collected in Experiment 1
3.2 Correlations among norms collected in Experiment 1, and with other lexical variables
3.3 Unique variance in naming RT accounted for by frequency trajectory in a regression analysis with six other variables
3.4 Descriptive statistics for items in Experiment 2
4.1 Confusion matrix without noise
4.2 Confusion matrix for noise inducing ~20% error rates
4.3 Identification and discrimination scores for Zulu contrasts
4.4 Identification as a function of AOA
List Of Figures
2.1 Model architecture used in all simulations
2.2 Frequency trajectories of critical items in Simulation 1
2.3 Performance over time for critical items in Simulation 1
2.4 Performance over time for Simulation 2, A) error rate and B) sum squared error
2.5 Performance in the flat condition (Simulation 2) compared to the same items in the early and late conditions in Simulation 1
2.6 Performance on high and low cumulative frequency items within the flat condition
2.7 Performance over time for critical items in Simulation 3: A) error rate, B) sum squared error
2.8 Performance over time for critical items in Simulation 4: A) error rate, B) sum squared error
3.1 Frequency trajectories of items in Experiment 2. Top: high-frequency items. Bottom: low-frequency items. Error bars represent standard error.
3.2 Adjusted means for Experiment 2
4.1 Speech model architecture
4.2 Categorical perception in the model
Typographic Conventions
In discussions of models and experiments involving reading, written words are
presented in all caps (e.g., WORD). International Phonetic Alphabet symbols
are used to represent speech sounds, using slashes to indicate broad transcription
(e.g., /wad/).
Abstract
Age-limited plasticity is a central issue in cognitive neuroscience. Recently,
important advances have been made in understanding the molecular, physiological
and anatomical processes that give rise to age-limited plasticity in a number of
different animal models. How this understanding might be applied to age-limited
learning in humans - particularly sensitive period effects in language learning
- has remained largely unexplored, however. The current work represents an
attempt to apply insights from other areas of neuroscience to the study of
age-limited learning effects in language. In particular, I propose that these effects
can be explained in terms of the entrenchment of knowledge that supports one’s
native language, as opposed to parametric changes in brain plasticity, or specific
changes in the function of a specialized “language-acquisition device.”
Chapter 1 lays the theoretical groundwork for the rest of the dissertation and
frames the work presented here in the context of recent developments in the study
of age-limited plasticity - including a critical review of the literature regarding
particular model systems (e.g., ocular dominance columns, bird song) and proposed
mechanisms (e.g., changes in NMDA receptor type or distribution). The
discussion is guided by the hypothesis that the mechanisms which limit plasticity
in the best-understood systems fall into two broad categories. On the one
hand, limits on plasticity in some systems appear to be the result of parametric
changes that limit their ability to change. That is, as the system develops, the
mechanisms that support plasticity become less available - for example,
developmentally regulated changes in the extent and density of inhibitory projections
from superficial layers of cortex appear to limit the plasticity of sensory maps in
layer IV. On the other hand, some examples of age-limited learning seem to be the
result of changes in the structure of the neural substrate that supports particular
kinds of knowledge. That is, as a particular network is structured by learning or
other forms of activity-dependent plasticity, it becomes increasingly resistant to
change as a result of the “entrenchment” of overlearned representations.
These two kinds of explanation are not mutually exclusive - limits on plasticity
in some systems and/or behaviors may be the result of both structural and
parametric mechanisms. However, the dichotomy does motivate the formulation
of some interesting research questions. For example, research on sensitive periods
for language acquisition has typically started from the assumption that changes
in the efficacy of language learning over the lifespan are due to some parametric
change in the plasticity of either linguistic or general cognitive representations.
This motivates looking for an independent effect of age (or “maturation”) on
learning. In contrast, starting from the assumption that changes in plasticity are
the result of the same mechanisms that give rise to the structure of the network
that supports language learning and use motivates a different approach to the
same questions. In particular, the statistical relationship between what is already
known and what must be learned is assumed to play a central role in determining
the efficacy of learning.
The rest of the dissertation explores the notion of structural limits on learning
in two domains - word reading and speech perception - using two methodologies
- behavioral experiments and computational modeling of the development of
reading and speech perception. The particular models used provide powerful test
cases because the parameters governing plasticity are not changed throughout
training (although this could, in principle, be done). Demonstrations of
age-limited plasticity in such models provide a mechanistic account of structural
sensitive periods. Furthermore, by applying the same class of model to a range of
domains, we can generate principled hypotheses about which particular aspects
of language should be susceptible to age-limited learning effects.
In Chapter 2, I describe modeling work examining age-of-acquisition (AoA)
effects in word recognition. These models provided evidence that AoA effects
reported in previous studies were unlikely to have arisen in the translation from
spelling to sound. The results were supported by meta-analyses of existing studies
and novel empirical results.
Results from a behavioral study, reported in Chapter 3, provide further support
for this model. This study demonstrates that frequency trajectory has an
influence on AoA above and beyond other variables which influence adult performance.
However, when frequency trajectory is manipulated independently of
these other factors, it has no residual influence on reading aloud in a word naming
task.
The lack of age-limited learning in these models is ascribed to the nature of
the task being learned. In translating from spelling to sound, knowledge of items
learned earlier in life such as “bat” and “fan” readily generalizes to later learned
items such as “ban.” I explored this possibility by designing a set of smaller
models in which the overlap among early and late items was artificially eliminated.
In these smaller models, I observed a robust advantage for the early-learned
items over late learned items, even when the two sets of items were matched for
cumulative frequency. This contrast highlights a potential principle that could
help explain why sensitive periods appear in some domains (i.e., aspects of speech
perception and production) and not others (i.e., alphabetic reading).
The problem of learning the phonological inventory of a second language is
an example of a situation in which early learning does not necessarily support
new learning. Although there is overlap among languages in which aspects of the
speech signal are contrastive, most pairs of languages yield at least a few examples
in which knowledge of one does not readily generalize to the other. For example,
English discriminates “voiced” from “voiceless” stop consonants on the basis of a
very late voice onset time (VOT) boundary. Learning this category means, in
part, learning to ignore variability in voice onset time between pre-voiced stops
(for which voicing begins before the consonant burst) and stops with a VOT of
zero. However, many other languages treat this difference between prevoiced and
0 VOT speech sounds as contrastive. English speakers consistently have difficulty
learning to perceive and produce this contrast.
Chapter 4 describes a model of cross-language speech perception which is
based on the same computational principles as the model presented in Chapter 2.
It replicates some critical data on cross-language speech perception, and
shows age-limited learning effects in simulations of long-term exposure to L2
speech sounds. Interestingly, the degree to which age-limited learning effects
occur depends on the particular contrast being learned. The ability to learn foreign
speech sounds after substantial training on a given native language depends
on a complex interaction between existing knowledge, the acoustic salience of
the to-be-learned contrast, and the degree of entrenchment of native language
knowledge.
In sum, the work described here provides a novel framework for studying
age-limited learning effects in language acquisition. The goal is to apply the
same theory to a range of age-limited learning phenomena, in an attempt to
understand whether disparate developmental patterns of various aspects of language
learning might be the result of the same computational principles applied
to different domains. This view is complementary to the view that age-limited
learning phenomena are the result of parametric changes in plasticity. Critically,
the dichotomy between parametric and structural limits on plasticity advanced
here offers a novel perspective on some very old problems. Rather than asking
whether developmental constraints on plasticity are maturational or
experience-dependent, I argue that it is more fruitful to ask whether the mechanisms by
which plasticity is limited are the same or different from the mechanisms that
guide the organization of the system.
Chapter 1
Two Kinds of Age-limited Learning
Neural development is marked by specific periods of heightened plasticity called
“critical periods” or “sensitive periods.” These phenomena are of particular interest,
because they demonstrate the complex interaction of genes and the environment
in the development of the brain. On the one hand, they demonstrate the
central role of experience in neural and cognitive development - abnormal experience
during these periods often has permanent consequences for the organism. On
the other hand, they provide a glimpse of genetic mechanisms which act to limit
plasticity once a certain level of development has been reached. Understanding
such developmental changes in plasticity brings us closer to understanding the
mechanisms of plasticity themselves, and their role in development.
The pioneering work of Hubel and Wiesel (1970) established a simple neurophysiological
and anatomical model of critical periods in cat visual cortex.
This work demonstrated that there are limited periods during development during
which normal experience is critical for typical development. Furthermore,
this “critical period” is apparently under strict maturational control. Although
the timing of the critical period is somewhat sensitive to experiential variables
(Cynader & Mitchell, 1980), it cannot be delayed indefinitely. At some point in
development, the particular structures examined by Hubel and Wiesel (1962) and
others become immune to even the most radical manipulations of experience.
The impact of this initial characterization of critical periods on research in
other areas, as well as in public policy discussions (see, e.g., Bruer, 1999, for
a discussion) has been enormous. Although the precise mechanisms governing
the opening and closing of critical periods - even in the cat visual cortex -
remain largely unknown, evidence that innate, maturational factors act to limit
the critical period for ocular dominance organization in visual cortex is often cited
as oblique evidence that similar mechanisms may be at work in other domains
where effects of age-limited plasticity are observed (e.g., Pinker, 1989).
As the neural and behavioral changes in plasticity that occur over the lifespan
have begun to be understood in increasing detail - and in a number of model
systems - the working definition of critical period has become more nuanced as
well. Interestingly, as the idea of age-related limitations on learning and plasticity
has been explored in a wide range of domains - from the formation of primary
sensory maps, to the formation of multimodal neural maps, to the formation
of complex representations underlying specific behaviors such as language and
birdsong - the phrase critical period has evolved to mean very different things,
depending upon the context in which it is being used. Indeed, in some domains -
e.g., the birdsong literature and language literature - the term “sensitive period”
is used to acknowledge that there is a difference between the phenomena being
investigated and other examples of critical periods.
One perspective on age-limited learning, which is gaining currency in some
areas of neuroscience and psychology (particularly birdsong and speech perception,
e.g., Doupe & Kuhl, 1999) is that learning itself plays a role in the closure of
sensitive periods. If we consider that learning in the brain consists of strengthening
and weakening connections among groups of neurons, it seems plausible
to imagine that the neural representations which support, e.g., native-like proficiency
in a given language, may not be optimal for acquiring and representing a
second language. The more deeply entrenched these representations become (as
a result of continuous exposure) the harder it becomes to modify the network
in order to represent new forms. This kind of entrenchment has been studied
in computational models of linguistic tasks (Smith, Cottrell, & Anderson, 2001;
Ellis & Lambon Ralph, 2000; Zevin & Seidenberg, 2002; McClelland, Fiez, & Mc-
Candliss, in press), and provides a mechanism of age-limited plasticity that does
not depend on maturational factors per se (see also Odlin, 1989, for discussions
along much the same lines).
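The entrenchment idea can be illustrated with a deliberately minimal sketch. It is not the architecture of any of the models cited above: it assumes a single sigmoid unit with one weight, trained by gradient descent on squared error, and the function names, learning rate, and criterion are all hypothetical choices made for illustration. The longer the unit is trained toward one response, the larger its weight grows and the flatter the sigmoid becomes, so reversing the response takes more updates.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pretrain(steps, lr=1.0):
    """Train one weight (constant input of 1) toward an output of 1.0."""
    w = 0.0
    for _ in range(steps):
        y = sigmoid(w)
        # Squared-error gradient includes the sigmoid slope y * (1 - y),
        # which shrinks as the unit saturates.
        w += lr * (1.0 - y) * y * (1.0 - y)
    return w

def steps_to_relearn(pretrain_steps, lr=1.0, criterion=0.5):
    """Count updates needed to push the output below criterion
    once the target switches from 1.0 to 0.0."""
    w = pretrain(pretrain_steps, lr)
    n = 0
    while sigmoid(w) >= criterion:
        y = sigmoid(w)
        w += lr * (0.0 - y) * y * (1.0 - y)
        n += 1
    return n

for exposure in (10, 100, 1000):
    print(exposure, steps_to_relearn(exposure))
```

In this toy setting nothing about the learning rule changes with "age"; the slowdown comes entirely from the state that earlier learning left behind.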
In this review, I suggest that age-limited plasticity phenomena can usefully
be considered as falling into two broad categories. In some instances, changes in
plasticity are the result of changes in a parameter of the system that is distinct
from the mechanisms by which patterns of connectivity are established. For
example, the maturation of inhibitory circuitry seems to play a role in limiting
the activity of layer IV cells in primary visual cortex, setting a threshold for
activity-dependent plasticity that emerges during the critical period. I will refer
to these as parametric mechanisms, because they serve to limit the plasticity of
a particular region or system. Parametric mechanisms may be contrasted with
instances in which limits on adult plasticity are the result of the same mechanisms
by which patterns of connectivity are established. That is, the structure of the
neural substrate that results from learning one set of representations is itself
a limiting factor in the acquisition of novel representations. I refer to these
mechanisms as “structural,” to emphasize the notion that the structure of a neural
network itself (i.e., the pattern of connectivity in a particular region or system)
is the limiting factor in acquiring novel patterns of connectivity or behavior.
Because it acknowledges the fundamentally biological nature of learning, this
distinction gets away from the false dichotomy of “biological” vs. “psychological”
explanations for sensitive periods that crops up occasionally in the psychological
literature (e.g., Bialystok & Hakuta, 1994). Furthermore, the distinction here is
somewhat orthogonal to the dichotomy of “nature vs. nurture.” While it is true
that parametric changes result largely from mechanisms that seem to be genetically
pre-programmed (e.g., developmental changes in NMDA receptor subunit
distribution), experience often plays a role in determining the precise timecourse
of these events. Similarly, it would be a mistake to equate the patterns of connectivity
which may close structural sensitive periods with learning and nothing else.
Patterns of connectivity in many brain regions appear to be genetically controlled
(Crair, Horton, Antonini, & Stryker, 2001), and even in cases where electrical activity
appears to play some role in normal development, it is sometimes the case
that this activity is endogenously generated, and thus experience-independent
(e.g., Katz & Shatz, 1996; Chiu & Weliky, 2002; Crowley & Katz, 2002).
In this chapter, I describe four different kinds of sensitive period: sensory map
formation (visual, somatosensory) in a range of species, calibration of multimodal
receptive fields in barn owls, song learning in some species of birds and language
acquisition in humans. I conclude by relating these phenomena to the proposed
distinction between parametric and structural sensitive periods.
1.1 A critical period in visual cortex: Plasticity of ocular
dominance columns
Perhaps the most widely studied manifestation of age-limited plasticity is the
critical period¹ for the modification of ocular dominance columns (ODCs) in
mammals. It is important to keep in mind that the critical period under discussion
here relates to the plasticity of ODCs, and not to their formation. This is
different from the notion of the critical (or sensitive) period in domains such as
language or birdsong, during which normal experience is putatively necessary for
the acquisition of species-typical behavior or representations.
¹ I use the terms “critical period” and “sensitive period” interchangeably, in order to maintain
consistency with the literature under discussion.
ODCs are columns of cortex that respond selectively to input from a single
eye (Hubel & Wiesel, 1962). The phenomenon of interest is that there is a
restricted period (for example, from 21 to 60 post-natal days in kittens, Hubel &
Wiesel, 1970) during which manipulations of visual input give rise to rapid and
permanent changes in the distribution of thalamocortical inputs from the lateral
geniculate nucleus (LGN, the thalamic nucleus that routes inputs from the retinal
ganglia to cortex), and, consequently, to the pattern of eye-specific columns in
primary visual cortex. During the critical period, monocular deprivation can
induce “colonization” of regions dedicated to the deprived/disrupted eye by the
unmanipulated eye (Wiesel & Hubel, 1965). That is, anatomical and physiological
markers show overwhelming representation of the normal eye, whereas in normal
development, columns selective for each eye are evenly distributed throughout
V1.
A consensus appears to be developing that the mechanisms underlying the
initial formation of ODCs are distinct from the mechanisms that underlie critical
period plasticity (Crowley & Katz, 2002). However, an understanding of
what ODCs are and what is known about their formation is helpful in explaining
the critical period for ODC plasticity. I therefore discuss these two phenomena
separately.
1.1.1 Formation of ODCs
The role of experience in the formation of ocular dominance columns has recently
undergone an extraordinary evolution (for a comprehensive review, see Katz &
Crowley, 2002). Results from early work by LeVay, Stryker, and Shatz (1978) were
consistent with the notion that, early in development, visual cortex is essentially
homogenous with respect to the distribution of inputs from the two eyes. This
suggested that segregation of ODCs may be the result of a competitive Hebbian
mechanism. That is, differential input from each eye would give rise to correlated
patterns of firing in V1. Because simultaneous activity of afferents can lead to
strengthening of synapses (Hebb, 1949), the high degree of correlation among
afferents from a single eye could lead to the observed pattern of thalamocortical
connectivity. Indeed, studies demonstrating that pharmacological blocking of retinal
activity during the critical period resulted in abnormal patterns of connectivity
in V1 lent further support to this hypothesis (Antonini & Stryker, 1996).
However, two recent developments have led to a re-evaluation of this idea.
First, a problem with the earliest work in this area has been addressed. The
tracers used in establishing the initial pattern of connectivity in V1 were prone
to “spillover” (LeVay et al., 1978). This means that although tracer was injected
into only one eye, interneuronal transport at the level of LGN may have led to
a spuriously homogeneous distribution of tracer in V1. While LeVay et al. (1978)
developed methods for quantifying and thus statistically controlling for spillover,
more recent work has established that ODCs form much earlier in the kitten
than had been thought. The results of LeVay et al. (1978) put the date at which
ODCs became visible during typical development at around 21 days - just at
the beginning of the critical period. Using a combination of optical imaging and
retrograde labeling studies, however, Crair et al. (2001) have found evidence of
nascent ODCs at 14 days. Interestingly, they found no evidence for ODCs at
7 days of age. For reasons discussed in more detail below, it is thought that
retinal activity plays little role in the organization of connectivity this early in
development.
The second challenge to the notion that neuronal activity is critical in the
establishment of ODCs comes from studies in which the eyes of ferrets are enucleated
(functionally removed) at birth (Crowley & Katz, 1999). Ferrets are a
good model in which to explore the initial formation of ODCs, because they are
highly altricial. When they are born, axons from LGN have not yet innervated
cortex, and in fact the eventual segregation of LGN into eye-specific layers is
not complete. Thus the finding that removal of any input from one or both
eyes does not affect the initial organization of ODCs means that there is almost
certainly more to the initial patterning of thalamocortical connections than coincident
activity resulting from visual experience or even intrinsic retinal activity.
The ODCs present in ferrets with enucleated retinas may well reflect a genetically
pre-determined pattern of connectivity (e.g., as a result of a chemical gradient) or
an intrinsic pattern of activity arising in the LGN itself (Weliky & Katz, 1999).
1.1.2 The critical period for ODC plasticity
While we have seen that a Hebbian mechanism alone does not provide a sufficient
explanation for the initial formation of ODCs, it may provide an explanation for
their plasticity during the critical period. The notion is that afferent neurons
compete to form connections at thalamocortical synapses. Because under normal
circumstances, inputs from each eye are more strongly correlated with each other
than with inputs from the other eye, normal visual experience is sufficient to
maintain an even balance between the two eyes. However, suturing one eye shut
reduces the input from that eye, leading, via some competitive mechanism, to an
overrepresentation of the untreated eye.
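The competitive account above can be sketched in a few lines, under strong simplifying assumptions that are not part of the studies discussed: a single cortical cell, one afferent per eye, a plain Hebbian update, and divisive normalization standing in for competition over a fixed pool of synaptic resources. The function name, drive values, and learning rate are hypothetical.

```python
def simulate_ocular_dominance(left_drive, right_drive, steps=200, lr=0.05):
    """Two afferents compete for a fixed total synaptic weight on one cell."""
    w_left, w_right = 0.5, 0.5
    for _ in range(steps):
        # Postsynaptic activity is the weighted sum of the two inputs.
        post = w_left * left_drive + w_right * right_drive
        # Hebbian growth: each weight grows with pre- times post-synaptic activity.
        w_left += lr * left_drive * post
        w_right += lr * right_drive * post
        # Normalization keeps the total constant, so one eye gains only
        # at the other's expense.
        total = w_left + w_right
        w_left, w_right = w_left / total, w_right / total
    return w_left, w_right

print(simulate_ocular_dominance(1.0, 1.0))  # balanced input
print(simulate_ocular_dominance(0.2, 1.0))  # left eye "sutured"
```

With equal drive the weights stay balanced; reducing the drive from one eye shifts the weights toward the other eye. The sketch omits the within-eye correlation structure that the Hebbian account relies on, so it shows only the competition step.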
The earliest signs of this colonization come from studies in which ocular dominance
is established by physiological methods. Responses to stimuli in each eye
are measured by single-cell recording, and the preference of individual cells for
each eye is recorded (Wiesel & Hubel, 1963). Interestingly, the thalamocortical
connections in layer IV that have been the subject of much of the study in this area
(e.g., Antonini & Stryker, 1993) do not appear to be the initial site of remodeling
of connectivity in V1. Thalamocortical synapses do eventually reflect the higher
degree of correlation among connections from the untreated eye, but this occurs
over a period of approximately one week (Antonini & Stryker, 1993), long after
the physiological response appears to have saturated. Trachtenberg, Trepel, and
Stryker (2000) demonstrated that changes in patterns of connectivity between
layer II/III and layer IV have the same time course as physiological markers for
ODCs. It may be that this initial remodeling of horizontal connections sets the
stage for the shift in thalamocortical connections, which are in turn responsible
for the long-term maintenance of the abnormal pattern of connectivity.
After 60 days, similar manipulations have no discernible effect on the pattern
of ocular dominance in V1 (Hubel & Wiesel, 1970), thus marking the close of the
critical period. Interestingly, Cynader and Mitchell (1980) demonstrated that
the critical period can be extended by rearing animals in complete darkness.
This and similar studies suggest a role for overall levels of activity in setting
parameters relevant to plasticity in V1.
1.1.3 ODC plasticity as a model critical period
Ocular dominance in primary visual cortex displays a very particular kind of sensitive
period. Manipulations which upset the typical patterns of activity from the
periphery can have substantial and permanent effects on the cortical representations
of individual eyes during a period that has a clear onset, and a clear offset.
Manipulations before the onset of the critical period have no effect, probably
because input from the eyes is too weak to override intrinsic activity of the thalamocortical
circuit (Chiu & Weliky, 2002; Crair et al., 2001). Once the critical
period has closed - possibly as a result of the maturation of inhibitory synapses
(Huang et al., 1999; Hensch et al., 1998) - drastic manipulations of the input
have little effect on the ODCs. It is important to keep in mind, however, that
many other aspects of visual representation in V1 do remain very plastic well into
adulthood (Gilbert, 1996). For example, the receptive fields of V1 neurons can
be modified by induction of an artificial blind spot (Gilbert, Das, Ito, Kapadia,
& Westheimer, 1996). Why ODCs would be susceptible to a critical period effect
when these other aspects of V1 remain plastic is an important question for future
research.
The fact that critical periods for ODC formation have been found in so many
species - in rats, cats, ferrets, monkeys, and even mice - suggests that the critical
period for ODCs itself is a general phenomenon, at least for mammals. It remains
to be seen whether the parameters that regulate the onset and offset of the
critical period for ODCs are general to other critical period phenomena; indeed
it is possible that the critical period for ODC formation is governed by different
parameters in different species. One candidate mechanism for the regulation of
ODC plasticity comes from a series of studies involving transgenic mice - a species
in which patterns of binocular connectivity are slightly different from the ODCs
discussed thus far.
1.1.4 Inhibition and the organization of mouse binocular cortex
Although mice do not have the ocular dominance columns typical in higher mammals,
they do have a critical period for the organization of binocular responses
in primary visual cortex. Gordon and Stryker (1996) characterized the ocular
dominance of cells in the binocular zone of mouse primary visual cortex, and
demonstrated that they were sensitive to monocular deprivation. As in the case
of ocular dominance columns, responses shifted toward the undeprived eye. Furthermore,
the plasticity of these physiological responses was limited to a brief
window between P19 and P28.
The characterization of binocular visual cortex in mice is of great technical
importance because of the ready availability of strains of mice with targeted mutations.
For example, Huang et al. (1999) examined the development of binocular
visual cortex in transgenic mice that constitutively overexpress BDNF, a trophic
factor. Compared to wild type controls, these mice were sensitive to monocular
deprivation earlier in development, and also lost this sensitivity much earlier.
This is an interesting finding, because it seems to contradict an earlier view that a
decline in the availability of trophic factors was responsible for age-limited plasticity
in some systems (Thoenen, 1995). Instead, it appears that an over-abundance
of trophic factor can result in the premature development of adult-like inhibitory
circuitry.
Further evidence that the maturation of inhibitory circuitry provides necessary
and sufficient conditions for the closure of the critical period, at least for
the plasticity of thalamocortical connections, comes from another transgenic mouse
strain (GAD65 knockouts, Fagiolini & Hensch, 2000). GAD65 is one isoform of the
enzyme that synthesizes the inhibitory neurotransmitter gamma-aminobutyric acid
(GABA). Mice lacking this enzyme never develop normal levels of GABAergic
inhibition. In these animals, monocular deprivation can induce reorganization of
primary visual cortex well after the critical period has closed in wild type strains.
Furthermore, pharmacological restoration of inhibition (by injection of
benzodiazepines) permanently closes the critical period in these animals, suggesting a
causal role for inhibition, particularly GABAergic circuitry.
Inhibition might affect plasticity by acting as a high-pass filter for correlated
activity. That is, neurons which already have strong projections would still be
able to drive activity in inhibited post-synaptic targets, but weaker projections
would be unlikely to do so. This means that few new synapses can be
formed, because the ability to drive activity in post-synaptic cells is critical for
the formation of new connections. Furthermore, existing synapses that are below
threshold will tend to be weakened, further sharpening the pattern of connectivity.
In this way, the development of inhibition seems to impose a parametric
limit on the plasticity of primary visual cortex.
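The thresholding argument can be made concrete with a small sketch. It assumes, purely for illustration, that inhibition acts as a single scalar threshold on projection strength and that updates are fixed-size steps; the function names and numbers are hypothetical and not drawn from the studies cited.

```python
def effective_inputs(weights, inhibition):
    """Indices of projections strong enough to drive the inhibited cell."""
    return [i for i, w in enumerate(weights) if w > inhibition]

def update(weights, inhibition, lr=0.1):
    """Projections above threshold are strengthened; those below it are
    weakened (never below zero), sharpening the existing pattern."""
    return [w + lr if w > inhibition else max(0.0, w - lr) for w in weights]

projections = [0.9, 0.4, 0.1]
print(effective_inputs(projections, inhibition=0.0))  # immature: all inputs count
print(effective_inputs(projections, inhibition=0.5))  # mature: only the strongest
print(update(projections, inhibition=0.5))
```

Raising the threshold changes which inputs can participate in plasticity without changing the learning rule itself, which is the sense in which the limit is parametric.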
1.2 Barrel fields in rodent somatosensory cortex
Thalamic projections to the primary somatosensory cortex of rodents form characteristic
structures called “barrels.” Each barrel represents a single whisker (or
vibrissa; Van der Loos & Woolsey, 1973). These barrels are characterized physiologically
by their uniform responses to the stimulation of individual whiskers.
They are also readily visible in anatomical studies: staining at the trigeminal
nerve of vibrissae results in a characteristic pattern of anterograde (message) in
somatosensory cortex.
Between birth and about the sixth post-natal day, there is a critical period
during which removal of a single vibrissa (or a whole row) results in reorganization
of somatosensory cortex such that whiskers adjacent to the removed whisker
take over its cortical space (Woolsey, 1990). After six days, removal of whiskers,
or indeed whole rows of whiskers, has no further effect on the pattern of thalamocortical
projections, nor are there permanent effects on physiological responses
driven by peripheral stimulation when whiskers are removed and then allowed to
grow back.
Interestingly, this critical period only appears to limit plasticity in the thalamocortical
projections to layer IV of cortex. Sensory maps in superficial layers of
cortex - in particular, layer II/III - are defined by cortico-cortical connections,
and can in fact be modified as a result of peripheral reorganization until well into
adulthood.
Although the basis of age-limited plasticity in the barrel formations is not
well understood, the use of transgenic mice in experiments on barrel plasticity
has provided a unique opportunity to test a popular hypothesis about the role of
particular subtypes of NMDA receptors in age-limited plasticity. NMDA receptors
are multimeric proteins composed of up to five subunits: one NR1 subunit
and a set of NR2 subunits. Whereas NR1 subunits are of a single type, there are
at least four different types of NR2 subunit. NR2 subunit type is developmentally
regulated, and changes in the distribution of NR2 receptor type have been linked
to developmental regulation of plasticity (Carmignoto & Vicini, 1992). Early in
development, NMDA receptors tend to contain mainly the NR2B subunit, but
these are gradually replaced with NR2A subunits during development (Monyer,
Burnashev, Laurie, Sakmann, & Seeburg, 1994). The different subunits give rise
to different cell kinetics. Immature synapses have characteristically slow excitatory
post-synaptic currents (EPSCs; Carmignoto & Vicini, 1992). Essentially,
once the channel is open, it takes a longer time to close. NMDA receptors with
NR2A subunits, on the other hand, have relatively fast EPSCs.
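A rough way to see why decay time matters is to integrate the current over one activation. The sketch below assumes a single-exponential decay with equal peak amplitude for both receptor types; the function name and the two time constants are illustrative placeholders, not measured values from the studies cited.

```python
import math

def charge_transfer(peak_current, tau_ms, window_ms):
    """Closed-form integral of peak_current * exp(-t / tau_ms)
    from t = 0 to t = window_ms."""
    return peak_current * tau_ms * (1.0 - math.exp(-window_ms / tau_ms))

# Hypothetical decay constants chosen only to contrast slow and fast kinetics.
slow = charge_transfer(1.0, tau_ms=200.0, window_ms=2000.0)
fast = charge_transfer(1.0, tau_ms=50.0, window_ms=2000.0)
print(slow / fast)
```

Under these assumptions the total charge per activation scales with the decay constant, so a channel that closes four times more slowly passes roughly four times as much current overall.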
These changes in kinetics are potentially important, because slower receptor
kinetics allow more calcium ions to enter the cell during each activation, and Ca++
plays an important role in the modification of synapses. This would result in a
higher probability of post-synaptic modifications dependent on calcium-related
biochemical cascades which contribute to experience-dependent plasticity (Bliss
& Collingridge, 1993). Furthermore, NR2A subunit insertion is regulated in part
by experience. Quinlan, Olstein, and Bear (1999) demonstrated bidirectional
regulation of subunit expression such that dark rearing resulted in a depressed
level of expression for NR2A subunits (in visual cortex only). Interestingly, as
little as two hours of exposure to light dramatically increased the expression of
NR2A subunits relative to dark rearing (Quinlan, Philpot, Huganir, & Bear, 1999),
but this effect could be reversed within two or three days by further light deprivation.
It is important to note, however, that these studies were carried out
during the critical period for plasticity in visual cortex. It is possible that the
dynamic regulation of NMDA receptor subunit distribution is itself limited by
a maturational mechanism. It is interesting to note, in this regard, that none
of the manipulations used had any apparent effect on the distribution of NR2B
subunits. It remains possible that NR2B subunit distribution is downregulated at
the end of the critical period by an experience-independent mechanism separate
from the experience-dependent one that modulates NR2A insertion. Indeed, Xue
and Cooper (2001) found that experience-dependent effects on NR2A distribution
were age-limited, so that NR2A levels were not influenced by dark rearing
in animals beyond the critical period.
Lu, Gonzalez, and Crair (2001) used genetically modified mice that do not
express any NR2A subunits to examine the role of NMDA receptor subunit type
in the closure of the sensitive period. They found that NR2A knockout mice
showed normal modification of barrel assemblies as a result of peripheral damage
during the critical period. More importantly, they showed that plasticity in
these animals declined over the same period of development as observed in wild-
type subjects. This must mean that the expression of NR2A subunits is not
necessary for the closure of the sensitive period in the formation of barrels in
primary sensory cortex. In typically developing animals, insertion actually seems
to start at the beginning of the critical period, rather than at the end. The
faster currents of NR2A receptors may encourage plasticity, particularly if what
is being learned requires very fine temporal resolution. The slow currents of
NR2B receptors may, in some instances, slow down the sorting of appropriate
and inappropriate connections because the longer open time results in a higher
probability of inappropriate inputs being active during depolarization. This does
not appear to be the case in the mouse barrel formation, however, because its
segregation into barrels is normal in animals with no NR2A receptors at all. If
NR2A receptors contributed to plasticity, NR2A knockouts would be less sensitive
than wild type to follicle cauterization during the sensitive period. They were
not (Lu et al., 2001).
Although these results would seem to rule out one particular candidate for a
parametric sensitive period in somatosensory cortex, the overall pattern of results
does suggest that age-limited plasticity in this system is parametric, rather than
structural. It may be that the same inhibitory mechanisms that control plasticity
in visual cortex are at work in somatosensory cortex.
It is worth noting that barrel formations are a specific aspect of the organization
of somatosensory cortex which is subject to a critical period. In other
species, plasticity remains present in somatosensory cortex well into adulthood.
For example, in an experiment by Buonomano and Merzenich (1995), fingers of
adult monkeys were sutured together. After a period of several months, responses
in somatosensory cortex were recorded to regions between the two fingers. Responses
to these regions were elicited by stimulation to the corresponding regions
on either finger even after they were physically separated, strongly suggesting
that the novel receptive fields were the result of cortical reorganization. It is
also the case that receptive fields in visual cortex remain highly malleable in
adulthood (e.g., Das & Gilbert, 1997). Indeed, the sharp limits imposed on the
plasticity of structures such as barrels and ocular dominance columns are in marked
contrast to the extensive plasticity of other aspects of sensory maps.
1.3 A critical period for tuning of multimodal maps in
barn owls
Barn owls are extremely good at sound localization. One potential basis for this skill
is the fact that the map of space based on inter-aural time differences (ITDs) in
the central nucleus of the inferior colliculus (ICC) and the map of space derived
from visual experience in the optic tectum (OT) are tightly in register. That
is, responses in OT can be elicited either by presentation of visual stimuli at
particular locations in the visual field, or by auditory stimuli with ITDs which
indicate that they come from the same location in space (see, e.g., Knudsen, 2002,
for review).
Although studies with blind-reared owls suggest that this map is shaped in
part by innate mechanisms guiding the connectivity of ICC and ICX, this multimodal
map can be adapted to accommodate abnormal auditory or visual input
(Knudsen & Knudsen, 1990). Knudsen and Knudsen (1990) established this behaviorally
by raising baby barn owls with prismatic spectacles. The spectacles
had the effect of shifting their visual experience 23 degrees to the left or to the
right. Knudsen and Knudsen (1990) measured orienting responses to sounds from
a speaker at random locations in their experimental room and compared these
to orienting responses to visual stimuli at the same locations. At first, visual orienting
responses deviated from auditory responses as a result of the spectacles.
However, younger owls were able to adjust to the placement of prisms within two
months.
Brainard and Knudsen (1998) demonstrated that responses in the OT of owls
raised under these circumstances reveal a commensurate shift: the best responses
to auditory stimuli in OT are shifted in the direction of the prismatic lenses,
resulting in a more accurate registration of the visual and auditory maps of
space, and supporting more accurate orienting behavior. These adaptive shifts
appear to be the result of plasticity in the forward projections from the ICC to
the external nucleus of the inferior colliculus (ICX), which projects directly onto
OT (Feldman & Knudsen, 1997; DeBello, Feldman, & Knudsen, 2001).
The plasticity of this multimodal map is limited by a sensitive period of
development. Interestingly, the timing of the sensitive period depends on the
richness of the experience after placement of the spectacles, and the abruptness
of the prismatic shift. In the initial study, bespectacled owls were housed alone
in small cages during the adaptation period (Knudsen & Knudsen, 1990). Under
these circumstances, animals older than 100 days failed to adapt to a 23 degree
shift in visual experience. More dramatically, animals raised with spectacles in
individual cages during the sensitive period for initial adaptation also showed
evidence of a second sensitive period for re-adjustment once the spectacles were
removed; their orienting responses failed to return to normal if the spectacles
were removed after 200 days. Brainard and Knudsen (1998) housed most of their
experimental subjects in large flight cages with other owls during adaptation, and
for these animals found very different estimates of the sensitive periods: For initial
adaptation, they estimated the closure of the sensitive period to be around 200
days, whereas there was no apparent sensitive period for readjusting to removal
of prismatic spectacles.
It is interesting to consider what this means about changes in plasticity during
the sensitive period. One possible conclusion is that the change in plasticity over
time is fairly gradual, such that relatively minimal experience results in map
reorganization before 100 days, but after 100 days more extensive experience is
necessary to induce plasticity. Indeed, a later study by Linkenhoker and Knudsen
(2002) demonstrates substantial plasticity in adult owls well past the typical
critical period when small, incremental shifts are introduced (as opposed to the
large shifts used in experiments with juveniles). By starting with small prismatic
shifts of 6 degrees, and building up to 23 degrees, Linkenhoker and Knudsen
(2002) were able to observe shifts in tuning of OT neurons in adult owls equivalent
to those observed in juveniles fitted with 23 degree spectacles in a single step.
Again, the results do not conflict with the notion that plasticity declines with
age - a very different kind of experience is necessary to induce changes in adult
vs. juvenile animals - but they do demonstrate that the adult system is capable
of substantial reorganization.
A recent study by DeBello et al. (2001) suggests a mechanism by which plasticity
is limited as a result of development. They found that in juvenile owls,
connections from ICC to ICX were coarsely topographic. In normally raised
adult owls, the arborization of ICC axons in ICX was much sparser overall, and
more focused in the region that supports typical registration of the space maps in
ICC and OT, dubbed the peak of normal projection (PNP). Interestingly, prism-rearing
did not result in a substantial reduction in the number of synapses at the
PNP. Moreover, the shift in receptive field is achieved by different mechanisms,
depending on the direction of the shift. Azimuth is encoded on a rostro-caudal
axis in ICC bilaterally. However, the orientation of the map differs across the
hemispheres, so that on the left side, the rostro-caudal axis in ICC represents
space from left-to-right, whereas on the right side, space is mapped from right-to-left.
Adaptation to prismatic lenses in ipsilateral ICX occurs via exuberant
ingrowth of new connections rostral to the pre-potent pattern of connectivity. In
contralateral ICX, by contrast, adaptation is achieved by the pruning of rostral
connections, and the relative sparing of caudal connections (DeBello et al., 2001).
It is thus plausible to suggest that the relative plasticity of ICC → ICX maps
in juveniles is the result of the broader distribution of connectivity. Plasticity
in ICX depends on an instructive signal from OT (Hyde & Knudsen, 2002), and
many synapses in juvenile ICX contain only NMDA receptors and can therefore
only be activated by coincident input. Coincident activity of ICC and OT cells
(for example, when the source of a sound is visible) may result in the activation of
these synapses, triggering a release of trophic factors that results in the survival of
caudal synapses and generation of new rostral synapses. However, in adult owls,
where the distribution of synapses is more sparse (and fewer of the synapses are
NMDA-only), there is essentially “less to work with.” That is, incoming activity
resulting from a radically shifted map in OT is not coincident with any incoming
activity from ICC which represents the same point in space, and there is thus no
basis on which to elaborate new functional synapses.
This hypothetical mechanism would explain why adult plasticity is limited to
situations in which small, incremental shifts are used. Older subjects may lack
the extent of connectivity needed to adapt directly to a 23 degree shift in visual
stimulation, but they maintain enough “inappropriate” connectivity to adjust to
smaller shifts. Once new connectivity is established, this creates the basis for
further plasticity, because the extent of connectivity has been increased (at least
rostrally). This mechanism would also explain why, given sufficient experience,
owls at any age can adjust to the removal of prismatic spectacles. Calibration of
the visual and auditory space map does not result from a thorough re-mapping
of the connections between ICC and ICX. Rather, existing connectivity is largely
maintained, while novel connections are formed - or supernumerary connections
which would normally have been pruned are spared - to accommodate the shift in visual
experience (DeBello et al., 2001).
Furthermore, activity at connections between ICC and ICX at the PNP is
inhibited by GABAergic connections while prismatic spectacles are in place (Zheng
& Knudsen, 2001). Neurophysiological evidence for this comes from studies
involving topical application of bicuculline (a GABA antagonist) in ICX. When
inhibitory function is blocked in the ICX of owls raised without prismatic
spectacles, the tuning of these cells to auditory input remains relatively unchanged.
However, when the same experiment is conducted with prism-reared owls, the
mean best response of cells shifts to an intermediate value between that expected
on the basis of innate connectivity, and the value expected based on the learned
multimodal map. Thus, the release from inhibition unmasks an underlying
pattern of connectivity that is normally functionally suppressed.
This maintenance of different sets of connections between ICC and ICX, and
the ability to selectively inhibit one or the other underlies a further feat of adult
plasticity. Owls raised with prismatic spectacles, who then have the spectacles
removed and are allowed to adapt to normative visual experience, then adapt very
rapidly again to the replacement of spectacles, provided they shift the visual field
in the same direction as the initial prisms. This is in sharp contrast to adults
raised without any perturbation of visual space, who can only adapt to small,
gradual shifts in the visual space map (Linkenhoker & Knudsen, 2002).
The range of effects reflecting developmental changes in plasticity of the
multimodal space map in barn owls is most consistent with the notion of structural
limits on plasticity. Limits on plasticity in adulthood are the result of the
gradual tuning of connections from ICC to ICX, and not of a general change in the
chemical or physiological properties of the system, as appears to be the case in
the unimodal sensory maps described above.
1.4 A behavioral sensitive period in songbirds
Many species of songbird learn their song during a sensitive period of
development. In such so-called “closed-ended learners,” song is typically learned early in
development by a process of successive approximation of a model (or “tutor”).
Song behavior stabilizes during puberty, after which no further modifications to
song typically occur. Birdsong is potentially an extremely informative model
system for the study of age-limited learning, because the development of both song
behavior and the brain regions which support it are fairly well understood.
Song acquisition is supported by a set of brain nuclei collectively referred to as
the anterior forebrain pathway (AFP, Nottebohm, Stokes, Leonard, Wingfield, &
Farner, 1976) which are not necessary for song production in normally-developing
adults (Bottjer, Miesner, & Arnold, 1984). Interestingly, anatomical and
physiological changes in these regions are correlated with the typical sensitive period,
and may be developmentally regulated by changes in circulating hormone
levels (see review in Bottjer & Johnson, 1992). The coincidence of such extreme
changes in the AFP and the timing of the sensitive period suggests that song
plasticity is limited in adulthood by maturational changes in the function of the
AFP. A number of recent studies have provided striking demonstrations that the
AFP continues to play a role in adult song behavior (Jarvis, Scharff, Grossman,
Ramos, & Nottebohm, 1998), and that adult plasticity can be “unmasked” under
conditions that engage the AFP (e.g., Nordeen & Nordeen, 1992; Williams &
Mehta, 1999).
Songbirds differ dramatically in the degree to which their song learning can
be considered “age-limited” or “closed-ended.” The discussion here focuses
primarily on studies conducted with zebra finches, because this species is the most
widely used (see Brenowitz, Margoliash, & Nordeen, 1997, for a review of the
development of song in other species and/or comparisons among species).
1.4.1 Defining the sensitive period for birdsong learning
Experiments involving the presentation of tutors to juvenile zebra finches at
varying ages have provided a good deal of information about the development of song
during the sensitive period. In particular, song acquisition is often described as
consisting of two distinct phases: a sensory learning phase in which an auditory
“template” or memory of a model song is acquired, and a sensori-motor integration
phase in which singing behavior is acquired by successive approximation of this template
(Marler, 1997). In zebra finches, these phases largely overlap.
The sensory acquisition phase begins at around 17 days post-hatch (17d,
Eales, 1985). Whereas birds can learn to copy the song of a tutor whom they
have heard only between 17d-35d (Bohner, 1990), plasticity is apparently at its
height later, between 35d-70d (Eales, 1985). Birds presented with different tutors
during these early and late time periods ultimately copied more of the tutor they
heard later. Recent data from training studies suggest that when tutors are
presented after the onset of vocalization (37d-40d) the tutee’s song evolves to
resemble the tutor’s song in a matter of three days (Tchernichovski, Mitra, Lints,
& Nottebohm, 2001).
In the sensori-motor phase, birds appear to follow a process of successive
approximation. Song behavior starts as a disorganized stream of sound bearing
little resemblance to adult song (Zann, 1996; Tchernichovski, Lints, Mitra, &
Nottebohm, 1999; Tchernichovski et al., 2001). Song behavior is adjusted until
recognizable copies of tutor syllables emerge; meanwhile birds pass from a phase
in which tutor syllables are repeated linearly to one in which they are organized
into a temporal order resembling that of the tutor song. The spectral and
temporal details of song are gradually refined until the song reaches its adult state,
in which the same elements are sung in a highly stereotyped order - and also in
a highly stereotyped manner, such that there is little variation among exemplars
of an individual syllable from rendition to rendition. Classically, song is said to
“crystallize” at around 90 days and no new song material is learned after this
point (e.g., Brenowitz et al., 1997). This is also roughly the age at which many of
the anatomical and physiological properties of the song system reach their adult
form.
1.4.2 Evidence for plasticity after the sensitive period
Interestingly, neither of these aspects of song learning - acquiring a model song
to copy, and learning to copy it - is strictly limited by age-related changes in
the plasticity of the song system. While the Bohner (1990) and Eales (1985)
studies suggest that there are optimal periods of development during which a
song model is acquired especially well, there is no hard and fast limit on this
form of plasticity. A dramatic demonstration of this comes from Jones, ten Cate,
and Slater (1996), who isolated birds from conspecific song well into adulthood
(240d). Once exposed to adult song, these birds learned readily from tutors,
although their songs were typically slightly shorter than songs of birds exposed
to models as juveniles. This extension of the sensitive period is apparently
indefinite, and independent of a number of physiological and anatomical changes
which coincide with the opening and closure of the sensitive period under normal
circumstances. For example, the growth and retraction of nuclei critical for song
learning (Bottjer, Miesner, & Arnold, 1986) and changes in the distribution of
NMDA receptors (Basham, Sohrabji, Singh, Nordeen, & Nordeen, 1999) and
receptor subtypes (Singh, Basham, Nordeen, & Nordeen, 2000) are delayed by lack
of exposure to conspecific song. However, by adulthood (120d), these
parameters all reach normal adult values, even in animals never exposed to a tutor song
(Iyengar & Bottjer, 2002; Livingston, White, & Mooney, 2000; Burek, Nordeen,
& Nordeen, 1991).
New data also seem to refute the notion that the song behavior of typically
developing birds becomes “crystallized” at the end of the sensitive period, and
is rendered impervious to change by some discrete neurodevelopmental event at
the close of the sensitive period. For example, recent work by Brainard and
Doupe (2001) and Lombardino and Nottebohm (2000) demonstrates that the
crystallization process actually unfolds well into adulthood. Song behavior is
fairly stereotyped at 90 days, but precise measures of the large-scale temporal
organization of song behavior, as well as of finer-grained spectro-temporal aspects
of song demonstrate that the stereotypy of song behavior continues to increase
in a fairly linear fashion well after the close of the critical period. Furthermore,
both Lombardino and Nottebohm (2000) and Brainard and Doupe (2001) found
a relationship between the age at deafening and both the rapidity of onset and
severity of deafening-induced disruptions of song. As we shall see below, earlier
studies had established that manipulations which disrupt feedback can interfere
with the maintenance of song in adult birds. Thus, there are age-related changes
in plasticity during adulthood which appear to be continuous with changes in
plasticity during the standard sensitive period.
Studies in which behavioral plasticity is induced in normally-developed adult
birds by disruption of feedback suggest that the mechanisms of song maintenance
in adulthood are identical (or at least related) to those involved in initial song
acquisition. Nordeen and Nordeen (1992) found that deafening zebra finches
results in deterioration of song structure over a period of weeks. Similarly, Williams
and McKibben (1992) found that resection of the tracheosyringeal nerve (which
controls the vocal organ) had long-term effects on song structure (even after the
nerve had regrown). In both cases, the results are best explained by a model in
which song stereotypy is maintained and improved throughout adulthood by
continuously comparing performance to a stored memory - or “target.” If behavior
consistently matches this target, there is no reason for the behavior to change,
even if the system remains capable of plasticity at the cellular level. Once a
mismatch between the target behavior and the animal’s observation of its own
performance is introduced, either by disrupting feedback directly, or introducing
noise into the motor output, behavior becomes subject to change. This strongly
suggests a role for learning in the maintenance of so-called “steady-state”
behavior in adults beyond the sensitive period for song learning. Interestingly, adult
plasticity depends on the same neural mechanisms that underlie plasticity in the
developing brain. If either deafening (Brainard & Doupe, 2000) or nerve resection
(Williams & Mehta, 1999) is paired with lesions of a forebrain nucleus critical
for song learning in juveniles (lMAN, Bottjer et al., 1984), the distortion or
removal of feedback has no permanent effect on song behavior.
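The target-comparison account described above can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions, not a model taken from any of the studies cited: song is reduced to a single number, the update is a simple delta rule, and the names `simulate_song`, `feedback_shift` and `learning_rate` are hypothetical. Setting `learning_rate` to zero is a loose stand-in for removing the circuitry that drives change.

```python
def simulate_song(target, days, feedback_shift=0.0, learning_rate=0.1):
    """Toy delta-rule sketch of song maintenance (illustrative only).

    The motor setting is nudged each day to reduce the mismatch between
    a stored target and the bird's (possibly distorted) perception of
    its own output.
    """
    motor = target  # adult song starts out matched to the target
    history = [motor]
    for _ in range(days):
        perceived = motor + feedback_shift   # distorted feedback, if any
        error = target - perceived           # mismatch with stored target
        motor += learning_rate * error       # change only when mismatched
        history.append(motor)
    return history


# Intact feedback: no mismatch, so behavior never changes.
intact = simulate_song(target=1.0, days=50)

# Distorted feedback: a persistent mismatch drives song away from its
# original form, toward target - feedback_shift.
distorted = simulate_song(target=1.0, days=50, feedback_shift=0.5)

# Distorted feedback with no capacity for change (learning_rate = 0):
# the mismatch is present but song stays where it was.
lesioned = simulate_song(target=1.0, days=50, feedback_shift=0.5,
                         learning_rate=0.0)

print(intact[-1], distorted[-1], lesioned[-1])
```

The point of the sketch is only that a system can remain fully capable of change and still look fixed, as long as its output keeps matching the target.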
Plasticity in adulthood is not limited to cases in which aberrant feedback
results in disruptions to song. Recent experiments in which feedback is reversibly
disrupted suggest that adults retain the ability to reacquire song when
disruptions are temporary. Leonardo and Konishi (1999) used delayed auditory
feedback to disrupt song behavior without surgical deafening or nerve resection. They
observed a large influence of this procedure, particularly on the temporal
organization of song, e.g., “stuttering” and rearrangement of song syllables. After a
few weeks without the delayed feedback, song returned to normal, without any
permanent influence of the feedback procedure.
Temporary deafening has also been used to examine adult plasticity. Woolley
and Rubel (2002) used ototoxic hair cell lesions to reversibly deafen Bengalese
finches. Bengalese finches exhibit slightly greater plasticity than zebra finches,
and thus showed evidence of deafening-induced changes to song within a week of
deafening - depending on their age, zebra finches can take months to show an
influence of deafening. Because hair cells recover in birds (Woolley & Rubel, 1999),
hearing recovered after a few weeks, and the Bengalese finches were able to hear
their own song and that of cagemates with which they were housed. Interestingly,
in addition to reacquiring their pre-treatment songs, some subjects acquired novel
song elements similar to their cagemates’. Zevin, Seidenberg, and Bottjer (2000)
used chronic exposure to white noise to reversibly remove auditory feedback from
zebra finches. After white noise was discontinued, song was disrupted in a
manner similar to the effects of deafening. Like Bengalese finches, most zebra finches
reacquired elements of their pre-treatment song. However, there were a number
of exceptions. In some instances, song became increasingly stereotyped without
increasing in similarity to pre-treatment song, and syllables which were entirely
deleted from song were never replaced (cf., Hough & Volman, 2002). Finally,
unlike the Bengalese finches studied by Woolley and Rubel (2002), none of the
zebra finches we observed appeared to learn novel song material from animals
with which they were housed.
The motivation in these studies was to examine the hypothesis that limits
on adult plasticity are the result of behavioral stereotypy itself. If plasticity
declines as a result of the entrenchment of over-learned representations supporting
stereotyped song, then it might be possible to re-introduce a period of plasticity
for these representations. The fact that Bengalese finches were able to acquire
novel song elements in adulthood once their songs were disrupted suggests that
limits on plasticity in that system are in part the result of song stereotypy in
adulthood.
The data discussed here are consistent with the view that age-limited learning
in some species of songbirds is the result of the engrainment of representations
that support stereotyped song, and thus reflects a structural limit on plasticity.
It is a particularly striking example because, as we shall see below, a number of
potential parametric mechanisms have been examined in some detail, yet none
account for the observed pattern of behavioral plasticity. In particular, all of the
parameters considered to date reach adult values by the end of puberty, even in
the absence of exposure to song, whereas the ability to learn song remains intact
well into adulthood.
1.5 Language Acquisition
As human language is an extremely complex phenomenon, we should not be
surprised that a number of different notions of what it means to have a “sensitive
period for language acquisition” exist in the literature. According to some, the
notion of a sensitive period is only really relevant to first-language learning (e.g.,
Bialystok & Hakuta, 1994). Others maintain that the difficulties adults often
have in acquiring a second language are also the result of the closure of a sensitive
period (e.g., Johnson & Newport, 1989). Furthermore, there is some disagreement
over which aspects of language are susceptible to age-limited learning and which
are not. For example, whereas it is clear that the ability to learn new words is
maintained well into adulthood (Service & Craik, 1993; McCandliss, Posner, &
Givon, 1997; Markson & Bloom, 1997), it seems fairly clear that sensitive-period
effects emerge in the learning of novel phonetic categories (e.g., Strange, 1995).
On the other hand, as we shall see, the question of whether a sensitive period
exists for syntax remains highly controversial.
Finally, although there is apparent consensus that no sensitive period exists
for lexical acquisition, a number of recent studies suggest that there may indeed
be a processing advantage for words learned early in life over those learned later
in life across a range of lexical tasks (Morrison & Ellis, 2000; Bates, Burani,
D’Amico, & Barca, 2002). While the study of age-limited learning effects in
lexical acquisition raises a unique set of methodological difficulties, such effects
potentially raise an interesting point about the nature of sensitive periods in other
domains. Once one allows that a graded, probabilistic decline in performance on
a grammaticality judgement task or phoneme discrimination task may be taken
as evidence for sensitive periods in those domains (Johnson & Newport, 1989;
Flege, Yeni-Komshian, & Liu, 1999), there is no principled reason to reject the
notion that a sensitive period for lexical acquisition exists if age-of-acquisition
effects are in fact real. This suggests the possibility that changes in the ability to
learn language over the lifespan are mediated by a single mechanism, despite
the different developmental trajectories of different aspects of language.
1.5.1 First language learning
The most compelling evidence that a sensitive period exists for first language
(L1) acquisition comes from studies of deaf children with hearing parents who
do not know sign language. In these cases, children are raised in an otherwise
normal social milieu, but are often not exposed to natural language until later
in life (contrary to other recorded situations in which isolation from language is
coupled with extreme neglect which has frankly horrifying consequences for other
aspects of development; Curtiss, 1977; Lane, 1979). Deaf children and their
non-signing parents usually develop improvised gesture systems (also called homesign)
which exhibit only rudimentary aspects of natural language (Goldin-Meadow &
Mylander, 1998). Thus, although they acquire and use a somewhat language-like
communication system, they can be said to learn American Sign Language (ASL)
as their first natural language. This sometimes occurs quite late in childhood.
A number of studies demonstrate that ultimate attainment in ASL is much
poorer in people who learn it as a first language during adolescence than in those
who learn it early in childhood (Mayberry & Eichen, 1991; Mayberry, 1993).
Furthermore, acquisition is more successful in people who have lost their hearing
in childhood, after they had acquired a spoken language, than in congenitally deaf
individuals learning ASL at the same age (Mayberry, Lock, & Kazmi, 2002a).
In these studies, language skill was measured by having subjects repeat long,
morphologically complex sentences. This task taps a broad range of language
abilities, which makes it ideal for establishing the relative proficiency of different
groups - but makes it difficult to identify particular aspects of language which
are affected by late learning.
In a study by Mayberry and Eichen (1991) congenitally deaf individuals
differing in the age at which they acquired sign language made different kinds of
errors. People who learned ASL from birth tended to make errors which
maintained the meaning of the sentence, such as replacing one word with another
having a similar meaning (i.e., paraphrasing). Later learners, on the other hand,
were more prone to “phonological” errors, for example producing a sign that
overlaps gesturally with the target sign, but could mean something very
different. This difference in error type is difficult to interpret, because it could reflect
a number of different processing difficulties. Mayberry and Eichen (1991) suggest
that late learning might result in imperfect phonological representations which
result in poor comprehension. However, it is also possible that the late signers
understood the sentences, but made errors in repetition due to an influence of
imperfect phonological/phonetic knowledge on production. It is also possible that
a lack of syntactic or lexical proficiency on the part of later learners resulted in
an inability to comprehend the sentences, and resulted in their trying to repeat
the sequences without understanding them.2
On their face, the data appear most consistent with a parametric sensitive
period. In particular, they fit neatly with Lenneberg’s notion of the “exercise
hypothesis.” According to this hypothesis, early experience with language is
necessary to establish structures in the brain which are specialized for language.
Once these are in place, however, they are sufficiently general to permit
acquisition of multiple languages, even later in life. Without exposure to language during
a maturationally-defined sensitive period, the language system does not develop
properly. Because of parametric limits on plasticity, it becomes impossible to
“catch up” to individuals with normal language experience later in life.
2 One concern with these studies is the high rate of errors made by people who learned
ASL from birth. For example, in Mayberry and Eichen (1991), verbatim recall of words in
the sentences was only .68 for native speakers (compared to .60 for speakers acquiring ASL in
childhood and .56 for adolescent learners). This is a fairly high error rate for a task that requires
participants simply to repeat grammatical sentences with no delay. It would be preferable to
have a task which learners from birth perform with no apparent difficulty, and which nonetheless
shows strong effects of age of acquisition. If the task is too hard for native speakers - who are
completely proficient in the language - to perform with a high degree of accuracy, some factor
above and beyond proficiency must be playing a role in determining performance. Further
concerns are raised by the number of ungrammatical responses produced by native speakers (20%
in some cases).
However, a structural critical period account cannot be ruled out without a
more systematic exploration of the particular difficulties late L1 learners have, and
how these relate to the communication system they developed before exposure
to ASL. Although homesign systems have some features of natural languages,
they differ from natural languages in many respects as well (Goldin-Meadow &
Mylander, 1998). For example, whereas sign languages have rich inflectional
systems that mark case and number (in the case of ASL) and use agreement
to establish links among words in a sentence, homesign sentences are typically
described as strings of uninflected iconic gestures which depend heavily on deictic
information (e.g., pointing to things in the environment) to be understood at all.
It may turn out that the errors late learners make in ASL are systematically
related to aspects of their homesign system. As we shall see below, in studies
of second language acquisition, one can predict which particular aspects of the
L2 will be difficult to learn on the basis of the structure of L1 (e.g., Bialystok
& Hakuta, 1994; Best, McRoberts, & Goodell, 2001). If “L1” is a homesign
system, it becomes difficult to predict how this knowledge might interact with
ASL acquisition. It seems at least plausible that the difficulty of “late L1” learners
is due in part to the fact that their pre-linguistic communication systems, by dint
of not being natural languages, have less in common with ASL than a natural
language (e.g., English) would. In fact, a recent study by Mayberry et al. (2002a)
suggests that individuals who become deaf after having learned a spoken language
acquire better skills in ASL than individuals who are congenitally deaf and acquire
ASL as a “first language” at the same age. Issues related to the overlap (or lack
thereof) of early- and late-learned knowledge have been treated more thoroughly
in the literature on second language learning, and suggest a strong influence of
the similarity between L1 and L2 on measures of age-limited plasticity, as we
shall see presently.
1.5.2 Second language learning
There are actually two literatures on sensitive period effects in second language
learning, one concerning the acquisition of syntax, and the other concerning the
learning of phonetic categories. I discuss them separately, although, as we shall
see, they have begun to converge on similar conclusions about the nature of the
sensitive period for second language learning, and the importance of taking into
account the nature of the relationship between the first and second languages in
studying this phenomenon.
1.5.2.1 Syntax
The first systematic study to provide evidence for a sensitive period in L2
acquisition was conducted by Johnson and Newport (1989). They documented an
age-related decline in performance on a grammaticality judgement task (in
English) in Korean and Chinese speakers learning English as L2. Furthermore, the
particular shape of the function relating age of arrival (AOA) to
grammaticality judgement performance suggested a qualitative change in this relationship at
the age of 15 years. Before 15 years of age, a weakly negative but systematic
relationship between AOA and performance was observed. After 15 years of age,
however, performance was overall quite poor - none of the subjects performing
within range of native speakers - and no systematic relationship between AOA
and performance was observed.
This strongly suggested that some discrete event at the age of 15 resulted
in a decline in the ability to acquire the grammar of a second language. This
interpretation has not held up to scrutiny, however. First, consider the reanalysis
of the Johnson and Newport (1989) dataset undertaken by Bialystok and Hakuta
(1994). Johnson and Newport (1989) had selected the age of 15 years to look for
a qualitative shift in the relationship between AOA and performance because this
was the median AOA of their sample. However, there is nothing special about the
age of 15 in particular; for example Lenneberg (1967) suggested that the closure
of the sensitive period for language should be at around 12 years of age. For this
reason Bialystok and Hakuta (1994) took a different approach. They used a curve-
fitting procedure to search for nonlinearities in the relationship between AOA
and performance on the judgement task. Essentially, this consisted of looking
for cutoff points where using two different regression equations accounted for
more variance than using a single equation to describe the entire data set. They
found such a point at the age of 20 and arrived at a very different conclusion
about the relationship of AOA to performance on the judgement task: There was
a significant, negative correlation between AOA and performance for both the
older and younger groups.
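The logic of such a cutoff search can be sketched in a few lines. This is a minimal illustration of the general idea (fit one regression line to the whole data set, then try every candidate cutoff and fit separate lines on either side), not the procedure or the data of any study cited here. The function names and the noise-free AOA/score values are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def best_breakpoint(xs, ys, min_points=3):
    """Return (cutoff, two_line_sse, one_line_sse) for the best split.

    The cutoff is the smallest x assigned to the upper segment.
    """
    pairs = sorted(zip(xs, ys))
    one_line_sse = fit_line(*zip(*pairs))[2]
    best = None
    for i in range(min_points, len(pairs) - min_points + 1):
        lower, upper = pairs[:i], pairs[i:]
        if lower[-1][0] == upper[0][0]:
            continue  # never split observations that share the same x
        sse = fit_line(*zip(*lower))[2] + fit_line(*zip(*upper))[2]
        if best is None or sse < best[1]:
            best = (upper[0][0], sse)
    if best is None:
        raise ValueError("not enough data to search for a breakpoint")
    return best[0], best[1], one_line_sse


# Made-up, noise-free scores with a change in the function at AOA = 20.
aoa = list(range(3, 31))
scores = [270 - 2 * a if a < 20 else 240 - a for a in aoa]

cutoff, two_line_sse, one_line_sse = best_breakpoint(aoa, scores)
print(cutoff, two_line_sse, one_line_sse)
```

Two lines can never fit worse than one, so in a real analysis the reduction in residual variance would still have to be tested against the extra parameters; the sketch only locates the candidate cutoff.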
The issue is further complicated when we consider data presented by Birdsong
and Molis (2002). This study, conducted using precisely the same materials
as the initial Johnson and Newport (1989) study, examined the grammatical
proficiency of native speakers of Spanish varying in AOA. Using the same analysis
technique applied by Bialystok and Hakuta (1994), Birdsong and Molis (2002)
found a nonlinearity in the function relating AOA to performance at the age of
27 years. Furthermore, they observed a number of subjects over 20 years of age
who performed within native range. The fact that one’s native language has such
a large impact on the relationship between AOA and ultimate attainment makes
it highly doubtful that there is a maturational explanation, in particular when
large numbers of Spanish speakers achieve native-like proficiency despite starting
to learn English well after puberty.
Finally, consider a study by Flege et al. (1999). In this study, a grammaticality
judgement task similar to the one used by Johnson and Newport (1989) and
Birdsong and Molis (2002) was given to native Korean speakers varying in AOA
for English. Along with the main task, a thorough questionnaire designed to
measure relevant aspects of language experience and language learning was also
presented. Responses to this questionnaire were used to control for the amount
of formal education in the target language, the proportion of time spent speaking
the target language, and other factors which both affect performance and covary
with AOA. With these factors controlled, Flege et al. (1999) found no independent
influence of AOA on performance on the grammaticality judgement task. The
suggestion is that older learners could, with more experience (and potentially
more formal education) achieve levels of proficiency similar to younger learners;
indeed, at least one subject with an AOA of 20 appears to have performed within
the range of native speakers (Flege et al., 1999, Figure 3b). Typically, the circumstances
under which older immigrants learn their adopted language are quite different
from those under which younger immigrants learn it, and this apparently accounts
for most of the relationship between AOA and ultimate attainment.
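The reasoning behind controlling for covarying factors can be made concrete with a first-order partial correlation. The sketch below is not the analysis reported by Flege et al. (1999); the numbers are invented so that test score depends on AOA only through amount of education, and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with the linear effect of zs removed."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented data: later arrivals receive less education in the L2, and
# the score is driven by education alone (plus unrelated variation).
aoa = [2, 4, 6, 8, 10, 12, 14, 16]
edu_variation = [1, -1, -1, 1, 1, -1, -1, 1]
score_variation = [1, 1, -1, -1, -1, -1, 1, 1]
education = [20 - a + 2 * v for a, v in zip(aoa, edu_variation)]
score = [50 + 2 * e + v for e, v in zip(education, score_variation)]

raw_r = pearson(aoa, score)                      # strongly negative
controlled_r = partial_corr(aoa, score, education)  # essentially zero
print(round(raw_r, 3), round(controlled_r, 3))
```

In this constructed case the raw correlation between AOA and score is strong, yet it disappears once education is partialled out, which is the pattern of inference described in the paragraph above.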
In short, the data suggest that, while there is a relationship between AOA and
ultimate performance on a grammaticality judgement task, this effect is clearly
modulated by the pairing of LI and L2 - that is, the relationship of the language
one already knows to that which one is trying to learn. Furthermore, in at least
one instance, careful consideration of factors that are highly correlated with AOA
leads to the conclusion that there is no direct effect of AOA on grammar learning
(Flege et al., 1999).
1.5.2.2 Speech
In addition to the grammaticality judgement task, Flege et al. (1999) examined
the accentedness of speech in their experiment. They asked subjects to repeat
a series of sentences aloud, and these productions were later scored for
accentedness by a group of native English speakers. Flege et al. (1999) used the same
language experience questionnaire to control for factors having to do with
experience, formal training, and motivation in examining the AOA effect on both
syntax and accentedness. However, unlike performance on the grammaticality
judgement task, accentedness was shown to be directly affected by AOA, even
with these other factors taken into account.
Thus, there is strong evidence for age-related limitations on the ability to
acquire native-like speech production abilities in a foreign language. Flege et al.
(1999) suggest that this reflects a failure to perceive foreign speech sounds
veridically as a result of learning their L1 phonological inventory. Flege, Bohn, and
Jang (1997) further examined the influence of L1 phonological inventory on
perception and production of English vowels in a study involving native speakers
of four languages with very different vowel inventories. All participants began
learning English after 14 years of age (mean age of 25).
Although there was evidence of learning over time - on average, participants
who had been in the country longer produced closer approximations of American
English vowels - a striking age-limited learning effect was observed. Specifically,
even among subjects with extensive English experience, a strong influence of
L1 vowel inventory was observed. In many instances, differences from standard
English pronunciation could be predicted by considering the vowel inventory of
the participants’ native language.
Importantly, Flege et al. (1997) found a strong relationship between
perception and production. This was reflected in both a strong correlation between
production and perception accuracy and specific aspects of perception and
production. For example, native Korean speakers depended overwhelmingly on
temporal cues to distinguish tense from lax vowels (e.g., /i/ vs. /I/) in perception
and also produced /i/ with much greater duration than /I/. This is at
variance with the way native American English speakers produce and perceive this
contrast, which involves largely spectral properties of the sounds (differences in
second formant frequency).
They also found that when native and English phonetic categories were fairly
close, as in the German /i/ - /I/ distinction, accuracy was fairly high, even among
less experienced speakers of English, and there was little room for improvement
with experience. Note in this regard that most languages overlap at least
somewhat in their phonological inventories. This means that some of one’s L1
knowledge, at least, does in fact generalize to L2. Indeed, data from Best et al. (2001)
demonstrate that naive performance in distinguishing contrasts between foreign
speech sounds depends on the relationship between the novel sounds and the
listeners’ native phonological inventory. In cases where an L2 contrast depends
on dimensions similar to a known L1 contrast, near-native performance on a
discrimination task can be achieved without any prior exposure to the language.
Thus, it is possible to explain both successes and failures to learn the speech
sounds of a second language in terms of transfer from one’s native language.
In sum, it appears that the acquisition of phonetic knowledge in a second
language is limited in later learners. Furthermore, these limits appear directly
related to the acquisition of one’s native language. Finally, there is at least one
study which provides clear evidence that age-limited learning in speech
perception is not an artifact of motivational variables, or experiential variables having
to do with how frequently the language is spoken (although each of these has
been shown to have an effect). More detailed research regarding the timecourse
of age-related changes in plasticity of speech processing, and its relation to
entrenchment phenomena, is necessary to determine whether these phenomena can
be explained in purely structural terms.
1.5.2.3 Age of Acquisition effects in lexical tasks
While it is clear that people can (and do) continue to learn new words late into
adulthood, we have seen thus far that age-limited learning effects can in fact be
graded and probabilistic in nature. Thus, it is possible that there is an age-related
decline in the ability to learn new words that results in later-acquired words being
represented less efficiently than early-acquired words. Indeed, there is abundant
evidence consistent with this possibility from tasks such as picture naming
(Morrison & Ellis, 2000; Brown & Watson, 1987b), auditory and visual lexical
decision (Morrison & Ellis, 1995; Gerhand & Barry, 1998), semantic categorization
(Brysbaert, Van Wijnendaele, & De Deyne, 2000) and reading aloud (Bates et al.,
2002; Gerhand & Barry, 1998).
However, we must not be too hasty in accepting the conclusion that lexical
acquisition is subject to age-limited learning effects. The methodological problem
is one of circularity. When studying acquisition of an L2, one can be fairly certain
that the reason one language was acquired before the other had nothing to do
with the relative difficulty of the languages themselves. Rather, in these cases,
and in cases of late L1 acquisition, the target language is acquired at a given
point of development as a matter of historical accident. This is typically not
the case in lexical acquisition. The order in which children learn words is at
least partly determined by the difficulty of the words themselves. Words that
are short, refer to concrete, imageable concepts, and occur very frequently in the
language are acquired earlier than words that are long, abstract and infrequent.
This raises the circularity issue discussed in detail by Zevin and Seidenberg (2002,
submitted). In both of those studies, large correlations between age of acquisition
and factors which predict adult performance in many lexical tasks were found.
Furthermore, effects which had appeared to support the hypothesis that age-of-
acquisition influences reading performance were found to be artifacts of confounds
with these variables.
Zevin and Seidenberg (2002, submitted) noted that it is possible to examine
age-limited learning effects in the lexicon by considering a novel factor: frequency
trajectory. Some items are learned early because they are very frequent in
discourse directed to children. Other words are learned later because they are used
mainly in adult discourse. By examining items which differ in their frequency
trajectory (but are matched for their cumulative frequency), we can see whether
there is any particular advantage to having learned a word earlier in life, while
controlling for other factors which are known to influence adult performance.
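The idea of items that differ in frequency trajectory but are matched on cumulative frequency can be made concrete with a small sketch. The exposure counts, the number of epochs, and the linear shape of the trajectories below are hypothetical choices made for illustration; they are not taken from the stimuli of Zevin and Seidenberg (2002).

```python
from itertools import accumulate


def linear_trajectory(start, end, epochs):
    """Exposure counts per epoch, changing linearly from start to end."""
    step = (end - start) / (epochs - 1)
    return [start + step * i for i in range(epochs)]


# Hypothetical exposure counts for two items over five developmental epochs.
early_item = linear_trajectory(90, 10, 5)  # frequent in child-directed input
late_item = linear_trajectory(10, 90, 5)   # frequent mainly in adult input

early_cumulative = list(accumulate(early_item))
late_cumulative = list(accumulate(late_item))

# The two items are matched on cumulative frequency at the end of training...
matched = early_cumulative[-1] == late_cumulative[-1]
# ...but the early item has had more exposures at every earlier point.
early_leads = all(e > l for e, l in zip(early_cumulative[:-1], late_cumulative[:-1]))
print(matched, early_leads)
```

Any residual advantage for the early item in adult performance would then have to reflect when the exposures occurred rather than how many there were.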
Unfortunately, research using the frequency trajectory variable has thus far
been limited to studies of reading aloud. Modeling work by Zevin and Seidenberg
(2002), discussed in more detail below (Chapter 2), suggests that this is the least
likely place to find such effects, because the strong regularities in the spelling-to-
sound mapping mean that knowledge that supports the reading aloud of early-
acquired words may generalize readily to late-acquired words, washing out the
effect. In order to explore the possibility that generalization from early- to late-
learned patterns was responsible for the lack of a frequency-trajectory effect in the
reading models, Zevin and Seidenberg (2002) manipulated the training set so that
the early and late items did not overlap. In these models, a significant advantage
for early-learned over later-learned items was observed. This demonstrates (as
does the modeling in Chapter 4) that age-limited learning effects are indeed
consistent with the general class of models used in these simulations, but only
arise when there is no opportunity for generalization from early- to late-learned
knowledge. This reflects a general property of structural limits on plasticity.
Thus, the age-limited learning effects in language considered here appear to
have a good deal in common. All are graded and probabilistic - there is no
hard and fast cut-off point after which learning becomes impossible. None
appear particularly well-timed to specific developmental events - indeed, timing of
the sensitive period appears to depend on the particular pairing of languages in
the case of L2 syntax acquisition (Johnson & Newport, 1989; Birdsong &
Molis, 2002). Most importantly, all appear to involve an influence of early-acquired
knowledge on later-acquired knowledge. This is perhaps debatable in the case
of “late L1” learning, but the fact that late learners of ASL benefit substantially
from having learned another natural language - relative to having depended on
a rudimentary system of homesign that bears little resemblance to natural
languages, including ASL (Mayberry et al., 2002a) - suggests that the possibility of
interference has been underinvestigated in this area of the literature. All of this
is consistent with the view that age-limited learning effects in language are more
likely to arise from structural rather than parametric changes in plasticity.
1.6 Conclusions
The research reviewed here provides the context for a refinement of the definition
of parametric and structural bases for age-limited plasticity and the distinction
between them. Rather than simply classifying the various phenomena reviewed
here as either “structural” or “parametric,” I have tried, in this concluding section,
to pit these notions against one another to demonstrate their potential for framing
testable hypotheses.
Parametric limits on plasticity may be defined as depending on a parameter of
the system that is distinct from the mechanism by which patterns of connectivity
are established. For example, consider inhibitory mechanisms as a parameter.
The development of inhibitory circuitry in visual cortex is potentially separable
from the initial formation of ODCs (which appears to be the result of either
chemical gradients or intrinsic thalamic activity). The development of inhibitory
circuitry is activity dependent, but the limits it poses on plasticity are, critically,
not local to synapses formed as part of the pattern of ocular dominance. Rather,
inhibitory circuitry apparently acts to raise the threshold for activity throughout
visual cortex. Lower levels of activity favor established connections over novel
ones, and thus act to preserve the pattern of connectivity established before the
maturation of inhibitory circuitry.
Conversely, structural limits on plasticity depend on the same mechanisms
by which the pattern of connectivity is established. The calibration of visual
and auditory space maps in owls offers a clear example. Early in development,
the multimodal map in ICX is only coarsely topographic. When large prismatic
shifts in the visual field are induced by the placement of prismatic spectacles,
these can be accommodated by selectively strengthening and weakening existing
patterns of connectivity (DeBello et al., 2001). However, in adulthood, when
the map in ICX has been finely tuned via coincident activity of ICC and OT
projections, the structure of the network itself prevents sudden, large changes in
the pattern of connectivity. Small, incremental changes nonetheless remain possible,
even late into adulthood (Linkenhoker & Knudsen, 2002).
Parametric limits on plasticity can thus be distinguished from structural limits
in that the former pose real limits on whether a system can change at all in
response to the environment, whereas the latter merely limit how the system
can change. So, for example, once the critical period for ODC formation has
passed, there is literally no effect of such large-scale restructuring of experience
as the removal of one or both eyes. Birdsong, on the other hand, is subject to
change as a result of manipulations which disrupt auditory feedback throughout
the lifespan. Although the degree to which manipulations of feedback influence
song behavior does change gradually with age, there is nothing like the absolute
limit on plasticity apparent in ODC formation.
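The contrast between the two kinds of limit can be illustrated with a deliberately simple toy model. This is a sketch under strong assumptions and not a model of any of the systems reviewed here: a single logistic unit with one weight stands in for the learning system, a learning rate that drops to zero stands in for a parametric limit, and the saturation produced by the unit's own early learning stands in for a structural limit. All numerical values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(weight, target, steps, learning_rate):
    """Gradient descent on squared error for a single logistic unit with one weight."""
    for _ in range(steps):
        out = sigmoid(weight)
        gradient = (out - target) * out * (1.0 - out)
        weight -= learning_rate * gradient
    return weight


EARLY_STEPS, LATE_STEPS, RATE = 2000, 20, 1.0  # hypothetical values

# Early experience: the unit learns to respond strongly (target 1).
entrenched = train(0.0, 1.0, EARLY_STEPS, RATE)
before = sigmoid(entrenched)

# Late experience reverses the target (target 0).
# Parametric limit: a separate parameter (here the learning rate) has gone to zero.
parametric_change = abs(sigmoid(train(entrenched, 0.0, LATE_STEPS, 0.0)) - before)
# Structural limit: the learning rate is unchanged, but the entrenched weight
# places the unit on the flat part of the sigmoid, so each update is small.
structural_change = abs(sigmoid(train(entrenched, 0.0, LATE_STEPS, RATE)) - before)
# Baseline: a naive unit given the same late experience.
naive_change = abs(sigmoid(train(0.0, 0.0, LATE_STEPS, RATE)) - sigmoid(0.0))

print(parametric_change, structural_change, naive_change)
```

In this toy case the parametric learner does not change at all, whereas the structurally limited learner still changes, only far less than a naive learner given the same experience.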
It is important to note, in this respect, that one can artificially induce a
parametric period for birdsong via lesions of the AFP. It is clear that the same
brain structures which underlie initial song acquisition are also involved in song
plasticity during adulthood. Indeed, one early model of the sensitive period
hypothesized that the role of these nuclei in song behavior slowly declined as
a result of maturation. The closure of the sensitive period marked the end of
their involvement in song behavior (e.g., Bottjer et al., 1984). If something like
this were in fact the case, it would represent an example of parametric age-
limited plasticity: the parameter in question would be the functioning of the
AFP (perhaps mediated by its relative size, which changes dramatically during
the typical sensitive period).
Similarly, we can give some more shape to the distinction between
parametric and structural limits on plasticity by imagining how the critical period for
ODCs might be different if it were a structural critical period. Imagine that the
mechanism by which the initial map was established played some role in
limiting plasticity. For example, one might imagine that during the critical period,
many of the thalamocortical synapses in V1 were silent synapses.
Retinal waves or endogenously generated impulses from the LGN might result in
these silent synapses being converted to active synapses with AMPA receptors.
At this point, the stage would be set for a simple Hebbian mechanism to institute
Moore’s Law: Strong synapses (representing correlated inputs from a particular
eye or layer of LGN) would get stronger, whereas weak synapses would get weaker,
until the weak synapses were too weak to drive any activity, posing a functional
limit on plasticity. Indeed, something like this model was popular until evidence
began to appear that the initial establishment of ODCs is mechanistically distinct
from the mechanisms that regulate critical period plasticity.
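The imagined "strong get stronger" scenario can be sketched as a Hebbian rule with competition between two inputs. This illustrates only the hypothetical structural account entertained above, which the evidence does not support for ODCs. The subtractive normalization, the activity threshold, the learning rate, and the strictly alternating input schedule are all assumptions introduced for the sketch.

```python
def hebbian_competition(w_left, w_right, learning_rate=0.1, steps=200, threshold=0.05):
    """Alternate input from two eyes onto one cortical unit and return final weights.

    Each step applies a Hebbian increment (pre * post) to the active eye's
    synapse, then subtracts the same total from both synapses equally so that
    the summed weight is conserved (subtractive normalization).
    """
    weights = [w_left, w_right]
    for step in range(steps):
        eye = step % 2  # 0 = left eye active, 1 = right eye active
        # A synapse below threshold is too weak to drive the postsynaptic unit.
        post = weights[eye] if weights[eye] >= threshold else 0.0
        growth = learning_rate * post  # presynaptic activity is 1 for the active eye
        weights[eye] += growth / 2
        weights[1 - eye] -= growth / 2
        # Keep weights within the bounds [0, 1].
        weights = [min(max(w, 0.0), 1.0) for w in weights]
    return weights


left, right = hebbian_competition(0.55, 0.45)
print(left, right)
```

A small initial bias toward one eye is amplified until the other eye's synapse falls below threshold, after which it can no longer drive activity and so cannot recover. The limit on plasticity here is produced by the same rule that built the pattern, which is what would make such a critical period structural.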
Language offers a further opportunity to play “assume the opposite.” Take
the version of a parametric sensitive period that has the most empirical support:
Lenneberg (1967) proposed the “practice” hypothesis, whereby exposure to any
natural language during a sensitive period results in the construction of a neural
circuit that, while tuned by a particular language, is sufficiently general to
support acquisition of other natural languages. This view depends on the premise
that all natural languages are sufficiently alike that learning one of them allows
generalization to others. Striking evidence for this hypothesis has been provided
by Mayberry et al. (2002a). They showed that people first exposed to American
Sign Language in adulthood ultimately signed more fluently and grammatically
if they had already learned another natural language than did deaf individuals who
had no previous exposure to sign language.
These data can also be accounted for by a structural explanation, however.
This is because most deaf children who do not learn a structured sign language
until later in life spontaneously develop systematic “homesign” systems they use
in communicating with their caretakers (e.g., Goldin-Meadow & Mylander, 1998).
While these homesign systems can be characterized as language-like in some ways
(Goldin-Meadow & Mylander, 1998), they tend to have small, iconic lexica and
what can only be described as highly impoverished syntax or morphology. If
such a system became sufficiently entrenched, it would result in greater difficulty
comprehending and producing the complex morphosyntax of ASL than learning
a spoken language would.
Furthermore, while both the parametric and structural hypotheses can
account for the data from late learners of ASL, only the structural hypothesis
directly predicts that which language(s) one already knows should have a large
impact on how readily various aspects of a second language can be acquired in
adulthood. If there were a parametric sensitive period for second language
acquisition, as suggested by Johnson and Newport (1989), AOA effects should be
independent of the language background. But this is clearly not the case. Spanish
speakers can achieve native proficiency as measured by the test used in Johnson
and Newport (1989) until their mid-twenties (Birdsong & Molis, 2002), in sharp
contrast with data from the initial study, which proposed a discrete cutoff at 15
years of age (but see Bialystok & Hakuta, 1994; Flege et al., 1999).
The fact that many of these mechanisms appear to have both maturationally-
defined and experience-dependent properties suggests that considering sensitive
periods in terms of “nature vs. nurture” is unlikely to be as productive as looking
for explanatory frameworks that cut across this divide. I have proposed one:
“parametric” vs. “structural,” which I have found to be useful in organizing the
data reviewed here. Others are certainly possible, and may indeed be preferable.
The parametric vs. structural distinction is useful because it allows us to
distinguish two very different kinds of phenomena. Parametric limits on plasticity
have the effect of “freezing” particular patterns in place. Once a certain point in
development has been reached (as a result of the combined effects of maturation
and experience) a parameter of the system reaches a critical value, and no further
modification can be made. Structural limits on plasticity are more dynamic.
Entrenchment phenomena such as those observed in the formation of multimodal
maps in owls, song in passerine birds and language in humans are the result of the
increasing specificity of representations that develop to support adult perception
and behavior.
Chapter 2
Age-limited Learning Effects in Reading I: Modeling
Many studies of word reading have examined how stimulus properties such as
frequency, length, spelling-sound consistency, and imageability affect performance
(see Balota, 1994; Seidenberg, 1995, for reviews).¹ Over the past several years
another factor, age of acquisition (AoA), has drawn considerable attention
(Morrison & Ellis, 1995; Gerhand & Barry, 1998, 1999b, 1999a). The basic idea is
that the age at which a word is learned in acquiring spoken language affects the
performance of skilled readers. People learn words such as TOP and SYRUP
before words such as TAX and SYRAH. As operationalized in recent studies,
the AoA hypothesis is that there will be an effect of this early learning on adult
performance when other factors such as frequency of usage in adult language are
controlled.
¹ This chapter was originally published as: Zevin, J. D. & Seidenberg, M. S. (2002). Age of
acquisition effects in word reading and other tasks. Journal of Memory and Language, 47, 1-29.
The existence of an AoA effect on word reading would be consistent with
evidence concerning other types of age-dependent learning (Doupe & Kuhl, 1999;
Quartz & Sejnowski, 1997). In many cognitive domains, early learning results in a
reduction in plasticity that limits the ability to acquire new information.
Phonological acquisition provides a classic example (Werker & Tees, 1984): learning the
phonological structure of one’s language limits the ability to learn new phonetic
contrasts (e.g., in a second language). Similarly, there is evidence that the
ability to learn the morphology and syntax of a language drops monotonically after
approximately seven years of age (although this is controversial; see Flege et al.,
1999). Lexical acquisition is not thought to be highly age-dependent (Markson &
Bloom, 1997; McCandliss et al., 1997); still it is possible that early-learned words
have an advantage over later-learned words, and that this would carry over to
how they are read.
The ages at which people learned particular words are unknown, of course, but
can be estimated from other measures. For example, Gilhooly and Logie (1980)
collected subjective ratings of AoA, familiarity, imageability and concreteness
for nearly two thousand words. These norms have been widely used in studies
of effects of AoA on several tasks including tachistoscopic identification (Lyons,
Teer, & Rubenstein, 1978), word naming (Brown & Watson, 1987b; Coltheart,
Laxon, & Keating, 1988) and object naming (Carroll & White, 1973; Ellis &
Morrison, 1998) and with neurologically impaired patients (Hirsh & Ellis, 1994;
Hodgson & Ellis, 1998; Lambon Ralph, Graham, Ellis, & Hodges, 1998). The
Gilhooly and Logie (1980) data were obtained from 36 adult subjects; the AoA
ratings also correlate significantly with independent measures of AoA (Gilhooly &
Gilhooly, 1980; Lyons et al., 1978; Morrison, Ellis, & Chappell, 1997) suggesting
that they provide reliable information.
Given estimates of the frequencies with which words occur in adult usage and
when words were acquired, it seems natural to consider whether the two factors
have independent effects on skilled performance. Morrison and Ellis (1995)
orthogonally manipulated AoA and frequency in naming and lexical decision tasks,
and found a strong AoA effect with frequency controlled, but no frequency
effect with AoA controlled. They also observed that AoA and frequency had been
confounded in previous studies, raising the possibility that effects attributed to
frequency might have been due to AoA. Subsequent studies (Gerhand & Barry,
1998, 1999a, 1999b) replicated Morrison and Ellis’ AoA effect with frequency
controlled, but contrary to the earlier results, significant effects of frequency were
observed with AoA controlled. Nonetheless, the finding that AoA affects
performance independent of frequency seems to present a challenge for models of
word reading (e.g., Coltheart, Curtis, Atkins, & Haller, 1993; Plaut,
McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989) that do
not explicitly take this factor into account.
The research described below was motivated by empirical and theoretical
considerations that led us to examine more closely whether age of acquisition has
an effect on skilled reading. On the empirical side, the concern was that it might
be difficult to isolate effects of age of acquisition because it is correlated with
many stimulus properties, including frequency. Below we present analyses of the
materials used in previous studies and other data which suggest that the evidence
for an effect of AoA on skilled reading is weak at best. On the theoretical side, we
were interested in developing a better account of why age of acquisition could have
an effect on skilled reading or other tasks. Many previous studies have employed
a bottom-up strategy in which AoA is treated as a factor, like frequency or length,
that might account for independent variance in adult performance. However, AoA
needs to be understood in terms of a theory that addresses why some words are
learned earlier than others, and how early experience affects later performance.
Such a theory would clarify the relationship between the AoA measure and other
factors that affect word learning and skilled performance, and provide a stronger
basis for generating predictions about the role of age of acquisition in reading
and other tasks.
After examining existing studies of AoA effects in reading, we describe
investigations of these effects using a computational model of the mapping from
orthography to phonology (Harm & Seidenberg, 1999). Modeling was useful for
several reasons. First, it allows direct manipulations of the frequency and timing
of exposures to words using stimuli that are exactly controlled with respect to
properties (such as frequency and length) that are normally highly confounded.
Second, such models embody an explicit theory of reading acquisition and skilled
processing in which the roles of frequency and timing of exposure can be
examined. Finally, previous analyses of the behavior of such models suggest a possible
computational basis for age of acquisition effects. In some models, the
“entrenchment” of early-learned items has an effect on later performance (Ellis & Lambon
Ralph, 2000; Munro, 1986). Thus, connectionist models are consistent with the
existence of age of acquisition effects; our research addresses the conditions under
which such effects occur and how they relate to the conditions that govern
reading. We focused on the mapping between orthography and phonology because it
plays an important role in the naming and lexical decision tasks that have been
used to study AoA effects in reading.
To foreshadow the results, the simulations yielded two complementary
findings. Simulations using a large corpus of English words yielded no effects of AoA
on skilled performance. There was an initial advantage for words that were
presented more often early in training, but there was no residual effect on skilled
performance. This occurred because the regularities in the mapping between
orthography and phonology that exist across words in English reduce the effects
of early exposure to individual items. These results, taken with the analyses
of previous behavioral studies, suggest that age of acquisition effects in word
reading are likely to be minimal, with other properties that are correlated with
AoA controlled. However, a significant age of acquisition effect was observed in a
simulation in which early and late learned words were chosen so that they
overlapped little in terms of orthographic or phonological structure. This artificial
condition, which is not characteristic of reading acquisition, yielded an advantage
for early-learned words in skilled performance with other factors controlled.
The simulations suggest that the occurrence of age of acquisition effects
depends on the nature of the learning task, specifically whether what is learned
about one pattern carries over to others with which it shares structure. Thus,
we observed the effect in a simulation using materials that explicitly eliminated
the overlap between early- and late-learned patterns, but not when the stimulus
patterns exhibited the regularities in the correspondences between spelling and
sound that are characteristic of the English writing system. This analysis also
extends to the simulations reported by Ellis and Lambon Ralph (2000), Smith et al.
(2001), and Monaghan and Ellis (2002), who observed robust age of acquisition
effects using materials and tasks that differ from reading in important respects,
discussed below. Thus both the modeling and the analysis of existing behavioral
studies suggest that age of acquisition has little impact on skilled reading. At
the same time, the modeling also suggests that such effects may occur for other
tasks such as learning the names associated with objects or faces, for which the
learning of one pattern carries little information about others. The full range of
effects can be explained in terms of basic properties of learning in connectionist
networks employing distributed representations. Such networks provide deeper
insight about how early experience affects later performance.
2.1 Previous Studies
Two strategies have been used in previous studies of AoA effects in word reading.
One is to conduct experiments in which AoA and frequency are manipulated
factorially. The other is to use multiple regression to show that AoA accounts
for unique variance in predicting response latencies or proportions of errors. We
consider these in turn.
Morrison and Ellis (1995) conducted the first experiments factorially
manipulating AoA and frequency in word reading tasks. Their stimuli were equated
across conditions in terms of mean Kucera and Francis (1967) frequency, and
other variables (e.g., imageability, length in letters, the N measure (Coltheart,
Davelaar, Jonasson, & Besner, 1977)) but varied significantly in terms of rated
AoA. This study and subsequent ones using similar methods (Gerhand & Barry,
1999b, 1999a, 1998; Monaghan & Ellis, 2002; Turner, Valentine, & Ellis, 1998)
yielded effects of AoA with such stimuli.
Table 2.1: Properties of the Stimuli Used in Previous Studies of Effects of Age
of Acquisition and Frequency

Study                         Condition   KF   log(KF)  Celex  log(Celex)  WFG     log(WFG)  FAM
Morrison & Ellis (1995)       Early       23   2.63     512    5.78        477     5.62      5.62
                              Late        24   2.63     301    4.82        107     3.32      4.10
                              Difference  -1   0        211    .96**       370**   2.30***   1.52***
Gerhand & Barry               Early       105  3.01     1986   5.91        2164    5.41      5.35
(1998, 1999a, 1999b)          Late        75   3.15     881    5.50        306     3.61      4.62
                              Difference  30   -.14     1105   .41         1858†   1.80*     .73**
Turner et al. (1998)          Early       52   3.24     555    5.51        2184    6.90      5.69
                              Late        50   2.86     309    4.63        1274    6.13      4.97
                              Difference  2    .38      246    0.88**      910     0.77*     0.72***
Monaghan & Ellis (in press),  Early       35   2.63     654    5.56        411     5.20      NA
Inconsistent Words            Late        25   2.30     420    4.88        141     3.36      NA
                              Difference  10   .33      234    .68         270*    1.84**    NA
Monaghan & Ellis (in press),  Early       33   2.14     672    4.97        469     4.31      4.97
Consistent Words              Late        29   2.07     496    4.93        199     3.76      4.55
                              Difference  4    .07      176    .03         270     .65       .42

Note: In all cases, stimuli were matched using Kucera and Francis (1967). Turner
et al. (1998) also matched their items on spoken frequencies from Baayen,
Piepenbrock, and van Rijn (1993). WFG = Zeno (1995); KF = Kucera and Francis
(1967); Celex = written English frequencies from Baayen et al. (1993); FAM =
Familiarity from Gilhooly and Logie (1980). † = p < .1; * = p < .05; ** = p < .01;
*** = p < .001. NA = Familiarity ratings were not available for most of the
Inconsistent items in Monaghan and Ellis (2002).
These studies raise concerns about whether stimulus frequencies were equated
across conditions as the designs of these experiments required. Properties of
words such as length in letters are objective and therefore easy to manipulate or
control across conditions. In contrast, the frequency counts derived from corpora
such as Kucera and Francis (1967) are statistics: estimates of a variable (how
often a word is used) whose actual values are unknown. Like other statistics, frequency
counts are associated with measurement error arising from factors such
as the size of the corpus, the sample of texts used in generating the corpus, and
individual differences in language experience. These sources of error can complicate
the interpretation of frequency effects in behavioral studies (Gernsbacher,
1984).
One problem is that the widely used Brown corpus (from which the Kucera
& Francis, 1967, norms are derived) is relatively small, which introduces considerable
error in the estimates for individual words, particularly in the lower
frequency range. Table 2.1 provides frequency data for the stimuli used in previous
age of acquisition studies, derived from Kucera and Francis (1967) and two
other sources, the Educator’s Word Frequency Guide (WFG; Zeno, 1995) and
Celex (Baayen et al., 1993) databases. Whereas the Brown corpus is about 1
million words, the WFG and Celex corpora are both over 16 million words. The
data also include a measure of rated familiarity (Gilhooly & Logie, 1980), which
Gernsbacher (1984) showed provides a more sensitive measure of frequency differences
among lower frequency words. Morrison and Ellis (1995) equated their
early and late AoA stimuli in terms of Kucera and Francis (1967) frequency, but
as the table indicates, the items differ significantly on the other measures in the
expected direction: early acquired words are also more frequent and familiar.
The early and late stimuli in the Gerhand and Barry studies exhibit a similar
pattern; there are numerical differences between the early and late stimuli on
all measures, and they are significant using log WFG frequency and familiarity.
The materials in the Turner et al. (1998) study also differ such that early words
were higher in frequency (log Celex, log WFG) and rated familiarity than late
words. In a recent study, Monaghan and Ellis (2002) examined age of acquisition
effects for words with consistent or inconsistent spelling-sound correspondences.
They equated the stimuli with respect to frequency estimates derived from both
the Brown and Celex corpora. The stimuli in the inconsistent condition exhibit
small differences in the direction of early words being higher in frequency on all
three measures; using the WFG norms the difference is statistically reliable. For
the consistent items, the differences between the conditions are smaller and nonsignificant
on all three measures. The consistent condition is the only one in the
table in which an age of acquisition effect was not obtained.
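The effect of corpus size on the reliability of counts for low frequency words can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that a word's count in a corpus behaves like a Poisson variable, so that the relative standard error of the count is one over the square root of its expected value. The rate of 5 occurrences per million and the function name are hypothetical; the corpus sizes are the approximate ones cited above.

```python
import math


def relative_standard_error(rate_per_million, corpus_size_in_millions):
    """Relative standard error of a word count under a Poisson assumption.

    The expected count is rate * corpus size, and a Poisson count has a
    standard deviation equal to the square root of its mean, so the
    relative error is 1 / sqrt(expected count).
    """
    expected_count = rate_per_million * corpus_size_in_millions
    return 1.0 / math.sqrt(expected_count)


# Hypothetical low frequency word: 5 occurrences per million words.
small_corpus_error = relative_standard_error(5, 1)    # about 1 million words
large_corpus_error = relative_standard_error(5, 16)   # about 16 million words

print(f"1M-word corpus:  relative error {small_corpus_error:.2f}")
print(f"16M-word corpus: relative error {large_corpus_error:.2f}")
```

Under this assumption a sixteen-fold increase in corpus size reduces the relative error by a factor of four. Differences in the kinds of texts sampled, which the passage also discusses, are not captured by this calculation.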
These cases are similar to the ones studied by Gernsbacher (1984), who showed
that several apparently conflicting findings in the contemporary word recognition
literature could be traced to the relative insensitivity of the Kucera and Francis
frequency norms; stimuli that were apparently equated on this measure differed in
terms of rated familiarity. In the studies in Table 2.1, stimuli that were equated
on the Kucera and Francis (1967) norms differed in rated familiarity and/or
another measure of frequency based on a larger corpus. The inconsistent word
condition in the Monaghan and Ellis study is the least clear case, insofar as the
stimuli did not differ reliably on two frequency measures but did on a third. It
should be noted that the WFG norms appear to provide a sensitive measure of
frequency, however. Table 2.2 presents the correlations among several measures
of frequency and the naming and lexical decision latencies in three large-scale
studies. The Seidenberg and Waters (1989) dataset consists of mean naming
latencies for 3000 words from 30 undergraduate subjects; the Spieler and Balota
(1997) data are naming latencies for 2,906 words from 31 subjects, and the Balota,
Pilotti, and Cortese (2001) data are lexical decision latencies for 2,905 words from
60 subjects (30 young adults and 30 older adults). The correlations between
estimated frequencies and response latencies are highest for the WFG norms,
which also account for unique variance when entered into a simultaneous multiple
regression with the other norms. Below we return to methodological issues about
the use of different frequency norms; here the main point is that the early and
late acquired stimuli in previous studies were not closely matched in frequency
and thus did not provide strong tests of the role of age of acquisition independent
of this factor.2
Some of the studies in Table 2.1 also included conditions in which age of
acquisition was controlled and frequency varied, which yielded a mixed pattern
of results. Morrison and Ellis (1995) found a frequency effect in lexical decision,
but not in naming; age of acquisition effects, in contrast, were found in both
tasks. The fact that there was an AoA effect but not a frequency effect in the
naming task suggested that the AoA effect could not be wholly due to a frequency
2 Another bit of evidence that the age of acquisition effect reported by Monaghan and Ellis
(2002) was due to differences in frequency is reported by Strain, Patterson and Seidenberg
(submitted), who found that using frequency counts derived from either the Celex or WFG
databases as a covariate in the analyses of variance eliminated the age of acquisition effect in
the Monaghan and Ellis (2002) data.
Table 2.2: Various Frequency Measures as Predictors of Naming Latency in Large-
Scale Studies

Study                              Measure  r     Unique Variance (%)
Spieler and Balota, 1997           WFG      -.35  2.39***
                                   FAM      -.32  .82*
                                   CELEX    -.29  .12
                                   KF       -.27  .03
Seidenberg and Waters, 1989        WFG      -.23  .72*
                                   FAM      -.21  .22
                                   CELEX    -.21  .11
                                   KF       -.18  .27
Balota, Pilotti and Cortese, 2001  WFG      -.63  3.97***
                                   FAM      -.62  3.86***
                                   CELEX    -.58  .22
                                   KF       -.51  .80**

Note: * = p < .05; ** = p < .01; *** = p < .001. WFG = Word frequency from Zeno (1995),
FAM = familiarity from Gilhooly and Logie (1980), CELEX = frequency from
Baayen et al. (1993), KF = frequency from Kucera and Francis (1967).
confound. However, this pattern of results did not replicate in a study by Gerhand
and Barry (1998) using the same stimuli; they observed both frequency and age of
acquisition effects in naming. The Morrison and Ellis (1995) data also exhibited
an atypical pattern in which lexical decision latencies were faster than naming
latencies for the same words (cf. Balota & Chumbley, 1984; Forster & Chambers,
1973). In summary, the factorial studies leave open a window of uncertainty as
to whether the observed effects were due to differences in age of acquisition or
frequency.
The second methodology employed in this area involves using multiple regression
to isolate unique variance in response latencies associated with AoA (Brown
& Watson, 1987b; Butler & Hains, 1979; Lyons et al., 1978; Morrison & Ellis,
2000). These studies reported effects of AoA independent of other stimulus properties
including imageability, familiarity and frequency. We conducted a similar
analysis using the data from the three large-scale studies of word naming and
lexical decision mentioned above (Seidenberg & Waters, 1989; Spieler & Balota,
1997; Balota et al., 2001) and found similar results. For 528 of the words in
these studies, there are data concerning both frequency (Zeno, 1995) and AoA
(Gilhooly & Logie, 1980). For all three data sets, AoA and frequency were significantly
correlated with response latencies (Table 2.3); for the Spieler and Balota
(1997) and Balota et al. (2001) data both factors account for unique variance.
Table 2.3: Frequency and Age of Acquisition as Predictors of Naming Latencies

Study                              Measure  r     Unique Variance (%)
Spieler and Balota, 1997           WFG      -.28  2.59***
                                   AoA      .28   2.35***
Seidenberg and Waters, 1989        WFG      -.19  1.52**
                                   AoA      .17   .64†
Balota, Pilotti and Cortese, 2001  WFG      -.49  9.20***
                                   AoA      .44   5.15***

Note: † = p < .1; ** = p < .01; *** = p < .001. WFG = Word frequency from
Zeno (1995)
It is important to avoid making a “correlation is causation” error in interpreting
these data, however, because both AoA and frequency are correlated
with other stimulus properties. To illustrate, Table 2.4 provides the correlations
among AoA, frequency, Coltheart’s N, length in letters, and rated familiarity, imageability,
and concreteness (also from the Gilhooly & Logie, 1980, norms) for the
528 words. These intercorrelations make it difficult to isolate effects due to age
of acquisition per se. Some additional information is provided by assessing the
Table 2.4: Correlations Among 6 Standard Lexical Measures and AoA

Variable  AoA         WFG        IM          FAM       CON         LEN
WFG       -0.5141***
IM        -0.5861***  0.1073*
FAM       -0.6740***  0.7203***  0.2026***
CON       -0.3840***  0.0056     0.8082***   -0.0099
LEN       0.1984***   -0.0666    -0.1483***  -0.0605   -0.1717***
N         -0.1976***  0.1417**   0.1195**    0.1245**  0.1215**    -0.7142***

Note: * = p < .05, ** = p < .01, *** = p < .001. WFG = log Zeno (1995)
frequency; IM = imageability; FAM = familiarity (Gilhooly & Logie, 1980); CON
= concreteness; LEN = number of letters; N = Coltheart’s N.
amount of unique variance associated with frequency and age of acquisition after
the other measures in Table 2.4 have been partialled out (Table 2.5). These results
indicate that whereas frequency accounts for a small but significant amount
of variance, the age of acquisition measure does not.3 These data suggest that,
rather than there being an effect of age of acquisition on skilled performance
independent of other stimulus factors, the ages at which words are learned are
determined by factors such as frequency, length, and imageability. Thus, after
these factors are taken into account, there is no residual effect associated with
the age of acquisition measure.
The results in Table 2.5 differ from those reported by Brown and Watson (1987)
and Morrison et al. (1997), who conducted similar analyses using smaller sets of
3 The amount of unique variance attributed to either variable is surprisingly small. One
factor that may be relevant is that effects of lexical frequency are reduced or eliminated by
exposure to neighboring words. Words that have many neighbors (e.g., consistent ones) do not
show strong frequency effects in naming. Another is that naming is less sensitive to frequency
effects than other tasks because it only measures time to initiate the response; frequency effects
can also show up in things like duration of the whole utterance (Balota & Abrams, 1995) and
in the duration of onsets that contain continuants (Kawamoto, Kello, Jones, & Bame, 1998).
Table 2.5: Unique Variance Accounted for by Frequency and AoA Independent
of Other Lexical Variables

Study                              Measure  Unique Variance (%)
Spieler and Balota, 1997           WFG      1.27**
                                   AoA      .29
Seidenberg and Waters, 1989        WFG      .69*
                                   AoA      .01
Balota, Pilotti and Cortese, 2001  WFG      2.94***
                                   AoA      .34

Note: *** = p < .001; ** = p < .01; * = p < .05; WFG = cumulative frequency
from Zeno (1995), AoA = age of acquisition from Gilhooly and Logie (1980)
words and found significant effects of age of acquisition independent of frequency.
The differing results appear to be related to differences between the WFG norms
and the Brown and CELEX norms used in earlier studies. The WFG norms are
based on a larger sample of texts than the Brown norms and the sample is more
diverse than either the Brown or Celex samples. Like the American Heritage
norms (Carroll et al., 1971), the WFG sample includes texts from a broad range
of reading levels, including books for school-aged children. Each text in the
sample was assigned a grade-level based on a readability formula. Frequency
data are provided for each word at each grade level, ranging from first grade to
college. For the analyses presented above, we used the sum of these frequencies.
The fact that the WFG frequencies correlate more highly with response latencies
than the other norms (Table 2.2) and yield no residual effect of age of acquisition
(Table 2.5) may be related to the inclusion of this broader range of texts.
To examine this issue further, we conducted regression analyses using different
subsets of the WFG corpus. Specifically, we examined how much variance
Table 2.6: Unique Variance Accounted for by AoA with Different Subsections of
the WFG Norms Used as Predictors

                  WFG Subsection
Study  Predictor  2-13+    3-13+    4-13+    5-13+    6-13+    7-13+    8-13+    9-13+
SB     AoA        .36      .41      .44†     .47†     .50†     .54†     .56†     .57†
       Frequency  1.26**   1.17**   1.01**   .85*     .84*     .86*     .78*     .67*
SW     AoA        .04      .04      .06      .07      .08      .10      .10      .12
       Frequency  .98*     .97*     .89*     .83*     .87*     .95*     .91*     .91*
BCP    AoA        .39†     .46†     .52*     .58*     .63*     .68*     .72*     .68*
       Frequency  2.43***  2.22***  2.04***  1.92***  1.97***  2.10***  2.11***  2.18**

Note: † = p < .10; * = p < .05; ** = p < .01; *** = p < .001; WFG = Zeno (1995)
frequency counts; 2-13+ = Grade levels 2 (2nd grade) to 13+ (University) in the
WFG norms. SB = Spieler and Balota (1997); SW = Seidenberg and Waters
(1989); BCP = Balota et al. (2001)
Table 2.7: Correlation Between AoA and WFG Frequency at Different Grade
Levels

Grade Level
1     2     3     4     5     6     7     8     9     10    11    12    13    TOTAL
-.68  -.67  -.63  -.60  -.53  -.50  -.47  -.45  -.43  -.38  -.35  -.31  -.17  -.51

Note: All correlations significant, p < .001.
the WFG and AoA measures accounted for when the data from lower grades
were excluded (Table 2.6). The results for all three of the large-scale behavioral
studies exhibit a consistent pattern: as more of the data from lower grade-levels
are excluded, the amount of residual variance due to frequency decreases while
the amount associated with AoA increases. In two of the three studies, the AoA
effect reaches significance with data from the younger grades excluded, although
the amount of variance accounted for is very small.
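The construction of the frequency predictors for these subset analyses can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the grade-level counts are invented for a single hypothetical word, grade 13 stands in for the 13+ (college) level, and the function name is hypothetical. The only property taken from the text is that the cumulative measure is the sum of the grade-level frequencies and that each subset drops the lowest grades.

```python
def subset_frequency(grade_frequencies, first_grade):
    """Sum grade-level frequencies from first_grade through the highest level."""
    return sum(freq for grade, freq in grade_frequencies.items()
               if grade >= first_grade)


# Hypothetical grade-level counts for one word used mostly in texts for
# younger readers. Keys are grade levels 1 to 13, where 13 stands for 13+.
grade_frequencies = {1: 90, 2: 80, 3: 60, 4: 40, 5: 30, 6: 20, 7: 15,
                     8: 10, 9: 8, 10: 6, 11: 4, 12: 2, 13: 1}

# The cumulative measure sums over every grade level.
cumulative = subset_frequency(grade_frequencies, 1)
print(f"1-13+ (cumulative): {cumulative}")

# The subsets exclude progressively more of the lower grades.
for first_grade in range(2, 10):
    partial = subset_frequency(grade_frequencies, first_grade)
    print(f"{first_grade}-13+: {partial}")
```

For a word of this kind, the estimate shrinks sharply as the lower grades are dropped, so a frequency predictor built from the higher grades alone carries less of the information that rated AoA also reflects.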
One interpretation of these results is that there is a small effect of age of acquisition
on skilled performance which the WFG norms (but not Brown or Celex)
pick up because the corpus included texts for younger readers. Words that are
learned earlier may tend to be used more often in texts that are appropriate for
younger readers. Table 2.7 presents the correlations between rated age of acquisition
and grade-level frequency for the 528 words used in previous analyses; there
are strong negative correlations which decline gradually with age. Thus it could
be argued that the WFG frequency data for the lower grades covertly encode age
of acquisition. On this view, skilled performance is affected by two independent
factors, age of acquisition and frequency of usage in adult language, both of which
are captured by the cumulative WFG frequency measure.
There is a different explanation for these results, however: unlike the Brown or
Celex corpora, the WFG norms provide estimates of the cumulative frequencies
of words, that is, how often they have been encountered over a long period of
time (e.g., since an individual began to read). Cumulative frequency may be a
better predictor of adult performance because it affects how lexical information
is represented in memory (as for example in the connectionist models discussed
below). On this view, age of acquisition norms account for variance in skilled
performance because they index how frequently words were used at younger ages,
information that the Brown and Celex norms do not include. Thus there is an
effect of cumulative frequency on skilled performance, rather than separate effects
of age of acquisition and adult frequency of usage. The WFG norms provide a
reliable estimate of cumulative frequency, leaving no residual effect of age of
acquisition.4
4 It is important to recognize that the grade-level frequency data in the WFG norms are not
literally data concerning the grades (or ages) at which the texts were read. Rather, they reflect
the assignment of texts to grade levels using a formula that weighs factors such as number of
words per sentence and number of syllables per word. On this measure, Charlotte’s Web and
The Old Man and the Sea are both assigned to the 4th grade reading level, for example. Thus,
the data from the lower grade-levels reflect texts that are likely to be read by children at a given
age but also texts of approximately similar structural complexity that are read at older ages.
On our view (supported by the modeling presented below), these norms are relevant because
they provide estimates of the cumulative frequency, rather than the exact timing, of exposures
to words.
In summary, the data in Table 2.1 and the correlational analyses suggest that
the age of acquisition effects observed in previous studies may have been due to
confounds with “adult” frequency (measured by Kucera & Francis and Celex) or
cumulative frequency (assessed by WFG). One difficulty in developing a well-controlled
AoA experiment arises from the strong correlations between AoA and
other lexical variables presented in Table 2.4. These correlations make it difficult
to design factorial experiments in which AoA is varied for a sufficient number of
items with these and other factors controlled. The regression analyses suggest
that AoA may account for a small amount of variance in skilled performance
because it is correlated with how often words are read at younger ages, data that
are not indexed by “adult” norms such as Kucera and Francis (1967) but which
contribute to cumulative frequency of exposure.
2.2 Theoretical Issues
The above discussion addressed some methodological issues that arise in attempting
to isolate age of acquisition effects. The data indicate a need to consider what
statistics such as estimated age of acquisition and frequency measure and how
they relate to the mechanisms that underlie lexical acquisition and processing.
The concept “age at which a word is acquired” seems clear enough and intuitively
different from “frequency of usage in adult language.” However, whereas
frequency norms reflect a property of words (namely, how often they are used),
age of acquisition norms reflect something different, a behavioral event (learning
a word by a certain age). This event is very similar to a task such as naming
aloud: one behavior concerns how long it took to learn a word, the other how
long it takes to pronounce a word. This point is particularly clear with respect
to “objective” measures of AoA (Morrison et al., 1997) obtained by determining
the ages at which children can name pictured objects. Just as studies of word
reading have examined the factors that make some words easier to name than
others, age of acquisition can be considered with respect to the factors that cause
some words to be learned earlier than others.
Among these factors is frequency. In many theories, the frequency with which
a stimulus is practiced or experienced affects how early and well it is learned as
well as skilled performance. If the age at which a word is learned is affected
by how often it is experienced, empirical estimates of AoA may covertly encode
frequency of occurrence during the acquisition period. Moreover, we have also
seen that age of acquisition ratings are correlated with grade-level frequency data
from the WFG norms, including data from higher grades well past the ages at
which the words were acquired. Thus, age of acquisition norms appear to be
related to frequency of occurrence over a multi-year time span beginning with
initial acquisition.
Seen in this light, word frequency, as standardly operationalized using norms
such as Kucera and Francis (1967), provides the remaining chronological data
concerning how often words are experienced in adulthood. These observations
suggest that both age of acquisition and “adult” frequency norms reflect how
often words are encountered but at different points in a developmental continuum
ranging from initial acquisition to adulthood. The WFG norms take matters
one step further, providing estimates about how often words are encountered
at multiple points along this continuum, as well as about cumulative frequency.
Thus, age of acquisition and frequency seem more intrinsically related than recent
discussions have suggested. In effect, studies like the ones in Table 2.1 attempted
to dissociate the effects of frequency of exposure during two widely-spaced time
spans.
2.2.1 Connectionist modeling
Connectionist models of reading that employ distributed representations and
gradual learning from experience provide a theoretical framework for examining
effects of the frequency and timing of learning experiences on performance
(e.g., Harm & Seidenberg, 1999; Plaut et al., 1996; Seidenberg & McClelland,
1989). Such models illustrate three points relevant to the AoA hypothesis. First,
frequency has pervasive effects on network performance, including how quickly
a word is learned (“age of acquisition”) and level of skilled performance. Second,
these effects are intrinsically related. Models such as Seidenberg and McClelland’s
(1989) attempt to provide a unified account of acquisition and skilled
performance in which the same computational principles apply throughout the
developmental continuum. The effects of frequency on learning a word and on
skilled performance are both realized by changes to the weights governing network
performance. Thus the behavior of the system reflects the cumulative effects of
exposure to words over time. Finally, the magnitudes of the effects of frequency
of exposure differ depending on the state of the network, which changes over time
as knowledge is acquired. As the model picks up on the similarities that hold
across words, and as the weights assume values that allow output to be produced
accurately (i.e., minimize error), the effects of pattern frequency decline.
Some properties of these networks favor the idea that there will be an advantage
for words that are learned earlier in training (Ellis & Lambon Ralph, 2000).
(We assume for the remainder of this discussion that stimuli are equated along
other dimensions.) Consider a network such as Seidenberg and McClelland’s in
which weights are initially set to random values and output units take values of
1 or 0. The adjustments to the weights that occur using backpropagation with a
logistic activation function are proportional to the activation of the unit according
to the term a(1 - a), where a is the activation value. The adjustments are
therefore largest when the activations are in the middle of the logistic function
(around .5), as occurs when the network is initialized with small, random weights.
The adjustments become smaller as the weights assume values that cause unit
activations to more closely approximate the target values of 1 or 0. Thus, there is
a loss of plasticity associated with learning the early-trained patterns. In effect,
early-trained patterns become entrenched in the weights (see Munro, 1986, for an
early discussion of this phenomenon). Both Ellis and Lambon Ralph (2000) and
Smith et al. (2001) emphasize these aspects of network behavior in explaining
age of acquisition effects.
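The loss of plasticity described above follows directly from the derivative of the logistic function. The short sketch below evaluates the a(1 - a) term at several net inputs; the function names and the particular net input values are chosen only for illustration and do not come from the simulations.

```python
import math


def logistic(net_input):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-net_input))


def plasticity_factor(activation):
    """The a(1 - a) term that scales weight changes for a logistic unit."""
    return activation * (1.0 - activation)


# As the net input grows, the activation approaches 1 and a(1 - a) shrinks.
for net_input in (0.0, 2.0, 4.0, 6.0):
    a = logistic(net_input)
    factor = plasticity_factor(a)
    print(f"net input {net_input:.1f}: a = {a:.3f}, a(1 - a) = {factor:.4f}")
```

The term reaches its maximum of .25 at an activation of .5 and falls toward zero as the activation approaches either target value, so the same error signal produces a much smaller weight change once a unit has been driven close to 0 or 1.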
There is another factor to consider, however: the effects of similarities across
training patterns. The mapping between spelling and sound in English exhibits
considerable systematicity. Reading models such as Seidenberg and McClelland’s
employed representations that allowed the weights to encode these regularities.
Thus what is learned about one word carries over to other words with which
it shares structure. This property modulates the effects of exposure to a given
word. Until the model begins to encode the systematic aspects of the mapping,
performance on a pattern is highly dependent on how often it is trained. By later
in training the weights reflect the structure of the entire training set, changing
its behavior. Once a word is learned, additional repetitions have little impact,
creating a discrepancy between frequency of training and network performance.
Furthermore, new words can be learned with little training if they share structure
with known words. In the limit a new word can be pronounced correctly with
no training, as in nonword generalization. Thus, there is an initial advantage for
words that are trained with high frequency, but as the model learns there is less
and less of a disadvantage for later-trained items. In effect the entrenchment of
early-learned words is reduced as the model picks up on patterns that hold across
words (see also Marchman & Bates, 1994).
In summary, the entrenchment phenomenon in connectionist networks provides
a basis for age of acquisition effects, but other properties of the task and
materials to be learned will affect whether there is a long-lasting effect on performance,
as the age of acquisition hypothesis suggests.
Using this theoretical framework, the issue of AoA effects in reading can be
clarified by considering two factors, cumulative frequency and frequency trajectory.
Cumulative frequency refers to how often a word is presented to the network
from the beginning to the end of training. This is a simplified analogue of how
often people have encountered a word to the point at which performance is assessed.
Frequency trajectory refers to how experience with a word is distributed
over time. Thus, a given cumulative frequency can be associated with different
trajectories.
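The distinction between the two factors can be illustrated with a toy example. The per-epoch values below are hypothetical and are not the frequencies used in the simulations; they show only that two words can receive the same total number of presentations while differing in when those presentations occur.

```python
# Hypothetical per-epoch presentation frequencies over five training epochs.
early_trajectory = [9, 7, 5, 3, 1]                   # experience concentrated early
late_trajectory = list(reversed(early_trajectory))   # the mirror image


def cumulative_frequency(trajectory):
    """Total number of presentations over the whole of training."""
    return sum(trajectory)


def running_totals(trajectory):
    """Number of presentations accumulated by the end of each epoch."""
    totals, total = [], 0
    for frequency in trajectory:
        total += frequency
        totals.append(total)
    return totals


print("cumulative:", cumulative_frequency(early_trajectory),
      cumulative_frequency(late_trajectory))
print("early running totals:", running_totals(early_trajectory))
print("late running totals: ", running_totals(late_trajectory))
```

The two words end training with identical cumulative frequencies, but partway through training the early word has been presented far more often, which is the contrast the AoA hypothesis concerns.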
The AoA hypothesis, then, is the prediction that frequency trajectory has an
effect on adult performance independent of cumulative frequency. Specifically,
if the cumulative frequencies of words (as well as other stimulus properties) are
equated, words for which most of the training occurs early should show an advantage
over words with other trajectories. Words that are trained more often
early in development will in general be learned earlier than words that are mainly
trained later; thus frequency has an effect on age of acquisition. However, the
age of acquisition hypothesis is that there will be a further effect of this early
experience on skilled performance.
A measure such as Kucera and Francis (1967) frequency provides a poor estimate
of cumulative frequency. Given the nature of the texts used to generate the
corpus, it tends to underestimate the frequencies of many low frequency words,
including ones that are mainly experienced in childhood. The WFG norms probably
provide better information about cumulative frequency, but this is difficult
to independently assess. Age of acquisition norms, in contrast, provide imperfect
information about frequency trajectory because some words that are learned
early (e.g., BOTTLE, CUP) are also used frequently later in life whereas others
(e.g., TEDDY, BOOTIE) are not.
Because the actual cumulative frequencies and frequency trajectories of different
words are not known, and because frequency norms and rated AoA provide
imperfect estimates, we took the approach of using simulation modeling to explore
the phenomena. Simulation also allowed control over stimulus properties
that are normally confounded. Thus we could create conditions in which it was
certain that cumulative frequency and stimulus properties were closely matched,
while manipulating frequency trajectory, providing a strong test of the age of
acquisition hypothesis.
2.3 Simulation 1
In the first simulation, a model was trained on a large corpus of words using
the standard technique of probabilistically presenting words during training as a
function of their estimated frequencies of occurrence (Seidenberg & McClelland,
1989). The critical data concern a subset of items for which we manipulated
frequency trajectory while keeping cumulative frequency constant. Some of these
words were more frequent early in training compared to later (Early condition),
whereas other words followed the complementary trajectory (Late condition). By
the end of training, however, cumulative frequencies of words in the two conditions
were the same. In addition, the same words appeared in both Early and Late
conditions across different runs of the model.
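Probabilistic presentation of words as a function of frequency can be sketched as weighted random sampling. The example below is a simplified illustration, not the training procedure itself: it assumes that the sampling probability is directly proportional to frequency, whereas the text states only that presentation was a function of estimated frequency. The words, the frequency values, and the function name are hypothetical.

```python
import random
from collections import Counter


def sample_training_words(frequencies, n_trials, seed=0):
    """Draw n_trials words with probability proportional to frequency."""
    rng = random.Random(seed)
    words = list(frequencies)
    weights = [frequencies[word] for word in words]
    return rng.choices(words, weights=weights, k=n_trials)


# Hypothetical frequencies for three words.
frequencies = {"THE": 1000, "MIST": 50, "FOIST": 1}

presentations = Counter(sample_training_words(frequencies, 10_000))
for word in frequencies:
    print(word, presentations[word])
```

Changing a word's entry in the frequency table between epochs, while leaving the sampling scheme untouched, is one way a frequency trajectory could be imposed on the critical items.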
This model differs from previous models of age of acquisition effects in an important
way: the task was closely related to the problem of learning the spelling-sound
correspondences of English, information that plays an important role in
the naming and lexical decision tasks used in the behavioral studies discussed
above. The input and output representations were based on English orthography
and phonology and the training corpus, a large set of monosyllabic words,
instantiated the quasiregular mappings between the two (Seidenberg & McClelland,
1989). Previous simulations have utilized more artificial tasks and stimuli
that did not capture this rich structure (discussed further below). Simulation
1 therefore provides more direct evidence concerning the occurrence of age of
acquisition effects in reading.
2.3.1 Methods
2.3.1.1 Architecture
The basic architecture shown in Figure 2.1 was used in all simulations. For Simulations
1 and 2, models with 100 orthographic (input) units, 250 phonological
(output) units and 100 hidden units were used. In addition, the phonological
layer had 20 hidden units which mediated connections between this layer and
itself (cleanup units; Hinton & Shallice, 1991). The cleanup units differentiate
this model from a simple feedforward net such as the one studied by Seidenberg
and McClelland (1989). The network is given an input pattern and activation
spreads through the network over a series of time steps. Each unit propagates
activation to the other units to which it is connected. The feedback connections
between the phonological and cleanup units create a type of dynamical system
called an attractor network which settles into a stable pattern over time (see
Harm & Seidenberg, 1999, for additional details). A further feature of the model
was that each time step was discretized into a series of moments, which allows a
unit’s activation to ramp up gradually. Thus the learning algorithm (continuous
recurrent backpropagation) changes the weights in ways that improve accuracy
but also how quickly the network produces the correct output (see Harm, 1998;
Bishop, 1995, for discussion).
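The gradual ramping of activation can be illustrated with a minimal sketch. It assumes a simple update in which a unit moves a fixed fraction of the way toward its logistic asymptote at each moment; the update rule, the `rate` parameter, and the constant net input are illustrative assumptions, not details taken from the model described here.

```python
import math

def logistic(x):
    """Standard logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-x))

def ramp_activation(net_input, n_moments, rate=0.25, start=0.0):
    """Return the activation after each moment for a constant net input.

    Assumption: at every moment the unit closes a fixed fraction (`rate`)
    of the gap between its current activation and logistic(net_input).
    """
    target = logistic(net_input)
    activation = start
    history = []
    for _ in range(n_moments):
        activation += rate * (target - activation)
        history.append(activation)
    return history

# Hypothetical values: net input of 3.0 followed over 12 moments.
trace = ramp_activation(net_input=3.0, n_moments=12)
```

Because the output approaches its asymptote gradually, an error measure taken at a fixed point in time is smaller for a unit that gets there sooner, which is one way to see why training can reward speed as well as accuracy.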
[Figure: network diagram with four layers labeled Cleanup Units, Phonological Units, Hidden Units, and Orthographic Input Units]
Figure 2.1: Model architecture used in all simulations
2.3.1.2 Corpus and Training
The training corpus consisted of 2,891 monosyllabic, monomorphemic words. 108
of these words were critical items whose frequencies were manipulated, as detailed
below. The remaining 2,783 words (background items) were assigned frequencies
taken from the Marcus, Santorini, and Marcinkiewicz (1993) norms, which are
based on 43 million tokens from The Wall Street Journal.5
The critical items were divided into two lists of 54. Sets of 4 items were created
by exchanging onsets and rimes. The lists were counterbalanced such that, for
example, FOIST and MIST occurred on one list and FIST and MOIST on the
other. Thus each list contained each onset and rime in the quadruple, but in
different combinations. The model was run ten times with different initial random
weights (between -0.1 and 0.1), analogous to replications with different subjects.
Each list occurred five times in each trajectory. Thus the same items occurred
in both Early and Late conditions across simulations. The data presented below
are averages across the 10 runs of the model.
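The counterbalancing scheme can be sketched as follows. The function names are hypothetical, and the order in which runs receive each assignment is an assumption; the text states only that each list occurred five times in each trajectory.

```python
def make_quadruple(onset_a, onset_b, rime_a, rime_b):
    """Cross two onsets with two rimes and split the four words into two lists.

    Each list contains both onsets and both rimes, but in different
    combinations.
    """
    list_1 = [onset_a + rime_b, onset_b + rime_a]
    list_2 = [onset_a + rime_a, onset_b + rime_b]
    return list_1, list_2

def assign_conditions(n_runs=10):
    """Give each list the Early trajectory in half of the runs.

    Assumption: the first half of the runs use list_1 as Early.
    """
    plan = []
    for run in range(n_runs):
        if run < n_runs // 2:
            plan.append({"Early": "list_1", "Late": "list_2"})
        else:
            plan.append({"Early": "list_2", "Late": "list_1"})
    return plan

list_1, list_2 = make_quadruple("F", "M", "IST", "OIST")
plan = assign_conditions()
```

With the example from the text, `list_1` holds FOIST and MIST and `list_2` holds FIST and MOIST, so every item serves in both the Early and the Late condition across runs.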
The Early and Late trajectories were designed to provide a strong test of
the effects of early exposure on later performance; they were not intended to
capture the observed trajectories for individual words, which are more variable.
The frequencies of the words in the Early and Late conditions were manipulated
as follows. Training consisted of ten epochs of 100,000 trials each. Early items
were assigned a frequency of 1000 for the first three epochs of 100,000 training
trials. For the next four epochs the frequency was adjusted to 500, 100, 50 and
10 in succession. Finally, for the last three epochs the frequency was set to one.
5 The Wall Street Journal corpus has been extensively used in sentence processing research
and at the time we began this research it was the largest available corpus of English. The lexical
sample is somewhat skewed insofar as words such as STOCK, MARGIN, and INFLATION are
overrepresented compared to other corpora. In our simulations, the norms were only used
to insure that the background items in the training set were presented with a distribution of
frequencies similar to that seen in natural language. When the goal is to examine the effects
of frequency on individual words, other norms such as Zeno (1995) are preferable.
The trajectory in the Late condition was the complement of the one in the Early
condition. Late items started at a frequency of 1 for the first 3 epochs, frequency
was adjusted to 10, 50, 100 and 500 over the next 4 epochs, and it finally reached
1000 for the last three epochs. These frequencies are within the range of the
raw Marcus et al. (1993) frequencies used for the background items. As with
the frequencies used for the noncritical words, these assigned frequencies were
square-root transformed and items were sampled probabilistically. This method
of compressing the frequency distribution allows the model to learn very low
frequency items after a relatively small number of trials (Plaut et al., 1996). The
actual frequencies with which the critical items were presented to the model at
each epoch are given in Figure 2. The mean frequency for Early items in the
first epoch was 41 and the mean frequency of Late items in this same epoch was
4. Frequencies were adjusted over time such that in the last epoch, the Late
items had a mean frequency of 40 and the Early items had a mean frequency
of 4. Importantly, by the end of training the Early and Late words had been
trained equally often: the cumulative frequencies averaged across items were 198
for Early words and 196 for the Late words, t(107) < 1.
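The trajectory manipulation and the frequency-weighted sampling can be sketched as follows. The raw frequencies per epoch and the square-root compression come from the description above; the function names, the toy three-word lexicon, and the background frequency of 100 are invented for illustration.

```python
import math
import random

# Raw frequencies per epoch, as given in the text (ten epochs).
EARLY = [1000, 1000, 1000, 500, 100, 50, 10, 1, 1, 1]
LATE = list(reversed(EARLY))  # the Late trajectory is the complement

def compress(raw_frequency):
    """Square-root compression of a raw frequency."""
    return math.sqrt(raw_frequency)

def sample_word(words, raw_frequencies, rng):
    """Pick one word with probability proportional to its compressed frequency."""
    weights = [compress(f) for f in raw_frequencies]
    return rng.choices(words, weights=weights, k=1)[0]

# Toy lexicon (hypothetical): one Early item, one Late item, and one
# background item whose raw frequency of 100 is made up for this sketch.
rng = random.Random(0)
epoch = 0
words = ["FIST", "MOIST", "BACKGROUND"]
freqs = [EARLY[epoch], LATE[epoch], 100]
draws = [sample_word(words, freqs, rng) for _ in range(1000)]

# Summed over the ten epochs, the compressed weights of the two
# trajectories are equal, because one is the reverse of the other.
early_total = sum(compress(f) for f in EARLY)
late_total = sum(compress(f) for f in LATE)
```

Since selection is probabilistic, the realized presentation counts vary around their expected values from run to run.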
On each training trial, a word was probabilistically selected for training and
its orthographic pattern was activated on the input units. Activation propagated
forward for 11 time ticks. On the 12th time tick, error was computed and the
weights of the model adjusted accordingly. The learning algorithm computes
error on the basis of the difference between the desired and observed output at
a given time tick, as well as the state of the model at earlier time ticks. In this
[Figure: Early and Late items plotted against Epochs (100k each)]
Figure 2.2: Frequency trajectories of critical items in Simulation 1
[Figure: Early and Late items plotted against Epochs (100k each)]
Figure 2.3: Performance over time for critical items in Simulation 1
way, each adjustment of the weights leads to incrementally more accurate as well
as faster computation of the desired output.
2.3.2 Results and Discussion
The model’s performance was assessed using both accuracy and sum squared
error (SSE) measures. The model’s output for a word was scored as correct
if the output for each phoneme was closer to the correct phoneme than any
other by euclidean distance. The SSE measure was the sum of the squared
differences between the computed output and the target. The two measures
are highly related; correct words produce lower error scores than incorrect words.
However, among the correct words, differences in SSE reflect the relative difficulty
of generating a response (see, e.g., Seidenberg & McClelland, 1989). Thus, the
model’s performance can continue to improve after it has learned to produce the
correct response, as in human performance.
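The two scoring measures can be sketched as follows. The three-feature phoneme vectors are a hypothetical toy inventory, not the model's phonological representation, and exact ties in distance (which the strict "closer than any other" criterion would count as errors) are not handled specially here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_phoneme(output_slot, inventory):
    """Name of the phoneme whose feature vector is closest to the output."""
    return min(inventory, key=lambda name: euclidean(output_slot, inventory[name]))

def word_correct(output_slots, target_names, inventory):
    """A word is correct if every slot is nearest to its target phoneme."""
    return all(
        nearest_phoneme(slot, inventory) == name
        for slot, name in zip(output_slots, target_names)
    )

def sum_squared_error(output_slots, target_names, inventory):
    """Sum of squared differences between the output and the target features."""
    return sum(
        (o - t) ** 2
        for slot, name in zip(output_slots, target_names)
        for o, t in zip(slot, inventory[name])
    )

# Hypothetical toy inventory and output for a two-phoneme target.
inventory = {"p": [1.0, 0.0, 0.0], "b": [1.0, 1.0, 0.0], "a": [0.0, 0.0, 1.0]}
output = [[0.9, 0.2, 0.1], [0.1, 0.0, 0.8]]
target = ["p", "a"]

correct = word_correct(output, target, inventory)
sse = sum_squared_error(output, target, inventory)
```

In this toy case the word is scored as correct while the SSE is still above zero, which illustrates how error can keep decreasing after accuracy has reached ceiling.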
At the end of training, the model produced correct output for 98% of the
training set. Errors were almost all on low frequency strange words such as COUP,
PLAID and RHEUM, which are thought to require input from the orthography
→ semantics → phonology pathway that was not implemented here (Plaut et al.,
1996; Strain, Patterson, & Seidenberg, 1995; Harm & Seidenberg, in press).
For the smaller set of critical words, the model learned to produce correct
output for all items within the first epoch. Mean sum squared error for these
items was calculated after each epoch. As shown in Figure 3, there was a small
effect of frequency early in training which rapidly disappeared. T-tests on the
difference between the means in the Early and Late conditions confirmed this:
Error scores were significantly lower for Early words compared to Late after the
first epoch, t(107) = 4.24, p < .001, and this effect remained significant after
5 epochs, t(107) = 2.09, p < .05. By epoch 6, when the frequency trajectories
began to cross, the effect was nonsignificant, t(107) = 1.12, p > .1. At the end of
training, when the cumulative frequency of the two groups was closely matched,
there was also no reliable difference between conditions; in fact the means were
identical, .50. At this point all critical items were still pronounced correctly.
The first simulation indicates that with stimulus properties equated, there is
an effect of frequency trajectory early in training, but this effect rapidly recedes.
By the end of training, when the cumulative frequencies are equated, there is no
residual effect. Early in training, before much learning has occurred, performance
is better on words that are trained more often. This is simply a frequency effect
during the early phase. As training continues, performance in the two conditions
converges to the same level.
2.4 Simulation 2
Simulation 2 was a replication of the first simulation that addressed two concerns.
First, effects of the frequency trajectory manipulation might have been
difficult to detect because the critical stimuli all contained spelling patterns with
consistent spelling-sound correspondences. In addition, the stimuli were constructed
in quadruples such as FIST-MOIST-MIST-FOIST, ensuring that every
word-body occurred at least twice with the same pronunciation. In the type of
network studied here, learning of one item with a given spelling-sound pattern
(e.g., FIST) carries over to other items containing the same pattern (e.g., MIST),
reducing the effects of exposure to the item itself (a neighborhood effect). The
net result was that all of the critical words were learned relatively rapidly; there
was an effect of frequency of exposure early in training but it was observed on
the sum squared error measure, not how rapidly the model learned (i.e., “age of
acquisition”). We therefore created a new set of critical stimuli containing only
“strange” words (Seidenberg, Waters, Barnes, & Tanenhaus, 1984) which have
atypical spellings and spelling-sound correspondences. Because they have few
close neighbors, these words show larger effects of frequency both in behavioral
studies (e.g., Seidenberg et al., 1984) and connectionist models (e.g., Seidenberg
& McClelland, 1989). We therefore expected to see effects of frequency trajectory
on both SSE and how quickly these words were learned.
A second issue concerns the processes that gave rise to the Figure 3 data.
One possibility is that these data reflect two complementary “age of acquisition”
effects. Thus far we have followed the behavioral research in emphasizing the
possible effect of early high frequency exposure on skilled performance. There
might also be a complementary effect of high frequency exposure late in training,
however. Thus the similar levels of performance in the Early and Late conditions
at the end of training might derive from two sources: an AoA effect and a recency
effect (Lewis, 1999, found evidence for both in a face naming task). We therefore
added a control condition using a relatively flat frequency trajectory. For this
condition, a subset of the critical items from Simulation 1 were assigned their
normal frequencies and included among the background stimuli. After running
the simulation, we isolated a large subset of these words that met two conditions:
(a) their frequency trajectories were very flat, and (b) their cumulative frequencies
were similar to what they were in Simulation 1. Thus the flat trajectory condition
acts as a baseline against which the data from Simulation 1 can be compared. An
effect of either the Early or Late trajectory in Simulation 1 would be indicated
by better performance than in the flat trajectory condition at the end of training.
Finally, the flat trajectory condition was also used to assess whether cumulative
frequency has an effect on network performance independent of trajectory,
by comparing the results for two subsets of stimuli from the flat condition whose
cumulative frequencies were considerably different.
2.4.1 Methods
The same model and corpus were used as in Simulation 1. The critical items from
the earlier simulation were included among the background items and assigned
their Marcus et al. (1993) frequencies, and a different set of 48 critical items was
selected. The main criterion for the critical items was that their bodies not be
assigned the same pronunciation in other words in the training list; thus, they
included words such as BEIGE, PHLEGM and SCOURGE. The stimuli were
divided into two lists with the assignment of lists to training condition counterbalanced
across two simulations. The mean cumulative number of presentations
for both Early and Late words was 183.
Stimuli in the Flat trajectory condition consisted of 95 of the critical stimuli
in Simulation 1. These items were selected because when presented throughout
training at their standard Marcus et al. (1993) frequency, they are well matched
to the critical items for cumulative frequency. The mean cumulative frequency of
these words was 200, comparable to the cumulative frequencies for these words
in the Early and Late conditions in Simulation 1 (198 and 196, respectively).
2.4.2 Results
After 10 epochs, the model generated correct phonological codes for 98% of the
training set. Performance on the critical items was assessed in terms of SSE,
accuracy, and how quickly words were learned (i.e., “age of acquisition” in model
time). Because the models were initialized with different random weights and
because words were selected probabilistically during training, individual runs of
the model differ slightly from one another in terms of performance, including
when in training individual words were learned. Analogous individual differences
are seen in children. For each item, age of acquisition was defined as the point at
which 75% of the models generated correct responses. This criterion is similar to
one used in the Morrison et al. (1997) study in which the age at which children
acquired a word was defined as the age at which 75% of the subjects could
name a pictured object accurately. By this measure, the average “age” at which
Early items were acquired was approximately 2.09 epochs, whereas the average
age for Late items was approximately 6.7 epochs. This difference is significant,
t(34) = 12.14. Note that epochs are defined with respect to the total number
of training trials on all items, including the 2,843 background words,
not the number of exposures to individual words. The mean number of trials
to learn words in the Early and Late conditions was 296 and 250, respectively.
These data indicate that the Early words were acquired more rapidly than the
Late words, as expected. It took fewer exposures to learn the Late words because
they benefitted from prior learning of other words. Even for strange words, then,
there is generalization based on exposure to other words.
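The 75% criterion for age of acquisition in model time can be sketched as follows. The data layout, the function name, and the choice to return the first epoch at which the criterion is met (using "at least 75%") are assumptions made for illustration.

```python
def age_of_acquisition(correct_by_run, criterion=0.75):
    """Return the first epoch (1-based) at which enough runs are correct.

    correct_by_run[r][e] is True if run r produced the correct output for
    the item when tested after epoch e + 1. Returns None if the criterion
    is never reached.
    """
    n_runs = len(correct_by_run)
    n_epochs = len(correct_by_run[0])
    for epoch in range(n_epochs):
        proportion = sum(run[epoch] for run in correct_by_run) / n_runs
        if proportion >= criterion:
            return epoch + 1
    return None

# Hypothetical item tested in four runs over three epochs.
runs = [
    [False, True, True],
    [False, True, True],
    [False, False, True],
    [True, True, True],
]
aoa = age_of_acquisition(runs)
```

In this toy example one of four runs is correct after the first epoch and three of four after the second, so the item counts as acquired at epoch 2.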
[Figure: two panels (A, B), each plotting Early and Late items against Epochs (100k each)]
Figure 2.4: Performance over time for Simulation 2, A) error rate and B) sum
squared error
Accuracy over the course of training is depicted in Figure 4A. As in the previous
simulation, the advantage for the Early items dissipated as the cumulative
frequency of the Late items converged on that for the Early items. Mean accuracy
for both conditions was 85% at the end of training. This level of accuracy
is somewhat lower than for the consistent words in Simulation 1; this finding is
consistent with the view that performance on the most difficult strange words
normally requires input from orthography → semantics → phonology. The error
rate did not differ in the two frequency trajectory conditions, however, t(47) < 1.
Thus, although the frequency trajectory manipulation affected the “age” at which
items were acquired, it had no residual effect on accuracy when the cumulative
frequency of Early and Late items converged. Figure 4B shows the change in sum
squared error over time for Early and Late items, which is very similar to the accuracy
graph.
One further aspect of the data is worth noting: Toward the end of training the
model began to exhibit some unlearning of the Early words, as indicated by the
slowly rising scores in this condition for both measures. Protecting early-acquired
words from unlearning requires intermittent re-exposure to these items over time
(Hetherington & Seidenberg, 1989). The Early trajectory entailed a steep decline
in frequency toward the end of training. This property, taken with the probabilistic
nature of sampling, resulted in too few exposures to maintain performance
at the maximum level. We did not systematically examine performance after 10
epochs, because it was at this point that the two conditions converged on the
same cumulative frequencies. We do know, however, that a small number of additional
training trials on the critical items is sufficient to stop the slow erosion
of performance seen in Figure 4. This behavior of the model is broadly consistent
with human performance; knowledge acquired in childhood may degrade over
time through lack of use, but can be revived with modest additional experience.
We now consider the results for the Flat trajectory condition. This condition
addresses the concern that the results of Simulation 1 might have derived from
two complementary AoA effects: one due to high frequency of exposure early in
training and one due to high frequency of exposure late in training. If this were
correct, performance at the end of training in both the Early and Late conditions
should be better than in the Flat condition, in which frequencies changed
very little across epochs. This result was not observed. Figure 5 summarizes
performance in the Flat condition and on the same items in the Early and Late
conditions from Simulation 1.
[Figure: Early, Late, and Flat items plotted against Epochs (100k each)]
Figure 2.5: Performance in the flat condition (Simulation 2) compared to the
same items in the early and late conditions in Simulation 1
[Figure: Low Cumulative Frequency and High Cumulative Frequency items plotted against Epochs (100k each)]
Figure 2.6: Performance on high and low cumulative frequency items within the
flat condition
Results in the Flat condition closely resembled those obtained in the Early
condition. Both conditions exhibited a small advantage early in training compared
to the Late condition, but by the end of training all conditions converged
on the same level of performance. The mean SSE in the
Flat condition was .48, compared to .48 and .49 in the Early and Late conditions,
respectively. No effect of frequency trajectory was observed, F(1, 93) < 1. The
early advantage in the Flat condition reflects the fact that the items had a mean
frequency of 20 presentations per 100,000, which was higher than in the Late
condition over these epochs. However, the cumulative frequency of flat items
(200) was not significantly different from that of the Early and Late items, F(1, 93) < 1.
Data concerning the role of cumulative frequency are presented in Figure 6,
which shows the sum squared error for the 25 highest and 25 lowest frequency items.
The mean cumulative frequencies for these subsets differ: 544 for
the highest frequency words and 60 for the lowest. Cumulative frequency has
the expected effect on performance, which is better for high frequency words
(.46) than low frequency words (.55), t(47) = 3.22, p < .005. Note that these means are
substantially lower than the means for the critical items in Simulation 2. This
suggests that the failures to observe AoA effects were not due to floor effects on
the critical items.
2.4.3 Discussion
Results from the Early and Late conditions were consistent with Simulation 1.
There was a larger difference between these conditions until well into training,
which reflects the fact that the critical words have few neighbors and therefore
performance does not benefit as much from training on other words. However,
performance in the two training conditions again converged as the cumulative
frequencies evened out. Thus the results of Simulation 1 generalize to stimuli
that have less consistent spelling-sound mappings. Performance on words in the
Flat condition converged to the same level as on these same words in the Early
and Late conditions in Simulation 1, indicating that the results for the Early and
Late conditions did not reflect two complementary types of facilitation. Finally,
there was an effect of cumulative frequency in the Flat condition: at the end of
training performance was better on the words with higher cumulative frequencies
than lower.
These results suggest that whereas cumulative frequency has an impact on
performance, frequency trajectory does not. The age of acquisition hypothesis
tested in previous behavioral experiments was that there would be a residual
effect of early word learning on skilled adult performance. However, although
words in the Early condition were learned more rapidly than words in the Late
condition, performance in the two conditions was nearly identical by the end of
training.
2.5 Simulation 3
To this point the results suggest that when cumulative frequencies and stimulus
properties are equated across conditions, there is little if any effect of frequency
trajectory. What matters is how often a word is encountered, not the pattern
of encounters over time. Here we consider another factor that may have contributed
to these results: the fact that the training corpus consisted of words
that exhibit systematic relationships between orthography and phonology. What
the model learns about one word carries over to other words that share structure
with it, reducing the effects of lexical frequency (Seidenberg & McClelland, 1989)
and thus the effects of any frequency trajectory manipulation. These neighborhood
effects were larger for the consistent words used in Simulation 1 than for
the strange items used in Simulation 2; the consistent words were learned more
rapidly and yielded better asymptotic performance than the strange words even
though the trajectories and cumulative frequencies were very similar in the two
cases. Although the strange words have fewer close neighbors, their orthographic-
phonological correspondences are not arbitrary; a word such as BEIGE is not
pronounced “glorp;” it overlaps with more distant neighbors such as BINGE,
BARGE, WEIGH and many other words among the background stimuli. Thus
the systematic aspects of the orthography → phonology mapping might have
reduced trajectory effects even for the strange words.
Suggestive evidence is provided by simulations of age of acquisition effects
presented by Ellis and Lambon Ralph (2000). Feedforward models were trained
to produce a transformation of arbitrary bit vectors. In their training set, output
vectors were generated by randomly changing 10% of the bits in the input vector.
Ellis and Lambon Ralph (2000) observed strong age of acquisition effects, such
that items that were introduced early had an advantage over late items, even
when the later items were much higher in cumulative frequency. The nature
of the stimuli meant that learning on any given trial carried little information
relevant to other items. Under this condition, there was a residual advantage
for mappings that became entrenched early in training. Ellis and Lambon Ralph
(2000) provide a thorough discussion of why this entrenchment occurs. In essence,
learning that occurs for early-trained items involves large weight changes that
reduce the model’s sensitivity to error signals generated by the presentation of
later items. Smith et al. (2001) provide a similar analysis of the results of their
simulation, which was also constructed so that what was learned on one trial did
not carry over to other trials.
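The kind of arbitrary mapping described for the Ellis and Lambon Ralph (2000) training set can be sketched as follows. The vector length, the function name, and the choice to flip exactly 10% of the positions (rather than flipping each bit independently with probability .1) are assumptions; only the 10% figure comes from the description above.

```python
import random

def make_pattern_pair(n_bits, flip_fraction=0.10, rng=None):
    """Create a random binary input and an output that differs in some bits.

    Assumption: exactly round(n_bits * flip_fraction) randomly chosen
    positions are inverted to form the output vector.
    """
    rng = rng or random.Random()
    input_vec = [rng.randint(0, 1) for _ in range(n_bits)]
    n_flips = round(n_bits * flip_fraction)
    output_vec = list(input_vec)
    for position in rng.sample(range(n_bits), n_flips):
        output_vec[position] = 1 - output_vec[position]
    return input_vec, output_vec

# Hypothetical size: 100-bit patterns.
rng = random.Random(1)
input_vec, output_vec = make_pattern_pair(100, rng=rng)
n_changed = sum(a != b for a, b in zip(input_vec, output_vec))
```

Because the flipped positions are chosen independently for every item, knowing how one pattern maps to its output says nothing about which bits change in another pattern, in contrast to the spelling-sound regularities shared among English words.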
Together the results of Simulations 1-2 and the Ellis and Lambon Ralph (2000)
and Smith et al. (2001) simulations suggest that the nature of the input-output
mapping - specifically whether what is learned on one trial predicts anything
about other trials - may be crucial to producing AoA effects. To investigate
this hypothesis, we devised a training regime deliberately unlike the orthography
→ phonology translation in English. Items for the Early and Late trajectory
conditions in Simulation 3 were constructed such that Early and Late items had
minimal orthographic or phonological overlap. In addition, we did not include any
background items; thus what the model learned depended solely on the properties
of the critical stimuli. These conditions are more comparable to the ones studied
by Ellis and Lambon Ralph (2000) and Smith et al. (2001).6
2.5.1 Methods
The training set consisted of 68 words. Two lists were created out of different
inventories of letters and phonemes. One list included items such as COB, COG,
COP, HOG, HOP, and TOG, whereas the other contained items such as BAD,
BAN, BANE, PANE, PAN, and PAT. Some phonemes occurred in both lists
(e.g., /p/), but in different positions in different lists (e.g., onset and coda). The
model’s phonological representation (Harm & Seidenberg, 1999) treats these as
separate phonemes; thus what is learned about onset /p/ does not carry over
to coda /p/. The simulation was run twice with lists assigned once to each
trajectory condition (Early, Late). In contrast to Simulations 1-2, no other words
were presented during training. Thus, the model could learn regularities among
the items within a training condition, but these regularities did not extend to the
items in the other list, and performance was not modulated by exposure to any
non-critical items.
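The non-overlap constraint can be checked with a short sketch. It works on letters rather than on the model's orthographic and phonological units, and the onset/vowel/coda split is a simplified, hypothetical parse; only the example words come from the text.

```python
VOWELS = set("AEIOU")

def slot_letters(word):
    """Tag each letter with the slot (onset, vowel, coda) in which it occurs.

    Simplified parse: leading consonants form the onset, the following run
    of vowels forms the vowel slot, and the remainder is the coda.
    """
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    j = i
    while j < len(word) and word[j] in VOWELS:
        j += 1
    tagged = {("onset", ch) for ch in word[:i]}
    tagged |= {("vowel", ch) for ch in word[i:j]}
    tagged |= {("coda", ch) for ch in word[j:]}
    return tagged

def shared_units(list_a, list_b):
    """Position-specific letters that occur in both lists."""
    units_a = set().union(*(slot_letters(w) for w in list_a))
    units_b = set().union(*(slot_letters(w) for w in list_b))
    return units_a & units_b

list_1 = ["COB", "COG", "COP", "HOG", "HOP", "TOG"]
list_2 = ["BAD", "BAN", "BANE", "PANE", "PAN", "PAT"]

overlap = shared_units(list_1, list_2)
shared_letters = set("".join(list_1)) & set("".join(list_2))
```

For these example items the letters B, P, and T occur in both lists, but never in the same slot, so the position-specific overlap is empty.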
6 The simulations in this article were actually conducted before we were aware of the Ellis
and Lambon Ralph (2000), Smith et al. (2001) or Monaghan and Ellis (in press) simulations.
Due to the smaller size of the training set, the models in Simulations 3 and
4 used a scaled down architecture with 29 orthographic units, 40 hidden units
and 10 cleanup units. The phonological layer was kept the same. Frequency
trajectories for items in Simulations 3 and 4 were similar to those in Simulations
1 and 2. However, because no “background” items were present, the range between
lowest (9 per 10,000) and highest (290 per 10,000) frequency words is more
dramatic. This is because how frequently an item is presented depends on both
its log-compressed frequency and the number of other items in the training set.
In the previous simulations, nearly 3000 words were being trained, so that even
items with very high frequencies were only seen, on average, about 40 times per
100,000 trials. In this simulation, only 68 items were trained, resulting in higher
real frequencies, although the log compressed frequencies used to select items
were the same. Also because of the smaller training set, fewer training trials were
required: The model was trained for 10 epochs of 10,000 trials each, resulting in
100,000 training trials, as opposed to 1 million in Simulations 1 and 2. The mean
cumulative frequency of Early words (1474) was not different from the cumulative
frequency of Late words (1467), t(67) < 1.
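The dependence of presentation rate on the size of the training set can be worked through in a short sketch. It assumes 34 items per list, the raw trajectory values from Simulation 1, and the square-root compression described there; these are assumptions about how the reported rates arise, not a reproduction of the actual training code.

```python
import math

# Raw frequencies per epoch (assumed to match Simulation 1).
EARLY = [1000, 1000, 1000, 500, 100, 50, 10, 1, 1, 1]
LATE = list(reversed(EARLY))
N_EARLY = 34  # assumption: the 68 items are split evenly
N_LATE = 34
TRIALS_PER_EPOCH = 10_000

def per_item_presentations(early_freq, late_freq):
    """Expected presentations per epoch for one Early and one Late item."""
    w_early = math.sqrt(early_freq)
    w_late = math.sqrt(late_freq)
    total = N_EARLY * w_early + N_LATE * w_late
    return (TRIALS_PER_EPOCH * w_early / total,
            TRIALS_PER_EPOCH * w_late / total)

rates = [per_item_presentations(e, l) for e, l in zip(EARLY, LATE)]
first_epoch_early, first_epoch_late = rates[0]
cumulative_early = sum(r[0] for r in rates)
cumulative_late = sum(r[1] for r in rates)
```

Under these assumptions the first-epoch rates come out at roughly 285 and 9 presentations per 10,000 trials, close to the reported 290 and 9, and the expected cumulative count is about 1,470 per item in both conditions, in line with the reported 1474 and 1467.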
2.5.2 Results and Discussion
Figure 7 presents the accuracy and mean SSE data over the course of training.
By the end of training the model had learned to produce correct output for all
words. Whereas all of the Early items were learned within the first 2 epochs, the
Late items did not reach this level until much later. The mean number of trials to
[Figure: two panels (A, B), each plotting Early and Late items against Epochs (10k each)]
Figure 2.7: Performance over time for critical items in Simulation 3: A) error
rate, B) sum squared error
learn the Early words was 1.3 epochs vs 5.5 for the Late items, a highly reliable
difference, t(67) = 49.1. Again, these numbers reflect the point in training as
a function of all trials for all items. Because so many of the Early items were
learned within the first epoch, the mean number of exposures before learning was
computed by examining the model’s performance at 1,000 trial intervals. By this
measure, the mean number of exposures to a given item before it was learned
was 242 for Early items and 270 for Late items. Note that this is different from
Simulation 2, in which fewer actual exposures were required for the learning of the
Late items. In this simulation, knowledge of the Early items seemed to impede
rather than aid learning of the late items. The contrast provides a reminder of
the extent to which learning spelling-sound correspondences normally depends
on exposure to neighbors.
In contrast to previous simulations, there was a small but reliable advantage
for words that were presented frequently early in training in Simulation 3, even
after the cumulative frequencies in the Early and Late conditions converged. As
shown in Figure 7B, there was an advantage for Early words that was maintained
through 10 epochs of training. A t-test on the mean SSE at the end of training
revealed that error was reliably greater for Late words (1.13), than Early words
(.74), t(67) = 10.08, p < .001.
The critical difference between the simulations concerns the nature of the
stimuli and thus the mapping between input and output codes. Simulations 1
and 2 used a large corpus of words that exhibit the regularities between spelling
and sound characteristic of English orthography. These regularities modulate
the effects of frequency of exposure to a given word, yielding no residual effect
of frequency trajectory on skilled performance. This result obtains when other
stimulus properties and cumulative frequencies are controlled.
In Simulation 3, the normal regularities in the mapping between spelling and
sound were not maintained because we eliminated the background items and
created nonoverlapping stimulus sets. What the model learned about one word
in a training list carried over to other words on the same list, but not to words on
the other list. Given this sharp dissociation between the stimulus characteristics
of Early and Late words, there was an advantage for the early-trained items.
2.6 Simulation 4
Simulation 3 strongly suggests that the nature of the mapping between input and
output determines whether frequency trajectory affects performance. However,
this simulation differed from the earlier ones in a number of other ways (e.g., the
number of units; size of the training corpus). We therefore ran a final simulation
using the same procedures as in Simulation 3, but using stimuli which, like
the ones in Simulations 1-2, contain overlapping orthographic and phonological
patterns.
2.6.1 Methods
The same items from Simulation 3 were used, but rather than segregate items
such that no letter or phoneme was repeated in the same position between lists,
we organized the lists so that no letter or phoneme occurred on one list but not the
other. For example, HUB, HUG, LUCK, PAT, and MAD were on List 1, whereas
HUCK, LOG, LUG, MATE, and PAD were on List 2. Cumulative frequency of
Early (1474) and Late (1467) words was matched, t(67) = 1.12, p > .2.
[Figure 2.8 appears here. Panels A and B each plot Early and Late items against Epochs (10k each).]
Figure 2.8: Performance over time for critical items in simulation 4: A) error
rate, B) sum squared error
2.6.2 Results and Discussion
As in Simulations 2 and 3, Early items were learned quickly (1.7 epochs) whereas
Late words required more training to be accurately named (3.7 epochs). This
difference is reliable, t(67) = 9.8, p < .001. This is reflected in the change
in accuracy over time, shown in Figure 2.8A. Also note that accuracy on both
Early and Late items reached 100% by the 6th epoch; thus, although frequency
trajectory had the expected effect on AoA, it had no residual effect on accuracy.
The model’s ability to generalize from Early to Late items meant that even though
it took much longer in terms of training epochs for the Late items to be learned,
they were produced correctly after many fewer trials per word: the mean number
of exposures to produce correct output was 262 for Early items and 52 for Late.
As shown in Figure 2.8B, sum squared error on the Late words decreased more
slowly than for the Early words, but performance in the two conditions eventually
converged. The SSE did not differ between Early (1.13) and Late (1.13) items,
t(67) < 1, at the end of training. As in Simulations 1-2, there was no residual
effect of frequency trajectory when cumulative frequencies were matched. Error
declined much more rapidly for the Late words in this Simulation (Figure 9A)
than in Simulation 3 (Figure 8A). This is because learning on the Early items
transferred to performance on Late items, whereas in Simulation 3, learning on
Early and Late items was independent.
Because this simulation was identical in every other respect to Simulation 3,
the results indicate that the factor relevant to producing a frequency trajectory
effect in Simulation 3 was the lack of overlap between Early and Late words.
2.7 General Discussion
Studies of age of acquisition effects have raised important questions about the
effects of early experience on later learning. An effect of age of acquisition on
skilled reading would call into question the results of many previous behavioral
studies and models in which this factor was not investigated. The potential
theoretical importance of this phenomenon as well as methodological and theoretical
concerns led us to examine it further. Examination of the materials used in
previous studies suggested that they did not provide strong evidence for an effect of
age of acquisition independent of other measures of frequency with which AoA
was confounded. The regression analyses provided evidence that age of acquisition
ratings may account for a small amount of variance in skilled performance
with other factors statistically controlled, but only because they are correlated
with how often words are used before adulthood. Thus there was no effect of AoA
independent of cumulative frequency, as indexed by the WFG norms.
The results of Simulations 1 and 2 are consistent with these conclusions and
provide evidence concerning the computational mechanisms that give rise to the
behavioral phenomena. The simulations provide a strong test of the AoA
hypothesis because the cumulative frequencies and frequency trajectories were known,
and properties of early and late stimuli were equated exactly. The training
corpus was a large, representative sample of monosyllabic words, which exhibit the
statistical regularities characteristic of the orthography → phonology mapping in
English. There was an initial advantage for words presented more frequently early
in training, but no residual effect of early learning on skilled performance. This
was true for both words with highly consistent spelling-sound correspondences
(Simulation 1) and words with atypical spellings and pronunciations (Simulation
2). The advantage for early-trained words is washed out as the model picks up
on the similarities that hold across words. This occurs more rapidly for words
such as LAST whose component spelling patterns are pronounced consistently
across many words than for strange words such as BEIGE which have fewer close
neighbors. In both cases, however, early and late trained words converged to
the same level of performance as the number of exposures evened out. This
behavior can be traced to basic properties of connectionist models (Seidenberg &
McClelland, 1989). Knowledge in these models is encoded in weights on
connections among units, which reflect the cumulative effects of exposure to all words.
Changes to the weights that occur when a word is trained also benefit words with
which it overlaps. This leaves little room for early words to maintain an
advantage, because the weights that support them also facilitate learning later-learned
words.
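The weight-sharing argument can be illustrated with a deliberately tiny example. The sketch below is not the model used in the simulations: it assumes a one-layer linear network, a single delta-rule update, an identity (copy) mapping, and made-up four-unit patterns. It only shows that one update on a trained pattern lowers error on a pattern that shares active units with it and leaves error unchanged for a pattern that shares none.

```python
# Illustrative sketch only: a one-layer linear network with a delta-rule update.
# Patterns, layer size, and learning rate are assumptions, not the thesis model.

def predict(weights, x):
    """Linear output: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sse(weights, x, target):
    """Sum squared error for a single pattern."""
    return sum((t - o) ** 2 for t, o in zip(target, predict(weights, x)))

def delta_rule_step(weights, x, target, lr=0.5):
    """Return new weights after one delta-rule update on one pattern."""
    out = predict(weights, x)
    return [
        [w + lr * (t - o) * xi for w, xi in zip(row, x)]
        for row, t, o in zip(weights, target, out)
    ]

n_units = 4
weights = [[0.0] * n_units for _ in range(n_units)]

trained = [1, 1, 0, 0]      # pattern that receives the update (target = input)
overlapping = [1, 1, 1, 0]  # shares two active units with the trained pattern
disjoint = [0, 0, 1, 1]     # shares no active unit with the trained pattern

before = {"overlapping": sse(weights, overlapping, overlapping),
          "disjoint": sse(weights, disjoint, disjoint)}
weights = delta_rule_step(weights, trained, trained)
after = {"overlapping": sse(weights, overlapping, overlapping),
         "disjoint": sse(weights, disjoint, disjoint)}

print(before, after)
```

In a multilayer network trained on a full corpus the same principle operates over many overlapping patterns at once, which is why an early advantage has little room to persist.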
Simulations 3 and 4 provided further evidence consistent with this analysis.
In Simulation 3, we removed the overlap between early and late trained words and
observed a reliable “age of acquisition” effect: there was an advantage for
early-trained words that was maintained throughout the course of training. In this case,
learning of the late items was impeded by the model’s knowledge of the
early-learned words. Finally, in Simulation 4, we reintroduced the overlap between
early and late trained words and the age of acquisition effect was eliminated,
further demonstrating that the critical factor that gave rise to the AoA effects in
Simulation 3 was the lack of overlap among the early and late patterns.
In summary, both the behavioral data and the simulations are consistent with
the conclusion that whereas there is an effect of cumulative frequency on reading
performance, there is no independent effect of the age at which words are learned.
2.7.1 Conditions That Create Age of Acquisition Effects
In the remainder of this article we consider other types of conditions and tasks for
which age of acquisition effects are likely to be more prominent. Our Simulation 3
and the simulations previously reported by Ellis and Lambon Ralph (2000), Smith
et al. (2001), and Monaghan and Ellis (2002) all suggest that age of acquisition
effects will occur under some circumstances. Although these simulations differ
in detail, they share an important property: given the nature of the stimuli
and network architecture, what was learned about early-trained patterns did not
carry over to later-trained patterns. Early-trained patterns became entrenched,
yielding a persistent advantage over later-trained patterns. Our main point is
that the conditions that give rise to these effects are not characteristic of reading
an alphabetic orthography, but are potentially relevant to other tasks. To see
this clearly, it is necessary to examine some details of the simulations.
The Ellis and Lambon Ralph (2000) simulations involved a simple feedforward
network. The input and output layers each consisted of 100 units, and there were
50 hidden units. The input stimuli consisted of random bit patterns created by
activating a random 20% of the units on the input layer. The model was trained to
copy the input onto the output, but with 10% of the bit values changed (randomly
determined in advance). Two aspects of the simulations underlie the strong age
of acquisition effects that were observed. One has to do with the nature of the
patterns that were trained and the other with the nature of the mapping between
input and output.
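The stimulus construction just described can be sketched as follows. The layer size, the 20% activation rate, and the 10% of changed bits come from the description above; the function names, the seed, and the reading that the changed bits are chosen separately (and fixed in advance) for each pattern are assumptions made for illustration.

```python
import random

# Illustrative sketch of random bit-pattern stimuli as described in the text.
# Assumption: the changed output bits are drawn independently for each pattern.
N_UNITS = 100    # units in the input layer and in the output layer
N_ACTIVE = 20    # a random 20% of input units are switched on
N_CHANGED = 10   # 10% of bit values differ between input and target

def make_pattern(rng):
    """Return one (input, target) pair of 100-bit lists."""
    on_units = set(rng.sample(range(N_UNITS), N_ACTIVE))
    pattern = [1 if i in on_units else 0 for i in range(N_UNITS)]
    target = list(pattern)
    for i in rng.sample(range(N_UNITS), N_CHANGED):
        target[i] = 1 - target[i]   # flip this bit in the target
    return pattern, target

def make_training_set(n_patterns, seed=0):
    """Return a list of (input, target) pairs, fixed by the seed."""
    rng = random.Random(seed)
    return [make_pattern(rng) for _ in range(n_patterns)]

training_set = make_training_set(5)
```

Because each unit is switched on independently of the others and the changed bits follow no rule, nothing about one pair predicts another, which is the property the following paragraphs rely on.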
The important property of the training patterns is that, unlike words in natural
languages, they did not exhibit a rich internal structure. The statistical structure
of the lexicon reflects the fact that there are constraints on the ordering of
letters and phonemes and differences in the frequencies with which these elements
occur and co-occur. Much of this structure ultimately derives from constraints
imposed by speech perception and production; for example, certain sequences of
phonemes are ruled out because they cannot easily be articulated; the relative
frequencies of patterns are determined in part by ease of articulation; and so on.
These constraints are also reflected in alphabetic writing systems because they
are codes for representing speech. In contrast, the stimuli in the Ellis and Lambon
Ralph simulation were constructed so that the probability that any given
unit was on was independent of the probabilities for all other units. Under this
condition, what is learned about one pattern does not carry information about
other patterns. Using an architecture with a smaller number of hidden units than
input or output units promotes the discovery of subregularities that hold across
patterns (as occurs, e.g., with words). If these regularities do not exist, however,
the model can only learn the task by memorizing individual patterns, even
though the mapping is prima facie highly consistent. Under these conditions,
early-trained patterns become entrenched: the large initial weight changes that
favor these patterns are difficult for later-trained patterns to overcome.
The nature of the mapping between input and output codes also promoted
pattern memorization in these simulations. The fact that the mapping between
input and output involved random changes to 10% of the bits meant that the model
could not generalize from early-trained patterns to later-trained ones accurately.
The mapping between input and output codes contained a partial regularity (90%
of the input bits mapped onto the corresponding output bit) but the inconsistent
elements were random and therefore unlearnable except by memorization.
The Smith et al. (2001) simulation was similar in that the stimuli were random
bit patterns that were not internally structured. Their model was also trained
to copy the input to the output through a smaller number of hidden units, but
without the random changes to 10% of the bits. Like Ellis and Lambon Ralph’s
model, Smith et al.’s performed the task by memorizing the training patterns,
and again exhibited entrenchment of early-learned patterns.
The Monaghan and Ellis (2002) simulation also conforms to this analysis, although
it differs from the other simulations in interesting ways. The simulation
again involved a simple feedforward network. Unlike the simulations discussed
above, the training patterns were designed to capture some aspects of lexical
structure. The input and output layers were divided into three slots, analogous
to a CVC syllabic structure. Within each slot there were ten bit patterns
(“phonemes”) that were repeated across stimuli in the training set. Thus there
were constraints on which units could and could not be simultaneously activated;
what was learned about one occurrence of a pattern over the whole set of input
units could carry over to other patterns with which it overlapped - i.e., those
containing the same “phonemes.”
Monaghan and Ellis also manipulated the consistency of the mapping from
input to output. In a behavioral experiment, they found that whereas words
with inconsistent spelling-sound correspondences produced an age of acquisition
effect, words with consistent correspondences did not. The stimuli in this study
were discussed earlier; there is some evidence that the effect was due to frequency
rather than age of acquisition. In the simulation of these effects, the consistency
of the mapping from input layer to output was varied. On 80% of the trials, the
model was trained to copy the input; on the other 20%, the “consonants”
were copied but the “vowel” was randomly assigned to one of the other 9 possible
vowels. The consistent patterns did not produce an age of acquisition effect,
whereas the inconsistent patterns did.
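The consistency manipulation described above can be sketched in a few lines. The three slots, the ten patterns per slot, and the 80/20 split come from the description; representing each “phoneme” as an integer index rather than a bit pattern, the item count, and the function name are simplifying assumptions made for illustration.

```python
import random

# Illustrative sketch: CVC-like items with 20% inconsistent vowel mappings.
# Assumption: each "phoneme" is an index 0-9 instead of a bit pattern.
N_PHONEMES = 10   # ten patterns available in each of the three slots

def make_items(n_items=50, inconsistent_fraction=0.2, seed=0):
    """Return items whose consonants are copied and whose vowel may be remapped."""
    rng = random.Random(seed)
    n_inconsistent = round(n_items * inconsistent_fraction)
    inconsistent_ids = set(rng.sample(range(n_items), n_inconsistent))
    items = []
    for idx in range(n_items):
        onset, vowel, coda = (rng.randrange(N_PHONEMES) for _ in range(3))
        if idx in inconsistent_ids:
            # Remap the vowel to one of the other nine vowels, ignoring context.
            out_vowel = rng.choice([v for v in range(N_PHONEMES) if v != vowel])
        else:
            out_vowel = vowel
        items.append({
            "input": (onset, vowel, coda),
            "target": (onset, out_vowel, coda),
            "consistent": idx not in inconsistent_ids,
        })
    return items

items = make_items()
```

Note that the remapped vowel is chosen without reference to the surrounding consonants, so no other item provides a cue to it.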
The results for the consistent condition are like those we observed in Simulation
1: no age of acquisition effect when the stimuli overlap in structure. The
results for inconsistent patterns appear to conflict with the results of Simulation
2, in which we did not observe an age of acquisition effect for words with atypical
(“inconsistent”) spelling-sound correspondences. However, the differing results
are traceable to properties of the stimuli. Our model was trained on a large set
of words; the critical stimuli were a subset of “strange” words that contain atypical
spelling-sound correspondences. The modeling indicates that these words
nonetheless overlap sufficiently with other words in the corpus to wash out the
initial advantage for early-trained items.
Monaghan and Ellis’ inconsistent stimuli were wordlike patterns in which the
“vowel” was randomly mapped onto other vowels for 20% of the items. Given
the arbitrary nature of these mappings, the model could only perform the task
by memorizing the patterns. As in other conditions in which patterns must be
memorized, there was a strong age of acquisition effect. It is important to note
that this degree of arbitrariness is not seen in English words, even strange ones.
Although vowel graphemes in English map onto multiple phonemes, the range
of possibilities is constrained. No vowel grapheme maps onto all possible vowels
(Venezky, 1970); typically the irregular pronunciation is a small number of
phonetic features away from the “regular” pronunciation. Thus HAVE is irregular,
but /æ/, like /eɪ/, is a front, unrounded vowel, not a more distant vowel such
as /ou/. This general pattern is also observed with other irregularly-pronounced
vowels; for example, EA may be pronounced as in BEAD, BREAD and BREAK,
all of which contain mid-to-high front, unrounded vowels (/i/, /ɛ/ and /eɪ/
respectively). A word like BEIGE is “strange” in the sense that it lacks immediate
neighbors, but the EI → /eɪ/ mapping is supported by other words in the lexicon
(WEIGH, EIGHT, HEIR). Finally, although vowel graphemes map onto multiple
phonemes in English, the pronunciations are typically cued by surrounding
letters. The regularities that exist over the units termed rimes (or “word-bodies”)
have been extensively studied, but there are partial regularities involving other
parts of words as well (Kessler & Treiman, 2001). In Monaghan and Ellis’s
stimuli, the alternative pronunciations of vowels were assigned independently of
context.
These examples illustrate only some aspects of the statistical structure of
words in English. The important point is that the characteristics of the stimuli in
the Monaghan and Ellis simulation were quite different, even though the simulation
was intended to be relevant to consistency effects in English. Their stimuli
produced large age of acquisition effects because they lacked the redundancy of
English words.
In summary, all of the simulations of age of acquisition effects are consistent
with the same conclusion: AoA effects depend on the nature of the mapping
between codes, specifically whether what is learned about early-learned patterns
carries over to later patterns. When the stimuli and task afford this type of
learning, the network does not have to memorize individual patterns; it encodes
regularities across patterns which allow the model to generalize, washing out the
initial advantage for early-trained words. Simulations 1 and 2 provide the most
direct evidence concerning such effects in reading, insofar as the model was trained
on a large corpus of words exhibiting the spelling-sound mappings characteristic
of English. When the stimuli and task do not afford this type of learning (the Ellis
and Lambon Ralph (2000) and Smith et al. (2001) simulations, and Monaghan
and Ellis’s inconsistent condition), the network is forced to memorize patterns,
yielding an advantage for early-trained ones. In this light it is interesting to
consider our Simulation 3, in which the Early and Late items overlapped among
themselves, but not across lists. In this case, the model could generalize from one
Early item to another, and from one Late item to another, but the orthogonal
nature of the lists made it such that the Late items as a group were learned
suboptimally - the representations developed to support the Early items impeded
acquisition of the Late items.
It should be noted that our simulations did not address all aspects of lexical
processing and so cannot be taken as showing that such effects cannot occur. The
simulations involved knowledge of orthographic → phonological correspondences
and we have argued that they are consistent with behavioral studies of age of
acquisition effects that used tasks, such as naming and lexical decision, in which
this knowledge plays an important role. The simulations suggest that the age at
which this knowledge is acquired has little impact on skilled performance. The
original age of acquisition hypothesis (Brown & Watson, 1987b; Morrison & Ellis,
1995), however, concerned the effect of the age at which words are acquired in
spoken language, an aspect of lexical learning our simulations did not address.
Acquiring a spoken word vocabulary involves learning mappings between
phonology and semantics. Skilled reading often involves computations from orthography
to phonology to semantics (see, e.g., Van Orden, Johnston, and Hale (1988), for
behavioral evidence and Harm and Seidenberg (in press), for a computational
model). Hence the age at which children learned phonology to semantics
mappings could have a residual impact on the orthography → phonology → semantics
computation. None of the simulations of age of acquisition effects, including our
own, address this possibility.
This issue needs to be examined in future research. Two points should be
noted, however. First, we have presented evidence that the results of existing
behavioral studies can be explained in terms of the impact of lexical factors such
as frequency, imageability and length on word reading. Thus, it is not clear if
there is an age of acquisition effect to be explained further. Second, properties of
the phonology → semantics mapping make it unlikely to be the source of effects
of age of acquisition on reading. The mapping between these codes is largely
arbitrary for monomorphemic words; words that overlap with the sound of the
word CAT do not overlap with it in meaning. Thus what is learned about the
phonology → semantics mapping for CAT does not carry information that
facilitates learning the mapping for SAT or FAT. Given the computational analysis
presented above, this might seem like a condition that would promote a strong
age of acquisition effect in spoken language acquisition, which in turn could affect
reading via the shared phonology → semantics pathway. However, other
characteristics of the phonology → semantics mapping need to be taken into account.
First, the mapping between phonology and semantics is not entirely arbitrary;
there are partial regularities among many monomorphemic words (e.g.,
correlations between the phonological characteristics of words and their grammatical
class; Kelly, 1992); more importantly, inflectional and derivational morphemes
make consistent (though quasiregular) contributions to the meanings of many
words (Seidenberg & Gonnerman, 2000). Second, both phonology and semantics
are themselves highly structured: the words of a language occupy restricted
regions of the much larger space of possible phonological forms or meanings. All of
these properties will facilitate the learning of mappings between phonology and
semantics in many types of connectionist networks, reducing effects of the ages
at which words are learned, as in the simulations presented above.
2.7.2 Which Types of Knowledge Yield Age of Acquisition Effects?
On our account, the key issue regarding age of acquisition effects concerns the
nature of the stimuli and task being learned. The research discussed in this article,
like the behavioral studies discussed above, focused on the use of information
concerning orthographic-phonological correspondences in English. The analyses
of previous studies, the theoretical analysis of the problem, and the results of the
simulations all suggest that AoA effects are likely to be minimal in this domain.
However, the modeling led to the identification of other conditions that give rise
to age of acquisition effects. The question then is whether these conditions are
characteristic of other types of human learning. This issue needs to be considered
further using both behavioral and modeling approaches.
One obvious question is whether there are age of acquisition effects in reading
nonalphabetic writing systems such as Chinese. Written Chinese exhibits less
consistency in the mapping between written symbols (characters) and their
pronunciations. Chinese words are usually taught as arbitrary associations between
written words and meanings, a process requiring several years for the mastery of
a few thousand characters. There may be a lasting advantage for early-learned
words in Chinese because of the more arbitrary nature of the mapping. This
unresolved empirical question needs to be addressed carefully. Many of the
early-learned words are nonarbitrary in that they contain characters that provide
partial cues to pronunciation. The same need to control for other correlated
properties (e.g., frequency) will also arise. This is illustrated by recent studies of AoA
effects in reading Kanji, the Chinese characters that are part of Japanese writing.
Yamazaki, Ellis, Morrison, and Lambon Ralph (1997) reported data indicating an
AoA effect on Kanji naming; however, further analyses by Yamada, Takashima,
and Yamazaki (1998) suggest that other factors may be at work. They found
that the ease with which naive students could learn the pronunciations of the
characters in question was also a strong predictor of naming latency. Thus the
effect seems to be due to stimulus factors other than age of acquisition.
AoA effects have been observed in several tasks other than reading. Many
of these studies are also subject to the methodological concerns we have raised,
but the findings are suggestive. One task that probably yields genuine AoA
effects is learning the names associated with faces. Moore and Valentine (1998)
studied this using faces rated for both subjective frequency and AoA. The earlier
acquired faces were named more quickly than later acquired faces, with subjective
frequency controlled. Moore and Valentine (1999) also found that AoA effects in
face naming were stronger than those in name reading. Lewis (1999) found similar
effects with faces from long-running soap operas, where more objective controls of
the time at which individuals came in and out of public awareness were possible.
Whereas Moore and Valentine attributed the effects to age of acquisition, Lewis
interpreted them as effects of cumulative frequency. Although further research is
needed, the effects are consistent with the theory presented here. Unlike words,
face-name pairs provide a strong test of the AoA hypothesis, because the earlier
acquired items do not vary predictably along other dimensions that make them
easier to learn or recognize. Aside from partial phonological regularities in name
gender (Cassidy, Kelly, & Sharoni, 1998) and various national/ethnic regularities
(one rarely meets an Italian named Wong, for example), matching names to faces
is essentially an arbitrary mapping in that what is learned early does not carry
over to later items.
Recent studies of Dutch by Brysbaert, Lange, and Van Wijnendaele (2000)
and Brysbaert, Van Wijnendaele, and De Deyne (2000) also yielded results
consistent with our account. They found larger effects of AoA in Dutch on associate
generation and semantic classification tasks than on word naming. Word
associations have an arbitrary, learned component. The high association between pairs
such as BREAD-BUTTER or HUSBAND-WIFE cannot be simply due to overlap
in meaning because other pairs that overlap in meaning to a similar degree are not
as highly associated (e.g., BREAD-CAKE; HUSBAND-MAN). Moreover, both
associate generation and semantic classification tasks involve using knowledge
about word meanings, not merely orthographic-phonological correspondences.
The relationship between form (orthography or phonology) and meaning is much
less systematic than the relationship between orthography and phonology; words
that overlap in spelling tend to overlap in sound but not in meaning. Thus the
age of acquisition effects observed in these tasks may be related to the use of
this information. Further research is needed, however, to determine more
definitively whether age of acquisition has an effect on the orthography → semantics
or phonology → semantics mappings. Furthermore, any task that uses word
meanings is open to difficulties establishing the chain of causality: Are early AoA
words easy because they are early, or are they early because they are easy? This
problem will require some ingenious methodological innovations before it can be
solved.
Finally, consider the problem of learning a second language. It is well known
that some aspects of language learning are easier for children than for adults
(Johnson & Newport, 1989; Flege et al., 1999). The second language learning
situation is one in which what is learned early in experience (the first language)
is not highly predictive of what is to be learned in the later phase (the
second language). Assuming that both languages make use of overlapping neural
structures (see Perani et al., 1998, for an interesting discussion), it follows that
second language learning should be disadvantaged. On this view, so-called
“sensitive period” effects are actually extreme cases of AoA effects - failures to learn
in later life which reflect the entrenchment of early-learned patterns - and not
maturational changes in the neural substrate supporting language acquisition,
as has been classically presumed (Lenneberg, 1967; Neville & Bavelier, 2000).
Further progress in understanding how early experience interacts with learning
later in life will be facilitated by examining tasks in which such effects are likely
to be most powerful, and by further exploring the computational mechanisms
underlying these tasks.
2.7.3 Conclusions
The purpose of our research was to examine age of acquisition effects on skilled
reading, a topic with potentially broad theoretical implications that has been the
focus of considerable research. Ironically, the main conclusion to be drawn from
our research is that age of acquisition effects are likely to occur, but for tasks
other than reading an alphabetic orthography. Age of acquisition effects reflect
a loss of plasticity associated with success in mastering a task, a phenomenon
that occurs in many types of learning and species. The zebra finch’s success in
acquiring its characteristic song imposes significant constraints on its ability to
acquire additional vocal behavior (Doupe & Kuhl, 1999). Similarly, the child’s
success in acquiring the phonological inventory or syntax of a language may
constrain its ability to learn other languages (Johnson & Newport, 1989; Werker
& Tees, 1984). Issues concerning the nature and limits of plasticity in different
domains and their neural and computational bases are central ones in cognitive
neuroscience. Connectionist models provide a computational framework for
understanding plasticity in terms of the nature of the material to be learned, and
how what is to be learned is affected by what has already been learned. The
entrenchment phenomenon discussed above is one outcome that occurs in such
networks and we have taken a step toward specifying the conditions that give
rise to it. Under other conditions, other outcomes are observed; in the reading
case studied here, later learning is facilitated by prior knowledge rather than
restricted by it. In the catastrophic interference case (McCloskey & Cohen, 1989),
later success in learning results in forgetting of earlier material. Gaining a deeper
understanding of the principles that govern the entire set of outcomes, and how
they relate to the various tasks that humans perform, is an important goal for
future research.
Chapter 3
Age-limited Learning Effects in Reading II: Empirical
Studies
Some aspects of language are subject to age-limited learning effects, such that
they can only be fully mastered during specific periods of development (e.g., Flege
et al., 1999; Mayberry, Lock, & Kazmi, 2002b).¹ Although the ability to learn new
words remains intact well into adulthood (e.g., McCandliss et al., 1997; Service &
Craik, 1993), there is some evidence that words which are learned early in life are
processed more quickly and accurately than later learned words (e.g. Brown &
Watson, 1987a; Morrison & Ellis, 1995). Converging results from computational
modeling and behavioral studies of these age-of-acquisition (AoA) effects suggest
new ways of thinking about age-limited learning more generally. In particular,
connectionist modeling has begun to provide a mechanistic account of the role
of learning itself in limiting adult plasticity (Ellis & Lambon Ralph, 2000; Smith
et al., 2001) and the relationship between age-limited learning and generalization
(Zevin & Seidenberg, 2002). The current study addresses the empirical issue of
whether age-limited learning effects arise in reading aloud.
¹ This chapter has been submitted for publication as: Zevin, J. D. & Seidenberg, M. S.
(submitted) Cumulative frequency affects reading aloud; frequency trajectory does not.
Zevin and Seidenberg (2002) reviewed the literature on AoA effects in
reading aloud and found it inconclusive. The basic methodological problem is that
it is difficult to manipulate AoA while matching stimulus items along other
dimensions because AoA is correlated with variables such as imageability, length,
and familiarity that also affect skilled performance. These correlations are not
accidental; the ease of learning a word is affected by these and other stimulus
properties. Thus it is very difficult to dissociate the effects of when a word was
learned from the factors that determined when it was learned.
Zevin and Seidenberg (2002) developed an alternative approach to these
phenomena that relied on two concepts: frequency trajectory and cumulative
frequency. Frequency trajectory refers to the frequency of exposure to a word over
time. Cumulative frequency refers to the total number of exposures to a word.
Words with the same cumulative frequency may exhibit different trajectories. For
example, some words are very frequent in speech and texts intended for young
children, but less frequent in adult discourse (e.g., MITTEN, STROLLER); these
words are learned relatively early but used less in adulthood. Conversely, some
words are learned and used more often in adulthood and occur rarely or not at
all in childhood (e.g., FAX, MERLOT). Frequency trajectory provides a way of
explicitly manipulating the pattern of experience with a word over time, without
relying on outcome measures such as rated or empirically derived AoA.
In this way, it becomes possible to observe the effect of age-limited learning
independently of the confounds inherent in estimates of AoA.
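The distinction between the two measures can be illustrated with a small sketch. The per-period exposure counts below are invented for illustration (they are not corpus values), and the helper names are hypothetical; the only point is that two words can share a cumulative frequency while differing in trajectory.

```python
# Hypothetical exposure counts per developmental period, earliest first.
# These numbers are illustrative only and are not taken from any corpus.
exposures = {
    "MITTEN": [40, 30, 20, 10],  # frequent early, rarer later
    "FAX": [10, 20, 30, 40],     # rare early, frequent later
}


def cumulative_frequency(counts):
    """Total number of exposures, regardless of when they occurred."""
    return sum(counts)


def trajectory_direction(counts):
    """Compare exposure in the first half of the periods with the second half."""
    half = len(counts) // 2
    early, late = sum(counts[:half]), sum(counts[half:])
    if early > late:
        return "high-to-low"
    if early < late:
        return "low-to-high"
    return "flat"


for word, counts in exposures.items():
    print(word, cumulative_frequency(counts), trajectory_direction(counts))
```

Both hypothetical words receive the same total number of exposures, so any difference in how they are processed could not be attributed to cumulative frequency.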
In order to study the influence of frequency trajectory in reading, Zevin and
Seidenberg (2002) manipulated the frequency trajectories of particular items in
a series of connectionist models which learned mappings from spelling to sound
(based on work by Harm & Seidenberg, 1999). Items trained with a high-to-
low frequency trajectory (more frequent early in training) were learned more
quickly than items with the complementary trajectory. These manipulations
were unaffected by other potentially confounding variables because a crossed
design was used in which each item appeared in each trajectory across runs
of the model. Words in the early trajectory were learned more rapidly, but
there were no residual effects of trajectory on asymptotic (“adult”) performance.
In contrast, differences in cumulative frequency of exposure to the same items
had a large effect on performance throughout training. Zevin and Seidenberg
(2002) hypothesized that the lack of a frequency trajectory effect was due to the
systematicity of the mapping between spelling and sound: because this mapping is
quasiregular (Seidenberg & McClelland, 1989), knowledge of early-learned items
is helpful in learning and processing later items. In a further simulation, the
overlap between early and late items was eliminated by artificially manipulating
the training set. In this case frequency trajectory effects were observed, further
suggesting that generalization from early to late items was responsible for the
lack of frequency trajectory effects in the more realistic models.
The Zevin and Seidenberg (2002) results were somewhat surprising, given that
other connectionist models have shown an advantage for early-learned patterns
in a range of tasks (Smith et al., 2001; Ellis & Lambon Ralph, 2000). However,
these earlier models learned mappings between sets of random bit patterns, which
provide no basis for generalization from early to late learned items. Similarly,
Monaghan and Ellis (2002) manipulated the consistency of the mappings used in
their simulations, and found age-limited learning effects for inconsistent mappings
only. The way in which they manipulated consistency was not directly analogous
to the way such effects arise in the mapping from spelling to sound: the “vowels”
of inconsistent items could take any of ten possible pronunciations, and each
pronunciation was represented by a different, random pattern of activation. This
means that among inconsistent items there was no basis for generalization from
the rest of the training set - just the set of circumstances under which Zevin and
Seidenberg (2002) predicted that age-limited learning effects should arise.
Taken together, the modeling work suggests that age-limited learning effects
should be most prominent in domains where knowledge of early items does not
readily generalize to processing of late items. Thus, whereas the models predict
that there should be no effect of age-limited learning on reading aloud, such
effects may well arise in tasks which require the use of semantic information. The
mappings between semantics and the two sensory codes, orthography and
phonology, are more arbitrary than the mappings between orthography and phonology.
Thus what is learned about the mapping between spelling and meaning for, say,
CAT, does not carry over to similar words such as BAT and CAN in the way
that information about its pronunciation does.
In the present research we examined behavioral predictions derived from the
Zevin and Seidenberg (2002) modeling, specifically: (a) whereas frequency
trajectory should have an influence on the age at which words are learned, it should
have no residual effect on adult performance; and (b) in contrast to frequency
trajectory, cumulative frequency should affect skilled performance. The latter effect
reflects the fact that performance in connectionist and some other types of
models is influenced by the aggregate effects of exposure to a stimulus over time.
3.1 Experiment 1
Zevin and Seidenberg (2002) noted that AoA is essentially an outcome variable.
The age at which words are acquired depends on a number of factors, many
of which also influence adult processing, such as frequency (McRae, Jared, &
Seidenberg, 1990; Monaghan & Ellis, 2002; Seidenberg et al., 1984),
imageability and/or concreteness (Cortese, Simpson, & Woolsey, 1997; Zevin & Balota,
2000; Strain, Patterson, & Seidenberg, 2002), length (Spieler & Balota, 1997)
and neighborhood size (Andrews, 1992). Zevin and Seidenberg (2002) estimated
that approximately 70% of the variance in AoA is explained by these standard
predictors of adult reading performance.
Experiment 1 examined whether frequency trajectory (i.e., differences in how
exposures to words are distributed over time) has any additional influence on the
age at which words are learned. We generated an empirically derived measure
of frequency trajectory using the Zeno (1995) norms. These norms consist of
estimates of the frequencies of words at each of 13 grade levels, based on a
corpus of 17 million words drawn from a large sample of texts. These norms (which
are similar to the grade-level norms of Carroll, Davies, & Richman, 1971, but
based on a much larger sample) provide a good approximation of changes in the
frequencies of words over time. We also collected new ratings of age of acquisition,
imageability, and concreteness for all words in the study. Most previous studies
have relied on the Gilhooly and Logie (1980) norms, which were derived from 30
British speakers. We then used our new measures to predict age of acquisition, with
a focus on whether frequency trajectory has an impact independent of the other
factors.
3.1.1 Methods
3.1.1.1 Stimuli
Words were chosen from the Zeno (1995) frequency norms which had either low-
to-high or high-to-low frequency trajectories. An item was counted as having a
high-to-low frequency trajectory if its frequency in the early grades - first, second
and third - was three times as great as its frequency in the later grades - eleventh,
twelfth and university-level. Low-to-high frequency trajectory was defined as the converse
pattern of exposure over time. A continuous measure of frequency trajectory was
generated for use in regression analyses by subtracting the log frequency of the
word in the latest grades from its log frequency in the earliest grades.
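The two definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used: the text does not say which logarithm base was used (base 10 is assumed here), how frequencies were aggregated across the three early and three late grades, whether the threefold criterion was inclusive, or how zero frequencies were handled. The function names and input values are hypothetical.

```python
import math


def frequency_trajectory(early_freq, late_freq):
    """Continuous measure: log frequency in the earliest grades minus
    log frequency in the latest grades (base 10 is an assumption).
    Both frequencies must be positive; zero counts would need smoothing."""
    return math.log10(early_freq) - math.log10(late_freq)


def trajectory_category(early_freq, late_freq, ratio=3.0):
    """Categorical criterion: one period's frequency is at least `ratio`
    times the other's (inclusive threshold is an assumption)."""
    if early_freq >= ratio * late_freq:
        return "high-to-low"
    if late_freq >= ratio * early_freq:
        return "low-to-high"
    return None  # does not meet either criterion


# Hypothetical frequencies, for illustration only.
print(frequency_trajectory(30.0, 10.0), trajectory_category(30.0, 10.0))
print(frequency_trajectory(10.0, 30.0), trajectory_category(10.0, 30.0))
```

Under these assumptions, positive values of the continuous measure indicate a high-to-low trajectory and negative values a low-to-high trajectory.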
Table 3.1: Descriptive statistics for norms collected in Experiment 1

Factor         Range          Mean   Example (low)   Example (high)
AoA            1.03 - 6.60    3.51   boo             ebb
Imageability   1.15 - 6.84    4.09   nor             gun
Concreteness   2.00 - 6.84    4.46   gee             eel
3.1.1.2 Subjects
40 University of Wisconsin - Madison undergraduates participated in each of the
three norming studies for course credit. None provided norms for more than one
factor.
3.1.1.3 Procedure
Subjects were instructed to rate items on a scale of 1-7 for one of three factors:
Imageability, concreteness or AoA. In each case, the instructions included
examples of anchor points on the scales. For imageability, the examples were BADGER
(the school mascot) for highly imageable and CONFUSION for low imageable.
For concreteness, the examples were COTTON and ELM for high, IDEA and
COLOR for low. For age of acquisition the "early" examples were BALL and
DOGGIE; "late" examples were PERPLEX and COGNITIVE. Responses were
collected on an iMac running Psyscope 1.5.
3.1.2 Results
Norms for all items are available on the web at http://lcnl.wisc.edu/people/jdzevin/zs.appendix.hti
Table 1 reports means, ranges and examples of items at the extremes for each
factor.
Table 3.2: Correlations among norms collected in Experiment 1, and with other
lexical variables.

Variable       AoA      Freq. Traj.  Cum. Freq.  Subj. Freq.  Length   N        Imageability
Freq. Traj.    -0.5362
Cum. Freq.     -0.6871   0.1054
Subj. Freq.    -0.7115   0.1005       0.8354
Length          0.2869  -0.3196      -0.1342     -0.1358
N              -0.3108   0.3013       0.1791      0.1618      -0.6606
Imageability   -0.3048   0.3140      -0.0093     -0.0712      -0.0893   0.0755
Concreteness   -0.2112   0.2289      -0.0303     -0.1149      -0.1266   0.1125   0.9309

Note: Freq. Traj. = frequency trajectory; Cum. Freq. = Cumulative Frequency
from Zeno (1995); Subj. Freq. = Balota et al. (2001) subjective frequency; N =
Coltheart’s N.
Zero-order correlations among variables are presented in Table 2. The two
measures of frequency had the highest correlations with AoA, followed by
frequency trajectory. The R² value was .797 for the regression of all 7 variables
against AoA. Removing frequency trajectory from the analysis resulted in an R²
of .702. Thus, the proportion of variance explained by frequency trajectory with
the other predictors partialed out was estimated at .095, t = 12.26, p < .001.
Other variables which explained significant proportions of the variance in AoA
were cumulative frequency (r² = .018), t = 5.32, p < .001; subjective frequency
(r² = .062), t = 9.89, p < .001; imageability (r² = .019), t = 5.44, p < .001; and
concreteness (r² = .004), t = 2.66, p < .01. Coltheart’s N predicted a marginally
significant proportion of the variance, t = 1.69, p = .092. Estimates for imageability
and concreteness may be compromised because the two were highly collinear
(r = .93 for the correlation between them). Using an average of the two instead
gives an estimate of .037 of the variance explained, t = 7.43, p < .001.
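The unique-variance figures above follow the usual logic of comparing a full regression with one that omits the predictor of interest (for frequency trajectory, .797 - .702 = .095). The sketch below illustrates that computation in plain Python. It is an assumption-laden illustration rather than the analysis actually run: the function names are hypothetical, the toy data are invented, and the t statistics reported above are not computed.

```python
def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X
    (an intercept column is added automatically)."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(k)]
         + [sum(rows[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    beta = [M[c][k] / M[c][c] for c in range(k)]
    pred = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def unique_variance(X, y, column):
    """Drop in R^2 when one predictor column is removed from the model."""
    reduced = [[v for j, v in enumerate(row) if j != column] for row in X]
    return r_squared(X, y) - r_squared(reduced, y)


# Invented toy data with two uncorrelated predictors: y = x0 + 2 * x1.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3, 1, -1, -3]
print(unique_variance(X, y, 0), unique_variance(X, y, 1))

# The same subtraction applied to the R^2 values reported in the text.
print(round(0.797 - 0.702, 3))
```

Because the toy predictors are uncorrelated, their unique contributions sum to the full R²; with correlated predictors such as imageability and concreteness, shared variance is credited to neither, which is why the individual estimates for those two variables are small.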
3.1.3 Discussion
Although new norms for AoA, concreteness, imageability and familiarity were
used, along with a new set of items, the results closely replicated those presented
by Zevin and Seidenberg (2002), in that the standard lexical predictors -
cumulative frequency, length, Coltheart’s N, imageability, concreteness and familiarity -
accounted for about 70% of the variance in AoA. These results are also consistent
with factor analyses conducted by Bates et al. (2002) which showed that AoA
loaded on both a “frequency” factor and a “semantic” factor. Frequency trajectory
accounted for an additional 9.5% of the variance when these other variables were
taken into account. This result indicates that the pattern of exposures to a word
over time has an influence on the age at which it is acquired. Unsurprisingly,
words that are frequent early in development are learned earlier than words that
are relatively low frequency early in development.
In contrast with AoA, frequency trajectory is only moderately correlated with
variables that influence adult performance. In fact, when frequency trajectory is
treated as the dependent variable and regressed on cumulative frequency, length, N,
imageability, concreteness and subjective frequency, the proportion
of variance explained is only .25. The fact that frequency trajectory is strongly
related to the age at which words are learned, but only weakly related to other
Table 3.3: Unique variance in naming RT accounted for by frequency trajectory
in a regression analysis with six other variables

Study                         r      Unique variance
Seidenberg & Waters (1989)   -.14    .0001 n.s.
Spieler & Balota (1997)      -.20    .0011 n.s.
factors that affect adult performance makes it a useful tool for examining age-
related effects in lexical processing. Specifically, if there is a residual effect of
early experience on skilled performance, frequency trajectory should also affect
adult naming latencies. However, Zevin and Seidenberg’s modeling suggests that
frequency trajectory has no impact on skilled performance whereas cumulative
frequency does.
Having established that frequency trajectory influences the age at which
words are acquired, we can now ask whether it has any effect on skilled adult
processing. One way to do this is to consider a regression analysis of frequency
trajectory - along with the other lexical predictors discussed here - on reaction
time in the naming task. As shown in Table 3, although frequency trajectory is
correlated with naming latency in both the Seidenberg and Waters (1989) and
Spieler and Balota (1997) study, when length, N, cumulative frequency, subjective
frequency, imageability and concreteness are taken into account, it explains no
unique variance in naming latency. In Experiment 2, we employ a factorial design
to provide a more direct test of this effect.
Figure 3.1: Frequency trajectories of items in Experiment 2. Top: high-frequency
items. Bottom: low-frequency items. Error bars represent standard error.
(Legend: filled circles = Early; open circles = Late.)
3.2 Experiment 2
The results of Experiment 1 suggest that there is no influence of frequency
trajectory on adult reading performance. This is consistent with the analyses and
modeling work presented by Zevin and Seidenberg (2002). Those models made
the further prediction that although frequency trajectory has no influence on
adult performance, cumulative frequency does.
Cumulative frequency effects are robust in models trained with a gradient
descent learning algorithm because every exposure to a word improves performance
on that particular item (as well as items with which it shares structure).
Frequency trajectory effects are subtler, and arise when the network structure that
forms as a result of early learning prevents optimal acquisition of later
knowledge. However, because the quasiregular nature of the mapping from spelling to
sound allows generalization from early-learned items to late-learned items, the
models predict no influence of frequency trajectory in reading aloud. In this
experiment, we test the predictions of the models by manipulating cumulative
frequency and frequency trajectory independently in a factorial design.
3.2.1 Methods
3.2.1.1 Stimuli and Design
112 words were selected in a 2 (frequency trajectory) x 2 (cumulative frequency)
design. The frequency trajectories are shown separately for high and low
cumulative frequency conditions in Figure 1. As shown in Table 4, stimuli were matched
listwise for length and N.
Subjective frequency was constant across levels of frequency trajectory, but
not across levels of cumulative frequency. AoA was not controlled in either case.
To understand why this was not done, consider the graphs in Figure 1. The
late-acquired, high-frequency items are more frequent at every grade level than
the early-acquired low-frequency items. However, because frequency trajectory
is manipulated at each level of cumulative frequency, a main effect of frequency
trajectory would still indicate an effect of age-limited learning.
Table 3.4: Descriptive statistics for items in Experiment 2

                        Cumulative frequency      Frequency trajectory
Factor                  HF     LF     Diff        Early  Late   Diff
Cumulative Frequency    6.68   3.13    3.35**     4.98   4.83    0.15
Subjective Frequency    4.77   3.43    1.34**     4.04   4.15   -0.11
AoA                     3.40   4.35   -0.95**     3.29   4.67   -1.38**
Length                  4.4    4.4     0          4.3    4.5    -0.2
N                       5.6    5.2     0.4        5.7    5.0     0.7
Imageability            3.97   4.06   -0.09       4.48   3.55    0.93**
Concreteness            4.37   4.48   -0.11       4.67   4.18    0.49*

Note: * p < .05; ** p < .001; Cumulative frequency = log (Zeno, 1995)
frequency, summed across all grades; subjective frequency estimates from Balota
et al. (2001). Figures for cumulative frequency are collapsed across frequency
trajectory manipulations and vice versa to highlight manipulations for main
effects.
Finally, in order to have a large enough list of stimuli, concreteness and
imageability were allowed to covary with frequency trajectory, so that early items
were both more imageable and more concrete than late-acquired items. Because
semantic variables have only weak effects on reading aloud, particularly for
consistent words (Strain et al., 1995; Zevin & Balota, 2000), it was preferable to match
the lists carefully for length, N and cumulative frequency (where appropriate)
and control for concreteness and imageability statistically (using ANCOVA).
3.2.1.2 Subjects
38 University of Wisconsin - Madison undergraduates participated in the naming
study for course credit or a $5 remuneration.
3.2.1.3 Procedure
Words were presented using Psyscope 1.5 on an iMac computer. Word lists
were randomized anew for each subject. Subjects were seated a comfortable
distance from the computer and instructed to read each word aloud as quickly
and accurately as possible. On each trial, a fixation cross was presented for 500
ms, followed by a word. Eight practice trials were run to allow subjects to become
accustomed to the procedure. Response times were recorded using the Psyscope
button box, and were scored for accuracy online by the experimenter. Sessions
were taped to allow offline revisions of these scores.
3.2.2 Results
As shown in Figure 2, a large (16 ms) effect of cumulative frequency was present,
whereas only a small (3 ms) effect of frequency trajectory was present. Only the
main effect of cumulative frequency was significant in the ANCOVA on response
latencies, F(1, 106) = 6.22, p < .05. Neither the main effect of frequency
trajectory nor the interaction between frequency trajectory and cumulative frequency
was reliable, both Fs < 1. Effects of the covariates (imageability and
concreteness) were also nonsignificant, both Fs < 1.
There were no effects of any kind in the error data. The mean proportion of
errors was 2% in each condition.
Figure 3.2: Adjusted means for Experiment 2. (Plot not reproduced. X-axis:
Frequency Trajectory, Late vs. Early; separate lines for LF and HF items;
y-axis tick values range from 460 to 510.)
3.2.3 Discussion
The results of Experiment 2 were clear. We found no evidence of a frequency
trajectory effect, although we found very strong cumulative frequency effects with
the same items. This corroborates the regression analyses presented above, as
well as the analyses of Zevin and Seidenberg (2002). Furthermore, this pattern
of results was predicted by the models in Zevin and Seidenberg (2002).
Interpreting null results can be difficult, because inferential statistics that
test their reliability are less developed than for positive results. One way to
determine the reliability of a null result is to conduct a comparison of effect sizes
across a range of studies. Table 5 displays effect sizes for a number of studies
which have examined cumulative frequency and AoA, along with the current
study. The effect size for AoA and cumulative frequency are roughly equivalent
(Cohen, 2002, defines effect sizes between .2 and .49 as "small"). The effect size
for frequency trajectory in the current study, however, is much smaller. It is
important to note that the effect sizes for AoA in earlier studies do not provide
Study                      Factor                 Effect size (f)
Gerhand & Barry (1998)     AoA                    .21
                           Frequency              .35
Monaghan & Ellis (2002)    AoA                    .20
                           Frequency              .20
Current results            Frequency trajectory   .04
                           Frequency              .23
Note: f = effect size index (formula illegible in source).
evidence of age-limited learning. In both of the studies in Table 5, AoA was
confounded with cumulative frequency (Zevin & Seidenberg, 2002).
3.3 General Discussion
The current results corroborate a number of the predictions and assumptions of
the modeling work presented in Zevin and Seidenberg (2002). In Experiment
1, a measure of frequency trajectory was developed which explains a
substantial proportion of unique variance in the age at which words are acquired, even
when other measures that explain 70% of the variance in AoA are partialed out.
This suggests that frequency trajectory is reliably related to the age at which
words are acquired, and therefore a useful tool in examining effects of age-limited
learning. In Experiment 2, frequency trajectory was manipulated orthogonally
to cumulative frequency. Whereas a large cumulative frequency effect was found,
no effect of frequency trajectory was found. Thus, frequency of exposure affects
the initial acquisition of a word, but skilled performance is affected by the total
number of exposures.
These results contrast with a number of studies in which independent
effects of AoA and frequency have been found (Morrison & Ellis, 1995; Gerhand
& Barry, 1998). By using rated norms (or objectively determined norms) for
AoA, those studies did not provide the strongest test of the age of acquisition
hypothesis. Because AoA itself is so strongly correlated with other factors which
also predict adult performance, it is extremely difficult to perform experiments
in which AoA is manipulated while other factors are held constant. As a result,
all but one of the previous studies had confounded AoA and frequency.
Interestingly, when cumulative frequency was appropriately controlled in a study by
Monaghan and Ellis (2002), no AoA effect was found. Because frequency
trajectory is less strongly correlated with factors that influence adult performance, it
allows a more stringent test of the hypothesis that age-limited learning plays a
role in lexical processing.
Although it is difficult to argue conclusively for the null hypothesis, three
aspects of the current data force us to conclude that frequency trajectory has no
influence on adult reading performance. First, we failed to find frequency
trajectory effects in two different experiments, one of which was replicated using two
different data sets. Second, the factorial design of Experiment 2 provided enough
statistical power to detect effects of cumulative frequency, which are typically
quite weak for consistent words. The null effect of frequency trajectory in this
experiment is thus unlikely to be the result of a type II error. At the very least,
we have shown the effect of frequency trajectory to be much weaker than the
effect of cumulative frequency - in contrast to previous reports in which effects of
AoA were typically equal to or larger in size than frequency effects. Finally, the
results of the experiments support predictions derived from a model which covers
a broad range of other phenomena (Harm & Seidenberg, 1999; Harm, 1998). The
effect of frequency trajectory on AoA, the main effect of cumulative frequency
and the absence of a frequency trajectory effect are all explicit predictions of the
modeling work reported by Zevin and Seidenberg (2002).
In conclusion, whereas frequency trajectory influences the age at which words
are acquired, it has no residual influence on reading aloud. However, it remains
possible that frequency trajectory may have an effect on reading tasks that more
strongly implicate semantic information. Mappings from spelling and sound to
meaning are largely arbitrary. Therefore, it is possible that early learning could
interfere with later learning involving semantics. Because frequency trajectory is
a strong predictor of AoA, and yet is less strongly correlated with factors that
influence both AoA and adult processing, it provides a unique tool for examining
age-limited learning effects. Experiments using the methodology
introduced here can help advance the understanding of the mechanisms underlying
age-limited learning more generally by determining which tasks do (and do not)
give rise to frequency trajectory effects.
Chapter 4
A model of cross-linguistic speech perception
Foreign accents are common among late learners of second languages1. Early
linguistic descriptions of speaking with an accent suggest that it consists (at least
in part) of misapplying phonetic knowledge from one’s native language (L1) to the
second language (L2) (e.g., Trubetzkoy, 1939/1969; Weinreich, 1957). Research
over the last few decades has revealed that not only are both speech perception
and speech production in L2 strongly influenced by L1 knowledge, but that this
influence begins quite early in life (Werker & Tees, 1984) and that the age at
which one starts learning a second language largely determines how strong this
influence will be (Flege et al., 1999).
This chapter describes a connectionist model which provides mechanistic
explanations for a number of cross-linguistic speech perception phenomena. The
model learns to map from acoustic input to articulatory output for a large set
of CV syllables sampled from the phonological inventory of American English.
1 Parts of this work have been reported as: Keidel, J. L., Zevin, J. D., Kluender, K. R.,
& Seidenberg, M. S. (2003) Modeling the role of native language knowledge in the perception
of nonnative speech sounds. Talk to be presented at 15th Annual International Conference of
Phonetic Sciences, Barcelona, Spain.
After extensive training on this reduced sample of English speech, the model’s
naive perception of foreign speech sounds agrees remarkably well with human
data using similar stimuli. Furthermore, simulations in which L2 speech stimuli
are introduced at different points during training demonstrate that age-limited
learning effects in L2 speech perception can result from the entrenchment of
network structure that supports L1 perception. That is, plasticity in the model
decreases as a result of learning, and not as a result of any parametric change
that governs the rate with which connection weights can be adjusted, or the
“pruning” of particular sets of connections.
4.1 Existing models
A number of verbal models have been proposed to explain cross-linguistic speech
perception and L2 phonetic learning. These models have in common the
assumption that knowledge of the L1 phonetic inventory plays a role in the perception
and acquisition of novel phonetic knowledge, but differ in their particular
assumptions about how this occurs.
4.1.1 Flege’s SLM
The Speech Learning Model (SLM) is a framework for predicting the learnability
of particular L2 phonemes, taking into account the relationship of the phoneme to
be learned to the phonological inventory of L1, similarity to other L2 phonemes,
and the complex interaction of these and other factors. In some instances, the
similarity of L2 categories to L1 categories is helpful - for example, when both
depend on the same dimension (e.g., voicing, in the /θ/ vs. /ð/ contrast for
native French speakers learning English, Jamieson & Morosan, 1986). In other
cases it is harmful, because the L1 sound is simply substituted for the L2 sound in
perception and production (e.g., English speakers’ perception and pronunciation
of French /u/, Rochet, 1995).
The SLM explains age-limited learning effects in terms of equivalence
classification - when contrastive sounds in L2 differ along a dimension that is not
contrastive in L1 they are classified as exemplars of the same category. When
this occurs, there is no impetus to form a novel phonetic category for the new
speech sound. In this way, the model explains age-limited learning effects in
speech perception as “structural” rather than “parametric.”
While the SLM accommodates a wide range of phenomena, it is somewhat
vague with regard to the mechanism by which sounds are assimilated to existing
categories. Flege (1995) describes perception of L2 sounds in terms of fitting
the sounds to “prototypes,” and formation of novel categories or adjustment of
existing ones in terms of changes to these prototypes, or the addition of new
ones. The model also suggests that learning plays an important role in the
formation and modification of these prototypes, but does not advance an explicit
notion of how this learning takes place. This weakens SLM’s predictive power.
Without explicit claims about the mechanisms underlying either perception or
learning of L2 phonetic categories, it is difficult to predict the ease or difficulty of
acquiring particular sounds. In practice, predictions in the SLM are derived from
contrastive analysis of phonological inventories and identification experiments in
which assimilation patterns are empirically determined.
4.1.2 Kuhl’s NLM
Kuhl suggests a mechanism by which native language experience shapes
perception of foreign speech sounds, namely, the “Native Language Magnet” model. On
this view, prototypical values for phonemes “warp the space around them” so that
changes along relevant dimensions in the speech signal (e.g., the relative position
of formants in vowel perception) are less readily perceived when they are close to
a prototype for an L1 category (Kuhl, 2000).
This model is problematic, however, on two counts. First, the dimensions
along which this warping of space is meant to occur have never been adequately
described. This means that, while the model can account for some findings
regarding vowel perception, it is not at all clear how it would extend to account for
crosslinguistic effects in the perception of consonants.
The more serious problem with NLM, however, lies in the behavioral data
regarding subcategory phonetic structure which it was initially developed to account for,
i.e., the Perceptual Magnet Effect (PME). In a series of experiments, Kuhl and
colleagues (Iverson & Kuhl, 1995; ?) used vowel stimuli that varied in formant
values around a prototypical /i/ (according to figures from Peterson & Barney,
1952). They first asked subjects to rate these stimuli as exemplars of /i/. Some of
the stimuli were consistently rated as good exemplars, whereas others were rated
as poorer exemplars. They then ran a series of discrimination tasks. They found
that discrimination was better for contrasts involving the poorer (“non-prototype”)
stimuli than for contrasts involving the better (“prototype”) stimuli. This was
true despite the fact that all contrasts were generated using changes in formant
values that had been equated for their psychophysical distance from one another
in Mel space.
Lotto, Kluender, and Holt (1998) demonstrated that the apparent structure of
phonetic categories “revealed” in these experiments could be explained in terms of
categorical perception and contrast effects. That is, the non-prototypical stimuli
in these experiments were actually on the border between the category in question
and a second category (e.g., the poor exemplars of /i/ had F1 and F2-F1 values
closer to modal values for /I/). Thus, the apparent increase in discrimination
for non-prototypical stimuli is actually an improvement in discriminability at
a category border, i.e., an effect of “acquired dissimilarity” (Liberman, Harris,
Hoffman, & Griffith, 1957). Thus, the results from within-language perception
do not support the assumptions of the NLM.
There is also a theoretical problem with the vagueness of the notion of “prototype”
as described by Kuhl (2000), because it has never been made clear whether this
is meant to refer to representational prototypes (Smith, Shoben, & Rips, 1974),
prototypes that emerge from multiple stored episodic representations (Hintzman,
1986), or attractors in multilayer perceptrons such as in the current model.
However, the basic premise of the NLM - broadly construed - has a good deal
of face validity, and in fact the current model is based on some similar assumptions.
The strongest aspect of the NLM, on my view, is that it represents the
assertion that perception is shaped by experience, such that extensive experience
with one’s native language can result in an inability to perceive foreign speech
sounds veridically. Furthermore, the notion that effects in cross-linguistic speech
perception are contiguous with the influence of fine-grained categorical structure
in L1 is an important one. Both of these elements of NLM are preserved in the
current simulations.
4.1.3 Best’s PAM
The Perceptual Assimilation Model (PAM) provides a different perspective on
the role of L1 knowledge in L2 speech perception. The most important difference
between PAM and the other models discussed here is its commitment to
direct realist assumptions. On this view, speech perception centrally involves the
direct pick-up of gestural information from the acoustic waveform. Perceptual
learning consists of tuning the perceptual system to the higher-order regularities
among coordinated gestures that represent contrastive categories in one’s native
language. Thus, as in both the SLM and NLM, PAM proposes that literal information
about particular phones is lost in the process of learning to treat speech
sounds categorically. However, in PAM, the medium (articulatory vs. acoustic)
as well as the mechanism (direct perception vs. computation) are different.
This means that PAM’s predictions about cross-linguistic speech perception
are in principle derived from contrastive analysis of phonological inventories informed
by strictly gestural information. In practice, however, predictions from
the model are driven by empirically derived patterns of assimilation, which, as
we shall see, are not unambiguously gestural in nature.
PAM predicts that L2 speech sounds should be readily distinguishable from
one another when they either 1) are assimilated to different L1 categories, or 2)
are not assimilated to L1 categories at all. In cases where both L2 sounds are
assimilated to the same category, the ability to discriminate them should be poor,
unless one of the L2 sounds differs from the L1 category sufficiently that it is less
assimilable. The problem with this framework is that it begs the question of why
one contrast should be more or less assimilable to a native category than another.
Furthermore, it requires a second (non-phonetic) form of perception to explain
the full range of perception. For example, consider the click contrasts in Best,
McRoberts, and Sithole (1988). They examined exclusively place and voicing
distinctions, which have highly salient acoustic properties and are thus easy to
distinguish. But if one examined discrimination of pre-nasalization contrasts,
these would presumably be harder, because the pre-nasalization is simply harder
to retrieve from the proximal stimulus.
This is not a serious problem for PAM in the case of non-assimilable contrasts,
but one has to imagine that something similar is going on in the case of contrasts
that vary in their assimilability. For example, Best et al. (2001) compared native
English speakers’ performance on two contrasts: /kʰ/ - /k’/ and /b/ - /ɓ/. In
the case of the pulmonic/ejective contrast, the two phones differ essentially by a
single gesture - in the ejective, the closure and release of the velar constriction
is accompanied by a raising of the larynx. In the case of the implosive/pulmonic
distinction, the two stimuli differ by one gesture - in the implosive, the closure
of the pulmonic is accompanied by a larynx lowering gesture. Both contrasts
also involve differences in voice onset time (VOT; the ejective has a longer
lag than the voiceless pulmonic, the implosive has a longer lead than the voiced
pulmonic).
As we shall see in more detail below, performance on these two contrasts
differed significantly - and, in fact, the data were taken as support for PAM.
However, the data are actually problematic for PAM, because they point up
the difficulty of generating a priori predictions based purely on considerations
of gestural similarity. If both acoustic and gestural information can be used in
discriminating L2 phonetic contrasts, how are we to parcel out the influence of
each of these variables? On a purely gestural account, both /k’/ and /ɓ/ ought
to be equally assimilable to English categories, because the degree to which they
differ from the pulmonic stops to which they are most similar is essentially the
same. Both depend primarily on a gesture not used contrastively in English
(lowering or raising the larynx) and variation in timing of VOT in a range outside
the contrastive range in English. They differ in their assimilability because the
acoustic consequences of the underlying gestures differ in their salience.
Because it lacks explicit assumptions about why some phones are more assimilable
than others, and therefore has no principled way of predicting the ease
with which particular L2 phones will be assimilated to particular L1 categories,
the PAM is in the same position as the SLM. It predicts that assimilation to L1
categories will have a strong influence on the perception of L2 phones, but does
not provide a mechanism for predicting patterns of assimilation a priori. Indeed,
this weakness of the model has been noted by Best (1995) who proposed using a
gestural similarity matrix to generate specific quantitative predictions from PAM.
Like the SLM, PAM relies on impressionistic contrastive analyses and empirically
derived patterns of assimilation to generate explicit predictions.
4.1.4 McClelland et al.’s Hebbian Model
The modeling work of McClelland, Thomas, McCandliss, and Fiez (1999) focuses
specifically on the question of how early learning can interfere with later learning
in a neural network that has no parametric decline in plasticity. They implemented
a set of two-layer Hebbian networks which were trained to categorize
dispersed “blobs” of activity on their input layer into single-unit representations
on the output. After extensive training with one set of categories, a network’s knowledge
began to limit its plasticity in a very specific way. That is, it could not learn
to differentiate stimuli which it had learned were members of the same category.
Once this state was reached, training with a novel set of categories had virtually
no effect on the model’s performance - although other models initially trained
with the second set of categories acquired them readily.
The lack of plasticity in the “adult” model resulted from an interaction between
the model’s knowledge of its input space and the statistical relationship
between this knowledge and the novel categories it was being trained to perceive.
Interestingly, novel stimuli could be learned, but only when they were manipulated
so as not to overlap with existing categories.
This insight was applied in a set of training studies in which Japanese speakers
were trained to discriminate /l/ from /r/ (McCandliss, Fiez, Conway, & McClelland,
1999). The notion was that Japanese speakers are unable to perceive the
distinction between these two sounds because their native language treats them
as members of the same category, roughly analogous to the overlapping “blobs”
of activity in the simulation models. Thus, many native speakers of Japanese
come to associate both /r/ and /l/ sounds with their own (single) native category.
Interestingly, after training with exaggerated exemplars of /r/ and /l/ sounds
generated with a speech synthesizer, most of the participants in the McCandliss et al.
study were able to learn to distinguish /r/ from /l/, although tests of their ability
to generalize this knowledge were fairly limited (see also Logan, Lively, & Pisoni,
1991; Lively, Logan, & Pisoni, 1993).
Although the model provides a clear demonstration of how a simple associative
learning mechanism can give rise to age-limited plasticity effects, it provides
only a rough guide to predicting which particular contrasts will be hardest for
L2 learners to acquire. This is because the model’s input space is defined by
only two dimensions, whereas phonetic categories are determined by many more
dimensions, and by probabilistic, rather than categorical distributions of stimuli
along those dimensions.
4.2 The Current Model
We have seen that there are a number of theories that address cross-linguistic
speech perception. What they have in common is the premise that L2 speech perception
is heavily influenced by knowledge of L1 categories. The SLM and PAM
lack explicitly formulated mechanisms for generating predictions about how L2
speech sounds will be assimilated, and furthermore lack mechanistic explanations
of how age-limited learning arises. The NLM and Hebbian models suggest explicit
mechanisms, but are limited in their ability to make contact with the empirical
literature. For example, it is not at all clear how to extend the NLM beyond
perception of vowels without making assumptions about which dimensions are
critical to consonant perception (this is problematic because these assumptions
are just the kind of thing one would want the learner to “find” in the input). The
representations employed in the Hebbian model also make it too abstract to be
applied to actual speech sounds.
The contribution of the current model is that it provides a mechanistic approach
to understanding both the patterns of perception in naive listeners, and
the patterns of plasticity in response to long-term exposure to a second language.
The model implements the hypothesis that knowledge of the L1 phonological inventory
is acquired via an associative mechanism that is sensitive to the statistical
regularities of the input. The same learning mechanism provides an explanation
for age-limited learning phenomena observed in speech perception and production.
The model’s knowledge of speech sounds is encoded in a set of weights that
mediate between an acoustic (input) layer, and an articulatory (output) layer.
The values on these weights are set by extensive training in which the model learns
to generate appropriate articulatory outputs in response to English speech sounds.
As the system becomes increasingly tuned to the perception of English, it develops
very strong connections, which support highly fault-tolerant perception, and good
generalization to exemplars of familiar phonemes. This specificity works against
the veridical perception and learning of novel speech sounds, however. Note that
this general form of explanation is quite similar to the Hebbian models proposed
by McClelland, Thomas, McCandliss, and Fiez (1999), although, as discussed
in more detail below, the precise mechanisms by which plasticity declines are
somewhat different.
4.2.1 Target Phenomena
4.2.1.1 Naive L2 speech perception: Results from Best et al. (2001)
In an important study of cross-linguistic speech perception, Best et al. (2001)
presented contrasts from isiZulu which varied in their relation to the English
phonetic inventory to English speakers in an AXB task. The contrasts were
classified according to their patterns of assimilation as “single category,” “category
goodness,” or “two category.”
In the single category (SC) assimilation pattern, both foreign sounds are
mapped onto the same native category, yielding very poor discrimination performance.
For example, in the distinction between the isiZulu implosive (/ɓ/)
and pulmonic bilabial stop (/b/), the implosive was often perceived as a /b/.
This distinction was perceived with only approximately 65% accuracy by native
speakers of American English.
Category goodness (CG) distinctions are just that; while both sounds map
onto the same category, they do not do so equally well. An example of this is
the contrast between the velar ejective stop consonant and the pulmonic voiceless
aspirated velar stop (/k’/-/kʰ/). Because the former has a much more forceful
burst than either the isiZulu or English /kʰ/, it is judged as a significantly
worse exemplar of the /kʰ/ category. Discrimination for this contrast was therefore
predicted to be good, but not perfect. Best et al. found approximately 90%
discrimination for these stimuli.
Finally, two category (TC) distinctions involve two sounds, each of which
assimilates to a different L1 category. IsiZulu voiced and voiceless lateral fricatives
(/ɬ/ - /ɮ/) have this property. They are produced in a manner similar to English
/s/ or /ʃ/, but with lowering of one side or both sides of the tongue, as occurs in
English /l/. It was predicted that since these sounds were assimilated to either
jij and /s / or /0 / and /5 / respectively, discrimination performance would be
uniformly good. Discrimination performance was approximately 95% for these
items. Thus, success in the discrimination task was related to the manner in which
the same stimuli were assimilated to L1 categories, providing an interesting target
phenomenon for any model that assumes a relationship between the structure of
the L1 phonetic inventory and L2 speech perception.
4.2.1.2 Failure to acquire novel contrasts with extended exposure
As is the case with most forms of age-limited learning in complex systems, it is
extremely difficult to establish unambiguously whether there are effects of age
on the acquisition of phonetic categories because of a number of confounds. For
example, the conditions under which younger and older immigrants learn a second
language differ greatly in terms of the kind of exposure (i.e., younger learners
typically have some formal education in their second language, including explicit
language instruction), and in their motivation to learn to speak “like a native.”
Furthermore, late learners tend to use their native language more often, and more
consistently than younger learners (e.g., Mackay, Meador, & Flege, 2001). All
of these factors are correlated with AOA, and all have direct effects on various
aspects of second-language acquisition. Critically, Flege et al. (1999) were able
to take many of these factors into account in a large-scale study of native Korean
speakers acquiring English as a second language. They found an independent
influence of AOA on accentedness ratings when the aforementioned factors were
appropriately controlled. Importantly, in the same study, no influence of AOA
was found on grammatical processing with the same controls.
Currently, although there is general agreement that L1 knowledge interferes
with L2 perception, and that this is potentially related to failures of learning in
adulthood, there is no model that explicitly links the development of L1 phonological
representations to L2 development, and has the capacity to make contact
with the empirical literature at a fine grain of detail. The current model represents
an attempt to fill this gap.
4.3 Methods
4.3.1 Stimuli
2600 English CV syllables were recorded by eight male native English speakers
and one male bilingual Zulu/English speaker, and digitized at 20 kHz. The
consonant set included all English stops, fricatives and liquids, as well as the
nasal /m/. Each consonant was produced in each of the vowel contexts /a/, /ei/,
/i/, /ou/, and /u/. Each stimulus was then converted into a cochleagram with
a 15ms analysis length and a time step of 10ms using the Praat program (P.
Boersma and D. Weenink). The distance between filters was 2 Bark.
Stimuli were then over- and undersampled at rates of 21 and 19 kHz, approximately
equivalent to a +/- 5% change in register. This effectively tripled
the number of speakers to which the model was exposed, and discouraged overlearning
of the training set. Two hundred and nineteen of these syllables were
withheld from the training set for testing. In addition, 102 isiZulu stimuli were
recorded and cochleagrams made with the same parameters. These stimuli were
used exclusively for testing.
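The resampling manipulation can be illustrated with a short sketch. This is not the procedure actually used to prepare the stimuli, which is not specified beyond the rates given above. The names `resample_linear` and `frequency_scale` are hypothetical, linear interpolation is an assumed stand-in for whatever resampler was used, and the direction of the resulting shift depends on how the resampled signal is subsequently interpreted.

```python
import math


def resample_linear(samples, src_rate, dst_rate):
    """Resample a signal by linear interpolation (no anti-aliasing filter).

    Assumed stand-in for the resampler used on the original recordings.
    """
    n_out = int(round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for k in range(n_out):
        t = k * src_rate / dst_rate       # position measured in source samples
        i = int(math.floor(t))
        frac = t - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]
        out.append(a + frac * (b - a))
    return out


def frequency_scale(src_rate, dst_rate):
    """Factor applied to every frequency when the resampled signal is
    treated as if it were still sampled at src_rate (an assumption)."""
    return src_rate / dst_rate


# A 100 ms, 1 kHz tone recorded at 20 kHz, resampled to 21 and 19 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 20000) for n in range(2000)]
slower = resample_linear(tone, 20000, 21000)
faster = resample_linear(tone, 20000, 19000)
```

Under these assumptions, the two resampled versions differ from the original by roughly 5% in every frequency, which is what allows one recording to stand in for three slightly different speakers.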
4.3.2 Model architecture and training procedure
As shown in Figure 4.1, the network consisted of 4 layers of units. The first hidden
layer was connected recurrently to a second layer; this improves the model’s
ability to track temporal dependencies in the input. At the beginning of training,
Figure 4.1: A. Overall model architecture. B. Auditory input to the model: a
psychoacoustic cochleagram with time in 15ms “ticks” on the X axis, frequency in
Bark on the Y axis and loudness in phons on the Z axis. C. Part of the articulatory
scheme used, and the specific target values for the syllable /ba/.
each weight in the network was assigned a random value between -0.1 and 0.1.
The integration constant was set at 0.3, and a learning rate of 0.001 was used.
The input layer of the model consisted of 13 filters, each representing a range
of 2 Bark. Cochleagrams (as described above) were presented sequentially in
15ms increments (“ticks”). The output layer was roughly based on task-level
descriptions of articulation (e.g., Browman & Goldstein, 1992). Groups of units
coded for the position and degree of closure of each of the following articulators:
lips, tongue tip, tongue body, velum and glottis. Coding was designed to capture
the degree of similarity among gestures. For example, degree of closure was coded
with three units for each organ: full (stop) closure was coded as -1 -1 -1; frication
was coded as 1 -1 -1; approximant closure was coded as 1 1 -1, and fully open
(vowel) position was coded as 1 1 1.
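The sense in which this coding captures similarity can be made concrete with a small sketch. The four codes are those given in the text; the dictionary keys and the function name `code_distance` are hypothetical labels introduced here for illustration.

```python
import math

# Degree-of-closure codes as given in the text (three units per articulator).
CLOSURE_CODES = {
    "stop": (-1, -1, -1),
    "fricative": (1, -1, -1),
    "approximant": (1, 1, -1),
    "open": (1, 1, 1),
}


def code_distance(a, b):
    """Euclidean distance between the codes for two degrees of closure."""
    return math.dist(CLOSURE_CODES[a], CLOSURE_CODES[b])


# Distances from a full stop closure grow with the difference in constriction.
stop_to_fricative = code_distance("stop", "fricative")
stop_to_approximant = code_distance("stop", "approximant")
stop_to_open = code_distance("stop", "open")
```

Because each step toward a more open constriction flips one additional unit, distance in the output space increases monotonically with the difference in degree of closure. A one-unit-per-category code would instead make all four degrees equally distant from one another.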
All gestures were coded as pairs of events. Thus, consonants were coded
as having both a closure gesture, and an opening gesture. This provides a fair
degree of realism, in particular for voicing contrasts, which are determined by the timing
of the voicing gesture relative to the opening gesture. For example, in Figure
4.1, note that the lips remain closed until just before the vowel. In a voiceless
consonant, there would be a period of aspiration between the closure and the
onset of the vowel. Furthermore, parameters not relevant to a particular target
were allowed to vary freely. For example, if the target is a bilabial closure for
/b/, the position of the tongue body is irrelevant. Because of the way error
is determined in continuous recurrent backpropagation, this means that, where
possible, the model can produce gestural overlap, such that, e.g., the tongue body
can be moving into position to produce the /a/ in /ba/ while the lips are forming
the consonant closure. This kind of gestural overlap is well-documented (e.g.,
Browman & Goldstein, 1992).
On each trial, the network was run for 50 time ticks, using the continuous
recurrent backpropagation algorithm (Pearlmutter, 1995; Harm, 1998). On each
time tick, a spectral slice from the cochleagram was clamped on the input layer.
Beginning at time tick 38, activation on the output layer was compared to the
target output for the current syllable. Error was then applied to those units
whose activation deviated from the target, and changes on the weights to each
unit were made on the basis of the magnitude and direction of the error. Training
ceased when the sum squared error (SSE) reached asymptote (at 2 million trials).
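The way error is accumulated, with targets applied only late in the trial and irrelevant parameters left free, can be sketched as follows. This is a minimal illustration rather than the simulator's actual code. The function name `masked_sse`, the use of `None` to mark a parameter that is free to vary, and zero-based tick numbering are all assumptions.

```python
def masked_sse(outputs, targets, start_tick=38):
    """Sum squared error over output units that have a target.

    outputs, targets: lists indexed as [tick][unit]. A target of None marks
    a parameter that is free to vary and so contributes no error (assumed
    convention). Ticks before start_tick are ignored; ticks are counted
    from zero here, which is also an assumption.
    """
    total = 0.0
    for tick in range(start_tick, len(outputs)):
        for out, tgt in zip(outputs[tick], targets[tick]):
            if tgt is not None:
                total += (out - tgt) ** 2
    return total


# Toy trial: 50 ticks, two output units, the second of which is unconstrained.
outputs = [[0.0, 0.7] for _ in range(50)]
targets = [[1.0, None] for _ in range(50)]
trial_error = masked_sse(outputs, targets)
```

In this toy trial only the first unit generates error, and only on the last 12 ticks. The unconstrained unit can take any value, which is what permits the kind of gestural overlap described above.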
Five runs of the model were trained with different random weights and different
orders of stimulus presentation. Results presented throughout are means
across all five runs.
4.3.3 Testing procedures
4.3.3.1 Identification
Identification was simulated by determining the phoneme nearest to the literal
output of the model. For novel English stimuli, we report the proportion of trials on which
the output matched the correct output as determined in advance. For the isiZulu
items, raw percentages of identifications are reported for the modal responses.
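A nearest-target identification rule of this kind might look like the sketch below. Euclidean distance is an assumed choice of metric for this step, the names `identify` and `response_proportions` are hypothetical, and the three-unit target patterns are toy values rather than the model's actual articulatory targets.

```python
import math
from collections import Counter


def identify(output, phoneme_targets):
    """Return the label whose target pattern is nearest to the output."""
    return min(phoneme_targets,
               key=lambda label: math.dist(output, phoneme_targets[label]))


def response_proportions(outputs, phoneme_targets):
    """Proportion of trials on which each label was the response."""
    counts = Counter(identify(o, phoneme_targets) for o in outputs)
    return {label: n / len(outputs) for label, n in counts.items()}


# Toy targets (illustrative only) and four noisy outputs.
toy_targets = {"b": (-1, -1, -1), "v": (1, -1, -1)}
toy_outputs = [(0.9, -1, -1), (0.8, -0.9, -1), (-0.7, -1, -1), (1, -1, -0.9)]
proportions = response_proportions(toy_outputs, toy_targets)
```

For trained English categories the proportion for the correct label gives accuracy. For isiZulu items there is no correct label, so the same tally yields the assimilation pattern.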
4.3.3.2 Discrimination
Discrimination performance was examined using an analogue of the AXB task.
10 stimuli from each of the categories under study (/ɬ/, /ɮ/, /k’/, /kʰ/, /ɓ/,
/b/, all from the same speaker) were used. On each trial, a test stimulus (X)
was compared to two other stimuli (A and B), one of which was an exemplar of
the same phonetic category, the other of which belonged to a different category.
Correct discrimination was scored if the model’s response indicated that the test
stimulus was more similar to the stimulus taken from the same category.
Responses in the AXB task were determined by examining the values on
the hidden units at the time tick corresponding to the middle of the consonant.
These values were taken for all three stimuli, and Euclidean distance was used to
determine which of the two exemplars (A or B) was perceived as more similar to
the probe (X). The proportion of correct responses out of 120 trials run for each
contrast is reported as mean accuracy.
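The decision rule for the simulated AXB task can be written down compactly. The sketch below assumes hidden-unit vectors have already been extracted at the relevant time tick; the function names and the tie-breaking rule (ties go to B) are hypothetical.

```python
import math


def axb_response(a, x, b):
    """Return 'A' if X's hidden-unit vector is closer to A's than to B's.

    Ties are resolved in favor of 'B' (an arbitrary assumption).
    """
    return "A" if math.dist(a, x) < math.dist(b, x) else "B"


def axb_accuracy(trials):
    """Proportion correct over trials of the form (a, x, b, correct_label)."""
    trials = list(trials)
    hits = sum(axb_response(a, x, b) == correct for a, x, b, correct in trials)
    return hits / len(trials)


# Toy hidden-unit vectors: two trials in which X belongs with A, then with B.
toy_trials = [
    ((0.1, 0.9), (0.2, 0.8), (0.9, 0.1), "A"),
    ((0.1, 0.9), (0.8, 0.2), (0.9, 0.1), "B"),
]
toy_accuracy = axb_accuracy(toy_trials)
```

When two categories produce nearly identical hidden-unit patterns, the two distances are close to equal and accuracy falls toward chance, which is the pattern expected for single category contrasts.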
A preliminary test of categorical perception was also run using a single /kʰa/
- /ga/ continuum generated from natural speech tokens. A /kʰa/ stimulus with a
VOT of 90 ms was modified by replacing aspiration, 10 ms at a time, with glottal
pulses from a /ga/ stimulus spoken by the same speaker. As in the AXB task,
discriminability was determined by comparing hidden unit values for adjacent
stimuli using a Euclidean distance metric.
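Locating a category boundary from adjacent-step distances can be sketched as below. The hidden-unit vectors are invented toy values, and `adjacent_distances` and `boundary_step` are hypothetical names; the only element taken from the text is the comparison of neighboring continuum steps by Euclidean distance.

```python
import math


def adjacent_distances(hidden_by_vot):
    """Distance between each continuum step and the next.

    hidden_by_vot: list of (vot_ms, hidden_vector) pairs sorted by VOT.
    Returns (vot_ms of the lower step, distance) pairs.
    """
    pairs = zip(hidden_by_vot, hidden_by_vot[1:])
    return [(v0, math.dist(h0, h1)) for (v0, h0), (_, h1) in pairs]


def boundary_step(hidden_by_vot):
    """VOT of the lower member of the most discriminable adjacent pair."""
    return max(adjacent_distances(hidden_by_vot), key=lambda p: p[1])[0]


# Toy continuum with an abrupt change in representation between 20 and 30 ms.
toy_continuum = [
    (0, (0.00, 0.0)),
    (10, (0.05, 0.0)),
    (20, (0.10, 0.0)),
    (30, (1.00, 0.0)),
    (40, (1.05, 0.0)),
]
toy_boundary = boundary_step(toy_continuum)
```

A categorical pattern shows up as one adjacent pair with a much larger distance than its neighbors, while a continuous representation would give roughly equal distances at every step.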
4.4 Results
4.4.1 English Speech Sounds
In order to establish that the model had learned to perceive stimuli in its “L1,”
we tested its performance on the set of stimuli with which it was trained, and on
English stimuli not included in the training set. Performance on the training set
was near perfect (98.7%), and generalization to untrained exemplars of English
CV syllables was quite good (92% correct).
4.4.1.1 Categorical Perception
As depicted in Figure 4.2, the model has a VOT boundary at approximately 20
ms for velar stops. Discrimination across this boundary is visibly much better
Figure 4.2: Categorical perception of a /kʰa/ - /ga/ continuum with even steps of
10ms VOT (y-axis: HU distance; x-axis: VOT). Euclidean distances are reported
for discriminations of N vs. N+10 on the x-axis.
than discrimination within a category. These results are preliminary; ideally, a wider range of
stimuli would be tested.
4.4.1.2 Confusion Matrices
In order to examine the model’s knowledge of the English phonological inventory
in greater detail, two confusion matrices were generated and compared to results
from Miller and Nicely (1955). Table 4.1 shows a confusion matrix for the output
of the model with no noise on the input; Table 4.2 shows the same matrix when
noise sufficient to induce a 20% error rate was added to the inputs. Comparison
to the human data reveals a number of striking similarities. First, the most
error-prone stimuli for both humans and the model were labio-dental (/f/, /v/)
and interdental (/θ/, /ð/) fricatives. These were most often confused with one
Table 4.1: Confusion matrix without noise
Response categories (columns): b d g pʰ tʰ kʰ f θ s ʃ v ð z ʒ NA
Stimulus: nonzero response proportions, in column order
b:  .92 .01 .06
d:  .94 .05
g:  .01 .03 .90 .03
pʰ: .07 .92 .01 .01
tʰ: .03 .94 .02
kʰ: .01 .99
f:  .92 .04 .01 .02
θ:  .32 .67 .01
s:  1
ʃ:  .9 .03 .07
v:  .87 .04 .09
ð:  .04 .86 .09
z:  .07 .93
ʒ:  1.00
Table 4.2: Confusion matrix for noise inducing approximately 20% error rates.
Response categories (columns): b d g pʰ tʰ kʰ f θ s ʃ v ð z ʒ NA
Stimulus: nonzero response proportions, in column order
b:  .84 .02 .02 .01 .11
d:  .06 .72 .01 .03 .18
g:  .02 .05 .69 .03 .02 .19
pʰ: .11 .83 .01 .01 .04
tʰ: .08 .79 .03 .06 .04
kʰ: .09 .03 .81 .02 .04
f:  .7 .2 .05 .07
θ:  .29 .48 .03 .2
s:  .97 .03
ʃ:  .79 .07 .14
v:  .81 .04 .15
ð:  .06 .01 .05 .68 .02 .16
z:  .06 .89 .05
ʒ:  .96 .04
another, in particular, /f/ confused with /θ/ and /v/ confused with /ð/. This
is perhaps not surprising, given that the spectral center of frication for these
consonants is a poor cue to place, and that the formant transitions are difficult
to hear because of the low amplitude of frication.
After the anterior fricatives, stops were most easily confused. Here the model’s
performance differs slightly from humans. Whereas the participants in the Miller
and Nicely (1955) study typically made place errors (e.g., mistaking a /b/ for
a /g/), the model tended to make voicing errors (e.g., mistaking a /g/ for a
/kʰ/). Interestingly, the model’s errors reflect a modest bias in favor of voiced
consonants.
The model also agrees with the human data regarding which phonemes are
easiest to identify. The voiceless alveolar fricative /s/ and voiced post-alveolar
fricative /ʒ/ were identified with a high degree of accuracy, even under moderate
noise. Unlike human participants, the model had difficulty distinguishing /ʃ/
from /ʒ/ under moderate noise.
Overall, the confusion matrices suggest that the model’s categorization of
stimuli is fairly similar to native English speakers: Contrasts that are difficult
for native English-speaking participants are also difficult for the model, and contrasts
that are easy for them are also easy for the model. Unlike those of human listeners, the
model’s errors tend to involve voicing rather than place (except for labio-dental
and interdental fricatives, for which responses were more realistic).
This may be an artifact of the coding scheme used in the model. Different
organs are coded by non-overlapping sets of bits in the output, whereas voicing
differences are coded by the timing of the output of a single bit. This means that
voicing confusions are, essentially, easier to make.
4.4.2 Zulu Speech Sounds
4.4.2.1 Identification
Patterns of assimilation are given in Table 4.3. Zulu stimuli were always assimilated
to English categories, although the consistency of assimilation patterns varied
among contrasts (as with human English speakers, Best et al., 2001).
4.4.2.2 Discrimination
As shown in Table 4.3, discrimination performance varied with stimulus type,
F(2,8)= 10.31, p < .01. As in the human data, discrimination was best for the
TC contrast (/ɮa/-/ɬa/), somewhat poorer for the CG contrast (/ga/ - /k’a/) and
poorest for the SC contrast (/ba/ - /ɓa/). Note that the CG contrast was different
from the one used by Best et al. (2001). This is because of dialect differences
between the stimuli used in their experiments and ours. The stimuli used in
our experiment were from a speaker from near Johannesburg. In this region,
ejectives are produced with a shorter VOT than near Durban, where Best et
al.’s informant was from. Indeed, as shown in Table 4.3, the model was nearly
twice as likely to identify /k’/ stimuli as /g/ than as /kʰ/. This is consistent
with our own perception of these sounds, and we are conducting studies with
human participants now to determine whether our stimuli yield different patterns
of results from those reported in Best et al.
Table 4.3: Identification and discrimination scores for Zulu contrasts.
        /b/       /ɓ/     |  /g/       /k’/     |  /ɮ/      /ɬ/
ID      b  .83    b .44   |  g .93     g  .51   |  ʒ .94    ʃ .88
        d  .14    v .30   |  b .04     kʰ .32   |  z .02    s .04
        pʰ .02    ð .20   |  d .02     pʰ .04   |           f .04
DISC         .71          |       .79           |       .90
Note: ID = identification; proportions do not add up to 1.0 because some stimuli were
identified as unshown phonemes. DISC = discrimination; proportion correct for each
contrast (separated with vertical lines).
Table 4.4: Identification of isiZulu and English consonants by models varying in
prior exposure to English alone
                 Consonant
Trials   b    ɓ    k’   g    ɮ    ɬ    TOTAL
0        91   94   85   85   90   96   90
100k     91   88   92   77   95   83   88
1M       91   88   70   77   62   79   78
2M       91   75   73   77   72   87   79
Note: Trials = number of trials of training on English alone before exposure to
isiZulu
4.4.3 Long-term training
As shown in Table 4.4, the ability to acquire isiZulu contrasts declines with
extended exposure to English. After one million trials of training on both English
and isiZulu phones, models which had initially been trained on English alone
for one or two million trials were less accurate in identifying novel phones than
models trained as “bilinguals from birth” or models in which training on isiZulu
was started at 100,000 trials.
Interestingly, the models which began acquiring isiZulu later in training were
able to learn the specific items on which they were trained (mean accuracy on
training set = 98%). This is in accord with a number of training studies with
adults, in which training on speech sounds from a second language results in
improvement on performance for the particular items trained, but not for generalization
(Lively et al., 1993).
4.5 Discussion
The simulation data presented here demonstrate that a model relying on purely
associative mechanisms can acquire representations that are similar in a number
of ways to human phonetic categories - the model generalizes quite well (though
not perfectly), the errors it makes, particularly in noise, are similar to those made
by human listeners, and its extension to non-native speech contrasts suggests that
it generalizes outside of typical L1 categories in a way that is similar to native
speakers of English.
The model also captures the relationship between labeling and discrimination,
such that L2 sounds reliably labeled as members of the same category of L1
sounds are less well discriminated than those that fit two known categories. In
brief, the model demonstrates both acquired similarity and acquired contrast
in its perception of novel speech sounds. In this way, it provides a method of
generating explicit predictions about how various foreign phones will be perceived
by native English speakers. This is one element that has heretofore been missing
in models of cross-linguistic speech perception. The model’s modal responses
for isiZulu stimuli largely match with reported patterns of assimilation (in one
instance, differences in the stimuli used explain the discrepancy).
Most importantly, the results of experiments with long-term exposure to
isiZulu provide a mechanistic explanation of the widely observed decline in
plasticity with age. The neural substrate for speech perception develops in the context
of exposure to a particular language. Statistical properties of the input are
gradually encoded in the pattern of connectivity among units in the model. Eventually,
this results in a robust representational structure that allows the model to identify
speech sounds which vary greatly in their acoustic properties as exemplars of the
same underlying categories (e.g., defined by the constellation of gestures that give
rise to them). The robustness of these representations is a double-edged sword,
however. The strong connections that emerge to support this rapid and efficient
categorization also make the model less sensitive to variation along dimensions
that are not relevant for categorizing L1 phones.
This makes it more difficult to acquire novel contrasts in two ways. First,
differences along dimensions that are irrelevant in L1 are difficult to perceive
- perception is shaped by experience, so these are not correctly perceived in
the first place. Second, the large weights on the connections that support L1
reduce the efficiency of learning because of the way error is computed
in the backpropagation function. Note that although this seems like a peculiarity
of backpropagation networks, it actually captures an aspect of natural neural
networks that appears to play a role in limiting plasticity in some systems. That
is, very strong (or very weak) connections are more difficult to modify than
connections that are intermediate in their strength.
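The second point can be illustrated with a single logistic unit. The sketch below assumes a logistic activation and a squared-error loss, which is a common backpropagation setup but is an assumption here rather than a description of the simulations; the function names are hypothetical. Under these assumptions the weight update is scaled by y(1 - y), which shrinks toward zero as a large weight drives the unit's output toward 0 or 1.

```python
import math

def logistic(z):
    """Standard logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))

def weight_gradient(weight, x, target):
    """Gradient of 0.5 * (y - target)**2 with respect to the weight,
    for a single logistic unit with output y = logistic(weight * x)."""
    y = logistic(weight * x)
    # Chain rule: dE/dy = (y - target), dy/dz = y * (1 - y), dz/dw = x.
    return (y - target) * y * (1.0 - y) * x

# The unit should output 0 but is pushed toward 1 by increasingly large weights.
for w in (0.5, 2.0, 8.0):
    print(w, abs(weight_gradient(w, x=1.0, target=0.0)))
```

With an input of 1.0 and a target of 0.0, the gradient magnitude falls as the weight grows from 0.5 to 8.0, even though the error itself grows: the more entrenched the connection, the smaller the corrective step.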
4.5.1 Weaknesses of the model
The model has a number of weaknesses. Some of these are telling with regard
to specific assumptions of the model, others appear to reflect implementational
limitations and suggest directions for future work.
4.5.1.1 Normalization
People have little difficulty “normalizing” across a range of conditions, including
rate and speaker. That is, part of what people seem to know about phonetic
categories is that they are invariant with respect to the many acoustic (and
dimensions that vary as a result of differences in the rate at which they are spoken
and variability among speakers in vocal tract length and properties of the vocal
folds that determine pitch.
The current model has not been tested on rate normalization. This is because
of the way inputs and targets were presented during training. Syllables were
vowel-centered, and targets provided at specific time points. This was done for
computational simplicity in generation of training patterns - because both the
input and output were encoded as time-varying, it was necessary to ensure that
the appearance of particular patterns on the input and targets on the output
were synchronized in some way. The simplest way to do this was to ensure that
consonants and vowels always occurred at roughly the same position on both the
input and the output.
Another way around this problem is to force the model to “wait” until the end
of the syllable to begin producing its output. Initial experiments with models
given targets only during the later portions of trials suggest this method of
training was less efficient, but it is feasible, and could usefully be applied to issues
of rate normalization, and other interesting problems having to do with timing
which have been studied in more detail in other frameworks (Saltzman, 1995).
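One way to picture training with targets only late in the trial is as an error function that ignores early time steps. The sketch below is an illustration under stated assumptions: the function name, the squared-error measure, and the `first_scored_step` parameter are hypothetical and are not taken from the simulations.

```python
def masked_squared_error(outputs, targets, first_scored_step):
    """Sum squared error over time, counting only steps at or after
    `first_scored_step`; earlier steps contribute no error signal."""
    total = 0.0
    for step, (out, tgt) in enumerate(zip(outputs, targets)):
        if step >= first_scored_step:
            total += 0.5 * (out - tgt) ** 2
    return total

# Four time steps of output from one unit; only the last two are scored,
# so the model is free to "wait" during the first half of the trial.
print(masked_squared_error([0.9, 0.8, 0.2, 0.0], [0.0, 0.0, 0.0, 0.0], 2))
```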
The model shows some ability to generalize to novel speakers. When trained
on speech tokens from seven native English speakers, tokens from an eighth were
identified correctly with a high rate of accuracy, albeit less reliably than novel
tokens from any of the other seven. Earlier versions of the model with a smaller
number of speakers in the training set did more poorly, suggesting that increasing
variability in the signal has the predictable effect of improving generalization.
One difficulty in training models such as these is that even very large training
sets (from the perspective of generating pattern files and actually running the
models on available hardware) are still fairly small relative to the wide variety
of speech input encountered in naturalistic settings. Furthermore, because all
patterns are generated from syllables spoken in isolation, in citation form, the
amount of variability in the signal is greatly underestimated. I should note that
whenever the variability in the training set has been increased, this has had the
net effect of improving performance in the model, as opposed to degrading it as
may be supposed.
4.5.1.2 Limits on realism
Although the input and output of the model are designed to make as few
assumptions as possible as to the representations that underlie phonetic processing, they
may be too simplistic to provide an accurate description. First, consider the
input. Although the cochleogram is based on psychophysically derived measures
of just-noticeable differences in frequency and amplitude, it does not take into
account the dynamics of the hearing system and the kinetics of speech sounds.
For example, it is known that forward masking plays a role in some aspects of
speech perception (Summerfield, Haggard, Foster, & Gray, 1984), but this is not
properly implemented in the current simulations.
On the other end, no assumptions about the physics of articulation are
included. All units have equally fast responses, despite the fact that some of them
control large, slow-moving articulators (e.g., the jaw, in determining tongue-body
height) and others control small, fast articulators (e.g., the tongue tip).
It is important to note that these limitations on the realism of the model may
actually serve to impede its performance. This is because some of the
computational problems faced by the model are actually simplified by elements of the
perceptual and articulatory system not captured in the model architecture. For
example, forward masking accentuates bursts and formant transitions, as these
reflect abrupt changes in amplitude at specific frequencies. Including an analogue
of forward masking would thus improve the model’s perception of some stops,
and potentially improve performance on anterior fricatives (where transitions are
a much stronger cue than the low-amplitude frication of either interdentals or
labio-dentals).
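To make the idea of an analogue of forward masking concrete, the sketch below applies a simple adaptation rule to one frequency channel: each frame is reduced by a decaying trace of recent energy in that channel, so abrupt onsets pass through largely intact while sustained energy is attenuated. The rule, the function name, and the `decay` and `strength` parameters are assumptions chosen for illustration; they are not a model of auditory physiology and were not part of the simulations.

```python
def forward_mask(channel, decay=0.5, strength=1.0):
    """Attenuate each frame by a decaying trace of earlier energy in the channel."""
    trace = 0.0
    masked = []
    for energy in channel:
        masked.append(max(0.0, energy - strength * trace))
        # The trace is updated after the frame is processed, so masking
        # only acts forward in time.
        trace = decay * trace + (1.0 - decay) * energy
    return masked

# A step in amplitude, as at a burst or an abrupt formant transition.
step = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(forward_mask(step))  # [0.0, 0.0, 1.0, 0.5, 0.25, 0.125]
```

The onset frame keeps its full amplitude while the following frames are progressively suppressed, which is the sense in which such a mechanism would accentuate abrupt changes.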
4.5.2 Conclusions
The model presented here opens up a number of interesting avenues of research.
Most importantly, it provides a preliminary demonstration of a structural
sensitive period in a realistic task. Taken together with the model presented in Chapter
2, it also supports the notion that generalization is related to age-limited
learning. The models in Chapter 2 did not show age-limited learning effects when
trained on a realistic analogue of reading, whereas models in this chapter did
show age-limited learning effects when trained on a speech perception task.
The reason for this difference is not in the architecture or assumptions about
learning - which were largely shared between these two sets of models. Rather,
the difference in learning trajectories between reading and speech perception has
to do with an interaction between structural limits on plasticity and
generalization from early to later-learned patterns. In reading, generalization is typically
adaptive. Late-learned words obey both phonotactic and orthotactic constraints
the learner can abstract from experience with earlier-learned items.
Furthermore, the mapping from spelling to sound is quasiregular, which means that
specific information about the mappings from particular letters or groups of
letters to particular sounds or groups of sounds can also be generalized. In contrast,
learning to perceive a particular phonetic inventory does not necessarily provide
one with a basis for generalization to speech sounds from a second phonetic
inventory. While it is true that some sounds will be quite similar to one’s native
language, some novel contrasts will depend on stimulus dimensions which the
experienced speaker will have learned to ignore. In this way, the structure of the
neural network that has been tuned to a particular environment gives rise to two
distinct learning trajectories: In some cases, learning is not limited by age,
because generalization is adaptive; in other cases, generalization from early-learned
patterns to later-learned patterns is, in fact, maladaptive, and can act to prevent
native-like attainment.
Chapter 5
General Conclusions
I started with a characterization of two different kinds of age-limited plasticity.
Parametric changes in plasticity are defined as arising from mechanisms that
are different from the mechanisms by which patterns of connectivity in neural
networks are established. By contrast, what I have called structural limits on
plasticity are the result of the entrenchment of patterns of connectivity that
support early learning. I have further argued that complex phenomena such as
the acquisition of song in birds and language in humans are best characterized
as demonstrating structural, as opposed to parametric limits on plasticity.
Unlike the critical period phenomena observed in the large-scale
organization of particular thalamocortical projections (e.g., Hubel & Wiesel, 1970), the
plasticity of the birdsong system declines gradually throughout the lifespan,
apparently independent of major developmental events, such as puberty, changes
in NMDA receptor type and distribution, and large-scale changes in the size and
density of particular regions of neocortex (Troyer & Bottjer, 2002). Although
less is known about the timing of neurobiological events in human development,
the capacity for language learning clearly remains present well into adulthood,
except when the communication system acquired by the learner is so thoroughly
at variance with natural language that its entrenchment permanently impedes
the acquisition of a “native language” (Mayberry et al., 2002a).
In this concluding chapter, I tie together the research presented in the various
sections of this dissertation and explain why it supports the notion that age-
limited learning in the domain of language is structural, rather than parametric.
I then discuss some further predictions of this framework, and how these might
be tested.
5.1 Structural limits on language learning.
Studies of AoA effects in word reading raise interesting questions about age-limited
learning effects in language more generally. Because there is clearly no sensitive
period for lexical learning - there is apparently no specific point in development
during which a clear, qualitative change in the ability to learn new words takes
place - these effects point up the graded, quantitative nature of age-limited
learning of language.
The modeling and empirical research presented in Chapters 2 and 3 addressed
a number of critical issues in the study of AoA effects in lexical tasks, providing
novel methodological and theoretical insights into the nature of these effects. In
methodological terms, the problems associated with studying AoA in lexical tasks
are very difficult because of its high degree of correlation with other variables that
influence adult performance as well as the age at which words are acquired. In
Chapter 2, I argue that frequency trajectory - a metric that takes into account
changes in frequency over time in order to capture the pattern of exposure to
particular words - is a better independent variable to manipulate, because it
isolates the effect of the state of the organism when words are acquired from
other factors which determine the order in which words are acquired.
The models also raised a critical theoretical point. Models in Chapter 2, and
human subjects in Chapter 3 showed no evidence of frequency trajectory effects
when engaged in a task that depended primarily on the translation of spelling
to sound. By manipulating the input to the model so that early and late items
did not overlap with one another, I was able to demonstrate that the lack of
frequency trajectory effects in this task was due to the interaction between changes
in plasticity in the model and the computational structure of the learning
problem. Because knowledge of spelling-to-sound translation is readily generalizable,
what is known about early-learned items actually aids in the acquisition of later
items. The smaller models with non-overlapping early and late items suggest
that frequency trajectory effects may arise in domains where early learning does
not readily generalize to later learning. One example is the mapping between
orthography and/or phonology and semantics, which is arbitrary at least insofar
as words with similar spellings or pronunciations do not necessarily have similar
meanings.
Chapter 4 extends this line of reasoning to deal with a domain in which
age-limited learning effects are more firmly established: speech perception and
production. Training on English alone for a large number of trials (one or two
million) impedes the models’ ability to acquire L2 phonetic knowledge.
Interestingly, the models are capable of learning to identify specific L2 phones on which
they are trained, but generalization is poor relative to models trained on both
English and isiZulu from early on in training. Age-limited learning in speech
perception and production is a highly productive area of research. Existing theories
already emphasize the role of L1 knowledge in L2 learning. The modeling work
presented here provides a mechanistic account of this process, which has the
practical advantage of making explicit predictions about which speech sounds should
be easiest for speakers of a given L1 to learn, without exhaustive contrastive
analyses or collection of empirical data regarding assimilation patterns, which
are currently employed in existing models such as PAM and SLM.
5.2 Future Directions
The dichotomy between structural and parametric limits on plasticity motivates
complementary research strategies. Most research on age-limited learning,
particularly in psycholinguistics, has focused on questions motivated by the hypothesis
that observed changes in plasticity are parametric in nature. Research motivated
by this perspective attempts to identify individual developmental factors that
limit the plasticity of older learners relative to younger learners. For example,
one proposal is that limitations on the grain size of generalizations that can be
abstracted from the data set make less mature learners more efficient in
acquiring particular domains (i.e., the “less is more” hypothesis; Newport, 1990; Elman,
1993).
On the other hand, relatively little work has been done to explicitly examine
the hypothesis that limits on plasticity are primarily structural. This perspective
motivates a slightly different set of questions. For example, the research described
in this dissertation focuses on the computational demands of particular aspects of
language learning, and how this interacts with the entrenchment of early-learned
patterns. In this section, I discuss studies in progress and in preparation that
extend this work in a number of related directions.
5.2.1 Generalization and frequency trajectory effects in lexical processing
The results from the combined modeling and empirical work in Chapters 2 and
3 suggest that in some domains generalization can completely eliminate any
influence of age-limited learning. However, results from models with artificially
orthogonalized training sets (early and late items did not overlap) suggest that
frequency trajectory effects are a natural consequence of learning under some
circumstances. For example, it is possible that the arbitrariness of mappings
involving semantics creates a situation roughly analogous to these smaller
models - although one would want to test larger-scale models which actually
capture more of the structure of semantics → phonology mappings (e.g., Harm
& Seidenberg, in press) before making any serious claims about what the
models do and do not predict under those circumstances. It may be that there is
enough non-arbitrariness in the phonotactic and orthotactic constraints (and on
constraints that govern the semantic structure of words) to overcome age-limited
learning effects in this domain.
The methodology demonstrated in Chapter 3 would be useful in establishing
boundary conditions on when such effects do and do not arise. Stimuli selected
as in Chapter 3 could be used in a range of experiments which involve semantic
processing to varying degrees. For example, if a single set of items showed
frequency trajectory effects in a picture naming task, but not a word naming task,
this would suggest that frequency trajectory has a specific influence on semantic
processing. This is because picture naming requires recognizing an object and
“retrieving” its name, whereas word naming centrally involves spelling-to-sound
translation without reference to semantics, except under specific circumstances
(Strain et al., 1995; Cortese et al., 1997; Zevin & Balota, 2000).
5.2.2 Developmental studies of entrenchment
One paradox in the study of age-limited plasticity in L2 speech perception is that
although L1 influences on L2 speech perception are observed very early in life, the
ability to achieve native-like perception and production of L2 speech sounds seems
to decline much more slowly. Infants as young as one year of age appear to treat
foreign contrasts very differently from native contrasts in various psychophysical
tests (see review in, e.g., Jusczyk, 1981). And yet the ability to learn novel
speech sounds appears to decline fairly gradually throughout the lifespan - in
some cases individuals with AOAs as late as 10 years appear capable of mastering
an L2 phonetic inventory (although the measures of native-like performance are
sometimes less rigorous than one would like; Flege et al., 1999; Bongaerts, van
Summeren, Planken, & Schils, 1996). This presents a possible problem for the
structural view of the sensitive period for speech learning, if one assumes that
the acquisition of L1 categories is a discrete event. It is possible, however, that
learning to categorize L1 speech sounds is actually a fairly protracted process:
although infants appear to solve much of the problem fairly early on, a good
deal of fine-tuning occurs during childhood and adolescence. This fine-tuning
reflects an increase in specificity that may be the basis for entrenchment of L1
phonological categories, ultimately resulting in a failure to learn novel speech
sounds.
Evidence for such gradual fine-tuning of phonetic knowledge comes from
studies of hearing in noise designed to test the efficacy of cochlear implants at various
ages (Eisenberg, Shannon, Martinez, Wygonski, & Boothroyd, 2000). In these
studies, large age-related changes in the perception of speech in noise have been
observed. Furthermore, a study by Mayo, Florentine, and Buus (1997) suggests
that there is a relationship between AOA and performance on a test of speech
perception in noise for non-native speakers. They found that subjects who began
learning English after the age of 14 performed significantly worse than those who
began learning English earlier. However, a number of aspects of this work make
it somewhat problematic. First, variables such as the amount and context of use
are not well controlled (cf., Flege et al., 1999). Second, it is important to note
that Mayo et al. (1997) found no difference between a group of subjects with
a mean AOA of 6 years, and a group who were raised speaking both languages
from infancy. This strongly suggests a role for interference from Spanish in their
perception of English speech.
Thus, one striking prediction of the hypothesis that the sensitive period for
phonetic learning is structurally defined would be that the ability to acquire
novel speech sounds is inversely related to the development of speech perception
in noise. This prediction can be tested by using a large-scale cross-sectional study
(similar to the initial sensitive period study of syntax by Johnson & Newport,
1989) to determine the influence of AOA on a range of speech perception tasks,
and relating the trajectory observed in this study to the development of L1 speech
perception in noise. Furthermore, as in the models described in Chapter 4, it is
clear that this hypothesis predicts an influence of the relationship between L1 and
L2 phonetic inventories on the pattern of age-limited plasticity. The predictions
from this view are essentially similar to those of the Speech Learning Model
(SLM, Flege, 1995), but emphasize the role of statistical learning, and provide a
mechanism for making predictions about learning trajectories without contrastive
analysis or empirical determination of assimilation patterns.
5.2.3 Age-limited plasticity and specific grammatical structures
As noted above, studies of age-limited learning in syntax have tended to focus on
establishing the role of parametric changes in plasticity for the acquisition of L2
syntax. However, recent studies have begun to suggest a role for structural limits
in addition to (if not in place of) parametric mechanisms. For example,
Birdsong and Molis (2002) found very different results for native speakers of Spanish
than Johnson and Newport (1989) found for Korean and Chinese speakers,
although they used precisely the same stimuli and testing conditions. Thus far,
the approach taken has been to test L2 speakers on a wide range of syntactic
constructions and show that their performance is overall poorer than that of L1
speakers. However, when factors such as time spent using the L2, motivation to learn, etc. are
taken into account, there is no variance left over for Age of Arrival to explain
(Flege et al., 1999).
It may be that a more nuanced study of particular syntactic paradigms will
reveal further evidence for structural limits on syntax learning. In accounting for
age-limited plasticity in speech perception, the structural hypothesis puts specific
emphasis on the similarities and differences between the L1 and L2 phonetic
inventories. By analogy, it seems fair to suggest that particular structures in L2
syntax may be subject to age-limited learning on this view as a result of their
dissimilarity from L1. As in the case of Japanese speakers, who learn to treat /l/
and /r/ as members of a single category in the course of learning to process the
speech sounds of their native language, I would suggest that native speakers of a language
such as Chinese, which has essentially no inflectional morphology or agreement,
would show specific age-limited learning effects for these aspects of English. In
contrast, native speakers of a language with complex case structure, such as
Serbo-Croatian, should not show age-limited learning effects on this aspect of
English syntax.
A study to test just this dissociation is currently being developed. Preliminary
results suggest that native Serbo-Croatian speakers do, in fact, have particular
difficulty with determiners and word order, whereas native Chinese speakers have
particular difficulty with determiners and inflectional morphology. More cross-
sectional data will be required to establish that these effects would hold when
one considers both the pattern of AOA effects, and the initial performance of
speakers with fairly limited experience with English.
5.3 Final thoughts
The research presented here is part of a growing literature on age-limited learning
that takes seriously the biological nature of learning and seeks to provide
mechanistic explanations of age-related limits on plasticity in terms of interactions of
genetically defined constraints (learning rule, initial conditions, system
parameters that may change maturationally or in response to particular inputs) and
learning itself. The research conducted in this framework thus far suggests that
the age-limited learning effects observed in language acquisition are the result
of changes in the structure of the neural networks that support early language
learning. When, as in the case of reading, these changes are adaptive for both
early- and late-learned items, there is no disadvantage for late learning. On the
other hand, when these changes are maladaptive for later learning, as in learning
an L2 phonetic inventory (e.g., having learned to ignore a stimulus dimension
that is contrastive in L2), age-limited learning effects are readily observable. A
major future challenge for this framework will be the characterization of various
aspects of language learning, and the examination of age-limited learning effects
and their interactions with generalization.
References
Andrews, S. (1992). Frequency and neighborhood effects on lexical access - lexical
similarity or orthographic redundancy. Journal of Experimental Psychology:
Learning, Memory and Cognition, 18, 234-254.
Antonini, A., & Stryker, M. (1993). Rapid remodeling of axonal arbors in the
visual cortex. Science, 260, 1819-1821.
Antonini, A., & Stryker, M. (1996). Plasticity of geniculocortical afferents
following brief or prolonged monocular occlusion in the cat. Journal of
Comparative Neurology, 369, 64-82.
Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX lexical
database (CD-ROM). (Linguistic Data Consortium, University of
Pennsylvania, Philadelphia, PA)
Balota, D. (1994). Visual word recognition: The journey from features to meaning.
In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (p. 303-356).
San Diego, CA: Academic Press.
Balota, D., & Chumbley, J. (1984). Are lexical decisions a good measure of
lexical access? The role of word frequency in the neglected decision stage.
Journal of Experimental Psychology: Human Perception and Performance,
10, 340-357.
Balota, D., Pilotti, M., & Cortese, M. J. (2001). Item-level analyses of lexical
decision performance: Results from a mega-study. Memory & Cognition,
29, 639-647.
Balota, D. A., & Abrams, R. A. (1995). Mental chronometry - beyond onset
latencies in the lexical decision task. Journal of Experimental Psychology:
Learning, Memory and Cognition, 21, 1289-1302.
Basham, M. E., Sohrabji, F., Singh, T. D., Nordeen, E. J., & Nordeen, K. W.
(1999). Developmental regulation of NMDA receptor 2B subunit mRNA
and ifenprodil binding in the zebra finch anterior forebrain. Journal of
Neurobiology, 39, 155-167.
Bates, E., Burani, C., D’Amico, S., & Barca, L. (2002). Word reading and picture
naming in Italian. Memory & Cognition, 29, 986-999.
Best, C. T. (1995). A direct realist view of cross-language speech perception. In
W. Strange (Ed.), Speech perception and linguistic experience: Theoretical
and methodological issues in cross-language speech research (p. 171-206).
Timonium, MD: York Press.
Best, C. T., McRoberts, G. W., & Goodell, E. (2001). Discrimination of
non-native consonant contrasts varying in perceptual assimilation to the
listener’s native phonological system. Journal of the Acoustical Society of
America, 109, 775-794.
Best, C. T., McRoberts, G. W., & Sithole, N. M. (1988). Examination of
perceptual reorganization for nonnative speech contrasts: Zulu click
discrimination by English-speaking adults and infants. Journal of Experimental
Psychology: Human Perception and Performance, 14, 345-360.
Bialystok, E., & Hakuta, K. (1994). In other words: The science and psychology
of second language acquisition. New York: Basic Books.
Birdsong, D., & Molis, M. (2002). On the evidence for maturational constraints
in second-language acquisition. Journal of Memory and Language, 44, 215-
249.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford:
Clarendon Press.
Bliss, T., & Collingridge, G. (1993). A synaptic model of memory: Long-term
potentiation in the hippocampus. Nature, 361, 31-39.
Bohner, J. (1990). Early acquisition in the zebra finch Taeniopygia guttata.
Animal Behavior, 39, 369-374.
Bongaerts, T., van Summeren, C., Planken, B., & Schils, E. (1996). Age and
ultimate attainment in the pronunciation of a foreign language. SSLA, 18,
447-465.
Bottjer, S. W., & Johnson, F. (1992). Growth and regression of thalamic efferents
in the song-control system of male zebra finches. Journal of Comparative
Neurology, 326, 442-450.
Bottjer, S. W., Miesner, E. A., & Arnold, A. P. (1986). Changes in neuronal
number, density and size account for increases in volume of song-control
nuclei during song development in zebra finches. Neuroscience Letters, 67,
263-268.
Bottjer, S. W., Miesner, E. A., & Arnold, A. P. (1984). Forebrain lesions disrupt
development but not maintenance of song in passerine birds. Science, 224,
901-903.
Brainard, M., & Doupe, A. (2000). Interruption of a basal ganglia-forebrain
circuit prevents plasticity of learned vocalizations. Nature, 404, 762-766.
Brainard, M., & Doupe, A. J. (2001). Postlearning consolidation of birdsong:
Stabilizing effects of age and anterior forebrain lesions. Journal of
Neuroscience, 21, 2501-2517.
Brainard, M. S., & Knudsen, E. I. (1998). Sensitive periods for visual
calibration of the auditory space map in the barn owl optic tectum. Journal of
Neuroscience, 18, 3929-3942.
Brenowitz, E. A., Margoliash, D., & Nordeen, K. W. (1997). An introduction to
birdsong and the avian song system. Journal of Neurobiology, 33, 495-500.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview.
Phonetica, 49, 155-180.
Brown, G. D. A., & Watson, F. L. (1987a). 1st in, 1st out - word learning age and
spoken word-frequency as predictors of word familiarity and word naming
latency. Memory & Cognition, 15, 208-216.
Brown, G. D. A., & Watson, F. L. (1987b). First in, first out: Word learning
age and spoken word frequency as predictors of word familiarity and word
naming latency. Memory & Cognition, 15, 208-216.
Bruer, J. T. (1999). The myth of the first three years: A new understanding of
early brain development and lifelong learning. New York: Free Press.
Brysbaert, M., Lange, M., & Van Wijnendaele, I. (2000). The effects of
age-of-acquisition and frequency-of-occurrence in visual word recognition: Further
evidence from the Dutch language. European Journal of Cognitive
Psychology, 12, 65-85.
Brysbaert, M., Van Wijnendaele, I., & De Deyne, S. (2000). Age-of-acquisition
effects in semantic processing tasks. Acta Psychologica, 104, 215-226.
Buonomano, D., & Merzenich, M. M. (1995). Temporal information transformed
into a spatial code by a neural network with realistic properties. Science,
267, 1028-1030.
Burek, M. J., Nordeen, K. W., & Nordeen, E. J. (1991). Neuron loss and addition
in developing zebra finch song nuclei are independent of auditory experience
during song learning. Journal of Neurobiology, 22, 215-223.
Butler, B., & Hains, S. (1979). Individual differences in word recognition latency.
Memory & Cognition, 7, 68-76.
Carmignoto, G., & Vicini, S. (1992). Activity dependent decrease in NMDA
receptor responses during development of the visual cortex. Science, 258,
1007-1011.
Carroll, J. B., Davies, P., & Richman, B. (1971). The American Heritage Word
Frequency Book. Boston: Houghton-Mifflin.
Carroll, J. B., & White, M. N. (1973). Word frequency and age of acquisition as
determiners of picture-naming latency. Quarterly Journal of Experimental
Psychology, 25, 85-95.
Cassidy, K. W., Kelly, M. H., & Sharoni, L. (1998). Inferring gender from name
phonology. Journal of Experimental Psychology: General, 128, 362-381.
Chiu, C., & Weliky, M. (2002). Relationship of correlated spontaneous activity
to functional ocular dominance columns in the developing visual cortex.
Neuron, 35, 1123-1134.
Cohen, J. (2002). Statistical power analysis for the behavioral sciences. New
York: Academic Press.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading
aloud: Dual-route and parallel-distributed-processing approaches.
Psychological Review, 100, 589-608.
Coltheart, M., Davelaar, E., Jonasson, K., & Besner, D. (1977). Access to
the internal lexicon. In S. Dornic (Ed.), Attention & Performance VI (p.
535-555). Hillsdale, NJ: Erlbaum.
Coltheart, V., Laxon, V. J., & Keating, C. (1988). Effects of word imageability
and age of acquisition on children’s reading. British Journal of Psychology,
79, 1-12.
Cortese, M. J., Simpson, G. B., & Woolsey, S. (1997). Effects of association and
imageability on phonological mapping. Psychonomic Bulletin and Review,
4, 226-231.
Crair, M. C., Horton, J. C., Antonini, A., & Stryker, M. P. (2001). Emergence of
ocular dominance columns in cat visual cortex by 2 weeks of age. Journal
of Comparative Neurology, 430, 235-249.
Crowley, J., & Katz, L. (2002). Ocular dominance development revisited. Current
Opinion in Neurobiology, 12, 104-109.
Crowley, J. C., & Katz, L. C. (1999). Development of ocular dominance columns
in the absence of retinal input. Nature Neuroscience, 2, 1125-1130.
Curtiss, S. (1977). Genie: a psycholinguistic study of a modern-day "wild child".
New York: Academic Press.
Cynader, M., & Mitchell, D. E. (1980). Prolonged sensitivity to monocular
deprivation in dark-reared cats. Journal of Neurophysiology, 43, 1026-1040.
Das, A., & Gilbert, C. (1997). Distortions of visuotopic map match orientation
singularities in primary visual cortex. Nature, 387, 594-598.
DeBello, W., Feldman, D., & Knudsen, E. (2001). Journal of Neuroscience, 21,
3161-3174.
Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common
themes and mechanisms. Annual Review of Neuroscience, 22, 567-631.
Eales, L. A. (1985). Song learning in zebra finches: Some effects of song model
availability on what is learnt and when. Animal Behaviour, 33, 1293-1300.
Eisenberg, L., Shannon, R., Martinez, A., Wygonski, J., & Boothroyd, A. (2000).
Speech recognition with reduced spectral cues as a function of age. Journal
of the Acoustical Society of America, 107, 2704-2710.
Ellis, A., & Morrison, C. (1998). Real age-of-acquisition effects in lexical retrieval.
Journal of Experimental Psychology: Learning, Memory and Cognition, 24,
515-523.
Ellis, A. W., & Lambon Ralph, M. A. (2000). Age of acquisition effects in
adult lexical processing reflect loss of plasticity in maturing systems:
Insights from connectionist networks. Journal of Experimental Psychology:
Learning, Memory and Cognition, 26, 1103-1123.
Elman, J. L. (1993). Learning and development in neural networks: The
importance of starting small. Cognition, 48, 71-99.
Fagiolini, M., & Hensch, T. K. (2000). Inhibitory threshold for critical-period
activation in primary visual cortex. Nature, 404, 183-186.
Feldman, D., & Knudsen, E. (1997). An anatomical basis for visual calibration of
the auditory space map in the barn owl’s midbrain. Journal of Neuroscience,
17, 6820-6837.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and
problems. In W. Strange (Ed.), Speech perception and linguistic experience:
Theoretical and methodological issues (p. 229-273). Timonium, MD: York
Press.
Flege, J. E., Bohn, O.-S., & Jang, S. (1997). The production and perception of
English vowels by native speakers of German, Korean, Mandarin, and Spanish.
Journal of Phonetics, 25, 437-470.
Flege, J. E., Yeni-Komshian, G. H., & Liu, S. (1999). Age constraints on
second-language acquisition. Journal of Memory and Language, 41, 78-104.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time.
Journal of Verbal Learning and Verbal Behavior, 12, 627-635.
Gerhand, S., & Barry, C. (1998). Word frequency effects in oral reading are
not merely age-of-acquisition effects in disguise. Journal of Experimental
Psychology: Learning, Memory and Cognition, 24, 267-283.
Gerhand, S., & Barry, C. (1999a). Age of acquisition, word frequency, and the
role of phonology in the lexical decision task. Memory & Cognition, 27,
592-602.
Gerhand, S., & Barry, C. (1999b). Age-of-acquisition and frequency effects in
speeded word naming. Cognition, 73, B27-B36.
Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions
between lexical familiarity and orthography, concreteness, and polysemy.
Journal of Experimental Psychology: General, 113, 256-281.
Gilbert, C. D. (1996). Plasticity in visual perception and physiology. Current
Opinion in Neurobiology, 6, 269-274.
Gilbert, C. D., Das, A., Ito, M., Kapadia, M., & Westheimer, G. (1996). Spatial
integration and cortical dynamics. Proceedings of the National Academy
of Sciences, 93, 615-622.
Gilhooly, K. J., & Gilhooly, M. L. (1980). The validity of age-of-acquisition
ratings. British Journal of Psychology, 71, 105-110.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness,
familiarity, and ambiguity measures for 1,944 words. Behavior Research
Methods and Instruments, 12, 395-427.
Goldin-Meadow, S., & Mylander, C. (1998). Spontaneous sign systems created
by deaf children in two cultures. Nature, 391, 279-281.
Gordon, J. A., & Stryker, M. P. (1996). Experience-dependent plasticity of
binocular responses in the primary visual cortex of the mouse. Journal of
Neuroscience, 16, 3274-3286.
Harm, M. W. (1998). Division of labor in a computational model of visual
word recognition. Unpublished doctoral dissertation, University of Southern
California, Los Angeles, CA.
Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading, and dyslexia:
Insights from connectionist models. Psychological Review, 106, 491-528.
Harm, M. W., & Seidenberg, M. S. (in press). Division of labor in a
multicomponent model of visual word recognition.
Hebb, D. O. (1949). The organization of behavior. New York: John Wiley &
Sons.
Hensch, T., Fagiolini, M., Mataga, N., Stryker, M., Baekkeskov, S., & Kash, S. F.
(1998). Local GABA circuit control of experience-dependent plasticity in
developing visual cortex. Science, 282, 1504-1508.
Hetherington, P., & Seidenberg, M. S. (1989). Is there “catastrophic interference”
in connectionist networks? In Proceedings of the 11th annual conference
of the cognitive science society (p. 26-33). Hillsdale, NJ: Erlbaum.
Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network:
Investigations of acquired dyslexia. Psychological Review, 98(1), 74-95.
Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace memory model.
Psychological Review, 93, 411-428.
Hirsh, K. W., & Ellis, A. W. (1994). Age of acquisition and lexical processing in
aphasia: A case study. Cognitive Neuropsychology, 11, 435-458.
Hodgson, C., & Ellis, A. W. (1998). Last in, first to go: Age of acquisition and
naming in the elderly. Brain and Language, 64, 146-163.
Hough, G., II, & Volman, S. (2002). Short-term and long-term effects of vocal
distortion on song maintenance in zebra finches. Journal of Neuroscience,
22, 1177-1186.
Huang, Z., Kirkwood, A., Pizzorusso, T., Porciatti, V., Morales, B., Bear, M.,
Maffei, L., & Tonegawa, S. (1999). BDNF regulates the maturation of
inhibition and the critical period of plasticity in mouse visual cortex. Cell,
98, 739-755.
Hubel, D., & Wiesel, T. (1962). Receptive fields, binocular interaction and
functional architecture in the cat’s visual cortex. Journal of Physiology,
160, 106-154.
Hubel, D. H., & Wiesel, T. N. (1970). The period of susceptibility to the
physiological effects of unilateral eye closure in kittens. Journal of Physiology,
206, 419-436.
Hyde, P., & Knudsen, E. (2002). The optic tectum controls visually guided
adaptive plasticity in the owl’s auditory space map. Nature, 415, 73-76.
Iverson, P., & Kuhl, P. K. (1995). Mapping the perceptual magnet effect for
speech using signal detection theory and multidimensional scaling. Journal of
the Acoustical Society of America, 97(1), 553-562.
Iyengar, S., & Bottjer, S. W. (2002). The role of auditory experience in the
formation of neural circuits underlying vocal learning in zebra finches. Journal
of Neuroscience, 22, 946-958.
Jamieson, D. G., & Morosan, D. E. (1986). Training non-native speech contrasts
in adults: Acquisition of the English /ð/-/θ/ contrast by francophones.
Perception & Psychophysics, 40, 205-215.
Jarvis, E. D., Scharff, C., Grossman, M., Ramos, J. A., & Nottebohm, F. (1998).
For whom the bird sings: context-dependent gene expression. Neuron, 21,
775-788.
Johnson, J. S., & Newport, E. L. (1989). Critical period effects in second language
learning: The influence of maturational state on the acquisition of English
as a second language. Cognitive Psychology, 21, 60-99.
Jones, A. E., ten Cate, C., & Slater, P. J. B. (1996). Early experience and
plasticity of song in adult male zebra finches (Taeniopygia guttata). Journal
of Comparative Psychology, 110(4), 154-169.
Jusczyk, P. (1981). Infant speech perception: A critical perspective. In P. D.
Eimas & J. L. Miller (Eds.), Perspectives on the study of speech (p.
113-164). New York: Academic Press.
Katz, L., & Crowley, J. (2002). Development of cortical circuits: lessons from
ocular dominance columns. Nature Reviews Neuroscience, 3, 34-42.
Katz, L. C., & Shatz, C. J. (1996). Synaptic activity and the construction of
cortical circuits. Science, 274, 1133-1138.
Kawamoto, A. H., Kello, C. T., Jones, R. J., & Bame, K. (1998). Initial phoneme
versus whole word criterion to initiate pronunciation: Evidence based on
response latency and initial phoneme duration. Journal of Experimental
Psychology: Learning, Memory and Cognition, 24, 862-885.
Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of
phonology in grammatical category assignments. Psychological Review, 99,
349-364.
Kessler, B., & Treiman, R. (2001). Relationships between sounds and letters in
English monosyllables. Journal of Memory and Language, 44, 592-617.
Knudsen, E., & Knudsen, P. (1990). Sensitive and critical periods for visual
calibration of sound localization by barn owls. Journal of Neuroscience, 10,
222-232.
Knudsen, E. I. (2002). Instructed learning in the auditory localization pathway
of the barn owl. Nature, 417, 322-328.
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the
National Academy of Sciences, 97, 11850-11857.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day
American English. Providence, RI: Brown University Press.
Lambon Ralph, M. A., Graham, K. S., Ellis, A. W., & Hodges, J. R. (1998).
Naming in semantic dementia - what matters? Neuropsychologia, 36, 775-
784.
Lane, H. (1979). Wild boy of Aveyron. Cambridge, MA: Harvard University
Press.
Lenneberg, E. H. (1967). Biological foundations of language. New York: Wiley.
Leonardo, A., & Konishi, M. (1999). Decrystallization of adult birdsong by
perturbation of auditory feedback. Nature, 399, 466-470.
LeVay, S., Stryker, M., & Shatz, C. (1978). Ocular dominance columns and their
development in layer IV of the cat’s visual cortex: A quantitative study.
Journal of Comparative Neurology, 179, 223-244.
Lewis, M. B. (1999). Are age-of-acquisition effects cumulative-frequency effects
in disguise? A reply to Moore, Valentine and Turner (1999). Cognition, 72,
311-316.
Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The
discrimination of speech sounds within and across phoneme boundaries.
Journal of Experimental Psychology, 54(5), 358-368.
Linkenhoker, B. A., & Knudsen, E. I. (2002). Incremental training increases the
plasticity of auditory space map in adult barn owls. Nature, 419, 293-296.
Lively, S. E., Logan, J., & Pisoni, D. (1993). Training Japanese listeners to
identify English /r/ and /l/: II. The role of phonetic environment and talker
variability in learning new perceptual categories. Journal of the Acoustical
Society of America, 94, 1242-1255.
Livingston, F. S., White, S. A., & Mooney, R. (2000). Slow NMDA-EPSCs at
synapses critical for song development are not required for song learning in
zebra finches. Nature Neuroscience, 3, 482-488.
Logan, J., Lively, S., & Pisoni, D. (1991). Some effects of training Japanese
listeners to identify English /r/ and /l/. Journal of the Acoustical Society of
America, 89, 874-886.
Lombardino, A. J., & Nottebohm, F. (2000). Age at deafening affects the stability
of learned song in adult male zebra finches. Journal of Neuroscience, 20,
5054-5064.
Lotto, A., Kluender, K., & Holt, L. (1998). Depolarizing the perceptual magnet
effect. Journal of the Acoustical Society of America, 103, 3648-3655.
Lu, H. C., Gonzalez, E., & Crair, M. C. (2001). Barrel cortex critical period
plasticity is independent of changes in NMDA receptor subunit composition.
Neuron, 32, 619-634.
Lyons, A., Teer, P., & Rubenstein, H. (1978). Age-at-acquisition and word
recognition. Journal of Psycholinguistic Research, 7, 179-187.
Quinlan, E. M., Philpot, B., Huganir, R. L., & Bear, M. F. (1999). Rapid,
experience-dependent expression of synaptic NMDA receptors in visual cortex in vivo.
Nature Neuroscience, 2, 352-357.
Mackay, I. R. A., Meador, D., & Flege, J. E. (2001). The identification of English
consonants by native speakers of Italian. Phonetica, 58, 103-125.
Marchman, V. A., & Bates, E. (1994). Continuity in lexical and
morphological development: A test of the critical mass hypothesis. Journal of Child
Language, 21, 339-366.
Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large
annotated corpus of English: The Penn Treebank. Computational Linguistics,
19, 313-330.
Markson, L., & Bloom, P. (1997). Evidence against a dedicated system for word
learning in children. Nature, 385, 813-815.
Marler, P. (1997). Three models of song learning: Evidence from behavior.
Journal of Neurobiology, 33, 501-516.
Mayberry, R., & Eichen, E. (1991). The long-lasting advantage of learning sign
language in childhood: Another look at the critical period for language
acquisition. Journal of Memory and Language, 30, 486-512.
Mayberry, R. I. (1993). First-language acquisition after childhood differs from
second-language acquisition: The case of American Sign Language. Journal
of Speech and Hearing Research, 36, 1258-1270.
Mayberry, R. I., Lock, E., & Kazmi, H. (2002a). Development - Linguistic ability
and early language exposure. Nature, 417, 38.
Mayberry, R. I., Lock, E., & Kazmi, H. (2002b). Development - Linguistic ability
and early language exposure. Nature, 417, 38.
Mayo, H., Florentine, M., & Buus, M. (1997). Age of second-language acquisition
and perception of speech in noise. Journal of Speech, Language, and Hearing
Research, 40, 686-93.
McCandliss, B. D., Fiez, J. A., Conway, M., & McClelland, J. L. (1999). Eliciting
adult plasticity for Japanese adults struggling to identify English /r/ and
/l/: Insights from a Hebbian model and a new training procedure. Journal
of Cognitive Neuroscience, Supplement S, 53.
McCandliss, B. D., Posner, M. I., & Givon, T. (1997). Brain plasticity in learning
visual words. Cognitive Psychology, 33, 88-110.
McClelland, J. L., Fiez, J. A., & McCandliss, B. D. (in press). Teaching the
non-native [r]-[l] speech contrast to Japanese adults: Training methods,
outcomes, and neural basis. Physiology and Behavior.
McClelland, J. L., Thomas, A., McCandliss, B. D., & Fiez, J. A. (1999).
Understanding failures of learning: Hebbian learning, competition for
representational space, and some preliminary experimental data. In J. Reggia,
E. Ruppin, & D. Glanzman (Eds.), Progress in brain research (Vol. 121, p.
75-80). Amsterdam: Elsevier.
McClelland, J. L., Thomas, A. G., McCandliss, B. D., & Fiez, J. A. (1999).
Understanding failures of learning: Hebbian learning, competition for
representational space, and some preliminary experimental data. In Brain,
behavioral, and cognitive disorders: The neurocomputational perspective
(p. 75-80). Oxford, England: Elsevier.
McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist
networks: The sequential learning problem. In G. H. Bower (Ed.), The
psychology of learning and motivation (Vol. 23, p. 109-164). New York,
NY: Academic Press.
McRae, K., Jared, D., & Seidenberg, M. S. (1990). On the roles of frequency and
lexical access in word naming. Journal of Memory and Language, 29(1),
43-65.
Miller, G. A., & Nicely, P. (1955). An analysis of perceptual confusions among
some English consonants. Journal of the Acoustical Society of America, 27,
338-352.
Monaghan, J., & Ellis, A. W. (2002). What exactly interacts with
spelling-sound consistency in word naming? Journal of Experimental Psychology:
Learning, Memory and Cognition, 28, 183-206.
Monyer, H., Burnashev, N., Laurie, D., Sakmann, B., & Seeburg, P. (1994).
Developmental and regional expression in the rat brain and functional
properties of four NMDA receptors. Neuron, 12, 529-540.
Moore, V., & Valentine, T. (1998). The effect of age of acquisition on speed and
accuracy of naming famous faces. The Quarterly Journal of Experimental
Psychology, 51A, 485-513.
Moore, V., & Valentine, T. (1999). The effects of age of acquisition in processing
famous faces: Exploring the locus and proposing a mechanism. In
Proceedings of the twenty-first annual conference of the cognitive science society
(p. 416-421). New Jersey: Erlbaum.
Morrison, C. A., & Ellis, A. W. (2000). Real age of acquisition effects in word
naming and lexical decision. British Journal of Psychology, 91, 167-180.
Morrison, C. M., & Ellis, A. W. (1995). Roles of word frequency and age of
acquisition in word naming and lexical decision. Journal of Experimental
Psychology: Learning, Memory and Cognition, 21, 116-153.
Morrison, C. M., Ellis, A. W., & Chappell, T. D. (1997). Age of acquisition
norms for a large set of object names and their relation to adult estimates
and other variables. Quarterly Journal of Experimental Psychology: Human
Experimental Psychology, 50A, 528-559.
Munro, P. W. (1986). State-dependent factors influencing neural plasticity: A
partial account of the critical period. In J. L. McClelland, D. E.
Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing:
Explorations in the microstructure of cognition. Volume 2: Psychological
and biological models (p. 471-502). Cambridge, MA: MIT Press.
Neville, H. J., & Bavelier, D. (2000). Specificity and plasticity in neurocognitive
development in humans. In M. S. Gazzaniga (Ed.), The new cognitive
neurosciences (p. 83-98). Cambridge, MA: MIT Press.
Newport, E. L. (1990). Maturational constraints on language learning. Cognitive
Science, 14, 11-28.
Nordeen, K., & Nordeen, E. (1992). Auditory-feedback is necessary for the
maintenance of stereotyped song in adult zebra finches. Behavioral and
Neural Biology, 57, 58-66.
Nottebohm, F., Stokes, T. M., Leonard, C. M., Wingfield, J. C., & Farner, D. S.
(1976). Central control of song in the canary, Serinus canaria. Journal of
Comparative Neurology, 165, 457-486.
Odlin, T. (1989). Language transfer. New York: Cambridge University Press.
Pearlmutter, B. A. (1995). Gradient calculations for dynamic recurrent neural
networks: A survey. IEEE Transactions on Neural Networks, 6(5), 1212-
1228.
Perani, D., Paulesu, E., Galles, N., Dupoux, E., Dehaene, S., Bettinardi, V.,
Cappa, S., Fazio, F., & Mehler, J. (1998). The bilingual brain - Proficiency
and age of acquisition of the second language. Brain, 121, 1841-1852.
Peterson, G., & Barney, H. (1952). Control methods used in a study of vowels.
Journal of the Acoustic Society of America, 24, 175-184.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument
structure. Cambridge, MA: MIT Press.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. E. (1996).
Understanding normal and impaired word reading: Computational
principles in quasi-regular domains. Psychological Review, 103, 56-115.
Quartz, S., & Sejnowski, T. J. (1997). The neural basis of cognitive development:
A constructivist manifesto. Behavioral and Brain Sciences, 20, 537-596.
Quinlan, E., Olstein, D., & Bear, M. (1999). Bidirectional, experience-dependent
regulation of N-methyl-D-aspartate receptor subunit composition in the rat
visual cortex during postnatal development. Proceedings of the National
Academy of Sciences, 96, 12876-12880.
Rochet, B. (1995). Perception and production of L2 speech sounds by adults. In
W. Strange (Ed.), Speech perception and linguistic experience: Theoretical
and methodological issues in cross-language speech research (p. 379-410).
Timonium, MD: York Press.
Saltzman, E. L. (1995). Dynamics and coordinate systems in skilled sensorimotor
activity. In R. Port & T. van Gelder (Eds.), Mind as motion: Dynamics,
behavior, and cognition. Cambridge, MA: MIT Press.
Seidenberg, M. S. (1995). Visual word recognition: An overview. In P. Eimas &
J. L. Miller (Eds.), Handbook of perception and cognition: Language. New
York: Academic Press.
Seidenberg, M. S., & Gonnerman, L. M. (2000). Explaining derivational
morphology as the convergence of codes. Trends in Cognitive Sciences, 4, 353-361.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental
model of word recognition and naming. Psychological Review, 96, 523-568.
Seidenberg, M. S., & Waters, G. S. (1989). Word recognition and naming: A
mega study. Bulletin of the Psychonomic Society, 27, 489.
Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984).
When does irregular spelling or pronunciation influence word recognition?
Journal of Verbal Learning and Verbal Behavior, 23, 383-404.
Service, E., & Craik, F. I. M. (1993). Differences between young and older adults
in learning a foreign vocabulary. Journal of Memory and Language, 32,
608-623.
Singh, T. D., Basham, M. E., Nordeen, E. J., & Nordeen, K. W. (2000). Early
sensory and hormonal experience modulate age-related changes in NR2B
mRNA within a forebrain region controlling avian vocal learning. Journal
of Neurobiology, 44, 82-94.
Smith, E., Shoben, E., & Rips, L. (1974). Structure and process in semantic
memory: A featural model for semantic decisions. Psychological Review, 81,
214-241.
Smith, M. A., Cottrell, G. W., & Anderson, K. L. (2001). The early word catches
the weights. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances
in neural information processing systems 13 (p. 52-58). Cambridge, MA:
MIT Press.
Spieler, D. H., & Balota, D. A. (1997). Connectionist models of word naming: An
examination of item level performance. Psychological Science, 8, 411-416.
Strain, E., Patterson, K., & Seidenberg, M. S. (1995). Semantic effects in single
word naming. Journal of Experimental Psychology: Learning, Memory and
Cognition, 21, 1140-1154.
Strain, E., Patterson, K., & Seidenberg, M. S. (2002). Theories of word naming
interact with spelling-sound consistency. Journal of Experimental
Psychology: Learning, Memory and Cognition, 28, 207-214.
Strange, W. (Ed.). (1995). Speech perception & linguistic experience: Issues in
cross-language research. Timonium, MD: York.
Summerfield, Q., Haggard, M., Foster, J., & Gray, S. (1984). Perceiving vowels
from uniform spectra: Phonetic exploration of an auditory aftereffect.
Perception and Psychophysics, 35, 203-213.
Tchernichovski, O., Lints, T., Mitra, P., & Nottebohm, F. (1999). Vocal imitation
in zebra finches is inversely related to model abundance. Proceedings of the
National Academy of Sciences, 96, 12901-12904.
Tchernichovski, O., Mitra, P., Lints, T., & Nottebohm, F. (2001). Dynamics of
the vocal imitation process: how a zebra finch learns its song. Science, 291,
2564-2569.
Thoenen, H. (1995). Neurotrophins and neuronal plasticity. Science, 270, 593-
598.
Trachtenberg, J., Trepel, C., & Stryker, M. (2000). Rapid extragranular plasticity
in the absence of thalamocortical plasticity in the developing primary visual
cortex. Science, 287, 2029-2032.
Troyer, T. W., & Bottjer, S. (2002). Birdsong: Models and mechanisms. Current
Opinion in Neurobiology, 11, 721-726.
Trubetzkoy, N. (1939/1969). Principles of phonology. Berkeley, CA: University
of California Press. (Translated by C. A. Baltaxe)
Turner, J. E., Valentine, T., & Ellis, A. W. (1998). Contrasting effects of age
of acquisition and word frequency on auditory and visual lexical decision.
Memory & Cognition, 26, 1282-1291.
Van der Loos, H., & Woolsey, T. A. (1973). Somatosensory cortex: Structural
alterations following early injury to sense organs. Science, 179, 395-398.
Van Orden, G. C., Johnston, J. C., & Hale, B. L. (1988). Word identification
in reading proceeds from the spelling to sound to meaning. Journal of
Experimental Psychology: Learning, Memory and Cognition, 14, 371-386.
Venezky, R. L. (1970). The structure of English orthography. The Hague:
Mouton.
Weinreich, U. (1957). Languages in contact, findings and problems. The Hague:
Mouton.
Weliky, M., & Katz, L. (1999). Correlational structure of spontaneous neuronal
activity in the developing lateral geniculate nucleus in vivo. Science, 285, 599-604.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence
for perceptual reorganization during the first year of life. Infant Behavior
& Development, 7(1), 49-63.
Wiesel, T., & Hubel, D. (1963). Single-cell responses in striate cortex of kittens
deprived of vision in one eye. Journal of Neurophysiology, 26, 978-993.
Wiesel, T., & Hubel, D. (1965). Extent of recovery from the effects of visual
deprivation in kitten. Journal of Neurophysiology, 28, 1060-1072.
Williams, H., & McKibben, J. (1992). Changes in stereotyped central motor
patterns controlling vocalization are induced by peripheral nerve injury.
Behavioral and Neural Biology, 57, 67-78.
Williams, H., & Mehta, H. (1999). Changes in adult zebra finch song require
a forebrain nucleus that is not necessary for song production. Journal of
Neurobiology, 39, 14-28.
Woolley, S., & Rubel, E. (1999). Hair cell regeneration results in recovery of
degraded song in adult Bengalese finches. (Poster presented at the annual
Society for Neuroscience meeting, November, Miami, FL)
Woolley, S., & Rubel, E. (2002). Vocal memory and learning in adult Bengalese
finches with regenerated hair cells. Journal of Neuroscience, 22, 7774-7787.
Woolsey, T. A. (1990). Peripheral alteration and somatosensory development.
In E. J. Coleman (Ed.), Development of sensory systems in mammals (p.
461-516). New York: Wiley.
Xue, J., & Cooper, N. (2001). The modification of NMDA receptors by visual
experience in the rat retina is age dependent. Brain Research: Molecular
Brain Research, 91, 196-203.
Yamada, J., Takashima, H., & Yamazaki, H. (1998). Effect of ease-of-acquisition
on naming latency for Japanese Kanji: A reanalysis of Yamazaki et al.’s
(1997) data. Psychological Reports, 83, 991-1002.
Yamazaki, M., Ellis, A. W., Morrison, C. M., & Lambon Ralph, M. A. (1997).
Two age of acquisition effects in the reading of Japanese Kanji. British
Journal of Psychology, 88, 407-411.
Zann, R. A. (1996). The zebra finch: A synthesis of field and laboratory studies.
London: Oxford University Press.
Zeno, S. (Ed.). (1995). The educator’s word frequency guide. Brewster, NJ:
Touchstone Applied Science Associates.
Zevin, J. D., & Balota, D. A. (2000). Priming and attentional control of lexical
and sublexical pathways during naming. Journal of Experimental
Psychology: Learning, Memory and Cognition, 26, 121-135.
Zevin, J. D., & Seidenberg, M. S. (2002). Age of acquisition effects in reading
and other tasks. Journal of Memory and Language, 47, 1-29.
Zevin, J. D., & Seidenberg, M. S. (submitted). Cumulative frequency affects
reading aloud; frequency trajectory does not.
Zevin, J. D., Seidenberg, M. S., & Bottjer, S. W. (2000). Song plasticity in adult
zebra finches exposed to white noise. (Poster presented at the meeting of
the Society for Neuroscience. New Orleans, LA)
Zheng, W., & Knudsen, E. (2001). GABAergic inhibition antagonizes adaptive
adjustment of the owl’s auditory space map during the initial phase of
plasticity. Journal of Neuroscience, 21, 4356-4365.
Asset Metadata
Creator: Zevin, Jason D. (author)
Core Title: Age-limited learning effects in reading and speech perception
School: Graduate School
Degree: Doctor of Philosophy
Degree Program: Neuroscience
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: biology, neuroscience; OAI-PMH Harvest; psychology, cognitive
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Bottjer, Sarah (committee chair), Andersen, Elaine (committee member), Byrd, Dani (committee member), Seidenberg, Mark (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-658307
Unique identifier: UC11335080
Identifier: 3116813.pdf (filename), usctheses-c16-658307 (legacy record id)
Legacy Identifier: 3116813.pdf
Dmrecord: 658307
Document Type: Dissertation
Rights: Zevin, Jason D.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA