CONNECTING PHRASAL AND RHYTHMIC EVENTS:
EVIDENCE FROM SECOND LANGUAGE SPEECH
by
Emily Anne Nava
________________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(LINGUISTICS (HISPANIC LINGUISTICS))
August 2010
Copyright 2010 Emily Anne Nava
Dedication
This work is dedicated to my nieces and nephews (in alphabetical order):
Ada Grace
Blaise Teatrault
Gibson Lee
Vivian Jane
~ May your paths in this life be guided by the wisdom to choose
and the strength to continue ~
Acknowledgements
I would first like to thank Joyce Perez, the center strategist of the Linguistics
Department, whose calm presence, even temper, quick laugh and spot-on advice have
guided me through this entire process.
I am greatly indebted to William Rutherford for good conversations, many
invitations, and for generously donating the space where much of this work and the
thinking behind it took place. I thank Maria Luisa Zubizarreta for her support in my
pursuit of a unique and difficult research question, and for providing the opportunity
for RAships, including the lab space that allowed me to carry out this research.
All of my committee members demonstrated heaps of flexibility and unwavering
support. I thank Jean-Roger Vergnaud, whose expertise as a mathematician, phonologist,
syntactician and all-around original thinker allowed me to believe that it was possible to
pull all of these aspects together. I thank Louis Goldstein for his constant support of my
ideas and for allowing me to see that it is never too late to pursue happiness in research
and life. And I thank Mario Saltarelli who reminded me more than once that given the
choice, I would do this all over again. And who, in his role as philosopher, kindly showed
me what Linguistics has to do with spirituality.
A number of other professors in the USC Linguistics and other departments have
shown me immense intellectual and emotional support throughout this process. Among
them, I thank Tania Ionin for her interest, excellent advice, and all that she shared and
gave during this process. I thank Hagit Borer for her continued support from day one, and
for being who she is in all that she does – always. I thank Dani Byrd for sage advice,
scientific equanimity, and many research opportunities. I thank Sun-Ah Jun at UCLA for
her humanness and insightful comments and discussions.
The time I spent at Haskins Laboratories and the experiences therein continue to
influence my work in unimagined ways. I thank Louis Goldstein for this opportunity and
I thank all who I met for their generosity and engagement. I thank Elliot Saltzman for
realistic views about everything, and Hosung Nam for his patient teaching and his interest
in my continued development. I also thank Khalil Iskarous for illuminating conversations
about research, and Mike Proctor and Sam Tilsen for great discussions.
Without my family, none of this would have ever come to pass and I thank each
and every one of them for their love and support. I thank my mother, my first teacher.
There is no doubt in my mind that I was led to this point guided by the importance of
consistent discipline and the love for learning that she worked to instill in all of my
efforts. I thank my Aunt Carolyn for her example of positive attitudes and her diligent
character, and my Uncle Don for checking in, making lists, and dreaming up endless
possibilities for me. I thank Buck and Tonya whose happiness and generosity have helped
warm the darkest of moments. I am so fortunate to be able to thank my grandmother,
whose patience and longevity of spirit always serve as a reminder that we do have so
much to look forward to, and that we have come from so much.
I especially thank my sister Stephanie for carrying all of us through so many
processes with her timely wit and endless humor, and I thank her and David for the
numerous technical aspects that made this experience possible from the ground, and
whose spiritual generosity has been paramount to my sanity.
There is absolutely no way to express all my thanks to my sister Susan – the
biggest of all my fans – for the constant communication, the sounding board, for solving
with me all of life’s quandaries and mysteries – who (perhaps) unbeknownst to her
exemplified the complex system laid bare and in action, that made this modeling
possible. Much gratitude too for years of clerical work and cheerleading with mantras.
I thank my also sister Susie Choi who has, over these years, been there for
everything and therapied my sense of self to a deeper place. I thank Aaron Jacobs for
instant friendship and immediate help.
Undoubtedly one of the richest parts of this process has been the resulting
friendships. Among them, I especially thank Jelena Krivocapic, Ana Sánchez Muñoz,
Álvaro Cerrón-Palomino, Michal Martinez and Fuyun Wu for so much support. I thank
my girl friends Amy Linker, Amanda Bogart, Anari de Sousa, Maartje Duin and Eva
Pons for unlimited understanding. I thank Joe Tepperman for taking up parts of this
journey with me. I thank Phil Potamites and Aaron Walker for meaningful conversations.
I express my gratitude to all of the undergraduate RAs, Amanda Bogart, Chris
Van Booven, Ashley Flor, whose laughter and unique education have taught me so much.
I will greatly miss all my HILSA cohorts and friends, past and present, whose
union and presence provided a past, present, and future to put to our experience: Monica
Cabrera, Omar Beas, Álvaro Cerrón-Palomino, Ana Sánchez Muñoz, Rebeka Campos,
Roberto Mayoral Hernández, Asier Alcázar, Michael Rushforth, Michal Temkin
Martínez, Ben Parrell, Hector Velásquez, Magdalena Pire, Erika Vardis, Katy McKinney-
Bock, Laura Tejada, Sergio Robles Puente.
To all those who are found to the south of my heart: Javier, Lorenzo, Tyler,
Veronica, Mari Carmen, Francisco, Moni, the Ulloa family, Rogelio, Silvina, Chuj,
Fede, Alfredo, Alejandra, Octavio, the elder women and the rest of the Hernandez family.
And above all to Antonio, for always lighting my way with so much love and so much strength.
Finally, work for this dissertation was made possible in part by funding from the
following sources: NSF BCS-0444088, NSF IIS 07-03624, NIH (NIDCD) DC03172,
USC Provost Grant: Advancing Scholarship in the Humanities and Social Sciences, Del
Amo Summer Graduate Fellowship for Study in Spain, USC Morkovin Graduate School
Fellowship.
Table of contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1. Introduction
2. Prosody
3. Rhythm
4. Complex systems and self-organization
5. The complex systems approach
5.1 Second language acquisition and complex systems
5.2 Complex systems and SLA: applications
5.3 The complex systems approach and the current study
6. Outline of the dissertation
Chapter 2: Experiment 1: Phrasal events
1. Introduction
2. Studies of L2 acquisition of phrasal level prosody
3. Phrasal events in the L2English speech of Spanish native speakers
3.1 Phrasal prominence in English and Spanish: similarities
3.2 Phrasal prominence in English and Spanish: differences
3.2.1 Phrasal prominence in Romance
3.3 Further cross-linguistic differences: Anaphoric Deaccenting
3.4 Experiment 1: Phrasal prominence placement
4. Experiment 1: Methods
4.1 Participants
4.2 Procedure
4.3 Design and stimuli
4.4 Coding and statistics
4.5 Experiment 1: Results
5. A dynamic modeling approach to qualitative change
5.1 The superposition of influences on the order parameter
5.2 An analysis of English prosodic pattern acquisition
5.3 Discussion
6. Conclusion
Chapter 3: Experiment 2: Rhythmic events
1. Introduction
2. Studies of L2 rhythm acquisition
3. Component events: rhythm
3.1 Rhythm in English and Spanish
3.2 Cross-linguistic rhythm classification
3.3 Studies in cross-linguistic rhythmic classification
4. Experiment 2: Methods
4.1.1 Participants
4.1.2 Procedure and materials
4.1.3 Analysis 1: Measurements
4.2 Results
4.3 The directionality of acquisition of NS and rhythm
5. Experiment 2: Forced Text Alignment
5.1 Analysis 2: Measurements
5.2 Forced text alignment results and statistics
5.3 Individual analysis
6. Vowel adjacency distributions
7. Discussion and conclusion
Chapter 4: Experiment 3: Coordination
1. Introduction
2. Coordination and rhythmic organization
2.1 Theories and models of rhythmic organization
2.2 Experimental work
3. Experiment 3: Repetition task
3.1 Methods
3.1.1 Participants
3.1.2 Design and stimuli
3.1.3 Procedure
3.1.4 Measurements
3.1.5 Statistics
3.2 Results compared across speaker groups
3.3 Individual results
3.3.1 ENC individual results
3.3.2 Spanish individual results
3.3.3 L2English individual results
3.4 Modeling of sub-interval flexibility
4. Discussion and conclusion
Chapter 5: Discussion and conclusion
References
Appendix
List of Tables
Table 1. L2 participant population Cloze test results
Table 2. L2 population: Age and Time in US
Table 3. Organization of stimuli by verb type and discourse context
Table 4. Proficiency levels: L2 prosodic proficiency and Cloze test proficiency
Table 5. L2 background information by NS proficiency
Table 6. Word pair stimuli
Table 7. Trimmed values for all speaker groups and task conditions
Table 8. ANOVA results for all speaker groups, voicing ratio
Table 9. ANOVA results for all speaker groups, vowel duration by lexical category
Table 10. ANOVA results for group results, syllable Ratio measure
Table 11. ANOVA results for group results, syllable DtoD measure
Table 12. ANOVA results for group results, syllable DtoT measure
Table 13. ANOVA results for group results, foot Ratio measure
Table 14. ANOVA results for group results, foot DtoD measure
Table 15. ANOVA results for group results, foot DtoT measure
Table 16. ANOVA results for L2 individual results, foot DtoT measure
Table 17. ANOVA results for L2 individual results, syllable Ratio measure
Table 18. ANOVA results for L2 individual results, syllable DtoT measure
List of Figures
Figure 1. Prosodic patterns: unaccusative verbs with wide focus contexts
Figure 2. Prosodic patterns: transitive compound structures with wide focus contexts
Figure 3. Prosodic patterns: unergative verb, general results
Figure 4. Prosodic pattern, unergative verb, wide focus context
Figure 5. Prosodic pattern, unergative verb, noteworthy predicate
Figure 6. Transitive & ditransitive verbs, wide focus and given information contexts
Figure 7. Proficiency levels: L2 prosodic proficiency and Cloze test proficiency
Figure 8. Prosodic patterns of prominence for ENC population
Figure 9. Phrase-internal prosodic patterns for L2E population
Figure 10. Dynamical system potential attractors and repellers, Gafos & Benus
Figure 11. V(x) and its probability function p(x), taken from Gafos & Benus
Figure 12. Potential as a function of control parameter k, taken from Gafos & Benus
Figure 13. Potential function as plotted for English
Figure 14. Potential function as plotted for Spanish
Figure 15. Derivation of composite potential for unaccusative, ENC population
Figure 16. Derivation of composite potential for unaccusative, L2E population
Figure 17. Complex systems schematized (Adapted from Chris Langton)
Figure 18. Voicing ratio: Monolingual, L1 and L2 groups
Figure 19. Percent voicing and standard deviation of voiceless across speaker populations
Figure 20. Voicing ratio by prosodic proficiency breakdown
Figure 21. Percent voicing and standard deviation of voiceless for L2English groups
Figure 22. Scatterplot of nuclear stress and voicing ratio for L2 speakers
Figure 23. NS high
Figure 24. NS low
Figure 25. VR high
Figure 26. VR low
Figure 27. Vowel durations by category across speaker groups
Figure 28. Scatterplot of content-to-function vowel duration ratio and NS score
Figure 29. Scatterplot of VOT mean and NS score
Figure 30. Probability density distributions for vowel ratios for all speaker groups
Figure 31. Deviation from normal distribution for all speaker groups
Figure 32. Group results of ratio values for syllable condition
Figure 33. Group results of ratio values for foot condition
Figure 34. Group results of DtoD values for syllable condition
Figure 35. Group results of DtoD values for foot condition
Figure 36. Group results of DtoT values for syllable condition
Figure 37. Group results of DtoT values for foot condition
Figure 38. ENC individual results for vowel duration ratio in syllable condition
Figure 39. ENC individual results for vowel duration ratio in foot condition
Figure 40. ENC individual results for DtoD in syllable condition
Figure 41. ENC individual results for DtoD in foot condition
Figure 42. ENC individual results for DtoT in syllable condition
Figure 43. ENC individual results for DtoT in foot condition
Figure 44. Spanish individual results for ratio in syllable condition
Figure 45. Spanish individual results for ratio in foot condition
Figure 46. Spanish individual results for DtoD in syllable condition
Figure 47. Spanish individual results for DtoD in foot condition
Figure 48. Spanish individual results for DtoT in syllable condition
Figure 49. Spanish individual results for DtoT in foot condition
Figure 50. L2English individual results for ratio in syllable condition
Figure 51. L2English individual results for ratio in foot condition
Figure 52. L2English individual results for DtoT in syllable condition
Figure 53. L2English individual results for DtoT in foot condition
Figure 54. Distribution of English individual speakers ratio of DtoT over DtoD
Figure 55. Distribution of Spanish individual speakers ratio of DtoT over DtoD
Figure 56. Distribution of L2English individual speakers ratio of DtoT over DtoD
Figure 57. Composite potential functions
Abstract
This dissertation investigates the relation between prosodic events at the phrasal
level and component events at the rhythmic level. The overarching hypothesis is that the
interaction among component rhythmic events gives rise to prosodic patterns at the
phrasal level, while at the same time being constrained by the latter, and that in the case
of second language acquisition, acquisition at the rhythmic level will precede that of the
phrasal level.
The data used to investigate the hypothesis and related predictions is from a test
population of second language speakers of English whose first language is Spanish, and a
control population of monolingual speakers of English. These speaker populations
provide an ideal testing ground due to the contrast in prosodic organization between
Spanish and English. By examining the prosodic behavior of acquirers who are at
different stages of the acquisition process, the structure of the prosodic system and its
organization can be observed, together with the changes that the acquirers’ prosodic
system undergoes.
Among the factors that guide the realization of a given phrasal prosodic pattern
are verb type and discourse context. In English both phrase-final and phrase-internal
prosodic patterns exist for wide focus discourse contexts. However, in Spanish for the
same discourse context only phrase-final prosodic patterns exist.
English and Spanish also differ regarding the properties that characterize the
rhythm of each language. Among other properties relevant to the overall rhythmic
difference between the two languages, the hallmark of English rhythm is a substantial
discrepancy in duration between adjacent vowels – a property that is not characteristic of
rhythm in Spanish.
A set of three experiments probes the relation between phrasal and rhythmic
events. The first experiment addresses the question of phrasal prosodic patterns in
English with monolingual and second language speakers. Results confirm the existence
of both phrase-final and phrase-internal prosodic patterns in English, and speak to the
difference in prosodic pattern realization between speaker populations, namely that many
second language speakers produce a phrase-final pattern where monolinguals produce a
phrase-internal pattern.
The second experiment addresses the difference in rhythmic properties between
English and Spanish, first with a global measure of the ratio between voiced and
voiceless intervals in speech, and a second, more precise analysis that details the
difference between specific vowel durations. This analysis allows for the first
approximation of the connection between rhythmic and phrasal events with the finding
that not only do English and Spanish differ regarding crucial vowel duration differences,
but importantly that second language speakers with phrase-internal prosody in their
speech also demonstrate an English-like distribution of vowel duration properties.
The final experiment unites the findings of Experiments 1 and 2 and explores the
nature of rhythmic events in a repetition task designed to elicit language-specific patterns
of the coordination of adjacent vowels comprising larger rhythmic units. English
monolingual and Spanish monolingual speakers demonstrated very different preferences
for vowel coordination in the task condition that encourages the formation of a
durationally-modulated foot. Second-language speakers with phrase-internal nuclear
stress in their speech patterned like the monolingual English speakers in this task, while
those without phrase-internal nuclear stress did not.
This work contributes to our understanding of the language-specific modes of
organization within the realm of prosody, and provides a window into the path of
acquisition for the second language learner.
Chapter 1. Introduction
The surface of the ocean responds to the forces that act upon it in movements
resembling the ups and downs of the human voice. If our vision could take it all in
at once, we would discern several types of motion, involving a greater and greater
expanse of sea and volume of water: ripples, waves, swells, and tides.
It would be more accurate to say ripples on waves and swells on tides, because
each larger movement carries the smaller ones on its back (…) In speech, the
ripples are the accidental changes in pitch, the irrelevant quavers. The waves are
the peaks and valleys that we call accent. The swells are the separations of our
discourse into its larger segments. The tides are the tides of emotion.
- D. Bolinger 1964: 282
1. Introduction
It is precisely the types of motion referenced above that this work is concerned
with exploring, namely the larger movement of phrasal prominence carrying the smaller
ones of rhythm. The current investigation of prosodic structure is comparative: the
prosodic production behavior of second language (L2) speakers of English whose first
language (L1) is Spanish is compared to that of monolingual speakers of English. Such a
comparison allows us to probe the organization of prosodic structure by observing the
changes that the acquirer’s prosodic system undergoes. The hypothesis explored in this
work concerns the relation between prosodic events at the phrasal level and component events at
the rhythmic level, specifically addressing how prosodic patterns are composed of
rhythmic events whose organization results in the given pattern, while at the same time
being constrained by that pattern. The aforementioned relation is one of circular
causality, and will be explored within the complex systems (CS) framework couched
within a broader dynamical systems theory, whose mathematical framework is argued to
be more appropriate for the study of such cognitive processes as acquisition than is
symbolic computation (Tuller et al. 2004). By modeling stable prosodic systems using
dynamical potential functions, it is possible to account for the acquisition of proficiency
in the L2 prosodic system through systematic shifts in the parameters of the prosodic
organization. The relation between changes of modes at the rhythmic and phrasal levels
can be captured through interdependence of their parametric values.
The motivation for pursuing this specific connection between phrasal and
rhythmic events is based primarily on an intuition developed from my direct observation
of the speech of second language speakers whose first languages were either English or
Spanish. Aware that English and Spanish differ regarding phrasal patterns, and also
aware that different rhythmic classifications had been proposed for these languages, I
began to develop an understanding of how native-like rhythm and native-like phrasal
prosody must be connected. I was further prompted to pursue this intuition after
experience with a speech synthesis model known as Task Dynamics Application (TaDA)
(Nam et al. 2004). TaDA takes as input an orthographic string, then creates a simulation
of the articulators that mimic human production of the sequence and then maps this to
sound (text-to-speech synthesis). While working with other colleagues to develop a
Spanish version of TaDA, I realized that the output from Spanish TaDA sounded more
like a natural production in Spanish than did the English version in English. I surmised
that this difference must be in great part due to the lack of prosodic structure such as the
foot, or any phrase-internal lengthening, which is claimed to form part of English
prosodic structure but not of Spanish.
Finally, an understanding of complex dynamical systems provided the necessary
framework for couching this particular relation, since the principles of CS rest on the
assumption that systems (in this case, a language’s prosodic system) are characterized by
the non-linear interaction of a (typically) large body of components that demonstrate self-
organization. Explored here is the notion that it is the interaction between component
rhythmic events (vowel durations and inter-syllable coordination) and phrasal events
(main prominence placement) that comprise the organization of the prosodic structure of
a language.
This work contributes to the area of linguistic theory generally and to second
language acquisition (SLA) specifically. Regarding linguistic theory, the existing body of
work that addresses the connection between rhythmic and larger phrasal prosodic events
is limited to a mere handful of proposals, all of which focus on the theoretical
implications of this relation. Carstairs-McCarthy (1999) proposes an analogy between
syllable structure and phrasal structure, such that the asymmetries in the former are
reflected in the latter. Barbosa (2002) argues that modeling speech rhythm production can
provide a deeper understanding of cross-linguistic differences in rhythmic organization.
To that end he uses a coupled-oscillator rhythmic system that simulates language-specific
continuous patterns of syllable-size. His argument is that languages can vary depending
on the way their underlying rhythmic systems interact with the higher-level components
of the grammar and with the gestures in the lexicon (where gestural coordination is
found). The proposed modeling includes positing an interaction between syllable-sized
oscillators and stress-group oscillators (a temporal grouping which is greater than and
includes syllable-sized oscillators). Through the modeling process, Barbosa found that
languages traditionally classified as “stress-timed” (see Chapter 3 for a discussion of this
term) showed a stronger coupling strength between syllable-sized and stress-group
oscillators than those classified as “syllable-timed”. The current work both contributes to
the theory of prosodic structure through its qualitative proposal of language-
specific/system-specific organization and is rigorously experimental in its quantitative
analysis that provides empirical evidence about how differences in rhythmic properties
are connected to phrasal prosodic pattern differences.
The contributions this work makes to research within the area of SLA are likewise
significant. To my knowledge, no previous work in the area of SLA has looked at the
development of native-like proficiency in the area of phrasal-level prosodic patterns in
tandem with the development of rhythmic properties and their interaction as part of the
acquisition process. Work in the area of L2 rhythm has experienced a swell in the past
decade (see Chapter 3), with a range of different measurement techniques contributing
empirical evidence to the same conclusion: when measured quantitatively, values for L2
rhythm fall somewhere between those of the native rhythm (the L1) and the target rhythm
(the L2). However, research in the area of phrasal prosodic patterns is even more limited
(see Chapter 2), not only in terms of the number of studies but also in the scope of the
prosodic patterns investigated. Said studies report non-target-like prosodic productions
and often make mention of “curious” or unexpected patterns that are found neither in the
L1 nor in the L2. The current study adds to the existing, albeit paltry, body of work with a
principled investigation of the factors that guide the realization of a given prosodic
pattern, and attempts to map the development of native-like prosodic proficiency by
examining the behavior of learners at different stages of acquisition.
2. Prosody
Prosody is an extra-lexical aspect of speech inextricably connected to the meaning
of a speaker’s message, and guides the realization of phrasing and prominence. In broad
terms, phrasing refers to the grouping of words into meaningful units, and is related but
not strictly isomorphic to the groupings yielded by syntactic structure. A change in
grouping can result in a change in meaning, as shown in examples (A) and (B) below.
(A) Jenny had doubts they knew about the raccoons.
(B) Jenny had doubts, they knew, about the raccoons.
In the case of (A), what Jenny doubts is knowledge by a third party about a particular
situation involving raccoons. However, in (B) Jenny’s doubts concern the raccoons
directly.
In any given sentence, some words may be heard as more prominent than others,
with one word in particular being perceived as being more prominent than the others. The
acoustic characterization of prominence includes the properties of duration, intensity and
pitch. Where main prominence falls in a sentence also affects the meaning, as illustrated
in examples (C) and (D).
(C) Medicine¹ should be taken with food.
(D) Medicine should be taken with food.
In (C), where main prominence falls on “medicine”, the reading could be that one should
not eat without also ingesting medicine, whereas in (D), where it falls on “food”, the
reading is that if one is going to take medicine then this should happen
at mealtime. This dissertation is concerned primarily with the prominence realization part
of the prosodic structure.
¹ Underline is used throughout this work to mark where main prominence falls.
Pitch, loudness, tempo, rhythm, and strength of articulation are speech properties
that interact in response to a discourse context, which is fed by syntactic form and
semantic meaning. These prosodic attributes are assumed to exist in all languages, while
their distribution across and contribution to the prosodic landscape vary on a language-
specific and speaker-specific basis. In order to broach the investigation of the acquisition
of English prosody by native Spanish speakers, the current work must first investigate the
differences between the prosodic systems in English and Spanish, by analyzing the
principles that characterize the self-organization of these complex systems.
At the heart of this discussion is precisely the difference in one aspect of
prominence: the location of its placement, which results in a given prosodic pattern. A
prosodic pattern is guided by the type of verb, or syntactic structure, and discourse
context, or status of information, as part of an ongoing discourse. English and Spanish are
said to differ regarding the placement of main phrasal prominence, known as the nuclear
pitch accent, for the same discourse context (Zubizarreta 1998, Sosa 1999, Hualde 2007,
Nava & Zubizarreta 2010). While the nuclear pitch accent can fall phrase-internally or
phrase-finally in English for a wide-focus discourse context, in Spanish the nuclear pitch
accent placement for this context is limited to phrase-final. This means that while in
English there can be a major durational event (duration is one of the acoustic correlates
associated with a pitch accent) non-adjacent to a major phrasal boundary, in Spanish this
is not the case for a wide-focus context.
The hypothesis to be explored here is that this contrast is symbiotic with another
difference between the two languages, that of the initial conditions that set up the site for
pitch accent placement, i.e. those that typically fall under the heading of “rhythm”.
3. Rhythm
Rhythm was introduced above as a prosodic property integral to the process of
embodying meaning. In the most traditional sense, rhythm refers to an alternating pattern
of temporal units that link an ongoing event extended in time. The idea that rhythm links
series of events is central because it is ultimately this coordinative function that affords
the emergence of prosodic patterns. In the words of Cummins and Port (1998: 147), “…
rhythm in speech is functionally conditioned. It emerges under just those speaking
conditions in which a tight temporal coordination is required between events spanning
more than one syllable.” Rhythm resides not in the static templates of lexical entries but
in the act of speaking, where it is a conduit for the larger temporal events it composes.
Vowel durations comprise one aspect of the variation that characterizes the
perceptual backings for such traditional notions as “strong” and “weak” within the
metrical tradition. English and Spanish differ importantly in the properties that allow for
coordination among vocalic units, and this respective flexibility that characterizes the
“rhythmic” patterning also interacts with pitch accent placement. In English the
durational difference between adjacent stressed and unstressed syllables is substantial,
whereas in Spanish this difference is minimal. One of the major ideas explored here is
that the coordination between vocalic units in English exhibits a flexibility that is
reflected at the phrasal level, i.e. the flexibility found in pitch accent placement for
certain discourse contexts. However, for the same discourse context in Spanish, the pitch
accent placement is rigid, a rigidity likewise reflected in the coordination between the
vocalic units. While for English wide-focus contexts a large, asymmetric durational
difference among vowels can occur internal to the phrase, in Spanish the only locus for
such asymmetry to occur (with one vowel much longer than the other) is at the phrase
boundary – where nuclear pitch accent placement occurs.
4. Complex systems and self-organization
This thesis takes as central the challenge of relating theory and experiment in a
"mutually informative fashion" (Tuller 2004: 8) – an approach that also guides how the
relation between pitch accent placement and rhythm is examined. By analyzing this
relation as part of the acquisition process within a complex systems framework, the
prosodic structure can be modeled and the implications of the model tested empirically.
Complex systems are characterized by the non-linear interaction of a (typically)
large body of components that demonstrate self-organization. Self-organization refers to
the process of the formation of increasingly complex spatial, temporal, and spatio-
temporal structures or functions of a system (Haken 2004).
Examples of complex systems and self-organization abound in the natural world,
and have been studied in a plethora of areas in the natural as well as social sciences. The
common activity of foot transportation is an accessible example of the observed
transitions between relatively stable gait patterns as a mammal increases pace when
walking (Van Lieshout 2004: 52). The pace increase is the control parameter and the
phase transitions are the shifts between these (relatively) stable states, in this case
walking then jogging then running. There are a large number of muscle and joint degrees
of freedom that are changing during the course of walking such that they are mutually
predictable from a lower dimensional description of the act itself, which can be
characterized in terms of the relative phase of the limbs. Multiple degrees of freedom can
be wrapped up in the order parameter variable, defined as the phasing between the
temporal intervals when the legs swing as opposed to when they support. This resulting
motor action (walking, running, etc.) is achievable because the components have been
coordinated, meaning that they are constrained to work together as opposed to controlled
(Cummins & Port 1998).
We can easily think of languages as complex systems in that they are crucially
“constrained by and reflect temporal factors” (Culicover & Nowak 2003: 12). This idea
has been explored by researchers from disciplines spanning the spectrum from the purely
physical (Titze 1994) to the mostly theoretical (Nowak 2006). In the present work, I limit my
scope to the analysis of prosody as a self-organizing system, in which the interaction of
rhythmic events in coordination affords the prosodic speech act.
Prosody exhibits self-organization due to the multiplicity of components that
interact in a non-linear fashion, in a hierarchical schema that is, however, not uni-
directional (Tilsen 2009). Analyzing prosody from the self-organization approach
challenges and simultaneously offers an alternative to traditional research on the
interaction between component levels of prosody. Two of the most widely-accepted
approaches to the organization of prosodic structure are the “accent-first” approach
(Selkirk 1984) and the “stress-first” approach. While both are grounded in derivational
models, the former is essentially top-down, where pitch accents are independently
assigned to words based on their membership status in “sense groups”, and the metrical
grid associated with the utterance is not taken into account at the time of assignment but
adjusted subsequently. The “stress-first” approach has enjoyed wider acceptance,
undoubtedly due to the bottom-up directionality it assumes: pitch accent distribution is
essentially controlled directly from the metrical grid, as resulting from the number of
levels of prominence. Such a proposal is amenable to the comfortable and somewhat
intuitive notion that lower-level units combine to “create” the larger prosodic structure.
However, the approach put forth here suggests that mutually interacting structures should
replace strict hierarchies, such that the combination of component-level units, for example
at the “rhythmic” level, responds to the prosodic context at the phrasal level but also
guides it.
Prosody qualifies as a complex system with self-organization precisely because of
this circular causality: a given prosodic pattern results from the coordination between its
component parts (see sections 2, 3), but also exerts an influence on the behavior of these
components. If there were no relationship between events at the rhythmic level and
events at the phrasal level, then examining the behavior of these levels in isolation would
not provide any more or any less insight into the behavior of the system than what is
observed from studying their interaction. A lack of interaction between phrasal and
rhythmic events would predict such unobserved behavior as pitch accents falling on
vowels without stress, or major durational events occurring entirely independent from
phrasal junctures. Instead, what has been observed (and experimentally reproduced) is a
synergy among these events, such that quantitative differences in temporal events yield
qualitative effects in meaning. Integral to any model of language is the quantitative-
qualitative symmetry that speaks to a generalizability emerging from the structure, which
in turn affords predictions about the behavior. Relating the qualitative to the
quantitative in a principled and formal way is the domain of dynamics and differential
equations that can be used to model complex self-organized systems. The qualitative
aspects of the system have typically been studied in isolation from the quantitative ones
as a grammar of prosody. While separating out the qualitative aspects in this way when
describing the system does no violence to the understanding of the structure of the system
when it is in a stable state, in order to address the issue of change in a system it is
necessary to simultaneously embrace the qualitative and the quantitative.
The themes central to dynamics and the complex systems approach will be
elaborated throughout this work. This section provides an introduction to the vocabulary
that will be used in referring to the concepts. Most complex system approaches assume a
tripartite schema that includes parameters that act on the system (here, verb type,
discourse context, word order, etc.), levels of interacting elements (a “set of primitives”,
for example vowel durations, or temporal intervals more generally), and emerging
patterns (the prosodic pattern) (Kelso 1995). These emerging patterns are the observables
of the system, afforded by the mutability that exists between the levels of the interacting
elements.
Behavioral patterns self-organize such that they emerge, stabilize and change in
accordance with the given control parameters. A control parameter moves the system
through the various patterns, and is identifiable because its variation causes qualitative
change, for example the qualitative change in meaning associated with a given prosodic
pattern. An order parameter, on the other hand, is a single dimension along which the
state of the observable can be described, so as to reveal its stable state(s). Such states
result from the coordination between the elements and units of the system, and at the same
time influence their behavior.
Complex systems exhibit multistability, fluctuation and phase transitions.
Dynamic instabilities observed in any complex system allow for distinction among the
multiple stable behavioral patterns, and thereby for the identification of the continuous
dimension along which the change in pattern occurs (Kelso 1995: 43, 44). Fluctuations
constantly probe the system for stability and vulnerability, allowing for the discovery of
new patterns. Phase transitions refer to global changes in behavior due to the interaction
of local properties.
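The text does not commit to a particular equation at this point. As a hedged illustration of multistability and a phase transition, the sketch below uses the Haken-Kelso-Bunz potential, a standard form from the coordination dynamics literature for the relative phase between two oscillating limbs. It is a stand-in for illustration, not a model of prosody, and the parameter values are arbitrary assumptions. The relative phase plays the role of the order parameter and the ratio b/a the role of the control parameter.

```python
import math

def hkb_potential(phi, a, b):
    """Haken-Kelso-Bunz potential over relative phase phi (in radians)."""
    return -a * math.cos(phi) - b * math.cos(2 * phi)

def curvature(phi, a, b):
    """Second derivative of the potential; positive at a stable state."""
    return a * math.cos(phi) + 4 * b * math.cos(2 * phi)

def stable_states(a, b):
    """Report which of the two candidate states are attractors."""
    candidates = {"in-phase": 0.0, "anti-phase": math.pi}
    return [name for name, phi in candidates.items() if curvature(phi, a, b) > 0]

# Sweep the control parameter b/a downward (a is fixed at 1.0, an assumption).
for ratio in (1.0, 0.5, 0.3, 0.2, 0.1):
    well_stiffness = round(curvature(math.pi, 1.0, ratio), 2)
    print(ratio, stable_states(1.0, ratio), well_stiffness)
```

For large b/a both states are attractors (multistability). As b/a falls, the curvature of the anti-phase well shrinks, so the well grows shallower and fluctuations around it would be expected to grow. Below b/a = 1/4 the anti-phase state stops being an attractor and only the in-phase pattern remains, which is the phase transition.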
Throughout this work, this approach will be explored and expounded on, by
analyzing the prosodic systems of English and Spanish as distinct, self-organizing
complex systems. However, ultimately at the core of this analysis is the issue of how a
prosodic system adapts and develops (the L2 prosodic system), and its relationship with
the native system.
5. The complex systems approach
Among the motivating factors for pursuing a research project in L2 speech has been
my conviction that L2 speech behavior lays bare the interacting systems that compose
speech as a nested hierarchy of temporal events. By viewing the development of a system
as an active seeding ground, not only can the change that certain properties of the system
undergo be observed, but so can the very nature of these properties. A brief overview is
provided of how the concepts mentioned in section 4 above are applied to the current
study, followed by some predictions this makes for the L2 speaker.
The various linguistic and contextual parameters that influence the placement of
relative prominence include discourse context/information structure, verb type and
syntactic structure, word order, and possibly speaking rate. The dynamics associated with
each of these parameters can be modeled with a potential function that expresses the
preferred state or states of relative prominence associated with that parameter. These
preferred states are attractor values. In producing any given utterance, the composite
preference for prominence placement can be modeled by superposing the potential functions
of the parameters that are relevant to this particular utterance, and determining the
attractors of that composite potential function. As a result of this combination of potential
functions, the main prominence can end up falling in a number of states, of which this work
is restricted to two: phrase-internal and phrase-final.
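The superposition described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the model developed in later chapters: relative prominence is reduced to a single hypothetical axis from 0 to 1, with 0.25 standing for phrase-internal and 0.75 for phrase-final placement, and each parameter's potential is assumed to be a Gaussian-shaped well. The function names, well shapes, depths and widths are all invented for illustration.

```python
import math

def well(center, depth, width=0.1):
    """One parameter's potential: a dip at its preferred prominence value (assumed shape)."""
    return lambda x: -depth * math.exp(-((x - center) ** 2) / (2 * width ** 2))

def composite(potentials):
    """Superpose the individual potentials by summing them pointwise."""
    return lambda x: sum(v(x) for v in potentials)

def attractors(potential, n=201):
    """Grid points in (0, 1) that are strict local minima of the potential."""
    xs = [i / (n - 1) for i in range(n)]
    vs = [potential(x) for x in xs]
    return [xs[i] for i in range(1, n - 1)
            if vs[i] < vs[i - 1] and vs[i] < vs[i + 1]]

# Two hypothetical parameters that pull toward different placements.
competing = composite([well(0.25, 0.8), well(0.75, 1.0)])
# Two hypothetical parameters that both favor phrase-final placement.
agreeing = composite([well(0.75, 0.8), well(0.75, 1.0)])

print(attractors(competing))  # two attractors, near 0.25 and 0.75
print(attractors(agreeing))   # a single attractor, near 0.75
```

When the contributing potentials disagree, the composite keeps two attractors and the deeper one marks the more strongly preferred placement. When they agree, the composite has a single, deeper attractor.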
The prosodic pattern is an example of a stable state along the order parameter of
relative prominence. This should be thought of as a nested subsystem with multiple parts:
the rhythm, or rather vowel durations as a measurable indicator of rhythm, and the
emergent rhythmic pattern that gives rise to the phrasal prosodic pattern. Attractors are
stable states, in this case an example of a prosodic pattern associated with a particular
combination of verb type, discourse context, etc. Each speaker’s prosodic system exhibits
attractors for prosodic production, with very little noise or fluctuation around a given
stable state. However, in order for a system to move from one stable state to a new one as
a result of a change to a particular control parameter, an attractor would have to undergo
destabilization. Such instability arises from heightened states of fluctuations, and as a
consequence measurable accuracy is oftentimes compromised. Fluctuation is an
observable of behavior such that for certain values of the control parameters, amplified
variation in the order parameters is observed, possibly resulting in a transition to a new
stable state. A phase transition in the case of prosodic patterns would shift relative pitch
accent placement as a function of verb type and information structure, and in the case of
rhythm refers to a shift in the preferred ratio of adjacent vowel durations as a function of
the speech rate.
While both Spanish and English have the same factors governing relative
prominence placement, the measurable difference along this order parameter is a result
of the differential flexibility caused by self-organization of the systems, in particular as
determined by the vowel durations and their interactions in the coordinative space. The
prediction is that a change in certain component properties of the system (for instance,
vowel duration) will result in broader/larger qualitative changes of the system (prosodic
patterns). But the notion that acquisition changes the properties of the system makes no
prediction about the direction of that change. The specific hypothesis advanced here is
that changes in the coordination between vocalic units are the necessary precursors to
promoting a qualitative change at the phrasal level.
5.1 Second language acquisition and complex systems
Larsen-Freeman’s 1997 article is by far the most widely-cited and perhaps most
influential proposal concerning the relationship between SLA and complex systems. The
author’s in-depth proposal is both grandiose in its attempted range and thorough in its
discussion of the core issues and themes in the area of SLA research. As such, the article
merits review in a fair amount of detail (as its application overlaps in numerous ways with
the current work), followed by a brief review of work influenced by the
ideas put forth in the article.
Larsen-Freeman bases her idea that the study of dynamic, complex nonlinear
systems is useful for SLA studies on the premise that the scientific study of language
should mirror the trends of scientific study in general, with a move away from linear,
reductionist thinking towards an approach that focuses on the process rather than the
state. She organizes her argument by breaking down the discussion of complex systems
by its themes and concepts, which she in turn applies to the study of SLA. She
summarizes that dynamic systems are those that change with time, and that the focus on
complex systems is what is crucially innovative about the dynamic systems approach.
The “complexity” refers to the large number of components that compose the system, but
more importantly the idea that the behavior that characterizes a complex system is not
just a product of the interaction of its components, but rather that its behavior emerges
from this interaction. The idea that the behavior of a complex system arises from the
interaction of its agents flies in the face of the traditional scientific approach of studying
the whole by looking at the parts in a piecemeal fashion.
In a linear system, cause and effect are directly proportional. Conversely, in a
nonlinear system, such as a complex system, “A simple trigger, one which occurs all the
time, might be enough on any given occasion to bring about a great convulsion in the
system, or to throw the entire system into a chaotic state.” (Larsen-Freeman 1997: 143)
However, accessing this chaos is only possible after a critical point has been passed (for
example, after a certain amount of input from the L2, in the case of SLA) – before which
time complex nonlinear systems behave with regularity. The vulnerability that complex
systems demonstrate towards chaos is due to their “sensitive dependence on initial
conditions.” Even minimal change in initial conditions can have a far-reaching impact on
future behavior. “ … the behavior of systems with different initial conditions, no matter
how similar, diverges exponentially as time passes” (Larsen-Freeman 1997: 144).
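Sensitive dependence on initial conditions can be made concrete with a textbook chaotic system. The sketch below uses the logistic map as a stand-in (an assumption chosen for illustration, not a model of language or of SLA) and follows two trajectories whose starting values differ by one part in a billion.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return the whole orbit."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# Two nearly identical initial conditions (values chosen arbitrarily).
first = logistic_orbit(0.2, 60)
second = logistic_orbit(0.2 + 1e-9, 60)

# Distance between the two orbits at every step.
gaps = [abs(p - q) for p, q in zip(first, second)]
print(gaps[0], gaps[10], gaps[20], max(gaps))
```

The gap starts at roughly one billionth, grows by about a factor of two per step on average, and within a few dozen steps is as large as the range of the system itself, after which the two trajectories are effectively unrelated.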
Engaging in language use is a dynamic process, and dynamic too in the sense of
undergoing growth and change. Here the author cites Rutherford (1987) in his suggestion
that a better metaphor for language is that of an organism, which grows, rather than a
machine, which is constructed. With regard to the process of language change, the
author places much emphasis on how use shapes change. The idea is that use as a
dynamic process changes the grammar of the user, to the extent that this process can also
lead to changes at a more global level. Such changes at a higher level are possible
because the composite of local interactions shapes the behavior of the system as a whole,
and the system self-organizes into attractor states (as mentioned above, an attractor is a
pattern that a dynamic system is drawn to, and they mark the path that such a system
takes in space).
Language, the author concludes, satisfies the complexity criteria in that it is
composed of many different subsystems (syntax, phonology, semantics, etc.) – which are
themselves interdependent. It is crucial to understand that a change in one of these
subsystems can result in a change in the others, and that from their interaction emerges
the behavior of the whole.
SLA and complex nonlinear systems show a number of parallels, principal among
them the dynamic process that both embody – especially when one considers the
dynamism apparent in the evolution of the acquirer’s interlanguage (IL). The trajectory of
the acquirer’s IL is determined by many interacting factors among which are the L1, the
L2, the nature of the input, the amount of exposure, age at exposure – not to mention a
host of sociopsychological factors as well. Another parallel is that learning linguistic
items is by no means a linear process. Neither is it the case that acquirers learn one
discrete item before advancing to something else, nor is the acquisition of a particular
item a linear process. At a particular stage of development, the L2er’s production might
be more or less “target-like” for a relevant item than at a later or earlier stage, depending
on when assessment takes place, the nature of the assessment or interaction, the linguistic
context surrounding the assessment, etc. But within such a self-organizing system, the
chaos does eventually subside and give way to a restructuring that restores order to the
system (i.e. a certain degree of attainment).
5.2 Complex systems and SLA: applications
Despite increasing support and growing awareness of the application that the
complex systems approach holds for SLA, apparently few works have attempted to
incorporate it as a framework for analysis of an L2 research phenomenon. Menezes de
Oliveira e Paiva (submitted) and Ziglari (2008) both write spirited reviews of Larsen-
Freeman’s work and encourage members of the L2 research community to engage in
complex system theory-inspired investigation. Mallows (2002) challenges the English
foreign language teaching community to move away from linear approaches to lesson
planning, preparing, and instruction, in an effort to better mirror the very non-linear
process of L2 learning. These proposals are heavy on praise and light on application.
Among those works that have made a move towards applying the CS approach is the
interactive resource textbook by de Bot et al. (2005), which provides theoretical
background, excerpts from related articles and practice exercises to motivate students to
think of issues and problems in the area of acquisition within the CS framework and most
importantly to design solutions.
In his work on the multilingual lexicon, Meara (2006) proposes simple models to
explain the relation between the lexicons in bilingual speakers, and more precisely the
nature of the switch from one language to another. Using Boolean networks, he models a
lexicon as a stable state of activation, and the language proper (the L1 for example) as an
attractor. The attractor will remain stable until there is enough activation within the
network to first “kick” then force the system out of one stable mode and into another, in
this case the L2. Work by the same author (Meara 2004) that looks at the attrition in the
bilingual lexicon centers around the idea that vocabulary loss can be simulated as loss in
activation level in the networks to the degree that activation does not become dense
enough to bring about a shift out of one stable state into another.
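The idea of a language as an attractor that survives small perturbations but gives way to a sufficiently large "kick" can be illustrated with a toy Boolean network. The sketch below is a deliberately simple majority-rule network and is not the network Meara describes: the lexicon size, the update rule and the form of the kick are all hypothetical choices made for illustration.

```python
def make_lexicon(n_per_language=5):
    """Map each hypothetical word to its language label."""
    language = {}
    for i in range(n_per_language):
        language[f"l1_word{i}"] = "L1"
        language[f"l2_word{i}"] = "L2"
    return language

def step(state, language):
    """Synchronous update: a word is active next only if its own language
    currently has more active words than the other language."""
    active = {"L1": 0, "L2": 0}
    for word, on in state.items():
        if on:
            active[language[word]] += 1
    new_state = {}
    for word in state:
        own = language[word]
        other = "L2" if own == "L1" else "L1"
        new_state[word] = active[own] > active[other]
    return new_state

def settle(state, language, max_steps=20):
    """Iterate until the state stops changing (a fixed-point attractor)."""
    for _ in range(max_steps):
        nxt = step(state, language)
        if nxt == state:
            return state
        state = nxt
    return state

def kick(state, language, k):
    """Assumed perturbation: switch off k L1 words and switch on k L2 words."""
    kicked = dict(state)
    for word in [w for w in state if language[w] == "L1"][:k]:
        kicked[word] = False
    for word in [w for w in state if language[w] == "L2"][:k]:
        kicked[word] = True
    return kicked

def active_languages(state, language):
    """Languages that have at least one active word in this state."""
    return {language[w] for w, on in state.items() if on}

language = make_lexicon()
l1_mode = {w: language[w] == "L1" for w in language}  # all L1 words active

print(active_languages(settle(kick(l1_mode, language, 2), language), language))  # {'L1'}
print(active_languages(settle(kick(l1_mode, language, 3), language), language))  # {'L2'}
```

With five words per language, a kick that affects two words decays and the network returns to the L1 state, whereas a kick that affects three words tips the network into the L2 state, where it then remains.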
In a qualitative, longitudinal study conducted by Larsen-Freeman (2006) the
development of one adult Chinese native speaker acquiring English was tracked over 6
months using transcriptions from repeated productions of the same story. The author
found evidence for the reorganization of the speaker’s language resources as the system
as a whole moved through another region of its state space. The speaker’s target use of
prepositions and verb morphology fluctuated (between target-like and non target-like
uses), as did the appropriate use of idioms and grammatical constructions. But the
stabilization of a given pattern, for example the expected preposition use with a given
verb, was found to co-occur with the emergence of markers of pragmatic fluency.
5.3 The complex systems approach and the current study
As has been reviewed so far, there appears to be much enthusiasm for establishing a tighter
relationship between SLA and CS. However, to date the few studies that
address this relationship in a principled way do so either in terms of modeling (Meara) or
with qualitative studies (Larsen-Freeman). The current work is a quantitative study of the
L2 prosodic system undergoing re-organization, with empirical data from a large number
of L2 speakers. This large number of speakers at different stages within the re-
organization process is necessary in order to probe the directionality of acquisition (from
rhythm to phrasal patterns). Quantitative data of this kind make it possible to construct an
actual dynamical model of the complex system, which will be presented in the following chapters.
By way of preview, the model will outline the various forces that drive the potential
behind a particular attractor and will illustrate that these stable states are quantitatively
and qualitatively different in English and Spanish. Vowel duration – the differences
between and the changes in – is the trigger for pushing the L2 speaker’s system towards a
different mode of organization.
6. Outline of the dissertation
In Chapter 2 the phrasal prominence patterns of interest in English and Spanish
are described, and hypotheses and predictions are made as to the prosodic pattern
behavior of adult native Spanish speakers learning English as a second language. Results
from experimental data testing the hypotheses and predictions are discussed. Chapter 3
takes up a second experiment designed to examine how differences in vocalic durations
contribute to the global differences in English and Spanish rhythmic patterns. In Chapter
4, results from a third experiment are discussed. A repetition task was used to investigate
the differences in coordination among vocalic units in the case of native English
speakers, native Spanish speakers, second language speakers of English who have native-
like prosodic proficiency at the phrasal level in English and those whose behavior is not
native-like. Chapter 5 is a general discussion of the findings of the overall investigation
and its contribution to various areas of study in linguistics.
Chapter 2. Experiment 1: Phrasal events
Theory is a good thing but a good experiment lasts forever.
– Peter Leonidovich Kapitsa
1. Introduction
This chapter begins with a discussion of models of acquisition that have been
proposed or used in the analysis of prosodic acquisition. To my knowledge no models
have been proposed that specifically address the L2 acquisition of prosody. However,
among the plethora of L2 acquisition models that have moved in and out of circulation
(Larsen-Freeman & Long (1991) estimate around 40), a number concern the L2
acquisition of phonology-related processes, a few of which have also been applied to
research and findings on suprasegmental aspects.
As part of the Speech Learning Model (SLM), Flege (1995) posits a directionality
of difficulty for the L2 acquirer such that similarity poses more of a problem than
difference. In order to acquire a sound (phoneme or allophone) that is present in the target
language but not in the native language, the acquirer must first identify the sound as
sufficiently different from existent sounds in the native system; failure to do so will result
in substitution of a sound from the L1. See Flege 1995 and references therein for a more
extensive discussion of examples outside of prosody that are analyzed in support of the
SLM.
Likewise, Major’s Ontogeny Phylogeny Model (OPM) (1987, 2001) claims that
acquirers are more successful at acquiring sounds that represent the greatest contrast to
those found in the L1, while those approximating existing units more closely are
hypothesized to present the biggest challenge.
Eckman et al. (2003) posit the similarity vs. difference hypothesis in the opposite
direction, and present data to support the claim that it should be more challenging to learn a
new phoneme (present in target L2, not present in L1) and less challenging to learn an
allophone of a phoneme that already exists (an allophone that exists in L1 differs from
one that exists in the L2).
While arguments in favor of these models have all been supported with L2 data
on phonological acquisition, a few researchers have extended the use of these models to
the realm of L2 prosody. One such study was Mennen (1999), reviewed in more detail
below in section 3, where it was concluded that Dutch acquirers of Greek were more
successful at producing Greek prosodic contours that were not similar to ones existing in
Dutch – in support of Flege’s SLM. However, Gut (2003), reviewed in Chapter 3,
concludes the opposite based on L2 German rhythm data. The author claims that
Romance-language speakers whose native language does not have vowel reduction were
not at an advantage for acquiring this aspect of rhythm production, when compared to
L1English speakers whose vowel reduction was more similar to the native German
speakers.
The data presented in sections 3.5 on the acquisition of phrasal prominence
patterns by the population under study also touch on the question of directionality of
difficulty in the path through the acquisition process. As will be elaborated on, one of the
hypotheses is that the prosodic pattern associated with anaphoric deaccenting will be
acquired before the one associated with nuclear stress, or main phrasal prominence.[2] As
part of this discussion it will be demonstrated that where the applications of the SLM and
OPM have fallen short is in the egregious overgeneralization of what “difficult” is and
means for the acquirer, and it will be suggested that it is more accurate to cast the similar
vs. new dichotomy in terms of the complexity of the particular phenomenon to be
acquired.
[2] While this hypothesis was tested in the collaborative work by Nava & Zubizarreta (see references), the
original insight that anaphoric deaccenting should be acquired before nuclear stress is attributed to
Zubizarreta.
Work in the area of complex systems argues in favor of this point precisely. Kelso
remarks that the ease with which one task is learned as opposed to another is directly
dependent “on the extent to which specific parameters cooperate or compete with existing
organizational tendencies (intrinsic dynamics).” (Kelso 1995:175) The implications of
Kelso’s remark will figure prominently in the extended discussion (alluded to above) in
section 4.
2. Studies of L2 acquisition of phrasal level prosody
Work that falls under the heading of segmental-level L2 intonation abounds (see
Leather & James 1991, Archibald 1998, 2000 and Eckman et al. 2003 for reviews).
Likewise, substantial ground has been covered and gained in the investigation of the L2
acquisition of a host of phonetic properties (see Major 1998, Piske et al. 2001, Monroy &
Gutierrez 2001 and Flege et al. 2003 for introductions and reviews).
In contrast to the vast literature on segmental aspects of L2 production, work on
the suprasegmental aspects of prosody relevant to the current investigation is
unfortunately more limited. A survey by Gut, as reported by Mennen (2007), cites fewer
than 10 dissertations and 10 major journal articles on suprasegmental aspects of L2
prosody acquisition in the last three decades. Chun (2002) reports similar numbers in her
review, and a supplementary survey by this author adds only three additional
dissertations and a handful more articles in this specific area. Undoubtedly this dearth is
due in part to the challenge that comes with analyzing prosody as an organizing principle
of speech, where it is not always obvious how differences in prosodic systems should be
measured. An additional challenge is data analysis, as more than one model for prosodic
analysis exists, and these alternatives do not always speak to each other transparently.
Finally, of the works in this area, still fewer are of direct relevance to the current
investigation of the acquisition of prosodic events at the phrasal level. In what follows, a
review of studies that present experimental evidence of the L2 acquisition of aspects of
phrasal prominence is given, with a discussion of the corresponding approaches.
A study by Archibald (1997) offers a descriptive analysis of how English phrasal
stress is implemented by adult L2 learners, without attempting to account for its
computation. The author analyses the speech of one L1Hungarian/L2English and one
L1Polish/L2English speaker by examining production data taken from
an experiment originally designed to address the question of word-level stress
acquisition. While evidence of L1 phrasal stress transfer in the speech of both L2 learners
is found in this study, where transfer is understood as the use of rules or system-based
operations from the native system/grammar in the second language, the author provides
only an impressionistic analysis and does not attempt to offer an account for the
processes that underlie the nature of these productions.
Other studies, however, have been guided by an approach that examines
separately the phonological and phonetic components of prosody (Fox 2000), in an effort
to better diagnose the nature of both the L2 prosodic production and the acquisition
process. The impression that a prosodic production is non-native is the culmination of a
combination of phonological and phonetic factors still anchored to the speaker’s L1
prosody, which contribute to production that is perceived as non-target. The phonological
properties associated with prosody include the choice and placement of pitch accents, and
the phonetic properties are those associated with temporal alignment of pitch events.
The term pitch accent refers to pitch movement on a stressed syllable, in addition
to the other characteristics of stress (duration, amplitude, spectral quality of vowel). In
many languages, the distribution of pitch accents guides the patterns of prominence
placement among words in a sentence that contribute to its meaning (Pierrehumbert &
Hirschberg 1990, Ladd 1996, Beckman et al. 2005, Xu & Xu 2005). One such study that
examines both the phonological and phonetic properties of L2 prosody is Ueyama & Jun
(1998) in their investigation of contours and pitch accent realizations for narrow focus
constructions by L1Korean and L1Japanese speakers of English. The authors found that
learners acquire phonological features (such as underlying sequences of tones for narrow
focus pitch accents and boundary tones) earlier than the related phonetic features (the
slope of initial F0 rise and peak F0 location).
Additional support for the phonological – phonetic distinction of prosodic
properties and for the directionality of acquisition is reported in Jun & Oh (2000). The
authors report that L1English/L2Korean speakers acquire tones linked to phonological
properties associated with prosody (such as the H(igh) tone’s[3] function of marking a
phrase-final boundary) before they acquire those linked to the phonetic properties (the
surface realization of Accentual Phrase tone patterns) in declarative sentences.
[3] “H” refers to a high tone and “L” to a low tone, as established by ToBI conventions (Tones and Break
Indices, Silverman et al. 1992, Beckman and Ayers 1994). Easily decipherable references to these tones are
made at a few points in this chapter.
Jilka (2007) likewise separates out the phonological and phonetic components in
his analysis of the L2 German prosody of native American-English speakers. On the
phonology end, he finds that L2 speakers introduce more pitch accents than are produced
by native speakers in a given phrase, and in some cases substitute tonal sequences that
differ from the target (for example, L*H% (target German) vs. L+H* L-H% (American
English)). There were also substantial differences in the realization of the phonetic
parameters associated with prosody. For example, the rises of the L*H pitch accents
produced by the American English speakers were found to be significantly steeper and
longer than those of the native German speakers in the declarative sentences tested.
Mennen (1999) looked at the phonological realization of the differences in
prosodic contour between native Greek speakers and L1Dutch speakers for yes-no
questions (YNQ). Greek has two possible nuclear accent locations for YNQ, which yield
different prosodic contours. However, Dutch has only one possible nuclear accent
location and one possible prosodic contour for questions of the same type. It was found
that L2 acquirers were more accurate in producing the contour that is not similar to the
existing contour in their L1, i.e. they were more target-like in their production of a “new”
as opposed to “similar” prosodic contour, in support of Flege’s Speech Learning Model (SLM).
Kelm (1987)⁴ presents a study of the acquisition of contrastive emphasis by
L1English speakers of Spanish. While accurate in its treatment of the question of
contrastive emphasis expression in Spanish, the experiment is not well enough controlled
to draw strong conclusions from the data. Nevertheless, the results do seem to indicate
that L1English speakers transfer the use of pitch and intensity as the resource to mark
contrasted words in their Spanish, whereas the native speakers use pitch and syntax (word
order), or additional words as contrastive markers (“sí”, “este”, etc.).
Somewhat surprisingly, the two studies most relevant to the current work are also
the oldest ones reported on in this review. In their studies of phrasal level prominence in
the speech of L2English acquirers, Backman (1979) and Jenner (1976) found that L2
speakers of English inaccurately align prominence at the phrasal level. Jenner examined
the intonation patterns and other prosodic features of Dutch speakers of English (speakers
of Dutch English, in his terms) in an attempt to outline a model of prosodic
interlanguage. Interlanguage is a cover term used in SLA literature to refer to an L2
speaker’s emerging system of production that is characterized by components of the
native language and of the target language, and often production behavior that cannot
ostensibly be attributed to either. It is unclear if Jenner’s study included data
measurements, as the only discussion centers around a purely descriptive account of
prosodic contours. Nevertheless, his account does speak to the difficulty that Dutch
English speakers experience – despite the typological similarity between the native and target
languages – in marking with pitch accents the correct pattern of informational contrast
that characterizes a prosodic contour in English.
⁴ Kelm (1995) also investigates the question of prosodic transfer with a bidirectional study of Spanish-
English and English-Spanish adult L2 learners. This study measures the pitch ranges in both the native and
second languages of the speaker groups and finds that both groups transfer the “vacillation”, i.e. the amount
of pitch fluctuation, from their native language, with English speakers producing more fluctuation in pitch
range in both their English and Spanish.
Backman’s study of 8 Venezuelan Spanish speakers learning English analyzed
prosodic productions of questions (YNQs, Wh-questions) and declaratives taken from
scripted dialogues. The same dialogues were translated into Spanish in order to
analyze productions from the speakers’ native language as well. Along with acoustic
measures of the fundamental frequency (F0), subjective impressions by external judges
were made for “appropriateness” of pitch contour for the given context and of
prominence placement. The author’s interpretation of the results is somewhat opaque.
She claims that the sentential prominence placement for declaratives was judged as
occurring too early in the sentence (in comparison with the native English controls), and
additionally that the pitch on unstressed words occurring later in the sentence is too high.
However, from the examples of data provided, it is impossible to get at a clearer
understanding of the precise nature of the L2 productions. It is also very likely that these
results are obscured by a failure to design the declarative sentence stimuli according to
the syntactic category of the verb as well as informational status (all new information vs.
inferability, etc.) – meaning that analysis of all the data was lumped together, irrespective
of verb type and discourse status. Nevertheless, two important points can be taken from
this study: first, the L2English speakers in this study consistently misaligned sentential
prominence, and second, Backman claims that these “inappropriate” patterns are not
explained by the speakers’ native Spanish productions either (this latter point is taken up
again and in more detail in Chapter 3).
What has emerged as part of this literature review is that most of the research in
the area of prosodic acquisition has involved the separation of the phonological form
from the phonetic components in accounting for this acquisition process. However, it is
argued here that a solution to this problem comes not from this separation but by
modeling this process dynamically in order to account for both the discrete and
continuous aspects of complex systems (Gafos & Benus 2006). The remainder of this
chapter shows that the availability of different prosodic patterns in
English can be modeled as multiple discrete modes of a dynamical system that regulates a
continuous variable: the relative prominence of adjacent constituents. While this relation
is continuous, the modes of its distribution correspond to prosodic patterns that are
qualitatively distinct from one another in terms of meaning, and it is this discrete structure
that the dynamics also models. This shift in qualitative meaning along a continuum with
variation figures prominently in modeling the differences between Spanish and English at
the prosodic level, as well as the change in L2 behavior as a result of acquisition.
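The idea of discrete modes over a continuous variable can be illustrated with a generic double-well potential, a standard device in dynamical systems. The sketch below is only an illustration under assumed values: the potential V(x) = x^4/4 - x^2/2 - kx, the parameter name k, and the function name settle are hypothetical and are not taken from the model developed in this chapter. Here x stands in for the relative prominence of adjacent constituents and k for a control parameter.

```python
def settle(x0, k=0.0, dt=0.01, steps=5000):
    """Follow dx/dt = -dV/dx for V(x) = x**4/4 - x**2/2 - k*x (assumed form).

    x0 is the starting value of the continuous variable; k is a
    hypothetical control parameter that tilts the potential.
    """
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3 + k)  # Euler step along the negative gradient
    return x

# With k = 0 there are two stable modes; the starting point decides which is reached.
mode_a = settle(0.5)
mode_b = settle(-0.5)

# With a large enough tilt (here k = 1) only one mode remains,
# so both starting points end up in the same place.
tilted_a = settle(0.5, k=1.0)
tilted_b = settle(-0.5, k=1.0)

print(round(mode_a, 3), round(mode_b, 3))
print(round(tilted_a, 3), round(tilted_b, 3))
```

The variable itself is continuous, but the states the system settles into are discrete, and a change in the control parameter can remove one of them. This is the sense in which a single continuous relation can support qualitatively distinct patterns.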
3. Phrasal events in the L2English speech of Spanish native speakers
In this section, the prosodic differences at the phrasal level between English and
Spanish are outlined, a hypothesis and related predictions of what might occur in an L2
situation based on those differences are proposed, and an experiment designed to test
these predictions is presented. Of the resulting data, this chapter centers primarily on L2
group data, with a more in-depth look at individual data reserved for Chapters 3 and 4.
Following the data analysis, an analysis of the data within a cognitive dynamic systems
framework is given.
3.1 Phrasal prominence in English and Spanish: similarities
Within a given utterance, there is one word perceived as more prominent than the
others, which is assumed to correspond to a more prominent production of that word with
respect to its syntagmatic cohorts. Words are flagged as prominent in response to
informational focus operations. Focus, in general terms, regulates the incorporation of
information into the discourse flow (Ladd 1996), with words under focus serving as the
locus of “maximal inflection of the pitch contour” (Chomsky 1971: 199).
Chomsky and Halle (1968) were among the earliest proponents of the idea that
main phrasal prominence, or nuclear stress (NS), is generated by an algorithm, the
Nuclear Stress Rule (NSR), that establishes the right-most element of a phrase as
prominent. Historically, this marked the beginning of a still-continuing debate as to NS
realization and its relation to information focus structure. That debate is left aside for the
moment to concentrate on the role of the NSR as being the algorithm responsible for
locating “the rhythmically most prominent word”, where it assigns right-most stress.
Under this analysis, the NSR provides the correct prosodic output for the example wide
focus utterances given in (1) below:
(1) a. Hope bought a shirt.
b. Hope bought a shirt at the store.
c. Vivian danced in the studio.
d. Vivian danced on the table in the studio.
Ironically, even though the NSR was proposed to account for data in English, it has been
widely noted that it has a wider application outside of English, or Germanic languages in
general (Gussenhoven 1983, Zubizarreta 1998, Truckenbrodt 2006, Hualde 2007, among
others). It is perhaps a better fit for Romance languages, where it is well-known that NS
falls right-most (phrase-finally) in wide focus contexts, where all the information is
“new”, i.e. not previously mentioned. The Spanish counterpart to (1) above is given in (2)
below.
(2) a. Esperanza compró una camisa.
b. Esperanza compró una camisa en una tienda.
c. Viviana bailó en el estudio.
d. Viviana bailó sobre la mesa en el estudio.
However, while Spanish preserves phrase-final NS placement for all wide-focus contexts
(Contreras 1976, Zubizarreta 1998, Sosa 1999, Hualde 2007), there are wide-focus
contexts where NS can occur phrase-internally in English (Bolinger 1972, Schmerling
1976, Gussenhoven 1984), such as the examples given below in (3) (cited from
Zubizarreta 1998: 68).
(3) a. The baby’s crying.
b. The sun came out.
c. The mail arrived.
The implications of these differences between English and Spanish are discussed in the
following section.
3.2 Phrasal prominence in English and Spanish: differences
The observation that English and Spanish differ regarding prominence placement
for the same discourse context is not a new one. Bolinger (1954) postulated that the
observable difference between the two languages in informational status expression was
due to phrase-final length in Spanish and pitch accent in English. Delattre (1962), in a
comparative study of declarative sentence intonation in English and Spanish, suggests that
“more options” exist for English declarative contours than for Spanish ones. Stockwell &
Bowen (1965) comment that while in English the most prominent word in the phrase may
occur before the end of the phrase, in Spanish whatever is most prominent must be last
(Stockwell & Bowen 1965: 28). While these observations are insightful, they remain
descriptive in nature and fall short of providing a means to account for the observed
differences.
Zubizarreta (1998) succeeds in providing a key part to such an account by
proposing a reformulation of the NSR. This reformulation yields two algorithms: the
Selection Ordering Nuclear Stress Rule (SNSR), which is the more specific algorithm,
sensitive to an argumenthood relation, that preferentially assigns NS to the S(ubject)
when the S and V(erb) are metrical sisters; and the Constituent Ordering Nuclear Stress
Rule (CNSR), the more general algorithm that assigns NS right-most when no selectional
relation exists between the sister nodes.
(4) SNSR: Given two sister nodes, A and B, B is strong if selected by A.
This is the operative algorithm in the case of intransitives, as anticipated in (3), and OV
transitive compounds, further examples of which are given below.
In (5) the unaccusative V selects its argument (argument in the lexical-syntactic sense) as
strong and the prosodic output is rendered with NS on the S – phrase-internally.
(5) What happened?
A window closed.
In both (6a) and (6b) the V selects the O(bject) as strong, and as such the prosodic output
occurs with NS on the O in both cases.
(6) a. She eats pasta.
b. She is a pasta-eater.
However, when there is no selectional relation between the sister nodes, the CNSR
assigns NS right-most to the V.
(7) CNSR: Given two sister nodes A and B, the right-most node is strong.
An example of this is (8), where no argument relation holds between the adverb and V
(compare with (5)), bleeding the application of the SNSR.
(8) A window suddenly closed.
The above examples serve to demonstrate that in English both the SNSR and the CNSR
operate, where it is understood that the CNSR applies only after the SNSR fails to apply.
Up to this point the discussion has centered around wide-focus examples in order to
concentrate on grammar-generated stress patterns. Section 3.4 discusses cases where
pragmatic considerations such as unexpectedness or noteworthiness also have an effect on
prosodic output.
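The ordering of the two algorithms, with the CNSR applying only when the SNSR fails to apply, can be sketched over a binary metrical tree. This is a minimal illustration, not an implementation of Zubizarreta (1998): the class names, the selected annotation, and the snsr_active switch are hypothetical devices, and the tree shapes are simplified bracketings of examples (5), (6a) and (8).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    text: str

@dataclass
class Node:
    left: object   # Word or Node
    right: object  # Word or Node
    # Hypothetical annotation: which daughter is selected by its sister, if any.
    selected: Optional[str] = None  # "left", "right", or None

def nuclear_stress(tree, snsr_active=True):
    """Descend through the strong daughters and return the word bearing NS."""
    while isinstance(tree, Node):
        if snsr_active and tree.selected == "left":
            tree = tree.left    # SNSR: the selected sister is strong
        else:
            tree = tree.right   # SNSR selects the right sister, or CNSR: right-most
    return tree.text

# (5) A window closed: the unaccusative V selects its argument.
window_closed = Node(Node(Word("a"), Word("window")), Word("closed"), selected="left")
# (8) A window suddenly closed: no selectional relation between the sister nodes.
window_suddenly = Node(Node(Word("a"), Word("window")),
                       Node(Word("suddenly"), Word("closed")))
# (6a) She eats pasta: the V selects the O.
eats_pasta = Node(Word("she"), Node(Word("eats"), Word("pasta"), selected="right"))

print(nuclear_stress(window_closed))                     # window
print(nuclear_stress(window_suddenly))                   # closed
print(nuclear_stress(eats_pasta))                        # pasta
print(nuclear_stress(window_closed, snsr_active=False))  # closed
```

Switching snsr_active off leaves only the constituent-ordering rule, which yields phrase-final NS for every structure.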
However, there is evidence that the SNSR applies to unaccusatives and OV
compounds specifically, but not intransitives generally. This is motivated by the
structural difference that exists between these intransitive subclasses (Levin & Rappaport
Hovav 1995, Hale & Keyser 2002, among many others). Unaccusatives and OV
compounds are cases where the relation is verb-phrase internal: the argument that
ultimately receives nuclear stress is an internal argument in both cases. So the SNSR
applies more strictly to argument relations within the verb phrase. On the other hand, the
subject argument of unergative verbs is analyzed as an external argument – external to
the verb phrase. Unergatives are reported to be more variable in terms of NS placement
than unaccusatives (Sasse 1987, 1995). Sasse argues this is the case because most
unergatives receive a categorical judgment, which corresponds to a complex event. The
complex event analysis is supported by the fact that most unergatives have a
transitive counterpart, as shown in (9) and (10).
(9) a. Mandy ran.
b. Mandy ran a marathon.
(10) a. Betty danced.
b. Betty danced the cha-cha-cha.
Under this analysis, unergatives can support multiple prosodic outputs depending on the
specific realization of the verb and whether the verb’s internal argument (the object) has
been incorporated implicitly (as in the case of 9a and 10a above) or explicitly expressed
(9b, 10b).
These remarks bear some similarity to Hale & Keyser’s post- vs. pre-incorporation
hypothesis regarding this verb class (Hale & Keyser 2002). This analysis
stipulates two possible metrical outputs, one in which the trace of the incorporated direct
object (for example “marathon” in (9b)) is visible (pre-incorporation), and another in
which the direct object is invisible to the metrical structure (post-incorporation). Their
explanation is that the metrical structure is built on the basis of either the pre-incorporation
or the post-incorporation syntactic structure. This inherent flexibility in the case of the
unergative verbs has the prosodic
repercussion that more than one prosodic pattern is available and may be activated for a
given context. Said flexibility leaves this subclass highly sensitive to other factors such as
information structure, discourse context, noteworthiness vs. predictability, etc. (control
parameters), as will be tested here.
3.2.1 Phrasal prominence in Romance
It is well-established in the literature on Romance intonation and prosody that NS
falls phrase-finally in Spanish (Zubizarreta 1998, Sosa 1999, Hualde 2007) and other
Romance languages (Cruttenden 1997; see Ladd 1996 for a discussion of Italian) in wide-
focus contexts. However, it is possible for NS placement to occur phrase-initially in the
case of narrow or contrastive focus.
(11) Verónica trajo el pastel (no Pedro).
[Veronica brought the cake (not Pedro)]
However, an equally possible, and perhaps preferred, option is to focus the S via word
order. “In fact, in Spanish the nuclear stress always falls on the last word of the
intonational phrase, except for cases where a non-final word receives narrow contrastive
focus … but even this is a relatively marked strategy in Spanish compared with the
rearrangement of word-order.” (Hualde 2007: 3) Hence, due to the possible
rearrangement as afforded by flexible word order both the S and the NS are phrase-final,
a process referred to as prosodically motivated movement (p-movement) in Zubizarreta
1998.
(12) Trajo el pastel Verónica (no Pedro).
In Spanish, word order and prosody conspire to pull NS phrase-finally, and a return to the
wide focus discussion furthers this point. If the intransitive unaccusative sentences in (13a)
and (13b) are compared with English (5), as answers to the wide-focus context question
¿Qué pasó? (What happened?), it can be observed that regardless of the order, NS falls
phrase-finally.⁵
(13) a. (VS) Se cerró una ventana. [Closed a window]
b. (SV) Una ventana se cerró. [A window closed]
Hence, the CNSR, given in (7) and repeated in (14), is the operative algorithm in Spanish
(Romance more generally) for NS realization.
(14) CNSR: Given two sister nodes A and B, the right-most node is strong.
The constituent-ordering algorithm is driven by the word order resource in
Spanish; Chapter 3 takes up the correlation this has with certain rhythmic
aspects of the language.
3.3 Further cross-linguistic differences: Anaphoric Deaccenting
Up to this point, the discussion of main prominence realization and its relation to
focus structure has centered on wide-focus contexts, i.e. where all the information is “out-
of-the-blue”, or not presupposed in the existing context. In cases where the information is
salient in the discourse, or known to the locutor and interlocutor, the mechanism for NS
realization works differently.⁶ This speaks, in part, to the modular nature of main
prominence realization in English: there is a part of phrasal prominence realization
accomplished by the NSR (SNSR, CNSR), and another part taken care of by Anaphoric
De-accenting (A-Deacc). A-Deacc refers to the deletion of a pitch accent that is
associated with an anaphoric constituent (an entity with an antecedent in the discourse)
(Ladd 1980, 1996, Selkirk 1984, Gussenhoven 1984), which triggers a subsequent shift in
prominence to the metrical sister node. A-Deacc is independent from the NSR algorithm
(Reinhart 2006), in that it applies to the output of a prior application of the NSR.
⁵ Both Mayoral-Hernández (2006) and Nava (2007) found VS to be the most frequent order
for unaccusatives in wide-focus contexts in both written and oral productions, respectively, of native
Spanish speakers. Additionally, data from a recent web-based survey (Nava 2009) revealed that native
Spanish speakers showed a preference for VS order with unaccusatives and SV order for unergatives
associated with wide-focus contexts in a judgment task.
⁶ Here I refer only to cases of strict “givenness”, where the information supplied by one speaker is repeated
by another (in a Q&A type discourse). However, a substantial body of work exists on the intonation of
inferable information as well (i.e. information not repeated verbatim as part of the ongoing discourse but
nonetheless accessible and inferable from the relationship set up by the discourse context). See Baumann &
Grice 2006 for a detailed discussion on the intonation of accessibility.
Well-cited examples of A-Deacc in English abound in the literature; the example
below is cited in Büring 2007. “Italian” is the previously mentioned or given information,
and as such is deaccented in the answer, resulting in main prominence placement on the
V.
(15) Q: Why do you study Italian?
A: Because I’m married to an Italian.⁷
A-Deacc is another instance of main prominence occurring non-phrase-finally in English
– again in contrast with Spanish. While there are some Romance languages, such as
French (Ronat 1982, Zubizarreta 1998) and Brazilian Portuguese (Moraes 1998), that do
have A-Deacc, Spanish is among those that do not (Ladd 1996, Cruttenden 1997,
Zubizarreta 1998, Hualde 2007). An example token of A-Deacc with a transitive verb is
given in (16).
(16) Q: Why are you buying that old stamp?
A: Because I collect stamps.
⁷ Italics are used to indicate de-accented material.
For the same context in Spanish, a different prosodic pattern is preferred, where the given
information receives main stress, phrase-finally.
(17) Q: ¿Por qué compras ese sello tan viejo?
A: Porque colecciono sellos.
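The contrast between (16) and (17) can be sketched as a two-step procedure in which A-Deacc operates on the output of the NSR. The following is a simplified illustration under stated assumptions: trees are nested pairs, the NSR output is taken to mark the right sister as strong (which holds for these V-O examples), and the function names and the given set are hypothetical.

```python
def leaves(tree):
    """Return the words of a nested-pair tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def nuclear_stress(tree, given=frozenset(), a_deacc=True):
    """Step 1: NSR marks the right sister strong.
    Step 2 (optional): if the strong sister is entirely given and its sister
    is not, prominence shifts to the metrical sister (A-Deacc)."""
    while not isinstance(tree, str):
        weak, strong = tree
        strong_given = all(w in given for w in leaves(strong))
        weak_given = all(w in given for w in leaves(weak))
        if a_deacc and strong_given and not weak_given:
            weak, strong = strong, weak
        tree = strong
    return tree

english = ("I", ("collect", "stamps"))        # answer in (16)
spanish = ("Porque", ("colecciono", "sellos"))  # answer in (17)

print(nuclear_stress(english, given={"stamps"}))                 # collect
print(nuclear_stress(english))                                   # stamps
print(nuclear_stress(spanish, given={"sellos"}, a_deacc=False))  # sellos
```

With A-Deacc switched off, as assumed here for Spanish, the given object keeps phrase-final main stress. The sketch ignores function words, so it is not expected to handle examples such as (15) without further assumptions.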
However, it is not the case that Spanish has no other recourse in contexts of
givenness than to repeat given information. The known or previously mentioned
information may also take the form of a clitic object pronoun, “los” referring to “sellos”,
appearing before the V in the answer, in which case the V is the phrase-final element and
receives main prominence.
(18) A: Porque los colecciono.
Thus far, the ways that English and Spanish differ in NS realization for both wide focus
and A-Deacc contexts have been evaluated. This comparison sets the stage for a phrasal
prominence hypothesis put forth in the section below.
3.4 Experiment 1: Phrasal prominence placement
As laid out in sections 3.2 and 3.3, the hallmark of English stress is the phrase-
internal prominence pattern, which has more than one source: the SNSR and A-Deacc.
However, in Spanish, where the CNSR is operative, the respective prominence patterns
are phrase-final. Data from L2 speech is in a position to provide key insights into the
language-specific differences of NS and reveal language-specific tendencies for
organization. Even though the L2 speakers are producing English, the temporal patterns
guiding the realization of this production could still be organized according to the modes
of organization operative in Spanish. For this reason, a difference in prosodic proficiency
is expected to emerge among individual speakers regarding the acquisition of the mode of
organization associated with phrase-internal prominence in English. Speakers who have
not yet acquired the English-like mode of organization are expected to produce English
using the Spanish mode of organization, which will result in phrase-final prominence
placement for all wide focus contexts. For those speakers who have not yet acquired the
hallmark NS in English, a hypothesis is put forth regarding the prosodic transfer we
expect to observe as a result of this contact situation:
(19) Spanish speakers of English, in particular non-high proficiency speakers, will
maintain the mode of organization from their native language, resulting in transfer
of the CNSR from their native language.
The above hypothesis brings the following predictions:
(20) a. L1 Spanish speakers will place NS phrase-finally on the verb in intransitive SV
structures in English.
b. L1 Spanish speakers will place NS on the verb rather than on the object in the
English compound OV structures.
c. L1 Spanish speakers are expected to place NS phrase-finally on either the verb
or the adverb, respectively, in SADVV and SVADV contexts.
d. L1 Spanish speakers are expected to show a preference for phrase-final NS
placement with all unergatives, regardless of discourse context.
The implication from the above hypothesis and predictions is that in order for L2
speakers to successfully acquire English-like NS, their mode of organization with respect
to the language-specific NSR algorithm must undergo restructuring. However, the same
logic does not necessarily apply in the case of A-Deacc. While Spanish does not have a
counterpart to A-Deacc in English, it does have the possibility to deaccent given
information in narrow focus and contrastive focus contexts (see example (11)), so the
relevant prosodic pattern does exist, albeit for a different information structure context.
The case of A-Deacc is fundamentally different from that of nuclear stress in English
because in the case of nuclear stress the relation is one of prominence between adjacent
constituents, whereas in the case of A-Deacc the prominence pattern does not emerge
from the relation between adjacent constituents but rather reflects properties of individual
constituents. A CS approach also speaks more precisely to how differences in learning
and change are dependent on the nature of a system’s particular organization. As
described by Kelso, “Whether some tasks are learned more easily than others depends on
the extent to which specific parameters cooperate or compete with existing organizational
tendencies.” (Kelso 1995: 175). The acquisition of the prosodic pattern associated with
phrase-internal NS in English requires system restructuring so as to yield a new prosodic
attractor state. However, in the case of A-Deacc, a similar attractor state already exists: as
mentioned in section 3.2, illustrated with example (11), a phrase-internal prosodic pattern
does already exist for narrow contrastive focus contexts, i.e. where – just as with A-
Deacc – the known information is deaccented. The relevant point is that the prosodic
contour in Spanish for that context is already grossly similar in nature to that of English
A-Deacc. At least on the (perceptual) surface it sounds to the naked ear as though L2ers
are producing those salient characteristics (the relevant F0 patterns, etc.) recognized as A-
Deacc. For example, fundamental frequency (f0) analyses in both English and Spanish
have found that in both languages narrow focus is realized with a higher f0 peak on the
word under focus when compared with an f0 peak on the same word in a broad focus
context, and that the duration of the syllable under narrow focus is greater when
compared with the same syllable in broad focus (Toledo 1989, de la Mota 1995, Face
2002, Xu & Xu 2005).⁸ In both languages it has also been found that while the pitch
range is expanded during the focused stressed syllables, the pitch range is suppressed
post-focally (a sharp decline following focus that continues to flatten into a plateau).
Finally, both in English and Spanish the location of the f0 peak for narrow focus is
largely the same, i.e. within the boundaries of the stressed syllable proper.⁹ An additional
prediction to advance is that a finer-grained comparative measurement of the critical
properties of the two types of contours would indeed reveal certain differences in these
characteristics. But for the moment an informed hypothesis can be made about the
acquisition of A-Deacc as compared to the acquisition of the SNSR.
(21) Prosodic patterns that cooperate with existing organizational tendencies are
acquired before those that require a restructuring of existing modes of
organization.
The following predictions result from the above hypothesis:
(22) a. Native speakers of Spanish will acquire A-Deacc before acquiring
the SNSR.
b. Speakers that have phrase-internal NS in their speech will also have
A-Deacc.
c. Speakers with A-Deacc in their speech may or may not have
phrase-internal NS.
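The predictions in (22) amount to an implicational relation: phrase-internal NS implies A-Deacc, but not the reverse. One way to check such a relation against speaker profiles is sketched below. The speaker labels and values are invented for illustration and are not data from this study; the function name is likewise hypothetical.

```python
# Hypothetical speaker profiles, invented for illustration only.
speakers = {
    "S01": {"a_deacc": False, "phrase_internal_ns": False},
    "S02": {"a_deacc": True, "phrase_internal_ns": False},
    "S03": {"a_deacc": True, "phrase_internal_ns": True},
    "S04": {"a_deacc": False, "phrase_internal_ns": True},
}

def violates_22b(profile):
    """True if a speaker has phrase-internal NS without A-Deacc."""
    return profile["phrase_internal_ns"] and not profile["a_deacc"]

# Only the fourth pattern counts against (22b); the first three are all allowed.
counterexamples = [name for name, p in speakers.items() if violates_22b(p)]
print(counterexamples)  # ['S04']
```

Profiles like S02 are consistent with (22c), so only the S04 pattern would count against the hypothesis in (21).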
The experiment designed to test the above hypotheses is discussed in the following
section.
⁸ In the studies reviewed, participants were instructed to treat all post-focal information in the narrow focus
contexts as repeated information, with the intention of deaccenting.
⁹ In English, the location of f0 peak is said to be the same for narrow and broad focus contexts, whereas in
Spanish the f0 peak for narrow focus is said to be “early” (Navarro Tomás 1944, Face 2002) even though it
occurs within the boundaries of the syllable, due to the fact that in broad focus contexts the f0 peak occurs
after the stressed syllable.
4. Experiment 1: Methods
4.1 Participants
The control group consisted of 35 adult English Native control (ENC) speakers,
and 45 adult L1Spanish/L2English speakers comprised the test group. The test population
was heterogeneous with regard to L1 dialect. While most of the participants came from
Spain and Mexico, ten speakers were from Paraguay. Differences in intonation and
prosodic patterns have been found to exist across varieties of Spanish, but as far as we
know not for the declarative utterances under study, nor between the dialects represented
in this L2 population (Quilis 1985: 166-167).
The high number of participants in both groups is somewhat unusual for intonation
studies, where groups tend to average around 10 participants. The inclusion of a larger
number of participants in our study had the added advantage that we were able to amass
enough representative data points to map a developmental path through L2 acquisition of
prosody. Both groups completed a Cloze test (Oshita 1997) as an independent measure of
proficiency. The format of this Cloze test consists of three separate reading passages, and
the participant is asked to fill in blanks spaced every fifth word apart. There were 75
blanks in total, giving a score range from 0 to 75. Twenty-six L2ers tested at the high
proficiency level, 19 at the intermediate level; see Table 1.
Table 1. L2 participant population Cloze test results
Cloze test results Average Range
ENC 73 70-75
L2 High Proficiency 70 66-73
L2 Intermediate Proficiency 63 58-65
A one-tailed, two-sample unequal variance t-test revealed a significant difference
between the L2 high and L2 intermediate groups’ proficiency results (p < .001).
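For readers unfamiliar with the unequal variance (Welch) form of the test, the statistic and its degrees of freedom can be computed as in the sketch below. The Cloze scores used here are invented placeholders, not the scores collected in this study, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Placeholder scores for illustration only.
high = [70, 72, 68, 71, 69]
intermediate = [63, 60, 65, 62, 64]

t_stat, df = welch_t(high, intermediate)
print(t_stat, df)
```

A one-tailed p-value then comes from the upper tail of the t distribution with df degrees of freedom. In practice a statistics library would be used for that step, for example SciPy's ttest_ind with equal_var=False.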
Table 2 reports the group results for the participants’ age at the time of testing,
their age when they were first exposed to English, and the amount of time spent in the
United States, as reported on the Background Questionnaire completed by each
participant (see Appendix for complete form). Of the 10 participants tested in Paraguay,
none reported having lived in an English-speaking country.
Table 2. L2 population: Age and Time in US
                                          Total L2          High Proficiency L2   Intermediate Prof L2
                                          Average  Range    Average  Range        Average  Range
Age at time of testing                    34       19-55    34       19-55        33       23-52
Age at exposure to English                14       3-50     12       3-27         21       4-50
Years spent in English-speaking country   6        0-28     8        0-22         8        0-28
4.2 Procedure
The first experiment was designed to elicit NS production at the phrasal
level, and consisted of a scripted Question & Answer (Q&A) dialogue between the
experimenter and the participant. Participants wore a head-mounted microphone and
faced the experimenter during the task. The participant and experimenter each held a set of
corresponding index cards where the stimuli were printed, and turned over a new card
simultaneously as part of the continuing dialogue task. Participants’ responses were
recorded and analyzed using the PitchWorks software program (Scicon R&D Inc, 2006).
This particular dialogue task is germane to the research question, in that it allows
us to control for the appropriate connection of verb type to discourse factors (the control
parameters). Additionally, it is argued that a dialogue task allows for more naturalistic
productions, in contrast to many studies on prosody where data is drawn from readings of
prepared material that are often repeated at length over a series of trials (Ueyama & Jun
1998, Xu & Xu 2005, Calhoun 2006).
Following the dialogue elicitation task, participants were asked to read a small
paragraph that was used as the material for Experiment 2. After the reading passage was
recorded, participants completed the Cloze test, followed by the background
questionnaire.
4.3 Design and stimuli
The two conditions tested were verb type and discourse context. Syntactic
structures were paired with different information structure contexts, where the wide-focus
contexts probe for the neutral, grammar-generated patterns, and the cases of discourse
sensitivity tap into the role played by pragmatic factors such as unexpectedness,
noteworthiness, etc. In order to elicit phrase-internal prosodic patterns in English,
unaccusative and OV compound structures were paired with neutral, wide focus contexts.
Monolingual English speakers are expected to produce phrase-internal patterns with these
contexts, while the L2 speakers are expected to produce phrase-final patterns. However,
both speaker groups are expected to produce a phrase-final pattern when an adverb
precedes or follows an unaccusative verb, since this implies a bleeding of the application
of the SNSR. The noteworthiness of the verb (expected, unexpected) was varied in the
case of the unergative in order to test for the availability of more than one prosodic
pattern – NS on the verb or on the subject. Transitive verbs and ditransitive verbs paired
with neutral discourse contexts were compared with transitives and ditransitives where
information in the answer was known information, in order to test for anaphoric
deaccenting.
Two lists of test items were constructed using a within-subjects design. Each list
had 45 target stimuli, and 45 fillers. The fillers included declarative sentences, questions,
and imperatives, and were designed to assure participants’ compliance with the question
and answer task. A list of a subset of target structures is given below:[10]

Table 3. Organization of stimuli by verb type and discourse context

| Verb type | Syntactic structure | Discourse context | Number of stimuli |
|---|---|---|---|
| Unaccusative | S V | Wide focus | 4 |
| Unaccusative | S ADV V | Wide focus | 4 |
| Unaccusative | S V ADV | Wide focus | 4 |
| Unergative | S V | Wide focus, neutral predicate | 8 |
| Unergative | S V | Noteworthy predicate | 4 |
| Transitive | S V O | Wide focus | 4 |
| Transitive (compounds) | S V (OV) | Wide focus | 4 |
| Ditransitive | S V O PP | Wide focus | 4 |
| Transitive | S V O | Previously mentioned material (A-Deacc) | 4 |
| Ditransitive | S V O PP | Previously mentioned material (A-Deacc) | 4 |

[10] The reader is referred to the appendix for a complete list of all the categories tested.
4.4 Coding and statistics
Data were coded for the presence vs. absence of pitch accent (PA) and for the
location of the nuclear PA. Coding was done by two native speakers of English, one of
whom was trained in ToBI labeling, to ensure inter-rater reliability. Inter-rater reliability
was 94%, and any coding discrepancies were resolved by a third, expert ToBI labeler. All
participants were tested by the same experimenter in Los Angeles, California except for
the 10 test participants who were tested in their native country of Paraguay.
For the statistical analysis the results were pooled across participants for a Chi-
square analysis, where each participant contributed 12 observations in the case of the
unaccusatives, 20 observations in the case of the transitives, and 12 in the case of the
unergatives. The results and discussion of Experiment 1 are presented in the following
section.
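The pooled comparison described above can be sketched as a Pearson chi-square test on a 2 × 2 table (speaker group by NS position). What follows is a minimal illustration and not the analysis script used in the study: the counts are hypothetical, the function names are invented for the example, and the test is assumed to have one degree of freedom with no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# HYPOTHETICAL pooled counts (not the study's data).
# Rows: speaker group (control, L2); columns: NS non-final, NS final.
counts = [[40, 10], [15, 35]]

chi2 = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p_value_df1(chi2):.4f}")
```

Under the same one-degree-of-freedom assumption, `p_value_df1(4.66)` comes out at roughly .031, in line with the value reported for the pooled unergatives in section 4.5.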
4.5 Experiment 1: Results
The results of NS production for unaccusative verbs are given in Figure 1. A
representative token of the category under discussion is given below the figure. The
greatest difference can be seen between the ENC and L2 populations with the wide focus
SV unaccusative structures; the former places NS on the subject 97% of the time, and the
latter only 23% of the time, a statistically significant difference (χ² = 137.45, p < .001;
see example in (23)). These results add to existing experimental data that have also found
a statistically significant NS placement on S for unaccusatives with native English
speakers (Hoskins 1996). This resulting difference in population performance is
attributed to the ENC group employing the SNSR, whereas the majority of L2ers
“transfer” the CNSR from Spanish, i.e. they have maintained the Spanish-like mode of
organization.
However, a significant difference is not observed in the case of the SAdvV or the
SVAdv contexts (χ² = .59, p = .443). In the case of the SAdvV contexts, there was
sentence-final NS placement 91% of the time in the case of both populations, and the
majority of NS placement was sentence-final for both groups in the SVAdv context as
well (see example in (24)), 80% phrase-final in the case of the ENC and 94% in the case
of the L2E. This is due to the operation of the CNSR, which takes care of NS assignment
for this context and is present in both English and Spanish. The figure below shows the
percentage of non-final NS patterns.
Figure 1. Prosodic patterns: unaccusative verbs with wide focus contexts
(23) SV unaccusative, wide focus
Q: What was that crashing sound?
a. A glass broke. (ENC)
b. A glass broke. (L2)
(24) SAdvV
Q: What happened?
a. A glass suddenly broke. (ENC)
b. A glass suddenly broke. (L2)
The results support the proposal that the SNSR is also operative in the case of transitive
OV compounds. Other studies on the realization of PAs in English compounds have
likewise shown that no PA is present on the second constituent for this type of OV
compound (Gussenhoven 2004: 18). The data shown in Figure 2 likewise expose a
significant difference between the control and the test groups (χ² = 37.54, p < .001), with
the ENCs producing NS on the argument 96% of the time, and L2ers doing so 43% of the
time. The L2 speakers’ preference for sentence-final NS again provides evidence of
transfer.
Figure 2. Prosodic patterns: transitive compound structures with wide focus contexts
(25) Transitive compound
Q: Did Barbara like the Italian restaurant?
a. Oh yes. She’s a pasta-eater. (ENC)
b. Oh yes. She’s a pasta-eater. (L2)
The robust contrast in the data from the two populations supports the hypothesis given in
(19).
In Figure 3 the pooled unergative results are given. It was predicted in section 4.2
that more variability in NS placement could be expected in the case of the unergatives,
where more than one stable state, or attractor, is available for this verb type. This
prediction is confirmed by the results. No significant difference was found between the
NS placement of the ENC speakers, who placed NS on the V 59% of the time (slightly
above chance), and the L2ers who did so 75% of the time (χ² = 4.66, p = .031).
Figure 3. Prosodic patterns: unergative verb, general results
As mentioned above, this data is the pooled result of all unergatives. When the four tokens with
pragmatically unexpected predicates are removed (the noteworthiness category: A
dolphin is talking, A dog is singing, A lion smiled, A whale danced), a difference emerges;
see Figure 4 for the results of unergatives with neutral, wide focus discourse contexts.
These results provide evidence that variability exists for the prosodic patterns of the
unergative SV phrases, a variability that is guided by the type of syntactic structure as
well as the control parameter of discourse context. In this case where the discourse
context is neutral, the ENC placed NS on the V at just around chance, 51% of the time,
while the L2ers did so 69% of the time, a between-population difference that is
significant (χ² = 24.41, p < .001). The ENC are responding in accordance with the
bistable mode of organization for the prosodic pattern (NS on S or on V) that is available
for this verb given its particular syntactic make up, as reviewed in section 4.2. However,
in the case of the L2ers, the stronger preference for NS on V likely stems from the
unimodal stable state for the prosodic pattern associated with this verb type in Spanish.
Figure 4. Prosodic pattern, unergative verb, wide focus context
(26) Unergative structure (pragmatically neutral predicate):
Q: How was your field trip?
a. It was cool. A lion roared. (ENC)
b. It was cool. A lion roared. (ENC)
c. It was cool. A lion roared. (L2)
We turn now to the results excluded from the previous figure, those tokens with noteworthy
predicates (A dolphin is talking, A dog is singing, A lion smiled, A whale danced). Figure 5
shows that both the ENC and the L2 speakers prefer to place NS on the V, 81% and 74%
of the time respectively, in the case when the predicate is unexpected (lions are known to
roar, but are not generally associated with smiling). Even though this difference is not
statistically significant (χ² = .031, p = .859), the sources of NS might be different for the
two groups. The ENCs’ response is a reflex of pragmatic salience – emphasis is placed on
what is pragmatically salient (as was predicted in section 4.2 per the discussion of high
sensitivity to these factors for this verb class). But for the L2ers, the preference for NS on
the V is likely due to effects of transfer. The ENCs’ response is guided by the
noteworthiness factor, which is reflected in their
choice of highlighting what is most unexpected about the event: the fact that a dog is
singing (and not barking), i.e. a noteworthy predicate. However, in the case of the L2ers,
what is reflected is again the preference for a phrase-final prosodic pattern, which in
Spanish would be accomplished together with word order, where the highlighted or
noteworthy piece of information appears phrase finally, as shown in (27a) where the
intention would be to draw attention to the fact that a dog and not a human is singing, or
in (27b) where attention is drawn to the unusual nature of the singing activity:
(27) a. ¡Está cantando un perro! [Is singing a dog]
b. ¡Un perro está cantando! [A dog is singing]
Figure 5. Prosodic pattern, unergative verb, noteworthy predicate
(28) Unergative structure (pragmatically noteworthy predicate):
Q: How was your field trip?
a. Guess what? A lion smiled! (ENC)
b. Guess what? A lion smiled! (L2)
We turn our attention now to cases of transitive and ditransitive contexts, looking
first at wide focus transitive and ditransitive structures, both with and without previously
mentioned material. The results are shown first in Figure 6 with a discussion of the data
below.
Figure 6. Transitive & ditransitive verbs, wide focus and given information contexts
The data speak to the operation of the NSR, with NS on the object in the case of the
transitives and on the prepositional phrase in the case of the ditransitives for both
populations. While the L2ers especially favored this output, the ENCs diverged from the
predicted results most notably in the case of the ditransitives. This difference is attributed
to one token in particular where ENCs consistently placed NS on the object. The token
was, “There is ice on the road”, where “ice” often received NS, likely due to the
unexpectedness or salience of ice being on the road which alters the context, as it has
implications for driving conditions, etc.
The transitive cases of previously mentioned material allow us to determine
whether or not the hypothesis and predictions related to A-Deacc (given in (21), (22)) are
borne out. The ENC group deaccented the O 82% of the time, placing NS on the V in those
cases. The L2ers deaccented at a significantly lower rate, placing NS on the V only
30% of the time (χ² = 29.36, p < .001). The example context given in (15) is
repeated below in (29).
(29) SVO transitive, object previously mentioned
Q. Why are you buying that old stamp?
a. Because I collect stamps. (ENC)
b. Because I collect stamps. (L2)
Differences in prosodic patterns between groups also emerged in the case of the
ditransitives. ENCs deaccented the previously mentioned prepositional phrase, placing NS
on the object 80% of the time, whereas L2ers did so in only 22% of the cases, also a
significant difference (χ² = 36.24, p < .001). A sample token is given below in (30).
(30) SVOPP, PP previously mentioned
Q. Why are these notebooks missing their covers?
a. Because I’m drawing pictures on the covers. (ENC)
b. Because I’m drawing pictures on the covers. (L2)
So far it has been established that, as a group, some L2ers show prosodic patterns
that differ substantially from the ENC groups, specifically where the SNSR applies
(unaccusatives, OV transitive compounds), and in the case of the A-Deacc rule. The
behavior of those speakers who have acquired either Germanic NS or A-Deacc or both is
further scrutinized in order to address the timing of acquisition question. Figure 7 provides
a visual break down of the acquisition pattern of the relevant prosodic patterns, where “H”
represents a high proficiency speaker, and “I” a speaker with intermediate proficiency.
Here the nuclear stress score, shown on the x-axis, and the A-Deacc score, on the y-axis,
were plotted for each individual.
Figure 7. Proficiency levels: L2 prosodic proficiency and Cloze test proficiency
All speakers with NS in their speech also tested at the high proficiency level. These
results contribute to existing literature regarding the relation between overall proficiency
and pitch accent placement. Ueyama & Jun (1998) also found a correlation between
general proficiency and the presence of pitch accents after focus, such that the less
advanced a speaker, the greater the number of PAs produced post-focally. In the current
study, there are nine high proficiency speakers that have both English NS and A-Deacc in
their speech, seven high and three intermediate that have A-Deacc but do not have NS,
and finally 10 high and 16 intermediate that have neither NS nor A-Deacc in their speech,
as established according to the criterion of having 5 or more out of 8 possible target-like
prosodic patterns. Crucially, the opposite order of speakers with NS in their speech but
without A-Deacc is not found. These results are shown in Table 4 below where prosodic
proficiency is cross-tabulated with the Cloze test proficiency results.
Table 4. Proficiency levels: L2 prosodic proficiency and Cloze test proficiency

| Target-like NS | Target-like A-Deacc | High Proficiency | Intermediate Proficiency |
|---|---|---|---|
| + | + | 9 | 0 |
| – | + | 7 | 3 |
| – | – | 10 | 16 |
| + | – | 0 | 0 |

Hence the hypothesis and predictions in (21) and (22) are borne out: A-Deacc is acquired
before English NS. Furthermore, there is a correlation between A-Deacc and NS such
that A-Deacc can be predicted from NS; if a participant has NS they have A-Deacc.
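The implicational relation just stated (target-like NS implies target-like A-Deacc) can be checked directly against the counts in Table 4. The sketch below is only an illustration: the counts and the 5-out-of-8 criterion are taken from the text, while the function and variable names are invented for the example.

```python
THRESHOLD = 5  # target-like productions required out of 8, per the criterion in the text


def is_target_like(score, threshold=THRESHOLD):
    """A speaker counts as having a pattern with `threshold` or more target-like tokens."""
    return score >= threshold


# Speaker counts from Table 4, keyed by (target-like NS, target-like A-Deacc).
# Each value is (high proficiency, intermediate proficiency).
TABLE_4 = {
    (True, True): (9, 0),
    (False, True): (7, 3),
    (False, False): (10, 16),
    (True, False): (0, 0),
}


def ns_implies_adeacc(table):
    """True when no speaker shows target-like NS without target-like A-Deacc."""
    return sum(table[(True, False)]) == 0


def proficiency_of_ns_speakers(table):
    """(high, intermediate) totals over all speakers with target-like NS."""
    high = table[(True, True)][0] + table[(True, False)][0]
    intermediate = table[(True, True)][1] + table[(True, False)][1]
    return high, intermediate


total_speakers = sum(high + mid for high, mid in TABLE_4.values())
print("speakers:", total_speakers)
print("NS implies A-Deacc:", ns_implies_adeacc(TABLE_4))
print("NS speakers (high, intermediate):", proficiency_of_ns_speakers(TABLE_4))
```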
With regards to A-Deacc, it can be speculated that the L2ers are successfully
substituting a pattern that has enough overlap in form to pass as “target-like” in the L2.
The idea that similarity of a form in the L1 and L2 is of help to the L2er would appear to
run counter to Flege’s Speech Learning Model (see section 2) and the experimental
findings that support it (see section 2, Mennen 1999). However, it is more likely the case
that speakers are, as Flege suggests, substituting a form from their L1, which in this case
is a form close enough for the substitution to successfully pass for the target L2 prosodic
form. Results from experimental data presented here would suggest that forms that
cooperate with existing organizational tendencies (similar in both form and function) do
so to a beneficial end, in support of the hypothesis in (21).
Figures 8 and 9 below show the prosodic patterns for each language
population (unergGR refers to the unergative results in general, and unergNW to just the
noteworthy cases). By looking solely at the occurrence of a phrase-internal prominence
placement, it can be seen that English clearly demonstrates flexibility for prominence
placement, whose location is determined by the various principles discussed: SNSR,
CNSR, A-Deacc, noteworthiness.
Figure 8. Prosodic patterns of prominence for ENC population
Figure 9. Phrase-internal prosodic patterns for L2E population
What these results speak to is the observation based on the first of three
experiments that in English there are two modes of organization with regards to
prominence realization, and these modes can be selected systematically by means of
other variables such as noteworthiness, etc. However, the L2 population clearly
demonstrates a difference regarding prominence realization, where a unimodal
organization is seen as influenced by the system-specific organization of their L1
(Spanish) – despite the presence of the same additional variables that affect the behavior
of the ENC speakers. Throughout this chapter, the results have been described as a
categorical presence of one or more positions for NS realization, but the behavioral
indices (such as pitch and duration) that correspond to those distinct modes vary in a
continuous fashion. This is one of the challenges that researchers have consistently faced
when attempting to define prosody automatically in a corpus (Hasegawa-Johnson et al.
2005). Nonetheless studies have shown that there are significant differences between the
behavioral indices (pitch and duration) that are systematic correlates of NS. Here, we
attempt to understand the ability to make the transition to one of these NS modes as a
function of changes in the more microscopic variables controlling speech production
(rhythm). It thus becomes important to explicitly and theoretically model the relationship
between the continuous speech production processes that underwrite these distinct
modes. It is necessary to describe the English system of flexible modes and the Spanish
system of inflexible modes in a principled way that allows us to relate these systems to a
continuous measure of speech production.
5. A dynamic modeling approach to qualitative change
One approach in the literature that has attempted to make the explicit connection
between two distinct grammatical modes and continuous variation along physical
parameters is the cognitive dynamics approach laid out in Gafos & Benus (2006). The
theoretical model that they employ in bridging the continuous and the discrete, which they
applied to the examples of final devoicing in German and vowel harmony in Hungarian, is
reviewed below in the detail it merits. The key concepts that their model formalizes are a) the
use of a single parameter value within a fixed grammatical dynamical system to change
an attractor landscape from a single mode of behavior to two, and b) the modeling of
communicative intent as a dynamical system containing an attractor within the same state
space as the grammatical dynamical system, allowing the behavior to emerge from the
joint contribution of these two dynamical systems.
The models employed by the authors are within the class of first-order dynamical
systems (Percival & Richards 1982), which are described by the following differential
equation:
(31) $\dot{x} = f(x)$
In this equation, x is the state of the system and f(x) is the force function. The equation
describes a gradient system where force is expressed as a derivative of a potential. In the
equation (32) below, V(x) is the potential and f(x) is the force function.
(32) $V(x), \quad \dot{x} = f(x) = -\,dV(x)/dx$
The authors illustrate this relationship with a graphic that is reproduced below in Figure
10. In the figure, the state space is the entire x-axis; in any application in complex
systems this will be an order parameter, the value that wraps up the relevant properties of
the complex behavior of the system. The round “particles” are examples of behavior
when the potential is varied. Here there are examples of stable fixed points, such as $x_1$
and $x_3$, that correspond to the minima of the potential. The force function f(x) is a
decreasing function of x around these points, which in this case is represented by arrows
that show a flow toward those points. An example of an unstable fixed point is $x_2$; such
points correspond to the maxima of the potential V(x). The force function is an increasing
function of x around these points, with the arrows of the flow pointing away. These unstable
fixed points are known as repellers, and the stable fixed points as attractors.
Figure 10. Dynamical system potential attractors and repellers, Gafos & Benus (2006: 4)
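The classification of fixed points into attractors and repellers can be sketched numerically. The example below assumes an illustrative double-well potential, V(x) = -x²/2 + x⁴/4, chosen only because it has two minima and one maximum like the potential in Figure 10; it is not claimed to be the potential plotted by Gafos & Benus, and all function names are hypothetical.

```python
def force(x):
    """Force f(x) = -dV/dx for the illustrative potential V(x) = -x**2/2 + x**4/4."""
    return x - x**3


def force_slope(x):
    """Derivative of the force; negative at attractors, positive at repellers."""
    return 1 - 3 * x**2


def bisect_root(func, lo, hi, iterations=60):
    """Narrow down a root of func inside [lo, hi], assuming a sign change."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def fixed_points(start=-2.05, step=0.1, intervals=41):
    """Scan the state space for sign changes of the force and classify each root.

    The grid is offset by 0.05 so that the roots of this particular force
    function fall inside grid intervals rather than on grid points.
    """
    points = []
    for i in range(intervals):
        a = start + i * step
        b = a + step
        if force(a) * force(b) < 0:
            root = bisect_root(force, a, b)
            kind = "attractor" if force_slope(root) < 0 else "repeller"
            points.append((root, kind))
    return points


for root, kind in fixed_points():
    print(f"x = {root:+.3f}: {kind}")
```

For this assumed potential the two outer fixed points are attractors and the middle one is a repeller, mirroring the roles of x₁, x₃ and x₂ in the figure.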
Because of the stable nature of the attractors, without noise this dynamical model
would predict that only two states of the system would ever be observed, $x_1$ and $x_3$. In the
grammatical examples of Gafos & Benus, as well as in the application here and in most
real-life situations, systems exhibit a range of values around these stable preferred values.
This can be modeled as the effect of noise. Noise is a component of all dynamical
systems due to the complexity of behavior of a system with self-organization, i.e. there is
parallel activity of distinct properties at different levels. With noise comes less
predictability regarding the behavior of a given order parameter in a system. As a
consequence, the probability of finding the order parameter is limited to a given region of
values, where the probability is described as the probability density function multiplied
by the length of the region. One way to represent the computation of a probability density
is with the same kind of graph seen above in Figure 10. Such a graph shows the
probability that a given value will occur in the data; a representation of this is given
below in Figure 11. This graph shows the potential function V(x) that defines this
dynamical system, as well as the resulting probability density distribution. What is
immediately noticeable is that the probability of finding the state of the system around the
two attractors is high.
Figure 11. V(x) and its probability function p(x), taken from Gafos & Benus (2006: 5)
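The relation between a potential and its probability density can be sketched as follows. The example assumes additive Gaussian white noise, for which the stationary density of a gradient system is proportional to exp(-2V(x)/Q), with Q the noise strength. Both the value of Q and the double-well potential used here are illustrative assumptions and are not taken from the text; the function names are hypothetical.

```python
import math


def potential(x):
    """Illustrative double-well potential V(x) = -x**2/2 + x**4/4 (an assumption)."""
    return -x**2 / 2 + x**4 / 4


def stationary_density(xs, noise_strength):
    """Density proportional to exp(-2 V(x) / Q), normalised with the trapezoid rule."""
    weights = [math.exp(-2 * potential(x) / noise_strength) for x in xs]
    dx = xs[1] - xs[0]
    area = dx * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return [w / area for w in weights]


def region_probability(xs, density, lo, hi):
    """Approximate P(lo <= x <= hi) as density times interval length, summed over the grid."""
    dx = xs[1] - xs[0]
    return sum(p * dx for x, p in zip(xs, density) if lo <= x <= hi)


xs = [-3 + 0.01 * i for i in range(601)]           # evenly spaced grid on [-3, 3]
density = stationary_density(xs, noise_strength=0.5)  # Q = 0.5 is a hypothetical value

near_attractors = (region_probability(xs, density, -1.5, -0.5)
                   + region_probability(xs, density, 0.5, 1.5))
near_repeller = region_probability(xs, density, -0.4, 0.4)
print(f"P(near attractors) = {near_attractors:.3f}")
print(f"P(near repeller)   = {near_repeller:.3f}")
```

With these assumed settings most of the probability mass lies around the two minima of the potential, which is the qualitative point made by the figure.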
This particular dynamical system exhibits two stable modes, represented as
attractors. Gafos & Benus show that by changing one of the fixed parameters in the
equation, known as a control parameter, the potential function corresponding to that
system can shift from one with a single mode to a system with two modes and vice versa.
A given order parameter must be resistant to noise within certain ranges for a control
parameter value. That is, once the control parameter moves beyond a threshold value, a
qualitative change in the attractor state occurs. When a system undergoes observable
large changes as a result of the variation along a control parameter, this is known as
nonlinearity. When the scaling of a control parameter results in qualitative changes, these
qualitative changes are known as phase transitions.
This can be illustrated with the dynamical system defined by the following
equation (which is used by Gafos & Benus to produce Figure 11 above):
(33) $V(x, k) = k x^2/2 + x^4/4 + C$
Figure 12 below shows plots of different results as the value of k is varied, where k is a
control parameter. What is most noteworthy to point out is the qualitative change that can
be observed when k “passes through” a value of zero. The system shifts to a unimodal
state at that point, with a single attractor.
Figure 12. Potential as a function of control parameter k, Gafos & Benus (2006: 8)
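The qualitative change at k = 0 follows from equation (33): the slope of the potential is kx + x³ = x(k + x²), which vanishes only at x = 0 when k is zero or positive, and at x = 0 and x = ±√(−k) when k is negative. A minimal sketch of this closed-form result is given below; the function names are hypothetical and the sampled k values are arbitrary.

```python
import math


def attractors(k):
    """Minima of V(x, k) = k*x**2/2 + x**4/4 + C (C does not affect their location)."""
    if k < 0:
        r = math.sqrt(-k)
        return [-r, r]   # two stable modes; x = 0 is then a repeller
    return [0.0]         # a single stable mode at the origin


def curvature(x, k):
    """Second derivative of V with respect to x; positive at a strict minimum."""
    return k + 3 * x**2


for k in (-1.0, -0.5, 0.0, 0.5, 1.0):   # arbitrary sample values of the control parameter
    modes = ", ".join(f"{x:+.3f}" for x in attractors(k))
    print(f"k = {k:+.1f}: {len(attractors(k))} attractor(s) at {modes}")
```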
What is crucial to capture at this point is that once k passes a critical value, a qualitative
change can be observed. Gafos & Benus employ this dynamical system with its phase-
transition properties in modeling some of the subtle effects of vowel harmony in
Hungarian, showing how changes in a single continuous variable related to the control
parameter can result in qualitative shifts in the system’s behavior. Here, this dynamical
system is adopted to model the different grammatical prominence placement preferences
for speakers of Spanish and English, where the same dynamical system with different
parameter value settings can yield the single mode pattern of Spanish (final) and the
bimodal (final, non-final) pattern of English. To anticipate, the process of acquisition of
the English prominence pattern will involve learning to shift the value of the relevant
control parameter.
5.1 The superposition of influences on the order parameter
An additional aspect of Gafos & Benus’ modeling that is key to our present
concern is modeling the combined effect of multiple factors by representing each by
using a potential function defined in the same state space. (The formal representation of
this communicative intent attractor is discussed in the next section.) The authors give an
example of this by providing an account of how this works in the case of final devoicing
in German. Since German has a regular process of final devoicing, there is a unimodal
potential function, which we will call the final voicing function, that describes the overall
tendency for a continuous value of the order parameter (degree of glottal abduction) for
voicing to be in the voiceless range for a stop in final position. However, the actual value
of this order parameter has been shown to be slightly shifted toward the voiced values
(less glottal abduction) when speakers produce a form with an underlying voiced stop.
Specifically, they show that in German the voicing property of a final stop, for example in
[ʁat], exhibits small variations based on whether speakers “intend” to produce a voiceless
stop, corresponding to an underlying form associated with a particular meaning (/ʁat/
“advice”, nominative case), or intend to produce a voiced stop, associated with a different
meaning (/ʁad/, “wheel”, nominative case). They model this shift by postulating a
separate potential function representing the intention to produce a word with a final
voiced or voiceless stop. The potential function associated with the voiced intention is
assumed to be active when a speaker intends to produce a form with an underlying voiced
stop. By weighted addition of the voiced intention function to the final voicing function
they show that the resulting composite potential function has a mode that is shifted in the
direction of voicing when the intent is to produce a voiced stop. The key idea is that
multiple factors can be defined in the space of the same order parameter (in their case
voicing), and the overall system behavior can be modeled through creating a composite
potential function through the weighted addition of the individual potentials. Here, it is
shown how additional factors can shift the composite mode for a speaker of English with
a bimodal grammatical prominence function, but the same factors will fail to do so in
Spanish.
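The effect of weighted addition can be sketched with two quadratic potentials defined over the same order parameter. Everything specific in this example is an assumption made for illustration: the quadratic shape of both potentials, the target values (1.0 for the voiceless range, 0.0 for the voiced range, in arbitrary units of glottal abduction), the weight, and the function names. None of these are taken from Gafos & Benus.

```python
VOICELESS_TARGET = 1.0  # HYPOTHETICAL location of the final devoicing attractor
VOICED_TARGET = 0.0     # HYPOTHETICAL location of the voiced intention attractor


def grammar_potential(x):
    """Unimodal potential standing in for the final voicing function (assumed quadratic)."""
    return (x - VOICELESS_TARGET) ** 2


def intent_potential(x):
    """Potential standing in for the intention to produce a voiced stop (assumed quadratic)."""
    return (x - VOICED_TARGET) ** 2


def composite_minimum(weight, lo=-1.0, hi=2.0, steps=30001):
    """Grid search for the minimum of grammar_potential + weight * intent_potential."""
    best_x, best_v = lo, float("inf")
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        v = grammar_potential(x) + weight * intent_potential(x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x


without_intent = composite_minimum(weight=0.0)
with_intent = composite_minimum(weight=0.1)   # small hypothetical weight
print(f"mode without voiced intent: {without_intent:.3f}")
print(f"mode with voiced intent:    {with_intent:.3f}")
```

For two quadratic potentials the composite minimum has the closed form (a + w·b)/(1 + w), where a and b are the two targets and w the weight, so a small weight yields a small shift toward the voiced value while the mode stays in the voiceless range.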
5.2 An analysis of English prosodic pattern acquisition
The position of prominence in the data of this experiment is modeled by means of
an order parameter, which is the relative prominence of non-final vs. final position. This
order parameter is in principle a continuum corresponding to the potential continuous
physical parameters that give rise to the perception of prominence. The presence of an
attractor, whose form is described below, causes the observed values to cluster in two
discrete ranges in English and a single discrete range in Spanish. In the data from this
experiment, prominence is not measured in a continuous way; essentially what is
measured is whether a response falls within the non-final or final range of the continuum.
In considering the overall results for the ENC speakers (see Figure 8 above in section 4.5)
prominence can be placed either on the final word or a pre-final word as a function of a
variety of conditions to be discussed below. But the relevant point here is that realizing
prominence in either of those locations is a stable pattern in English. To model this, an
equation known as the tilted anharmonic oscillator is used (Gafos & Benus 2006, Tuller
et al., 1994).
(34) $V(x) = -Rx - x^2/2 + x^4/4$
Spontaneous changes in category have been tested in the literature using this potential
function, to probe the notion that such changes are only possible if there is more than one
stable mode (Tuller et al. 1994, Gafos & Benus 2006). When the value of R (the
(a)symmetry parameter) in equation (34) is set to zero, the potential function graphed
below is obtained.
Figure 13. Potential function as plotted for English
The potential shows two attractors into which relative prominence can fall. One attractor
has a positive value, and the other a negative value. If relative prominence is defined as a
ratio of the prominence of the final word over the non-final word, then a positive value of
the log of that ratio corresponds to more prominence on the final word, and a negative
value corresponds to more prominence on the non-final word. Thus the two modes of this
attractor correspond to final vs. non-final prominence. This function will be called the
relative prominence potential (RPP). Thus if the hypothesis is that English-speakers’
prominence placement is guided by the dynamics of the anharmonic oscillator, with the
asymmetry parameter R set to zero, the fact that relative prominence can occur either
finally or non-finally without any particular preference in English – all things being equal
– can be modeled. In order to model the behavior of Spanish speakers, the identical
anharmonic oscillator equation is used, but with the value of R set to 1. The result is the
potential function shown in Figure 14 below.
Figure 14. Potential function as plotted for Spanish
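The difference between the two settings of R can be checked by locating the minima of equation (34) numerically. The sketch below is an illustration under stated assumptions: the search window of [-2.05, 2.05] and the function names are choices made for the example, while the equation and the values R = 0 and R = 1 come from the text.

```python
def rpp_slope(x, R):
    """dV/dx for V(x) = -R*x - x**2/2 + x**4/4."""
    return -R - x + x**3


def rpp_curvature(x):
    """Second derivative of V; positive at a minimum (attractor)."""
    return 3 * x**2 - 1


def rpp_attractors(R, start=-2.05, step=0.1, intervals=41):
    """Return the minima of the tilted potential for asymmetry value R.

    The grid is offset by 0.05 so that the critical points for R = 0
    (at -1, 0 and 1) fall inside grid intervals rather than on grid points.
    """
    minima = []
    for i in range(intervals):
        lo = start + i * step
        hi = lo + step
        if rpp_slope(lo, R) * rpp_slope(hi, R) >= 0:
            continue  # no sign change, so no critical point detected here
        for _ in range(60):  # bisection on the sign change
            mid = (lo + hi) / 2
            if rpp_slope(lo, R) * rpp_slope(mid, R) <= 0:
                hi = mid
            else:
                lo = mid
        root = (lo + hi) / 2
        if rpp_curvature(root) > 0:
            minima.append(root)
    return minima


english = rpp_attractors(0.0)  # R = 0, the setting used for English in the text
spanish = rpp_attractors(1.0)  # R = 1, the setting used for Spanish in the text
print("R = 0:", [round(x, 3) for x in english])
print("R = 1:", [round(x, 3) for x in spanish])
```

For this equation the second minimum disappears once |R| exceeds 2/(3√3), roughly 0.385, so R = 1 lies well inside the single-attractor regime.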
There is a single attractor, with a positive value – indicating final prominence. However,
note that there is an “inflection” in the function at a negative prominence value as well
(indicated by the superimposed arrow in the figure), which means that there can be
greater than zero probability of finding non-final prominence depending on conditions of
noise and other factors. The formal model of prominence proposed assumes that there is a
fully-formed, non-dynamic representation of a sentence that includes parsing into
constituents, ordering, and primitives like argument and head. The dynamical
determination of prominence is thus a formally (modularly) separate process that does not
interact with the syntax. For the current cases, in which the RPP only models relations
between adjacent constituents, such as the short subject-verb phrases presented from
Experiment 1, this approach could be adequate. However, this modularity will not likely
scale up for more complex phrases where additional structural considerations at the level
of the syntax would be relevant, and also cases, like in Spanish, where the syntactic
ordering itself may be dependent on prominence and information structure. In future
versions of the model, a dynamical recasting of some aspects of syntactic structure itself
will be necessary in order to accomplish the relevant computations in a dynamical
system.
The bimodal nature of English prominence placement as modeled by the RPP,
with R = 0 can be observed most directly in the condition in Experiment 1 where the
speakers produced unergatives with wide focus. As shown in Figure 8, the results in this
condition show a 50-50 split in prominence placement in the case of the ENC speakers.
However, the results of the unaccusatives show close to 100% non-final prominence. The
question is how to model the unimodal preference in this particular context. As
mentioned before in section 4.2, the difference between the unergatives and
unaccusatives is that in the case of the unaccusatives the subject is the internal argument
of the verb, but this is not the case with the unergatives. Therefore, the potential function
associated with placing prominence on the argument is not relevant in the unergative
case.
Following the method employed by Gafos & Benus, a separate potential function
associated with placing prominence on the syntactic argument can be hypothesized. This
will be referred to as the argumenthood potential. The function associated with this
potential is given below in (35):
(35) $V(x) = \alpha (x - x_0)^2$
$x_0$ represents the value of relative prominence at which the function will have an
attractor, and α represents the relative strength of this potential when added to other
potentials. Here the value of α was set at .25. When the argument is non-final, the value
of $x_0$ is negative, implying a potential function with a single mode of non-final
prominence. The resulting composite potential function can be obtained by addition of
the RPP and the argumenthood potential. This can be seen in Figure 15 below. At the
upper left is the RPP, where the R = 0, as appropriate for English; at the upper right is the
unimodal argumenthood potential function, which has a negative attractor indicating
preferred non-final prominence in the case of an unaccusative sentence. The composite
potential has a single attractor at a negative value, indicating there will be a strong
preference for producing non-final prominence in this case, as is consistent with nearly
100% of the ENC speakers’ productions.
Figure 15. Derivation of composite potential for unaccusative, ENC population
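The derivation of the composite potential can be sketched by adding the argumenthood potential in (35) to the RPP in (34) with R = 0 and locating the minima of the sum. The value α = .25 is taken from the text. The value x₀ = −1 is a hypothetical choice, since the text states only that x₀ is negative, and the function names are invented for the example.

```python
ALPHA = 0.25  # strength of the argumenthood potential, as given in the text
X0 = -1.0     # HYPOTHETICAL attractor location; the text only says x0 is negative


def composite_slope(x, alpha, x0):
    """d/dx of [-x**2/2 + x**4/4] + alpha*(x - x0)**2, i.e. RPP with R = 0 plus (35)."""
    return (-x + x**3) + 2 * alpha * (x - x0)


def composite_curvature(x, alpha):
    """Second derivative of the composite potential; positive at an attractor."""
    return (3 * x**2 - 1) + 2 * alpha


def composite_attractors(alpha, x0, start=-2.05, step=0.1, intervals=41):
    """Minima of the composite potential, found by a grid scan plus bisection."""
    minima = []
    for i in range(intervals):
        lo = start + i * step
        hi = lo + step
        if composite_slope(lo, alpha, x0) * composite_slope(hi, alpha, x0) >= 0:
            continue  # no sign change of the slope inside this interval
        for _ in range(60):
            mid = (lo + hi) / 2
            if composite_slope(lo, alpha, x0) * composite_slope(mid, alpha, x0) <= 0:
                hi = mid
            else:
                lo = mid
        root = (lo + hi) / 2
        if composite_curvature(root, alpha) > 0:
            minima.append(root)
    return minima


rpp_only = composite_attractors(alpha=0.0, x0=X0)      # no argumenthood potential
unaccusative = composite_attractors(alpha=ALPHA, x0=X0)
print("RPP alone:          ", [round(x, 3) for x in rpp_only])
print("RPP + argumenthood: ", [round(x, 3) for x in unaccusative])
```

Under these assumptions the RPP alone keeps two attractors, while the composite retains only the negative (non-final) one, which matches the qualitative pattern described for the unaccusatives.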
While the composite potential presented here maps quite well on to the actual results
obtained for the unaccusative verbs in the case of the ENC, an additional point should not
be overlooked. Here only the change of state and change of location subclasses of
unaccusative verbs were tested, and it is well known that as a class, the behavior of
unaccusative verbs is not uniform (Perlmutter 1978, Sorace 2000). A means of
accounting for a greater degree of variability than observed here as part of the current
version of the model is to include a focus potential as part of the overall composite
potential. The addition of a focus potential is particularly relevant in the case of the
unergatives where it was mentioned that these intransitives are structurally different from
their unaccusative counterparts, in that the subject argument is external to the verb
phrase. Unergatives were shown to be particularly sensitive to discourse focus factors
such as noteworthiness, which is an aspect that could be accounted for with the proposed
focus potential. Said potential would add a value to the overall composite function that
would be determined by the weight of focus for a given constituent. For example, in the
case of a wide-focus, all new information discourse context, the value for the focus
potential would be very low or almost zero, with an increase in value corresponding to
increase in information focus on one constituent as opposed to another. The focus
potential could be added to the RPP, much as was done for the Argumenthood Potential.
A similar solution could be adopted for the case of anaphoric deaccenting, where
it was argued above that the relationship is not one of relative prominence of adjacent
constituents but rather paradigmatic in nature. Thus, the function of the focus potential in
the case of anaphoric deaccenting is not one of selecting the constituent to receive the
major (nuclear) accent, but rather one of changing the prominence level of the most nuclear-accented
constituent. That is, the function of anaphoric de-accenting is more
paradigmatic than syntagmatic. Thus, simply adding the focus potential in this case to the
RPP will not likely lead to the right solution, as in some cases it could lead to shifts in the
attractor state in which relative prominence settles. This suggests the need for some non-
linear interaction between the RPP and the potential relevant to anaphoric de-accenting.
The exact form of how their dynamics interact will have to be addressed in future work.
A similar solution would be relevant to cases of narrow focus, which are also not
currently modeled.
We now turn to the results of the L2 speakers in Experiment 1. In the case of the
L2 speakers, the results in Figure 9 (section 4.5) show that there are no shifts in
prominence based on the argumenthood relation operating in the case of unaccusatives. If
it is assumed that the L2 speakers have an RPP with R = 1, and this is combined with
an argumenthood potential with the same parameters as for the English speakers (α = .25,
$x_0$ < 1), there is no resulting shift in the composite attractor. The prediction is that even in
this case, despite the argumenthood potential pushing in the opposite direction, there will be
a strong preference for final prominence placement. This can be seen in Figure 16 below.
Figure 16. Derivation of composite potential for unaccusative, L2E population
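The two derivations in Figures 15 and 16 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the model as implemented in this work: the functional form of the RPP is not restated in this section, so a tilted double well, `x**4/4 - x**2/2 - R*x`, is used as a hypothetical stand-in, and x0 = -1 is an assumed negative value. Only α = .25 and the quadratic form in (35) come from the text. Negative x stands for non-final prominence and positive x for final prominence.

```python
def rpp(x, r):
    """Hypothetical stand-in for the RPP (assumed form, not taken from the text).

    r = 0 gives a symmetric double well (two attractors);
    r = 1 tilts the well so that only the positive (final) attractor survives.
    """
    return x**4 / 4 - x**2 / 2 - r * x


def argumenthood(x, alpha=0.25, x0=-1.0):
    """Argumenthood potential from (35); x0 = -1 is an assumed negative value."""
    return alpha * (x - x0) ** 2


def attractors(potential, lo=-3.0, hi=3.0, steps=6000):
    """Locate attractors as strict local minima of `potential` on a grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vs = [potential(x) for x in xs]
    return [
        round(xs[i], 3)
        for i in range(1, steps)
        if vs[i] < vs[i - 1] and vs[i] < vs[i + 1]
    ]


def composite(r):
    """Composite potential: RPP plus argumenthood potential, by simple addition."""
    return lambda x: rpp(x, r) + argumenthood(x)


if __name__ == "__main__":
    print("RPP alone, R = 0:", attractors(lambda x: rpp(x, 0)))
    print("Composite, R = 0:", attractors(composite(0)))
    print("RPP alone, R = 1:", attractors(lambda x: rpp(x, 1)))
    print("Composite, R = 1:", attractors(composite(1)))
```

Under these assumptions the R = 0 composite keeps a single attractor at a negative value, while the R = 1 composite keeps a single attractor at a positive value even though the same argumenthood potential is added. This mirrors the qualitative pattern described for the ENC and L2E populations.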
A question to ask is whether it is reasonable to expect that the L2 speakers have an
intention to produce non-final prominence on the argument, i.e. have they learned that
this is the relevant resource to mark argumenthood in English. However, what this model
shows is that even if they have learned this, it will still not be sufficient to shift their
prominence production if their overall prominence realization is still unimodal. By way
of metaphor, it might be in their minds, but it is not yet in their bodies.
In this way of modeling, in order to acquire this flexibility they have to change
this value of R in their RPP away from 1 toward 0. More generally, looking at Figure 9,
the various conditions manipulated in Experiment 1 had little or no effect on prominence
placement in the case of the L2 population. However, regular effects are observed in the
case of the ENC speakers for these conditions. In a manner similar to the unaccusative
case, intentional potential functions could be constructed associated with the various
factors relevant in those experiments such as anaphoric deaccenting and noteworthiness.
Especially in the case of noteworthiness, the relatively balanced 50-50 split seen
generally in unergatives shifts to a final prominence placement when the verb is
noteworthy. Thus an intention to produce final-prominence associated with
noteworthiness would also shift the English potential function when combined with the
RPP so that it would predict the overall preference for final prominence for this
condition. Again, none of these factors yield any systematic shifts in the L2 speakers,
which is predicted in this model due to their intrinsic asymmetry associated with the RPP
as determined by the organization of Spanish, their L1.
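The shift from a balanced split to final prominence under noteworthiness can be sketched in the same spirit. The block below is a hypothetical illustration only: the RPP form (a symmetric double well for R = 0), the quadratic shape of the noteworthiness potential (borrowed from (35)), its target value of +1, and the weights tried are all assumptions made for the example.

```python
def rpp_english(x):
    """Hypothetical symmetric double-well RPP for R = 0 (assumed form)."""
    return x**4 / 4 - x**2 / 2


def noteworthiness(x, weight, x_final=1.0):
    """Assumed intention potential pulling toward final prominence (x_final > 0)."""
    return weight * (x - x_final) ** 2


def attractors(potential, lo=-3.0, hi=3.0, steps=6000):
    """Locate attractors as strict local minima of `potential` on a grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vs = [potential(x) for x in xs]
    return [
        round(xs[i], 3)
        for i in range(1, steps)
        if vs[i] < vs[i - 1] and vs[i] < vs[i + 1]
    ]


def unergative_attractors(weight):
    """Attractors of the RPP plus a noteworthiness potential of the given weight."""
    return attractors(lambda x: rpp_english(x) + noteworthiness(x, weight))


if __name__ == "__main__":
    # weight 0.0: no noteworthiness intention; larger weights: stronger intention
    for weight in (0.0, 0.05, 0.25):
        print(f"weight = {weight}: attractors at {unergative_attractors(weight)}")
```

With this assumed form, a weak intention leaves both attractors in place (the non-final one is merely shallower), whereas the non-final attractor disappears once the weight exceeds 0.125, leaving only final prominence. The exact threshold is a property of the stand-in functions, not a claim about the data.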
As described above, the order parameter here is assumed to be relative
prominence and for the sake of the current chapter the physical characteristic for that
parameter is deliberately left vague. In Chapter 3, the experiment undertaken will lend
support to the view that the relative prominence can be construed as relative duration of
final and non-final vowels, a parameter intimately tied to the language’s rhythm.
5.3 Discussion
As mentioned above, an attractor refers to a state or particular mode of behavior
that a system “prefers” (Kelso 1995). A system’s stabilization into a given attractor state
is guided by control parameters, which in the case of prosodic patterns are the specific
verb type being used (for example, unaccusative), the discourse structure (for example,
wide-focus context), word order, a speaker’s emotional state, speaker rate, and any
number of others that could exist. “… the patterns and changes … as they unfold in time,
are not prescribed by an external representation or plan; they emerge from the intrinsic
properties of the system itself for a specific control parameter value.” (Van Lieshout
2004: 4)
The order parameter associated with the attractor is the position of phrasal
prominence. The theoretical underpinnings for phrasal prominence optionality in English
(discussed in section 3.2) and the experimental results that confirm this (presented in
section 4.5) merge in the conclusion that English has a bimodal state for phrasal
prominence: phrase-internal or phrase-final. In English, the order parameter for an
attractor associated with a wide-focus context phrase with an unaccusative verb is phrase-
internal prominence. In Spanish, due to the specifics of the self-organization and the
nature of the control parameters of that particular system, the order parameter for that
attractor is phrase-final prominence. Thus Spanish is not bimodal in terms of phrasal
prominence; however, Spanish has complementary bistability in word order (see section
3.2.1, examples (11), (12), (13)).
In order for a native speaker of Spanish to move towards a different attractor
space, a restructuring of the self-organizing components within that system must take
place. It becomes increasingly untenable to maintain the same modes of
organization engaged when speaking Spanish (the L1) while speaking English (the L2).
The very identity of the L2 input (differences in vowel quality, syllable structures, etc.)
forces an accommodation that calls into question L1 modes of coordination whose
functionality and efficiency decrease as exposure to and use of the L2 increase. “… a
pattern will be maintained until it is no longer efficient … At that point, the system would
tend to move away from the existing pattern towards a more optimal pattern. These
behavioral states … reflect the presence of inherently preferred solutions or attractors,
induced by physical … and or functional constraints.” (Van Lieshout 2004: 5) An
example of this (explored concretely in Chapters 3 and 4) is the shift in vowel durations
in the English production of the L2 speaker: this is the change needed to
move the speaker’s system into a different mode of organization, which has repercussions
(sometimes favorable) at the phrasal level.
An equally important point is the reconceptualization of the notion “transfer” that
this analysis brings with it. Transfer is a cover term that researchers in SLA have used
broadly to describe processes as unrelated as “phonetic transfer” and “semantic
transfer”. While the desire to describe stages of L2 acquisition using a unified
terminology is understandable, it is suggested here that getting at the process of acquisition
requires speaking of modes of organization, a notion equally applicable to any aspect of the
acquisition process (phonetic, syntactic, etc.). It is suggested here that an L2 speaker
whose behavior is described as “non target-like” is still engaged in the native language
mode of organization.
The hypothesis given in (19) has the merit that it isolates the nature of a
relationship (selectional ordering and argumenthood relations) and contrasts the resources
employed by speakers of English and Spanish to mark this relationship. However, it
references an end state, and given the nature of the research question and the experiments
designed to address it, a stronger hypothesis can be proposed that gets at the nature of the
process. The acquisition of a non-native prosodic pattern is the result of a restructuring
process in which the components of the system enter into new and different relations with
one another, and undergo reorganization, starting with the nested subsystems at the lower
levels, i.e. vowel durations at the rhythmic level. This can be summarized by the
hypothesis in (36).
(36) The acquisition of phrasal events (such as phrasal prominence) by second
language speakers will be preceded by the re-organization of events at the
rhythmic level.
This brings with it the following prediction:
(37) L1Spanish/L2English speakers with English NS in their speech will have acquired
the English-like distribution of the coordination of vowel durations.
In sum, if L2 speakers with NS but without English-like coordination of vowel durations
are found then there is support for the null hypothesis, that no relationship exists between
rhythmic and phrasal events such that the acquisition of the re-organization of the former
precedes that of the latter. However, if evidence is found to the contrary, namely that all
speakers with NS in their speech also have English-like vowel durations in their speech,
then there is support for the hypothesis that relates rhythm and prominence.
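The logic of this test can be made explicit with a small sketch. It is a hypothetical illustration: the `Speaker` record, its field names, and the example speakers are invented for exposition and do not correspond to the experimental data.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    """Hypothetical record for one L2 speaker (field names are illustrative)."""
    speaker_id: str
    has_english_ns: bool            # English-like phrasal prominence (NS)
    english_like_durations: bool    # English-like coordination of vowel durations


def counterexamples(speakers):
    """Speakers with NS but without English-like vowel durations."""
    return [
        s for s in speakers
        if s.has_english_ns and not s.english_like_durations
    ]


def prediction_supported(speakers):
    """Prediction (37) holds when no +NS speaker lacks English-like durations."""
    return not counterexamples(speakers)


if __name__ == "__main__":
    # Invented speakers, for illustration only.
    sample = [
        Speaker("A", has_english_ns=True, english_like_durations=True),
        Speaker("B", has_english_ns=False, english_like_durations=True),
        Speaker("C", has_english_ns=False, english_like_durations=False),
    ]
    print(prediction_supported(sample))
```

Note that a speaker like B, with English-like durations but no NS, is compatible with the hypothesis in (36), since rhythmic re-organization is claimed to come first. Only a speaker with NS and without English-like durations would count against it.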
The hypotheses and data presented thus far speak to the existence of states: with
the hypotheses presented in (19) and (21) reference is made to an L2 speaker’s possible
state, taking into account the circumstances of her starting point (the L1). This approach
has provided a useful means of understanding not only the cross-linguistic differences
between an acquirer’s L1 and L2, but has allowed us to isolate the control parameters that
contribute to phrasal prominence acquisition. However, L2 acquisition is not a succession
of ordered states that track linear movement – in fact it is not a linear process at all.
Instead the change in speakers’ performance can be seen as a consequence of the re-
structuring of within-system organization. This does not challenge the existence of
relations that the SNSR and CNSR speak to, but rather integrates them as modes of
organization that form part of a broader process. This is a dynamic process where
behavior emerges from the interaction of component parts, behavior that is fed by the
intrinsic self-organization of its components as well as the external factors that serve as
input.
There are key differences that separate complex systems (CS) from systems in
general. A “simple” system is a set of components whose interaction yields some
overall form or state at a determined point in time, and the comprising components are
connected in predictable ways. A complex system also consists of components that
interact, but crucially a CS is open and adaptive: energy can come in to the system from
the outside, and it adapts to the context. A CS will adjust in response to changes in
context, and in fact it is through this adaptation that the order of the system can be
maintained. A CS is a non-linear system, non-linear in that the change is not
proportionate to the input – the strength of the cause is not directly proportional to the
strength of the effect. As Larsen-Freeman describes, “ … complexity arises from the non-
linear nature of the connections or interactions between the components of a dynamic
system. In a non-linear system, the elements or agents are not independent, and relations
or interactions between elements are not fixed but may themselves change.” (Larsen-Freeman
2008: 31)
One of the most crucial aspects of a CS is its self-organizing nature. Self-organization
is the response of the dynamic properties of the system; it is not an
externally controlled process but rather how the interaction among components guides
the maintenance of order, where “global ordering can emerge from local interactions”
(Kelso 1995: 27). Changes in modes of self-organization can result in emergent behavior
of the system that is different from the behavior of the system before that point.
Finally, perhaps the attribute most crucial and relevant to the application of CS to
the study of L2 acquisition is circular causality (also known as reciprocal causality by
some authors). Circular causality is an alternative to the direct cause-effect relations
(Larsen-Freeman 2008: 60) in that effects are found across the system, from both higher
and lower levels. It is typically assumed that lower level activity is constrained by
activity at higher levels, but this relationship is best conceptualized as a figure eight type
schema: the interaction of components at the lower levels feeds behavior at the higher
level, which in turn contributes to the activity of components at the lower levels. Figure
17 below depicts a possible representation of this relationship among interacting levels.
Figure 17. Complex systems schematized (Adapted from Chris Langton)
No doubt the reader has already begun to piece together how the above concepts
can be applied to the data presented in this chapter. Any and all languages qualify as a
CS: the behavior is a result of the interaction of components on multiple scales with
multiple dimensions, it is an open system that adapts to input from outside influences,
higher and lower levels enjoy a symbiosis, etc. But it seems even more appropriate to
look at the language of the L2 from a CS perspective, where a path can be traced through
the restructuring that takes place as speakers move from one mode of production to
another.
A speaker’s L1 is driven by its own initial conditions, adapting to external input
and changing contexts, with a self-organization emerging from its unique set of
components. In the case of the L1, there is generally speaking a consistency in terms of
initial conditions across individual speakers: the nature of the input, the different types of
input, the environment of development, the contexts of use, etc. However, in the case of
individual L2 speakers, these initial conditions can and do vary greatly from speaker to
speaker, and this variation can be held responsible for the wide-ranging individual
differences that abound in L2 research, as any change in initial conditions can lead to
large variations in the states or quasi-states of the CS that characterize a speaker’s
language and ergo that speaker’s observable behavior (Larsen-Freeman 2008: 57). As an
initial approximation to understanding how these individual differences give rise to
patterns of proficiency across L2 speakers, Table 5 shows L2 background results now
broken down by phrasal prominence proficiency. In this table the Cloze test proficiency
results are not given, as all L2 +NS speakers scored in the high proficiency range. While
the differences may appear minor, it is nonetheless observable that the L2 +NS have a
lower age of exposure on average, a longer overall average for years spent in an English-
speaking country, and an earlier age upon arrival.
Table 5. L2 background information by NS proficiency
Measure             L2 TOTAL            L2 +NS              L2 -NS
(in years)          Average   Range     Average   Range     Average   Range
Age at testing      34        19-55     37        20-52     31        19-55
Age at exposure     14        3-50      8         3-14      13        3-50
Years in country    6         0-28      7         0-22      5         0-28
Age at arrival      23        13-43     20        13-24     24        15-43
We return to the question of differences in initial conditions and individual L2
performance in Chapter 3 as part of a more detailed discussion of proficiency.
6. Conclusion
Any changes in behavior as a result of the acquisition process are the outcome of
the changes to the underlying dynamics. Or in Van Lieshout’s words, “… in a Dynamic
Systems Theory perspective actions do not stand on their own, but rather reflect a
coupling with the environment in which they occur. If we interact with our environment,
information is provided that specifies possibilities for further action … in relationship to
our own frame of reference.” (Van Lieshout 2004: 6) In this chapter the differences that
exist between English and Spanish for prosodic patterns associated with a particular
interpretation have been reviewed. Additionally it has been shown that some speakers
from the L1Spanish/L2English test population have achieved some degree of
restructuring that results in prosodic productions not unlike that of native English
speakers for the same contexts. In the following chapter the hypothesis and predictions
proposed in (31), (32) are explored, namely that this acquisition is fed by the underlying dynamic
processes of the coordination of vowel durations as part of larger rhythmic units;
other initial condition factors that contribute to the greater degree of
“native-likeness” observed in a subset of the L2ers are examined as well.
Chapter 3. Experiment 2: Rhythmic events
1. Introduction
In the previous chapter the differences in phrasal events between English and
Spanish for certain discourse contexts were laid out, and experimental findings from
ENC control and L2English test populations were discussed. A non-linear dynamical
model was proposed, whose instantiation provides support to the notion that there are
bimodal stable states in English (non-final and final NS) and a unimodal state in Spanish
(final NS, as evidenced in the NS patterns of the L2E speakers) for the same discourse
contexts.
A resulting hypothesis from that analysis further proposed that, as larger prosodic
events at the phrasal level are composed of interacting, component level events, the
acquisition of phrasal prominence (NS) is preceded by the emergence of certain rhythmic
properties. In specific terms, the focus is on how seemingly small differences in
distributing events in time (for example, differences in vowel durations) can result in
profound consequences for the system (in this case, phrasal prosodic patterns) (Kauffman
1995: 73-74). Before turning our attention to the investigation and testing of this
hypothesis, previous studies in the area of L2 rhythm acquisition are reviewed.
2. Studies of L2 rhythm acquisition
An extensive amount of both theoretical and pedagogical research has been
carried out in the area of L2 acquisition of word-level stress. However this section will
not review those findings for a number of reasons, principally because as demonstrated
very early in the literature on the subject by Bolinger (1965) and Huss (1978) word stress
is subordinate to overall rhythm in English. While the interaction between word order,
word-level stress and accent placement will be discussed in later sections and chapters,
the main focus of this section is the L2 acquisition of rhythm by speakers whose
languages differ typologically. Many of the seminal studies in this area reported on here
were conducted when linguists still held faith in the original “stress-timed” vs.
“syllable-timed” classifications (Pike 1945, Abercrombie 1967). As will be discussed in some
detail in section 3, these classifications no longer hold under the same terms. The studies
presented here are nonetheless very applicable to the present study in that they were
guided by an attempt to get at the differences in vowel qualities and the acquisition
thereof across the particular languages under study.
Wenk (1985) examines the acquisition of English rhythm by native speakers of
French. The importance of Wenk’s work extends beyond this section on rhythm transfer
and acquisition. It is a preliminary attempt to connect phrasal and rhythmic events
through a cross-linguistic comparison, nested within a second language acquisition study
– much like the present work. The author descriptively compares the prosodic contours of
English and French and isolates distinct and separate characteristics of what he calls the
“rhythm group” (changes in pitch and duration of vowels) of each language’s “rhythm
curve” (resulting in a phrasal-level prosodic pattern). This analysis leads him to describe
English as having “leader-timed” rhythm and French “trailer-timed”. In describing the L2
acquirer’s “rhythmic interlanguage”, he isolates the acquisition of vowel reduction as
key in moving from one type of rhythm to the other.
However, most works in this area, in contrast to Wenk’s investigation, are limited
to solely rhythmic factors. Adams & Munro’s (1978) oft-cited work is an extensive study
of the acoustic correlates of stress in sentence productions of native and non-native
English speakers (all non-native speakers in this study were speakers of “various Asian
languages”; Adams & Munro 1978: 129), and the repercussions of stress production for
rhythmic integrity. This investigation found that the most distinguishing aspect was not
the ability by non-native speakers to produce native-like durations of vowels in stressed
syllables, but their ability (or rather, the lack thereof) to produce native-like durations of
vowels in unstressed syllables – which were consistently longer in duration than those
of native speakers.
In their studies on how “the effect of stress and resulting variants of vowels
contribute to the rhythm of English”, Fokes & Bond (1986, 1989) measured the acoustic
correlates of stress of words in isolation and in sentence contexts of three native English
speakers and six native speakers of different languages (all considered at that time to be
“syllable timed”, including Spanish). In both studies they found that the most common
divergence from native-speaker rhythmic behavior was the duration of unstressed
syllables. The vowels of unstressed syllables were longer in the speech of the L2
acquirers, when compared to those of native speakers. The study is somewhat limited,
however, as there is no discussion of L2 proficiency or background, and only one speaker
per L1 language was used.
In a more recent work, Gutiérrez-Díez (2001) found that Spanish acquirers of
English likewise differed from English speakers in their durational values of stressed vs.
unstressed syllables. However, these conclusions should be interpreted with caution as
the reading materials across population groups were not well controlled.
The early part of this century has continued to produce a small cluster of studies
on L2 rhythm acquisition that all attempt to address the syllable vs. stress-timed debate in
a renewed light. Some of these works will be commented on in more detail in section 3 as
part of the discussion on rhythm measurement techniques, while a few that, like the ones
above, focus exclusively on the difference in vowel durations across native and non-
native speaker populations are reviewed below.
Carter (2005) found that the infrequent occurrence of vowel reduction in the L2
English speech of native Spanish speakers was the contributing factor to their “rhythm
score” in English (discussed below in section 3.3). Gut (2003) likewise found vowel
reduction and/or deletion to be a significant source of influence in L2 rhythm. The study
investigated L1 rhythmic influence and transfer by focusing on the difference in vowel
reduction and deletion across speaker populations including L1 speakers of English,
Chinese, and Romance languages (French, Italian, Romanian) learning German as their
L2. Results showed that L1 English speakers reduced or deleted vowels in more contexts
in their L2 German as compared with the L1 German control, and Romance speakers did
not reduce vowels enough. In fact the author found vowel reduction by L1 Romance
speakers to be very rare in their L2 German data.
The evidential consensus is overwhelming: a major factor in the native-like
production of rhythm is the acquisition of those parameters governing the duration of
vowels, and the management of the fluctuation of durational properties as vowels
sequence into larger units. The current work adds to the existing data in two important
ways: through the novel use of a technique for measuring vowel duration (forced text alignment;
see section 5), and by providing evidence for a tighter connection between rhythmic
properties such as vowel duration and phrasal prominence events.
3. Component events: rhythm
It is in fact misleading to even separate out these events – phrasal and rhythmic –
as though they were tangibly independent. Rather, these events form part of an integrated
whole, with the component rhythmic events (such as vowel duration) guiding the
realization of pitch events, which in turn determine the distribution of the component
level events (Tajima et al. 1997). However, in order to get at a clearer picture of the
differences in self-organization that underwrite prosodic production in English and
Spanish, the relevant component elements in the two languages must first be isolated and
then compared.
3.1 Rhythm in English and Spanish
As outlined in Chapter 1, rhythm refers to the regular occurrence of a beat event,
such that there is a perceived patterning of “heavy” (or strong) and “light” (or weak)
elements, and this perception results from the acoustic correlates, such as duration, pitch,
intensity, and spectral quality, associated with stressed versus unstressed syllables. The
nature of the processes that underwrite this regularity has been at the heart of research
that details cross-linguistic differences in rhythm and looks to uncover the mechanisms
involved in this idealized distribution of periodicity of stressed syllables. While
languages vary regarding the acoustic correlates of rhythmic units (Frye 1958, Roach
1983, Kager 1989, Hayes 1995, Ortega-Llebaria & Prieto 2007, inter alia), the cross-
linguistic commonality is that speakers of all languages perceive a regularity associated
with speech production. English rhythm is composed of a complex of a wide variety of
syllable structures (to be discussed below in section 3.2) that combine with vowels of
varying length that yield gradient degrees of stress (primary, secondary, etc.), and these
syllabic units combine with others to yield larger rhythmic units, i.e. feet – whose
inventory is likewise varied. Perhaps the most characteristic aspect of English rhythm, its
flexibility, has also proved the most challenging and at times elusive to comprehensively
capture, both in the theoretical and experimental realms. This regularity-preserving
flexibility in English rhythm has been formalized as the Rhythm Rule, and a number of
rule-based mechanisms have been proposed in the literature to account for the nature of
its application across different contexts (Liberman & Prince 1977, Hayes 1984, Selkirk
1984, Halle & Vergnaud 1987). The Rhythm Rule is claimed to be responsible for the
perceptions of stress shift in response to “stress clash”, i.e. two adjacent strong beats, a
maneuver that trumps the lexical stress pattern of a word and orchestrates the integration
of stresses into the overall rhythmic composition of a phrase. Example (38) below (as
cited in Hayes 1984: 33) is a representation of how lexically assigned stress is claimed to
shift in the context of an adjacent strong beat.
(38) MissisSIPpi LEGislature → MISsissippi LEGislature (capitals mark the syllable bearing main stress)
There is considerable controversy in the literature as to whether the Rhythm Rule actually
operates as a rule-based mechanism that adjusts stress placement or is instead a perceptual phenomenon. A
number of investigations do appear to provide evidence for a difference in stress patterns
between citation form and the same sets of words embedded in discourse (Grabe &
Warren 1995, Shattuck-Hufnagel et al. 1994). However, what has moved to occupy the
center of debate is the directionality of this adjustment: whether the perceived shift is due
to the unexpected presence of stress on a given syllable (Reversal Analysis) or the
deletion of stress on the expected syllable (Deletion Analysis) (Vogel et al. 1995). The
relationship between vowels as members of rhythmic units is key to the analysis
presented in this work, especially insomuch as English and Spanish differ regarding this
organization.
In Spanish, the syllable structure inventory is not as wide as in English, and there
is considerable debate as to whether or not the foot is an operative rhythmic unit in
speech in Spanish (cf. Quilis 1975, Harris 1983, 1992, Roca 1988, 1997). Furthermore,
not all languages exhibit the flexibility observed in English where overall rhythm is
concerned. In Spanish, stress clash is not avoided (Hualde, in press). The primary stresses
in compound words or across word boundaries do not shift in response to adjacency, as
shown in example (39) (as cited in Hualde, in press).
(39) a. sofá-cama b. sé poco
The nature of the differences in vowel duration, one of the acoustic correlates of stress, is
what underwrites this difference in rhythmic structure across languages: in English the
durational differences of vowels across different stress contexts are much greater than in
Spanish. In Spanish the durational differences of vowels in a (wide-focus) phrase are
minimal until the last, stressed, nuclear pitch-accented vowel, whose duration is the
longest in the context of phrase-final lengthening. In contrast, in English the durations
of vowels in a phrase differ substantially, and the longest vowel is
the stressed vowel within the nuclear pitch-accented word – independent of phrase-final
lengthening.
Delattre’s comprehensive and pioneering comparative studies of vowels in
Romance and Germanic languages stand as some of the clearest and best-controlled
research within this domain. In Delattre (1969), results of an acoustic and articulatory
study of the difference in vowel durations in stressed versus unstressed syllables for
English, German, French and Spanish were presented, where it was found that English
was the language with the greatest durational discrepancy, while in Spanish the
difference was minimal. These differences in vowel length at the lexical level have far-
extending consequences for the rhythmic composition of both languages, as mentioned
above with respect to nuclear pitch accent realization. In section 4 the analysis of data
from Experiment 2 presents further results as to the differences in vowel duration
between English and Spanish.
3.2 Cross-linguistic rhythm classification
Rhythm is an organizing principle of speech that reflects the temporal factors of a
given language (Buder 1991, 1996; Culicover & Nowak 2003), and more specifically the
language-specific interaction of these factors. Researchers have long sought to formalize
along principled lines their perception of what aspects give rise to differences in rhythm
across languages. Lloyd James (1940) proposed that breath groups were responsible for
the “machine-gun rhythm” of languages like French and Spanish, as opposed to the
“Morse-code rhythm” of English. It was Pike (1945) who first suggested the terms
“syllable-timed” and “stress-timed” languages to refer to the aforementioned groups.¹¹
Pike (1945) and Abercrombie (1967) were among those who championed the notion that
inter-stress intervals for stress-timed languages and syllable durations for syllable-timed
were of equal length, i.e. isochronous. But numerous subsequent studies failed to provide
acoustic evidence for strict measures of isochrony. Subsequent analyses of these claims
suggest that in formulating their original hypotheses, Pike and Abercrombie were
responding to the perceptual differences in the distribution of vowel identity (vowel
durations, spectral quality, etc.) across languages (Lehiste 1977, Roach 1983), but that
this is not corroborated by actual production data in the same terms of their proposal.
It wasn’t until the work of Dauer (1983) that the complexity behind rhythmic
classification was detailed with any success. Following her widely-accepted analysis,
languages are considered to be organized along a continuum, ranging from more or less
syllable-timed to more or less stress-timed based on the language-specific distribution of
the following phonotactic properties: vowel reduction, syllable structure inventory, and
physical correlates of word-level stress. English, considered a par excellence example of
a “stress-timed” language, and “syllable-timed” Spanish are sharply distinguished by the
properties given below, taken from Dauer (1983):
a. English has vowel reduction, Spanish does not.
b. The syllable structure inventory of English is more varied than that
of Spanish: open syllables comprise 70% of total syllable types in
Spanish, as opposed to 44% in English; 60% of syllables are of the type
CV in Spanish, whereas only 34% in English.
c. The difference in the duration of stressed syllables compared to that of
unstressed syllables is greater in English than in Spanish: stressed syllables are
50% longer than unstressed in English, whereas in Spanish that difference is only
10%.
¹¹ For the sake of expository convenience, I will continue to make use of these terms throughout this work
even though their current instantiation no longer embodies the originally intended meaning.
More recent attempts to provide evidence for rhythmic classification on the basis of
consonantal and vocalic interval variability have included a number of different
measurement techniques, which have all reported significant cross-linguistic rhythm
distinctions across typologically different languages. In what follows, a few studies are
reviewed that, in response to the idea that languages could be distinguished along the
abovementioned parameters, were designed to measure the acoustic correlates of these
rhythmic differences.
3.3 Studies in cross-linguistic rhythmic classification
Ramus et al. (1999, 2002) used Delta C and Delta V measurements, the standard
deviation of consonantal and vocalic intervals, respectively, in their cross-linguistic study
of English, Dutch, Polish, Spanish, Italian, French, Catalan and Japanese. So for example
in English both Delta C and Delta V values are large, whereas for Spanish both values for
these measures would be small.
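As a rough illustration of these two measures, the sketch below computes the standard deviation of a set of interval durations. The durations are invented for the example, the function name `delta` is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption that is not specified in the description above.

```python
from statistics import pstdev


def delta(intervals):
    """Standard deviation of interval durations (seconds).

    Assumption: population standard deviation; the sample formula
    (statistics.stdev) would be the other obvious choice.
    """
    return pstdev(intervals)


# Hypothetical durations for one utterance, for illustration only.
vocalic = [0.06, 0.14, 0.05, 0.18, 0.07]
consonantal = [0.08, 0.15, 0.05, 0.20, 0.09]

delta_v = delta(vocalic)      # Delta V: variability of vocalic intervals
delta_c = delta(consonantal)  # Delta C: variability of consonantal intervals
```

Larger values indicate more variable interval durations, which is the pattern described for English relative to Spanish.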
A quantitative measure that calculates the patterning of successive intervals, known
as the normalized pairwise variability index (nPVI), was the measurement technique adopted
by Grabe & Low (2002). They applied the nPVI to vocalic intervals and a raw pairwise variability
index (rPVI) to intervocalic (consonantal) intervals in their comparative study of Dutch,
German, British English, French, Spanish, and Japanese. Gibbon and Gut's (2001)
“Rhythm Ratio” employs a method similar to that of the previous authors in taking the
average of the ratio of adjacent syllables, in their comparative study of British English,
Nigerian English, and Ibibio.
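The pairwise variability indices can be sketched as follows. This is a minimal illustration of the commonly cited definitions (the mean absolute difference between successive intervals for the rPVI, and the same difference divided by the mean of each pair and scaled by 100 for the nPVI); the function names are hypothetical, and the Rhythm Ratio is not covered here.

```python
def _successive_pairs(durations):
    """Return adjacent (d_k, d_k+1) pairs; needs at least two intervals."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    return list(zip(durations, durations[1:]))


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = _successive_pairs(durations)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the pair's mean
    duration, which factors out overall speaking rate; scaled by 100."""
    pairs = _successive_pairs(durations)
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)
```

A sequence of equal durations yields an index of zero, and the index grows as long and short intervals alternate.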
All of the aforementioned studies report statistically significant groupings of the
languages under study, such that values from “stress-timed” languages (English, Dutch,
German) group together, as do the values from “syllable-timed” languages (Spanish,
French), with “mora-timed” Japanese constituting a third group.
An additional study by Prieto et al. (in press), whose findings are very relevant to
the current one, looks not only at differences in vowel duration as an indicator of rhythm
but also how these differences correlate with other aspects of prosodic structure such as
durational marking of prosodic heads and boundary domain effects. The authors
compared English, Catalan, and Spanish by using virtually all of the measurement
techniques mentioned above for vowels in contexts controlled for syllable structure and
vowel reduction. Results from the vocalic interval measure, the raw PVI measure, and
varcoV (the variability of vowels) all pointed to the conclusion that when syllable
structure is controlled, cross-linguistic differences in rhythm between English, Catalan
and Spanish remain. In addition, this study found that syllable durations were
significantly longer in Spanish at the end of an intonational phrase than syllable durations
measured phrase-medially or at the end of an intermediate phrase. In English the same
pattern was found, and English durations were longer than those of Spanish in all three
contexts. This study provides evidence of how rhythmic differences between these
languages can be understood in terms of vowel durations and how these are integrated
into the overall prosodic structure.
Dellwo et al. (2007) have advanced convincing arguments for the use of voicing
parameters (voiced vs. unvoiced intervals of speech) to classify languages according to
rhythm (this argument is presented in more detail below in section 4.1.3). The overall
percentage of voicing in speech (VO) and the variability of voiceless intervals (varcoUV)
were measured for English, German, Italian, and French. As expected, English and
German showed a higher value for varcoUV, due to a wider variety of syllabic structure
inventory that includes complex consonant clusters.
The studies reviewed so far were all based on cross-linguistic comparisons of
monolingual speech. Second language studies have also taken up the investigation of the
differences between monolingual and L2 rhythm. White & Mattys (2007) report on
several studies investigating the rhythm of L2 speech, specifically addressing the
question of what the value (voicing ratio value, rhythm ratio value, nPVI value, etc.) of
an L2 speaker would look like when compared with the value of the target language as
well as that of the L2 speakers’ L1.
Carter (2005) found that the L2English nPVI vocalic values from
L1Spanish/L2English bilinguals were intermediate between the low Spanish nPVI
vocalic value, and the high value for L1English. The author attributes this result to the
infrequent occurrence of vowel reduction in the L2English speech of speakers whose
L1Spanish lacks vowel reduction.
Gut (2003) likewise found vowel reduction to be a significant source of influence in
L2 rhythm. The study investigated L1 rhythmic influence and transfer by examining the
difference in vowel reduction and deletion across speaker populations including L1
speakers of English, Chinese, and Romance languages (French, Italian, Romanian)
learning German as their L2. In English, vowels reduce or delete in unaccented syllables
in even more contexts than in German. In German simple words, reduced vowels only
occur in final syllables or in inflectional morphemes, whereas in English they can occur
in a wide variety of positions. The Romance languages included in Gut’s study do not
have significant vowel reduction. Based on the data obtained from Rhythm Ratio values
(described above), Gut indeed found evidence for L1 rhythmic influence in the L2
population speech. L1English speakers reduced or deleted vowels in more contexts in
their L2 German as compared with the L1 German control, and Romance speakers did
not reduce to a native-like degree; in fact, she found reduction by L1 Romance speakers
to be very rare in their L2 German data.
Results from these studies point to the same conclusion: L2 rhythmic production is
susceptible to influence from the rhythmic properties of the native language. The current
study also undertakes an analysis of the rhythmic properties of monolingual English,
monolingual Spanish, and L2 English, and more pointedly examines the patterns of
acquisition of the relevant rhythmic properties as they correlate – or not – with the
acquisition of phrasal prominence (NS). As hypothesized in Chapter 2, it is expected that
re-organization at the rhythmic level is a necessary precursor to re-organization at the
phrasal level. Experiment 2 was designed to address the rhythmic aspect of this
relationship.
4. Experiment 2: Methods
4.1.1 Participants
Of the total 35 ENC speakers that participated in Experiment 1, 23 participated in
Experiment 2. The L2 values given here reflect those of only 41 of the 45
L1Spanish/L2English participants due to one missing file and three corrupted files.
4.1.2 Procedure and materials
Participants were asked to read “The North Wind and the Sun”,¹² following
Experiment 1, the Q&A experiment. “The North Wind and the Sun” is a phonetically
balanced and widely translated reading passage that is commonly used for experimental
recordings. L2 participants were asked to read the passage in English as well as in
Spanish (“El viento norte y el sol”), in that order following the Q&A experiment. In
addition, 20 monolingual Spanish speakers were recorded in Spain for the sake of
comparison.¹³
Participants were asked to read the passage(s) silently to themselves before
recording, and were encouraged to ask for clarification about the pronunciation or
meaning of any unfamiliar words.
4.1.3 Analysis 1: Measurements
A number of techniques used for measuring the acoustic correlates of certain
rhythmic properties were reported on in section 3.3. The advantages, soundness, and
superiority of a given technique over another are at the center of a heated, ongoing debate
in this area of investigation. While it is not a central aim of this work to contribute to this
debate, the data presented here do add to the existing literature on this topic (also, see
section 5 where an additional analysis using a different measure technique is presented).
¹² Please refer to the appendix for the complete versions of the text in English and Spanish.
¹³ Many thanks to Ben Parrell for his willingness to record this group during his research trip to Spain,
sponsored by the Del Amo Foundation.
The motivation for siding with Dellwo et al.’s 2007 use of voicing parameters to classify
languages according to rhythm responds to both theoretical and practical considerations.
Among Dellwo et al.’s convincing arguments is the following reasoning. Neonates are
able to distinguish rhythm classes among languages using acoustic information (a
neonatal ability reported in repeated studies in the literature; cf. Bertoncini et al., 1987;
Mehler et al. 1998, inter alia), yet only very low frequency information is
transmitted through the walls of the uterus, which allows only for a crude distinction
between voiced and voiceless intervals. This crude distinction is nonetheless sufficient to
allow neonates to make cross-linguistic distinctions. On that basis, they consider the durational
variability of voiced and voiceless intervals in speech to be the relevant parameters for
distinction-based classifications.
Among the practical considerations are the often subjective criteria for
deciding where a vowel ends and a voiced consonant begins – criteria subject to inter-
coder variability. Another non-trivial factor is the time-consuming nature of these
manually conducted measurements. The classifications made by Ramus et al., Grabe and
Low, and Gut were obtained via hand-measurements of four speakers per language (for a
total of 32), 1 speaker per language (total 18), and 14 total speakers, respectively. The
manpower that an analysis of the data for 100+ speaker files (the ENC,
L1Spanish/L2English and monolingual Spanish participants recorded for Experiment 2,
see below) would necessitate is well beyond the scope of the current resources.
Following the same principles used by Dellwo et al. (2007), an autocorrelation-
based signal processing tool was employed to determine voiced and voiceless intervals
(vocDetect, http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html). Based on this
algorithm, the voiceless-to-voiced ratio was calculated by taking the sum of the duration
of the voiceless intervals over the sum of the duration of the voiced intervals. A separate
algorithm was developed for detecting pauses (as opposed to voicing intervals). Pauses
over 250 milliseconds were not counted as voiceless, thus eliminating the possibility that
speech pauses might erroneously be counted as voiceless intervals. Each recorded
passage was subsequently broken down into individual sentences, and the voicing
ratio for the passage was calculated as the mean of the voicing ratios of the
individual sentences that composed it. In addition, for each sentence the percentage of the
sentence’s duration that was voiced was calculated as the sum of the voiced
intervals divided by the entire duration of the sentence, minus any pauses. A third
measure was the standard deviation of the voiceless intervals for a given utterance.
Group values were obtained by averaging across participant values.
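The three measures described in this section can be sketched as follows. This is only an illustration under stated assumptions: the voiced and voiceless intervals are taken as already detected, the input format and all names are hypothetical, the population standard deviation is assumed, and any voiceless stretch over 250 ms is simply treated as a pause, whereas the analysis reported here used a separate pause-detection algorithm.

```python
from statistics import mean, pstdev

PAUSE_THRESHOLD = 0.250  # seconds; longer voiceless stretches count as pauses


def sentence_measures(intervals):
    """intervals: list of (kind, duration_s), kind is 'voiced' or 'voiceless'.

    Returns the voiceless-to-voiced ratio, the percentage of the sentence
    (pauses excluded) that is voiced, and the SD of voiceless intervals.
    """
    voiced = [d for kind, d in intervals if kind == "voiced"]
    voiceless = [d for kind, d in intervals
                 if kind == "voiceless" and d <= PAUSE_THRESHOLD]
    speech = sum(voiced) + sum(voiceless)  # sentence duration minus pauses
    return {
        "voicing_ratio": sum(voiceless) / sum(voiced),
        "percent_voiced": 100 * sum(voiced) / speech,
        "sd_voiceless": pstdev(voiceless),
    }


def passage_voicing_ratio(sentences):
    """Mean of the per-sentence voicing ratios for one recorded passage."""
    return mean(sentence_measures(s)["voicing_ratio"] for s in sentences)
```

A group value would then be the mean of the per-speaker passage values.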
4.2 Results
The results for each major group (ENC, L2English, L1Spanish and monolingual
Spanish) are shown in Figure 18.
Figure 18. Voicing ratio: Monolingual, L1 and L2 groups
The results are in line with the previous findings regarding L1 rhythmic classification, as
well as the above-mentioned research that finds L2 rhythm values between those of the
target language and the L1. A one-way analysis of variance (ANOVA) revealed a
significant main effect of speaker group (F (2,79) = 50.94; p < .001).
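For readers unfamiliar with the statistic, the F value of a one-way ANOVA can be sketched as the ratio of between-group to within-group mean squares. The function name and input format below are hypothetical, no data from this study are reproduced, and the p value (which requires the F distribution) is not computed.

```python
def one_way_anova(groups):
    """groups: list of lists of per-speaker values (e.g. voicing ratios).

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variability of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```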
Figure 19 below shows the percentage of speech that is voiced on the x-axis and the
standard deviation of voiceless intervals on the y-axis, following Dellwo et al.’s
technique. As expected, there is more overall voicing for monolingual Spanish than for
English, and the variability of unvoiced intervals in English is greater than in
monolingual Spanish. The L2English group as a whole plots closer to the ENC values for
both overall voicing and voiceless variability.
Figure 19. Percent voicing and standard deviation of voiceless across speaker populations
Figure 20 below gives the results when broken down by prosodic proficiency
group. “+ +” refers to those speakers with both NS and A-Deacc, “- +” to those without
NS but with A-Deacc, and “- -” to those without either. What the pooled, group data
seem to indicate is that those speakers with NS in their speech have approximated the
English “stress-timed” rhythm. It further indicates that the appropriate correlation is
between NS and English-like rhythm and not A-Deacc and English-like rhythm, i.e. those
that have only A-Deacc in their speech do not show a similar approximation to English-
like rhythm.
Figure 20. Voicing ratio by prosodic proficiency breakdown
A one-way analysis of variance (ANOVA) was conducted on the ENC and the L2English
data. The voicing ratio values for ENC, L2 + + , L2 - +, and L2 - - were compared and a
significant difference was found (F(3,59) = 5.83; p < 0.001). A one-way ANOVA was
also done on only the L2Eng groups (+ +, - +, - -) and this difference was also significant
(F(2,41) = 3.72; p < 0.05).
Post-hoc Scheffe tests at the .05 level revealed a significant difference between the
ENC and L2 groups, and also between the ENC and L2 - - groups. There is also a
significant difference between the L2 + + and the L2 - +, but no significant difference
between the L2 - + and the L2 - -. Crucially there is no significant difference between the
ENC and the L2 + +. So, for those speakers with hallmark Germanic NS in their speech,
the acquisition of a native-like English voicing ratio (VR) further separates them from the other L2
subgroups.
We now return to the plot of percentage of voicing and variability in the voiceless
measure, separating the L2English group out by prosodic proficiency, shown below in
Figure 21. Most remarkable is that those speakers whose NS is most native-like show a
standard deviation value on par with native speakers. Because this value for voicelessness
hides more than one property (aspiration, consonant-vowel timing, complex consonant
clusters, etc.), it is necessary to further tease this apart with additional analyses to ensure
an accurate assessment of which properties have been acquired. Such an analysis has
indeed been carried out on these data and is presented in section 5.
Figure 21. Percent voicing and standard deviation of voiceless for L2English groups
Based on the data presented in this section, the observation that there is a relationship
between the acquisition of NS and the acquisition of certain rhythmic properties, as
commented in the introduction of this chapter, does appear to hold. The following section
is a continued exploration of this connection, with a hypothesis as to its directionality.
4.3 The directionality of acquisition of NS and rhythm
In the previous chapter the following hypothesis (restated here as (40)) was put
forth:
(40) The acquisition of phrasal events (such as phrasal prominence) by second language
speakers will be preceded by the re-organization of events at the rhythmic level.
If it is true that rhythmic re-organization precedes phrasal re-organization, then the expectation is
that those speakers with native-like NS in their speech will also have native-like
organization of rhythmic properties. However, this does not exclude the possibility of
speakers with native-like rhythmic values but without NS, as such a speaker’s system could
be at a certain stage of the re-organization process. But, given the hypothesis, the
unexpected outcome would be to find speakers with native-like NS but without native-like
rhythmic values. These predictions are formalized below in (41).
(41) Second language speakers with native-like prominence patterns
in their speech will also have acquired native-like rhythmic properties.
As a preliminary step in testing the above hypothesis and predictions, the Germanic NS
value was plotted against the VR value for each L2 speaker that participated in
Experiment 2. In Figure 22 below, the VR values are given on the y-axis, and on the x-
axis are the NS values, consisting of the number of target-like NS responses (out of 8) for
OV compounds and unaccusative SV structures.
Figure 22. Scatterplot of nuclear stress and voicing ratio for L2 speakers
A trend is observable, especially when looking at how the values group together within
quadrants. In the upper right-hand quadrant it can be noted that those L2 learners
with 50% or above target-like NS also show a correspondingly high, target-like VR, i.e.
partitioning the data at that point reveals a correlation. However, looking at the NS values
below 50% on the graph, both high and low VR values can be observed. If the y-axis is turned
on its side, there is no point at which the data can be partitioned such that a meaningful
correlation is revealed. Thus, the prediction made in (41) is confirmed: it is not necessary
to have acquired Germanic NS in order to have a high VR, but a high VR is a prerequisite
for acquiring Germanic NS. Statistical measures likewise confirm that this is the case.
An overall regression analysis was performed to predict VR from NS. As expected
this overall analysis was not significant, given that for the L2-NS speakers there is no
relation between VR and NS (R² = 0.0558; F(1,41) = 2.3623; p = 0.1322).
However, as mentioned in the paragraph above, informally it was observed that for
high values of NS there was a predictive relation such that NS is predictive of VR,
whereas this was not true for low values of NS. On the other hand, there was no obvious
division of the VR continuum that would allow any predictability of NS in either
partition. This difference is a way of revealing the asymmetric relationship between NS
and VR: the emergence of rhythmic properties precedes NS acquisition. In order to test
this statistically, median partitions were performed on both NS and VR and then four
regression analyses were run, predicting VR from high NS, VR from low NS, NS from
high VR, and NS from low VR.
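The partition-and-regress procedure can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, speakers at or above the median are assigned to the "high" partition (the treatment of ties is an assumption), and only R² and F are returned, not p values.

```python
from statistics import mean, median


def linreg(x, y):
    """Simple linear regression of y on x: returns (R^2, F, (1, n - 2)).

    Assumes at least three points and that both variables vary.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)
    f = float("inf") if r2 >= 1 else r2 / (1 - r2) * (n - 2)
    return r2, f, (1, n - 2)


def median_split(pairs, key):
    """Split (ns, vr) pairs at the median of column `key` (0 = NS, 1 = VR)."""
    cut = median(p[key] for p in pairs)
    high = [p for p in pairs if p[key] >= cut]
    low = [p for p in pairs if p[key] < cut]
    return high, low


def four_regressions(pairs):
    """pairs: list of (ns_score, voicing_ratio), one per L2 speaker."""
    ns_high, ns_low = median_split(pairs, 0)
    vr_high, vr_low = median_split(pairs, 1)

    def fit(subset, x_col, y_col):
        return linreg([p[x_col] for p in subset], [p[y_col] for p in subset])

    return {
        "VR from high NS": fit(ns_high, 0, 1),
        "VR from low NS": fit(ns_low, 0, 1),
        "NS from high VR": fit(vr_high, 1, 0),
        "NS from low VR": fit(vr_low, 1, 0),
    }
```

An asymmetry of the kind described above would show up as a high R² in only one of the four fits.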
We found, as expected, that it was possible to predict VR from NS significantly,
only for the high NS partition: R² = 0.7557, F(1,19) = 52.58, p < 0.001. In Figure 23,
observe the regression line that fits the trend of the data.
Figure 23. NS high
However, for the low NS partition the prediction of VR was not significant: R² = 0.1350,
F(1,22) = 3.2767, p = 0.0846; see Figure 24.
Figure 24. NS low
In the case of the VR partitions, neither did the high VR exhibit a significant prediction
of NS (R² = 0.1018; F(1,21) = 2.1530; p = 0.1586), nor did the low VR exhibit a
significant prediction of NS (R² = 0.0009; F(1,21) = 0.0163; p = 0.8998); see Figures 25
and 26, respectively.
Figure 25. VR high
Figure 26. VR low
Before moving on to a summary of the results, a brief note is made about the
relationship between VR and A-Deacc. In Chapter 2 it was explained that A-Deacc is of a
different nature than NS, and that there is an overlap between the A-Deacc prosodic
pattern of English and prosodic patterns in Spanish for similar contexts. For these reasons
no predictive relationship was posited to exist between VR and A-Deacc. A quadrant-by-
quadrant analysis likewise confirms that at no point along the continuum is a predictive
relationship found – in contrast to the NS-VR analysis. In the case of the A-Deacc
partitions, neither did the high A-Deacc (R² = 0.1153; F(1,22) = 2.6071; p = 0.1221) nor
did the low A-Deacc (R² = 0.0042; F(1,23) = 0.0894; p = 0.7679) exhibit a significant
prediction of VR. Likewise, for the VR partitions, neither did the high VR (R² = 0.0617;
F(1,22) = 1.3161; p = 0.2648) nor the low VR (R² = 0.0649; F(1,22) = 1.3887; p =
0.2525) exhibit a significant prediction of A-Deacc.
In this section the prediction that the acquisition of NS is preceded by the
emergence of rhythmic properties has been tested, and in fact the data are consistent with
the hypothesis that the latter is a necessary condition for the former. Those L2 speakers
with native-like NS in their speech also had native-like rhythm values, as represented by
the VR. The analysis also revealed that while there are speakers with native-like VR who
have not yet acquired NS, the opposite order is not found: there are no cases in which a
speaker has NS but does not have native-like VR values. Perhaps even more so than the
well-behaved acquirers, those speakers who have native-like rhythm values but do not yet
exhibit native-like levels of NS in their speech are of great interest, for it is these speakers
whose system is in the throes of re-organization. Evidence of this is apparent when
Figure 22 is revisited, where at first glance it might seem out of place to observe speakers
with 3 or fewer NS points whose VR values are higher than those of speakers with native-
like scores. But in fact this is not unexpected at all. A complex system is expected to
exhibit fluctuating behavior as part of the reorganization process. In this case, the
exposure to and production of vowel durations, among other rhythmic properties, are the
conditions that seed the re-organization process, which in turn exhibits fierce fluctuations
in response to such vicissitude before stabilizing into a new mode of organization. These
observations will be fleshed out in the context of additional examples from the further
data analysis presented in section 5.
A more detailed analysis is in order because while the VR does give a general
indication of how rhythmic properties work together to shape a particular rhythmic value,
it is very global in its scope. The vowel is at the heart of the rhythmic composition, and
straining away the extraneous material affords a more exacting measurement of the
nature of its contribution to the overall composition.
5. Experiment 2: Forced Text Alignment
As a means of extending our analysis to a more detailed account of rhythm that
moves away from the more global VR measure, here an approach is adopted that allows
for the isolation of vocalic units in order to understand language-specific hallmarks,
which allows us not only to make generalizations about how rhythmic components vary
across languages but also compare these properties across speech populations. It would
seem that a return to isolating specific segments (or phones) is to bring the argument full
circle and side with those researchers who compare vowel and consonant durations as
part of their means of measuring rhythm. However, there are important differences
between that approach and the one adopted here, known as Forced Text Alignment (FTA)
(Nava, Tepperman et al. 2009). As mentioned in section 4.1.3 above, isolating the
boundaries (beginnings and ends) of phones by hand is not only time-consuming but is
subject to consistency problems between labelers. FTA is a technique commonly used in
Automatic Speech Recognition (Yuan & Liberman 2008) and once trained on the target
acoustic model the alignment is applied consistently by machine.
5.1 Analysis 2: Measurements
As all speakers read the same passage, “The North Wind and the Sun” in English or
Spanish, the target phoneme sequence for each audio file was known, but not the
segment-level boundaries. Vowels in the text were coded as having primary or non-
primary stress based on citation form. In addition, words were coded as belonging to
either the function or content class. The algorithm for finding segment boundaries
proceeded as follows. First, initial models were trained using the Baum-Welch embedded
re-estimation algorithm (Young et al. 2002). These preliminary models were used to
decode each target phoneme sequence in the data set, which allowed for optional pauses
at expected phrase boundaries. The resulting phoneme segmentation times were in turn
used to train new Hidden Markov Models (HMMs) from scratch, this time using the
hypothesized segmentation for model initialization with Viterbi alignment and the
embedded (iterative) re-estimation on each isolated phoneme. Then, these target
sequences were again decoded, and those new segmentation times were in turn used to
train new acoustic models. This process was repeated for 5 iterations, at which time a
sample was checked against expert human segmentation and found to be accurate.
A separate set of HMMs was trained for each of the three speaker populations.
There were a total of 10.4 and 10.7 minutes of speech available, for ENC speakers and
monolingual Spanish speakers respectively, and for all L2E speakers (both +NS and -NS)
there were 53.8 minutes of speech available, due to the fact that each speaker read the
passage in both Spanish and English. The fully trained monolingual English and
monolingual Spanish models served to initialize those of the L2 speakers, for whom the
pronunciation was highly variable and potentially drawing on both phoneme sets. For
these reasons, decoding the L2English recordings also allowed for expected English
pronunciations that reflected the influence of Spanish phonology. The recognition
pronunciation lexicon included variants derived from Spanish letter-to-sound rules
including, for example, the substitution of Spanish dental stops for English alveolar stops,
or the possible lack of English-like aspiration in syllable-initial voiceless stops. After
automatic alignment, all durations extracted from segments of interest were normalized
for speaking rate by dividing by the total duration. However, these automatic
segmentations were potentially inaccurate if the speaker paused at an unexpected place
while reading the stimuli. In those cases, the alignment would include the pause as part of
an abnormally long segmentation for the preceding phoneme. To eliminate these outliers,
any voiceless sequence over 250 milliseconds was considered a pause and subsequently
removed from the analysis.
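The post-alignment clean-up can be sketched as follows. The input format and names are hypothetical, and two points are assumptions made only for illustration: the 250 ms criterion is applied to single voiceless segments rather than to longer sequences, and "total duration" is taken to be the summed duration of the segments that remain in the utterance after pause removal.

```python
PAUSE_THRESHOLD = 0.250  # seconds


def clean_and_normalize(segments):
    """segments: list of (label, duration_s, is_voiced) from the aligner.

    Drops voiceless segments longer than the pause threshold, then divides
    each remaining duration by the total remaining duration (an assumed
    reading of 'total duration') to factor out speaking rate.
    """
    kept = [(label, dur) for label, dur, is_voiced in segments
            if is_voiced or dur <= PAUSE_THRESHOLD]
    total = sum(dur for _, dur in kept)
    return [(label, dur / total) for label, dur in kept]
```

After this step the normalized durations within an utterance sum to one, so vowel durations can be compared across speakers who read at different rates.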
5.2 Forced text alignment results and statistics
The durations of these different vowel types (in content words with primary stress,
content words with non-primary stress, and function words) were compared across
speakers for different language groups. As mentioned above in the rhythm review in
sections 3.2 and 3.3, English and Spanish differ regarding the vowel duration in stressed
versus unstressed vowels. Furthermore, the vowels in function words in English are
typically reduced, unless they receive stress in response to a particular focus context
(Inkelas & Zec 1990). In contrast, vowels in Spanish function words are never reduced –
due to the general lack of vowel reduction in this language. Due to this, it is expected that
English and Spanish will exhibit different distributions regarding content primary,
content non-primary, and function vowels. In Spanish, only the content primary vowel is
expected to be slightly longer than the other two vowels. However in English, the
expectation is that greater differences between content primary and non-primary vowels
will be observed, and especially between content primary and function word vowels. In
the case of the L2 speakers, those L2 speakers with native-like NS in their speech are
expected to have a vowel duration distribution not significantly different from the ENC,
whereas this is not expected to hold for those L2ers without NS in their speech.
(42) L2 speakers with native-like NS in their speech are expected to exhibit a
native-like distribution of vowel duration properties.
The results of the analysis are shown below in Figure 27. A two-way ANOVA was
first run on the three vowel types in English and the three English-speaking populations
(ENC, L2+NS, L2-NS), which revealed a significant main effect of both language
population (F = 34.6, p < .001) and vowel type (F = 210.16, p < .001), and a significant
interaction (F = 3.74, p < .001). Post-hoc Scheffe tests revealed that while content
primary vs. non-primary differed significantly for all groups, only the ENC and L2+NS
groups showed a significant difference between the content primary and function vowels.
Second, a two-way ANOVA was run on all the data, including Spanish. The
ANOVA again revealed a significant main effect of both language (F = 86.11, p < .001)
and vowel type (F = 75.54, p < .001), and a significant interaction (p < .001). A post-hoc
Scheffe test revealed that Spanish does not distinguish among any of the three vowel
classes, and further that the value for each of the vowel classes in Spanish was significantly
different from the values in the English groups.
Figure 27. Vowel durations by category across speaker groups
These results confirm the expected cross-linguistic differences regarding vowel
durations for English and Spanish, contributing substantial evidence as to one of the
factors comprising overall rhythmic differences. Additionally, the results add to existing
literature regarding these differences, yet have potentially offered an even more accurate
assessment of this difference due to the technique used.
Also confirmed is the prediction that L2 speakers with NS will exhibit native-like
distribution patterns where the different vowel types are concerned. Even though the
L2-NS speakers as a group have long content vowel durations, and they make a
distinction between primary and non-primary vowels, they do not make a distinction
between content primary and function word vowels. Referring exclusively to the group results,
while these speakers might have acquired certain vowel durations that move away from
their native system, they have not yet acquired the relevant organization of the in-context
distribution of these vowels. Their L2 system is still in flux and undergoing constant
fluctuation on its way to stabilization of a new mode of organization – a stabilization
already attained by the L2+NS speakers. Said stabilization is the necessary condition in
order for the phrasal mode of organization (phrase-internal NS) to be achieved. “The
emergence of new forms in a learner's second language and the degree to which such
forms are variable are determined by the processing skills available to the learner at each
stage. The dominance of certain patterns may arise through a gradual building up process
or through a period of fluctuation among competing patterns, followed by a phase shift in
the system when a certain critical threshold is crossed, and some wider reorganization is
triggered.” (Larsen-Freeman & Cameron 2008: 143)
Numerous SLA studies document vacillation in L2 production behavior, where
individual acquirers will demonstrate both target-like and non-target-like production
during the same stage of acquisition. Among the solid proposals to account for this
behavior is the Fluctuation Hypothesis (Ionin et al. 2004), stating that acquirers fluctuate
between parameter settings for article use until input leads them to settle on the
appropriate value for a given (in this case grammatical) form. It is argued here that
behavior of this type is characteristic of a system undergoing re-organization in response
to a change in conditions, and that as input increases the system will stabilize into the target
mode.
A number of the works already referred to in section 2 of Chapter 2 make explicit
mention of the fact that the behavior of the L2 acquirer can be attributed neither to
interference from the L1 nor to the L2. Backman (1979) expresses
consternation at the fact that non-native-like intonation patterns produced by the
Venezuelan Spanish speakers learning English could not be explained by a direct
comparison with their L1 Spanish intonation contours for the equivalent question and
declarative contexts. Wenk (1985) alludes to the idea that a process of re-organization is
responsible for the “interlanguage of rhythmic group” observed in his group of L2
speakers. Among the most remarkable characteristics listed by Adams & Munro (1978)
of the L2 speakers under study was the observation that the native and non-native
speakers of English differed greatly in the distribution of stress at the sentence level, but
not so much in the acoustic correlates used to signal it. The production of stress at the
sentence level was anomalous and too frequent in the case of the L2ers. All of these
examples point in the same direction. In the case of the L2 acquirer, the entire system is
undergoing re-organization, and this dynamic nature is evident in the evolution of the
acquirers’ interlanguage. The attractors for phrasal and/or rhythmic production are not yet
stable or deep, and the vast fluctuation that these researchers witnessed bears testimony to
the response of a complex system as part of this reorganization process. Any
interlanguage is characterized by high degrees of instability, and the variation across ILs
(in the form of individual data) is the result of different factors contributing to initial
conditions, and this combination of factors can contribute to unique outcomes across
individuals.
5.3 Individual analysis
Turning now to the individual analysis, a vowel duration ratio (VowR) was
calculated for each English speaker (ENC and L2 groups); the mean of a speaker’s
content primary vowel duration was divided by the mean of that speaker’s function vowel
duration. In order to determine whether or not NS could be predicted from this
relationship, a step-wise multiple regression was conducted attempting to predict NS
from VowR and Voice Onset Time (VOT). The definition of VOT and motivation for
including this analysis will be discussed following this initial VowR discussion.
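The ratio just defined can be sketched in a few lines. This is a minimal illustration, assuming per-speaker lists of vowel durations in milliseconds; the function name and the sample values are hypothetical and are not taken from the study's data, and the step-wise regression itself is not reproduced here.

```python
from statistics import mean

def vowel_duration_ratio(content_primary_ms, function_ms):
    """VowR: mean content-primary vowel duration over mean function vowel duration."""
    if not content_primary_ms or not function_ms:
        raise ValueError("both vowel classes need at least one measurement")
    return mean(content_primary_ms) / mean(function_ms)

# Hypothetical durations (ms) for a single speaker, for illustration only.
speaker = {
    "content_primary": [120.0, 140.0, 100.0],
    "function": [50.0, 70.0, 60.0],
}
vowr = vowel_duration_ratio(speaker["content_primary"], speaker["function"])
print(round(vowr, 2))
```

A ratio near 1 would indicate little durational contrast between the two vowel classes, while larger values indicate longer content primary vowels relative to function vowels.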
The results of the regression revealed that VowR is a significant predictor of NS
only in the case of the L2+NS speakers (p < .001) (marked in Figure 28 below with
circles and labeled as L2HIGH), and 34% of the variance of NS was accounted for. This
predictive relationship between vowel ratio and NS means that the L2+NS group shows a range
of variability in NS performance, but for this group only this performance correlates
significantly with their vowel ratios. Figure 28 below is a plot of this correlation. On the
other hand, there are a number of L2-NS speakers (marked in the figure with triangles
and labeled as L2INTER) whose ratio falls within the range of the ENC ratio values, yet
they have not yet acquired English NS.
Figure 28. Scatterplot of content-to-function vowel duration ratio and NS score
As mentioned above, VOT was included in the overall regression equation;
however, the correlation results are quite different: VOT was not predictive of NS (p =
0.7) for the L2ers. See Figure 29 below.
Figure 29. Scatterplot of VOT mean and NS score
The voiceless stops of English, /p, t, k/, have a positive voice onset time, meaning
that there is a positive lag between the release of the consonant and the beginning of
voicing for the vowel. The same consonants in Spanish have VOTs near zero. English
/p, t, k/ have VOTs around 60, 70 and 80 milliseconds, respectively,
whereas the same consonants in Spanish have VOTs around 5, 10 and 30 milliseconds,
respectively. VOT values have been used in a number of SLA studies as a partial
indicator of native-like phonological proficiency.
In particular, works by Flege and colleagues have closely mapped the differences in
VOT values between early vs. late bilinguals, as well as the variation of within-speaker
bilingual VOT values for both languages (Flege 1987, 1991). His data show that early
bilinguals typically exhibit native-like VOT values for both languages, while late
bilinguals typically do not reach native-like values in their L2. Sancier & Fowler’s (1997)
work on within-speaker VOT value variation agrees with that of Flege, namely that VOT
values in both of the bilingual’s languages drift in response to context and frequency, and
that there was evidence for influence of the L1 on the L2 values as well as of the L2 on
the L1 values. However, a number of researchers disagree with the claim that VOT is
strongly correlated with overall foreign accent proficiency (Majors 1987, Flege op. cit.),
in light of evidence that native-like VOT values and accent proficiency do not always go
hand in hand (Yavas 1996, Zampini & Green 2001, inter alia). Regardless of one’s
position regarding this debate, the relevance of the VOT measure for this study is clear:
whether or not an L2er is producing native-like VOT values is not predictive of their NS
behavior. And more precisely, the VowR is not functioning as a general indicator of some
overall phonological proficiency, but rather is an indication that the connection between
the NS events at the phrasal level and the vowel-based rhythmic events is specific.
Here the proficiency discussion is further extended with an analysis that takes into
account the various factors that contribute to a speaker’s initial conditions of acquisition.
Tables 1, 2, and 5 in Chapter 2 give information about speakers’ performance on a
standard proficiency test, their age at the time of testing, age at time of first exposure to
English, and age at time of arrival in an English-speaking country (where applicable). As part
of a step-wise regression analysis, the aforementioned factors were included along with a
breakdown of the type of L2 instructional input reported by each participant. The
possible categories included attending a bilingual school from kindergarten to high
school (this was the case for 9 of the 10 Paraguay participants), English classes in
elementary through high school (not part of a bilingual curriculum), English classes at the
university level, intensive English courses, private tutor, and “natural” context (no formal
classroom instruction). The analysis revealed that Cloze proficiency (p < .001) and age at
arrival (p < .05) were predictive factors of NS. Beyond a speaker’s general grammatical
proficiency, the age at which a learner is immersed in an environment where L2 input is
abundant is a determinant factor in their success at achieving native-like prosody.
6. Vowel adjacency distributions
So far this chapter has shown that there is a correlation in the behavior
of the L2E +NS such that a development of increased reduction in vowel duration
correlates with a more English-like production of phrasal prominence. The ratios
discussed above give an overall sense of the range of duration observed in English and
Spanish, where it was shown that there is a predictive relationship between this ratio and
the phrasal prominence performance in the case of the L2 speakers. In this section, these
developments are more formally linked by building on the mathematical model laid out
in the previous chapter.
Thus far measurements have included the durational values of vowels of different
types (content, function), which, as expected, show a much greater durational difference
in English than in Spanish. One way of gaining some understanding of the predictive
relationship described above is by describing the local variation in the duration of
successive vowels, i.e. the variation in duration of nearby vowels, which the previous
analyses have not yet approximated. Part of controlling the difference between final and
non-final prominence placement may involve the ability to regulate the relative duration
of the vowel in a final vs. a non-final word. Indeed, the results of the current chapter
suggest that one possible interpretation for the order parameter in Chapter 2, the relative
prominence potential (RPP), is simply related to vowel duration; however, factors such as
pitch, which might also be relevant, are not considered here. Therefore the goal of this
section is to investigate the potential differences between English, Spanish, and the L2
populations in the patterns of sequential vowel durations. And to the extent that these
patterns show qualitative distributions across speaker populations, an additional goal is to
model these kinds of distributions with the dynamic models employed in Chapter 2.
To this end, for each speaker for each language a duration ratio was calculated of
every sequential pair of vowels in that speaker’s production of “The North Wind and the
Sun”. The resulting ratios were trimmed for outliers in which the ratio was greater than
10 or less than .1. The log of each ratio was calculated and the probability density
function of the resulting log values was determined (see footnote 14). Figure 30 below shows the
resulting probability density distribution of these ratios for each language group.
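The procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the ratio is taken here as the later vowel divided by the earlier one, the logarithm is assumed to be natural, and the density is estimated by placing a Gaussian kernel on each log ratio and evaluating it at 100 bin centers, with the smoothing width of .15 treated as the kernel's standard deviation in log units. All function names and the sample durations are hypothetical.

```python
import math

def log_duration_ratios(durations, low=0.1, high=10.0):
    """Natural log of each successive-vowel duration ratio, dropping outlier ratios."""
    logs = []
    for prev, nxt in zip(durations, durations[1:]):
        if prev <= 0 or nxt <= 0:
            continue  # skip unusable measurements
        ratio = nxt / prev  # assumption: later vowel over earlier vowel
        if low <= ratio <= high:
            logs.append(math.log(ratio))
    return logs

def smoothed_density(values, n_bins=100, lo=-2.5, hi=2.5, width=0.15):
    """Gaussian-kernel density estimate evaluated at n_bins evenly spaced bin centers."""
    step = (hi - lo) / n_bins
    centers = [lo + (i + 0.5) * step for i in range(n_bins)]
    norm = 1.0 / (len(values) * width * math.sqrt(2.0 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((c - v) / width) ** 2) for v in values)
        for c in centers
    ]
    return centers, density

# Hypothetical vowel durations (ms) in reading order for one speaker.
durations = [100.0, 100.0, 50.0, 150.0, 2000.0, 100.0]
logs = log_duration_ratios(durations)      # two pairs are trimmed as outliers
centers, density = smoothed_density(logs)
```

A log ratio of zero corresponds to two adjacent vowels of equal duration, so a density concentrated around zero indicates little local durational contrast.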
Consider the distribution for the monolingual Spanish speakers located at the top
left of the figure. The shape of this distribution is close to the normal distribution and it is
centered at zero, which is the value that indicates that the two sequential vowels are equal
in duration (i.e. their duration ratio would be 1, and the log of 1 is zero). The y-axis
shows the probability of the log of the ratio and it can be seen that a value right at zero is
obtained with a probability of .5, i.e. it is very likely. Top right shows the comparable
distribution for the ENC speakers, where it is immediately observable that this
distribution is of a different shape than the one for Spanish. Note that the sides of the
density distribution are somewhat jagged and there are two protrusions in the range from
1 to 2, and from -1 to -2, which are not seen in Spanish. In fact, the Spanish distribution
looks like it was drawn from a single normal distribution with a single mode, while the
English distribution appears to be the composite of multiple modes. One way of
evaluating this observation is through computing the deviation of the distribution from a
normal distribution whose mean and standard deviation are equal to the observed
distribution. The method for computing the deviation used here is the sum of the squared
deviations across the 100 points that correspond to the bins of the probability density
function. This means that the smaller the deviation score, the closer to normal the
distribution is. As seen on the right hand side of each graph within the figure, the
deviation in the case of Spanish is .0397, while the English is .0658. This is consistent
with the impression of the shape of the English distribution that looks like it harbors
multiple as opposed to singular modes. In addition, the overall distribution in English is
squatter and wider. While it is still the case that a value of zero is the most probable, its
probability is only about .35. The differences in the width of the distribution are captured
by the estimates of the underlying standard deviation, which in Spanish is .923 and in
English 1.04, i.e. the ratios are more variable in English.
Footnote 14: The number of bins was 100, and the width of the normal smoothing filter was .15.
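The deviation score described above can be sketched as a sum of squared differences between an estimated density and a normal curve with the same mean and standard deviation. The sketch below is an assumption-laden illustration: the mean and standard deviation are computed from the binned density itself, the grid and the two example densities are hypothetical, and the resulting scores are not expected to match the values reported in the figure, which depend on details of scaling that are not given in the text.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def deviation_from_normal(centers, density):
    """Sum of squared differences between a binned density and its matched normal."""
    step = centers[1] - centers[0]
    mass = sum(density) * step
    mu = sum(c * d for c, d in zip(centers, density)) * step / mass
    var = sum((c - mu) ** 2 * d for c, d in zip(centers, density)) * step / mass
    sigma = math.sqrt(var)
    return sum((d - normal_pdf(c, mu, sigma)) ** 2 for c, d in zip(centers, density))

# Hypothetical 100-point grid of log-ratio values.
centers = [-4.0 + (i + 0.5) * 0.08 for i in range(100)]

# A single-mode density versus one with side modes near +1.5 and -1.5.
unimodal = [normal_pdf(c, 0.0, 1.0) for c in centers]
multimodal = [
    0.6 * normal_pdf(c, 0.0, 0.6)
    + 0.2 * normal_pdf(c, 1.5, 0.3)
    + 0.2 * normal_pdf(c, -1.5, 0.3)
    for c in centers
]

dev_uni = deviation_from_normal(centers, unimodal)
dev_multi = deviation_from_normal(centers, multimodal)
```

Under these assumptions the single-mode density yields a score close to zero, while the density with side modes yields a clearly larger one, which is the direction of the contrast reported for Spanish and English.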
Now turning to the behavior of the L2 populations, the expectation is that L2+NS
speakers would have already developed a distribution that would demonstrate the
multimodal character shown in the case of the English speakers, whereas the L2-NS will
not have developed additional modes and yet still appear normally distributed. Observing
the graphs in the lower half of Figure 30, these general expectations are shown to be
confirmed. The distribution for the L2-NS, while showing more variability than the
speakers of Spanish, is still largely normal in shape and could in fact be drawn from a
single normal distribution. In fact, the deviation from normal is actually slightly less than
that of Spanish, at .0343, although the standard deviation is .981, which is greater than
that seen in Spanish. In contrast, the L2+NS population on the bottom right shows some
of the jagged protrusions characteristic of English. The deviation from normal of this
distribution is considerably higher than that of L2-NS, increasing to .0586, and the
standard deviation is 1.00.
Figure 30. Probability density distributions for vowel ratios for all speaker groups
The bar graph presented below in Figure 31 gives an overview of the deviation from a
normal distribution, where it can be noted that the Spanish and the L2-NS pattern
together, as do the L2+NS and the English, with the latter groups as further away from a
normal distribution due to the presence of multiple modes.
Figure 31. Deviation from normal distribution for all speaker groups
Based on these results, it is reasonable to hypothesize that there is a local (in the
sense of adjacent vowels) rhythmic attractor for both Spanish and English but that the
attractor layouts differ in the two languages. The hypothesis of a unimodal attractor in the
case of Spanish would account for the fact that the distribution of the duration ratio of
adjacent vowels appears to be well fit by a normal distribution about equal vowel
durations. However, the attractor layout for English is likely multimodal. In particular
there seem to be additional modes where the log of the duration ratio is between 1 and 2
or between -1 and -2. These would correspond to vowel sequences where the duration
ratio is between roughly 3 to 1 and 7 to 1 – a ratio that clearly can be operative in English in the
case of vowels in reduced syllables adjacent to vowels in full syllables. This
multimodality influences the choice of the control parameter in the prominence potential
function to a value that is consistent with placing relative prominence on the final or non-
final word. And complementarily, the unimodal nature around equality of the duration of
adjacent vowels limits the ability to place extra durational prominence arbitrarily on one or
another of the adjacent vowels. This lack of flexibility is seen to be responsible for the
results where prominence cannot be retracted from final position. This interpretation is
further bolstered by the results of the L2 speakers. Those L2ers who indeed can place
NS non-finally, i.e. those who have developed a bimodal attractor in the RPP, are shown
here to have also developed a modal structure in the duration of adjacent vowels that
approximates that of English. However, those L2ers who have not developed this
bimodal attractor show a different distribution of the duration ratio, where their speech
shows a unimodal attractor distribution regardless of the fact that they are speaking
English. What remains to be completed within this analysis is to probe more precisely the
nature of the underlying attractor for the duration ratio, in particular in the case of the
English speakers (ENC and L2), which will be pursued in future work (see Chapter 5).
Unlike other measurement techniques such as VarcoV presented above, the
distribution modeled in this chapter gives a more complete picture of the systematic
structure in the pattern of variation of the relationship between vowel durations of
successive syllables, even when that relationship is not one of equality. But like VarcoV,
it focuses on the vowel durations themselves and is not confounded by the number and
nature of consonants and consonant clusters in the language.
7. Discussion and conclusion
This chapter has reviewed rhythmic production behavior of English and Spanish
monolinguals and L2English speakers as measured with a more global voicing indicator
(VR) and then forced-alignment vowel duration values and the vowel-specific ratio
measure (VowR), and finally the modeled vowel adjacency distributions. These
measurements provided an overall picture of the particular mode of organization that the
L2er starts with (Spanish) and the target mode of organization towards which the acquirer
is aimed (English). Furthermore, both the group and individual results provided insight as
to the nature of the acquisition process where rhythm is concerned, with critical
fluctuations being observed for those speakers who are not yet native-like. And most
importantly these results positively contribute to the hypothesized relationship between
elements in the prosodic hierarchy: reorganization at the rhythmic level is a necessary
condition for reorganization at the phrasal level.
However, neither are these differences in vowel duration isolated events, nor are
they rendered in isolation. They compose a speech act, a prosodic production, and these
must be contextualized within the event-making context, which is the focus of the
following chapter. As shown in the group results from the FTA in section 5.2, it is not
enough to have acquired English-like durational values of vowels, as evidenced by the
differences in vowel durations between the L2+NS and the L2-NS groups. While it is a
necessary condition to have acquired an English-like distribution of content primary
vowels of much longer duration than function vowels, the sufficient condition is to be
able to coordinate this discrepancy in duration in a meaningful way. Now, the hypothesis
given in (32), restated here as (43), is more pointedly addressed:
(43) L1Spanish/L2English speakers with English NS in their speech will have acquired
the English-like distribution of the coordination of vowel durations.
Chapter 4 presents a repetition task experiment designed to test the above
hypothesis in its examination of the nature of the organization of vocalic units into larger
prosodic units in an effort to further expose the organizational relationship between
prosodic events at different levels.
Chapter 4. Experiment 3: Coordination
1. Introduction
In the previous chapter it was shown that L2 speakers with English NS in their
speech also demonstrated a within-system distribution of vowel durations for content and
function words that was not significantly different from the within-system distribution of
the native English speakers. However, some L2 speakers without English NS in their
speech also had long content vowel durations and short function vowel durations, which
are expected targets in English. So while the appropriate ratio is a necessary condition for
acquiring the necessary asymmetry in order to create a phrase-internal prosodic pattern, it
is not a sufficient condition. The focus of this chapter is to test what is hypothesized to be
the sufficient condition: the coordination of adjacent durations into larger prosodic units.
This hypothesis is tested with a repetition task experiment, where language-specific
preferences for forming rhythmic units are expected to emerge in response to task
demands. A review of the literature relevant to this area of research is given before
moving on to the details of the experiment.
2. Coordination and rhythmic organization
This introduction to research concerning rhythmic organization and prosodic
nested subsystems begins with a review of work that has attempted to model this relation
and describe its consequences for the action that takes place as part of the speech act.
2.1 Theories and models of rhythmic organization
In their own investigation into the mechanics underlying cross-linguistic rhythmic
differences, O’Dell and Nieminen (1999) propose a dynamical, coupled oscillator model
of the differences between stress-timed and syllable-timed languages, in an effort to build
past the purely descriptive mathematical formulas used by Ramus et al., Grabe et al., etc.
(see Chapter 3). Examples of coupled oscillators that are used when modeling biological
rhythmic behavior posit the operation of subrhythms that, if observed in isolation,
demonstrate simple oscillatory behavior. However, when said subrhythms are folded into
larger systems where additional, present subrhythms exert influence on one another, the
pattern of rhythm that is yielded as a consequence can be more complex than that of the
individual oscillators that compose it.
Here the origin of the observation of coupled harmonic oscillatory behavior is
briefly mentioned. Seventeenth century Dutch physicist, mathematician, and horologist
Christiaan Huygens was credited with many discoveries in his lifetime, among them
the observation of what he termed an “odd kind of sympathy”. While working on a
design for clocks intended for naval navigation, he noticed that pendula mounted side by
side became synchronous with one another over time. This observation that single
oscillators with independent cycles could “couple” marked the first observation of
coupled oscillation.
The particular application of coupled oscillators for speech rhythm is tied to
O’Dell & Nieminen’s assumption that in every language there is a stress group oscillator
and a syllable oscillator and that the two oscillators are coupled by a function dependent
on the number of syllables per stress group. Each oscillator is modeled by a differential
equation, defining a dynamical system. The function of the stress group oscillator is to
keep the duration of the foot constant, and that of the syllable oscillator to keep the
duration of the syllable constant. The authors propose that the differences between
languages described as stress-timed and those described as syllable-timed lie in the
“relative influence” between stress groups and syllables, where stress groups “dominate”
in stress-timed languages but not in syllable-timed languages, i.e. the coupling from foot to
syllable dominates the coupling from syllable to foot.
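The idea of relative influence between the two oscillators can be illustrated with a toy calculation. The formula below is an assumption introduced here for illustration and is not presented as O’Dell & Nieminen’s own equations: the stress group frequency is taken to be a weighted compromise between the stress oscillator's preferred frequency and the frequency implied by the syllable oscillator, with a hypothetical coupling ratio setting how strongly the stress group dominates. The function name, parameter names, and frequencies are likewise hypothetical.

```python
def stress_group_duration(n_syllables, coupling_ratio, stress_freq_hz, syllable_freq_hz):
    """Duration (s) of a stress group under a weighted compromise between two oscillators.

    coupling_ratio = 0 means the syllable oscillator fully determines timing;
    a very large coupling_ratio means the stress group oscillator dominates.
    """
    if n_syllables < 1 or coupling_ratio < 0:
        raise ValueError("need at least one syllable and a non-negative coupling ratio")
    return (coupling_ratio + n_syllables) / (
        coupling_ratio * stress_freq_hz + syllable_freq_hz
    )

# Hypothetical preferred rates: 2 stress groups per second, 5 syllables per second.
syllable_dominant = [stress_group_duration(n, 0.0, 2.0, 5.0) for n in range(1, 5)]
stress_dominant = [stress_group_duration(n, 1000.0, 2.0, 5.0) for n in range(1, 5)]
```

With the coupling ratio at zero, the stress group lengthens by one full syllable period for every added syllable, as in an idealized syllable-timed pattern. With a very large ratio, the stress group duration stays close to the stress oscillator's preferred period regardless of syllable count, as in an idealized stress-timed pattern.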
The implication in terms of modeling is that in languages such as English where
stressed syllables are significantly longer than unstressed syllables, the oscillator
associated with the stress group would include a “stress function” whose purpose is to
slow down the syllable associated with stress at the relevant point in the cycle. A clock
whose range spans across individual syllable oscillations, in turn, allows for the modeling
of compression, or polysyllabic shortening, by which the duration of a stressed syllable
decreases in response to a larger number of overall syllables within a particular stress
group. However, in the case of syllable-timed languages such as Spanish, the stress group
oscillator, where the stress function is operative, would be virtually impotent, and as a
consequence the syllable to syllable durations remain relatively fixed across a longer
stretch of syllables. And as follows, the inertness of the stress group oscillator in the case
of Spanish also speaks to the lack of compression (polysyllabic shortening) as a function
of syllable number increase (see section 2.2 below for a discussion of experimental work
that addresses this question). While O’Dell and Nieminen do not explicitly model the
asymmetry in duration between stressed and unstressed syllables or the details of
polysyllabic shortening, their model does correctly predict the cross-linguistic differences
in the relative duration of feet as a function of differences in strength of coupling between
syllable and stress oscillators.
The approach to the general oscillator model by Barbosa (2000, 2001, 2002) is
similar to that proposed by O’Dell and Nieminen, except for the explicit modeling of the
interaction between the stress and syllable oscillators and “higher level linguistic input
(semantic, syntactic, and lexical components) …” (Barbosa 2002: 4), on the one hand,
and gestures in the lexicon, on the other. In general terms, the greater the coupling
strength in the model between the stress and syllable oscillators, the more stress-timed a
language is.
The idea that the between-level coupling is responsible for cross-linguistic rhythmic
variation has also been explored in the work of Saltzman et al. (2008), whose models of
nested syllable and foot, and nested foot and phrase oscillators have reinforced the
hypothesis that these nested rhythmic systems can account for a prosodic hierarchy where
articulatory patterns are shaped by the coordinative relations among levels. This analysis
results in a representation of “an utterance’s central ‘clock’ as a mutually entrained
oscillatory ensemble” (Saltzman et al. 2008: 183).
In the analysis proposed by Saltzman et al. the stress function is actually modeled,
whereas O’Dell and Nieminen only suggest its operation and what it is purported to be
responsible for. This stress function is modeled with the use of a µT-gesture, whose
purpose is to modulate, and in this case slow, the rate of the overall flow of phase in the
syllable oscillators during the primary stressed syllable of polysyllabic words. This
allows for the symmetry of syllables nested within feet to be broken, yielding the
characteristic foot pattern of longer stressed than unstressed syllables. The authors also
present a similar model of polysyllabic shortening, where stressed syllables shorten in
response to the increase of syllables within a foot unit.
All of the work mentioned above in the area of modeling has focused on addressing
the relation among syllable durations in an adjacency context, and the resulting temporal
behavior. Below we review how this relation has been probed in experimental work.
2.2 Experimental work
This section begins with a discussion on polysyllabic shortening with examples of
cross-linguistic differences in this behavior. A widely circulated proposal in the literature
is that English shows polysyllabic shortening, also known as stress-timed shortening
(Beckman & Edwards 1990), of the stressed syllable when more unstressed syllables are
added to a particular rhythmic unit. Kim & Cole (2006) measured the interstress
intervals, mean vowel duration, and normalized vowel duration of native English
productions from the Boston University Radio News corpus and found that English
indeed shows shortening of the stressed syllable as a function of number of syllables in
the “within ip” (intermediate phrase) prosodic context.
Hoequist (1983) used a reiterant speech task to measure and compare syllable
durations in English, Spanish, and Japanese. English was the only language in the study
found to exhibit shortening of stressed syllables as the number of following unstressed
syllables in the word increased. Spanish and Japanese showed no such shortening effects.
Additional studies on Spanish also corroborate the finding that there is no reduction of
vowel duration in response to an increase in the number of syllables (Pamies 1999, Font
& Mestre 1991).
These and related findings have led a number of researchers to conclude that the
durationally-modulated foot is the relevant unit of timing in English (Tajima et al. 1997),
and that this unit mediates between individual syllable durations and broader phrasal
patterns. These are exactly the ideas motivating the work by Cummins & Port (1998) and
Cummins (2002), which explores rhythm as an organizational principle whose effect is in the
realm of prosodic structure (Cummins & Port 1998: 145).
In their 1998 study, Cummins & Port use a repetition task to probe the relation
between overall patterns and the components that comprise them. Participants were asked
to align the first word in the phrase with a repeating high tone and the last word with a
repeating low tone as part of the repetition task. The time from high tone to low tone was
continuously varied. Results from the native English speakers showed that they
responded to this continuous variation by exhibiting a stable preference for grouping
words into evenly spaced intervals that nested an integral number of times within the
phrase repetition cycle. They divided the phrase cycle into either two or three equal
intervals depending on the task parameters. “In the presence of repetition the speech
production system necessarily becomes coordinated, such that a higher level dynamic
emerges within which the timing of subordinate processes are constrained” (Cummins &
Port 1998: 23). However, results from native speakers of Italian and Spanish for the same
study (reported in Cummins 2002) revealed that speakers’ efforts were unsuccessful in
terms of task compliance, and they did not reveal any systematic differences as a function
of the duration from high to low tones. Most often they demonstrated a single,
simple rhythmic pattern. “… the obvious inference to be drawn was that the stress foot,
which enables English speakers to coordinate the relative timing of stresses within the
Phrase Repetition Cycle, was simply not available to these speakers as a unit, despite the
existence of lexical stress in their language.” (2002: 4) Thus it is possible to draw the
following conclusion based on Cummins’ report: the key difference between speakers of
English, Italian, and Spanish is in the flexibility to divide a temporal span into a variable
number of sub-intervals. Speakers of English can divide the span into either two or three
subintervals depending on the task parameters, but speakers of Italian and Spanish appear
to only be able to divide the span into two, although even this is not entirely clear based
on the scanty report of the results in Cummins 2002. Since none of the details of Spanish
are given, this possible conclusion is taken as a hypothesis to be tested in the present
experiment, in which what will also be evaluated is whether the development of this
flexibility in terms of dividing temporal units into intervals is a sufficient condition in L2
speakers to acquire nuclear stress.
This leads to the hypothesis that there is a systematic relation between the
flexibility to subdivide rhythmic intervals and the existence of an active foot oscillator.
As noted earlier, the organization into feet requires producing asymmetric syllables
within a foot-sized unit, and the flexibility to subdivide the rhythmic interval can be the
basis for, or a reflection of, this asymmetry. So for example in producing a rhythmic
sequence of iambic feet in English, the interval between feet can be seen in this case as
being divided into three, where the unstressed syllable occupies one of those three
temporal units, and the following stressed syllable occupies two. The allocation of
durational intervals in this way can also be seen as the coordination of adjacent vowel
durations. In Spanish, there is no stress-related modulation of syllable durations; therefore
such flexible division is not operable, and there is effectively no coordination of adjacent
durations.
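The division of a foot interval described above amounts to simple proportional arithmetic, sketched here with a hypothetical foot duration of 450 ms. The function name and values are illustrative assumptions only.

```python
def divide_foot(foot_ms, units_per_syllable):
    """Split a foot interval among its syllables in proportion to their temporal units."""
    total_units = sum(units_per_syllable)
    if total_units <= 0:
        raise ValueError("the foot must contain at least one temporal unit")
    return [foot_ms * units / total_units for units in units_per_syllable]

# Iambic foot: the unstressed syllable takes one of three units, the stressed takes two.
iamb = divide_foot(450.0, [1, 2])
# No stress-related modulation: the two syllables share the interval equally.
even = divide_foot(450.0, [1, 1])
```

In the first case the interval is divided into three units and allocated asymmetrically; in the second it is divided into two equal parts, which corresponds to the absence of coordinated asymmetry between adjacent durations.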
Due to this connection between divisions into multiple rhythmic intervals and
coordination of adjacent vowel durations, it should be possible to use a variety of the
Cummins & Port task to obtain a quantitative measure that reveals the operation of a foot
oscillator in English and the lack of it in Spanish. The goal of Experiment 3 is to obtain
that measure and to test whether the value of that measure for L2 speakers predicts their
nuclear stress behavior.
3. Experiment 3: Repetition task
Experiment 3 was designed to reveal the realization of the language-specific
organization of monosyllabic words into larger rhythmic units. Specifically, a
repetition task was used to probe the coordination of vowel durations given different task
demands, a syllable condition and a foot condition, and a variation in a control parameter,
speech rate.
The results from the data using the Forced Alignment Technique showed that
there were some L2 speakers with content-to-function vowel ratio values that fell within
the range of the ratios for ENC speakers, yet who had not acquired English NS (see
Figure 28, Chapter 3). The proposal laid out in Chapter 3 was that acquiring English-like
vowel durations, i.e. vowel reduction in the case of function words and non-primary
content word vowels, and longer content vowel durations, is a necessary rhythmic
condition for acquiring English NS, but not a sufficient one. Hence a hypothesis
was presented in (43) that is restated below as (44):
(44) L1Spanish/L2English speakers with English NS in their speech will have acquired
the English-like coordination of vowel durations, resulting in adjustments to the
successive syllables in the foot condition.
In concrete terms, this hypothesis proposes that L2 speakers will have acquired the
coordination of vowel durations into meaningful units, essentially the foot, that span
across rhythmic events that are broader than the syllable event in isolation. In order to test
this hypothesis, ENC, monolingual Spanish speakers, and members from both L2 groups
(L2+NS and L2-NS) were tested on a repetition task whose details are given below.
Participants responded to two task demands, one of which required synchronizing each
word of a two word phrase with a regular metronome beat (the “syllable” condition), and
a second, which required synchronizing the entire phrase with a regular metronome beat
(the “foot” condition). The predictions that correspond to these two conditions are given
below in (45) and (46).
(45) Speakers from all groups are expected to exhibit similar behavior in the syllable
condition, namely that the ratio of vowel durations for the two words in the phrase
is expected to have a value of around one. Relatedly, speakers from all groups are
expected to divide the interval of time between successive phrases into equal
intervals, one for each word of the phrase.
(46) In the foot condition, the ENC group is expected to form a foot, resulting in
vowel duration ratios either substantially greater than one or substantially less
than one, depending on foot pattern preference. Relatedly, such patterns involve
dividing the time between successive phrases into unequal intervals. The Spanish
speakers, by contrast, are not expected to vary their vowel duration ratios from what is
seen in the syllable condition, i.e. the ratio should remain around one, and they are
expected to divide the phrase into two equal intervals. L2+NS speakers are expected to
pattern with the ENC speakers, and L2-NS speakers with the Spanish speakers.
3.1 Methods
3.1.1 Participants
The third of a series of three experiments was designed and run after Experiments
1 and 2 had already been completed, in order to test the hypothesis that the coordination
of adjacent vowel durations into a larger rhythmic unit is a sufficient condition for
acquiring NS. Four members of the original ENC group participated as part of the
English control group, and eight of the original L2ers. Of the eight L2ers, four were +NS
and four were –NS. Three monolingual Spanish speakers of Mexican Spanish were
recruited for the experiment.
Due to the deviation in behavior of ENC speaker number 3 when compared with
the other speakers, particularly notable in the ratio and DtoT measures (presented in
detail below), this speaker was excluded from the group data results, presented below in
section 3.3. Data from only two monolingual Spanish speakers are presented in both the
individual and group results, as the data from one speaker were extremely variable and
this speaker did not seem to be reliably aligning productions with the metronome.
3.1.2 Design and stimuli
Three conditions, word pair condition, rhythmic unit condition, and speech rate
condition, were manipulated as part of the design. In the word pair condition, four
possible combinations of content and function words were employed, all of which were
homophonous, as given below in Table 6. The word pair condition was designed to see
whether a difference in lexical category sequence across the pairs would result in a
different relation between vowel durations such that for a function-content or content-
function pair the durational difference between vowels is expected to be greater than in
the case of pairs with words from the same lexical category. While it is expected that this
might be observed in English, since the vowels in function words are generally reduced,
this is not expected to be observed in Spanish.
Table 6. Word pair stimuli
Language   Function-Function   Function-Content   Content-Function   Content-Content
English    do to               do two             due to             due two
Spanish    de te               de té              dé te              dé té
The rhythmic unit condition varied according to whether participants were asked to align
each word in the word pair with a single click, the syllable condition, or whether they
were asked to align two words with one click, the foot condition.
The speech rate condition refers to the time between the clicks used to guide
speakers’ production. In the syllable condition, each click occurred 400 milliseconds (ms)
apart for the first 40 repetitions (the “before” speech rate), after this point in the trial the
time between clicks was decreased by 5 ms for every subsequent repetition (for a total of
80 repetitions), with the final repetitions occurring 200 ms apart (the “after” speech rate).
In the foot condition, clicks occurred 600 ms apart for the first 40 repetitions in the before
condition, and the time between clicks then decreased by 5 ms per repetition until reaching
400 ms apart in the after condition (a total of 80 repetitions).
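The click schedule just described can be sketched in a few lines of Python. This is an illustrative reconstruction only, not the configuration actually used to run the experiment: the function name `click_intervals` and its parameters are hypothetical, and the sketch assumes that the interval shortens by one 5 ms step on every repetition after the 40th.

```python
def click_intervals(start_ms, step_ms=5, n_before=40, n_total=80):
    """Return the inter-click interval (ms) for each repetition of a trial.

    Assumption: the interval is constant for the first n_before repetitions
    and then shrinks by step_ms on every later repetition.
    """
    intervals = []
    for rep in range(1, n_total + 1):
        steps = max(0, rep - n_before)  # number of decrements applied so far
        intervals.append(start_ms - steps * step_ms)
    return intervals


syllable = click_intervals(400)  # syllable condition: starts at 400 ms
foot = click_intervals(600)      # foot condition: starts at 600 ms
print(syllable[0], syllable[39], syllable[40], syllable[-1])
print(foot[0], foot[39], foot[40], foot[-1])
```

Under these assumptions, 40 decrements of 5 ms account for the 200 ms difference between the before and after rates in both conditions.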
3.1.3 Procedure
Participants were tested in two separate experiment sessions; one test session
corresponded to the foot condition, and the other corresponded to the syllable condition,
with one week between testing sessions. Half of the participants participated first in the
syllable condition, and half participated first in the foot condition. Clicks were
transmitted through a pair of standard earphones. Participants faced a computer screen
where the word pairs were presented, and their responses were recorded using a head
mounted microphone. Participants received compensation for their participation.
Both experiments were programmed and run using the Magda software (Tiede
2009). There were sixteen target trials, with four repetitions of each word pair described
above in section 3.1.2, Table 6. There were 16 filler trials, also representing four different
lexical category pairings each presented four times (see appendix for complete list), for a
total of 32 trials. Trials were presented in four blocks of eight trials each. There was a
pause, whose length was controlled by the participant, between each block. The order of
the target trials was randomized within block, and the experiment began with a filler trial,
with fillers flanking each target trial.
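One way to picture the trial structure is the following Python sketch. It is not the script used in the experiment. It assumes, beyond what the text states, that every block contains each target pair and each filler pair exactly once and that "flanking" amounts to strict filler-target alternation starting with a filler. The function name `build_trial_list` and the filler labels are placeholders.

```python
import random

WORD_PAIRS = ["do to", "do two", "due to", "due two"]  # English targets, Table 6


def build_trial_list(targets, fillers, n_blocks=4, seed=0):
    """Build blocks that alternate filler and target trials.

    Assumptions (not stated in the text): each block holds every target and
    every filler once, and each target is preceded by a filler.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block_targets = list(targets)
        block_fillers = list(fillers)
        rng.shuffle(block_targets)  # target order randomized within block
        rng.shuffle(block_fillers)
        block = []
        for filler, target in zip(block_fillers, block_targets):
            block.append(("filler", filler))
            block.append(("target", target))
        blocks.append(block)
    return blocks


# Placeholder filler labels; the actual fillers are listed in the appendix.
FILLERS = ["filler A", "filler B", "filler C", "filler D"]
blocks = build_trial_list(WORD_PAIRS, FILLERS)
trials = [trial for block in blocks for trial in block]
print(len(blocks), len(trials), trials[0][0])
```

With four targets and four fillers per block, this yields four blocks of eight trials, 32 trials in total, beginning with a filler.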
Instructions appeared on the screen before the beginning of each experiment. In
the syllable condition, participants were instructed that they would hear a repeating
sequence of clicks and that they should align each word with a click. Both words
appeared simultaneously on the screen, and a box that was synchronized with the click
highlighted each word in succession. In the foot condition, participants were instructed
that they would hear a repeating sequence of clicks and that they should align both words
with a click. Both words appeared simultaneously on the screen, and a box that was
synchronized with the click highlighted both words simultaneously.
3.1.4 Measurements
Broad phonetic transcriptions were made of each trial for all participants. The
same Forced Text Alignment (FTA) technique described in section 5 of Chapter 3 was
used to align these transcriptions to the speech recordings for the current experiment. The
segment-level acoustic models used to find boundaries between segments were the same
as those used for “The North Wind and the Sun” text. Based on these transcriptions, three
measurements were made on the vowels in the data set: the ratio of the duration of the
vowel in the first word of the pair (either one of the forms of “do” or one of the forms of
“de”) over the duration of the vowel in the second word of the pair (either a “to” form or
a “te” form), the duration of the interval between the release of the [d] consonant up to
the release of the following [d] consonant (this will be referred to as “DtoD”), and the
duration of the interval between the release of the [d] consonant to the release of the
following [t] consonant (“DtoT”).
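The three measurements can be stated compactly in code. The sketch below is a minimal illustration in Python, assuming that the aligned boundaries for one repetition are available as time stamps in seconds. The `Repetition` class and all of its field names are hypothetical and do not reflect the actual output format of the forced aligner.

```python
from dataclasses import dataclass


@dataclass
class Repetition:
    """Segment times (seconds) for one repetition of a word pair.

    All field names are hypothetical stand-ins for aligner boundary labels.
    """
    d_release: float       # release of [d] in the first word
    v1_start: float        # onset of the first vowel
    v1_end: float          # offset of the first vowel
    t_release: float       # release of [t] in the second word
    v2_start: float        # onset of the second vowel
    v2_end: float          # offset of the second vowel
    next_d_release: float  # release of [d] in the following repetition


def vowel_ratio(r):
    """Duration of the first vowel over the duration of the second vowel."""
    return (r.v1_end - r.v1_start) / (r.v2_end - r.v2_start)


def d_to_d(r):
    """Interval from one [d] release to the next [d] release (DtoD)."""
    return r.next_d_release - r.d_release


def d_to_t(r):
    """Interval from the [d] release to the following [t] release (DtoT)."""
    return r.t_release - r.d_release


# Toy values for an iambic-looking repetition, not data from the study.
example = Repetition(0.00, 0.02, 0.10, 0.20, 0.22, 0.38, 1.20)
print(vowel_ratio(example), d_to_d(example), d_to_t(example))
```

A ratio below one corresponds to a longer second vowel (iambic pattern), and a ratio above one to a longer first vowel (trochaic pattern).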
3.1.5 Statistics
Separate two-way factorial analyses of variance (ANOVA) were run on the data,
one for the foot condition and one for the syllable condition, with language group as one
independent variable (ENC, monolingual Spanish, L2+NS, L2-NS) and speech rate
(before rate increase, after rate increase) as the second independent variable.
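For readers unfamiliar with the analysis, the following Python sketch shows how the F statistics of a two-way factorial ANOVA are computed. It uses the textbook formulas for a balanced design (equal numbers of observations per cell), which is a simplifying assumption. The statistical software and the handling of unequal cell sizes in the present study are not reproduced here, and the function name `two_way_anova_f` is hypothetical.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way factorial design.

    cells maps (group, rate) -> list of observations; every cell must hold
    the same number of observations (n >= 2).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(obs) != n for obs in cells.values()):
        raise ValueError("balanced design with n >= 2 per cell required")

    grand = mean(x for obs in cells.values() for x in obs)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}
    mean_a = {a: mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = len(levels_b) * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = len(levels_a) * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a for b in levels_b
    )
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs
    )

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    ms_within = ss_within / (len(levels_a) * len(levels_b) * (n - 1))
    return {
        "group": (ss_a / df_a) / ms_within,
        "rate": (ss_b / df_b) / ms_within,
        "interaction": (ss_ab / (df_a * df_b)) / ms_within,
    }


# Toy numbers chosen for easy arithmetic, not data from the study.
toy = {
    ("g1", "before"): [1.0, 3.0], ("g1", "after"): [2.0, 4.0],
    ("g2", "before"): [5.0, 7.0], ("g2", "after"): [10.0, 12.0],
}
result = two_way_anova_f(toy)
print(result)
```

The sketch stops at the F statistics; converting them to p-values requires the F distribution, which a statistics package would normally supply.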
3.2 Results compared across speaker groups
A two-way factorial ANOVA with speech rate and lexical category trial type as
independent variables was run within each speaker group. Results showed that lexical
category pair was not significant for any of the speaker groups (see appendix for details).
Thus all results discussed are pooled across trial type.
The group results are presented here to determine overall patterns and differences
that characterize speaker groups and speak to the hypothesis laid out in the beginning of
section 3. There is noise in the data resulting in outliers that most likely reflect problems
with the algorithm as part of the forced alignment process. The data were therefore
trimmed at cutoff values determined by examining the box plots for individual
combinations of participant and condition. All the trimmed values were outside the
interquartile range. The cutoff values were the same across subjects but varied
according to syllable or foot condition. The cutoff values are given in Table 7 below.
Table 7. Trimmed values for all speaker groups and task conditions
Speaker group         Ratio      Ratio    DtoD            DtoD           DtoT       DtoT
                      syllable   foot     syllable        foot           syllable   foot
Monolingual English   > 4        > 4      > 1.3, < .35    > 1.7          > .7       > .7
Monolingual Spanish   > 5.5      > 3      > 1.2           > 1.8          > .78      > .9
L2English             > 4        > 4.2    > 1.47          > 1.8, < .6    > .7       > .95, < .15
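The trimming step amounts to discarding observations beyond fixed cutoffs. A minimal Python sketch follows. The function name `trim` is hypothetical, the example values are invented for illustration, and the cutoffs themselves were set by inspecting box plots rather than by any rule encoded here.

```python
def trim(values, upper=None, lower=None):
    """Drop observations above `upper` or below `lower` (either may be None)."""
    kept = []
    for v in values:
        if upper is not None and v > upper:
            continue
        if lower is not None and v < lower:
            continue
        kept.append(v)
    return kept


# Toy ratio values, not data from the study; 4 is the ratio cutoff that
# Table 7 lists for the monolingual English group.
ratios = [0.9, 1.1, 4.6, 1.0, 7.2]
trimmed = trim(ratios, upper=4)
print(trimmed)
```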
In all the figures presented below, the y-axes are given in milliseconds. As can be
seen in Figure 32, for the before speech rate in the syllable condition the ratio value for
all groups is just above one: ENC mean = 1.014, Spanish mean
= 1.2085, L2+NS mean = 1.1218, L2-NS group mean = 1.4594. In the after speech rate,
the ratio values for all groups decrease, as expected, in response to task demands. At this
rate, the values of the ENC, Spanish, and L2+NS groups remain around one (means
= 1.063, 1.164, and 1.105, respectively; L2-NS mean = 1.377), meaning that they are keeping
the durations of the vowels equal in response to task demands.
There was a main effect of speech rate (p < .001) and a main effect of language
group (p < .001); there was no significant interaction (p = .314). The post-hoc Scheffe
tests revealed that the three English-speaking groups are significantly different from each
other both before and after. In addition, the Spanish speakers differed significantly from
the ENC and the L2-NS speakers but were only significantly different from the L2+NS
speakers in the before condition.
Figure 32. Group results of ratio values for syllable condition
In the case of the ratio values for the foot condition in the before speech rate, as predicted
the emergence of the foot is indeed observable in English, where the second vowel of the
word pair is much longer than the first vowel in the pair (iambic pattern), which gives a
mean less than one (mean = 0.526). However, in the case of the Spanish speakers, the
vowels are again kept roughly the same length (mean = 0.968), as in the syllable
condition. The L2+NS as a group also shows the same pattern as in the syllable condition
(mean = 1.1165), but as will be shown in the individual results presented in section 3.3
below, three of the four did exhibit mean values below one. The L2-NS as a group have a
mean well over one (mean = 2.0129). While the details of this analysis are not pursued in
this section, it is speculated that this high mean is a result of greater variability in the
data, which likely speaks to the system being in a heightened state of fluctuation.
In the case of the ratio value for the foot condition, the two-way factorial
ANOVA revealed a significant main effect for language group (p < .001), but there was
no significant main effect for speech rate condition (p = .125). There was a significant
interaction between language group and speech rate condition (p < .001). Post-hoc
Scheffe tests revealed that all speaker groups were significantly different from one
another in both speech rate conditions.
Figure 33. Group results of ratio values for foot condition
The DtoD measure is a way of keeping track of whether the participants
performed the task in the intended manner, and no particular speaker group differences
were predicted. For this measure in the syllable condition, all speaker groups are keeping
the monosyllabic words of roughly the same length, equaling the 800 ms between two
clicks (ENC mean= 0.80058, Spanish mean = 0.79224, L2+NS mean = 0.78686, L2-NS
mean = 0.80483). In the after rate there is an overall decrease in duration in response to
the decrease in time between clicks. There was a significant main effect of speech rate (p
< .001) but not of speaker group (p = .0197); there was an interaction (p < .001).
Figure 34. Group results of DtoD values for syllable condition
In the DtoD foot condition, the DtoD measure is twice as long as the click interval, which
is 600 ms: ENC mean = 1.204, Spanish mean = 1.205, L2+NS mean = 1.1, L2-NS mean =
1.2. Listening to the sound files reveals that all groups are producing one repetition of
the phrase for every two clicks. However, even though the durations are the same across
groups, the DtoT data presented below will show that the groups are filling the interval
differently. There was a main effect of language (p < .001) and a main effect of speech
rate (p < .001), and an interaction (p < .001). Post-hoc Scheffe tests revealed that the
L2+NS have a shorter value for DtoD, which differs significantly from the other
speaker groups.
Figure 35. Group results of DtoD values for foot condition
An important difference emerges between rhythmic unit conditions in the case of the
DtoT measure. In the syllable condition, all speaker groups have values around 400 ms,
(ENC mean = 0.3514, Spanish mean = 0.4, L2+NS mean = 0.378, L2-NS mean = 0.402)
meaning that each monosyllabic word is produced within the click interval. There was a
main effect of language (p < .001) and a main effect of speech rate (p < .001), and an
interaction (p < .001). Post-hoc Scheffe tests revealed the following patterns of
significance for the before condition: ENC was significantly different from all groups,
Spanish was significantly different from all except L2-NS, L2+NS and L2-NS were
significantly different from each other. The differences in the after condition were
smaller and the significance patterns can be found in the appendix.
Figure 36. Group results of DtoT values for syllable condition
However, in the foot condition, a different pattern emerges. As alluded to above, speaker
groups are dividing the 1200 ms interval between two clicks differently. The ENC
speakers are producing both words within one click interval (as can be seen from their
DtoT value of approximately 250 ms), and then pausing on the next click. The Spanish
speakers are dividing the interval roughly equally (DtoT is approximately 500 ms), and
saying one word with each click, i.e. they are turning the foot task into a syllable task.
The L2English speakers pattern with the ENC speakers’ strategy.
Thus, the ENC DtoT value is much lower than in the syllable condition,
consistent with the observation made above that the first vowel is shortened when
speakers form feet (mean = 0.206). In the case of Spanish, however, the value does not
lower, but rather is kept largely the same as in the syllable condition (mean = 0.4663).
The L2+NS value remains lower than 400 ms (mean = 0.39969), whereas in the case of
the L2-NS, the value is well above 400 ms (mean = 0.48506). There was a main effect of
speech rate condition (p < .001) and a main effect of language (p < .001); there was also a
significant interaction (p < .001). Post-hoc Scheffe tests revealed that all groups are
significantly different from each other in both the before and after conditions.
Figure 37. Group results of DtoT values for foot condition
The significant difference between the L2+NS and the L2-NS in the foot
condition shows the relation between the ability to form a foot in this rhythmic task and
their behavior in prominence placement. The L2+NS has significantly more shortening
from “do” to “to” than native Spanish speakers, although not as much as the ENC. On the
other hand, the L2-NS speakers shift the timing of “to” even later with respect to “do”
than do Spanish speakers. Again, this may reflect some kind of fluctuation in this less
proficient speaker group. Importantly, however, in order to test whether successful foot
formation in this rhythmic task is a sufficient condition for a speaker to produce native-
like nuclear stress, it is important to break down the L2 groups and look at the individual
results. Therefore, in the next section the data for individual participants for all speaker
groups will be broken down.
3.3 Individual results
The individual results for the ENC, Spanish, and L2English for all measurements,
ratio, DtoD, and DtoT, are presented and discussed below.
3.3.1 ENC individual results
The results for the ENC individual vowel duration ratios for the syllable condition
are shown in the figure below. As can be seen in Figure 38, two out of the four ENC
participants show a ratio of around one for this condition, meaning that the duration of
the first and second vowels were kept roughly equal before the rate increase and after the
rate increase. Subject number three has a ratio well above one in both the before and after
conditions, while subject one has a ratio just under one.
Figure 38. ENC individual results for vowel duration ratio in syllable condition
In the foot condition, whose results are shown in the figure below, all four ENC speakers
show a pattern that is consistent with creating a durationally-modulated foot: three
speakers show an iambic pattern (second vowel longer than the first), and speaker three
exhibits a trochaic pattern (first vowel longer than the second).
Figure 39. ENC individual results for vowel duration ratio in foot condition
In the DtoD syllable condition, all speakers produce both monosyllabic words within 800
ms in the before rate, as expected given that this is the time between two clicks in this
condition. In the after condition, the intervals are shorter, again reflecting shorter time
between clicks in the after condition, which on average is 700 instead of 800 ms.
Figure 40. ENC individual results for DtoD in syllable condition
All ENC speakers are producing 1200 ms between successive phrases in the before rate
for the foot condition. This pattern of production means that participants produced one
repetition of the phrase per two-click interval. In the after condition, this interval drops to
1000 ms, which indeed is the average time between every other click in the after
condition.
Figure 41. ENC individual results for DtoD in foot condition
For the DtoT measurement in the syllable condition, all four of the English speakers
show the same pattern, where the time between consonants is just under the time between
clicks, 400ms, which means that they are aligning each monosyllabic word with the click.
In the after condition, the interval is shorter, reflecting the shorter interval between clicks.
Figure 42. ENC individual results for DtoT in syllable condition
For the DtoT measurement in the foot condition, ENC speakers 1, 2, and 4 are producing
values around 200 ms in the before rate, which is about one third of the inter-click
interval for this condition, 600 ms. This means that they are indeed dividing the click
interval into non-equal sub parts in this condition. However, this is not the pattern
observed in the case of speaker number 3, who does not appear to be sub-dividing the
interval, instead aligning each word to a click.
Figure 43. ENC individual results for DtoT in foot condition
3.3.2 Spanish individual results
Turning first to the ratio results for the syllable condition, both Spanish speakers
have ratios just over one for both the before and after speech rate.
Figure 44. Spanish individual results for ratio in syllable condition
In the foot condition, both speakers had ratios right around one for both the before and
after speech rates. This is in sharp contrast to the behavior of the ENC speakers for this
condition.
Figure 45. Spanish individual results for ratio in foot condition
In the syllable condition for the DtoD, both speakers are producing one word per click in
the before speech rate, as were ENC speakers.
Figure 46. Spanish individual results for DtoD in syllable condition
In the foot condition for this measure, both speakers produced one repetition of the
phrase per two-click interval, as did the ENC speakers. This is shown by the 1200 ms
between phrases in the before condition, and the 1000 ms between phrases in the after
condition.
Figure 47. Spanish individual results for DtoD in foot condition
In the syllable condition for the DtoT measure, both speakers produce this interval
between clicks, around 400 ms, in the before condition.
Figure 48. Spanish individual results for DtoT in syllable condition
In the foot condition for the DtoT measure, one participant is maintaining the same
interval as that in the syllable condition, around 400 ms. The second participant is
producing an interval of around 600 ms, which speaks to this participant’s strategy of
producing one word per click.
Figure 49. Spanish individual results for DtoT in foot condition
3.3.3 L2English individual results
What is of particular interest here is the relationship between the participant’s
behavior in the before rate in the foot condition and their nuclear stress behavior. In
particular, it is of interest whether all of the speakers who form feet the way that English
speakers do (longer second vowel) also exhibit nuclear stress, i.e. are +NS, and likewise
whether only speakers who form feet in this way exhibit nuclear stress. The measures that
revealed foot formation in English are the vowel duration ratio and the DtoT measure,
and those are the results that will be considered here.
The L2+NS speakers are participants number 2, 4, 7, and 8, and participants 1, 3,
5, and 6 comprise the L2-NS group. The data for speaker 7 were excluded in the syllable
condition due to a high degree of variability and problems with the forced alignment, and
speaker 6 did not participate in the foot condition.
For the syllable condition, the ratio values for the L2+NS speakers (participants 2,
4, 8) are on par with the values of the ENC for the same condition in the before speech
rate, which are around one. However, the remaining speakers, all L2-NS, show ratio values
greater than one for the before rate, which is also the pattern for the monolingual Spanish
speakers for this condition and speech rate. There was a main effect of speaker (p < .001)
and speech rate (p < .001), and there was an interaction (p < .001). All speakers were
different from each other and the +NS had lower values than the others.
Figure 50. L2English individual results for ratio in syllable condition
In the foot condition, the ratio values for L2+NS participants 2, 4, and 8 are considerably
lower than the rest of the L2ers in the before speech rate, and are likewise on par with the
ratio values of the ENC for this condition. The values of the rest of the participants are
well above one, as was the case with the Spanish speakers for this condition as well.
The ANOVA revealed a main effect of speaker (p < .001), but not of speech rate (p =
.727); there was a significant interaction (p < .001). All speakers were different from each
other and the +NS had lower values than the others.
Figure 51. L2English individual results for ratio in foot condition
In the DtoT measure for the syllable condition, all participants produced one word per
click during the before rate, with L2+NS speakers number 4 and 8 in particular showing
values lower than 400 ms, as was the case with the ENC speakers. The two-factor
ANOVA reveals a main effect of speaker (p < .001) and speech rate (p < .001), and there
is an interaction (p < .001). The post-hoc Scheffe results reveal that participants also
group according to NS behavior, with +NS speakers being significantly different from
-NS speakers, except for participant 8 who is significantly different from all other
participants.
Figure 52. L2English individual results for DtoT in syllable condition
In the foot condition for the DtoT measure, L2+NS speakers show the lowest values
within the L2 group in the before speech rate, with speakers number 2 and 4 in particular
exhibiting values lower than 400 ms, as do the ENC for this condition. There was a main
effect of speaker (p < .001) and speech rate (p < .001), and a significant interaction (p <
.001). Post-hoc Scheffe tests show that all the participants are significantly different from
one another, and that all the +NS participants (numbers 2, 4, 8) have values that are lower
than the other speakers, except for speaker 1 who has a value intermediate between the
+NS and the –NS values (5 out of 8 target-like prosodic patterns was considered +NS,
and this speaker had 4 out of 8).
Figure 53. L2English individual results for DtoT in foot condition
An examination of the individual results of the L2ers has revealed reliable
differences between the speakers in the foot condition. As observed with the post-hoc
tests, individual speakers give a continuum of values, and they divide into groups based
on their NS behavior in the foot condition, as clearly shown in the case of the DtoT
measure. In the ratio measure this pattern was almost the same except for participant 1,
whose intermediate value was closer to that of the +NS speakers. It should be noted that
participant 1 was on the edge of being considered +NS, as their nuclear stress score was 4
out of 8, and the cut off for +NS was 5 out of 8.
While there was also a significant effect in the syllable condition, this is
attributable to just one participant who is more variable than the others in the ratio
measure, and for the DtoT measure the speakers do not group into +NS and -NS in a
simple way. These results speak to something close to an “all and only” condition, i.e. all
L2+NS and only L2+NS have these values.
3.4 Modeling of sub-interval flexibility
Here the relation between flexibility in both the rhythmic and phrasal realms can
be observed by examining participants’ behavior regarding the flexibility of dividing the
temporal interval of the phrase cycle and their flexibility with nuclear stress placement. It
was hypothesized that a sufficient condition for acquiring nuclear stress was the prior
acquisition of the flexibility in coordination of adjacent vowel durations. This hypothesis
speaks to how as values change continuously along a microparameter, a change in the
macroparameter at the phrasal level is realized. We are now in a position to observe this
interaction with data from individual speakers. In order to examine the issue of the
different ways – or lack thereof – of dividing durations into sub-intervals in English, the
first step is to combine for any given subject the data from the foot and syllable
conditions in order to reveal whether they have bimodal or unimodal distributions for
these measures, as revealed by the proportion of a phrase cycle that is occupied by the
first word. This measure is calculated with the ratio of DtoT over DtoD.
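The proportion measure and a rough check for bimodality can be illustrated as follows. This Python sketch is only an aid to the reader. The peak-counting heuristic is an assumption introduced here for illustration and is not the procedure used to assess the distributions in the figures, and the function names and example values are hypothetical.

```python
def first_word_proportion(dtot, dtod):
    """Proportion of the phrase cycle occupied by the first word."""
    return dtot / dtod


def histogram(values, n_bins=10, lo=0.0, hi=1.0):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            continue  # ignore out-of-range values
        index = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def count_peaks(counts):
    """Crude mode count: bins higher than the left neighbour and at least as
    high as the right neighbour. A heuristic, not a statistical test."""
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if c > left and c >= right:
            peaks += 1
    return peaks


# Toy (DtoT, DtoD) pairs in seconds, not data from the study.
foot = [(0.21, 1.20), (0.25, 1.20), (0.27, 1.20), (0.30, 1.20)]
syllable = [(0.37, 0.80), (0.41, 0.80), (0.43, 0.80), (0.45, 0.80)]

pooled = [first_word_proportion(t, d) for t, d in foot + syllable]
only_syllable = [first_word_proportion(t, d) for t, d in syllable]
n_pooled = count_peaks(histogram(pooled))
n_syllable = count_peaks(histogram(only_syllable))
print(n_pooled, n_syllable)
```

In the toy example, pooling foot-like and syllable-like repetitions gives two peaks (near .2 and near .5), whereas a speaker who divides the interval the same way in both conditions would show a single peak near .5.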
Results from the individual monolingual English speakers are shown below in
Figure 54. Speaker three is again excluded from the analysis, as mentioned above.
Speakers 1, 2, and 4 clearly show a bimodal distribution, which speaks to differences
between the foot and syllable conditions in terms of the proportion of first word as part of
entire phrase cycle. These results show that speakers clearly can divide the interval either
symmetrically (syllable condition), or asymmetrically (foot condition) by beginning the
two-word interval a quarter of the way through the phrase cycle. Speakers can divide the
interval between phrases either into two or into four parts. The results from these English
speakers are consistent with the hypothesis that flexibility in the rhythmic domain is
mirrored in flexibility in the phrasal domain.
Figure 54. Distribution of English individual speakers ratio of DtoT over DtoD
In the case of the Spanish speakers, the expectation was that they would not
exhibit the same degree of flexibility as the English speakers, and would divide the
interval only into two parts. For Spanish speaker 2, the distribution is clearly unimodal,
speaking to the fact they have one way of dividing the interval. However, the other
Spanish speaker does appear to exhibit bimodal behavior. No speakers were pre-screened
regarding musical ability. In the debriefing period after the second test session, Spanish
speaker 1 mentioned that as a musician, he much enjoyed the task because of its
similarity to musical entrainment techniques. Thus, it is hypothesized that in the case of
this speaker his behavior was more a reflection of treating this as a musical task than as a
speech task, leveraging his musical background as a strong ability to divide sub-intervals
flexibly. Of course there is no English data for this participant, but it
could be hypothesized that this speaker would be a good candidate for early acquisition
of flexible phrasal prominence. Based on what is known about the background of the
participants, this was the only participant that could be considered a serious musician.
Cummins & Port (1998) also found that there was a correlation between participant
behavior in the repetition task and background in music. Those authors did not prescreen
participants but found that their performance in the task grouped participants in terms of
musical background, and as a result the authors then ran a separate experiment with
musical background as a condition. Thus, it is reasonable to maintain that Spanish is
linguistically inflexible with regard to this microscopic parameter.
Figure 55. Distribution of Spanish individual speakers ratio of DtoT over DtoD
Results for the L2 speakers are shown in Figure 56 below. Again, the data for
speakers 6 and 7 are not included here because in both cases there is only data from one
of the tasks. As can be seen in the figure, participants 3 and 5 exhibit unimodal
distributions, while participants 1, 2, and 4 exhibit bimodal distributions. The hypothesis
that acquiring this flexibility is a sufficient condition for acquiring flexibility at the
phrasal prominence level receives some support here. Speakers 2 and 4 have clear
bimodal distributions and are also speakers with flexible nuclear stress placement.
Speaker 1 shows a bimodal distribution, and while not classed as +NS as mentioned
above, this speaker was on the border, with a 4 out of 8 nuclear stress score. Additionally
it must be considered that there was at least one year between the two tasks (the Q&A
dialogue task and the repetition task), and this speaker continued to live in an English-
speaking environment and possibly increased generally in proficiency. Speakers 3 and 5
show the unimodal distribution and are among those speakers without flexible nuclear
stress placement. Speaker 8 indeed has flexible nuclear stress but does not show a clear
bimodal distribution. This is not a counterexample to the hypothesis that
this flexibility is a sufficient condition, but it would be a counterexample if the flexibility
were claimed to be a necessary condition. There may be other variables or control parameters that lead to
the bimodal distribution of phrasal prominence. Additionally, however, even though this
speaker does not exhibit bimodal distribution, the speaker clearly has flexibility in
dividing the interval into a different number of ways, although not necessarily discretely
in two ways. Note that the distribution of speaker 8 is wider than that of speakers 3 and 5
(which are narrowly distributed around .5), so that speaker 8 has a large proportion of
observations that have values substantially less than .5. An additional, potentially
relevant fact about speaker 8 is that they revealed only after the repetition task that they
are a bilingual speaker of Catalan, but did not share this information at the time of the
Q&A. Catalan, like English, has vowel reduction (although more limited in scope than
English) but, unlike English, does not have flexible nuclear stress. All of the hypotheses
that have been put forth thus far have been based on duration, i.e. other factors such as
fundamental frequency have not been examined as part of this study. So it is possible that
Catalan already has a system of prominence not based on the duration in the same way as
English, which leads to a situation where it would be difficult to predict Catalan
speakers’ performance with regards to this task.
Figure 56. Distribution of L2English individual speakers ratio of DtoT over DtoD
4. Discussion and conclusion
In English, there is flexibility at the microscopic level in dividing a repeating
temporal interval in more than one way, and at the macroscopic level there is flexible
phrasal prominence location. On the one hand, there seems to be a predictive relationship
between these two flexibilities in acquisition, such that if you have flexibility at the
microscopic level you should also have flexibility at the macroscopic level. But the
question is why this relation should exist, and how it can be formally captured.
Relative prominence placement in English occurs both phrase-finally and with
non-final constituents. Being able to place prominence in more than one location
implies different ways of dividing the temporal intervals in which the prominence
falls: differential prominence placement requires dividing the temporal intervals
associated with the words or syllables in different ways. The amount of the sentence
cycle allocated to the prominent word when it is non-final is different from the amount
allocated when it is final. If the temporal interval can only be divided into two parts,
as in Spanish, then it is not possible to get the patterns of discrepancy that allow for
the non-final phrasal patterns.
In theory, the sentence from the Q&A experiment presented in Chapter 1 could be
divided into temporal intervals where different patterns of intervals in the final and non-
final cases could be seen. This can be formalized by again returning to the value of R in
the RPP parameter, which in English has a value of zero. One way of enforcing this
relation between the microscopic and macroscopic phrasal potential function is to
propose there is also a nonlinear dynamical attractor at the microscopic level which in
turn is governed by a parameter like R, which we could call Rint (R interval). But the
actual development of the formal model of these distributions is not attempted here. The
hypothesis is that one of the determinants of RPP is the value of Rint for the potential
function: the control parameter of the phrasal potential is yoked to the attractor that
governs the division of the durational interval (the temporal subdivision). Again, the
quantitative part of working this out in detail is not pursued here. But if this were the only
function governing RPP, then it could not account for the results of L2 participant 8, for
example. Therefore, it may be necessary to assume either that there are other variables
upon which RPP can also depend, or that the relevant parameter is not the modality
but rather the percentage of observations substantially below .5.
The hypothesis that was tested here with Experiment 3 was that the sufficient
condition for acquiring nuclear stress in English was the coordination of adjacent vowel
durations. It was shown that in the phrase repetition task, the ENC speakers divide the
temporal interval between two clicks into quarters: the first monosyllabic word, for
example “do”, is produced in the first quarter, the second word, “to”, is produced in
the middle two quarters, and the final quarter is a rest before the next production.
Another way of looking at this is that the ENC speakers are putting the two words together
between the two clicks, which is good evidence that they are controlling the coordination
of these events in a joint way. This is a very different pattern from the one observed in the
case of the Spanish speakers, who produce one word per click and essentially
divide this same two-click interval into only two parts, as opposed to four in the case
of the ENC speakers. An additional and relevant difference is that the ENC speakers make iambic feet
(second vowel longer than the first), while the Spanish speakers form trochaic feet.
This difference actually provides additional support to the overall hypothesis because if
there were no evidence that the Spanish speakers could form trochees, then the argument
could be made that Spanish is incapable of any kind of non-final prominence at all –
which is clearly not the case.
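The interval subdivision described above can be illustrated with a small sketch. It assumes that the measure plotted in Figures 54 to 56 (DtoT over DtoD) is the onset-to-onset interval from “do” to “to” divided by the onset-to-onset interval from that “do” to the next “do”; this reading, the function name, and the toy onset times are illustrative assumptions, not the measurement procedure used in the experiment.

```python
def subdivision_ratios(do_onsets, to_onsets):
    """Return the assumed DtoT/DtoD ratio for each repetition cycle.

    do_onsets: onset times (s) of successive "do" tokens
    to_onsets: onset times (s) of the "to" token following each "do"
    """
    ratios = []
    for i in range(len(do_onsets) - 1):
        d_to_t = to_onsets[i] - do_onsets[i]      # "do" onset to "to" onset
        d_to_d = do_onsets[i + 1] - do_onsets[i]  # "do" onset to next "do" onset
        ratios.append(d_to_t / d_to_d)
    return ratios


# Toy data (hypothetical): both words inside one click interval,
# with "to" starting a quarter of the way through the cycle.
english_like = subdivision_ratios([0.0, 1.0, 2.0], [0.25, 1.25])

# Toy data (hypothetical): one word per click, so "to" falls halfway
# between successive "do" tokens.
spanish_like = subdivision_ratios([0.0, 2.0, 4.0], [1.0, 3.0])
```

Under these assumptions, a speaker who only halves the interval yields ratios clustered at .5, while a speaker who also uses the quarter division yields a second cluster below .5, which is the kind of bimodality examined in the individual distributions.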
The results presented here do seem to provide some support for the hypothesis
that the coordination of adjacent vowel durations is a sufficient condition for the
acquisition of nuclear stress in English, while at the same time indicating that this might
not be the only sufficient condition, i.e. that other paths could exist to nuclear stress
acquisition, as evidenced by the results from speaker 8 presented above.
Chapter 5. Conclusion
The experimental data presented in this work contributes to our understanding of
the acquisition of target-like nuclear pitch accent placement and prominence patterns.
The advantage of the current work is in large part due to the successful isolation of the
components that contribute to the specific patterns under study, and more importantly
how prominence placement interacts with other aspects of temporal organization. Support
was found for the hypothesis that English monolinguals and second language English
speakers whose first language is Spanish would differ regarding prominence placement
for certain wide-focus contexts, due to the difference in the modes of organization for
prominence placement realization in English and Spanish. It was then hypothesized that
those L2English speakers whose prominence placement was English-like would
necessarily exhibit certain rhythmic properties characteristic of English rhythm. While
support was also found for the necessary condition hypothesis, it was further proposed
that a sufficient condition for the acquisition of nuclear stress in English was the
acquisition of the coordination of durational properties of adjacent vowels. This
hypothesis was partially borne out, although it was also suggested that there could be
other sufficient conditions that compose the path to nuclear stress acquisition.
One of the properties of self-organizing systems much touted in this work is that
moving from one native language to competence in the L2 involves a process by which
the speakers may show increased fluctuations with respect to any of the stable
distributions that characterize either their L1 or the L2 they are moving towards. While
this is a general property of self-organizing systems it is important to see how this
property plays out in the specific modeling of the phrasal prominence and its relation to
temporal interval decomposition that has been developed in this work. In the model
developed here, the critical control parameter with respect to phrasal prominence that
differentiates ENC and less advanced L2 speakers is the relative prominence potential,
which shows a bistable potential function for L1 speakers of English and a unimodal one for
both Spanish speakers and less advanced L2 speakers. It is therefore important to show that, as the
parameter that controls the shape of this potential shifts from the unimodal value to the
bimodal value, the distribution of relative prominence for utterances of exactly the
same type shows fluctuations from instance to instance.
It was proposed in Chapter 4 that the process of acquisition of English-like
phrasal prominence can be modeled as a gradual evolution of the R parameter that
controls the symmetry of the Relative Prominence Potential (see Chapter 2, equation in
(34)) from the Spanish-like value of 1, defining a unimodal potential function to an
English-like value of 0, defining a bimodal function. The evolving value of R is set, in
turn, by the speaker’s rhythmic behavior, i.e. the ability to flexibly subdivide temporal
intervals, as discussed in Chapter 4. To demonstrate the effect of this ongoing evolution
on the stability of the system at any stage, the value of R was manipulated in a series of
five steps from the Spanish-like value of 1 to the English-like value of 0. At each step, a
composite potential function was calculated by adding the RPP to the Argumenthood
Potential (Chapter 2, equation in (35)), whose X_0 parameter is set to -1 for the
unaccusative construction, so as to attract the relative prominence to a negative value,
indicating greater prominence of a non-final constituent. (Recall that the order parameter
for the proposed dynamical system is the log of the ratio (prominence of the final
constituent) / (prominence of the internal constituent). When the log is negative, the ratio
is less than one, meaning that the prominence of the internal constituent is greater than
the prominence of the final one.) The relative weight of the Argumenthood Potential, α,
was set to .26, which is about a quarter as strong as the RPP.
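The order parameter and the composite potential just described can be written compactly. This is a notational sketch only: the symbols x, Prom, V_RPP and V_Arg are labels assumed here for convenience, and the functional forms of equations (34) and (35) in Chapter 2 are not reproduced.

```latex
% Order parameter: log ratio of final to internal prominence
x = \log\frac{\mathrm{Prom}_{\mathrm{final}}}{\mathrm{Prom}_{\mathrm{internal}}},
\qquad
x < 0 \iff \mathrm{Prom}_{\mathrm{internal}} > \mathrm{Prom}_{\mathrm{final}}

% Composite potential: RPP plus weighted Argumenthood Potential
V(x; R) = V_{\mathrm{RPP}}(x; R) + \alpha\, V_{\mathrm{Arg}}(x; X_0),
\qquad \alpha = .26, \quad X_0 = -1
```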
The top panels of Figure 57 show the composite potential functions for each step.
When R = 1, the weaker Argumenthood function is not able to budge the strongly
asymmetric RPP, so the composite potential exhibits a final prominence attractor. As the
value of R decreases towards 0, the composite becomes bistable and ultimately shows an
attractor for non-final prominence, when R = 0.
Figure 57. Composite potential functions
The system behavior at each step was probed by finding the probability density of
the dynamical system outputs under conditions of noise. To do so, 300 simulations of
each system were run, with random starting positions, and Gaussian noise of mean = 0
and standard deviation = 10 added to the force (the derivative of the potential) at every
time step. The duration of each simulation was 1000 time steps. The bottom panels of
Figure 57 show the histogram of the 300 final states of the simulations. As can be seen,
there is a mode in the outputs of 1 when R = 1, and a mode of -1 when R = 0. At
intermediate values, the distribution appears multimodal. To relate these simulations to a
categorical measure of final vs. non-final prominence (such as obtained in Experiment 1),
the probability that the system’s output has a value less than 0 was calculated, as these
are the outputs that correspond to more relative prominence phrase-internally. This
probability (P(int)) is given at the top of each panel. For R = 1, P(int) = .13, which means
that the system stably produces final prominence, with 13% exceptions (due to noise).
This value is very close to what is seen in Chapter 2 for the L2 group as a whole, for pre-
final prominence in the unaccusative condition. For R = 0, P(int) = .91, which indicates
stable pre-final prominence, and is close to the value of 97% reported for the ENC group.
For a value of R=.5, however, the system exhibits fluctuating behavior. P(int) = .54,
which means that the system will exhibit near-random behavior, vacillating between final
and non-final prominence. These predicted stability and fluctuation patterns map well
onto observed individual L2 behavior. L2 speaker 1 in Experiment 4 produced a
somewhat bimodal distribution of the rhythm measure (subdivision of the repetition
interval), though not as clearly bimodal as speakers 2 and 4. Speakers 2 and 4 produced
7 out of 8 combined unaccusative and compound tokens with non-final prominence (prob
= .87), which is close to the value of P(int) = .91 seen here for R = 0. Speaker 1, however,
exhibited fluctuation in the combined unaccusative and compound task, producing 4 out
of 8 with pre-final prominence, corresponding to a probability of .5, which is quite close
to the .543 value obtained for R = .5.
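The simulation procedure described above (300 noisy runs of 1000 time steps each, followed by the proportion of final states below 0) can be sketched as follows. The potential used here is a stand-in and is NOT equation (34) or (35) from Chapter 2: it assumes a tilted double-well RPP, x^4/4 - x^2/2 - R·x, and a quadratic Argumenthood Potential centered on X_0. The Euler step size and the range of starting positions are also assumptions, so the probabilities it produces are not expected to match the values reported in the text; only the qualitative shift from final to non-final prominence as R decreases is illustrated.

```python
import random

ALPHA = 0.26  # relative weight of the Argumenthood Potential (from the text)
X0 = -1.0     # attractor location for the unaccusative construction (from the text)


def force(x, r):
    """Negative derivative of the ASSUMED composite potential.

    Assumed RPP:            V_rpp(x) = x**4/4 - x**2/2 - r*x
    Assumed Argumenthood:   V_arg(x) = (x - X0)**2 / 2
    """
    d_rpp = x ** 3 - x - r
    d_arg = ALPHA * (x - X0)
    return -(d_rpp + d_arg)


def simulate(r, rng, n_steps=1000, dt=0.005, noise_sd=10.0):
    """Run one noisy trajectory and return its final state.

    Gaussian noise (mean 0, sd 10, as in the text) is added to the force
    at every time step; dt is an assumed Euler step size.
    """
    x = rng.uniform(-2.0, 2.0)  # random starting position (assumed range)
    for _ in range(n_steps):
        x += dt * (force(x, r) + rng.gauss(0.0, noise_sd))
    return x


def p_int(r, n_runs=300, seed=0):
    """Proportion of final states below 0, i.e. phrase-internal prominence."""
    rng = random.Random(seed)
    finals = [simulate(r, rng) for _ in range(n_runs)]
    return sum(1 for x in finals if x < 0) / n_runs
```

Calling `p_int(r)` for a series of values of R between 1 and 0 gives one probability per step, analogous to the P(int) value shown at the top of each panel in Figure 57.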
In summary, the dynamical model and transition proposed can account both for
the grammatical stability seen at the endpoints of the learning process, and also the
fluctuations seen as the system evolves. However, the quantitative connection between
the dynamics underlying rhythm and the dynamics underlying phrasal prominence is still
missing in the current version. Ultimately the goal is to equate the specific parameter that
regulates the bimodality of rhythm with the parameter that regulates the bimodality of
phrasal prominence. In the current work, it was demonstrated that individuals’
distributions of the rhythmic subintervals in the foot condition of the repetition task
experiment could be related to their behavior in phrasal prominence. However, the degree
of bimodality of their rhythmic behavior in the repetition task was not evaluated
quantitatively, and it is this degree of bimodality that ultimately needs to be related to
their phrasal prominence behavior. This is currently being pursued with the use of
Gaussian Mixture Models (GMMs) (Nabney 2002), a statistical method used for data
clustering and estimating probability densities. Relevant to the current work, GMMs can
be used to determine the degree to which the overall distribution of a set of data, for
example the subintervals of the repetition task, can be modeled as two Gaussians (two
normal distributions) with approximately equal probability density. The equality of these
densities is the condition that will be taken to be true bimodality. To the extent to which
one of these distributions has a substantially greater probability density than the other,
this would be a case of weaker bimodality. So the degree of bimodality can be quantified
as the difference in the probability density of the resulting model Gaussians. Further, this
density difference can be used to set the value of the R parameter in the phrasal
prominence potential function that determines the tilt of that potential function and
therefore its symmetry.
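The proposed bimodality measure can be sketched with a minimal two-component, one-dimensional Gaussian mixture fitted by expectation-maximization. This is a standalone illustration, not the toolbox cited above. It assumes that the "probability density" of each model Gaussian corresponds to its mixing weight, so that the degree of bimodality is the absolute difference between the two weights (0 for perfectly balanced modes, approaching 1 when one mode dominates). The function names and the synthetic ratio data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(data, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(data)
    lo, hi = min(data), max(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # spread initial means
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means and variances
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)  # floor keeps the density finite
    return weights, means, variances


def weight_difference(data):
    """Assumed bimodality index: |w1 - w2| of the fitted mixture."""
    weights, _, _ = fit_two_gaussians(data)
    return abs(weights[0] - weights[1])


# Synthetic subdivision ratios (hypothetical, for illustration only)
rng = random.Random(1)
balanced = [rng.gauss(0.25, 0.03) for _ in range(200)] + \
           [rng.gauss(0.50, 0.03) for _ in range(200)]
lopsided = [rng.gauss(0.25, 0.03) for _ in range(40)] + \
           [rng.gauss(0.50, 0.03) for _ in range(360)]
```

On data like these, `weight_difference(balanced)` should be small and `weight_difference(lopsided)` large. Mapping that difference onto a value of R would require an additional scaling assumption that the text leaves open.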
This work as a whole contributes uniquely to the field of linguistics and the area
of second language acquisition. Broadly, this dissertation is the first work (to my
knowledge) to use quantitative data in modeling the acquisition process within the
complex systems/dynamic systems theory. The specific contributions are outlined
below.
Richness of the database. As has been mentioned throughout this work, research
on the second language acquisition of prosody has suffered neglect despite its
importance, and the current investigation makes a solid contribution to and provides an
inroad for burgeoning research agendas in this area. The contribution of this work as
among the first to look at the acquisition of both rhythm and phrasal prominence within
an L2 group was made possible in large part due to the size of the data set and the unique
use of forced alignment measurement techniques. Most of the studies reviewed in the
previous chapters use few speakers, and in many cases the data is hand-labeled. However,
in order to accurately assess the structure of variability, a large amount of data is needed
in order to see the structure of what is inherently variable. The current work has recruited
tools from the area of machine learning and applied them to a very different
domain with successful outcomes. Relatedly, most of the studies reviewed here have
relied primarily on measures to characterize rhythm that yield one or two specific
numbers pertaining to a specific value for a given language. However, these studies have
not looked at distributions of the data set – which contain a structure – due to the paucity
of the data, which does not allow for the respective histograms. The amount of data
presented here drawn from large speaker groups has allowed for a picture of the
underlying structure of variability, and in this way reducing the entire language to a
single number is avoided. A single number on its own is not informative about an
individual language, and can only be informative when compared with other individual
numbers that represent separate languages. A distribution, however, does indicate
structure and can stand on its own in terms of an informative representation of the
language.
Categories and their consequences. This work has contributed to linguistic theory
in granting a deeper understanding of prosodic structure through the use of empirical data
with a unique means of probing the relationship among prosodic events and the
relationship between prosodic categories and their continuous physical manifestations. A
core issue in the field of linguistics is understanding the relation between qualitative
categories and change. Here, this relation has been modeled with quantitative data where
it has been shown that over time (represented here through the acquisition process) a
change can be observed that results in a qualitatively different state. This was modeled
with the continuous durational data from speech groups whose variation yielded
qualitatively different distributions.
Modeling organization. An additional outcome from the resulting dynamical
model proposed here is that it allows for predictions about the fluctuations that the system
should exhibit. From the model-derived potential functions, predictions were made
regarding the percent values of phrasal prominence for bimodality and the temporal
interval task. The importance of having a model that is both quantitative and sufficiently
abstract is that it provides a means to reveal categorical structure and at the same time is
specific enough to predict states of and changes in modes of organization.
The fabric of this work leaves many threads to follow. In order to gain a more
complete understanding of the acquisition process as it relates to the flexibility explored
here, the current work will be extended as a bidirectional study, including a test
population of first language English speakers learning Spanish. Since these speakers will
be acquiring word order flexibility and forfeiting the durational interval flexibility
characteristic of English rhythm, this acquisition process grants additional insight into the
reorganization of modes of organization. An additional population of early bilinguals
should also be included as part of a more global research project in order to understand
how the organization of speakers who simultaneously acquire two modes of organization
might differ from those whose acquisition is sequential.
Different approaches to modeling prosodic structure were reviewed in Chapter 4.
Apart from the coupled oscillator model, it was also mentioned that in the current version
of the text-to-speech synthesis application TaDA incipient modeling of prosody has been
advanced. The current work can add to this modeling agenda in proposing at what point
the foot node would need to be coupled to the phrasal cycle. Currently, the model
includes a stress function, modeled using a µT-gesture, that slows the overall flow of the
phase in the syllable oscillators of a stressed syllable. However, in order for this to yield
the desired consequence at the phrasal level, it is necessary to know when in the phrasal
cycle the foot gesture should be activated. The current work provides insight into
where the “right place” for a foot node would be in English, and suggests that in order to produce
English prosody this is the relevant coordination to acquire. In the case of the TaDA
model for Spanish, the options are that either no foot nodes exist or the coupling strength is
kept weak. TaDA in the current version goes in the direction of text-to-speech, but
work is already being done to go from speech-to-text as well. So given an input of
speech, the model would provide a representation of the articulators involved and their
respective coordination. Observing how real speech is modeled with different sentence
types in both Spanish and English, with both monolingual and second language speakers,
informs decisions about the specific architecture of both the speech-to-text and text-to-
speech models.
Additionally, other aspects of prosody that are not considered here will have to be
addressed as part of continuing research. A limitation of the current work is that the
relation between rhythm and phrasal events is only considered in terms of duration, and it
must be determined how pitch contributes to this overall relation. While it can be argued
(as it has been here) that the relative duration properties are the key elements in the
puzzle of NS acquisition, complete acquisition of prominence structure may include pitch
gestures and their distribution.
Finally, the work presented here culminates in suggesting a correlation between
the acquisition of rhythmic and phrasal events in the speech of the acquirer. However, it
is always dangerous to infer causality from correlation. With this in mind, I plan to also
design a task to examine this relationship more directly with the hopes of providing direct
support to the ideas suggested by the correlation. In theory, if this correlation does exist,
then speakers who do not show evidence for nuclear stress in their speech (as determined
by the Q&A task or a related task) could be exposed to a “rhythmic intervention” task
much like the repetition task presented in Chapter 4. During the task they would
essentially be trained to break the vowel-to-vowel durational symmetry and develop the
necessary flexibility that characterizes the foot and larger rhythmic units in English. If the
relation holds, then the same speakers should produce nuclear stress in their speech after
a determined amount of time of training using the intervention task.
References
Abercrombie, D. (1967). Elements of general phonetics. Chicago: Aldine.
Adams, C. & R. R. Munro. (1978). In search of the Acoustic Correlates of Stress:
Fundamental Frequency, Amplitude, and Duration in the Connected Utterance of
Some Native and Non-Native Speakers of English. Phonetica 35, 125-156.
Archibald, J. (1997). The acquisition of L2 phrasal stress. In: Focus on Phonological
Acquisition. Edited by M. Young-Scholten & S.J. Hannah. John Benjamins.
Archibald, J. (1998). Second Language Phonology. Amsterdam: John Benjamins.
Archibald, J. (2000). Second Language Acquisition and Linguistic Theory. Oxford:
Blackwell.
Barbosa, P. A. (2002). Explaining cross-linguistic rhythmic variability via a coupled-
oscillator model of rhythm production. In: Proceedings of the 1st International
Conference on Speech Prosody. Aix-en-Provence, France.
Barbosa, P. A. (2007). From syntax to acoustic duration: A dynamical model of speech
rhythm production. Speech Communication 49, 725-742.
Backman, N. E. (1979). Intonation errors in second language pronunciation of
eight Spanish speaking adults learning English. Interlanguage Studies Bulletin 4,
239-266.
Baumann, S. & M. Grice. (2006). The Intonation of Accessibility. Journal of Pragmatics
38, 10, 1636-1657.
Beckman, M. & G. M. Ayers. (1994). Guidelines for ToBI Labelling. Online MS and
accompanying files available at http://www.ling.ohio-state.edu/~tobi/ame_tobi.
Beckman, M. E., Hirschberg, J., & S. Shattuck-Hufnagel. (2005). The original ToBI
system and the evolution of the ToBI framework. In: Prosodic Typology: The
Phonology of Intonation and Phrasing. Edited by S.-A. Jun. Oxford University
Press, pp. 9-54.
Berwick, R. C. (1985). The acquisition of syntactic knowledge. Cambridge: MIT Press.
Beckman, M. E. & J. R. Edwards. (1990). Lengthenings and shortenings and the nature
of prosodic constituency. In: Papers in laboratory phonology I: Between the
grammar and the physics of speech. Edited by J. Kingston & M. E. Beckman.
Cambridge: Cambridge University Press. 152-178.
Bolinger, D. (1954). English Prosodic Stress and Spanish Sentence Order. Hispania 37,
152-156.
Bolinger, D. (1964). Intonation as a universal. In: Proceedings of the Ninth International
Congress of Linguists. Edited by H. G. Lunt.
Bolinger, D. (1965). Forms of English. Edited by I. Abe & T. Kanekiyo. Cambridge,
Mass.
Bolinger, D. (1972). Accent is Predictable (if You are a Mind-Reader). Language 48,
633-644.
de Bot, K., Lowie, W. & M. Verspoor. (2005). Second Language Acquisition: An
Advanced Resource Book. London & New York: Routledge.
Buder, E. H. (1996). Dynamics of speech processes in dyadic interaction. In: Dynamic
patterns in communication processes. Edited by J. H. Watt & C. A. Vanlear.
Sage: Thousand Oaks, CA.
Büring, D. (2007). Intonation, Semantics and Information Structure. In: The Oxford
Handbook of Linguistic Interfaces. Edited by G. Ramchand & C. Reiss.
Byrd, D. (2000). Articulatory vowel lengthening and coordination at phrasal junctures.
Phonetica, 57, 3-16.
Calhoun, S. (2006). Information Structure and the Prosodic Structure of English: a
Probabilistic Relationship. Unpublished doctoral dissertation. University of
Edinburgh.
Carstairs-McCarthy, A. (1999). The Origins of Complex Language: An Inquiry into the
Evolutionary Beginnings of Sentences, Syllables, and Truth. Oxford University
Press.
Carter, P. M. (2005). Quantifying rhythmic differences between Spanish, English, and
Hispanic English. In: Theoretical and experimental approaches to romance
linguistics: Selected papers from the 34th linguistic symposium on romance
languages (Current issues in linguistic theory 272). Edited by R. S. Gess, & E. J.
Rubin. Amsterdam, Philadelphia: John Benjamins, pp. 63–75.
Chomsky, N. & M. Halle. (1968). The sound pattern of English. New York: Harper & Row.
Chomsky, N. (1971). Deep structure, surface structure, and semantic interpretation. In:
Semantics. Edited by D. Steinberg & L. Jakobovits. pp. 183-216.
Chun, D. M. (2002). Discourse Intonation in L2: From Theory and Research to
Practice, with accompanying CD-ROM. Amsterdam: John Benjamins.
Contreras, Heles (1976). El orden de las palabras en español. Madrid: Cátedra.
Cruttenden, A. (1997). Intonation. Cambridge University Press.
Culicover, P. & A. Nowak. (2003). Dynamical Grammar. Volume Two of Foundations of
Syntax. Oxford University Press, Oxford.
Cummins, F. (2002). Speech Rhythm and Rhythmic Taxonomy. In: Proceedings of
Prosody 2002, Aix-en-Provence, pp. 121-126.
Cummins, F. & R. Port. (1998). Rhythmic constraints on stress timing in English.
Journal of Phonetics, 26, 2, 145-171.
Dauer, R. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11,
51-62.
de la Mota, Carme (1995). La representación gramatical de la información nueva en el
discurso. Doctoral dissertation, Universitat Autónoma de Barcelona.
Delattre, P. (1962). A comparative study of declarative intonation in American English
and Spanish. Hispania 45, 233-241.
Delattre, P. (1969). An Acoustic and Articulatory Study of Vowel Reduction in Four
Languages. International Review of Applied Linguistics in Language Teaching, 7,
4, 294-325.
Dellwo, V., Fourcin, A. & E. Abberton. (2007). Rhythmical classification based on voice
parameters. International Conference of Phonetic Sciences (ICPhS).
Eckman, F. R., A. Elreyes & G. K. Iverson (2003). Some principles of second language
phonology. Second Language Research 19, 169-208.
Face, T. (2002). Local intonational marking of Spanish contrastive focus. Probus 14,
71-92.
Flege, J. E. (1987). The production of ‘new’ and ‘similar’ phones in a foreign language:
evidence for the effect of equivalence classification. Journal of Phonetics 15, 47-
65.
Flege, J. E. (1991). Age of learning affects the authenticity of voice onset time (VOT) in
stop consonants produced in a second language. Journal of the Acoustical Society
of America 89, 395-411.
Flege, J. E. (1995). Second-language speech learning: Theory, findings and problems. In:
Speech Perception and Linguistic Experience: Theoretical and Methodological
Issues in Cross-Language Speech Research. Edited by W. Strange. Timonium,
MD: York Press Inc, pp. 233-272.
Fokes, J. & Z. S. Bond. (1986). Non-native English speakers' stress patterns in words and
sentences. Human Communication Canada 10, 5-10.
Fokes, J. & Z. S. Bond. (1989). The vowels of stressed and unstressed syllables in non-
native English. Language Learning 39, 341-373.
Font, C. & R. Mestre. (1991). Compensatory shortening in Spanish spontaneous speech.
Proceedings of ESCA (Barcelona) 16, 1-5.
Fox, A. (2000). Prosodic Features and Prosodic Structure. The Phonology of
Suprasegmentals. Oxford University Press, Oxford.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech 1, 126-
152.
Gafos, A. & S. Benus. (2006). The dynamics of phonological cognition. Cognitive
Science 30, 5, 905-943.
Gibbon, D. & U. Gut. (2001). Measuring Speech Rhythm. In: Proceedings of
Eurospeech. Aalborg, 91-94.
Grabe, E., & Low, E. (2002). Durational Variability in Speech and the Rhythm Class
Hypothesis. In: Papers in Laboratory Phonology 7. Edited by C. Gussenhoven &
N. Warner. Berlin: Mouton de Gruyter, pp. 377–401.
Grabe, E. & P. Warren. (1995). Stress Shift: do speakers do it or do listeners use it? In:
Papers in Laboratory Phonology IV. Phonology and Phonetic Evidence. Edited by
B. Connell and A. Arvaniti. Cambridge: CUP.
Gussenhoven, C. (1983). Focus, Mode, and the Nucleus. Journal of Linguistics 19, 377-
417.
Gussenhoven, C. (1984). On the Grammar and Semantics of Sentence Accents.
Dordrecht: Foris.
Gut, U. (2003). Prosody in second language speech production: the role of the native
language. Fremdsprachen Lehren und Lernen 32, 133-152.
Gutiérrez-Díez, F. (2001). The Acquisition of English Syllable Timing by Native Spanish
Speakers Learners of English. An Empirical Study. International Journal of
English Studies 1, 1, 93-113.
Haken, H. (2004). Synergetics, Introduction and Advanced Topics. Berlin: Springer.
Hale, K. & J. Keyser. (2002). Prolegomena to a theory of argument structure. Linguistic
Inquiry Monograph, 39, Cambridge, MA: MIT Press.
Halle, M. & J.-R. Vergnaud. (1987). An essay on stress. Cambridge, MA: MIT Press.
Harris, J. (1983). Syllable Structure and Stress in Spanish: A Nonlinear Analysis.
Linguistic Inquiry Monograph 8, Cambridge, MA: MIT Press.
Harris, J. (1989). The Stress Erasure Convention and Cliticization in Spanish. Linguistic
Inquiry 20, 3.
Harris, J. (1992). Spanish Stress: The Extrametricality Issue. Indiana University
Linguistics Club, Indiana.
Hasegawa-Johnson, M., Chen, K., Cole, J., Borys, S., Kim, S-S., Cohen, A., Zhang, T.,
Choi, H. K., Yoon, T. & S. Chavarria. (2005). Simultaneous Recognition of Words
and Prosody in the Boston University Radio Speech Corpus. Speech
Communication 46, 3-4, 418-439.
Hayes, B. (1984). The phonology of rhythm in English. Linguistic Inquiry 15, 33–74.
Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies. Chicago: University of Chicago Press.
Hoequist, C. (1983). Syllable Duration in Stress-, Syllable- and Mora-Timed Languages.
Phonetica 40, 203-237.
Hoskins, S. (1996). A Phonetic Study of Broad and Narrow Focus in Intransitive
Verb Sentences. In: Proceedings of the Fourth International Conference on Spoken
Language Processing, October 3-6, Philadelphia, PA.
Hualde, J. I. (2002). Intonation in Spanish and the other Ibero-Romance languages:
Overview and status quaestionis. In: Romance Phonology and Variation.
Selected papers from the 30th Linguistic Symposium on Romance Languages,
Gainesville, Florida, February 2000. Edited by Caroline Wiltshire & Joaquim
Camps. Amsterdam: Benjamins, pp. 101-115.
Hualde, J. I. (2007). Stress removal and stress addition in Spanish. Journal of Portuguese
Linguistics 5.2/6.1, 59-89.
Hualde, J. I. (To appear). Secondary stress and stress clash in Spanish. In: The
Proceedings of Laboratory Approaches to Spanish Phonology 4.
Huebner, T. & C. Ferguson. (1991). Crosscurrents in Second Language Acquisition and
Linguistic Theories. Amsterdam/Philadelphia: John Benjamins Publishing
Company.
Huss, V. (1978). English word stress in the post-nuclear position. Phonetica 35, 86-105.
Ionin, T., Ko, H., & K. Wexler. (2004). Article semantics in L2-acquisition: the role of
specificity. Language Acquisition 12, 1, 3-69.
Inkelas, S. & D. Zec. (1990). Prosodically Constrained Syntax. In: The Phonology-Syntax
Connection. Edited by S. Inkelas & D. Zec. CSLI publications and the University
of Chicago Press.
Jenner, B. (1976). Interlanguage and foreign accent. Interlanguage Studies Bulletin 1, 2–3,
166–195.
Jilka, M. (2007). Different Manifestations and Perceptions of Foreign Accent in
Intonation. In: Non-Native Prosody - Phonetic Description and Teaching
Practice. Edited by J. Trouvain & U. Gut. Mouton De Gruyter, Berlin, pp. 77 –
96.
Jun, S.-A. & M. Oh. (2000). Acquisition of 2nd Language Intonation. In: Proceedings of
International Conference on Spoken Language Processing, Volume 4. Beijing,
China, pp. 76-79.
Kager, R. (1989). A Metrical Theory of Stress and Destressing in English and Dutch.
Linguistic Models, 14. Dordrecht: Foris Publications.
Kauffman, S. (1995). At Home in the Universe: The Search for Laws of Self-Organization
and Complexity. New York: Oxford University Press.
Kelm, O. R. (1987). An Acoustic Study on the Differences of contrastive Emphasis
Between Native and Non-Native Spanish Speakers. Hispania 70, 627-633.
Kelm, O. R. (1995). Acoustic measurement of Spanish and English pitch contours: native
and non-native speakers. Hispanic Linguistics 6–7, 435–448.
Kelso, J. (1995). Dynamic Patterns: The Self-Organization of Brain and Behavior.
Cambridge, MA.: MIT Press.
Kim, H. & J. Cole. (2006). Rhythmic shortening in American English. In: Proceedings of
42nd annual Meeting of the Chicago Linguistic Society.
Ladd, R. (1980). The Structure of Intonational Meaning: Evidence from English. Indiana
University Press, Bloomington.
Ladd, R. (1996). Intonational Phonology. Cambridge: Cambridge University Press.
Larsen-Freeman, D., & M. H. Long. (1991). An Introduction to Second Language
Acquisition Research. New York: Longman.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition.
Applied Linguistics 18, 2, 141–65.
Larsen-Freeman, D. & L. Cameron. (2006). Complex systems and applied linguistics.
Oxford University Press.
Leather, J. & A. James. (1991). The acquisition of second language speech. Studies
in Second Language Acquisition 13, 3, 305-341.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics 5, 253-263.
Levin, B. & M. Rappaport-Hovav. (1994). Unaccusativity: At the Syntax-Semantics
Interface. Cambridge, MA: MIT Press.
Liberman, M., & A. Prince. (1977). On stress and linguistic rhythm. Linguistic Inquiry
8, 249-336.
Lloyd James, A. (1940). Speech signals in telephony. London.
Mallows, D. (2002). Non-linearity and the observed lesson. ELT Journal 56, 1, 3-10.
Major, R.C. (1987). English voiceless stop production by speakers of Brazilian
Portuguese. Journal of Phonetics 15, 197-202.
Major, R.C. (1998). Interlanguage phonetics and phonology: An introduction. Studies in
Second Language Acquisition 20, 2, 131-137.
Major, R.C. (2001). Foreign accent: The ontogeny and phylogeny of second language
phonology. Mahwah, NJ: Lawrence Erlbaum Associates.
Mayoral-Hernandez, R. (2006). A variation study of verb types and subject position:
Verbs of light and sound emission. In: Romance Linguistics 2006. Selected
papers from the 36th Linguistic Symposium on Romance Languages (LSRL). New
Brunswick, March-April 2006. John Benjamins.
Meara, P. (2004). Modeling vocabulary loss. Applied Linguistics 25, 2, 137-155.
Meara, P. (2006). Emergent Properties of Multilingual Lexicons. Applied Linguistics 27,
4, 620-644.
Menezes de Oliveira e Paiva, V. L. (Submitted). Second language acquisition: from main
theories to complexity.
Mennen, I. (1999). The realisation of nucleus placement in second language intonation.
Proceedings of the International Congress of Phonetic Sciences. San Francisco,
August, 1999.
Mennen, I. (2007). Phonological and phonetic influences in non-native intonation. In:
Non-native Prosody: Phonetic Descriptions and Teaching Practice. Edited by J.
Trouvain and U. Gut. The Hague: Mouton De Gruyter, pp. 53-76.
Mohanan, K.P. (1992). Emergence of Complexity in Phonological Development. In:
Child Phonological Development. Edited by C. Ferguson, L. Menn & C. Stoel-Gammon.
Timonium, pp. 635-662.
Monroy, R. & F. Gutierrez. (2001). Perspectives on Interlanguage Phonetics and
Phonology, International Journal of English Studies (Universidad de Murcia) 1, 1.
Moraes, J. (1998). Intonation in Brazilian Portuguese. In: Intonation Systems: a Survey of
Twenty Languages. Edited by D. Hirst & A. Di Cristo. Cambridge: Cambridge
University Press, pp. 179-194.
Nabney, I. T. (2002). NETLAB: Algorithms for Pattern Recognition. London:
Springer-Verlag.
Nam, H., Goldstein, L., Saltzman, E. & D. Byrd. (2004). TADA: An enhanced, portable
Task Dynamics model in MATLAB. The Journal of the Acoustical Society of
America 115, 2, 2430.
Nam, H., Goldstein, L. & E. Saltzman. (2005). A coupled oscillator model of
intergestural timing within syllables. The Journal of the Acoustical Society of
America. 118, 2034.
Nava, E. (2006). Prosody and Focus Alignment in L2 Speech. Manuscript, University of
Southern California.
Nava, E. (2007). Word Order in Bilingual Spanish: Convergence and Intonation Strategy.
In: Selected Proceedings of the Third Workshop on Spanish Sociolinguistics.
Edited by Jonathan Holmquist and Augusto Lorenzino. Somerville MA:
Cascadilla Proceedings Project.
Nava, E. (2009). Web-based survey of Spanish native-speaker word order preference.
Unpublished manuscript. Supported by Del Amo Foundation (Summer study in
Spain).
Nava, E., Tepperman, J., Goldstein, L., Zubizarreta, M.L., & S. Narayanan. (2009).
Connecting Rhythm and Prominence in Automatic ESL Pronunciation Scoring.
Interspeech 2009, 10th Annual Conference of the International Speech
Communication Association. Brighton, UK (September 6-10, 2009).
Nava, E. & M. L. Zubizarreta. (2010). Deconstructing the Nuclear Stress
Algorithm: Evidence from Second Language Speech. In: The Sound patterns of
Syntax. Edited by N. Erteschik-Shir & L. Rochman. Oxford University Press.
Navarro Tomás, T. (1944). Manual de la entonación española. New York: Hispanic
Institute in the United States.
Nowak, M. (2006). Evolutionary Dynamics. Cambridge, MA: Harvard University Press.
O’Dell, M. L., & T. Nieminen. (1999). Coupled oscillator model of speech rhythm. In:
Proceedings of the XIVth International Congress of Phonetic Sciences, Vol. 2.
Edited by J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey. New
York: American Institute of Physics, pp. 1075-1078.
Oshita, H. (1997). The Unaccusative Trap: L2 Acquisition of English Intransitive Verbs.
Unpublished doctoral dissertation, University of Southern California.
Ortega-Llebaria M. & P. Prieto. (2007). Disentangling stress from accent in Spanish:
production patterns of the stress contrast in deaccented syllables. In: Segmental
and Prosodic Issues in Romance Phonology. Edited by P. Prieto, J. Mascaró, &
M.-J. Solé. John Benjamins: Amsterdam/Philadelphia, pp. 155-175.
Pamies Bertrán, A. (1999). Prosodic Typology: on the Dichotomy between Stress-Timed
and Syllable-Timed Languages. Language Design: Journal of Theoretical and
Experimental Linguistics 2, 103-131.
Percival, I., & D. Richards. (1982). Introduction to dynamics. Cambridge, UK:
Cambridge University Press.
Perlmutter, D. (1978). Impersonal passives and the unaccusative hypothesis. In:
Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society,
February 18-20, 1978. Edited by Jeri J. Jaeger, pp.157-89.
Pierrehumbert, J. & J. Hirschberg (1990). The Meaning of Intonation in the Interpretation
of Discourse. In: Intentions in Communication. Edited by P. Cohen, J. Morgan &
M. Pollack. Cambridge, MA: MIT Press, pp. 271-311.
Pike, K.L. (1945). The intonation of American English. Ann Arbor, MI: University of
Michigan Press.
Piske, T., MacKay, I.R.A. & J.E. Flege. (2001). Factors affecting degree of foreign
accent in an L2: A review. Journal of Phonetics 29, 2, 191-215.
Prieto, P., Vanrell, M., Astruc, L., Payne, E., & B. Post. (2010). Speech rhythm as
durational marking of prosodic heads and edges. Evidence from Catalan, English,
and Spanish. Speech Prosody 2010, Chicago, Illinois.
Quilis, A. (1975). Las unidades de entonación. Revista Española de Lingüística 5, 261-
279.
Quilis, A. (1985). Entonación dialectal hispánica. Lingüística Española Actual 7, 2, 145-
190.
Ramus, F., Nespor, M., & J. Mehler. (1999). Correlates of linguistic rhythm in the speech
signal. Cognition 73, 265-292.
Ramus, F. (2002). Acoustic correlates of linguistic rhythm: Perspectives. In: Proceedings
of Speech Prosody 2002, Aix-en-Provence, pp. 115-120.
Reinhart, T. (2006). Interface strategies: optimal and costly computation. Cambridge,
Mass.: MIT Press.
Roach, P. (1983). On the distinction between stress-timed languages and syllable-timed
languages. In: Linguistic Controversies: Essays in Honour of F.R. Palmer. Edited
by D. Crystal. London: Arnold.
Roca, I. (1988). Theoretical Implications of Spanish Word Stress. Linguistic Inquiry
19, 393-423.
Roca, I. (1997). On the role of accent in stress systems: Spanish Evidence. In: Issues
in the Phonology and Morphology of the Major Iberian Languages. Edited by F.
Martínez-Gil & A. Morales-Front. Georgetown University Press, Washington D.C.,
pp. 618-663.
Ronat, M. (1982). Logical Form and discourse islands. Journal of Linguistic Research 2,
33-48.
Rutherford, W. (1987). Second language grammar: Learning and teaching. London:
Longman.
Saltzman, E., Nam, H., Krivokapic, J. & L. Goldstein. (2008). A task-dynamic toolkit for
modeling the effects of prosodic structure on articulation. In: Proceedings of
Speech Prosody 2008. Edited by P. Barbosa, S. Madureira & C. Reis. Campinas,
Brazil.
Sancier, M. L., & C.A. Fowler. (1997). Gestural drift in a bilingual speaker of Brazilian
Portuguese and English. Journal of Phonetics 25, 421-436.
Sasse, H.-J. (1987). The thetic/categorical distinction revisited. Linguistics 25, 511-580.
Sasse, H.-J. (1995). Theticity and VS order: A case study. Sprachtypologie und
Universalienforschung 48, 3-31.
Selkirk, E. (1984). Phonology and Syntax. The Relation between Sound and Structure.
Cambridge, MA: MIT Press.
Shattuck-Hufnagel, S., Ostendorf, M. & K. Ross. (1994). Stress shift and early pitch
accent placement in lexical items in American English. Journal of Phonetics 22,
357-388.
Schmerling, S. (1976). Aspects of English sentence stress. Austin: University of Texas
Press.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P.,
Pierrehumbert, J., & J. Hirschberg. (1992). ToBI: A Standard for Labeling English
Prosody. In: Proceedings of the 1992 International Conference on Spoken
Language Processing. Banff, Canada, pp. 867-70.
Sorace, A. (2000). Gradients in auxiliary selection with intransitive verbs. Language 76,
859-890.
Sosa, J. M. (1999). La entonación del español. Madrid: Ediciones Cátedra.
Stockwell, R. & J. Bowen. (1965). The Sounds of English and Spanish. London &
Chicago: University of Chicago Press.
Tajima, K., Port, R. & J. Dalby. (1997). Effects of temporal correction on intelligibility of
foreign-accented English. Journal of Phonetics 25, 1, 1-24.
Thelen, E., & L. Smith (1994). A Dynamic Systems Approach to the Development of
Cognition and Action. Cambridge, MA: MIT Press.
Tilsen, S. (2009). Multitimescale dynamical interactions between speech rhythm and
gesture. Cognitive Science 33, 839-879.
Titze, I.R. (1994). Principles of Voice Production. Prentice Hall.
Toledo, G. (1989). Señales prosódicas del foco. Revista Argentina de Lingüística 5, 205-
230.
Trofimovich, P., Baker, W., Flege, J. E., & Mack, M. (2003). Second-language sound
learning in children and adults: Learning sounds, words, or both? In: Proceedings
of the 27th Boston University Conference on Language Development. Edited by
B. Beachley, A. Brown, & F. Conlin. Somerville, MA: Cascadilla Press, pp. 775-
786.
Truckenbrodt, H. (2006). Phrasal Stress. In: The Encyclopedia of Languages and
Linguistics. 2nd edition. Edited by Keith Brown. Oxford: Elsevier, pp. 572-579.
Tuller, B., Case, P., Ding, M., & J. A. S. Kelso. (1994). The nonlinear dynamics of
speech categorization. Journal of Experimental Psychology: Human Perception
and Performance, 20, 1, 3-16.
Tuller, B. (2004). Categorization and learning in speech perception as dynamical
processes. In: Tutorials in Contemporary Nonlinear Methods for the Behavioral
Sciences. Edited by M. A. Riley & G.C. Van Orden.
Turk, A. & S. Shattuck-Hufnagel. (2007). Phrase-final lengthening in American
English. Journal of Phonetics 35, 4, 445-472.
Turk, A. & L. White. (1999). Structural effects on pitch accentual lengthening in English.
Journal of Phonetics 27, 171-206.
Ueyama, M. & S.-A. Jun. (1998). Focus realization in Japanese English and Korean
English intonation. In: CSLI, vol. 7. Cambridge University Press, pp. 629-645.
Van Lieshout, P. (2004). Dynamic systems theory and its application in speech. In:
Speech motor control in normal and disordered speech. Edited by B. Maassen, R.
Kent, H.F.M. Peters, P. Van Lieshout, and W. Hulstijn. Oxford University Press,
pp. 51-82.
Vogel, I., Bunnell, H. T., & S. Hoskins. (1995). The phonology and phonetics of the
rhythm rule. In: Papers in Laboratory Phonology IV. Edited by B. Connell.
Cambridge: Cambridge University Press, pp. 111-127.
Wenk, B.J. (1985). Speech rhythms in second language acquisition. Language and
speech 28, 2, 157-175.
White, L. & S. L. Mattys (2007). Calibrating rhythm: First language and second language
studies. Journal of Phonetics 35, 501-522.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. (1992). Segmental
durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical
Society of America 91, 1707-1717.
Xu, Y., & C. X. Xu. (2005). Phonetic realization of focus in English declarative
intonation. Journal of Phonetics 33, 159-197.
Yavas, M. (1996). Differences in Voice Onset Time in early and later Spanish-English
bilinguals. In: Spanish in contact: Issues in bilingualism. Edited by Ana Roca &
John B. Jensen. Somerville, MA: Cascadilla, pp. 131-141.
Yoon, T. J., Cole, J., & M. Hasegawa-Johnson. (2007). On the edge: acoustic cues to
layered prosodic domains. In: Proceedings of the XVIth International Congress of
Phonetic Sciences. Edited by J. Trouvain, & W. J. Barry. Dudweiler: Pirrot GmbH,
pp. 1264-1267.
Young, S. (2002). The HTK Book. Cambridge, U.K.: University of Cambridge.
Available: http://htk.eng.cam.ac.uk/
Yuan, Jiahong & M. Liberman. (2008). Speaker identification on the SCOTUS corpus.
Proceedings of Acoustics '08.
Zampini, M.L. & K.P. Green. (2001). The voicing contrast in English and Spanish: the
relationship between perception and production. In: One Mind, Two Languages:
Bilingual Language Processing. Edited by Janet Nicol. Oxford: Blackwell, pp. 23-
48.
Ziglari, L. (2008). Affordance and Second Language Acquisition. European Journal of
Scientific Research 23, 3, 373-379.
Zubizarreta, M. L. (1998). Prosody, focus, and word order. Cambridge, Mass.: MIT
Press.
Appendix
I. Language Background Questionnaire
Please answer the following questions
1. What is your native language?
Spanish (please indicate variety of Spanish) _________________________
Other (please indicate) ________________________
2. What is your age?______ What is the date of your birth? ____________
3. Sex: M____ F____
4. What language(s) do you speak at home?
Spanish______ English _______
Another language (please say which)____________________________________
5. How old were you when you were first exposed to English?_______
6. Did you study English before arriving in the U.S. in any of the following institutions?
preschool __________
elementary school ______
middle school _____
high school ______
college _____
intensive English courses _______
private tutor _____
other (please describe) ____________
7. When did you move to the United States (month/year)? ___________
8. Please check whether you use English in any of the following situations:
at work ____
in English as Second Language classes _____
in other college/university classes _____
in everyday life _____
II. Cloze test (Oshita 1997)
CLOZE TEST
Directions: Read the following three stories carefully and fill in each blank with a
contextually and grammatically appropriate word. To get an idea about each story, first
you can read it once or twice and then try to fill in the blanks. Do not worry too much
about difficult blanks. Just try to fill in as many as you can, by guessing if necessary. You
have approximately 25 minutes. Remember that you can write only one word in each
blank.
Story A
Some people do not seem to have a mind of their own. They seldom make their
1)_________ decisions and never express 2)_________ own opinions. My brother
3)_________ one of these people. 4)_________ night, for example, he 5)_________
planning to spend a 6)_________ evening at home reading 7)__________ book. At about
seven-thirty, 8)__________, his friend Tom dropped 9)__________ and said, “Let’s
watch 10)________ tonight,” “Okay”, my brother 11)_________. By ten o’clock my
12)__________ was tired and sleepy, 13)_________ I am sure he 14)__________ to go
to bed. 15)_________ Tom was not tired. “16)_________ go out and get 17)________
hamburger”, Tom said. “Good 18)_________,” my brother replied. Like 19)_________,
he very often says 20)_________ he does not mean 21)__________ order to please
others. 22)_________ than that, he does 23)__________ tell others what he
24)__________ wants to do, thinking 25)__________ might offend them. In any case,
my brother did not come home until midnight and woke up very late this morning.
Story B
We were about to gather up our picnic things and return to our car when a man
showed up. He looked very annoyed 1)_________ asked us angrily if 2)_________
realized that we were 3)__________ private property. My father, 4)___________ looked
very confused at 5)__________ man’s statement, said that 6)_________ did not. The man
7)_________ pointed to a sign 8)__________ said that camping and 9)__________ were
strictly forbidden in 10)__________ area where we were 11)__________. Poor father
explained that 12)_________ had not seen the 13)__________ until then and had
14)_________ realized that it was 15)___________ property. Despite my father’s
16)__________, the man did not 17)__________ satisfied at all and 18)___________
him for his name 19)___________ address. All the way 20)___________, we were so
upset 21)_________ hardly anyone said a 22)___________. Everyone in the car
23)__________ wondering if the angry 24)__________ would report us to
25)__________ police. Although he didn’t after all, this unpleasant incident completely
ruined the wonderful time we had had in the country that day.
Story C
Hunting was originally a means of providing food, but it has now become a sport or a
cultural tradition. Although even today in 1)__________ parts of the world
2)___________ are still people who 3)___________ wild fish, birds, and 4)___________
to provide themselves with 5)__________, in many countries hunting 6)__________ now
as much a 7)___________ activity as anything else. 8)__________ great many years ago,
9)___________ in a small African 10)___________ used birds to catch 11)__________.
The birds were trained 12)____________ that they would dive 13)___________ the
water and come 14)___________ to the fisherman after 15)___________ a few fish. This
16)__________ of fishing is said 17)___________ be at least a 18)____________ years
old and is 19)____________ in the country’s mythological 20)___________. Today,
however, fishing in 21)___________ way has simply become 22)___________ tradition,
since those who 23)___________ in this manner are 24)___________ longer seriously
interested in 25)_____________ fish for food. Their real concern is simply to maintain
this old cultural tradition.
III. Stimuli for Question & Answer Dialogue Task, Experiment 1.
I. UNACCUSATIVES vs. UNERGATIVES.
A. Wide sentence focus: SV
Unaccusatives
Change of location (presentational verbs)
1. Why are you so happy?
My friend arrived.
2. How was the parade?
Fun. The band came.
3. What’s all the excitement in the stadium?
The football team entered.
4. Why are the kids looking outside?
A rabbit appeared.
Change of location (non-presentational verbs)
5. What happened at the game?
The goalie fell.
6. What’s in the news today?
Nothing really. A prisoner escaped.
7. What happened at the playground?
A boy disappeared.
8. What’s the matter?
My purse vanished.
Change of state
9. What was that crashing sound?
A glass broke.
10. Why is it suddenly so cold in here?
A door opened.
11. Why can’t I feel the breeze anymore?
A window closed.
12. Why is that child crying?
A cat died.
Unergatives
13. What did they do to celebrate the new track at school?
A student ran.
14. Why didn’t they finish the play?
An actress was crying.
15. How did the party end?
A guest sang.
16. What was that noise in the waiting room?
A patient sneezed.
B. Wide sentence focus. SVPP
Unaccusative + PP (complement)
17. What happened at the ceremony?
The flag fell on the ground.
18. Why are the kids so excited?
A clown is coming to school.
19. Why do you look so scared?
A vampire appeared in the house.
20. How did you find out about the explosion?
I saw the smoke rise from behind the mountains.
Unergative + PP (adjunct)
21. What did the kids tell you?
They said Jerry cried at school.
22. What went on at the studio today?
It was bizarre. Melanie danced in the dark.
23. Why do the neighbors complain?
Because Nina sings in the shower.
24. Why are all the towels wet?
The girls swam in the pool.
C. Unaccusatives: Wide sentence-focus with adverbs
25. Why is everybody so concerned?
A boy suddenly disappeared.
26. Why is everybody so concerned?
A boy disappeared suddenly.
27. What happened when the teacher came in the room?
The kids immediately left.
28. What happened when the teacher came in the room?
The kids left immediately.
29. Are the kids eating already?
Yes, the pizza quickly arrived.
30. Are the kids eating already?
Yes, the pizza arrived quickly.
31. Why are you waiting at the door?
The guests will soon arrive.
32. Why are you waiting at the door?
The guests will arrive soon.
Unaccusatives: VP Focus
33. What did the Mayor do to make everybody laugh?
The Mayor fell!
34. What did the police do when you called about the crime?
The police came.
35. What did the magician do to get the crowd so excited?
The magician disappeared!
36. What did the aliens do that has everyone so scared?
The aliens arrived!
Unergatives: VP Focus
37. What did the whales do to fascinate the kids?
The whales danced!
38. What is that group of dogs doing to make everyone smile?
A dog is singing!
39. What did you see the lions do when you visited the zoo?
I saw a lion smile!
40. What were the dolphins doing to entertain the crowd?
A dolphin was talking!
D. Definiteness
Unaccusative
41. Why is it suddenly so cold in here?
Something opened.
42. What was that crashing sound?
Something broke.
Unergative
43. What did they do to celebrate the new track at school?
Somebody ran.
44. Why didn’t they finish the play?
Somebody was crying.
Transitives
45. What did Sandy just do?
She called somebody.
46. Are you hungry?
No. We already ate something.
E. Noteworthiness
Unergatives (control)
47. How was your field trip?
It was cool. A lion roared.
48. Did you see anything interesting today?
Yes. I saw a whale swim.
49. Why are those children screaming?
Because a dog is barking.
50. Why is everybody at the aquarium?
Because there is a dolphin swimming.
Unergatives (noteworthy)
51. How was your field trip?
Guess what? A lion smiled!
52. Did you see anything interesting today?
I can’t believe it. I saw a whale dance!
53. Why does everybody look so surprised?
A dog is singing!
54. Why does everybody look so surprised?
A dolphin is talking!
Unaccusatives (noteworthy)
55. What happened at the game?
You won’t believe it! The mayor fell!
56. Why is the show over?
Guess what? The magician disappeared!
57. What just happened?
The aliens arrived!
58. How was the parade?
Not good. The police came!
II. TRANSITIVE. Wide VP focus.
A. VP new info
Complements
59. What did Mary just do?
She spread cream on the bread.
60. What did Lucy do before dinner?
She hid her toys under the bed.
Adjuncts
61. What did Nick do at the party?
He danced mambo in the garden.
62. What did Allison do last weekend?
She ran a marathon in the desert.
Givenness Wide Focus/VP
63. Did Barbara taste your dish?
Yes. She likes spinach.
64. Does Jason make good grades in school?
No, he doesn’t read books.
65. Is Ellen coming out with us tonight?
No, Ellen doesn’t watch movies.
66. Do you have a hobby?
Yes, I collect stamps.
Givenness Wide Focus/VP
67. Why didn’t you eat at Monica’s barbeque?
Because she put salsa on the meat.
68. Are you finished with the coloring books?
No, I’m drawing pictures on the covers.
69. Why are all the cars slowing down?
Because there is ice on the road.
70. Why was the race cancelled?
Because there was water on the track.
B. GIVENNESS: Anaphoric de-accenting
SVO: Object
71. Did Mary try the spinach quiche?
No, she doesn’t eat spinach.
72. Did Jason go to the book fair?
No, he doesn’t read books.
73. Why don’t you take Ellen to the movie festival?
Because Ellen doesn’t watch movies.
74. Why are you buying that old stamp?
Because I collect stamps.
SVOPP: PP
75. How did Monica prepare the meat?
She put salsa on the meat.
76. Why are these notebooks missing their cover?
Because I’m drawing pictures on the covers.
77. Why don’t you take the back road to work?
Because there is ice on the road.
78. Why was the racetrack closed down?
Because there was water on the track.
III. NARROW FOCUS
A. Transitives: OBJECTS
Complements:
79. What did Mary spread on the bread?
She spread cream on the bread.
80. What did Lucy put under the bed?
She hid her toys under the bed.
Adjuncts:
81. What did Nick dance in the garden?
He danced mambo in the garden.
82. What did Allison run in the desert?
She ran a marathon in the desert.
B. Transitives: SUBJECTS
83. Who spread cream on the bread?
Stacy spread cream on the bread.
84. Who hid her toys under the bed?
Mandy hid her toys under the bed.
Adjuncts
85. Who danced mambo in the garden?
Mike danced mambo in the garden.
86. Who ran a marathon in the desert?
Carolyn ran a marathon in the desert.
C. Intransitives. Narrow focus on the subject
Unaccusatives
87. Who arrived?
My friend arrived.
88. What broke?
A glass broke.
89. What fell on the ground?
The flag fell on the ground.
90. What appeared in the house?
A vampire appeared in the house.
Unergatives
91. Who ran?
A student ran.
92. Who was crying?
An actress was crying.
93. Who sings in the shower?
Nina sings in the shower.
94. Who danced in the park?
Melanie danced in the park.
IV. Compound
95. What will Tim do in Africa?
He will go lion-hunting.
96. Does Jill like to visit parks?
Oh yes. She is a bird-watcher.
97. What will Harry do this summer?
He will go whale-watching.
98. Did Barbara like the Italian restaurant?
Oh yes. She is a pasta-eater.
V. Pronouns
99. Do we have tomatoes?
No, I didn’t buy them.
100. Did you call your father?
Yes, but I didn’t find him.
101. Did you season the soup?
Yes, I added salt to it.
102. Why is this bag so heavy?
We put sand in it.
103. Why is Peter so happy?
Because they sent him an invitation.
104. What happened to the bottle of wine that was on the table?
We gave it to Silvia.
IV. Reading passages, Experiment 2
English
The North Wind and the Sun
The North Wind and the Sun were disputing which was the stronger, when a traveler
came along wrapped in a warm cloak. They agreed that the one who first succeeded in
making the traveler take his cloak off should be considered stronger than the other.
Then the North Wind blew as hard as he could, but the more he blew the more closely did
the traveler fold his cloak around him; and at last the North Wind gave up the attempt.
Then the Sun shined out warmly, and immediately the traveler took off his cloak.
And so the North Wind was obliged to confess that the Sun was the stronger of the two.
Spanish
El viento norte y el sol
El viento norte y el sol porfiaban sobre cuál de ellos era el más fuerte, cuando acertó a
pasar un viajero envuelto en ancha capa. Convinieron en que quien antes lograra obligar
al viajero a quitarse la capa sería considerado más poderoso. El viento norte sopló con
gran furia, pero cuanto más soplaba más se arrebujaba en su capa el viajero; por fin el
viento norte abandonó la empresa. Entonces brilló el sol con ardor, e inmediatamente se
despojó de su capa el viajero; por lo que el viento norte hubo de reconocer la
superioridad del sol.
V. Statistics: ANOVA results by chapter
Chapter 3
Table 8. ANOVA results for all speaker groups, voicing ratio
Table 9. ANOVA results for all speaker groups, vowel duration by lexical category
Chapter 4: Repetition task
Table 10. ANOVA results for group results, syllable Ratio measure
Table 11. ANOVA results for group results, syllable DtoD measure
Table 12. ANOVA results for group results, syllable DtoT measure
Table 13. ANOVA results for group results, foot Ratio measure
Table 14. ANOVA results for group results, foot DtoD measure
Table 15. ANOVA results for group results, foot DtoT measure
Table 16. ANOVA results for L2 individual results, foot DtoT measure
Table 17. ANOVA results for L2 individual results, syllable Ratio measure
Table 18. ANOVA results for L2 individual results, syllable DtoT measure
Abstract
This dissertation investigates the relation between prosodic events at the phrasal level and component events at the rhythmic level. The overarching hypothesis is that the interaction among component rhythmic events gives rise to prosodic patterns at the phrasal level, while at the same time being constrained by the latter, and that in the case of second language acquisition, acquisition at the rhythmic level will precede that of the phrasal level.
Asset Metadata
Creator
Nava, Emily Anne (author)
Core Title
Connecting phrasal and rhythmic events: evidence from second language speech
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Linguistics
Publication Date
05/25/2010
Defense Date
05/10/2010
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
complex systems theory, OAI-PMH Harvest, prosody, rhythm, second language acquisition
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Goldstein, Louis (committee chair), Zubizarreta, Maria Luisa (committee chair), Saltarelli, Mario (committee member), Vergnaud, Jean-Roger (committee member)
Creator Email
eanava@gmail.com,emily.nava@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m3098
Unique identifier
UC1425198
Identifier
etd-Nava-3777 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-346986 (legacy record id),usctheses-m3098 (legacy record id)
Legacy Identifier
etd-Nava-3777.pdf
Dmrecord
346986
Document Type
Dissertation
Rights
Nava, Emily Anne
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu