DYNAMICS OF CONSONANT REDUCTION
by
Benjamin Parrell
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Linguistics)
August 2014
Acknowledgements
I am very grateful to have been able to spend my graduate career at USC, and to have
been able to work with the huge number of intelligent, supportive people here. From the
day I arrived, Louis Goldstein has helped me focus my thinking, expanded my intellectual
horizons, and helped me discover my own interest in speech science. I have also been lucky
to work closely with Dani Byrd, who not only gave me opportunities to lead a number
of research projects, but made me a much better experimentalist. I have also benefited
enormously from interacting with Rachel Walker and Khalil Iskarous. I’m happy to have
overlapped here with Sam Tilsen, who taught me to think both more creatively and more
expansively. Thanks to everyone else in the phonetics group, who have helped me both
academically and in life outside of school over the years.
Some of the most influential people at USC have been from outside the Linguistics
department. I am grateful to have been a part of SPAN, and to Shri Narayanan, Mike
Proctor, and Vikram Ramanarayanan. Adam Lammert, especially, has been a frequent
collaborator, sounding board, and occasional cheerleader (when I needed one). I have been
lucky to have received support from the NIH through the Hearing and Communication
Neuroscience group. Learning from such a diverse set of excellent scientists has encouraged
me to ask new questions and explain myself more clearly, and has made me a better scientist. It
has undoubtedly changed my research career for the better.
Robert Davis first taught me linguistics as an undergraduate, and inspired me to keep
pursuing it. Joaquín Romero not only graciously accepted me spending a year doing
research with and learning phonetics from him after I finished my BA, but encouraged
me to go to graduate school in the first place and steered me towards USC. I wouldn’t be
where I am today without either of their help and influence.
My family—Anna Parrell, Sara Parrell, and Tom Parrell—has been enormously sup-
portive along the way, and helped me continue through both the good and the difficult
times. Katy Stanton has been supportive and patient when I actually needed to finish.
Thank you all.
Table of Contents

Abstract
I Introduction
  I.1 Outline of the dissertation
II Spirantization of voiced stops in Spanish
  II.1 Introduction
    II.1.1 Production of Spanish stops
    II.1.2 The nature of /b, d, g/ and how they are distinguished from /p, t, k/
    II.1.3 Proposal
  II.2 Articulatory study
    II.2.1 Methods
    II.2.2 Results
    II.2.3 Discussion
  II.3 Gestural simulation study
    II.3.1 The Task Dynamics model
    II.3.2 Model input for Spanish stops
    II.3.3 Results
    II.3.4 Further evidence
    II.3.5 Discussion
  II.4 Discussion
  II.5 Conclusion
III Prosody and articulatory posture differences in cross-linguistic coronal reduction
  III.1 Introduction
    III.1.1 English flapping
    III.1.2 Comparing English and Spanish reduction
  III.2 Methods
    III.2.1 Subjects and stimuli
    III.2.2 Real-time MRI data collection
    III.2.3 MRI data analysis
  III.3 Results
    III.3.1 American English
    III.3.2 Spanish
  III.4 Discussion
  III.5 Cross-linguistic evidence
  III.6 Conclusions
IV A dynamical-systems model of prosodic variability and possible sound change
  IV.1 Introduction
  IV.2 Modeling variability in production from phonological invariance
  IV.3 Recovery of phonological targets
  IV.4 Conclusions
V Discussion and conclusions
  V.1 Comparisons with other theories
REFERENCES
Abstract
Many speech sounds undergo weakening, or lenition. Flapping of English /t/ intervo-
calically is one well-documented example (e.g. wri[t]e vs. wri[ɾ]er), as is the shift of the
series /tː, t, d/ to /t, d, Ø/ from Latin to many Romance varieties. Many phonological
analyses of lenition have been proposed, ranging from feature spreading to a modulation
of the intended gestural constriction degree at the planning level. Most of these analy-
ses, however, fail to account for the wide variability in realization of segments which can
undergo lenition; this variation is often relegated to a separate phonetic level or ignored
entirely. This dissertation suggests an alternative proposal, based in Articulatory Phonol-
ogy, which posits that many cases of lenition can be attributed to an intrinsic link between
duration and magnitude of speech gestures. Specifically, I hypothesize that lenition results
from undershoot of articulatory targets caused by reduced gestural duration, particularly
in prosodically weak positions.
This dissertation consists of three studies. The first two explicitly test the hypothesis
that lenition is driven by prosodically-conditioned temporal reduction. The first chapter
examines spirantization in Spanish voiced stops via electromagnetic articulometry and
articulatory modeling. The second study tests reduction in coronal oral and nasal stops
(/t/, /d/, and /n/) in American English and Spanish using real-time magnetic resonance
imaging. Results from both studies support the hypothesis that spatial reduction is linked
to prosodically-conditioned temporal variation. The third study proposes a model of how
this prosodically-conditioned variability could lead to diachronic sound change.
Chapter I
Introduction
This dissertation deals principally with consonant reduction, also often referred to as
lenition or weakening. Such processes are common across languages; one well-known
example is English /t/. When this segment occurs at the beginning of a word, it is produced
as a voiceless, aspirated stop with long VOT ([tʰ]); word-finally it occurs as an unaspirated,
and sometimes unreleased, stop ([t] or [t̚]); and before an unstressed vowel it occurs as
a flap ([ɾ]). For example, the /t/ in writer ([ɹaɪɾɚ]) is reduced compared to the /t/ in write
([ɹaɪt]).
Though lenition has been a frequent subject of interest in phonological research, it is
often vaguely defined. A common definition is "A segment X is said to be weaker than a
segment Y if Y goes through an X stage on its way to zero" (originally from Vennemann,
cited in [81, p. 165]). This is fundamentally a historical perspective, as it relies on a
series of documented sound changes over a large number of languages to figure out what
segments may fit the requirements for Y and X. Though the precise typology of which
segments can reduce to which others differs slightly from proposal to proposal, one such
typology of lenition—established in [71]—can serve as an illustrative example and is shown
in Figure I.1.
[Figure I.1 appears here: a chart of lenition pathways leading from /tː, dː, t, d/ through /θ, ʔ, ð, h, r, l, ɦ, j/ toward Ø.]
Figure I.1: Typology of possible lenitions, after Hock [71] and Bauer [6]. Dashed lines
show possible but unobserved changes.
A wide variety of phonological explanations have been put forward for lenition, includ-
ing changes in phonological features such as [cont], spreading of features from adjacent
segments, feature underspecification, movement along the sonority scale, or decreases in
consonant strength (see [88] and [100] for a thorough review of these proposals). None of
these proposals, however, can explain all the phenomena generally referred to as lenition
[88, 89].
Looking for a more general definition of lenition that captures the phenomena in Figure
I.1, Kirchner [88] summarizes the types of changes referred to as lenition as reduction in
constriction degree or duration, including degemination, flapping, spirantization, loss of
oral constriction, complete loss, and voicing of unvoiced segments (this last category being
a reduction in a glottal opening movement). This approach has the advantage of avoiding
reliance on historical change as the basis of lenition—such processes can easily be seen
in synchronic variation in many languages (though the diachronic changes still fall under
the scope of this definition, as well). He proposes that these spatiotemporal reductions
result from a planned decrease in articulatory effort, which he captures in an Optimality-
Theoretic framework using a family of constraints (Lazy) militating against effort in
speech.
A proposal on similar lines is Lindblom’s H & H theory [103, 104]. This theory starts
from the initial finding that durational reduction leads to spatial reduction in vowels
[102]. However, subsequent research found that this is not always the case; while such
durationally-conditioned undershoot does indeed frequently occur, in other cases subjects
produce short movements with higher velocities and no spatial undershoot [93, 103, 50,
126, 167]. H & H theory proposes that speakers control the overall amount of articulatory
effort during speech, either hypo- or hyper-articulating (hence H & H). In this view, spatial
reduction is conditioned by duration, but only for a given amount of effort. Speakers can
choose to be more effortful (further along the continuum towards hyper speech) by increasing
the velocities of their speech movements to avoid such spatial reduction. A notable
difference between this theory and Kirchner’s Lazy constraint family is that this setting
of effort takes place on a global scale rather than as a segment-by-segment phonological
optimization problem.
While widely cited, the least-effort hypothesis has come under critical scrutiny. Pri-
marily, the problem is one of circularity: since there is no clear way to measure metabolic
effort in speech movements, it is simply assumed that reduced segments are less effort-
ful [145]. It is also not clear that a simple measurement, such as displacement relative to
time, is able to capture articulatory effort given the complex structure of the speech motor
system [145, 87]. An attempt to demonstrate that reduced effort results in more reduced
productions found inconclusive results [83], though it is not clear that the methods used
in that study (induced inebriation) cause subjects to speak in a less effortful manner.
An alternative is that, rather than having an articulatory cause, lenition is planned
at the phonological level in order "to increase intensity and thereby reduce the extent to
which the affected consonant interrupts the stream of speech" [87, p. 1]. In this view,
lenition is a way for a speaker to signal the absence of a prosodic break to a listener. This
proposal is similar to the idea that lenition is a decrease in sonority as discussed above,
but with a functional motivation.
All of these analyses, with the exception of H & H theory, fail to account for the wide
variability in realization of segments under lenition, which is often relegated to a separate
phonetic level or ignored entirely. The basis of this dissertation is that the variability in
production is not merely phonetic but necessarily reflects the interaction of the phonological,
prosodic, and speech motor control systems, and that produced variability can
be studied to understand the underlying phonological control of the speech production
system. The theory of Articulatory Phonology [19, 21, 60], in fact, explicitly predicts
that reduction at both the "phonological" and the "phonetic" level (at least according to
traditional analyses) arises from the same cause: spatial reduction of speech articulator
movement.
The work in this dissertation proposes that lenition is the undershoot of an articulatory
target. These articulatory targets are taken to be invariant at the level of phonological
control, following the theory of Articulatory Phonology [19, 21]. A somewhat similar
approach can be seen in the definition given by Bauer, who defines lenition as "the failure
to reach a phonetically specified target: articulatory undershoot or underachievement"
[6, p. 611], though that proposal does not specify what might cause such undershoot.
Similarly, this proposal builds on the suggestion that, within the theory of Articulatory
Phonology, spontaneous reductions in casual speech might arise from reductions in gestural
magnitude [18]. Neither of these previous analyses, however, concretely hypothesizes
about the cause of gestural undershoot. The current proposal views undershoot not as
a random process but as conditioned by duration. Everything else being equal, long
movements will result in larger spatial displacements (and achievement of the articulatory
target, given enough time), while decreases in duration will lead to smaller displacement
and undershoot of the target. Understanding the cause of gestural undershoot is crucial
for explaining regular patterns of lenition.
One study has attempted to differentiate between the sonority, effort, and spatiotem-
poral reduction hypotheses [100]. This study examined reduction in English and Spanish
via speech acoustics and electropalatography. This study found tentative support for the
spatiotemporal reduction hypothesis, though with a few caveats (though see the discussion
in Chapter V regarding some methodological issues with this work that may have confounded
some of the results). Importantly, this study and the accompanying cross-linguistic survey
of reduction patterns highlighted the important role of prosodic structure in lenition, par-
ticularly the short durations found in prosodically weak positions (such as word-medially
in English and phrase-medially in Spanish). Similar evidence for reduction in prosodically
weak positions has been found in other work (e.g. [142, 43]).
The work in the following chapters proposes that prosodically weak positions, typically
characterized by short segmental durations, lead to undershoot of articulatory targets.
What counts as a "weak" prosodic position in any particular language is an open ques-
tion. As a starting point, we can take the general prosodic hierarchy as discussed in, e.g.,
[157], where prosody is understood to be the organizational framework that sections speech
into smaller units. Though there are slight differences between particular proposals, there
is general agreement that the prosodic structure of speech forms a hierarchical structure,
with larger units at higher levels encompassing multiple smaller, lower-level units. A gen-
eral schema might be Utterance > Intonational Phrase > Intermediate or Phonological
Phrase > Prosodic Word > Syllable [84]. It is clear that different languages show different
patterns of reduction—many show reduction only within the prosodic word, while others
show reduction within an intonational or phonological phrase, regardless of word bound-
aries [100]. The question of how different languages, and perhaps different individuals,
implement the prosodic hierarchy has received some attention [84, 36, 52, 28, 27, 92], but
there is as yet no theory of cross-linguistic prosodic implementation. As such, it is
difficult if not impossible to say a priori which boundaries may count as "weak"—that is,
are conducive to reduction processes—in a particular language. The important point for
the current proposal is that the patterns of reduction in many, if not all, languages are
influenced by prosodic structure.
This analysis captures the importance of prosodic position in reduction (cf. [100]) but,
by realizing that prosodic factors influence the fundamental factor of movement duration
to drive reduction, allows for a more complete understanding of the underlying cause of
reduction. While weak prosodic position causes shorter durations, so do faster speech,
more casual speech registers, and other factors, all of which have been associated with a
higher frequency of reduction [159, 103, 104, 88, 89, 91, 18]. As will be shown in Chapter
II, this single dynamic approach also accounts for contextual influences on reduction, such
as the height of adjacent vowels.
While this proposal focuses principally on explaining synchronic lenition patterns, it
also provides the basis for how such patterns may be phonologized over time and lead to
different types of sound change. Such diachronic considerations are discussed in Chapter
IV.
I.1 Outline of the dissertation
Chapter II consists of an articulatory investigation of spirantization in Spanish voiced
stops in comparison to voiceless stops. A dynamical account of reduction is proposed
using articulatory modeling to derive similar patterns of prosodically-conditioned variation
found in the articulatory study. Further modeling extends this analysis to contextual and
dialectal variation. The majority of this chapter has previously appeared in Laboratory
Phonology [136].
Chapter III compares reduction of coronal stops in English and Spanish. Both lan-
guages show prosodically-conditioned reduction of /t/ and /d/, though the outcomes differ
between the two languages: English reduces both /t/ and /d/ to a flap, while Spanish
reduces only /d/ to an approximant. An articulatory study is presented comparing the
two languages. A subset of this data appeared in Proceedings of the 10th International
Seminar on Speech Production [138].
Chapter IV presents a model of lenition-based sound change. First, a dynamical sys-
tems model is proposed to account for various patterns of prosodically-conditioned reduc-
tion seen synchronically. Second, a process is outlined whereby language learners recover
the underlying control parameters of the system from the variability in production in their
ambient language. It is shown how errors in this process may lead to both prosodically-
conditioned and unconditioned sound changes. This work was done collaboratively with
Adam Lammert.
Chapter V presents a general discussion of the results in the previous chapters and
compares the results with other theories of lenition.
Chapter II
Spirantization of voiced stops in Spanish
II.1 Introduction
II.1.1 Production of Spanish stops
All dialects of Spanish are characterized by a process of spirantization, wherein the voiced
stops /b, d, g/ are produced with full occlusion only phrase initially, after a homorganic
nasal, or, for /d/ only, after /l/. In all other positions they are commonly realized as
the voiced approximants [β, ð, ɣ]. The first treatment of this alternation in the generative
tradition was Harris [67]. That study analyzed these sounds as underlying stops that
undergo a rule of lenition except after a homorganic, non-strident obstruent or a phrase
boundary. This proposal has been widely adopted, in general terms, as the correct analysis.
A number of later works in both linear and autosegmental phonology have adopted the
same or similar analyses [57, 75, 118]. One recent approach in Optimality Theory also
specifiesthesesoundsasvoicedstops,andreliesonmarkednessconstraintsandarticulatory
effort (after Kirchner [88, 89]) to drive spirantization [141].
Another view is that these sounds are underlying approximants that undergo a process
of fortition to derive the full stops. The first proposal along these lines is found in Lozano
[110] who proposed that these sounds are unspecified for the feature [±continuant], with
a rule that fills in [+continuant] where appropriate and a complementary rule that fills in
[–continuant] elsewhere. González [62] extends this interpretation to Optimality Theory,
arguing for a constraint ranking that derives the correct output regardless of whether stops
or spirants are specified in the input. Other proposals in Optimality Theory have gone
farther, explicitly specifying the input for OT analysis as spirants [3, 4].
The assumption underlying most of these proposals is that there exists a clear alterna-
tion between the stop and approximant allophones. However, there is a large and growing
body of evidence that the production of these segments is variable in all positions, and
can be influenced by a large number of factors. A number of studies have used amplitude
ratio—that is, the ratio of the amplitude minimum during a consonant to the maximum of
the following vowel—as a measure of constriction degree. Stress seems to play a significant
role in the realization of these segments, with longer and more constricted productions at
the onset of a stressed syllable compared with an unstressed one [39, 45, 134]. The pre-
ceding and following segments have also been shown to influence production, though the
results are somewhat inconsistent across studies [33, 39, 80, 134]. Speech rate also plays
a role, with fast speech leading to more open productions [159]. There is also evidence
that, for the dialect spoken near Toledo, Spain, /b, d, g/ can be realized as voiceless stops
in syllable initial position while /p, t, k/ often surface as voiced [163]. In general, these
studies show productions that range from wide approximants to full stops both where,
in the traditional description, we expect spirantization to occur and where we expect full
occlusion.
In addition to the variability found in the production of the voiced stops, recent work
has shown that the voiceless stops themselves are highly variable. Machuca [111], in
her examination of casual speech in Barcelona, finds that roughly 40% of productions of
phonologically voiceless stops show some degree of voicing. Additionally, about 9% of the
voiceless stops in her corpus were produced as approximants, the majority voiced. Other
studies have found similar results, though there were large differences in the frequency
with which the two processes occur between different dialects [101, 116, 79].
We are left with a situation where /b, d, g/ can be realized as voiced stops, voiced
approximants, and perhaps even voiceless stops, while /p, t, k/ can be produced as voice-
less stops, voiced stops, or even voiced approximants. The question we must ask, then, is
how Spanish speakers reliably distinguish these sounds. There is experimental evidence
to support that they are, in fact, reliably distinguished in both production and perception
[150]. That study found that, even though there was some overlap in intensity ratio (used
as a measure of constriction degree and voicing) between /p, t, k/ and /b, d, g/, they were
nonetheless significantly different in production. Additionally, listeners correctly identified
even the voiceless stops with the highest intensity ratio (most voiced/approximant pro-
duction). Hualde et al. [79] also found even voiced productions of /p, t, k/ to be produced
with greater constriction than /b, d, g/. The large variability in production also raises
the unresolved question of the phonological representation of /b, d, g/. Are these stops
that spirantize, approximants that undergo fortition, or perhaps something else entirely?
II.1.2 The nature of /b, d, g/ and how they are distinguished from /p,
t, k/
One possible difference between /b, d, g/ and /p, t, k/ is their duration. There is ample
evidence that the voiceless series is generally longer than the voiced series [69, 100, 148]
for Catalan, which shows the same pattern of spirantization. Additionally, there is evidence
that listeners use duration to distinguish the voiced and voiceless stops. Reducing the
period of silence associated with the closure of a voiceless stop causes it to be perceived
as voiced [112, 114]. Machuca [111] found that voiced, spirantized /p, t, k/ are still longer
than phonological /b, d, g/ produced in the same manner. Based on this, Hualde [76]
posits that, in addition to a voicing distinction that may be neutralized in some contexts,
duration still reflects the phonological contrast between /p, t, k/ and /b, d, g/.
There is some evidence, however, that /p, t, k/ can be realized with the same duration
as /b, d, g/ [115, 116]. Some authors take the fact that these two sets of sounds generally
differ in voice, constriction degree, and duration, but in no category consistently, as evidence
for a phonological distinction by the feature [±tense] [69, 117, 112, 113, 114, 115,
116]. However, most of the evidence used in support of phonological tension comes from
durational differences between voiced and voiceless stops. Martínez Celdrán writes of
short voiceless stops, "[i]n fact, if attention is paid to the sound in question, it is perceived
as voiced. By contrast, if we hear the whole word, the sound is perceived as voiceless,
based on the use that we make of our knowledge of the word and the context." ([115, p.
43]). He takes this as evidence that the feature [tense] must distinguish the stops here,
but it can easily be taken as evidence that the stops in fact do not differ absolutely, but
that contextual knowledge (e.g. all segments are short because the talker is speaking at
a fast rate) conditions our perception of the stops. The question of the articulatory or
acoustic causes or consequences of tension is left unresolved.
An issue with both proposals discussed here is that, in focusing on the phonological
distinction between /p, t, k/ and /b, d, g/, they fail to account for the large amount of
variation in production. Why, for example, do both tense and lax stops surface as stops
after a nasal or phrase initially but lax stops spirantize elsewhere? It is not made clear
what the relationship between phonological representation and attested phonetic variabil-
ity is. Two relatively recent proposals have attempted to do just that. Lavoie [100], in her
book on lenition, finds the voiced stops are consistently shorter than and have a higher
intensity ratio than voiceless stops. She concludes that /b, d, g/ are therefore underlying
approximants (in contrast, one assumes, with the voiceless stops) that undergo strengthening,
which "may simply be an articulatory error, an overshot closing gesture" [100, p. 169].
Additionally, the realization of approximants as stops following a nasal consonant
is due to either mistiming of the oral and nasal gestures or a repair to a sequence of
nasal + continuant segments that are articulatorily incompatible. Carrasco and Hualde’s
approach is similar in general terms, though it makes explicit reference to articulatory
gestures [33]. These authors argue that, while spirantization of voiced stops most likely
started as gestural reduction, since intervocalically it is nearly universal, the constriction
target for the stops in that position must be that of an approximant. They rely on an
allophonic distribution of articulatory gestures (full stops after nasals and at a phrase
boundary, approximants elsewhere) to derive the attested patterns.
It is not clear from these proposals, however, why /β, ð, ɣ/ (if we assume underlying
approximants) should surface as stops after nasals. Honorof [73] showed that nasals as-
similate to both the place and constriction degree of following non-continuants, so that in
an /ns/ sequence [n] is produced with the same stricture as [s]. Given that these nasal
+ continuant sequences are possible, and the fact that /b, d, g/ have the same surface
constriction degree as fricatives [149], we would expect /mb/ to be realized as [mβ], rather
than the attested [mb]. Carrasco and Hualde’s solution of allophony, while descriptively
adequate, does not address any larger causes behind the distribution of allophones. Nor
does either theory account for the differences seen in constriction degree due to stress,
speech rate, or segmental context.
II.1.3 Proposal
This chapter argues for an analysis of Spanish stop spirantization based in Articulatory
Phonology [19]. In this theory, the basic units of abstract phonological contrast are ges-
tures, which are also taken to be the units involved in the control of articulator motion.
These gestures are goal-directed actions with particular dynamical parameters set for
stiffness (implemented via a mass-spring model), constriction degree, and constriction lo-
cation. Importantly, and different from most other theories of phonology, the duration of
gestures can also explicitly form part of the specification for each gesture.
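The gestural dynamics described above (a critically damped mass-spring system driving a tract variable toward its target) can be sketched numerically. This is a minimal illustration, not the published Task Dynamics implementation; the stiffness value, starting position, and Euler integration are simplifying assumptions:

```python
import math

def simulate_gesture(target, k=1000.0, x0=0.0, duration=0.25, dt=0.001):
    """Integrate a critically damped mass-spring gesture,
    x'' = -k*(x - target) - b*x' with b = 2*sqrt(k),
    using simple Euler steps; returns the tract-variable trajectory."""
    b = 2.0 * math.sqrt(k)        # critical damping: no overshoot
    x, v = x0, 0.0
    traj = [x]
    for _ in range(int(duration / dt)):
        a = -k * (x - target) - b * v
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

# Given enough active time, the tract variable settles at its target.
traj = simulate_gesture(target=1.0)
print(abs(traj[-1] - 1.0) < 0.01)
```

Because the system is critically damped, the movement approaches its target asymptotically and without oscillation, so whether the target is actually reached depends directly on how long the gesture remains active.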
The argument here leverages the fact that temporal differences can serve as a contrast
between two gestures, and is based on two findings from previous studies. First, Spanish
voiced and voiceless stops show significant durational differences. Second, there is ample
evidence that duration and constriction degree of voiced stops in Spanish are related,
and that productions of these segments with longer temporal durations have more spatial
occlusion. It follows, then, that the shorter duration of voiced stops may lead to their less
constricted productions, and that the difference between voiced and voiceless stops (in
addition to the presence of a glottal spreading gesture for the voiceless stops) may lie not
in their target constriction degree but solely in their duration. Constriction differences,
as well as the widespread variation in constriction within each class, can be attributed to
the temporal differences.
From the relatively simple assumption that voiced and voiceless stops differ in duration,
a number of consequences fall out. First, the same gestural target as that for voiceless stops
will result in greater undershoot, and systematic spirantization, for voiced stops due to
their shorter active duration. Second, this account provides a straightforward explanation
of phrase-initial strengthening of voiced stops. It has been shown that increased duration
leads to a closer approximation/achievement of target for gestures at a prosodic boundary
[29, 32, 27, 36, 35]. Given this extended duration, it follows that gestures at a phrase
boundary will achieve their target. This extended duration, however, will result in full
closure (as is the case is Spanish) only if the target is full closure; adding duration to
a gesture for an approximant, for example, will not result in closure. By explaining the
surface alternation in constriction degree as the dynamic consequences of interactions of a
single, invariant spatial target and variable gesture duration, this approach avoids needing
to posit different allophones at the level of gestural control. Last, this theory predicts (in
agreement with previous work) that the amount of spirantization will vary with duration;
shorter productions will also be less constricted.
An articulatory study was conducted to test the above hypothesis. The study aims to
examine the duration of the relevant constriction gestures, their constriction degree, and
the relationship between constriction degree and duration.
II.2 Articulatory study
II.2.1 Methods
Stimuli and Subjects
For this study, data were collected from two subjects (A and B), both native speakers of
northern peninsular Spanish. The subjects lived in Spain until attending school in the US
and since then have lived on-and-off in the US and Spain. The subjects, even while in the
US, use Spanish on a daily basis. Neither subject reported any previous history of speech
or hearing impairment.
Stimuli were designed in order to examine articulatory differences in production of
voiced and voiceless stop consonants in a variety of prosodic conditions. Due to difficulty
in measuring the constriction degree for velar and coronal consonants [149], stimuli were
limited to the labial stops (/p/ and /b/). Measuring coronal constriction for Spanish
is particularly difficult as the very tip of the tongue, rather than the blade, is used. A
pilot conducted for this study using electromagnetic articulometry showed very unreliable
measurements for tongue tip constriction degree with a sensor approximately 8 mm dorsal
to the tongue apex. In all cases, the stimulus contained the sequence /aCa/, where C
is one of the target stops. In addition to voicing (voiced vs. voiceless), conditions were
included to test the effect of prosodic boundaries on the production of intervocalic stops.
The prosodic boundaries examined included: phrase boundary, word boundary, and no
boundary (word internal) conditions. Lexical stress for the target words was balanced, with
target stops occurring in the onset position of both stressed and unstressed syllables. These
variables, along with the stimulus word used in the experiment, are presented in Table
II.1 (stress, indicated by ´, is shown even where not normally present in the orthography).
Target words were chosen to minimize the effects of coarticulation by alternating labial
and coronal/velar stops when possible. To create each stimulus, the target words were
embedded in a carrier phrase. The carrier phrases used for each condition, presented in
Table II.1, were varied in order to avoid any possible repetition effects.
Due to an error in the presentation of the stimuli, no data were collected
for /b/ in word-internal position. Data for this condition were taken from other sentences
collected during the same experimental session. For chaval, relevant data were taken
Boundary          Stress      /p/                              /b/
Phrase boundary   Stressed    pága (/paga/) '(s)he pays'       vága (/baga/) '(s)he wanders'
                  Unstressed  pagába (/pagaba/) '(s)he paid'   vagába (/bagaba/) '(s)he wandered'
Word boundary     Stressed    pánta (/panta/) 'ribbon'         bánda (/banda/) 'band'
                  Unstressed  pantálla (/pantaja/) 'screen'    bandáda (/bandada/) 'flock'
No boundary       Stressed    tapádo (/tapado/) 'covered'      chavál (/tSabal/)* 'kid'
                  Unstressed  tápa (/tapa/) 'lid'              faltába (/faltaba/)* '(s)he delayed'

Table II.1: Target words divided by independent variables. Items with a (*) were taken
from other carrier phrases used in the same study (see text for details).
Boundary          Stress               Carrier phrase
Phrase boundary   Stressed             La chica juega. ___ también. 'The girl plays. She as well.'
                                       El niño salta. ___ también. 'The boy jumps. He as well.'
                  Unstressed           El chico jugaba. ___ también. 'The boy jumped. He as well.'
                                       La niña saltaba. ___ también. 'The girl jumped. She as well.'
Word boundary     Stressed/Unstressed  El chico canta ___ dos veces. 'The boy sings twice.'
                                       La niña canta ___ muchas veces. 'The girl sings many times.'
No boundary       Stressed/Unstressed  Las chicas cantan ___ dos veces. 'The girls sing twice.'
                                       Los niños cantan ___ muchas veces. 'The boys sing many times.'

Table II.2: Carrier phrases for different prosodic boundary and stress conditions. The
target word fills the blank (___).
from all repetitions of the sentences El chaval tardaba demasiado / El chaval danzaba
demasiado; for faltaba, of the sentences El chico canta faltaba dos veces / La niña canta
faltaba muchas veces. The stimuli were randomized across variables in 6 blocks. For each
target word, a different carrier phrase was used in odd- and even-numbered blocks so that
1) each block contained one and only one stimulus per target word and 2) adjacent blocks
contained different stimuli. The stimuli were presented on a computer monitor positioned
roughly 1m away from the subject. When cued by an auditory click, the subject read
the sentence on the screen; after the subject read each sentence, the monitor changed to
show the next sentence in the block and there was a roughly two second pause before the
subject was cued again. There was a break of a few minutes at the end of every block.
While it was intended to collect six blocks per subject, only four could be collected for
subject A due to time limitations. This gave four repetitions of each target for subject A
and six for subject B. Prior to data collection, the subjects were instructed to speak in a
casual, relaxed manner, as if speaking with close friends.
Data Collection
Articulatory data were collected using an electromagnetic articulometer (Carstens AG500).
This device allows three-dimensional tracking of transducers glued to various points in the
subject’s vocal tract. For this study, transducers were attached at the vermillion border of
both the upper and lower lips, the tongue tip (for data collected concurrently for a separate
study), a point on the tongue dorsum approximately 2 cm posterior to the tongue tip sensor
(as above), and the lower jaw. Additionally, reference sensors were attached to the bridge
of the nose and behind each ear; a sample of the subject’s occlusal plane was also taken.
Articulatory data were collected at 200 Hz, and acoustic data at 16 kHz. After collection,
the articulatory data were smoothed with a 9th-order Butterworth low-pass filter, rotated
to match the subject’s occlusal plane, and corrected for head movement using the reference
sensors.
Data Analysis
In order to measure Lip Aperture (LA), the Euclidean distance in the sagittal plane
between the sensors on the upper and lower lip was calculated. This derived variable was
used for all subsequent analysis. Gestural identification was conducted using the MVIEW
software package, developed by Mark Tiede at Haskins Laboratories. The identification
algorithm used takes as input a manually located estimate of the midpoint of constriction
of one EMA sensor or derived variable. Using the velocity of that sensor or variable (the
absolute value of the first difference of the signal), it then locates the velocity minimum
crossing closest to the input point (measurement point: time of maximum constriction).
It then finds the peak velocity between that point and both the preceding and following
velocity minima (measurement point: time of peak velocity). It then identifies the onset of
gestural motion by locating the point where the velocity signal, moving from the preceding
minimum to the first time of peak velocity, crosses a threshold set as a fraction of the velocity difference
between the two points. Gestural offset is defined as the point where the velocity falls
below the same threshold from the second time of peak velocity to the velocity minimum
following the point of maximum constriction. Onset and offset of the constriction proper
are also defined by the points where the velocity crosses a threshold between the times of
peak velocity and the point of maximum constriction. A representative example is shown
in Figure II.1.
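As a concrete illustration, the landmark procedure just described can be sketched in Python. This is a hypothetical re-implementation (the function name, thresholds as keyword arguments, and the use of NumPy are assumptions), not the original MVIEW MATLAB code:

```python
import numpy as np

def gesture_landmarks(sig, guess, gest_thresh=0.2, const_thresh=0.3):
    """Velocity-threshold landmark identification for a 1-D constriction
    signal such as Lip Aperture. `guess` is a hand-placed sample index near
    maximum constriction. Illustrative sketch, not the original MVIEW code."""
    vel = np.abs(np.diff(sig))  # absolute first difference as velocity
    # interior local minima of the velocity signal
    mins = [i for i in range(1, len(vel) - 1)
            if vel[i] <= vel[i - 1] and vel[i] <= vel[i + 1]]
    maxc = min(mins, key=lambda i: abs(i - guess))            # max constriction
    prev_m = max((i for i in mins if i < maxc), default=0)
    next_m = min((i for i in mins if i > maxc), default=len(vel) - 1)
    pv_close = prev_m + int(np.argmax(vel[prev_m:maxc + 1]))  # peak closing velocity
    pv_open = maxc + int(np.argmax(vel[maxc:next_m + 1]))     # peak opening velocity

    def level(lo, hi, frac):
        # threshold: `frac` of the velocity difference above the lower point
        return vel[lo] + frac * (vel[hi] - vel[lo])

    # gesture onset: velocity first exceeds the 20% threshold before peak velocity
    g_on = next(i for i in range(prev_m, pv_close + 1)
                if vel[i] >= level(prev_m, pv_close, gest_thresh))
    # constriction onset: velocity falls below the 30% threshold after peak velocity
    c_on = next(i for i in range(pv_close, maxc + 1)
                if vel[i] <= level(maxc, pv_close, const_thresh))
    # constriction release: velocity exceeds the 30% threshold after max constriction
    c_rel = next(i for i in range(maxc, pv_open + 1)
                 if vel[i] >= level(maxc, pv_open, const_thresh))
    # gesture offset: velocity falls below the 20% threshold after peak opening velocity
    g_off = next(i for i in range(pv_open, next_m + 1)
                 if vel[i] <= level(next_m, pv_open, gest_thresh))
    return {'onset': g_on, 'const_onset': c_on, 'max_const': maxc,
            'const_release': c_rel, 'offset': g_off}
```

Applied to a smooth synthetic closing-opening movement, the function returns the five landmarks in temporal order (gesture onset, constriction onset, maximum constriction, constriction release, gesture offset).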
Figure II.1: Representative example of measurement of a constriction gesture. The exam-
ple here is /p/ taken from "El chico canta panta dos veces." LA in the figure refers to Lip
Aperture, the Euclidean distance in the sagittal plane between the sensors on the upper
and lower lips.
Thresholds for identification of constriction onset and offset were set at 30%, and those
for gesture onset and offset at 20%. These thresholds are consistent with past work (e.g.,
[30]) and allowed for consistent measurements across subjects, with small fluctuations
in velocity during the constriction captured between the points of constriction onset and
offset. All locations were checked by hand. For one trial of word-internal /b/, the algorithm
incorrectly identified the points of maximum constriction, constriction release, peak offset
velocity, and gesture offset (these points were labeled after the following vowel). This trial
was excluded from further analysis.
From these measurements, a number of derived variables were calculated. First, total
duration of the gesture was defined as the time between the gesture onset and the constric-
tion release. This latter point was chosen as it coincides with the theoretical end of active
control of the gesture [17, 153], and has been used in previous work as the location of the
end of a gesture [146]. Movement after this point is heavily influenced by the following
gestures. Total duration was further broken down into constriction duration (time be-
tween onset and release of constriction) and movement duration (time from gesture onset
to constriction onset). In order to better compare LA across both subjects, normalized
constriction degree was also calculated. This was done by measuring the minimum value
(most constricted) across all tokens at the point of maximum constriction and subtracting
that value from the measurement for each individual token, giving a scale where 0 mm is
the most constricted, and higher values correspond to more open productions.
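The normalization step can be stated compactly; the following Python fragment (an illustrative sketch, with a hypothetical function name) mirrors the calculation described above:

```python
import numpy as np

def normalize_constriction_degree(la_at_max_const):
    """Normalized constriction degree: subtract the minimum (most
    constricted) Lip Aperture across all of a subject's tokens, so that
    0 mm is the most constricted production and larger values are more
    open. Illustrative sketch of the calculation described in the text."""
    la = np.asarray(la_at_max_const, dtype=float)
    return la - la.min()
```

For example, tokens measured at 2.0, 3.5, and 1.2 mm normalize to 0.8, 2.3, and 0.0 mm.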
Statistical analyses were conducted using linear mixed models implemented in the
lme4 package in R [5]. For all tests, fixed factors were boundary (phrase boundary, word
boundary, or no boundary), segment (/b/ or /p/), and an interaction between boundary
and segment. Subject was included as a random intercept. Markov Chain Monte Carlo
sampling based on the t statistic was used to estimate the p values [1]. Post-hoc comparison
of factor levels was done with paired t-tests using a Bonferroni correction for multiple
comparisons with an experiment-wise alpha of 0.05. Regressions were conducted with the
linear regression function ‘regress’ in MATLAB.
II.2.2 Results
Duration
For total duration, there was a significant effect of prosodic boundary (t = -11.40, p
< 0.0001), though not of segment (t = -0.86, n.s.). There was a marginally significant
interaction between the two factors (t = 1.76, p < 0.08). A box plot showing the durations
at the different conditions is shown in Figure II.2. Given that post-hoc tests reveal a
significant difference between phrase boundary (M = 219 ms) and non-phrase boundary
(M_word-internal = 79 ms, M_word-boundary = 83 ms) conditions and no difference between the
non-phrase boundaries, a separate model was conducted combining the two phrase-internal
positions. In this case, there was a significant effect of boundary (t = 19.48, p < 0.0001),
segment (t = 2.64, p < 0.01), and their interaction (t = -2.90, p < 0.01). At the non-
phrase boundary level, voiceless stops (M = 90 ms) are significantly longer than voiced
stops (M = 72 ms); there is no difference at the phrase boundary level.
The pattern for constriction duration closely mirrors that for total duration. Figure
II.3 shows the constriction duration split by prosodic boundary and voicing. There is a
significant effect of prosodic boundary (t = -11.70, p < 0.0001), though not for voicing (t
= -1.15, n.s.). There is, however, a significant interaction between the factors (t = 2.41,
p < 0.05). As for total duration, there is a difference between phrase boundary (M =
131 ms) and non-phrase boundary conditions (M_word-internal = 30 ms, M_word-boundary = 31
ms), with no difference between word internal and word boundary conditions. Again, a
second model combining the two phrase-medial conditions shows significant effects of all
factors (t_boundary = 20.32, p < 0.0001; t_voicing = 3.76, p < 0.0001; t_interaction = -4.10,
p < 0.0001). Post-hoc
Figure II.2: Total duration by prosodic boundary and voicing. For this and all following
boxplots, the center mark represents the median, the edges of the box are the 25th and 75th
percentiles, and the whiskers extend to those values not considered outliers (more than
2.7 standard deviations from the mean). /b/ is shorter than /p/, except at a phrase boundary. Durations of
both /b/ and /p/ are longer phrase-initially than in either phrase-medial position.
tests reveal a significant difference between voiced (M = 22 ms) and voiceless (M = 40
ms) phrase-internally but not at a phrase boundary.
Movement duration, like constriction and total duration, shows a significant main effect
of prosodic boundary (t = -6.45, p < 0.0001), but differs in that it shows no significant
effect of voicing and no interaction between the factors. A model combining word- and
no-boundary conditions shows the same result. Movement duration is shown in Figure
II.4. For prosodic boundary, there is a significant difference between phrase boundary (M
= 88 ms) and non-phrase boundary conditions (M_word-internal = 49 ms, M_word-boundary =
51 ms). As for total duration, the difference between word internal and word boundary
Figure II.3: Constriction duration by prosodic boundary and voicing. /b/ is shorter than
/p/, except at a phrase boundary. Durations of both /b/ and /p/ are longer phrase-initially
than in either phrase-medial position.
conditions is not significant. There is no difference between /b/ and /p/ for any boundary
condition.
Constriction degree
For Lip Aperture, there was a significant effect of prosodic boundary (t = 7.13, p < 0.0001)
and a marginally significant effect of voicing (t = -1.75, p < 0.08), as well as a significant
interaction between the two factors (t = -4.87, p < 0.0001). Post hoc tests reveal a
significant difference between phrase medial /b/ (M_word-internal = 3.4 mm, M_word-boundary =
3.5 mm), on the one hand, and phrase initial /b/ (M = 1.4 mm) and /p/ in all prosodic
positions (M_word-internal = 1.3 mm, M_word-boundary = 1.4 mm, M_phrase-boundary = 1.3 mm), on
Figure II.4: Movement duration by prosodic boundary and voicing. /b/ and /p/ have
equal duration in all prosodic positions.
the other. Normalized Lip Aperture is shown by prosodic boundary and voicing in Figure
II.5.
Relationship between constriction degree and duration
In order to test the initial hypothesis that the reduced constriction degree of voiced stops
was due to durational differences between voiced and voiceless stops, constriction degree
was regressed onto the duration measures. Additionally, since there was no difference in
duration or constriction degree between the two phrase internal boundary conditions (word
boundary and word internal), they have been collapsed and treated as a single condition.
Phrase-initial and -medial conditions are treated separately. Correlations are presented
for each subject individually.
Figure II.5: Produced LA by prosodic position and voicing. /b/ is more open phrase
medially than phrase initial /b/ and /p/ in all phrasal positions.
For the phrase initial condition, there is no relationship between any measure of du-
ration and constriction degree for either /p/ or /b/ or for either subject. As both are
produced with full closure, this is not unexpected: once a gesture reaches the point of
maximum displacement, little change will be shown in Lip Aperture even with increased
duration. For the phrase medial condition, there is a significant negative relationship be-
tween constriction degree and constriction duration measures for /b/ for both subjects (A:
R² = 0.26, p < 0.02; B: R² = 0.32, p < 0.03). Additionally, subject A shows significant
correlations of constriction degree with both movement (R² = 0.21, p < 0.05) and total
duration (R² = 0.31, p < 0.005), though subject B does not. There is no significant corre-
lation of constriction degree with any duration measure for /p/ for either subject (again,
these are produced with full closure so we do not expect to see an effect of duration). A
plot of constriction degree and constriction duration, along with the relevant regression
lines, is shown in Figure II.6.
Figure II.6: Regression analysis between constriction degree and constriction duration
for /p/ and /b/, as measured by LA. Subject A is on the left; subject B, on the right.
Correlations are significant for /b/ for both subjects; neither subject shows a significant
correlation for /p/.
Acoustics
While the primary focus of this paper is an articulatory analysis, a brief overview of the
acoustic results may be informative. The stops produced by the subjects in this study
followed the well-attested pattern in Spanish. Voiceless stops in phrase medial position
were categorically realized with full closure, as indicated by the consistent presence of
a silent period followed by a release burst. They were unvoiced, with 5-20 ms of VOT.
Voiced stops in the same context were always realized as spirants, with robust formant
values visible throughout the consonant period and no release burst. In phrase initial
position, voiceless stops again were produced with full closure and a short VOT. Voiced
stops were generally produced as full stops with a period of prevoicing ranging from 0 to
103 ms. These results are consistent with the articulatory findings.
II.2.3 Discussion
The initial hypothesis had 3 main predictions: 1) voiced and voiceless stops differ in
duration; 2) voiced and voiceless stops differ in their produced constriction degree, at
least phrase medially; and 3) the durational differences between the two classes of stops
underlie the constriction differences. We can clearly see that voiced and voiceless stops
do indeed differ in duration, with the voiceless stops roughly 15-20 ms longer than their
voiced counterparts in phrase medial position, though this difference is not present at a
phrase boundary. This difference is attributable to the duration of the constriction of the
gesture, and not the movement prior to constriction. Notably, this durational difference
is somewhat smaller than what has been reported in previous literature. Lavoie [100],
for example, reports a difference of 41 ms between /p/ and /b/, and 51.5 ms for /t/ and
/d/, though the finding here is similar to the 18 ms difference reported for /k/ and /g/.
The differences between the current study and previous work may be due to the inherent
inaccuracies in measuring the duration of a non-stop consonant without clear breaks in
the visible formants from the acoustic signal and comparing those measurements to more
easily measurable stops. The approach taken in this chapter, measuring the duration of the
gestures directly from articulator movement, seems a more accurate method of comparing
gestures with full occlusion to those without. Though the thresholds used for gestural
identification here are somewhat arbitrary, they are consistent between /p/ and /b/.
The data here also clearly support the traditional description of phrase medial spiran-
tization. Voiced stops in phrase medial position have a more open posture at their point
of maximum constriction when compared to voiceless stops, but phrase initial voiced and
voiceless stops have the same constriction degree, equal to that of voiceless stops phrase
medially. Because /b/ and /p/ differ in both constriction degree and duration, this chap-
ter initially proposed the hypothesis that the two do not differ in their target constriction
degree, but that constriction differences between voiced and voiceless stops are due to
shorter duration for the former, which leads to a greater degree of undershoot. However,
the data from this study do not entirely support this hypothesis. If we look to Figure II.7,
we can see that for any measure of duration, there is significant overlap in the duration of
/p/ and /b/, but little to no overlap in their respective constriction degrees. For example,
when stops have duration of 75-80 ms, the voiceless stops are consistently realized with a
LA roughly 2 mm less than the voiced stops for subject A.
It would seem that we must conclude that voiced and voiceless stops have different
targets for constriction degree. However, there is one additional possibility that must be
ruled out. There is ample evidence that it is possible to vary the speed of articulator
movement independently of changes in the magnitude and duration of that movement.
This has been referred to as the stiffness of a gesture [7, 31, 46, 151]. A gesture with
lower stiffness will take longer to reach its target position than the same gesture with a
higher stiffness. If voiced stops were to have a lower stiffness than voiceless stops, then we
would expect the voiceless stops to more closely approximate their target than the voiced
stop given the same duration of gestural activation. This hypothesis was tested from the
Figure II.7: Duration and constriction degree of /p/ and /b/. Data from subject A is on
the left; data from subject B, on the right. Durations for subject A overlap around 75-80
ms but repetitions show much less overlap in constriction degree.
experimental data. Stiffness was calculated in two separate ways. First, following Roon et
al. [151], stiffness was calculated as the peak velocity of the closing gesture (taken as the
velocity measurement at the point of peak velocity between gestural onset and constriction
onset), divided by the magnitude of the closing gesture (calculated as the difference in LA
between the point of maximum constriction and gesture onset). Second, after Byrd &
Saltzman [31], stiffness was measured by the time from gestural onset to the peak velocity
of the closing movement.
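The two stiffness measures can be sketched as follows; this is illustrative Python operating on the landmark indices from the gestural identification, not the original analysis code:

```python
import numpy as np

def stiffness_measures(la, onset, const_onset, max_const, dt):
    """Two effective-stiffness proxies for a closing gesture, following the
    text: (1) peak closing velocity divided by movement magnitude (after
    Roon et al.) and (2) time from gesture onset to peak velocity (after
    Byrd & Saltzman). `la` is a Lip Aperture signal, the index arguments
    are the landmarks from gestural identification, and `dt` is the sample
    period. Illustrative sketch, not the original analysis code."""
    vel = np.abs(np.diff(la)) / dt
    pv_idx = onset + int(np.argmax(vel[onset:const_onset + 1]))
    peak_vel = vel[pv_idx]                   # peak closing velocity
    magnitude = la[onset] - la[max_const]    # closing movement magnitude
    return peak_vel / magnitude, (pv_idx - onset) * dt
```

On a synthetic half-cosine closing ramp, the velocity/magnitude measure is dimensionally 1/time, so it falls when a boundary-adjacent gesture is slowed even if its amplitude is unchanged, which is exactly the pattern the text reports.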
For the velocity/magnitude measure, there is a significant effect of prosodic boundary
(t = -10.72, p < 0.0001). Post hoc tests reveal that stiffness at a prosodic boundary (M =
1.50) is lower than that phrase medially (M_word-internal = 2.35, M_word-boundary = 2.41). This
difference is expected, as there is a well-documented effect of lower effective stiffness of
gestures at a prosodic boundary [7, 31, 46]. Importantly, however, there is no difference
at all between voiced and voiceless stops at any prosodic level. For time to peak velocity,
we find the same pattern. There is a main effect of prosodic boundary (t = 5.14, p <
0.0001) and no effect of voicing nor an interaction between the two factors. Post hoc tests
indicate longer time to peak velocity at a phrase boundary (M = 49 ms) than phrase
medially (M_word-internal = 23 ms, M_word-boundary = 25 ms). Based on the evidence from
both measures of stiffness, we must conclude that stiffness cannot be driving the observed
difference in constriction degree between voiced and voiceless stops.
We must conclude, given that we have excluded stiffness differences, that voiced and
voiceless stops differ in their target constriction degree, which is reflected by constriction
differences at the same total gestural and constriction durations. Does this imply that
the voiced stops have a target constriction degree similar to that for an approximant?
Not necessarily. Recall that the voiced stops in phrase medial position showed significant
effects of duration on constriction degree. As the duration increased, stops became more
constricted. Additionally, at very long durations (i.e. at a phrase boundary) the stops are
realized with full closure. This is consistent with a target constriction degree that results
in full closure, at least at a long duration.
There is evidence to support the fact that stops in general have targets for constriction
degree that are actually beyond the point of contact between the articulators [109, 106].
We know that voiced stop targets must be less constricted than that for voiceless stops.
We can hypothesize, then, that the goal for voiced stops is simply closer to, but still be-
yond, the point of articulator contact than that for voiceless stops. If the short duration
of both stops causes roughly the same amount of undershoot (if, of course, both stops
have equal stiffness), it might be possible that this same absolute degree of undershoot
would still result in full contact for the voiceless stops, but lead to incomplete closure
and spirantization for the voiced. However, at long durations (such as that at a phrase
boundary), both stops would have time to reach their targets, resulting in complete occlu-
sion for both. In order to test this hypothesis, a gestural simulation was conducted, the
results of which are presented in Section 3.
II.3 Gestural simulation study
II.3.1 The Task Dynamics model
The simulation study was conducted using the Task Dynamic Application (TaDA) devel-
oped at Haskins Laboratories to produce both articulatory and acoustic output from an
input of gestural parameters [128, 155]. The model is an implementation of the theories
of Articulatory Phonology [19, 21, 60] and Task Dynamics [154]. Within these theo-
ries, articulatory constriction actions are the basic compositional units of speech. These
context-invariant gestures’ temporal patterning is modeled by means of intergestural cou-
pling relations, represented in a coupling graph for a given utterance [23, 59, 61]. The
coupling graph in turn both represents an utterance’s syllabic phonological structure and
determines the coordination of the gestures in that utterance. For any given arbitrary
input from English or Spanish [139], gestural information is accessed from a dictionary
and a coupling graph is constructed. From the coupling graph, a gestural score is created
with the activation times and durations of the various gestures.
II.3.2 Model input for Spanish stops
Within the TaDA model gestures are specified by their constriction target, duration, and
stiffness. For this study, the words cava (/kaba/) and capa (/kapa/) were used as inputs
to the model. Word-initial /k/ and vowels /a/ were chosen in order to a) minimize
coarticulatory influences on the lips and b) accurately model the words used in the EMA
experiment in Section 2. The two words differed in the target constriction degree for the
labial stop as well as in the presence or absence of a glottal spreading gesture to generate
the voiceless /p/. For /p/, target constriction was set to -2 mm, that is, 2 mm beyond
the point of contact between the upper and lower lips. This target is the standard in the
English version of the model, in agreement with articulatory evidence for virtual targets.
The target for /b/ was set to -0.5 mm, less than the target for /p/ but still beyond the
point of contact for the lips. An additional utterance was created with the target for /b/
set to 0 mm to test the hypothesis that the target for /b/ must, like that for /p/, be
beyond the point of articulatory closure. After the model generated the utterances, the
durations of the lip aperture gestures were modified directly in the gestural score. To
model the phrase medial condition, duration was set to 80 ms. This reflects a relatively
short /p/ and a relatively long /b/, but there are numerous occurrences of both stops
with this total duration in the EMA study. For the phrase initial condition, duration was
set to 200 ms, again based on the durations found in the articulatory study. For both /b/
and /p/, stiffness was set to the model default.
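The dynamical regime these inputs assume, a critically damped point attractor driving Lip Aperture toward its target for the duration of gestural activation, can be sketched in a few lines. All numeric values here (a rest LA of 10 mm, the stiffness parameter omega, simple Euler integration) are illustrative assumptions; TaDA itself computes task dynamics with activation ramps and a full articulatory model:

```python
import numpy as np

def gesture_min_la(target, duration, z0=10.0, omega=50.0, dt=0.001):
    """Lip Aperture reached by a critically damped second-order gesture,
    the dynamical regime assumed in Task Dynamics. The gesture drives LA
    from a rest value z0 (mm) toward `target` (mm) for `duration` seconds.
    All parameter values are illustrative stand-ins, not TaDA's settings."""
    z, v = z0, 0.0
    for _ in range(int(round(duration / dt))):
        acc = -omega ** 2 * (z - target) - 2.0 * omega * v  # critical damping
        v += acc * dt
        z += v * dt
    return z
```

With these stand-in parameters, an 80 ms activation undershoots either target by roughly 1 mm: the /p/-like gesture (target -2 mm) still ends below 0 mm (full closure), while the /b/-like gesture (target -0.5 mm) ends short of contact. At 200 ms both effectively reach their targets, qualitatively matching the pattern in Table II.3.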
          /p/ (target: -2 mm)    /b/-0.5 (target: -0.5 mm)   /b/0 (target: 0 mm)
          Min LA    undershoot   Min LA    undershoot        Min LA   undershoot
80 ms     -0.8 mm   1.2 mm       0.6 mm    1.1 mm            1.0 mm   1.0 mm
200 ms    -2 mm     0 mm         -0.5 mm   0 mm              0 mm     0 mm

Table II.3: LA at the point of maximum constriction for /p/ and /b/ with two different
constriction degree targets generated by TaDA. The undershoot is the difference between
target and maximum achieved constriction degree.
II.3.3 Results
The articulatory output from TaDA was analyzed using the same MVIEW software and
settings as described in Section 2. Gestural landmark identification was conducted on
the LA signal created by TaDA, which is generated via the same calculation of Euclidean
distance between the upper and lower lips as was used previously. LA at the point of
maximum constriction is compared in Table II.3. In order to test the hypothesis about
different consequences of undershoot for voiced and voiceless stops, the amount of under-
shoot (in mm) was calculated as the maximum constriction less the target constriction
degree.
As predicted, /p/, /b/-0.5, and /b/0 show the same amount of undershoot at both
durations. At 80 ms, all undershoot their targets by roughly 1 mm. At 200 ms, all
reach their targets. Though the amount of undershoot for the three gestures is equal, the
consequences of that undershoot are not. For /p/, undershooting the target still results in
closure (any LA of 0 or less is complete closure of the lips). For both /b/s, however, the
same absolute amount of undershoot results in an incomplete closure, with a minimum
LA of 0.6 mm/1.0 mm. At 200 ms, all the gestures reach their target. For /p/ and /b/-0.5,
this results in complete closure with a negative constriction degree (which translates to full
closure and perhaps compression of the relevant articulators in real-world speech). For /b/0,
the result is the articulators touching briefly but without any pass-through/compression.
These differences are visible in the acoustic signal generated from the articulatory
patterns by the model, shown in Figure II.8. TaDA generates these acoustic signals by
tracking the changing aerodynamic conditions generated by the model articulators and
generating acoustic output from those conditions via HLSyn [66]. There is a complete
closure for /p/ at both durations, with no formants visible and a strong release burst. For
/b/-0.5 and /b/0, on the other hand, there are visible formants throughout the closure
period at 80 ms, with no visible release burst. At 200 ms, /b/-0.5 achieves full closure,
with no visible frication and a clear release burst. On the other hand, for /b/0, achieving
the target constriction degree results in a brief period of acoustic silence followed by a
period with high-frequency aperiodic noise and no release burst. It may be noted that
there is devoicing of both /b/0 and /b/-0.5 when their durations are set to 200 ms. This
is due simply to passive aerodynamic effects, as the walls of the oral cavity in the TaDA
model are fixed, as is the position of the vocal folds; it does not model the expansion of
the oral cavity that may happen either actively or passively in speech to maintain voicing
during a long closure [133]. In any case, this passive devoicing is not particularly relevant
to the argument being made here about the relationship between constriction degree and
duration.
II.3.4 Further evidence
We can also use this model to test how contextual, as well as durational, variation can in-
fluence produced constriction degree. For /b, d, g/, full stops are produced in nasal + stop
sequences. While other accounts must posit allophony [33] or articulatory incompatibility
[Six spectrogram panels, labeled [apa], [aβa], [apa], [aba], [aβa], and [aba].]
Figure II.8: Spectrograms of acoustic output from TaDA. In the top row, left-to-right:
/p/ at 80 ms, /p/ at 200 ms; in the middle row, left-to-right: /b/₀.₅ at 80 ms, /b/₀.₅ at
200 ms. In the bottom row, left-to-right: /b/₀ at 80 ms, /b/₀ at 200 ms. Both /b/₀.₅
and /b/₀ show incomplete closure at 80 ms, but only /b/₀.₅ shows the attested variation
with full closure at 200 ms.
between fricatives and nasals [100] to account for this pattern, it also falls out directly
in the current account as a dynamical consequence of an invariant gestural specification.
In these nasal + stop sequences, the duration of active control of the shared constriction
articulators is much longer than that for the intervocalic stops. This increased duration
allows time for the articulators to better approximate or reach their target, resulting in
full closure.
In addition to this well-discussed case of possible allophony, however, the produced
constriction degree of /b, d, g/ is influenced by other adjacent segments as well. A number
of studies have shown that /b, d, g/ are generally produced with greater constriction
when they occur between high compared to low vowels [33, 80, 134], though there is some
evidence /g/ behaves somewhat differently [39]. Recent acoustic and electropalatographic
studies have shown constriction degree of these segments to be heavily influenced by the
preceding consonants as well as vowel height. Generally, constriction degree decreases
(i.e., is more stop-like) along the order low vowels → high vowels → consonants (/l/, /r/,
/n/, /s/), at least for the majority of Spanish dialects [78, 158, 34]. Though there is some
disagreement between the studies as to which consonants induce tighter constrictions,
there is general agreement that they all induce more contracted productions than the
vowels. Limited electropalatographic data suggest that, at least for /d/, preceding /l/
and /n/ do not differ in palatal contact, but both induce more contact than /r/ or /s/ [78].
From a dynamical perspective, these effects can be seen as the result of an invari-
ant constriction target and differing initial positions of the articulators due to segmental
context. This hypothesis was tested by creating a number of utterances in TaDA. The
utterances used were: /ada/ (low vowel), /ida/ (high vowel), /asda/ (/s/), and /anda/
(/n/). /d/ was used rather than /b/ for these utterances as the majority of the work on
contextual effects has been on this segment, but the effects generally hold for all three
places of articulation. Target constriction degree was set to -0.5 mm as established in
II.3.3. The duration of the tongue tip constriction gesture for /d/ was varied from 40 to
Context 40 ms 60 ms 80 ms 100 ms 120 ms
ada 6.25 mm 3.04 mm 1.33 mm 0.40 mm -0.02 mm
idi 2.34 mm 0.95 mm 0.23 mm -0.10 mm -0.25 mm
asda 0.47 mm 0.06 mm -0.13 mm -0.21 mm -0.66 mm
anda -1.13 mm -1.16 mm -1.17 mm -1.17 mm -1.17 mm
Table II.4: Resulting minimum constriction degree of TaDA simulations with a constant
target CD of -0.5 mm and variation in the activation duration of the Tongue Tip closure
gesture for /d/ and variation in segmental context.
120 ms. Measurements of the produced minimum constriction degree from these simula-
tions are shown in Table II.4.
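The same point-attractor sketch can illustrate the initial-position effect: with an invariant target, the preceding segment only changes where the tongue tip starts. The stiffness value and the starting apertures below are hypothetical values chosen for illustration, not measurements or the actual TaDA parameters.

```python
import math

def tt_minimum(x0, target=-0.5, duration_ms=80, k=0.0024, dt=1.0):
    """Minimum tongue-tip aperture (mm) reached by a critically
    damped gesture driven toward `target` for `duration_ms` of
    active control, starting from rest at x0. Stiffness k and the
    starting apertures below are illustrative, not TaDA's values."""
    b = 2.0 * math.sqrt(k)  # critical damping
    x, v, lowest = x0, 0.0, x0
    for _ in range(int(duration_ms / dt)):
        v += (-k * (x - target) - b * v) * dt
        x += v * dt
        lowest = min(lowest, x)
    return lowest

# Hypothetical starting apertures set by the preceding segment (mm):
starts = {"ada": 14.0, "idi": 8.0, "asda": 3.0, "anda": -1.0}
mins = {ctx: tt_minimum(x0) for ctx, x0 in starts.items()}
```

With an invariant -0.5 mm target, the minima follow the attested ordering (most open after /a/, then /i/, then /s/, and closed after /n/ at every duration, since the tongue tip there starts at or beyond the target); lowering the target toward -1.5 mm shifts every context toward closure, qualitatively in the direction of the Costa Rican pattern discussed below.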
These results show that a target constriction degree of -0.5 mm, slightly beyond the
point of articulator contact, can accurately model the segmental variation previously found
for /d/ in Spanish. Preceding low vowels induce the most open productions of /d/, followed
by high vowels, /s/, and then /n/. Importantly, these data also show that the hypothesized
target constriction degree always induces complete closure for /nd/ regardless of the du-
ration of /d/, in agreement with the classical allophonic description, yet without needing
explicit allophonic substitution.
Interestingly, there are some dialects of Spanish that show slightly different patterns
of spirantization. As discussed in [34], some dialects from Central America and the
Colombian highlands show a pattern of full stops after all consonants and spirantization
after vowels. There is some debate over the origin of this divergent pattern, but Carrasco
et al. argue that it reflects maintenance of an older pattern of spirantization in Spanish,
reflecting the general pattern in the language during the 15th and 16th centuries. More-
over, they argue that Costa Rican Spanish, taken as an example of these dialects, shows
Context 40 ms 60 ms 80 ms 100 ms 120 ms
ada 5.62 mm 2.22 mm 0.41 mm -0.55 mm -0.99 mm
idi 1.67 mm 0.13 mm -0.67 mm -1.05 mm -1.23 mm
asda -0.09 mm -0.74 mm -1.01 mm -1.15 mm -1.63 mm
anda -1.41 mm -1.43 mm -1.46 mm -1.46 mm -1.46 mm
Table II.5: Results of TaDA simulations with a constant target CD of -1.5 mm and
variation in the activation duration of the Tongue Tip closure gesture for /d/ and
variation in segmental context. These results are consistent with the pattern of variation
found in Costa Rican Spanish.
evidence for an allophonic split between /d/ in postvocalic and postconsonantal positions
as there is a multimodal distribution of (acoustically-measured) constriction degree in
their data, with one mode (full closure) after the consonants and another (spirantization)
following /a/. This is unlike standard Spanish as spoken in Madrid, which shows a uni-
modal distribution with a slightly more spirantized mean than the more constricted mode
in Costa Rican Spanish. It is possible, however, to derive the pattern seen in Costa Rican
Spanish by assuming a more negative virtual constriction target than that for Peninsular
Spanish, rather than explicit allophony. Setting the target tongue tip constriction degree
for /d/ to -1.5 mm and re-running the simulations above, we can derive this alternative
pattern: full closure after consonants and more open productions after vowels, particularly
after /a/. This is shown in Table II.5. In agreement with Carrasco et al., there is still
variation within each context.
II.3.5 Discussion
The evidence from this modeling study supports the hypothesis that voiced and voiceless
stops differ in target constriction degree, and that both must have constriction targets beyond
the point of closure. Setting the target for /b/ to the point of occlusion (0 mm constriction
degree), we see no period of occlusion, but rather light frication. Since we most often see
full occlusion with release bursts for voiced stops with such long durations, we are forced
to conclude the voiced stops must have a target beyond the point of closure. Using a
small negative target for /b/ and a large negative target for /p/, on the other hand, does
generate the appropriate patterns: /b/ is produced as a spirant at a short duration and
as a fully occluded stop at a long duration; /p/ is produced always as a stop. These
differences arise even though the amount of undershoot is equivalent for both stops.
II.4 Discussion
Taken together, these two studies provide evidence that Spanish /b/ should rightly be
viewed as a stop, rather than an approximant. While the number of speakers is relatively
small, the consistent patterns between speakers, as well as compatible evidence from ar-
ticulatory modeling and the fact that the findings here are consistent with previous work,
suggest these findings may indeed be robust. Importantly, the current study shows that
there is a correlation between duration and constriction degree for /b/. If these segments
were truly approximants that are produced as stops in certain contexts due to prosodic
strengthening, articulatory overshoot or gestural incompatibility [100], we would not ex-
pect to see a relationship between these two articulatory parameters in the absence of
variation in stress, phrasal position, or segmental context. Neither is this relationship
predicted in an allophonic account of the variable production of these segments (e.g.,
[33]).
Combining a hypothesized target for /b/ beyond the point of articulator contact and
the established constriction-duration relationship, we can straightforwardly explain what
has until now been seen as allophonic variation between stop and approximant productions
as the dynamic consequences of an invariant gestural specification for constriction degree
and fluctuations in the duration of the gesture. Phrase medially, the short duration of /b/
results in undershoot of the constriction target and, therefore, spirantization, while the
increased duration of the closure gesture at a phrase boundary results in full closure. This
dynamic process additionally, and uniquely, accounts for at least some of the variation
in constriction degree of these sounds phrase medially. Although the initial hypothesis
that these phonemes did not differ from /p/ in their target constriction degree was not
supported by the data, the results do indicate that these phonemes have a target constriction
degree slightly beyond the point of articulator contact, though less than the hypothesized
target for /p/.
While the current study only examined the labial stops, it is reasonable to believe that
the theory proposed here also extends to the coronal and velar stops /t, d/ and /k, g/.
Both /d/ and /g/ show the same quasi-allophonic alternation by prosodic position as /b/,
and all three are similarly affected by stress and segmental context. Although some of
the particular details may be different (see, for example, the surprising finding of greater
occlusion of /g/ in /aga/ vs /ugu/ contexts in [39]), the overall patterns are very similar.
Importantly, this theory of the phonological specification of /b, d, g/ can also account for
the variation we see in productions of these segments in other contexts, as demonstrated in
section II.3.4. The increased duration of consonants in a stressed syllable [39, 45, 134] leads
directly, in this account, to more constricted productions; similarly, the reduced duration
of these segments in fast speech [159] leads to more undershoot and greater spirantization.
None of this prosodic or contextual variation (nor the relationship between duration
and constriction degree in the absence of such variation) is predicted by an allophonic or
approximant-target hypothesis. While it may certainly be possible to modify these theories
to account for this variation, in the current proposal they can all be seen as the lawful
dynamical consequences of a single invariant gestural target and variation in duration and
initial position due to phonological specification, contextual effects, and prosodic position.
This invariant gestural target, specifically, is one beyond the point of articulator contact.
It should be noted that the final theory proposed here is not radically different from the
analysis of phrase-initial spatial strengthening and temporal lengthening in Cho & Keating
[36]. That study found that phrase initial coronal stops in Korean were produced both
with more lingual contact (measured by electropalatography, EPG) and greater duration
than those same stops produced phrase medially. This is particularly true of the lax stop
/t/ and the nasal /n/, while the tense and aspirated stops (/t*/ and /th/) show a much
smaller difference. In fact, there is no difference in contact degree between the three stops
and the nasal phrase initially, while there is a split between lax/nasal and tense/aspirated
stops phrase medially. They say that the short durations found phrase medially cause
undershoot of the gestural target, leading to less contact. This very closely parallels the
findings of the current study: both /p/ and /b/ have increased duration and full contact
phrase-initially, while the shorter durations found phrase-medially cause undershoot of
/b/. While we did not find an effect of shorter durations on the constriction degree of
/p/, that may reflect the limitations of the methodology employed. While EPG is sensitive
to degree of palatolingual contact even beyond the point of closure of the oral tract, once
the lips make contact, there is relatively little change in the EMA signal. It is possible that
patterns similar to those found for Korean tense and aspirated stops may be found for
Spanish /t/.
Given the categorical differences found in VOT between /b/ and /p/ in this study,
it seems probable that they also differ in their specification for voicing. There is limited
laryngoscopic evidence that /p, t, k/ are in fact produced with some, though relatively
small, amount of spreading of the vocal folds [117]. Taking this into account along with
the findings from this study, there is evidence for a three-way distinction between the
/b, d, g/ and /p, t, k/: they differ in the duration of the oral constriction gesture, in
the constriction degree target of that gesture, and in the presence or absence of a glottal
spreading gesture. While it may seem surprising that the two sets of stops differ in so
many dimensions, it is important to see these differences in light of the relatively segmental
nature of Spanish phonology, where many segments that share the same general feature or
gestural category differ in their precise realization of that feature/category. For example,
we find fine-grained place distinctions between /t/, /n/ and /s/ in addition to stricture
(for /s/) and velar opening gesture (for /n/) differences [73]. These place distinctions,
while not active in any phonological process, nonetheless exist and influence articulator
movement, as can be seen in nasal place assimilation. We can draw a parallel between these
and the voiced/voiceless stop distinction. Within Articulatory Phonology, "[g]estures of
different organs ‘count’ as different and provide the basis for phonological contrast" [162];
thus, the voicing distinction may rely principally on the presence or absence of the glottal
spreading gesture (different organs), but the differences in the oral constriction gesture
(same organ) are present and play a role in the phonetic production of these sounds.
Additionally, it may be possible to extend the finding that undershoot in voiced stops
is the dynamic consequence of decreased duration to the voiceless series as well. As
mentioned before, the glottal spreading gesture for the voiceless stops seems to be rather
small in magnitude. We might predict that decreasing the duration of the voiceless stops
might then lead to undershoot of the open target for this gesture, resulting in continuous
vibration of the vocal folds during the voiceless stop, which would explain the high rate
of voicing found for these sounds in previous studies. Additionally, at extremely short
durations, we might expect that even the oral closure gesture would undershoot its target,
resulting in productions of voiced approximants more or less identical to the phrase medial
voiced stops. This is exactly the attested pattern [111, 116].
The inclusion in this paper of duration at the level of phonological control, though
not unprecedented, warrants discussion. In traditional phonology, duration is not part
of the grammatical specification but is the result of the phonetic implementation of the
phonology. Articulatory Phonology, on the other hand, does include temporal activation
intervals in the phonological specification of gestures; in this manner, overlap in time of
gestures can explain allophonic variation [17, 21], among other phenomena. While, in this sense, duration
is phonologically specified, possible durations have usually been restricted to merely two
possibilities: one (relatively shorter) for consonants and one (relatively longer) for vowels
[22]. The durational differences between voiced and voiceless stops in this paper indicate
that this gross distinction may not be sufficient to capture the true role of duration in
language. This is not, however, a novel claim. It is well known that voiced stops in many
languages are significantly shorter than their voiceless counterparts (e.g. [15, 95, 105, 24]).
Additionally, increased duration seems necessary to distinguish geminates from singletons
in at least some languages, as well as long from short vowels [49, 95, 107]. For Spanish,
there is articulatory evidence that fricatives are generally longer than stops, and that this
durational difference is phonologically contrastive [149]. For any theory of phonology,
it will be necessary to account for these differences, as in many languages they serve
to distinguish one phoneme from another. This may be accomplished, in Articulatory
Phonology, by specifying particular gestural durations as a percentage of the cycle of the
planning oscillator associated with each gesture, following and expanding on the proposal
in Browman & Goldstein [21]. A further discussion of what possible phonological durations
might exist is beyond the scope of the current study, but such future work will provide
important and necessary insight for phonological theory. It should be noted that the
differences proposed here are on par with the durational differences reported for voiced and
voiceless stops in other languages [15, 95, 105]. As such, whatever underlies the differences
in those languages may well cause the durational differences in Spanish as well, and thus
alleviate any need to posit Spanish as uniquely specifying duration phonologically.
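As a sketch of how such a specification might work, the conversion from a specified cycle fraction to an activation interval is simple arithmetic; the 5 Hz oscillator frequency and the 40%/60% cycle fractions below are hypothetical values for illustration, not figures from this study or from Browman & Goldstein.

```python
def activation_duration_ms(cycle_fraction, osc_freq_hz):
    """Convert a gesture's phonologically specified fraction of its
    planning-oscillator cycle into an activation duration in ms."""
    period_ms = 1000.0 / osc_freq_hz
    return cycle_fraction * period_ms

# Hypothetical values: a 5 Hz oscillator (200 ms period), with a voiced
# stop specified at 40% of a cycle and a voiceless stop at 60%.
voiced = activation_duration_ms(0.4, 5.0)     # 80.0 ms
voiceless = activation_duration_ms(0.6, 5.0)  # 120.0 ms
```

On this sketch, the voiced/voiceless durational difference would fall out of a difference in specified cycle fraction rather than an absolute duration in ms.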
Including both duration and voicing in the phonological representation may seem, from
a feature-based phonology point of view, counterintuitive. Duration (disregarding possible
geminates, which are not considered here) is not normally included in such theories. And,
in many languages, voiced segments are shorter than voiceless segments; it does not seem
that any language shows the opposite pattern, though the magnitude of the difference
between voiced and voiceless stops varies greatly between languages [24] and some lan-
guages do not show any difference in duration based on voicing (e.g., [51]). Given that
there is no universal relationship between closure duration and voicing, it seems necessary
to include duration as a separately controlled parameter. Yet, including both voicing and
duration in the phonology would then overgenerate, predicting a possible language with
longer voiced stops and shorter voiceless stops. Such overgeneration is often found in
Articulatory Phonology representations. For example, there are no restrictions on possible
phonologically-defined degrees of constriction, and one could hypothesize a language that
contrasts a huge number. Yet only three seem to be contrastive in any language (stop,
fricative, approximant). This can be explained not by constraints on the representational
system but on the production system: these three manners are stable regions in the quan-
tal articulatory-acoustic relationship [60]. Similar physiological constraints (or, perhaps,
aerodynamic constraints) might bias voiced stops to be shorter than voiceless stops (e.g.
[132, 58]). It is also not necessary that all gestural specifications be contrastive. Recent
work on gestural timing has proposed that, while gestures can be specified in a particular
language for certain timing patterns, these patterns need not form the basis for cross-
linguistic phonological contrast [65, 14].
II.5 Conclusion
In summary, evidence from this study supports the view that /b, d, g/ are voiced stops
whose gestural target is one of complete closure, differing in the absence of a glottal
spreading gesture from the voiceless stops /p, t, k/. The voiced stops, however, have a
somewhat more open constriction target than the voiceless series, though this target is
still beyond the point of articulator contact. This reduced constriction target, coupled
with a shorter duration, explains the well-documented alternation between production of
/b, d, g/ as approximants phrase medially and as full stops phrase initially.
Chapter III
Prosody and articulatory posture differences in
cross-linguistic coronal reduction
III.1 Introduction
American English, like Spanish, shows reduction of intervocalic coronal stops in prosodi-
cally weak positions but the outcomes of this reduction are different in the two languages.
As detailed in Chapter II, Spanish reduces intervocalic /d/ (and, variably, /t/) to an
approximant ([ð]) in phrase-medial position. American English, on the other hand, reduces
both /d/ and /t/ to a voiced flap ([ɾ]) in prosodically weak positions such as before an
unstressed vowel. As in Spanish, reduction in English has traditionally been described us-
ing a symbolic phonological alternation rule, though recent experimental work has raised
questions about this analysis.
III.1.1 English flapping
For American English, the rule-based account of coronal flapping states that /t/ and /d/
are replaced by [ɾ] when they occur intervocalically, following a stressed vowel and before
an unstressed vowel. There is substantial variation in the precise formulation of this rule
among different authors, however. Kahn [82], for example, stresses the importance of the
following vowel being unstressed, allowing for flapping to occur between unstressed vowels.
In some formulations, the conditioning context for flapping can span a prosodic boundary
(such as a word boundary) but not a phrase boundary [68]. This type of substitution rule
predicts clear qualitative acoustic and articulatory differences between stops and flaps, as
some abstract (binary) phonological features differ between the segments.
An alternative account of American English flapping is that flapping is a gradient
process, with productions of [t] and [d] on one end of a continuum and [ɾ] on the other,
rather than a categorical rule. One of the first studies to propose a gradient rather than
categorical explanation of flapping examined the duration of many American English con-
sonants [165]. This study found a range of durations for flaps, and variable voicing of
/t/ in environments that condition the flapping rule. Zue & Laferriere [170], examining
the acoustic speech signal, found similar variability in the duration of flaps, as well as
identifying two different types of productions that fell somewhere between the short flaps
and full stops: a short flap-like duration with a burst, and a long consonant duration
with no burst. A number of studies, following on the findings of these two papers, have
explicitly investigated flapping in American English, looking at the patterns of variability
and possible dimensions along which flaps and full stops differ [160, 164, 42, 54].
Stone and Hamlet [160] measured the movements of the jaw and tongue, as well as
speech acoustics, during a task requiring reiterant productions of the syllable "da" in an
alternating stressed-unstressed pattern (e.g., [da də da də da də]). They found a range of
acoustic productions, which they grouped into categories: a voiceless or partially devoiced
/d/, a fully voiced /d/, a short /d/ that was either voiced or voiceless, and a flap of
variable duration. For the articulatory measures, they found that the more canonical-/d/-
like tokens had a higher jaw position and higher acceleration of jaw opening movements out
of the consonant, as well as more palatal contact (as measured from an electropalatograph).
For some subjects, two syllables were produced with a single jaw movement when the
medial /d/ was produced as a flap. This study also found a correlation between jaw
height and the amount of linguo-palatal contact, with /d/-like tokens produced with both
higher jaw position and more palatal contact. These patterns were all gradient, rather than
categorical, suggesting a continuum of productions rather than an allophonic alternation.
Turk [164], based on the result from [170] that flaps are much shorter than full stops,
examined the durational variability of all stops in American English, finding that both
labial and velar stops (except for /g/) show durational reduction in the environments where
/t/ and /d/ are generally produced as flaps. Turk interprets this result to mean that the
shortening of /t/ and /d/ in these environments is not caused by a specific flapping rule
but by more general prosodic requirements.
The prosodic conditioning of flapping was investigated in more detail in [42]. This
study used an X-ray microbeam system to measure the movements of the tongue and
jaw during production of the words "tote" and "toad" in the phrase I said, "Put the
(blank) on the table", where subjects placed a nuclear accent on either "put," "on," or
the target word. This had the effect of placing the final coronal consonant of the target
word in an unstressed, post-stress, or pre-stress position. The results of this study indicate
that, while the acoustics differentiate between stops and flaps at least quasi-categorically,
there is little articulatory evidence for a categorical distinction. The one exception is a
slightly fronted tongue body position during production of full stops compared to flaps.
The lack of a clear articulatory distinction leads de Jong to posit that this may be a case
where a gradient articulatory change gives rise, through quantal articulatory-to-acoustic
relationships, to a more categorical acoustic percept. He goes on to say that, if this is the
case, "from the speaker’s perspective there is no reason to posit a rule which specifically
demands the production of a flap before unstressed vowels. Rather, what is necessary is to
understand the language’s segmental and prosodic convention sufficiently to know when a
salient consonant release is necessary, and when not" ([42], p. 309). That is, the system
is set up so that prosodically driven variation—which affects all segments [164]—causes
variability in the realization of the coronal stops that leads, in some cases, to flapping.
The author examines two possible causes for flapping: that flaps arise from reduction
in jaw movement or that they are caused by an increase in overlap between the tongue-tip
gesture for the stop and the following vowel [43]. Because there were no clear differences
in jaw position, the first explanation was rejected. There is some support for the second
account (lower and more retracted tongue body positions), but the author points out
several problems with this account, suggesting it may be an oversimplification.
The most recent study to examine flapping in American English looked at word-final
/t/ in both phrase-medial and phrase-final position, measuring only the position of the
tongue tip via electromagnetic articulometry [54]. As found in previous
studies, there was a clear acoustic distinction between flaps and stops, corresponding
to stops being relatively long and voiceless and flaps being relatively short and voiced.
No such clear difference was found in the articulation, however, where subjects produced
tokens with gradient spatial and temporal characteristics (although there was a high degree
of variability between subjects). The authors go on to suggest that the falling-stress
environment (stressed vowel-consonant-unstressed vowel) may be particularly conducive
to temporal reduction in gesture duration, perhaps leading to phonologization of flapping
in this context when it occurs word-internally.
III.1.2 Comparing English and Spanish reduction
In both American English and Spanish, reduction processes that were traditionally ana-
lyzed as phonological alternations have been shown to be much more variable processes.
In both cases, prosodically-conditioned durational variability has been hypothesized to
underlie spatial reduction. The outcomes of this temporal variability are quite different
in the two languages, however. In American English, only the production of the coro-
nal stops is altered enough to be described as reduced; though the temporal variability
seems to apply to all places of articulation in a similar way, the magnitude of durational
reduction is larger for coronals than for labials or velars [164, 22] and only for the coronals
does this variability have articulatory and acoustic consequences. Moreover, reduction of
the coronals in American English affects both the voiced /d/ and voiceless /t/ equally,
and results in an articulation that generally maintains at least midsagittal contact be-
tween the tongue and palate. In Spanish, on the other hand, reduction affects all places
of articulation with similar articulatory and acoustic outcomes, and occurs much more
frequently in the voiced compared to the voiceless stops. Perhaps the most notable differ-
ence from American English is that reduction in Spanish results in incomplete closure of
the vocal tract, without any midsagittal contact between the articulators. The fact that
the reduction conditions are similar in the two languages while the outcomes are different
raises a number of questions. What processes underlie reduction in American English and
Spanish—do rule-based accounts of segmental substitution or gradient accounts better
describe the realities of speech production? Are the processes in the two languages similar
or is there a fundamental difference between the two that leads to flapping on the one
hand and spirantization on the other? If the reduction process in the two languages is similar,
what causes the different articulatory and acoustic outcomes? This study uses real-time
magnetic resonance imaging (rtMRI) data to test the hypothesis that the same process
underlies reduction in the two languages, and that language-specific differences in coronal
production determine the eventual outcome of that reduction process.
III.2 Methods
III.2.1 Subjects and stimuli
Four subjects participated in the current study. Two were native speakers of General
American English (ch2, er3). Two were native speakers of Peninsular Spanish (ag2, sp3).
No subject reported any history of speech or hearing impairment.
Stimuli were designed to elicit coronal oral and nasal stops (/t/, /d/, and /n/) in a
symmetric or near-symmetric low vowel context. The prosodic position of the consonant
was varied to elicit a range of productions including both full and reduced forms. For
American English, prosodic conditions included the stop in word-medial, word-initial, and
phrase-initial positions. Both flanking vowels for the word- and phrase-medial conditions
were /ɔ/ (the two General American English speakers in this study consistently differenti-
ated between /ɔ/ and /ɑ/). For the word-medial condition, it was not possible to use the
same vowels—a falling stress pattern (and reduced second vowel) is the conditioning factor
for word-internal flapping. For this condition, the vowel context was /ɑCə/, which was
chosen both to give a fairly close match to the vowels in the rest of the stimuli and to limit
tongue movement between the full and reduced vowels. In order to induce more variability
in production, a further set of stimuli was used with contrasting emphatic stress. In these
stimuli, the target consonant appeared in the coda of a monosyllabic word with the vowel
/ɑ/ (pot, pod, swan). The words appeared in the middle of the carrier phrase "Put the X
on the table" and emphatic stress (shown in the stimuli with capital letters) was placed on
either put, the target word, or on. This effectively places the target coronal consonant in
unstressed, post-stress, or pre-stress positions and such stimuli have been shown to cause
a relatively large amount of spatiotemporal variation in production [42].
For Spanish, /ɾ/ was included in the target consonants as well as /t/, /d/, and /n/.
For all stimuli, /a/ was used as the target vowel. Only phrase-initial and word-medial
conditions were included in the Spanish stimuli as word-initial position within a prosodic
phrase does not differ from the word-medial condition in that language (e.g., [39, 136]).
This condition was replaced with one placing the target word in a list, which was designed
to induce a smaller prosodic boundary than that which would occur in sentence-initial po-
sition. This technique has been used previously in English to generate prosodic variability
[27, 137]. Stress variation was included both at the lexical level (with the target consonant
in onset position of stressed and unstressed vowels) as well as at the emphatic level, with
utterances without emphatic stress contrasted with emphatic stress on the word contain-
ing the target consonant in a lexically stressed syllable, again cued visually by capital
letters. The tap /R/ is restricted phonologically to word-internal conditions only (the only
rhotic which appears word-initially is the trill /r/). As such, neither of the phrase-initial
conditions (sentence-initial and list) was included for this segment. While data for /R/
was collected concurrently with the other coronal segments for Spanish, it is not included
in the current analysis.
A full list of the stimuli used for both languages appears in the table below. For each
language there were a total of 18 stimuli, which were randomized into two blocks of 9. The
shortness of the blocks was required to keep the acquisition of any given block under 20
seconds to prevent overheating of the MRI device. Blocks were presented in an alternating
fashion for a total of 6-8 repetitions per target phrase.
III.2.2 Real-time MRI data collection
Data were acquired using an MRI protocol developed especially for research on speech
production, detailed in [129]. Subjects were supine during the scan with the head re-
strained in a fixed position to facilitate comparisons across acquisitions. For the English
data, a 13-interleaf spiral gradient echo pulse sequence was used (TR = 6.164 ms, field
of view = 200 x 200 mm, flip angle = 15°). For the Spanish data, a 9-interleaf spiral
sequence was used (TR = 6.028 ms, field of view = 200 x 200 mm, flip angle = 15°).
For both sequences, a 5 mm slice located at the midsagittal plane of the vocal tract was
scanned with a resolution of 68 x 68 pixels, giving a spatial resolution of approximately
2.9 mm per pixel. Videos were reconstructed with a 13-frame (for English) or 9-frame
(for Spanish) sliding window, with one frame reconstructed at every TR pulse. This
gives an effective frame rate of 162.2 frames/s (13-interleaf sequence) or 165.9 frames/s
(9-interleaf sequence). Synchronous noise-cancelled audio was collected at 20 kHz during
MRI acquisition [16].
III.2.3 MRI data analysis
All measurements of speech articulator motion were extracted from the MRI images by
means of pixel intensity values [99, 147, 64]. This method is based on the idea that the
Category         /t/                                /d/
Non-initial      I said put the pot ON the table.   I said put the pod ON the table.
Non-initial      I said put the POT on the table.   I said put the POD on the table.
Non-initial      I said PUT the pot on the table.   I said PUT the pod on the table.
Non-initial      He didn't say "bottom" any more.   He didn't say "bada bing" any more.
Word-initial     He didn't awe Tawny any more.      He didn't awe Dawnie any more.
Phrase-initial   He didn't awe. Tawny did.          He didn't awe. Dawnie did.

Category         /n/
Non-initial      I said put the swan ON the table.
Non-initial      I said put the SWAN on the table.
Non-initial      I said PUT the swan on the table.
Non-initial      He didn't say "Tiajuana" any more.
Word-initial     He didn't awe Naughty any more.
Phrase-initial   He didn't awe. Naughty did.

Table III.1: American English stimuli used in the coronal reduction study. The target
consonants are all three coronal oral and nasal stops (/t/, /d/, and /n/). A variety of
prosodic contexts are used to facilitate spatiotemporal variability. In the first three carrier
phrases, the location of the emphatic stress is changed to generate pre-stress (ON), post-stress (target word stressed), or unstressed (PUT) conditions. The other three stimuli
present the target consonant in word-medial, word-initial, and phrase-initial positions.
Category              /t/                                    /d/
Non-initial           Ella dice "máta" también.              Ella dice "náda" también.
Non-initial           Ella dice "matámos" también.           Ella dice "nadámos" también.
Non-initial           Ella dice "MATÁMOS" también.           Ella dice "NADÁMOS" también.
Weak phrase-initial   Ella dice "copa", "tápa", y "mesa."    Ella dice "tipa", "dáma", y "mujer."
Phrase-initial        Ella dice "mapa". También lo digo.     Ella tiene una capa. Dámela.

Category              /n/                                    /R/
Non-initial           Ella dice "gána" también.              Ella dice "pára" también.
Non-initial           Ella dice "ganámos" también.           Ella dice "parámos" también.
Non-initial           Ella dice "GANÁMOS" también.           Ella dice "PARÁMOS" también.
Weak phrase-initial   Ella dice "mapa", "náve", y "faro."
Phrase-initial        Ella no dice "tapa". Nádie lo dice.

Table III.2: Spanish stimuli used in the coronal reduction study. The target consonants are
all three coronal oral and nasal stops (/t/, /d/, and /n/) as well as the tap /R/. A variety
of prosodic contexts are used to facilitate spatiotemporal variability. The target consonant
appears in word-medial position in both lexically unstressed and lexically stressed syllables,
as well as in a lexically stressed syllable that receives emphatic stress. The target consonant
also appears phrase-initially in a list and at the beginning of a sentence to induce different
prosodic boundary strengths (/R/ does not have these conditions as it cannot occur word-initially). Lexical stress on the target words is marked with an accent (´) for illustrative
purposes, even when not normally used in Spanish orthography.
changes in pixel intensity of a particular pixel over time reflect changes in tissue density
at that point in the vocal tract. Lower intensities correspond to the absence of tissue (air)
while high values signify the presence of one of the speech articulators at that particular
point. In any given arbitrary region of the vocal tract, then, the average pixel intensity in
that region will reflect the proportion of the region occupied by the speech articulators. By
placing these regions at relevant locations in the vocal tract and measuring the average
intensity over time, we are able to estimate speech articulator motion in that region.
Because the shape of the vocal tract will vary considerably between subjects, these relevant
regions (described below) were defined on a by-subject basis relative to each subject’s
anatomy. Each region was defined such that the relevant speech articulators (tongue tip,
tongue body, jaw) were always present in the region, avoiding any floor effects which could
be caused by the complete absence of the articulator from the region.
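The region-of-interest computation described above can be sketched as follows. This is an illustrative Python/NumPy sketch, not the analysis code used in the study; the array names `frames` and `roi_mask` are hypothetical.

```python
import numpy as np

def roi_intensity(frames, roi_mask):
    """Mean pixel intensity inside a region of interest for each frame.

    frames: (T, H, W) array of pixel intensities over time
    roi_mask: (H, W) boolean array marking the region of interest
    Returns a length-T signal that rises as tissue fills the region.
    """
    return frames[:, roi_mask].mean(axis=1)

# Toy data: 3 frames of a 4x4 image; the ROI is the top-left 2x2 block.
frames = np.zeros((3, 4, 4))
frames[1, 0:2, 0:2] = 0.5   # tissue partially fills the ROI
frames[2, 0:2, 0:2] = 1.0   # ROI fully occupied by tissue
roi = np.zeros((4, 4), dtype=bool)
roi[0:2, 0:2] = True
signal = roi_intensity(frames, roi)   # rises as the articulator enters
```

The mean over the masked pixels directly implements the idea that average intensity reflects the proportion of the region occupied by tissue.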
For the current study, we are interested particularly in the forward motion of the
tongue body during the transition from vowel to coronal stop (or tap), the motion of the
tongue tip towards the palate, and the raising of the jaw. Tongue body (TB) movement
was measured by defining a long, horizontal region in the pharyngeal area of the vocal
tract. This region has a vertical span from the top of the epiglottis to the bottom edge of
the velum at its lowest position. Defining the region in this way focuses the measurement
on only movement of the tongue body, without interference from the presence or absence
of these other structures. The TB region spanned horizontally from the rear pharyngeal
wall to a point roughly in the middle of the hard palate, including for each subject one
pixel of the pharyngeal wall and two to three pixels of the tongue during production of
/i/ (the most forward position of the tongue in the dataset). Because the pixel values in
the pharyngeal region were found to vary substantially from sample to sample (indicated
by jagged mean pixel intensity contours), the mean pixel intensity in the TB region was
normalized by the mean pixel intensity in the entire image on a frame-by-frame basis
(Figure III.1). This was sufficient to remove a large portion of the noise from the signal
without losing relevant kinematic information. Jaw (JAW) movement was measured with
a circle with a radius of 2 pixels that was placed at the base of the jaw between the jaw
inflection point and the hyoid bone. The circle was placed such that when the jaw was
closed, some part of the jaw was still in the circle and that when the jaw was maximally
open the circle was not entirely filled by the jaw. This avoids possible saturation effects
that might limit the accuracy of the measurement at extremes of jaw position. The
precise location of the circle was manually determined for each subject. For examples of
ROI locations, see Figure III.2.
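The frame-by-frame normalization applied to the TB signal can be sketched in the same style (again an illustrative sketch with hypothetical names, not the analysis code used here):

```python
import numpy as np

def normalized_roi_signal(frames, roi_mask):
    """ROI mean intensity divided, frame by frame, by the whole-image mean.

    Dividing by the global mean removes frame-wide intensity fluctuations
    (scanner noise) while preserving the kinematics within the region.
    """
    roi_signal = frames[:, roi_mask].mean(axis=1)
    global_signal = frames.mean(axis=(1, 2))
    return roi_signal / global_signal

# Toy check: scaling an entire frame by a constant (a frame-wide
# intensity fluctuation) leaves the normalized signal unchanged.
rng = np.random.default_rng(0)
frames = rng.uniform(0.5, 1.0, size=(2, 8, 8))
frames[1] = 2.0 * frames[0]          # global intensity jump, no motion
roi = np.zeros((8, 8), dtype=bool)
roi[2:4, 2:6] = True
sig = normalized_roi_signal(frames, roi)
# sig[0] and sig[1] are equal despite the global fluctuation
```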
Tongue tip (TT) movement was measured in a slightly different way. The tongue tip
can contact a large number of places along the width of the palate. It would be plausible,
then, to use a large region covering the entire length of the palate. However, such a region
would give very different average intensity values when the tongue tip contacts the palate
at a particular point compared to when contact is made with the tip and blade along a
wide portion of the palate (see Figure III.3). In order to measure the movement of the
tongue tip and contact with the palate more accurately, a set of smaller regions were used.
Each region had a horizontal width of only one pixel, with a vertical span of four pixels
beginning at the palate. These regions were arranged in a horizontal array beginning just
posterior to the teeth, past the alveolar ridge, to the end of the hard palate. This method
gives comparable results whether the constriction is apical or laminal (Figure III.3). For
each subject, whichever of these regions showed the highest maximum pixel intensity
during production of each target consonant was chosen to index tongue tip movement. Each
[Figure: displacement (a.u.) over the sequence [ntɔtɔ], comparing the raw and normalized TB signals.]
Figure III.1: Example of TB normalization. The raw signal (blue) shows substantial noise
from frame-by-frame intensity fluctuations in the pharyngeal region. Normalizing by the mean
pixel intensity across the entire image decreases noise in the signal (red), especially the
large aberrant peak between [ɔ] and [t], which does not reflect the single smooth motion
of the tongue body in this token.
speaker was highly consistent in the location of the produced constrictions. For English,
both speakers produced all consonants at the same point, near the inflection of the alveolar
ridge. For Spanish, however, each consonant was produced at a slightly different location.
Each subject was consistent within each segment, however, and the locations were similar
between subjects. /d/ was measured at the first point on the palate, at the upper teeth;
/t/ was measured at a point slightly behind the teeth; /n/ was measured at the alveolar
ridge.
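The selection among the one-pixel-wide palate regions can be sketched as follows (an illustrative sketch; the list `columns`, encoding each region's four-pixel row span and column index, is a hypothetical representation):

```python
import numpy as np

def pick_tt_column(frames, columns):
    """Choose the palate column used to index tongue tip movement.

    columns: list of (row_slice, col) pairs, each a 1-pixel-wide,
    4-pixel-tall region beginning at the palate. The column whose
    signal reaches the highest maximum intensity across the token
    is selected. Returns (index of chosen column, its signal).
    """
    signals = np.stack([frames[:, rows, col].mean(axis=1)
                        for rows, col in columns])
    best = int(np.argmax(signals.max(axis=1)))
    return best, signals[best]

# Toy data: the tongue tip contacts image column 5 at frame 2.
frames = np.zeros((4, 10, 10))
frames[2, 0:4, 5] = 1.0
cols = [(slice(0, 4), c) for c in range(3, 8)]
idx, sig = pick_tt_column(frames, cols)   # picks the column at col 5
```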
Tongue contact with the palate was measured similarly, by placing a set of regions along
the length of the palate. Each region in this case, however, was only one pixel square.
[Figure: midsagittal rtMRI image with the TB, JAW, and TT regions of interest outlined.]
Figure III.2: Representative ROI locations for measuring JAW, TB, and TT movement.
Figure shows locations for subject er3.
This gives an increase in intensity only when the tongue is in that exact pixel (i.e., when
the tongue is contacting the palate). A pixel intensity threshold was used to determine
the presence of tongue-palate contact for each pixel. This was done by initially finding
the maximum and minimum pixel intensity values across all tongue contact regions across
all repetitions for each speaker. Subsequently the threshold for measuring contact was
set at 25 percent of this range plus the minimum value found. This threshold was found
to adequately measure contact when compared manually against the MRI images. For
[Figure: apical (left) and laminal (right) constrictions, each measured with a circular ROI (2 px radius, top row) and a rectangular ROI (4 px length, bottom row).]
Figure III.3: Comparison of circular (top row) and rectangular (bottom row) ROIs
for measuring tongue tip motion. The left column shows /d/ from PUT the pod on the
table and the right column shows /t/ from He didn't awe. Tawny did. In the top
row, the circular region is all white for the right image but shows significant black
portions for the left. This would give a larger mean pixel intensity for the right than for
the left, despite the fact that the tongue is in contact with the palate in both images. On
the bottom, the region is largely filled for both images, giving more equal values across
images.
each repetition, the width of palatal constriction was measured as the number of contact
pixels above threshold during the point of maximum tongue tip movement (described
above). The location of the constriction was identified as the pixel along the palate with
the highest intensity value at the peak of tongue tip movement. Note that this technique
only measures contact between the tongue and palate. Because the teeth do not appear
on MRI images, it is not possible to measure any contact that may occur between the
tongue and upper teeth.
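The contact threshold and the width/location measures can be sketched as follows (illustrative only; the 25% threshold follows the description above, and all names are hypothetical):

```python
import numpy as np

def contact_measures(contact_signals, frame):
    """Width and location of linguopalatal contact at a given frame.

    contact_signals: (n_pixels, T) intensities of the 1-pixel contact
    regions along the palate, pooled across repetitions. The threshold
    is set at the pooled minimum plus 25% of the pooled intensity
    range. Returns (pixels above threshold, index of brightest pixel).
    """
    lo, hi = contact_signals.min(), contact_signals.max()
    threshold = lo + 0.25 * (hi - lo)
    column = contact_signals[:, frame]
    width = int(np.sum(column > threshold))
    location = int(np.argmax(column))
    return width, location

# Toy data: 6 palate pixels over 3 frames; contact peaks at frame 1.
sig = np.zeros((6, 3))
sig[2:5, 1] = [0.8, 1.0, 0.6]          # three pixels light up
width, loc = contact_measures(sig, frame=1)
```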
All resulting signals (TT, TB, JAW, tongue-palate contact) were smoothed using a
locally weighted linear regression [147, 98]. The weighting function used was a Gaussian
kernel K with a standard deviation of h samples, where h = 4. As samples lying more
than 3h from the center of the kernel in either direction receive weights near zero, this
gives a smoothing window width of roughly 150 ms given the sampling period of 6.164
ms (13-interleaf sequence) or 6.028 ms (9-interleaf sequence). Gestural identification was
conducted using an algorithm developed by Mark Tiede at Haskins Laboratories. The
identification algorithm takes as input a manually located estimate of the midpoint of
one derived variable (here, pixel intensity contours). Using the velocity of that variable (the
absolute value of the first difference of the signal), it then locates the velocity minimum
closest to the input point (measurement point: time of maximum constriction).
It then finds the peak velocity between that point and both the preceding and following
velocity minima (measurement point: time of peak velocity). Where this location incor-
rectly indexed the start of articulator motion, a manual estimate of motion onset was used
and the nearest velocity minimum to that point was selected. This was often necessary
for phrase-initial repetitions as these productions often have multiple velocity peaks. The
algorithm then locates the onset of gestural motion by locating a point where the velocity
signal from the preceding minima to the first time of peak velocity crosses some arbitrary
threshold of the velocity difference between the two points. Gestural offset is defined as
the point where the velocity falls below the same threshold from the second time of peak
velocity to the velocity minimum following the point of maximum constriction (this point
was similarly located relative to a manual estimation when multiple velocity peaks oc-
curred). Onset and offset of the constriction proper are also defined by the points where
the velocity crosses a threshold between the times of peak velocity and the point of maxi-
mum constriction. All thresholds were set to 20 percent. On some repetitions, this default
threshold did not correctly locate an obvious inflection point in the pixel intensity signal
indicative of constriction offset. In these cases the threshold was increased incrementally
in steps of 5 percent until the inflection point was correctly located. This was rarely
required, and only needed for a few productions in phrase-initial position.
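A simplified version of the velocity-threshold landmark detection can be sketched for a single closing movement (an illustrative sketch only: the actual algorithm, by Mark Tiede, additionally handles multiple velocity peaks, manual anchor points, and the offset landmarks):

```python
import numpy as np

def gesture_landmarks(x, thresh=0.20):
    """Onset, peak-velocity time, and constriction onset for one rise.

    x: smoothed constriction signal with a single rise toward maximum
    constriction. Velocity is the absolute first difference. The onset
    is the last pre-peak point where velocity falls below `thresh` of
    the range between the preceding velocity minimum and the velocity
    peak; constriction onset is the first post-peak point below it.
    """
    v = np.abs(np.diff(x))
    t_max = int(np.argmax(x))                  # maximum constriction
    t_pv = int(np.argmax(v[:t_max]))           # peak velocity before it
    v_min = float(v[:t_pv + 1].min())
    cut = v_min + thresh * (float(v[t_pv]) - v_min)
    below = np.where(v[:t_pv] < cut)[0]
    onset = int(below[-1]) if below.size else 0
    after = np.where(v[t_pv:t_max] < cut)[0]
    c_onset = int(t_pv + after[0]) if after.size else t_max
    return onset, t_pv, c_onset

# A sigmoid-like rise: slow start, fast middle, plateau at closure.
t = np.linspace(-6, 6, 121)
x = 1.0 / (1.0 + np.exp(-t))
onset, t_pv, c_onset = gesture_landmarks(x)
# onset < t_pv < c_onset, bracketing the fast part of the movement
```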
After measurement, TT, TB, and JAW movements were normalized to a range from
0 to 1 for each speaker. This was done by subtracting the minimum value found for each
region during the vowels preceding and following the target consonants across all repetitions, then dividing by the total range of each measure across all repetitions. Movement
duration was calculated as the time between gesture onset and constriction offset and
movement displacement was calculated as the difference in normalized intensity between
gesture onset and the point of maximum constriction. Similar methods have been shown
to give reasonable approximations to displacements hand-measured in pixels [97].
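The normalization and displacement computations amount to the following (an illustrative sketch with toy values; the minimum and range would come from the speaker's pooled data as described above):

```python
import numpy as np

def normalize_signal(x, vowel_min, full_range):
    """Normalize an articulator signal to [0, 1] for one speaker.

    vowel_min: speaker-wide minimum found in the flanking vowels
    full_range: speaker-wide range of the measure across repetitions
    """
    return (x - vowel_min) / full_range

x = np.array([2.0, 3.0, 4.0])     # one token's ROI signal (toy values)
xn = normalize_signal(x, vowel_min=2.0, full_range=4.0)
displacement = xn.max() - xn[0]   # gesture onset to maximum constriction
```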
While the TB and JAW measurements straightforwardly measure articulatory move-
ments, the TT measure is a bit different. When the tongue tip is not in contact with the
palate, this measure indexes articulatory movements in the same way TB and JAW are
measured. However, when the tongue tip is in contact with the palate, it is not clear that
the measurement indexes movement in the same way—changes in the TT signal beyond
this point may be an artifact of the rtMRI method, not reflecting actual tissue movement.
Alternatively, it may measure movement of additional tissue into the region of interest
rather than spatial movement of the tongue tip (which has contacted the palate and so cannot
[Figure: TT displacement (a.u.) over the sequence [ntɔtɔni], with the interval of tongue tip-palate contact marked.]
Figure III.4: Example TT trajectory from subject er3 producing didn't awe Tawny. Note
the smooth trajectory profile both before and after linguopalatal contact during the [t] in
Tawny.
move any further). This may be due to the incompressible nature of the tongue, which leads
to spatial deformation of the tongue as it moves against the hard structures of the mouth
[86, 2, 140, 161]. In a sense, however, both spatial movement and deformation are the
result of the same action, with more tissue moving into a particular space in the vocal
tract. In fact, the TT measure appears to respond equally to both unrestricted motion
and deformation-based motion. When the TT trajectories are examined, they are smooth,
without a noticeable deflection that would result if the two types of motion resulted in
qualitatively different types of change in the TT measure (Figure III.4). Moreover, the
measure shows a consistent relationship between peak velocity, movement amplitude, and
movement duration. This relationship is given by the formula below, where C is a constant:

    peak velocity / displacement = C / duration

This relationship has been shown to hold across movements of many speech articulators
as well as limb movements [135, 127, 53], with C having a lower bound (as in a pure
mass-spring system) of π/2 (1.57) and an empirically determined value of roughly 1.8
depending on the particular speaker and articulator measured (though Fuchs et al. [53]
find evidence for some movements with a value for C below π/2, and argue against a
simple mass-spring model of stiffness, their data are still well fit by the same function).
As can be seen in Figure III.5, this same relationship holds true for the TT data. Fitting
the equation above to these data gives a value for C of 1.69 (R² = 0.98, p < 0.0001).
Taken together with the fact that no inflection point is seen in the TT trajectories, this
suggests that the TT measure, despite indexing both unconstrained spatial movement and
deformation along the same scale, is an accurate indication of the overall movement of the
tongue tip and not an artifact of the rtMRI method. TB and JAW movements are also
fit well by this equation, with C values of 1.74 and 1.71, respectively (TB R² = 0.98,
JAW R² = 0.97, both p < 0.0001). All three articulators show larger deviations from
the predicted values at durations above 200 ms. These were tokens produced at a phrase
boundary, a position which has been shown to be associated with lowered stiffness in previous studies [31, 28, 32, 35, 46, 7]. Anticipating the results below, such a lowered stiffness
is indeed found in phrase-initial position in the current study (Figure III.8). These results
argue that the TT measurement here is in fact measuring unrestricted spatial movement
and spatial deformation along the same or very similar dimensions, and is appropriate to
use as a general index of tongue tip displacement.
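Fitting the constant C can be sketched as a no-intercept regression of the velocity/displacement ratio on inverse duration (an illustrative sketch on synthetic data, not the fitting code used in the study):

```python
import numpy as np

def fit_stiffness_constant(peak_vel, displacement, duration):
    """Least-squares fit of C in peak_vel/displacement = C/duration.

    Regressing the velocity/displacement ratio on 1/duration through
    the origin gives C; a pure mass-spring movement predicts C = pi/2.
    Returns (C, R-squared of the fit).
    """
    y = peak_vel / displacement
    x = 1.0 / duration
    C = float(np.sum(x * y) / np.sum(x * x))   # no-intercept LS slope
    ss_res = float(np.sum((y - C * x) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return C, 1.0 - ss_res / ss_tot

# Synthetic movements obeying the relation exactly with C = 1.7:
dur = np.array([0.08, 0.10, 0.15, 0.20])       # durations in seconds
disp = np.array([0.3, 0.5, 0.7, 0.9])
pv = 1.7 * disp / dur
C, r2 = fit_stiffness_constant(pv, disp, dur)  # recovers C = 1.7
```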
[Figure: peak velocity / maximum displacement (a.u./s) as a function of movement duration (ms), with the fitted curve labeled C = 1.69, R² = .98.]
Figure III.5: Plot of peak velocity / maximum displacement as a function of movement
duration. The data is well fit by the standard equation relating the two variables
(peak velocity/displacement = C/duration, see text) with a value for C of 1.69. This
function fits the data well (R² = 0.98, p < 0.0001).
III.3 Results
In order to test whether variability between different productions is best explained by an
allophonic (categorical) model or by the hypothesized model where durational variability
underlies variable spatial productions, two mixed-effects linear models will be compared.
The first (categorical) predicts the maximum displacement from the prosodic context in
which the stop occurs (for English: Phrase-initial, Word-initial, or non-initial; for Spanish:
Phrase-initial, Weak phrase-initial, or non-initial). The second (continuous) predicts the
maximum displacement from movement duration. Both models additionally include a fixed
effect of token and a random intercept by subject. On visual inspection, productions in
phrase-initial conditions in English show limited effects of duration (see Figure III.6). This
seems likely due to saturation effects, where the articulator reaches its maximum possible
position. In order to account for this and avoid fitting the effect of movement duration on
displacement incorrectly, two additional terms were included in the model: a fixed effect
of phrase boundary (phrase-initial or not) and an interaction between phrase boundary
and movement duration. This effectively allows the phrase boundary condition to be fit
with a different intercept and slope compared to the non-phrase boundary condition. A
similar effect was seen in Spanish, where productions longer than roughly 185 ms show
little effect of duration, though this does not line up as well with prosodic boundary as
in English (Figure III.7). For Spanish, the continuous model was fit with an additional
category (called "duration boundary" to differentiate from prosodic phrase boundary)
differentiating short (less than or equal to 185 ms) and long productions (greater than
185 ms) as well as an interaction term between duration boundary and duration. For
both languages, these models were then compared to see which best explains the data,
as measured by the greatest log likelihood. All models were built and assessed using the
lme4 package in R [1]. For the selected model, statistical significance of each predictor
was assessed using the results of the t-tests given by the summary() function in the lme4
package. P values and post-hoc tests were calculated using the lmerTest package [94]. A
full list of models used for all statistical analysis is shown in Table III.3.
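The logic of the model comparison can be illustrated with a simplified fixed-effects analogue. The actual analysis used lme4 mixed models in R; this Python sketch, on synthetic data, compares two ordinary least-squares fits (no random effects) by Gaussian log likelihood, preferring the model with the greater value:

```python
import numpy as np

def ols_loglik(y, X):
    """Gaussian log likelihood of an OLS fit (a simplified stand-in
    for the mixed-model log likelihoods compared in the text)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = float(resid @ resid) / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)

rng = np.random.default_rng(1)
dur = rng.uniform(0.05, 0.25, 200)
disp = 2.0 * dur + rng.normal(0, 0.02, 200)   # duration truly drives displacement
cat = (dur > 0.15).astype(float)              # a coarse categorical recoding
X_cont = np.column_stack([np.ones(200), dur])
X_cat = np.column_stack([np.ones(200), cat])
ll_cont = ols_loglik(disp, X_cont)
ll_cat = ols_loglik(disp, X_cat)
# ll_cont > ll_cat: the continuous predictor explains the data better
```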
Linguopalatal contact width has been shown in previous work to differ between full
stops and flaps in English [100, 156]. This work, however, did not consider the possibility
that this seemingly categorical difference is derived from continuous variability. As for
displacement, above, two models were compared to assess the cause of this difference.
The first (categorical) predicted the maximum constriction width from prosodic context,
[Figure: three panels (TT, TB, JAW) of normalized displacement (0-1) as a function of movement duration (0-857 ms), with phrase-medial and phrase-initial tokens plotted separately.]
Figure III.6: Plot of maximum displacement as a function of movement duration for
English. Phrase-initial productions are shown in red and phrase-medial productions are
shown in blue. Movements of all three articulators are shown separately. From top to
bottom: Tongue Tip, Tongue Body, Jaw. For all articulators, the movements in phrase-initial position show little to no effects of movement duration, possibly due to saturation
effects.
and the second (continuous) from TT movement duration. Because the phrase-initial
productions again show large boundary effects, fixed effects of phrase boundary and the
interaction between phrase boundary and movement duration were included as predictors
in the second model. Both models additionally included a random intercept by subject.
Additionally, both models included fixed effects of the maximum normalized position of
[Figure: three panels (TT, TB, JAW) of normalized displacement (0-1) as a function of movement duration (0-555 ms), with a dashed line at 185 ms; phrase-medial, weak-phrase-initial, and phrase-initial tokens plotted separately.]
Figure III.7: Plot of maximum displacement as a function of movement duration for Spanish. Phrase-initial productions are shown in red, weak-phrase-initial (list) productions in
green, and phrase-medial productions in blue. Movements of all three articulators are
shown separately. From top to bottom: Tongue Tip, Tongue Body, Jaw. For all articulators, the movements with durations longer than roughly 185 ms (dashed vertical line)
show little to no effects of movement duration on displacement, likely due to saturation
as in English phrase-initial position.
both the tongue body and jaw. These were included in the model in order to assess
the relative contribution of these articulators to creating the tongue tip constriction, as
predicted in past studies [125]. Constriction location for the tongue tip was analyzed in
a model fit with prosodic boundary and segment as fixed factors and a random intercept
by subject. As for constriction length, this was compared with a separate model that
replaced prosodic boundary with movement duration.
An additional model was fit to assess possible differences in the dynamics of gestures
in different prosodic positions. A hallmark of gestures occurring near a large prosodic
boundary is a decreased stiffness [31, 28, 32, 7, 46]. In fact, it has been suggested that
stiffness may be the parameter used to control movement duration [152, 127, 135, 85].
Stiffness was assessed by the slope of the regression line relating peak velocity to peak
displacement. Gestures with the same stiffness are expected to fall along the same Peak
Velocity/Peak Displacement line (though cf. [53] for a counterargument against estimating stiffness as a dynamical control parameter from movement kinematics). The model
includes peak velocity as the dependent variable and peak displacement, as well as the
interaction between displacement on the one hand and prosodic boundary and segment
on the other, as fixed factors. Models also included a random intercept by subject. The
relevant comparison will be the interaction between displacement and prosodic boundary,
which will reflect differences in the velocity/displacement slope by prosodic position. An
interaction between displacement and token would similarly reflect differences in stiffness
by segment.
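The slope comparison underlying the stiffness analysis can be sketched as follows (an illustrative sketch on synthetic data; the actual analysis used mixed models with the interaction terms described above):

```python
import numpy as np

def stiffness_slope(displacement, peak_vel):
    """No-intercept regression slope of peak velocity on displacement,
    used as a kinematic index of gestural stiffness."""
    return float(np.sum(displacement * peak_vel)
                 / np.sum(displacement ** 2))

# Toy data: phrase-initial gestures with lower stiffness fall on a
# shallower peak velocity / displacement line than phrase-medial ones.
disp = np.linspace(0.2, 1.0, 20)
s_medial = stiffness_slope(disp, 8.0 * disp)    # steeper line
s_initial = stiffness_slope(disp, 4.0 * disp)   # shallower line
```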
Results for each language will be presented separately. Comparisons between languages
will be reserved for the discussion (section III.4).
III.3.1 American English
Treating the effect of duration on movement magnitude first, there was no difference
between the continuous and categorical models for tongue tip movements (χ²(1) = 0, n.s.),
but for both tongue body and jaw, the continuous model provided a significantly better
fit than the categorical model (TB: χ²(1) = 4.95, p < 0.05; JAW: χ²(1) = 41.7, p < 0.0001).
Detailed results will be discussed below only from the models that fit the data better;
Variable           Model type    Model
Displacement       categorical   Displacement ~ Prosodic Boundary + Segment + (1 | SUBJ)
                   continuous    Displacement ~ Duration * Phrase Boundary + Segment + (1 | SUBJ)
Contact length     categorical   Contact ~ Prosodic Boundary + Segment + TB max + JAW max + (1 | SUBJ)
                   continuous    Contact ~ Duration * Phrase Boundary + Segment + TB max + JAW max + (1 | SUBJ)
Contact location   categorical   Location ~ Prosodic Boundary + Segment + (1 | SUBJ)
                   continuous    Location ~ Duration * Phrase Boundary + Segment + (1 | SUBJ)
Peak Velocity                    PV ~ Displacement + Displacement:Prosodic Boundary + Displacement:Segment + (1 | SUBJ)

Table III.3: Statistical models used to evaluate the data. Models for English are shown.
"Prosodic Boundary" includes levels for all three prosodic boundaries (non-initial, word- or
weak phrase-initial, phrase-initial), while "Phrase Boundary" only has two levels: phrase-initial and phrase-medial. Models for Spanish are the same, except all instances of "Phrase
Boundary" are replaced by "Duration Boundary" (see text for details), which similarly has
two levels: long and short.
for tongue body and jaw movements, this is the continuous model. Both categorical and
continuous results are presented for the tongue tip.
For the tongue tip movement, the continuous model showed a significant effect of
movement duration (β = 0.021, t = 12.8, p < 0.0001). There was significantly less
displacement for /n/ compared with both /d/ (β = -0.033, t = -2.11, p < 0.05) and /t/
(β = -0.059, t = -3.86, p < 0.001), but there was no difference between /d/ and /t/.
The intercept for the phrase-initial condition was substantially higher than for the phrase-medial
condition (β = 0.67, t = 8.17, p < 0.0001), and there was a significant interaction between phrase
position and movement duration (β = -0.020, t = -10.6, p < 0.0001). The interaction term,
which effectively cancels the overall effect of duration, shows that there was essentially no
effect of movement duration on displacement for phrase-initial productions.
The categorical model shows the same effects of segment on displacement, with /n/
showing less movement than /t/ or /d/ (t-n: β = 0.065, t = 4.68, p < 0.0001; d-n:
β = 0.051, t = 2.31, p < 0.001). There was also a three-way distinction based on prosodic
context, such that phrase-initial (PI) showed the largest movements, non-initial (NI) the
least, and word-initial (WI) an intermediate amount (PI-WI: β = 0.118, t = 5.94, p <
0.0001; PI-NI: β = 0.350, t = 22.57, p < 0.0001; WI-NI: β = 0.232, t = 14.59, p < 0.0001).
The same general pattern was found for the tongue body. There was a significant effect
of movement duration (β = 0.009, t = 5.37, p < 0.0001), a significant effect of
phrase boundary (β = 0.400, t = 4.27, p < 0.0001), and an interaction between
phrase boundary and duration (β = -0.008, t = -4.28, p < 0.0001), indicating a lack of
durational influence on displacement in phrase-initial position. There were again small
differences between the three coronal segments, though the distinctions were different
than for tongue tip movement. Here, there was significantly less tongue body movement
for /t/ than for both /d/ (β = 0.057, t = 3.01, p < 0.01) and /n/ (β = 0.043, t =
2.32, p < 0.05), but no difference between /d/ and /n/. The pattern for jaw movements
was slightly different. For the jaw, there was again a significant effect of duration (β =
0.016, t = 6.34, p < 0.0001) but no effect of segment or phrase boundary, though there
was a significant interaction between phrase boundary and duration (β = -0.012, t =
-4.16, p < 0.0001), indicating a reduced effect of duration on displacement in phrase-initial position.
For the length of linguopalatal contact, there was no difference between the categorical
and continuous models (χ²(1) = 0, n.s.). The continuous statistical model showed a significant effect of tongue tip movement duration on constriction length (β = 0.09, t = 3.29, p <
0.01). There were also significant effects of tongue body (β = 2.46, t = 2.84, p < 0.01)
and jaw position (β = 2.95, t = 3.12, p < 0.01), with more forward tongue bodies and
higher jaw positions related to longer constrictions. Phrase-initial productions showed substantially more contact than non-phrase-initial productions (β = 2.83, t = 2.51, p < 0.05),
though there was a significant interaction between phrase boundary and duration, such
that duration effectively had no impact on constriction length in phrase-initial position
(β = -0.09, t = -2.97, p < 0.01). There was no difference between segments in constriction length. The categorical model, on the other hand, showed no significant effect of
segment, jaw position, or tongue body position. This model showed the most contact for
phrase-initial productions, the least for non-initial, and intermediate contact for word-initial
productions (PI-WI: β = 0.7, t = 3.82, p < 0.001; PI-NI: β = 3.2, t = 21.33, p < 0.0001;
WI-NI: β = 2.5, t = 16.05, p < 0.0001).
For constriction location, the categorical model with prosodic boundary provided a
significantly better fit than the model with movement duration (χ²(1) = 68.5, p < 0.0001).
In the better model, there was no effect of segment, indicating all three segments were
produced at the same location. There was, however, a slight though significant effect
of prosodic boundary such that word-initial and phrase-initial productions were pro-
duced roughly 0.7 pixels more anteriorly than non-initial productions (word-initial: β =
0.66, t = 8.1, p < 0.0001; phrase-initial: β = 0.78, t = 9.4, p < 0.0001).
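The model comparisons reported throughout this section are likelihood-ratio tests between nested models: twice the difference in log-likelihoods is referred to a chi-square distribution. A minimal sketch of that computation, using hypothetical log-likelihood values rather than the study's fitted models:

```python
# Likelihood-ratio test for nested models; the log-likelihoods below are
# hypothetical stand-ins, not values from the study's fitted models.
import math

def lr_test_df1(loglik_simple, loglik_complex):
    """Return the chi-square statistic and p-value for a 1-df comparison.

    For 1 degree of freedom, P(chi^2 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, math.erfc(math.sqrt(stat / 2.0))

# A difference like the chi^2(1) = 68.5 reported for constriction location:
stat, p = lr_test_df1(-500.0, -465.75)
print(f"chi2(1) = {stat:.1f}, p < 0.0001: {p < 0.0001}")
```

In practice, calling R's anova() on two nested lmer fits reports this same statistic.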
There exist clear differences in stiffness between the phrase-initial and non-phrase-
initial productions for movements of all three articulators, as can be seen in Figure III.8.
For the tongue tip, comparison of two models, one with three levels of prosodic boundary
(non-initial, word-initial, phrase-initial) and one with only two levels (non-phrase-initial
and phrase-initial), showed no significant difference in fit (χ²(1) = 0.05, n.s.). Because
no difference was found between the two models, the simpler model was used, indicating
no difference in stiffness between word-initial and non-initial productions. The model
showed a significant relation between peak velocity and peak displacement, as expected
(β = 0.058, t = 24.3, p < 0.0001). There was also a significant interaction effect between
peak displacement and phrase boundary, however, such that productions in phrase-initial
position showed a significantly reduced stiffness, as expected (β = −0.036, t = −26.6, p <
0.0001). There was a very small effect of segment, such that /t/ has a slightly higher slope
than /d/ and /n/ (/d/-/t/: β = −0.004, t = −2.9, p < 0.01; /n/-/t/: β = −0.004, t =
−3.1, p < 0.01). There was no difference between /d/ and /n/.
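The stiffness measure used here is the slope relating peak velocity to peak displacement, estimated separately by prosodic condition. A sketch of that estimate on synthetic data (the underlying slopes of 10 and 6 are invented for illustration and are not the study's values):

```python
# Stiffness as the slope of peak velocity on peak displacement, estimated
# per prosodic condition; all data below are synthetic.
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (intercept handled by centering)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

random.seed(1)
disp = [random.uniform(0.2, 1.0) for _ in range(200)]      # normalized displacement
vel_medial = [10.0 * d + random.gauss(0, 0.3) for d in disp]   # stiffer: steeper slope
vel_initial = [6.0 * d + random.gauss(0, 0.3) for d in disp]   # phrase-initial: lower stiffness

print(round(ols_slope(disp, vel_medial), 1), round(ols_slope(disp, vel_initial), 1))
```

A shallower recovered slope for the phrase-initial condition corresponds to the lower stiffness reported above.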
The results for tongue body and jaw movements were generally similar. Tongue body
showed no difference between a model with word-initial productions as a separate category
and one without (χ²(1) = 0.9, n.s.). The model again showed a significant relationship
between peak velocity and peak displacement (β = 0.050, t = 16.7, p < 0.0001) and a
lower slope (i.e., lower stiffness) for phrase-initial productions (β = −0.034, t = −18.8, p <
0.0001) and no effect of segment. Results for jaw movement similarly did not show a
difference between the two models (χ²(1) = 2.19, n.s.). This model showed the same
effects of peak displacement (β = 0.052, t = 25.8, p < 0.0001) and the interaction between
peak displacement and phrase boundary (β = −0.026, t = −16.7, p < 0.001) and no effect
of segment.
III.3.2 Spanish
The fit of the continuous model was significantly better than the categorical model for
all three articulators (TT: χ²(1) = 4.12, p < 0.05; TB: χ²(1) = 107.7, p < 0.0001; JAW:
χ²(1) = 52.2, p < 0.0001). As for English, this suggests that reduction is continuous
and not strictly allophonic. For the Tongue Tip, the continuous model showed significant
effects of duration (β = 0.011, t = 4.08, p < 0.0001) and duration boundary (β = 0.38, t =
5.19, p < 0.0001), indicating that movement displacement varies with duration and that
there is significantly more movement at extremely long durations. A significant interaction
effect between the two factors (β = −0.010, t = −3.7, p < 0.001) shows that there is
virtually no effect of movement duration on displacement at durations over 185 ms. As
for segment, both /n/ and /d/ show significantly less movement than /t/ (/d/-/t/: β =
−0.198, t = −10.2, p < 0.0001; /n/-/t/: β = −0.182, t = −9.3, p < 0.0001) but there is no
difference between /d/ and /n/.
The results are very similar for Tongue Body and Jaw movements. For Tongue Body,
displacement varies with duration (β = 0.013, t = 12.0, p < 0.0001) and there is signif-
icantly more displacement at durations above 185 ms (β = 0.35, t = 9.5, p < 0.0001),
though the influence of duration on displacement at these very long durations is negligi-
ble (β = −0.013, t = −10.1, p < 0.0001). There are again differences by segment type,
Figure III.8: Plots of peak velocity by peak displacement for English (from top to bot-
tom): tongue tip, tongue body, and jaw movements. All three articulators clearly show a
stiffness difference between phrase-initial and non-phrase-initial productions, in line with
previous work showing a lower stiffness in the former position. Peak velocity is measured
in arbitrary units (i.e. displacement is normalized from 0 to 1) per second. Regression
lines shown for illustrative purposes and do not necessarily reflect precise results from the
statistical model.
but these are different from the pattern shown for Tongue Tip movement. For Tongue
Body, /n/ shows greater displacement than either /t/ or /d/, which do not differ between
themselves (/n/-/t/: β = 0.15, t = 11.4, p < 0.0001; /n/-/d/: β = 0.015, t = 10.5, p <
0.0001). For Jaw movement displacement, there are similarly significant effects of duration
(β = 0.016, t = 11.1, p < 0.0001), duration boundary (β = 0.30, t = 4.7, p < 0.0001) and
their interaction (β = −0.012, t = −7.58, p < 0.0001). As with the tongue tip, /t/ shows
greater displacement than either /d/ or /n/ (/d/-/t/: β = −0.12, t = −7.6, p < 0.0001;
/n/-/t/: β = −0.13, t = −8.3, p < 0.0001).
For linguopalatal contact, the continuous model provides a significantly better fit than
the categorical model (χ²(1) = 19.4, p < 0.0001). Visual inspection suggested that there
was an interaction between duration boundary and segment; while both /d/ and /n/ show
differences between short and long durations, /t/ shows no such difference. In order
to account for this, a model was fit to the data that included full interactions between
the three fixed terms. Stepwise model comparisons conducted using the step() function
in the lmerTest package showed no significant improvement in the model with the
three-way interaction term nor the interaction between segment and duration, but an im-
provement from including the interaction between segment and duration boundary. The
results from this model show a significant effect of movement duration on linguopalatal
constriction length (β = 0.12, t = 7.1, p < 0.0001). There were also much longer constric-
tions at durations above 185 ms (β = 5.4, t = 10.7, p < 0.0001) and a significant interaction
between those two factors, such that there was effectively no additional influence of du-
ration on constriction length at very long durations (β = −0.12, t = −5.1, p < 0.0001).
This result is indicative of a saturation effect, where the maximal constriction is reached by
185 ms. There were also significant effects of segment, such that /d/ showed significantly
shorter constrictions than /t/ or /n/ (/d/-/t/: β = −3.9, t = −23.1, p < 0.0001; /d/-/n/:
β = −3.1, t = −20.0, p < 0.0001) and /n/ showed shorter constrictions than /t/ (/n/-/t/:
β = −0.8, t = −5.2, p < 0.0001). There was also a significant interaction between segment
and duration boundary, with the effects for both /t/ and /n/ differing from that for /d/
(/t/: β = 3.1, t = 9.1, p < 0.0001; /n/: β = 1.6, t = 5.1, p < 0.0001).
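These interaction terms can be read as the duration slope switching off above the 185 ms boundary: the net slope in the long-duration region is the main effect plus the interaction. A quick arithmetic check, taking the constriction-length duration coefficient of 0.12 and an interaction of equal magnitude and opposite sign, as the prose describes:

```python
# Net duration slope for constriction length below vs. above 185 ms, using
# the reported duration coefficient (0.12) and an interaction of equal
# magnitude and opposite sign, as described in the text.
beta_duration = 0.12
beta_interaction = -0.12

slope_below_185ms = beta_duration                      # duration matters here
slope_above_185ms = beta_duration + beta_interaction   # sums to zero: saturation

print(slope_below_185ms, slope_above_185ms)
```

The zero net slope above the boundary is exactly the saturation effect described: the maximal constriction is already reached by 185 ms.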
For the location of the tongue tip constriction, there was no difference between the
categorical and continuous models (χ²(1) = 0, n.s.). This is because there was no effect of
prosodic boundary (in the categorical model) or duration (in the continuous model). There
is, however, a significant effect of segment, such that /d/ was produced most anteriorly, at
the teeth, with /t/ slightly posterior to that, and /n/ significantly retracted at the
alveolar ridge (/d/-/t/: β = −0.4, t = −2.4, p < 0.05; /d/-/n/: β = −2.6, t = −17.1, p <
0.0001; /n/-/t/: β = 2.2, t = 21.0, p < 0.0001; results from categorical and continuous
models are identical).
Looking at the stiffness of the coronal stops in Spanish, there was a significant relation-
ship between displacement and velocity for tongue tip movements (β = 0.065, t = 11.2, p <
0.0001) and a reduced slope for both weak-phrase-initial productions (β = −0.013, t =
−5.2, p < 0.0001) and phrase-initial productions (β = −0.025, t = −9.9, p < 0.0001).
There was no effect of segment. Tongue Body movements similarly show a significant re-
lationship between peak displacement and peak velocity (β = 0.057, t = 13.0, p < 0.0001)
and a reduced slope in weak-phrase-initial (β = −0.019, t = −7.9, p < 0.0001) and phrase-
initial positions (β = −0.023, t = −10.2, p < 0.0001), with no effect of segment. The
same effects are found for the jaw (displacement: β = 0.093, t = 16.8, p < 0.0001;
weak-phrase-initial position: β = −0.028, t = −8.0, p < 0.0001; phrase-initial position:
β = −0.042, t = −12.1, p < 0.0001). The relationship between peak velocity, displace-
ment, and prosodic position across all articulators is illustrated in Figure III.9.
Figure III.9: Plots of peak velocity by peak displacement for (from top to bottom): tongue
tip, tongue body, and jaw movements in Spanish. All three articulators clearly show a
stiffness difference between phrase-initial, weak-phrase-initial, and non-phrase-initial pro-
ductions, in line with previous work showing a lower stiffness at larger prosodic boundaries.
Peak velocity is measured in arbitrary units (i.e., displacement is normalized from 0 to 1)
per second. Regression lines shown for illustrative purposes and do not necessarily reflect
precise results from the statistical model.
III.4 Discussion
Broadly speaking, the results from American English and Spanish are very comparable.
In both languages, the tongue body actively moves forward during the production of
fully occluded stops, but moves much less or not at all during the production of reduced
forms ([R] in American English, [D] in Spanish). Reduced forms in both languages also
show reduced durations compared to full stops, as well as smaller jaw movements. The
spatial displacement of tongue tip movement is also conditioned by movement duration,
as for the tongue body and jaw. This usually—with the exception of word-medial /d/ in
Spanish, discussed below—still results in consistent contact between the tongue tip and
hard palate in both languages. This analysis agrees in principle with previous findings that
the horizontal position of the tongue body at the time of coronal constriction is one of the
few consistent articulatory differences between auditorily-categorized flaps and full stops
[42], though the evidence here supports reduction as a continuous rather than categorical process.
The results clearly support the continuous model of reduction for Spanish: all three
articulators (TT, TB, JAW) and linguopalatal contact patterns are fit better with the
continuous compared to the categorical model. There was also no difference in stiffness
found between /d/ and /n/, which show regular reduction in non-initial position, and /t/,
in agreement with results in Chapter II. For English, the results are not as straightforward.
For two of the articulators (TB, JAW), the continuous model provides a better fit for the
data than the categorical one. On the other hand, both tongue tip and linguopalatal
contact are fit equally well with the continuous and categorical models. In neither of these
cases, however, does the categorical model provide a better fit than the continuous model.
What this suggests is that for English, multiple modes exist in the distribution of
maximum displacement for the tongue tip, as the categorical model fit the data fairly well.
However, the fact that the data is equally well fit with the continuous model suggests that
these modes are clusters of productions along the same continuum relating constriction
displacement to duration, rather than arising from independent control of displacement.
Such independent allophonic control would be suggested if the categorical model fit better
than the continuous model, which is not the case here. Moreover, there was no difference
in stiffness between word- and non-initial productions (see below), suggesting that they
share the same dynamical control structure and that reduction is not based on independent
modulation of gestural stiffness (c.f. [88, 89]). These two factors, combined with the fact
that the continuous model did provide a significantly better fit for the tongue body and
jaw movements used to produce the coronal stops in this data, suggest that there is a
continuous relationship between displacement and movement duration, in agreement with
the hypothesis that reduction is a gradual, rather than strictly categorical, process in
English. The difference between the results for Spanish and English suggests that English
more strongly differentiates between prosodic categories (at least in word-initial position)
than Spanish; similar differences between languages in the implementation of prosodic
structure have been found in other work (e.g., [84]).
For both languages, then, this spatial and temporal reduction is not the consequence
of a simple phonological alternation, as could be (and has been, in many phonological
accounts of these reduction processes) supposed (e.g., [67, 82]). That is, the evidence
here does not support a symbolic substitution of one segment for another with unrelated
articulatory and acoustic outcomes. Rather, the amount of tongue tip, tongue body,
and jaw movement in both languages varies dynamically with changes in duration. This
undershoot of a spatial target at shorter durations is not unique to this particular case of
coronal reduction but is pervasive in speech production (e.g., [18, 20, 102]). This suggests
that the extreme reduction in the magnitude of these movements in word-medial position
is due to the very short articulatory durations in these positions. This analysis also agrees
with many previous findings on reduction in the two languages. All the studies that have
looked for evidence of clear phonological alternation between full and reduced coronals in
these languages have concluded that these sounds form a continuum of productions rather
than distinct allophones. This is true both for American English (e.g., [54, 42, 160]) and
Spanish (e.g., [39, 78, 158, 136]).
The data for both Spanish and English additionally suggest that reduction occurs for
the nasal /n/ as well as the oral stops. For both languages, /n/ shows a similar pattern
of temporally-conditioned spatial reduction as /t/ and /d/. This would be unexpected if
reduction were a planned phonological process as the traditional allophonic accounts of
reduction (flapping in English and spirantization in Spanish) do not include changes to
/n/. It is, however, the expected outcome if reduction is a dynamic process—after all, the
tongue movements required to produce a nasal stop are more-or-less equivalent to those
required for an oral stop. This is particularly true for English, where /n/, /d/, and /t/
are all produced with similar amounts of linguopalatal contact (no significant difference
in constriction length among segments) and at the same location along the palate.
Though the Spanish /n/ is produced slightly differently than the Spanish oral stops
(with contact at the alveolar ridge rather than the teeth/anterior palate), we would still
expect similar dynamical effects. The one study that has examined spirantization in /n/
articulatorily using electromagnetic articulometry (EMA) indeed found evidence for reduction
in phrase-medial position [74] suggestive of spirantization, with increased distance
between the palate and EMA sensor placed on the tongue tip. However, given the hypoth-
esis here that tongue shape determines the outcome of prosodically-conditioned reduction,
we would expect that Spanish /n/, like the English apicoalveolar stops, should reduce to
a flap. In fact, the Spanish /n/ is virtually identical to the English /n/, with a small
amount of contact at the alveolar ridge indicative of an apico-alveolar production (Figure
III.10). Previous results from EMA may have missed this distinction as the tracked point
on the tongue tip is necessarily posterior to the apex (meaning the actual constriction
is approximated in EMA but not measured), and EMA does not allow for visualization
of tongue shaping. Using rtMRI, with its complete image of the midsagittal vocal tract,
allows us to clearly see that coronal reduction outcomes are conditioned by the shape of
the tongue and the place of linguopalatal contact.
Spanish phrase-medial /n/ English word-medial /n/
Spanish phrase-medial /d/ English word-medial /d/
Figure III.10: Comparison of reduced /n/ (top row) and /d/ (bottom row) in Spanish (left
column) and English (right column). Productions of /n/ in both languages are flaps, with
a small amount of linguopalatal contact and a slightly curled tongue tip. English /d/ is
also flapped, while Spanish /d/ is produced as an approximant at the teeth.
Turning to the results of estimated stiffness, there is no difference in the relationship
between peak velocity and displacement by segment in either Spanish or English. The only
difference found was based on prosodic position, such that phrase-initial position had a
significantly shallower slope than phrase-medial position, indicating a lower stiffness. This
agrees with previous articulatory studies which have found a lower stiffness in positions
adjacent to a prosodic boundary [31, 28, 32, 7, 46]. In Spanish, the productions in the
list-initial condition have a stiffness intermediate between phrase-initial and phrase-medial
productions. Importantly, the results from both languages indicate that while stiffness is
modulated near larger prosodic boundaries, it is relatively constant across segments and
across a range of full vs reduced productions in non-phrase-initial position. Because the
same stiffness was found spanning both full and reduced forms, this suggests that stiffness
is not the driving factor in reduction, as might be suggested by a model where reduction
results from effort minimization [88, 89]. These results are also consistent with past work
that has shown temporally-conditioned spatial reduction in labial stops with a constant
stiffness, even when this does not lead to explicit (or perceived) lenition [167].
If the process of reduction is similar in the two languages—as indeed it seems to be
given the results of the current study—it raises the question of why stops in English
show a consistent pattern of reduction across segments to something like a flap (short
duration, small amount of linguopalatal contact) while stops in Spanish show three distinct
patterns: reduction of /d/ to approximants, spatial reduction in constriction characteristics
for /t/, and reduction to something like a flap for /n/. Setting aside Spanish /t/ for now
(discussed below), the answer seems to lie in the precise location where the constriction is
made. Tongue tip constrictions made at the alveolar ridge in both languages (all segments
for English, /n/ for Spanish) show reduction in the amount of linguopalatal contact as
duration is shortened. However, some contact is nearly always maintained in this position.
In fact, electropalataography data on English flapping suggests that contact is generally
maintained across the entire width of the palate for flapped productions, though the length
of this constriction is quite short. In [26], only one example is given of a flap, though since
the intention of this article is to give illustrative examples of the EPG contact patterns
of various segments, it can be considered fairly representative. In this single repetition,
there is complete closure across the palate, though only in a single 10ms frame. Similarly,
of the four examples of flaps presented in [100], three show complete closure across the
width of the palate while only one shows incomplete lateral closure. Unfortunately, none
of the studies systematically examine the horizontal width of the constriction for flaps.
What both [100] and [156] do include, however, are examinations of the general size of
the palatal contact for flaps. [100] simply aggregates the total percentage of linguopalatal
contact across the entire EPG apparatus. This study finds a significant reduction in
the amount of palatal contact for flaps (approximately 25 to 55%, depending on the
speaker) compared to full stops (approximately 40 to 75%, with all speakers showing
similar reductions in contact area). [156] measures the width of the linguopalatal contact
in the sagittal dimension (front-to-back) and finds that this width for flaps averages 1.98
mm, while the width for full stops averages 3.8 mm. The average for flaps would be quite a
bit smaller if only the flaps in non-rhotic adjacent contexts were considered (which would
give a comparable measure to the current study), as the width of those flaps averaged
roughly 3.5 mm.
For Spanish /d/, however, when contact is made, it is at the teeth. This results in
many productions showing little to no contact in the linguopalatal contact measure used
in this study. It may be the case, however, that this measure misses patterns of contact
with the upper teeth. On visual observation, many phrase-medial productions of /d/
show a slight indentation in the superior surface of the tongue around where the teeth
would be, suggesting slight contact between the two structures in these cases. Data from
EPG studies where electrodes were placed at the top of the front teeth also suggest that
even reduced productions of /d/ often contact the front teeth and very front edge of the
palate [100, 78]. In the current study, reduced duration leads to a loss of contact of the
tongue against the palate, even when contact may be maintained with the teeth. This
would suggest that the palatal contact observed in some productions of /d/ is due to the
spreading of the tongue as it compresses against that target. In that case, reduced forms
may actually be reaching their target (the teeth) as well, though that target still allows
air to escape, as it does not result in complete oral closure, unlike reduced forms of alveolar
stops.
A possible alternative is that there may be some intrinsic factor about apical productions
that leads them to maintain full closure even at very short durations. Perhaps, to
produce a closure with the tip of the tongue either at or posterior to the alveolar ridge,
certain muscles must be recruited that allow for relatively large, ballistic motions even at
short durations.
This analysis has, so far, ignored the full productions of /t/ versus spirantization of /d/
in word-medial position in Spanish. This cannot be explained as just the consequence of
durationally-conditioned spatial reduction as both /t/ and /d/ show similarly attenuated
displacement at short durations. One clue to what may be going on comes from the weak-
phrase-initial and phrase-initial /d/ productions. Here, the productions are much longer
and there is a significant amount of tongue body fronting and jaw raising, and there is
generally contact between the tongue and hard palate. However, there is much less contact
than for the /t/ in either phrase-initial or word-medial position (weak-phrase-initial /d/:
1.7 pixels, phrase-initial /d/: 2.4 pixels, non-initial /t/: 5.6 pixels, phrase-initial /t/: 5.6
pixels). A possible explanation for the long sagittal contact in /t/ is that it comes from
trying to move the tongue tip to a movement target beyond the teeth or front portion of
the hard palate. Such a "virtual target" has been suggested for control of oral stops in
other languages [109, 106, 169, 108, 140]. When the tongue hits the palate, it is impeded,
and compresses and spreads out. Based on this idea, it would follow that perhaps the /d/
has a movement target that is closer to the palate, resulting in less spreading of the tongue.
In fact, the analysis of spirantization in Spanish labials in Chapter II proposed just this:
that voiceless stops in Spanish have a large negative constriction target (beyond the point
of articulator contact) while the voiced stops have a target just slightly beyond the point
of articulator contact [136]. The data here seem to agree with this hypothesis, both in the
differences in durationally-conditioned linguopalatal contact patterns and in the fact that
/t/ showed significantly larger tongue tip (and jaw) movements than /d/ in Spanish. If
this analysis is correct, the tongue tip for /d/ on its own would just barely touch the hard
palate when it reaches its target (as might occur at the long durations associated with
occurring in phrase-initial position). This would make achieving a full closure of the vocal
tract in this position crucially dependent on extra factors, possibly including extended
duration and the forward movement of the tongue body, which would help achieve closure
by moving more of the tongue into contact with the teeth. Spirantization of Spanish /d/,
then, is related to a reduction in the fronting of the tongue but is caused mainly by a
constriction target just at the point of contact between the tongue tip and palate and
subsequent undershoot of this target as duration is reduced.
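This undershoot account can be illustrated with a minimal critically damped point-attractor simulation in the spirit of task-dynamic gestural models; all parameters here (the stiffness k, the durations, the target depths) are invented for illustration and are not fitted to the data:

```python
# Critically damped point attractor: x'' = -k (x - T) - 2 sqrt(k) x'.
# Truncating activation early produces spatial undershoot; a "virtual"
# target beyond the palate still yields contact at short durations.
# All parameter values are illustrative only.
import math

def final_position(target, k=900.0, duration=0.2, dt=0.001):
    """Integrate from rest at x = 0 and return position at `duration` (s)."""
    x, v = 0.0, 0.0
    b = 2.0 * math.sqrt(k)                 # critical damping
    for _ in range(round(duration / dt)):
        a = -k * (x - target) - b * v      # restoring force toward the target
        v += a * dt
        x += v * dt
    return x

PALATE = 1.0
full = final_position(target=PALATE, duration=0.200)     # long: near target
reduced = final_position(target=PALATE, duration=0.060)  # short: undershoot
virtual = final_position(target=2.0 * PALATE, duration=0.060)

print(round(full, 2), round(reduced, 2), virtual >= PALATE)
```

The point is qualitative: with identical dynamics, shortening activation alone yields a reduced form, while a target placed far enough beyond the palate maintains contact even at short durations.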
III.5 Cross-linguistic evidence
Extrapolating from the data analyzed in the current project for English and Spanish, the
hypothesis follows that, when languages show coronal reduction, laminodental coronals
should always reduce to approximants ([D]), while apicoalveolar coronals should always
reduce to a flap/tap ([R]). Furthermore, retroflex coronals, which must be apical [95],
should also reduce to flaps/taps. In the comprehensive survey of lenition conducted by
[88, 89], there are 55 languages which show reduction of coronal stops. Of these, five show
unexpected reductions to either fricatives (Turkana: t > s; Pengo: ʈ, ɖ > z) or liquids
(Karao: t > l, d > r; Hausa: d > r; Proto-Bantu: d > l). It is possible that the
reductions to liquids, at least, are actually the result of an approximant production of an
apicoalveolar tongue tip gesture. Impressionistically, modeling reduction of the alveolar
tongue tip constriction for /ada/ in English leads to something that sounds like [ala].
However, since the precise nature of the coronals (dental vs. alveolar) in these cases isn’t
known, and they are very few in number, they will be left out of the remaining discussion,
though they certainly merit further consideration. In the remaining 50 languages, the
majority are left unspecified in this survey for the precise characteristics of the coronal
articulation. In the languages where the detailed manner and place of articulation are
unknown, 20 show reduction to [D] and 12 to [R]. For the cases where the place is known,
the results are presented in Table III.4.
Note that all the dental stops reduce to [D] and all the alveolar/retroflex stops reduce
to [R]. The hypothesis that the shape of the tongue, as reflected in the manner and place
of articulation, is the determining factor in the outcome of coronal lenition is borne out.
Interestingly, there are two languages in the survey which show lenition of both dental
        dental   alveolar   retroflex
[D]        5         0          0
[R]        0         3         14
Table III.4: Results of coronal lenition at three places of articulation from the survey in
[88, 89]. The numbers shown in each box reflect the number of occurrences of that pattern
in the survey. Some languages that show reduction at more than one place of articulation
may be included more than once. (χ²(2, n = 22) = 15.9, p < 0.001)
and alveolar stops, Purki and Yindjibarndi. In both cases, the dentals reduce to [D] and
the alveolars to [R] (Yindjibarndi also shows reduction of retroflex stops to [R]).
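The association in Table III.4 can be checked with a standard Pearson chi-square test of independence, sketched here in plain Python. Note that the exact statistic depends on the test variant and any corrections applied, so it need not reproduce the value reported in the table caption:

```python
# Pearson chi-square test of independence on the Table III.4 counts.
# Columns: dental, alveolar, retroflex; rows: reduction outcome.
counts = [
    [5, 0, 0],    # reduction to [D]
    [0, 3, 14],   # reduction to [R]
]

row_totals = [sum(r) for r in counts]
col_totals = [sum(c) for c in zip(*counts)]
n = sum(row_totals)

chi2 = sum(
    (obs - rt * ct / n) ** 2 / (rt * ct / n)   # (O - E)^2 / E
    for row, rt in zip(counts, row_totals)
    for obs, ct in zip(row, col_totals)
)
df = (len(counts) - 1) * (len(col_totals) - 1)

# With df = 2, the 5% critical value is 5.99, so the association between
# place of articulation and reduction outcome is clearly significant.
print(f"chi2({df}, n = {n}) = {chi2:.1f}")
```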
Interestingly, there are 22 languages in the survey that show reduction of voiced stops
at all places of articulation, just as Spanish does. Of these 22, 15 show the pattern /b, d, g/
> [B, D, G]. Three of these languages also reduce /ɖ/ to [R]. The remaining show the pattern
/b, d, g/ > [B, R, G]. Given this data, it seems that the mechanism driving these patterns of
reduction across multiple places of articulation (hypothesized here to be reduced duration
in prosodically weak positions) always results in approximant labial and velar productions,
but can lead to either approximants or flaps, depending on the language. This is expected
given the hypothesis proposed here that prosodic variation underlies reduction to both
flaps and approximants, with the eventual outcome conditioned by the particular language-specific
articulatory posture of the tongue during coronal production, but is difficult to
explain if the mechanisms driving flapping and spirantization are different.
III.6 Conclusions
The current study examined prosodically-conditioned reduction of coronal stops in Amer-
ican English and Spanish. The results show that the process of reduction is relatively
similar in both languages: shorter durations in non-initial position result in smaller movements
of all three articulators involved in making the coronal constriction—the tongue
tip, tongue body, and jaw. The eventual articulatory outcomes of this durational and spa-
tial reduction are conditioned by the precise manner in which each language produces
coronal stops: differences in constriction degree, constriction location, and the part of the
tongue used to form the constriction result in flapping in American English for /t/ and
/d/ as well as /n/, spirantization of Spanish /d/, reduced productions of /n/, and full stop
productions for Spanish /t/. Reduction in both languages should not be seen as a symbolic
allophonic substitution but rather as a dynamic process that is the result of both invariant
(e.g. constriction target, location) and variable (e.g. duration) factors. The word-medial
productions are simply the far end of a continuum of prosodically-conditioned variation
in duration and magnitude, in agreement with previous experimental work examining
American English flapping [54, 42, 160] and Spanish spirantization [136, 150, 159].
Chapter IV
A dynamical-systems model of prosodic variability
and possible sound change
IV.1 Introduction
The previous chapters have been concerned primarily with synchronic variation–that is,
relating the variability found in production to the underlying phonological control. The
primary observation is that reduced forms in production can be attributed to the same
control target as full productions, with contextual or prosodic factors causing variable
undershoot of that target. This type of reduction has obvious diachronic parallels in
sound change over time. In fact, a great deal of work in historical linguistics has focused
on these types of consonant reduction, generally referred to in that field as lenition.
The vast majority of historical analyses take one of two main approaches: either
lenition is a step on the way to complete deletion of a segment or lenition is an
increase in the sonority of the segment in question (see [72] for a thorough review of
the history of lenition in phonological theory). While these theories may be adequate
to describe the diachronic changes observed (at least at a phonemic level), relatively less
work has considered the process through which sound change occurs (though see, e.g.,
[12, 13, 131, 130]). That is, most analyses take as a starting point that a sound change
occurred, then attempt to describe what has changed in the language at a structural level.
For example, /t/ might change to /d/ either because it is "stronger" than /d/ or because
it is less sonorous, on the assumption that sound change can be analyzed as a decrease in
strength or an increase in sonority. These analyses, however, do not explain how such a
sound change might arise–what would cause such a decrease in "strength" or increase in
sonority for a particular sound or class of sounds in a particular language at a particular
time? The relevant question isn't how we can describe the outcomes of lenition after the
fact, but rather how lenition-type sound changes arise in the first place.
One of the most influential theories of the origin of diachronic sound change posits
that this type of change arises when language learners misinterpret the variable synchronic
productions in their ambient language—the so-called listener-driven model of sound change
[131, 130]. The idea is relatively simple: a language learner (or listener) is presented with a
hugely variable range of surface productions, which are influenced by contextual, prosodic,
and many other factors. However, much of this variability can be corrected for by the
listener. For example, when the velum is open during production of a vowel, it effectively
changes the vowel formant frequencies such that the vowel is "lower" in F1-F2 space.
Listeners, however, compensate for these changes if a nasal consonant follows the vowel,
such that the correct vowel is perceived [10]. Ohala argues that sound change arises when
the listener either fails to correct for or over-compensates for such contextual factors
[130]. If the listener does not correct for these additional factors, the surface
variability will be taken as the direct result of the underlying control primitives used to
produce speech—that is, the surface forms will directly reflect the phonological forms.
Under this view, then, synchronic variability underlies (though does not directly cause)
diachronic change.
A view of consonant lenition in this listener-driven framework was sketched out by
Beckman et al. [8]. This analysis was based on speech articulation, and particularly
Articulatory Phonology [17, 19, 21]. They focus on the fact that some speech conditions
(such as fast or casual speech, varying prosodic conditions, etc.) can lead to increased
overlap among speech gestures. For example, Browman and Goldstein show that the seeming
deletion at the acoustic level of word-final /t/ in perfect memory is actually attributable to
increased overlap of the gestures for /k/, /m/, and /t/ in some productions, such that
the raising of the tongue tip for /t/, while still produced, is acoustically masked by the
adjacent labial and velar gestures [17]. The particular proposal is that prosodically weak
conditions lead to increased overlap among gestures, which can result in reduced acoustic
salience of a particular gesture, and so may lead to listener-based sound changes.
While they also discuss vowel reduction and consonant voicing in their proposal, the
most relevant section focuses on spirantization of stops. Their proposal is that the gesture
for stops in prosodically weak positions is overlapped substantially with the surrounding
vocalic gestures, such that the consonant gesture is truncated and an incomplete seal
of the oral cavity is produced. This would lead roughly to approximant realizations of
stops, as in the case of Spanish spirantization. However, they acknowledge that stop
spirantization, while perhaps more common in prosodically weak position, often occurs as
an unconditioned change. They propose that unconditioned changes are restricted to—or
at least much more common in—voiceless stops, where "a listener [might] misinterpret the
turbulence of the aspirated release as frication, even in the absence of truncation of the
oral gesture for the stop" [8, p. 50].
This first sketch provides a good starting point, beginning with a physically-grounded
explanation of articulatory variability. However, there are a few major shortcomings that
need to be addressed. First, their model of gestural reduction based on articulatory
overlap in prosodically weak positions can necessarily explain only a subset of the class
of consonant-lenition-type sound changes. In their examples of diachronic spirantization,
Beckman et al. cite both processes conditioned by prosodic context and cases where
prosodic context does not seem to play a role. As an example of spirantization occurring
only in prosodically weak contexts, they give Latin voiced stops becoming
approximants when they occur in word-medial position in most dialects of Spanish.[1] It
should also be noted that many Latin voiced stops in this environment reduced further,
eventually deleting entirely (e.g. Latin LEGERE → Spanish leer). The examples given
of unconditioned lenition are the change from Proto-Indo-European /*p, *t, *k/ to
Proto-Germanic /*f, *θ, *x/ in all prosodic positions, and the similar unconditioned change from
/b, d, g/ to /β, ð, ɣ/ in Classical Greek (note that this is the same type of change as that
from Latin to Spanish, though the conditioning environment differs).
While Beckman et al. do acknowledge that spirantization may occur both as a
prosodically-conditioned and an unconditioned process, they must come up with an alternate
explanation for unconditioned sound changes. This is problematic for a number of
reasons. First, in the absence of any evidence that two processes are necessary, it seems
more parsimonious that changes that affect one class of segments in a consistent way have
the same underlying cause. In fact, Beckman et al. suggest that the release turbulence
may also be misinterpreted as an affricate. This means that one cause of sound change
(misinterpretation of the release burst) may have two different outcomes (affricate or
spirant) while at the same time one of those outcomes (spirantization) has two distinct
causes (truncation-based undershoot or the release misinterpretation). A unified account
of spirantization across prosodic contexts would be preferable.

[1] Beckman et al. actually say that the voiced stops become voiced fricatives in these positions, but there
is no frication in these productions, and the general consensus is that voiced stops reduce to approximants,
not fricatives [100, 115].
The second issue is perhaps the more crucial one. Beckman et al. sketch out mainly
a synchronic model of how gestural overlap in prosodically weak position may lead to
online reduction. What is missing is an explanation of why this overlap (and the subsequent
reduction in constriction degree) might lead to sound change, beyond a general notion of
listener misinterpretation.
The proposal here builds on the insight in [8] that prosodic position can lead to
reduction in gestural magnitude due to temporal overlap or, as is the particular focus of
this proposal, to durational reduction of consonant closure gestures. This is similar to the
argument in Chapters II and III that spatial reduction in consonants is caused by
durational reduction, particularly in prosodically weak position. It expands on the basic
sketch offered in previous work in two main ways. First, in IV.3, the proposal here
explicitly links synchronic change to a specific type of misinterpretation, where listeners
take prosodically-conditioned variability to reflect the underlying phonological
control schema, rather than appropriately accounting for the role of prosody in gestural
variation. Second, it offers a unified account of both conditioned and unconditioned
variability, based on differing patterns of synchronic reduction. In IV.2, a preliminary model
of synchronic prosodically-driven reduction is presented in a dynamical systems framework
[56, 55] that is capable of producing either bimodal or unimodal production patterns, both
of which are shown to exist in languages that show synchronic lenition patterns.
We propose a model where language learners attempt to recover the phonological
target constriction degree of stops through analysis of the distribution of produced tokens
in their ambient language. These distributions are shaped by the prosodic system of the
language. We assume, in agreement with Chapters II and III, that the prosodic
system alters the duration of speech movements and leads to variable undershoot of the
constriction target. This prosodically-conditioned undershoot can lead to sound change
if learners misattribute conditioned variability to phonological control instead of prosodic
influence. Different prosodic structures between languages can lead to different patterns
of synchronic variability (uni- and bimodal), which are argued to underlie the difference
between conditioned and unconditioned diachronic lenition patterns.
IV.2 Modeling variability in production from phonological
invariance
The key to the current proposal is that different patterns of synchronic prosodically-
conditioned variability underlie different types of diachronic lenition. We primarily deal
here with variability in constriction degree, assumed to be mediated through prosodically-
conditioned changes in articulatory movement duration, as shown in Chapters II and III.
More specifically, the hypothesis here is that some languages will exhibit reduction that is
limited to prosodically weak positions (leading to conditioned sound change) while others
will show reduction across all prosodic positions (leading to unconditioned sound change),
depending on the structure of the prosodic system in each particular language. The first
question that must be answered is whether the two hypothesized types of synchronic
variability are attested in currently spoken languages.
In fact, there is evidence of both variability patterns in Spanish spirantization and
English flapping, the two phenomena analyzed in chapters II and III. Recent work has
examined the distribution of constriction degree in Spanish [158]. That study collected
acoustic data from 20 Spanish-dominant Spanish-Catalan bilingual speakers from Majorca
participating in an interactive task (in Spanish). Using an acoustic estimate of constriction
degree, that study found that the distribution of constriction degree in Spanish /d/
was not significantly influenced by word position or stress. While there was a significant
effect of surrounding context, such that there was generally more constriction after high
vowels than low vowels, this was a gradient rather than categorical difference. In fact, the
distribution of estimated constriction degree in their dataset showed a clearly unimodal
distribution. The same pattern was found across all voiced stops for monolingual
peninsular Spanish speakers in a similar, separate study [34], though slightly different results
were found for Costa Rican Spanish, which suggested a categorical effect of low vowels
compared to all other contexts in that dialect (see Chapter II).
We test whether the distributions differ between English and Spanish by examining
the distribution of constriction degrees produced for both languages from the real-time
MRI data collected for the study in Chapter III. All segments which showed prosodically-
conditioned reduction were included, which were /d/ and /n/ for Spanish, and /t/, /d/,
and /n/ for English. In order to test our assumption that differences in spatial magnitude
result from prosodically-conditioned durational differences, we also calculated the distri-
bution of the movement durations for the same segments. Total counts for non-initial
position were reduced by a factor of 4 (for English) or 3 (for Spanish) to give an equal
number of tokens from non-initial, word-/weak-phrase-initial, and phrase-initial
productions. All the resulting distributions can be seen in Figure IV.1. It can be seen that while
the Spanish segments have a unimodal distribution for movement magnitude, the English
data show a clear bimodal distribution. Similar results are seen in the distribution of
duration in the two languages, though the Spanish data show a clear leftward skew in this
case.
[Figure IV.1 shows two histograms: left, normalized displacement (0 to 1) against proportion of total tokens; right, duration in samples (0 to 140) against proportion of total tokens.]
Figure IV.1: Comparison of the distribution of total displacement (left) and duration (right)
for segments which show prosodically-conditioned reduction in Spanish (/d/ and /n/, in
blue) and English (/d/, /t/, and /n/, in red). Values were taken from the real-time MRI
dataset described in Chapter III (see text for details). The Spanish stops show a unimodal
(skewed) distribution in both duration and constriction degree, while the English stops
show a bimodal distribution.
Based on the data from English and Spanish, we can conclude that there are (at least
two) different synchronic patterns of prosodically-conditioned spatiotemporal reduction,
as hypothesized.[2] One pattern shows a single mode relatively independently of context,
while the other shows a clear bimodal distribution of reduced and full productions.

[2] Recall that in Chapter III, English tongue tip movement was fit equally well with categorical and
continuous models, suggesting quasi-categorical modes in the distribution of total displacement, though
the underlying relationship between displacement and duration was constant; Spanish, on the other hand,
did not show a good fit with the categorical model, suggesting the absence of multiple modes in the
distribution. The results here for constriction degree mirror those findings.

We hypothesize that these two synchronic patterns underlie the two types of diachronic
lenition: unconditioned (unimodal) and prosodically-conditioned (bimodal). We suggest that
the different distributions reflect prosodic differences between the two languages. It may
be that the prosodic structure of Spanish is "flatter" or less differentiated than that of
English. It has been shown that different languages show very different effects of both
prosodic boundaries and lexical and phrasal stress, in both spatial and temporal
aspects [28, 52, 84, 36, 48, 41, 37]. Particularly relevant to these data, Spanish does not seem
to show spatiotemporal differences for segments in word-initial position (e.g., [39, 136]),
unlike English (e.g., [52]).
In order to develop our account of how such prosodically-conditioned synchronic reduction
patterns may lead to sound change, the next step is a model that relates a single
phonological target at the control level to differing patterns of variability at the production
level. We will assume in the following modeling that the prosodic system shapes the
ultimate production of constriction degree through durational variation, as shown in
Chapters II and III. However, our aim here is to show how prosodic structure could create
multiple types of distributions from a single phonological target constriction degree.
Modeling prosodic structure as a direct influence on constriction degree considerably
simplifies this goal without, we believe, detracting substantially from the plausibility of the
model, as there exists some regular, though perhaps not precisely known, relationship
between duration and ultimate constriction degree.
We base our model relating phonological invariance and produced variability on the
model proposed by Gafos and Benus [56, 55, 11]. This model uses non-linear dynamics to
relate invariant phonological control to variability in production without any intermediate
steps. The fundamental insight is that both phonological representations and other
constraints on production (say, grammatical constraints) can be represented as first-order
dynamical systems with invariant control parameters that, when combined, determine
the behavior of the system (speech production). These dynamical systems are described
mathematically by the function ẋ = f(x), where x is the current state of the system and
f(x) is a force function that determines how x changes. These systems can be understood
at an intuitive level by examining the related potential function V(x), such that ẋ =
f(x) = −dV(x)/dx. These potential functions can be read as topographical contours, and
we can imagine that the initial value for x can be represented by placing a ball onto the
contour at that particular location (Figure IV.2). Values for x where the ball does not
move (at a maximum or minimum of V(x), i.e. where f(x) = 0) are known as fixed
points. Stable fixed points, or attractors, correspond to minima in V(x), such that the
forces drive our imaginary ball towards x from any nearby value. Maxima in V(x), on the
other hand, correspond to unstable fixed points, and small deviations in x will cause our
imaginary ball to move away from the unstable fixed point to an attractor.
[Figure IV.2 shows a potential function V(x) plotted over x, with minima at x_1 and x_3 and a maximum at x_2.]

Figure IV.2: Example of a potential function V(x). Balls placed at any x will be forced
towards the nearest attractor (x_1 or x_3). There is an unstable fixed point at x_2 such that
any ball placed precisely at that point will stay there, but any slight deviation from that
point will be attracted to x_1 or x_3. Figure after [56].
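The ball-and-contour picture can be made concrete in a few lines of code. The sketch below is our own illustration (the Euler integrator, the particular force function, and the step sizes are not from the dissertation): it integrates ẋ = f(x) for a double-well force whose potential has the shape of Figure IV.2, with two attractors separated by an unstable fixed point.

```python
def simulate(f, x_init, dt=0.01, steps=5000):
    """Euler-integrate the first-order system x' = f(x) from x_init."""
    x = x_init
    for _ in range(steps):
        x += dt * f(x)
    return x

# Double-well force f(x) = x - x**3, i.e. potential V(x) = -x**2/2 + x**4/4:
# stable fixed points (attractors) at x = +1 and x = -1,
# and an unstable fixed point at x = 0.
f = lambda x: x - x**3

print(simulate(f, 0.1))   # a ball placed just right of 0 rolls to +1
print(simulate(f, -0.1))  # just left of 0, it rolls to -1
print(simulate(f, 0.0))   # placed exactly on the maximum, it stays put
```

From any nearby starting value the state settles into the closest well, which is exactly the behavior the potential-function reading ẋ = −dV(x)/dx predicts.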
In the dynamical phonological model proposed by Gafos and Benus, phonological
representations can be expressed as a potential function with an attractor (x_0) representing
the phonological target along some phonetic continuum x. They give the example of voicing,
in which a particular segment (or gesture) can have voicelessness or voicing as its target,
expressed as positive and negative values of x, respectively. Because each phonological
representation is assumed to have a single target (e.g. [±voice]), the simplest
potential function with a single target is used to model these categorical representations.
This is the force function ẋ = x_0 − x or, by integration, the potential V(x) = x^2/2 − x_0 x.
The power of this representation is that it expresses the phonological representation (the
attractor x_0) in the same terms as the continuous phonetic representation. These
phonological representations interact with other grammatical factors that can serve to bias
production towards one end of the x continuum. In Gafos and Benus's initial proposal
[56], they give the example of a markedness constraint that serves to bias production
towards voicelessness (in the case of final devoicing, which they consider). These constraint
functions, in contrast to the categorical potential functions, must necessarily be able to
have two attractors, as constraints may bias towards one end or the other of the continuum.
As an initial hypothesis, they posit cubic force functions to represent these constraints,
as they are the simplest functions that exhibit this behavior. These force functions take
the form M(x) = ẋ = −k + x − x^3, with a related potential function
V_M(x) = kx − x^2/2 + x^4/4. In these functions, the parameter k determines the number,
location, and depth of the wells for the attractor(s). The final production of a particular
phonological form in a particular context is then modeled as a linear combination of the
phonological target and markedness constraint, such that ẋ = f(x) + M(x), as shown in
Figure IV.3.
[Figure IV.3 shows, in the top row, three potential functions over x from −3 to 3 (V(x), V_M(x), and V(x) + V_M(x)) and, in the bottom row, the corresponding histograms.]

Figure IV.3: Example of potential functions and associated histograms for a phonological
category (V(x), with attractor at x_0), a contextual constraint (V_M(x)), and their
combination (V(x) + V_M(x)). The top row shows the potential functions and the bottom row
histograms generated from 1000 simulations. In this example, x_0 for V(x) is set to −1, and
k for V_M(x) is set to −0.75.
Of course, the speech production system is not as ideal as these potential functions
would suggest: production is corrupted by noise, which results from the many-layered
systems involved in speech production (e.g. phonological, neuronal, aerodynamic, and
muscular systems). This is included in the model with the addition of a small, random
force pushing parameter x back and forth. Mathematically, this is represented with an
additional factor representing Gaussian white noise with strength q, giving equations of
the form ẋ = f(x) + Noise = −dV(x)/dx + √q ξ_t, where ξ_t is a Gaussian white noise
process. Because of the noise in the system, the value of x can only be computed
probabilistically. These probabilities are estimated by computationally simulating the
solution to ẋ = f(x) + Noise [56, 70]. This simulation is run with the same parameters for
f(x) a large number of times (here, 1,000) and the histogram of the solutions is plotted
as an estimate of the probability density function related to f(x).
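The procedure just described can be sketched as an Euler–Maruyama integration (our own minimal illustration; the q, dt, and step values are arbitrary choices rather than the dissertation's settings):

```python
import random

def simulate_noisy(f, x_init, q=0.1, dt=0.01, steps=2000, seed=None):
    """Euler-Maruyama integration of x' = f(x) + Gaussian white noise
    of strength q, returning the final state of one run."""
    rng = random.Random(seed)
    sigma = (q * dt) ** 0.5  # per-step noise increment
    x = x_init
    for _ in range(steps):
        x += dt * f(x) + rng.gauss(0.0, sigma)
    return x

# Single-attractor target force f(x) = x0 - x, with x0 = -1.
x0 = -1.0
samples = [simulate_noisy(lambda x: x0 - x, 0.0, seed=i) for i in range(1000)]

# Repeating the run many times and histogramming the endpoints estimates
# the probability density; the samples cluster around the attractor x0.
print(sum(samples) / len(samples))
```

Binning `samples` into a histogram gives the kind of density estimate shown in the bottom row of Figure IV.3.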
[Figure IV.4 shows two quartic potential functions, one with k = 1 and one with k = −0.6.]

Figure IV.4: Example of potential functions for different prosodic conditions. These are
versions of the quartic function V(x) = kx − x^2/2 + x^4/4, differing only in the value of k.
Positive values for k give an attractor at a negative value of x, and vice versa. Larger |k|
values result in deeper wells.
We here use the dynamical systems framework of phonological representation proposed
by Gafos and Benus to model the two types of prosodically-conditioned reduction in
English and Spanish. We use the quadratic potential function (single attractor) to represent
the target constriction degree for stops on the continuum of constriction degree x, where
negative values of x correspond to productions with incomplete closure (more negative
values = greater aperture) and positive values represent full closure. The general form of
this function is V_T(x) = x^2/2 − x_0 x. Instead of combining these categorical
representations with phonological constraints, however, as Gafos and Benus did, we use the quartic
potential to represent the prosodic context. Just as this function allows for modeling
multiple constraints along the same dimension (e.g. *voice and *voiceless) by manipulation
of only a single parameter (k), we can similarly model the bias of different prosodic
contexts with the same single parameter. Both prosodic contexts that favor full productions
(e.g. phrase-initial) and those that favor reduction (phrase-medial) can be modeled with
the same family of functions (V_P(x) = kx − x^2/2 + x^4/4), differing only in the parameter k.
Negative values for k result in an attractor with a positive x value, and positive values for
k result in an attractor with a negative x value. Larger values in either direction result
in deeper wells (local minima in the potential function, where deeper wells have a lower
value of V(x)) around the attractor and a narrower distribution in the histogram density
(Figure IV.4).
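The role of k can be checked numerically. In this sketch (our own illustration; the grid search simply stands in for solving dV_P/dx = 0), we locate the deepest well of the quartic prosodic potential for a positive and a negative k:

```python
def V_P(x, k):
    """Quartic prosodic potential V_P(x) = k*x - x**2/2 + x**4/4."""
    return k * x - x**2 / 2 + x**4 / 4

def global_minimum(k, lo=-2.0, hi=2.0, n=4001):
    """Grid-search the location of the deepest well of V_P for a given k."""
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return min(xs, key=lambda x: V_P(x, k))

print(global_minimum(0.8))   # positive k: attractor at negative x (reduction)
print(global_minimum(-0.8))  # negative k: attractor at positive x (full closure)
```

With |k| = 0.8 the potential has a single well (near x ≈ −1.28 for k = 0.8, and the mirror image for k = −0.8), consistent with the description that larger |k| values give deeper wells and stronger biases.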
We model each language with a single value for x_0, combined separately with two
prosodic contexts, one favoring reduction (positive k) and one favoring full stops (negative
k). For each prosodic condition, the phonological target function V_T(x) is added linearly
to the prosodic conditioning function V_P(x). The solution to this combined function
with added Gaussian noise is then simulated computationally 1000 times. The outputs
of the two combinations (one for each prosodic context) are then pooled, giving 2000
values for each simulated language. We take the resulting histograms to represent the
range of productions that would be observed in a particular language, analogous to those
shown in Figure IV.1 for Spanish and English.
While there are, of course, infinite combinations of x_0 and k, which are
continuous-valued variables, our goal here is to show that the model is minimally capable of simulating
the qualitative types of variability we see in consonant lenition across languages.
Specifically, we need to show that the model can generate the two types of distributions, unimodal
and bimodal, that we see in Spanish and English and that are hypothesized to underlie
unconditioned and conditioned lenition-type sound changes. Ideally, the model should be
able to generate such patterns from the same x_0 target, as similar stop series can undergo
both types of sound changes (compare the unconditioned reduction of voiced stops in
Classical Greek to the conditioned reduction in Latin, as noted in IV.1).

[Figure IV.5 shows two histograms (frequency of occurrence, 0 to 400) over x from −2 to 2, with the phonological target x_0 marked on each.]

Figure IV.5: Examples of uni- and bimodal distributions generated by combining a
single phonological target with two differing prosodic contexts. The phonological target
was the same for both distributions (x_0 = 0.1). The unimodal distribution was generated
with the set of k parameters [−0.1, 0.6], representing in the former case a weak bias
towards full stops and in the latter a relatively strong bias towards reduction. The bimodal
distribution was generated using k values of [−0.8, 0.8], reflecting strong biases towards full
stops and reduction, respectively.
While x_0 and k values exist on a continuous dimension, we assume that each language
has a particular phonological target (x_0) and particular values for k for each prosodic
category (e.g., utterance, intonational phrase, prosodic word, etc.). This reflects the proposal
in Articulatory Phonology that, while the dimensions of constriction degree and
constriction location are continuous, each gesture in a particular language will have a set value for
these parameters [19]. The phonological target (x_0) in our model directly corresponds to
constriction degree in Articulatory Phonology; we assume that the wide range of possible k
values, reflecting the influence of the prosodic boundary condition on constriction degree,
is restricted in a similar way as constriction degree and location.
The ability of the dynamical phonological model to generate both types of distributions
is shown in Figure IV.5, where the same x_0 target for V_T(x) is used to create both
unimodal and bimodal distributions. In these simulations, x_0 is set to 0.1, a "virtual target"
beyond the point of articulator contact, as would be expected for stops [109, 106, 169, 108, 140].
In the unimodal distribution, k values were set to −0.1 and 0.6. These values represent one
prosodic context with a weak bias towards full stops on the one hand and one prosodic
context with a relatively strong bias toward reduction on the other. The bimodal
distribution was generated with k values of −0.8 and 0.8, reflecting one prosodic context with a
strong bias towards full stops and the other with a strong bias towards reduction.
While this model is certainly too simple to serve as a complete model of prosodic effects
on lenition, it fulfills the basic requirements necessary for our present goal: it models how
different prosodic contexts can lead to production of either full stops or reduced forms, and
is able to generate the bimodal and unimodal distributions of constriction degree seen
in actual languages such as English and Spanish. Of course, the model is quite simple, with
only two prosodic contexts for any given language. More importantly, the phonological
representation and prosodic conditioning are given equal weights when combined; it seems
probable that languages would differ in the degree to which prosody affects articulation.
Nonetheless, the model proposed here can serve as a minimal first step that reflects crucial
properties of prosodically-conditioned reduction.
IV.3 Recovery of phonological targets
Having established a model that is able to reproduce the fundamental aspects of
prosodically-conditioned consonant reduction necessary for the current account of diachronic lenition
patterns, we now turn to the question of how synchronic reduction can lead to sound
change. Recall that our proposal, in line with Ohala's theory of listener-driven sound
change [131, 130], is that language learners accurately learn the patterns of variability in
their ambient language, but can fail to accurately assign the cause of that variability to
the prosodic conditioning factors, instead positing incorrect phonological targets.
It has been shown that both adults [120] and infants [122, 121] are sensitive to the
statistical distribution of the phonetic input. In these studies, listeners were exposed to a
continuum of sounds from voiced /d/ to unaspirated /t/ which differed in Voice Onset
Time (VOT). In each study there were two groups: the first was briefly exposed to a
unimodal distribution of VOT, centered around the middle of the continuum; the second was
exposed to a bimodal distribution with two peaks near the edges of the continuum. After
the exposure period, the group that was exposed to the bimodal VOT distribution was
able to differentiate between two exemplars with VOT drawn from different distributions,
while the group that was exposed to the unimodal distribution was not. The same effect
was found for both adults and infants. This suggests that exposure to the bimodal distri-
bution caused the listeners to separate the VOT continuum into two separate categories,
while the unimodal distribution precipitated the creation of a single category that spanned
the entire VOT continuum. On the subsequent discrimination task, the exemplars that
differed in VOT could only be distinguished if they fell into separate categories.
Pierrehumbert, based on these results, suggests that infants learning language first
learn the positional variants of a particular phoneme [144]. Contextual (either prosodic or
segmental) effects can significantly alter the phonetic production of a given phoneme, such
that the same phoneme can have very different distributions along some phonetic property.
She gives the example of the distinction between /s/ and /z/ in English, which is signaled
by a voicing contrast in phrase-initial position but only by a durational difference in
word-final position, where /z/ undergoes devoicing. She argues that infants first create
categories based on the distributional properties in a particular context, and only later
learn to associate reflexes of the same phoneme in different contexts.
In this section, we present a computational model of how a language learner builds
categories from the distributional characteristics of the phonetic input. We begin with
a simple model in which the learner does not assign any role to the prosodic system
in shaping the distribution and simply builds categories based on the raw distributional
input. Subsequently, the learner steadily increases the weight given to the prosodic factors
in conditioning phonetic variability, until reaching an "adult" stage where equal weight
is given to both phonological categories and prosodic context in explaining production
variability.
Previous models have attempted to implement computational solutions to the problem
of forming categories from continuous phonetic distributions using neural networks (e.g.,
[47, 63]) or different types of Gaussian mixture models (e.g. [96, 123, 166]). Our approach
necessarily differs from that work because we attempt to model recovery of phonological
categories not only from the surface distributional statistics but also include the effects of
prosodic context in the recovery process. One benefit we have is that we are attempting
to form categories over distributions whose underlying control structure is known, while
other approaches must recover categories from acoustic patterns of real speech
ultimately created by a complex, multilayered system.
This is additionally advantageous because our aim is not simply to recreate the
formation of perceptual categories, but to model a learner's acquisition of a language. The goal
of such a learner must be to recover the production parameters underlying the produced
variability distribution, not just to form perceptual categories. In the current proposal, we
model this process as the recovery of the dynamical production parameters (x_0 and k) that
determine the production variability of the system. As a first approach, we assume that
the language learner has access to their own production system, which is of an equivalent
nature to that used to produce their ambient language.
We model recovery of the dynamical control parameters as a process of error
minimization across multiple possible dynamical parameters. The goal of the learner is to minimize
the error between the output of their own system and the distributional patterns seen
in the ambient language. As a first approximation, we assume that the language learner
attempts to recreate the ambient distribution with two phonological targets (x_1 and x_2),
each ranging between −2 and 2. To speed up computation, we limit the step between
possible candidates to 0.25, giving a total of 17 possibilities for each x_n. Our testing
indicates that decreasing the step size gives a finer-grained error map but does not alter
the recovered results as long as the initial x_0 is selected from the same set of possibilities.
For each combination of targets, the least-squares error between the resulting distribution
and the ambient distribution is calculated. The errors for all combinations of x_1 and x_2
can be visualized on an error map as in Figure IV.7, which shows larger errors in red
and smaller errors in blue. The best solution is the combination with the lowest error
(darkest blue on the error maps). When there are multiple solutions with effectively equal
error (error within 10% of the best solution), we take the solution closest to the mean
of those solutions in two-dimensional space as the optimal solution. This procedure was
necessary to ensure consistency across multiple runs of the system, as small fluctuations
in the noise value in generating both the ambient and test distributions sometimes led to
different solutions across multiple runs.[3]
When the optimal solution gives values for x_1 and x_2 that are effectively equal, we take that as evidence for a single phonological target and assume that the learner will subsequently recover only a single target in that case. As a first approximation, we assume any targets differing by 0.5 or less are "effectively equal," and take the mean of x_1 and x_2 as the value of the single recovered target. Targets that are this close together give extremely overlapping distributions (Figure IV.6). While the fit to the ambient distribution may be improved with two targets in close proximity, they would not be perceptually distinguishable, and would provide poor categorical contrast.
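The grid search and merging rule just described can be sketched as follows. This is an illustrative reconstruction, not the implementation used here: in particular, `make_distribution` stands in for the dynamical production model with simple Gaussian noise around each target, and the noise level, sample counts, and bin grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = np.arange(-2.0, 2.25, 0.25)   # 17 candidate values per target
BINS = np.linspace(-3, 3, 61)

def make_distribution(targets, noise=0.3, n=4000):
    """Stand-in for the production model: noisy productions around each target."""
    samples = np.concatenate(
        [rng.normal(t, noise, n // len(targets)) for t in targets])
    hist, _ = np.histogram(samples, bins=BINS, density=True)
    return hist

ambient = make_distribution([-0.75, 1.0])   # e.g. a bimodal ambient language

# Least-squares error for every (x1, x2) combination: the "error map"
error_map = np.array([[np.sum((make_distribution([x1, x2]) - ambient) ** 2)
                       for x2 in GRID] for x1 in GRID])

# Solutions within 10% of the best error are effectively tied; take the
# tied solution closest to their mean, for stability across runs.
tied = np.argwhere(error_map <= 1.1 * error_map.min())
closest = tied[np.argmin(np.linalg.norm(tied - tied.mean(axis=0), axis=1))]
x1, x2 = GRID[closest[0]], GRID[closest[1]]

# Targets differing by 0.5 or less count as a single recovered category.
recovered = [(x1 + x2) / 2] if abs(x1 - x2) <= 0.5 else sorted([x1, x2])
print(recovered)
```

Because the error map is symmetric in (x_1, x_2), the tie-breaking step also resolves the transposed solutions discussed below Figure IV.7.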
We model the development of the language learner as the gradual introduction of
prosodic information into the recovery process. At the first stage, the learner should be
³ We looked for consistency across runs as we are attempting here to model a general process of acquisition / language change in a population. Removing this step may be a way to model individual-speaker variation across a population. Modeling the interaction among speakers during the acquisition phase may provide a way to model the emergence of sound change across a population, an avenue we are currently exploring.
Figure IV.6: Example of distributions generated from two phonological targets differing by 0.5 (x_0 = 0.25 in red, x_0 = 0.75 in blue). There is no effect of prosodic context in this case. The distributions are highly overlapped, and would provide a poor basis for formation of two distinct categories.
acquiring positional variants and disregarding any influence of the prosodic system on shaping phonetic production [144]. This is modeled by having the learner use the simple phonological target potential functions V(x) = x^2/2 - x_n x in generating the candidate distributions. As development progresses, the learner starts to assign some of the variability in production to the prosodic system. We model this by combining each x_n with the two prosodic conditions V_P(x) = w(k_n x - x^2/2 + x^4/4) determined by the parameters k_1 and k_2. The parameter w represents the weight assigned to the prosodic structure in creating the variability seen in production. Successive stages of the acquisition process are modeled with increasing weights of w, such that the last stage assigns w = 1. The absence of prosodic information at the first stage is equivalent to setting w to 0, and each subsequent
stage increases this weight by 0.25, giving 5 total stages. These stages are simply used
for illustration purposes and are not meant to necessarily reflect any particular stage of
development, though an interesting avenue for future research would be to attempt to
model developmental stages in this framework.
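To make these potential functions concrete, the sketch below implements one reading of them (the signs in V_P follow the standard tilted double-well form). The assumption that produced values follow a Boltzmann-style density exp(-V/T), as well as the values of T, x_n, and the k_n, are our own illustrative choices; the model itself generates distributions from the dynamics.

```python
import numpy as np

# Target potential V(x) = x^2/2 - x_n*x and prosodic potential
# V_P(x) = w*(k_n*x - x^2/2 + x^4/4), as read from the text above.
def V_target(x, x_n):
    return x ** 2 / 2 - x_n * x

def V_prosodic(x, k_n, w):
    return w * (k_n * x - x ** 2 / 2 + x ** 4 / 4)

def density(x, x_n, k_n, w, T=0.25):
    """ASSUMPTION: produced values distributed as exp(-V/T), normalized."""
    p = np.exp(-(V_target(x, x_n) + V_prosodic(x, k_n, w)) / T)
    return p / (p.sum() * (x[1] - x[0]))

x = np.linspace(-3, 3, 601)
# One phonological target (x_n = 0.25) pooled over two prosodic contexts
# (k_1 = -0.5, k_2 = 0.5 are illustrative) at prosodic weight w = 0.5:
cand = 0.5 * (density(x, 0.25, -0.5, 0.5) + density(x, 0.25, 0.5, 0.5))
print(round(cand.sum() * (x[1] - x[0]), 3))  # 1.0: a normalized distribution
```

At w = 0 the candidate collapses to the harmonic target distribution alone; raising w lets the double-well prosodic term reshape or split it, which is what the staged weighting above manipulates.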
It is important to note that in this model we assume the learner has access to the
correct values for k_1 and k_2 in the ambient language. This means that sound change is not
due to a lack of knowledge about the prosodic structure of the language; rather, the error
in recovery comes from underweighting the role of the prosodic structure in shaping the
variability in production. We make this assumption for a number of reasons. First, it is
our goal here to model the acquisition of reduction patterns, not necessarily the acquisition
of prosodic structure. The acquisition of prosodic structure is an enormous problem on its
own, and we want to use the simplest possible model at this preliminary stage. Second,
there is evidence that the acquisition of prosodic perception begins at least as early in
development as the acquisition of phonological categories [144, 119, 124, 40]. In agreement
with our proposed model, though, there is a delay before infants begin to incorporate
the prosodic structure into their own productions [44], suggesting that acquisition of the
prosodic structure occurs before it is incorporated into production rather than being used
simultaneously as it is acquired.
Error maps used for target recovery from both uni- and bimodal distributions with varying weight assigned to prosodic context are shown in Figure IV.7. Stage 1, where w = 0, is shown on the left, and each successive plot increases w by 0.25. Full weight (w = 1) is assigned to prosodic factors in the far right map. Below each error map is a comparison of the ambient language distribution and the distribution generated by the recovered parameters. Note that the error maps are symmetrical around the line
Figure IV.7: Recovery of phonological targets from a bimodal distribution (as in English, top) and unimodal distribution (as in Spanish, bottom). Error maps (top row in each language) show mean square error between the ambient distribution and the distribution created at each [x_1, x_2] combination. x_1 is shown on the x-axis and x_2 on the y-axis. The original x_0 used to generate the ambient distribution is shown in the white square in each map (0.25 for the bimodal case, 0 for the unimodal case). Below each error map is a comparison of the ambient distribution (in blue) with the distribution generated by the [x_1, x_2] pair with the least MSE. In the bimodal case, two targets are recovered at w = 0, with a switch to recovery of the one correct target at w = 1. In the unimodal case, one target is recovered in all cases. This target is at the mode of the distribution where w = 0 and moves towards x_0 as w is increased.
passing through the origin with slope equal to unity. This is simply due to transposition of the recovered targets. For example, where w = 0 in the bimodal case, the recovered targets are -1 and 1. The two wells in the error map simply reflect whether -1 is chosen for
x_1 and 1 for x_2, or if 1 is chosen for x_1 and -1 for x_2. Regardless, the solution is the same.
When the recovered targets are the same (as for the bimodal case when w = 1), there is
only one well. The upshot of this symmetry is that the maps are visually intuitive: when
there are two wells off the line of symmetry, the system has recovered two separate targets;
when there is only one well on the line of symmetry, only one target was recovered.
Considering the unimodal case first, the system always recovers a single phonological
target regardless of the weight assigned to the prosodic context. However, the value
of this target changes as a function of w. When w = 0, the recovered x parameter is
equal to the mode of the ambient distribution, which is significantly less (more open/less
constricted) than the original x_0. Note that the distribution based on the recovered x lacks the skewing present in the ambient distribution. As w is increased, the recovered x approaches the initial x_0, such that recovery is perfect when w = 1. In the bimodal
case, recovery of parameters without considering the influence of prosodic context leads
the learner to posit two phonological targets, with x_1 and x_2 at the location of the modes
in the ambient language distribution. In both cases, the overall fit improves substantially
as w is increased as well. In the examples shown, the minimal error (i.e. error between
the ambient distribution and the test distribution that best fits the data) in the unimodal
case decreases (comparing w = 0 to w = 1) from 0.0106 to 0.0002, with a steady decrease as w increases. In the bimodal case, the error decreases from 0.0121 to 0.0004, though this reduction only occurs at w = 1 (i.e. there is no real reduction in error between w = 0 and w = 0.75). It remains to be tested how prevalent this pattern is, but if we imagine
that learners are attempting to decrease the error of the fit by increasing the weight of the
prosodic structure, such a plateau in error might cause a learner to settle on a non-optimal
prosodic weight. This suggestion is necessarily speculative at this point, but provides an
intriguing clue for the open question of why sound change begins in a certain population. In general, the overall agreement between the original and recovered distributions at all weights of w is quite high for both the unimodal and bimodal cases despite the improvement in fit with higher weights assigned to w.
Since our model assigns a specific weight to the prosodic contexts when recovering phonological targets from the phonetic distribution patterns of a language, we can straightforwardly translate a listener-driven theory of sound change into the model. As mentioned, we assume that lenition-based sound changes occur when the language learner misinterprets the effects of prosodic context on phonetic variation as reflecting the underlying phonological control. We can implement this in the model as a weighting of w below 1. One benefit of this approach is that we don’t need to assume that the learner entirely ignores the influence of the prosodic system on produced variability, just that they assign slightly incorrect weight to prosody.
While both the unimodal and bimodal distributions lead to accurate recovery of a
single target as w = 1, they lead to different behavior at lower weights. For example,
at w = 0:5, a single target is still recovered in the unimodal case, but this target is
substantially less occluded than the original x_0. This means that sub-optimally weighting the prosodic context when faced with a unimodal distribution would lead to a lenition of that phonological category in all contexts, or an unconditioned sound change. On the other hand, assigning w the same weight when presented with a bimodal distribution will lead
to the supposition of two separate targets, one with a much less constricted target. This
type of recovery would lead to a phonological split, or prosodically-conditioned sound
change, as the two reflexes of what was a single phoneme now differ in their phonological
specification based on prosodic position.
We leave open for now the question of what may cause a group of learners to sub-optimally weight prosodic context in their recovery of phonological targets from production variability. One possibility is that individuals may differ in how much error they tolerate
between the recovered and ambient distributions. If we assume a gradual increase in the
weight assigned to prosodic context through development, some speakers may stop when
the fit of the recovered distribution is "close enough," but not perfect, to the ambient
distribution. These speakers’ productions may not differ substantially from the ambient
distribution or from those of other speakers, but slight changes in the distribution might
accumulate over time, eventually leading to the best fit having two separate phonological
categories. We leave development of these ideas for future work, here having established
the ability of our model to show how such changes might at least be possible.
IV.4 Conclusions
This is, to our knowledge, the first explicit model of how synchronic reduction patterns
may lead to diachronic lenition. We are aware of two previous computational models of
reduction. Pierrehumbert [143] presents an exemplar-based model, where phonological
categories are represented as distributions of previously perceived productions along at
least one phonetic parameter dimension (e.g. constriction degree). Any given production
is selected probabilistically from this distribution. Each production (and perceived pro-
duction) in turn shapes the overall distribution, increasing the probability that the same
value will be selected in the future. In order to model consonant reduction in this frame-
work, the model also includes a systematic bias towards reduced forms, such that each
production has a value slightly more reduced than the value selected from the probability distribution. As the system iterates production, this gives distributions
that are more and more reduced as more tokens are produced. The second model applies
the same fundamental idea, but in the dynamical domain [90]. In the Dynamical Field
Theory model, each phonological target is defined by a continuous activation field over
some phonetic parameter. The activation function is shaped by perceived input, which
acts to increase the weight of the field in the vicinity of the perceived value. In the lenition
model, each production is selected as the highest value of the activation field, plus noise (to
drive probabilistic behavior), plus a systematic bias towards reduction as in the exemplar
model. Iterating this process changes the shape of the activation field representing the
phonological target, moving its peak towards reduced forms and increasing its dispersion.
Though the formulation of phonological categories is significantly different in this model
than in the exemplar model (and, in fact, is broadly similar to the current proposal), the
two models give effectively the same result: namely, that systematic bias in production
leads to generalized reduction of all productions as a function of frequency.
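A minimal toy loop makes the shared mechanism of these two models concrete. This is our own illustration, not either published implementation; the Gaussian seed cloud, bias size, and iteration count are arbitrary choices.

```python
import numpy as np

# Toy exemplar-with-bias loop: each production is sampled from the stored
# cloud, shifted slightly toward reduction, and stored back. The category
# mean drifts toward reduced values as tokens accumulate.
rng = np.random.default_rng(1)
cloud = list(rng.normal(1.0, 0.1, 100))  # initial constriction-degree exemplars
BIAS = 0.05                              # systematic bias toward reduction

start_mean = np.mean(cloud)
for _ in range(5000):
    token = rng.choice(cloud) - BIAS     # produce: sample an exemplar, reduce it
    cloud.append(token)                  # perceive: store the production

# The category has drifted toward reduced (smaller) constriction values:
print(np.mean(cloud[-100:]) < start_mean)  # True
```

Note that nothing in this loop halts the drift, which is exactly the point made below: the logical endpoint of a pure bias mechanism is reduction of all variants, with no way to derive conditioned outcomes.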
Though both of these models show development of reduction through time, they are
referencing not diachronic time scales but multiple productions (either perceived or pro-
duced) by a single speaker or in a speech community. In this way, these models are able
to reproduce the frequency effects of consonant reduction, such that more frequent pro-
ductions are more often lenited [25]. However, neither model includes the influence of
prosodic structure, and as such neither is able to model the qualitatively different types of
prosodically-conditioned variation seen across languages (see Section IV.2, below). More-
over, neither model shows how diachronic sound change may occur—the logical endpoint
of both models after enough iterations is reduction of all variants to no constriction at
all. This may be similar to a series of unconditioned sound changes, but neither model in-
cludes a factor that shows how reduction may occur or not occur in a particular language,
nor how conditioned variability may arise. These models are very successful at model-
ing how frequency and motor efficiency may drive reduction of established phonological
categories, which is certainly a crucial aspect of synchronic reduction (and one which, it
must be said, we do not include in our model), yet they cannot model attested patterns
of sound change. Integration of frequency or phonetic bias effects into the prosodic model
of reduction proposed here may provide a unified explanation of both types of reduction.
In agreement with the current proposal, Cole and Hualde [38] proposed a crucial role
for prosodic structure in sound change, though the way prosodic structure is hypothesized
to influence sound change in their proposal differs from that put forward here. They
propose that some sound changes begin first in prosodically restricted locations, where
the synchronic reduction patterns may be well-motivated phonetically, and then spread by
analogy to other prosodic positions. They give the example of word-final devoicing, which
is common in many languages at the end of an utterance due to phonetic conditioning
factors. They propose that word-final devoicing begins as utterance-final devoicing, then
is extended by analogy to phrase-final then word-final positions. Similarly, Hualde [77]
proposes that word-initial reduction, as discussed here, begins as unconditioned reduction
within the phonological phrase, before being restricted to only word-medial position by later analogical/phonological processes. While both of these proposals, in agreement with
the current chapter, argue for an important role of the prosodic system in shaping sound
change, they differ from the current proposal in that they posit extension of changes from
one prosodic position to another, or reduction from more prosodic positions to less. The
current proposal argues, instead, that the prosodically-conditioned variability in speech
can directly shape the course of sound change. Of course, all three proposals may be true
for different cases; the advantage of the current proposal is that we have established at
least a basic model of the proposed process, which may be used to test different hypotheses
in future work.
Our aim here was to take a first step towards explaining how prosodically-conditioned
synchronic variation might lead to diachronic sound change, including both prosodically-
conditioned and unconditioned sound change. We have first shown how prosodic context
can affect reduction in different ways across languages, sometimes resulting in a unimodal
distribution of produced constriction degree, and sometimes resulting in a bimodal distri-
bution. Other patterns may be possible, but at least these two are attested in Spanish and
English, respectively. These differences in constriction degree are mediated by durational
differences of segments produced in differing prosodic contexts; durational distributions
were shown to have the same modal patterns as constriction degree for both English and
Spanish. We have proposed a simple dynamical systems model that expresses phono-
logical categories, the influence of prosodic context, and production variability along the
same dimension. This simple model is adequate to capture the two distribution types
of prosodically-conditioned variability seen in spoken languages, and can generate both
types of reduction from a single "virtual" phonological target with a constriction degree
target beyond the point of articulator contact, consistent with suggested targets for stops
in various languages.
We have also proposed a model of how speakers may recover the underlying phono-
logical targets from these produced patterns of variability. Consistent with developmental
evidence, we suggest that learners first posit categories based only on the production vari-
ability, forming categories centered at the modes of distribution. Later, learners begin to
appreciate the role of the prosodic system in shaping production variability and are able to
connect the positional variants as arising from a single phonological category. Increasing
the weight given to the prosodic context leads to a more accurate recovery of the correct
phonological target in the unimodal case and a shift from recovering two targets to a single
target in the bimodal case.
We further suggest that insufficient weighting of the prosodic structure in recovering
phonological targets may underlie diachronic sound change. The eventual outcome of that
diachronic lenition is based explicitly on the synchronic variability patterns. Underweight-
ing prosodic structure in unimodal distributions would lead to unconditioned lenition of a
single phonological target, while the same underweighting in a bimodal distribution would
lead to phonological split and a prosodically-conditioned lenition pattern.
Chapter V
Discussion and conclusions
This dissertation has provided some evidence to support the hypothesis that consonant le-
nition results from reduced durations and subsequently smaller articulatory displacement.
Such shorter durations are particularly prevalent in prosodically weak position, though
what counts as prosodically weak varies from language to language. In English, reduction
is most often found word-medially, while Spanish shows lenition phrase-medially and does
not seem to be sensitive to word boundaries. It was argued that diachronic lenition-based
sound changes arise from language learners’ misinterpretation of the cause of synchronic lenition patterns—interpreting prosodically-conditioned variability as part of the underlying phonological control structure. This can lead to the emergence of diachronically
reduced phonological forms either in particular prosodic positions (such as word-internally) or across the entire phonological system.
There is a clear comparison between the work presented in Chapters II and III and the
experiments in [100]. That study also examined coronal oral and nasal stops (among other
segments) in English and Spanish using speech acoustics and electropalatographic data.
The aim of that study was to test three competing hypotheses of consonant reduction:
lenition as a decrease in sonority, lenition as a reduction in articulatory effort, and lenition
as reduction in spatiotemporal gestural magnitude. The segments under consideration
were compared in a symmetric [oCo] context under crossed word-position (word-initial
or word-medial) and stress (pre-stress or non-pre-stress) conditions. Results showed that
acoustic durations were reduced word-medially and in non-pre-stress conditions, and that
linguopalatal contact was reduced in the same positions for English /t/ and /d/ but not
English /n/ nor Spanish /d/ (results for Spanish /t/ and /n/ were not discussed). Based
on these results, Lavoie suggests that the articulatory reduction hypothesis best fits the
data given the spatial and temporal reduction seen for English /t/ and /d/ but that,
because there is temporal reduction but no contact difference for English /n/ and Spanish
/d/, some other mechanism must be operating as well.
These conclusions are suspect, however, because of some prominent methodological
issues in that study. First of all, while Lavoie’s aim was to examine the hypothesis that
durational reduction causes spatial reduction, she looked only at group mean differences
across the different experimental conditions (stress and word boundary). She did not, in
fact, examine the relationship between temporal extent and linguopalatal contact. Espe-
cially given that Spanish does not show effects of word-initial position, this means that
any conclusions about the relationship between space and time of speech movements are
suspect. Additionally, the articulatory measure used, electropalatography, does not index
movement magnitude itself but rather a byproduct of displacement. This is especially
true given that the electropalatography measure used was contact across the entire palate,
not restricted to any particular area of interest—this necessarily combines tongue tip and
tongue body contact, which could be problematic. Particularly for Spanish /d/, such a
measure may miss patterns of spatial reduction given the lack of any linguopalatal contact
in many tokens. Indeed, the results from Chapter III show that spatial reduction is dependent on movement duration in a non-categorical way for coronal stops in both
languages. One difference between this previous study and the current work is in /n/.
Lavoie’s study finds reduction temporally but not spatially, while the current work finds
both temporal and spatial reduction. This may be due to Lavoie’s use of electropalatog-
raphy or to a difference in vowel context–Lavoie used the mid vowel /o/, while the current
study used the low vowels /a/ and /O/. A more detailed analysis might have revealed
reduction in just the coronal region. Even in Lavoie’s data, however, 2/4 subjects show
reduction in contact for /n/ in the locations where /t/ and /d/ show reduction, and a
third who does not show reduced contact shows very little contact even in positions where
a full production is expected (approximately 30% compared to 40-70% for other subjects’ /n/ and the same subject’s /t/ and /d/). A separate study found significant variation of
linguopalatal contact for /n/ according to prosodic position, with the most contact at the
strongest prosodic boundary (utterance initially), less contact word initially, and the least
amount of contact word-medially, though the difference between word-initial and -medial was not robust for all subjects [52], even when using the same /o/ vowel as [100]. Given
that the current work measured spatial movement in a more direct manner and that it
found evidence for spatial reduction in the movements of multiple articulators used to
produce /n/ as well as the fact that the same pattern was found for the EPG data in [52],
this argues that /n/ does indeed show spatiotemporal reduction similar to /t/ and /d/ in
English, in agreement with the current proposal.
V.1 Comparisons with other theories
The current proposal differs radically from the idea that lenition is caused by a planned
reduction of articulatory effort on a segmental level, such as the approach suggested by
[88, 89]. Kirchner suggests that constraints mitigating against vocal effort (Lazy) are
incorporated into the phonology of a language and that these constraints, competing with
typical markedness and faithfulness in an Optimality Theoretic framework, drive substi-
tution of one allophone by another, reduced, form. This proposal, as it relies on symbolic
substitution, cannot account for the sub-allophonic covariation between duration and con-
striction degree that has been shown to characterize reduction both in the current disser-
tation and previous work (e.g., [41, 100, 102]). The theoretical mechanism is also very
complex, as different evaluations must be made for different speech rates (and speech rate
must be quantized into "slow," "normal," and "fast," not reflecting the actual gradient
nature of rate) and different levels of effort (which must again be arbitrarily segregated
into discrete levels). Additionally, a huge number of constraints must be postulated, as each possible reduced form (and perhaps productions that fall between IPA categories) must have its own Lazy constraint. Proposals that explain reduction from feature change, feature spreading, or feature underspecification are similarly unable to explain the continuum
of productions found in actual speech.
Moreover, Kirchner’s proposal makes the prediction that reduced forms will show reduced gestural stiffness—he directly equates effort with force (F(t) = m dv/dt). Using a simple mass-spring system as a model of speech production, as Kirchner does [88] and which has been shown to accurately model many aspects of speech articulator movement (e.g., [167, 168, 85, 154, 127, 135, 7]), the force is equivalent to the stiffness of the spring (F(x) = kx), which can be estimated from speech movements by measuring peak velocity over displacement (though see [53] for possible issues with this approach). Results
from Chapter III showed that there was no difference in stiffness between word-medial
productions (where flapping is expected) and word-initial productions (where stops are
expected) for English. Similarly, no difference was found between full stops (for /t/) and
approximant productions (of /d/) in Spanish in either Chapter II or III. These results
argue against a planned reduction in articulatory effort (at least via the mechanism pro-
posed by Kirchner), because both full and reduced forms show the same gestural stiffness,
as predicted by the proposal in this work.
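For concreteness, the peak-velocity/displacement stiffness index can be computed from a movement trace as sketched below. The simulated trajectory is a critically damped second-order gesture; its parameters (omega, movement amplitude, sampling rate) are hypothetical, chosen only to show that the index recovers the underlying stiffness parameter up to a constant.

```python
import numpy as np

# Peak-velocity/displacement stiffness index for a simulated gesture.
# A critically damped second-order movement toward a target is
#   pos(t) = d * (1 - (1 + omega*t) * exp(-omega*t)),
# whose peak velocity is d*omega/e, so peak_velocity / displacement
# recovers the stiffness parameter omega up to the constant 1/e.
def stiffness_index(position, dt):
    velocity = np.gradient(position, dt)
    return np.abs(velocity).max() / (position.max() - position.min())

t = np.arange(0, 0.5, 0.001)     # 500 ms sampled at 1 kHz (hypothetical)
omega, d = 40.0, 0.01            # stiffness (1/s) and 1 cm movement amplitude
pos = d * (1 - (1 + omega * t) * np.exp(-omega * t))
print(round(stiffness_index(pos, 0.001) / omega, 3))  # 0.368, i.e. 1/e
```

Because the index is invariant to movement amplitude d, it allows comparing stiffness across full and reduced productions, which is how the Chapter II and III comparisons are interpreted here.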
The proposal in the current paper is in line with the analyses of lenition proposed
by Lindblom in his theory of hyper- and hypospeech, H & H Theory [102, 104]. That
is, that everything else being equal, shortened durations lead to articulatory and, hence,
acoustic undershoot. Of course, as Lindblom points out, everything is not "always equal"
in speech—speakers have the ability to alter the natural relationship between the duration
and spatial extent of speech movements. For example, when speakers produced the word
"Dad" in three separate conditions (soft, normal, and loud), they showed a different
relationship between duration and spatial extent of jaw movements for each condition
[103]. Similarly, when subjects produced a series of CVC words at both their self-selected
rate and loudness and when instructed to speak as clearly as possible, they showed two
separate relationships between vowel duration and F2 at the vowel midpoint [104]. There
is obviously a parameter (which Lindblom refers to as "vocal effort") that speakers are
manipulating independently of the duration of their speech movements; they can increase
the velocity of their movements to reach the same target in shorter amount of time.
Importantly, though, he showed a consistent relationship between these two factors within
each condition. Crucially, for any given level of effort, a consistent duration-magnitude
relationship exists (as well as a consistent gestural stiffness, as found in [167, 85, 9]). The
current proposal also shares with both [103] and Articulatory Phonology (e.g., [18, 19, 21,
60]) the idea that speech gestures have an invariant target and their realization in different
contexts is a function of that target and other contextual factors (e.g. initial position of
the articulators, gesture duration, stiffness modulation, etc).
While H & H theory proposes that vocal effort is adjusted to control hypo- or hyper-
articulation, other possibilities exist. One suggestion is that different registers of speech
function similarly to the different gaits of quadrupedal motion [145]. Quadrupeds, such
as horses, show multiple stable gait patterns, each of which is physiologically optimal
at different speeds. These gaits can be maintained at both higher and lower than opti-
mal speeds, but at a higher metabolic cost. Importantly, each gait is similarly efficient
at its optimal speed. The analogy to speech is that different registers along the hypo-
to-hyperspeech continuum are all equally optimal, though they differ in the patterns of
movement found—for example, in the degree of temporally-conditioned reduction. The
crucial idea relevant to the current work is not what mechanism is used to control regis-
ter variation, but that—within any given register—there is a lawful relationship between
movement duration and displacement.
The existence of the regular relationship between movement duration and magnitude
is the main point of H & H theory relevant to the current proposal. It may be possible
that the manipulation of prosodic structure is, in effect, a local manipulation of effort or
gestural stiffness (e.g., [7, 28]). However, there is strong evidence that the change in ges-
tural stiffness and duration at prosodic boundaries is not a direct manipulation of stiffness
but rather a temporal warping (phrase-finally, a slowing) of the time course of gestural
activation, though this slowing does effectively—though indirectly—decrease gestural stiffness, changing the slope of the relationship between peak velocity and displacement for
movements near the boundary [32]. Similarly, it has been shown that changes in speech
rate, in the absence of a concomitant increase in the velocity of speech movements, do not affect the relationship between duration and magnitude of speech movements [102, 126].
In fact, we can test whether spirantization in Spanish and flapping in English result
from a reduction of effort if we take gestural stiffness as an indicator of effort. In English,
flaps do not differ significantly from stops in the velocity of the tongue tip [54]. In agree-
ment with this, in Chapter III, no difference in stiffness was found between word-initial and
word-medial productions. Similarly, in Spanish, spirantized productions of /b/ do not dif-
fer from /p/ in either time to peak velocity or peak velocity/displacement–both measures
of gestural stiffness, as shown in Chapter II. The only case where we do see changes in
gestural stiffness is near a phrase boundary, where stiffness decreases, as shown in Chap-
ters II and III as well as previous work [9, 31, 7, 32]. This means that reduced forms
phrase- or word-medially have a higher stiffness than full forms produced near a prosodic
phrase boundary. Thus, reduction cannot be due to a local decrease in articulatory effort,
at least as indexed by peak velocity or gestural stiffness.
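The stiffness index used here can be made concrete. In a critically damped second-order model, peak velocity occurs at t = 1/ω, and the ratio of peak velocity to total displacement equals ω/e, so the ratio tracks gestural stiffness independently of movement amplitude. A sketch (the parameter values are illustrative, not fitted to the data in Chapters II and III):

```python
import math

def peak_velocity_displacement_ratio(omega, amplitude):
    """For a critically damped gesture from rest toward a target,
    v(t) = amplitude * omega**2 * t * exp(-omega*t), which peaks at
    t = 1/omega.  Dividing v_peak by the total displacement (amplitude)
    gives omega/e: an amplitude-free index of stiffness."""
    v_peak = amplitude * omega ** 2 * (1 / omega) * math.exp(-1)
    return v_peak / amplitude

# The ratio changes with stiffness (omega) but not with amplitude.
for omega in (10.0, 20.0):
    for amplitude in (0.5, 1.0, 2.0):
        print(omega, amplitude, round(peak_velocity_displacement_ratio(omega, amplitude), 3))
```

This amplitude-independence is what licenses comparing the ratio across full and reduced productions.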
Cross-linguistically, reduction tends to occur in prosodically weak positions—that is,
word-medially, word-finally, and in a syllable coda (these and subsequent observations
are based on data and discussion in [88, 89]). Reduction generally does not occur
word- or phrase-initially, or in the onset of a stressed syllable. The production of reduced
forms phrase- and word-medially is a prediction of the current hypothesis. Gestures in
these positions tend to be produced with short durations, while the temporal slowing
at prosodic boundaries produces longer durations and less reduction. The case of stress
may be slightly different, however. While there is an expansion of duration in stressed
syllables, this only accounts for part of the increase in gestural magnitude of gestures in
this position [41]. It may be that the implementation of stress is a local increase
in vocal effort (perhaps in addition to a local change in speech rate)
along the lines proposed by [104]. This is consistent with the suggestion that "two different
dynamic mechanisms might be associated with two different prosodic phenomena" ([18], p.
103). It should be noted that reduction in syllable codas and word-final position might be
qualitatively different from reduction in other positions [100]. For example, phonologically
voiced consonants in these positions are often produced without voicing, as in German.
This is precisely the opposite pattern found in other positions, where voiceless consonants
often become voiced.
As proposed by H & H theory, the key parameters mediating the magnitude of speech
gestures are gestural duration and articulatory effort. One additional fact also bears
consideration: consonantal reduction is also much more likely to occur in intervocalic
position than when adjacent to another consonant, or when surrounded by low compared
to high vowels [88, 89]. The explanation of this is simple if a dynamical perspective is
taken. A gesture of constant duration and magnitude will end farther from its target
the farther away its initial position is from that target. Consonants produced surrounded
by vowels, where the lower lip, tongue, and jaw are relatively low, will be more reduced
than those surrounded by other consonants, where the articulators are relatively high.
This analysis is supported by data from spirantization in Spanish and flapping in English.
For Spanish, approximant productions of /b, d, g/ have the greatest aperture after a low
vowel, somewhat less after a mid vowel, even less after a high vowel, and the least after
another consonant [78, 80, 34, 158], which was accurately modeled in Chapter II with
a single phonological gestural target. In English, flaps after high vowels show greater
linguopalatal contact than flaps after low vowels, and flaps after a rhotacized vowel (/3~/)
show the most contact [156]. These differences are not based on duration or changes in
movement velocity but only the initial position of the articulators, just as in the case for
Spanish contextual variation.
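The same point can be checked with the toy second-order model introduced above: holding stiffness and activation duration constant, the residual distance from the target at the end of the gesture is simply proportional to the starting distance. (A hedged sketch; the one-dimensional constriction coordinate, ω, and duration are arbitrary illustrative values, not measurements.)

```python
import math

def residual_distance(start, target=1.0, omega=20.0, duration=0.05):
    """Distance still separating a critically damped gesture from its target
    when activation is cut off at `duration`: the undershoot scales linearly
    with the initial distance |start - target|."""
    return abs((start - target) * (1 + omega * duration) * math.exp(-omega * duration))

# A consonant gesture launched from a low-vowel-like (far) position is left
# farther from its constriction target than one launched from a high-vowel-like
# (near) position, with no change in target, stiffness, or duration.
for start in (0.0, 0.4, 0.8):
    print(f"start={start:.1f} -> residual {residual_distance(start):.3f}")
```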
Consonant lenition has also been seen as a reduction in gestural magnitude within the
theory of Articulatory Phonology [18]. That paper suggests such reductions in magnitude
often, though not always, occur in fast and casual speech. While less explicit than H & H
theory or the current proposal as to the origins of this reduction in gestural magnitude,
the main idea is similar. What Articulatory Phonology does propose is that variation
along the same phonetic dimension should be treated as a unified phenomenon, and not
divided into some which are treated as allophonic and others which are treated as pho-
netic [19]. Not only is it nearly impossible to distinguish allophony from phonetic variation
when examining actual production variability, but the use of allophones also obscures
underlying phonological control structure. This proposal forms the basis for the initial
hypothesis that reduction is best viewed as prosodically-conditioned spatiotemporal vari-
ation of a single invariant phonological target and is supported by the results from the
studies in this work.
What the current work adds to these previous views is the acknowledgment that con-
sonant reduction does not only spontaneously occur due to changes in speech rate or
speaking style, which are the conditions discussed in both H & H theory and Articulatory
Phonological proposals [103, 104, 18]. This work has demonstrated that regular patterns
of consonant reduction also derive from temporally-conditioned spatial reduction. The
crucial factor in these reductions is the prosodic structure of speech, which causes short
durations in prosodically weak positions [100, 164, 165]. This can lead to consonant re-
duction in these positions, though the patterns of reduction across languages will differ
based on the particular prosodic structure of the language, the gestural context, as well
as the details of consonant articulation.
REFERENCES
[1] R. H. Baayen, D. J. Davidson, and D. M. Bates. Mixed-effects modeling with crossed
random effects for subjects and items. Journal of Memory and Language, 59(4):390–
412, 2008.
[2] T. A. Baker. A biomechanical model of the human tongue for understanding speech
production and other lingual behaviors. PhD thesis, University of Arizona, Tucson,
AZ, 2008.
[3] E. Baković. Strong onsets and Spanish fortition. MIT Working Papers in Linguistics,
Proceedings of SCIL 6, pages 21–39, 1995.
[4] J. A. Barlow. The stop/spirant alternation in Spanish: converging evidence for a
fortition account. Southwest Journal of Linguistics, 22:51–86, 2003.
[5] D. Bates. Fitting linear mixed models in R. R News, 5:27–30, 2005.
[6] L. Bauer. Lenition revisited. Journal of Linguistics, 44(03):605–624, 2008.
[7] M. Beckman, J. Edwards, and J. Fletcher. Prosodic structure and tempo in a sonor-
ity model of articulatory dynamics. In G. Docherty and R. D. Ladd, editors, Papers
in Laboratory Phonology II: Gesture, Segment, Prosody, pages 68–86. University
Press, Cambridge, 1992.
[8] M. E. Beckman, K. De Jong, S.-A. Jun, and S.-H. Lee. The interaction of coarticu-
lation and prosody in sound change. Language and Speech, 35(1-2):45–58, 1992.
[9] M. E. Beckman and J. Edwards. Intonational categories and the articulatory con-
trol of duration. In Y. Tohkura, E. Vatikiotis-Bateson, and Y. Sagisaka, editors,
Speech perception, production, and linguistic structure, pages 359–375. Ohmsha,
Ltd., Tokyo, 1992.
[10] P. S. Beddor, R. A. Krakow, and L. M. Goldstein. Perceptual constraints and
phonological change: a study of nasal vowel height. Phonology Yearbook, 3:197–217,
1986.
[11] S. Benus, A. Gafos, and L. Goldstein. Phonetics and phonology of transparent vowels
in Hungarian. In Proceedings of the Berkeley Linguistics Society, 2003.
[12] J. Blevins. Evolutionary phonology : the emergence of sound patterns. Cambridge
University Press, Cambridge, 2004.
[13] J. Blevins and A. Garrett. The origins of consonant-vowel metathesis. Language,
pages 508–556, 1998.
[14] T. G. Bradley. Morphological derived-environment effects in gestural coordination:
A case study of Norwegian clusters. Lingua, 117(6):950–985, 2007.
[15] N. Braunschweiler. Integrated cues of voicing and vowel length in German: A pro-
duction study. Language and Speech, 40(4):353–376, 1997.
[16] E. Bresch, J. Nielsen, K. Nayak, and S. Narayanan. Synchronized and noise-robust
audio recordings during realtime magnetic resonance imaging scans (l). The Journal
of the Acoustical Society of America, 120(4):1791, 2006.
[17] C. P. Browman and L. Goldstein. Articulatory gestures as phonological units.
Phonology, 6(2):201–251, 1989.
[18] C. P. Browman and L. Goldstein. Tiers in articulatory phonology, with some im-
plications for casual speech. In J. Kingston and M. E. Beckman, editors, Papers in
Laboratory Phonology I: Between the grammar and physics of speech, pages 341–376.
Cambridge University Press, Cambridge, 1990.
[19] C. P. Browman and L. Goldstein. Articulatory phonology: an overview. Phonetica,
49(3-4):155–80, 1992.
[20] C. P. Browman and L. Goldstein. ‘Targetless’ schwa: an articulatory analysis. In
G. J. Docherty and D. R. Ladd, editors, Papers in laboratory phonology II: Gesture,
segment, prosody, pages 26–56. Cambridge University Press, Cambridge, 1992.
[21] C. P. Browman and L. Goldstein. Dynamics and articulatory phonology. In R. Port
and T. van Gelder, editors, Mind as motion: dynamics, behavior, and cognition,
pages 175–194. MIT Press, Boston, 1995.
[22] C. P. Browman and L. Goldstein. Gestural syllable position effects in American
English. In F. Bell-Berti and L. J. Raphael, editors, Studies in Speech Production: A
Festschrift for Katherine Safford Harris, pages 19–34. American Institute of Physics,
Woodbury, NY, 1995.
[23] C. P. Browman and L. Goldstein. Competing constraints on intergestural coordina-
tion and self-organization of phonological structures. Bulletin de la Communication
Parlée, 5:25–34, 2000.
[24] A. Butcher. ‘Fortis/lenis’ revisited one more time: the aerodynamics of some oral
stop contrasts in three continents. Clinical Linguistics & Phonetics, 18(6-8):547–57,
2004.
[25] J. Bybee. Word frequency and context of use in the lexical diffusion of phonetically
conditioned sound change. Language Variation and Change, 14:261–290, 2002.
[26] D. Byrd. Palatogram reading as a phonetic skill: a short tutorial. Journal of the
International Phonetic Association, 24(1):21–34, 1994.
[27] D. Byrd. Articulatory vowel lengthening and coordination at phrasal junctures.
Phonetica, 57:3–16, 2000.
[28] D. Byrd, A. Kaun, S. Narayanan, and E. Saltzman. Phrasal signatures in artic-
ulation. In M. B. Broe and J. B. Pierrehumbert, editors, Papers in Laboratory
Phonology V, pages 70–87. Cambridge University Press, Cambridge, 2000.
[29] D. Byrd, J. Krivokapić, and S. Lee. How far, how long: On the temporal
scope of prosodic boundary effects. Journal of the Acoustical Society of America,
120(3):1589–1599, 2006.
[30] D. Byrd, S. Lee, and R. Campos-Astorkiza. Phrase boundary effects on the temporal
kinematics of sequential tongue tip consonants. Journal of the Acoustical Society of
America, 123(6):4456–4465, 2008.
[31] D. Byrd and E. Saltzman. Intragestural dynamics of multiple prosodic boundaries.
Journal of Phonetics, 26:173–199, 1998.
[32] D. Byrd and E. Saltzman. The elastic phrase: modeling the dynamics of boundary-
adjacent lengthening. Journal of Phonetics, 31:149–180, 2003.
[33] P. Carrasco and J. I. Hualde. Spanish voiced allophony reconsidered. Paper presented
at the Phonetics and Phonology in Iberia Conference, Las Palmas de Gran Canaria,
Spain, 2009.
[34] P. Carrasco, J. I. Hualde, and M. Simonet. Dialectal differences in Spanish voiced
obstruent allophony: Costa Rican versus Iberian Spanish. Phonetica, 69(3):149–79,
2012.
[35] T. Cho. Manifestation of prosodic structure in articulatory variation: Evidence
from lip kinematics in English. In L. Goldstein, D. Whalen, and C. Best, editors,
Laboratory Phonology 8: Varieties of Phonological Competence, pages 1–34. Mouton
de Gruyter, Berlin, 2006.
[36] T. Cho and P. A. Keating. Articulatory and acoustic studies on domain-initial
strengthening in Korean. Journal of Phonetics, 29(2):155–190, 2001.
[37] T. Cho and M. J. McQueen. Prosodic influences on consonant production in Dutch:
Effects of prosodic boundaries, phrasal accent and lexical stress. Journal of Phonet-
ics, 33:121–157, 2005.
[38] J. Cole and J. I. Hualde. Prosodic structure in sound change. In S.-F. F. Chen
and B. Slade, editors, Festschrift for Hans Henrich Hock, pages 28–45. Beech Stave
Press, Ann Arbor, MI, 2013.
[39] J. Cole, J. I. Hualde, and K. Iskarous. Effects of prosodic and segmental context
on /g/-lenition in Spanish. In O. Fujimura, B. D. Joseph, and B. Palek, editors,
Proceedings of the Fourth International Linguistics and Phonetics Conference, pages
575–589, 1999.
[40] S. L. Curtin. Representational richness in phonological development. PhD thesis,
University of Southern California, 2002.
[41] K. de Jong. The supraglottal articulation of prominence in English: Linguistic
stress as localized hyperarticulation. Journal of the Acoustical Society of America,
97:491–504, 1995.
[42] K. de Jong. Stress-related variation in the articulation of coda alveolar stops: flap-
ping revisited. Journal of Phonetics, 26:283–310, 1998.
[43] K. de Jong, M. Beckman, and J. Edwards. The interplay between prosodic structure
and coarticulation. Language and Speech, 36:197–212, 1993.
[44] R. A. DePaolis, M. M. Vihman, and S. Kunnari. Prosody in production at the onset
of word use: A cross-linguistic study. Journal of Phonetics, 36(2):406–422, 2008.
[45] D. Eddington. What are the contextual phonetic variants of /B, D, G/ in colloquial
Spanish? Probus, 23:1–19, 2011.
[46] J. Edwards, M. E. Beckman, and J. Fletcher. The articulatory kinematics of final
lengthening. Journal of the Acoustical Society of America, 89(1):369–382, 1991.
[47] J. L. Elman and D. Zipser. Learning the hidden structure of speech. Journal of the
Acoustical Society of America, 83(4):1615–26, 1988.
[48] O. Engstrand. Articulatory correlates of stress and speaking rate in Swedish VCV
utterances. Journal of the Acoustical Society of America, 83(5):1863–75, 1988.
[49] A. Esposito and M. G. Di Benedetto. Acoustical and perceptual study of gemination
in Italian stops. Journal of the Acoustical Society of America, 106(4):2051–62, 1999.
[50] J. E. Flege. Effects of speaking rate on tongue position and velocity of movement in
vowel production. The Journal of the Acoustical Society of America, 84(3):901–916,
1988.
[51] J. E. Flege and R. Port. Cross-language phonetic interference: Arabic to English.
Language and Speech, 24(2):125–146, 1981.
[52] C. Fougeron and P. A. Keating. Articulatory strengthening at edges of prosodic
domains. Journal of the Acoustical Society of America, 101(6):3728–40, 1997.
[53] S. Fuchs, P. Perrier, and M. Hartinger. A critical evaluation of gestural stiffness
estimations in speech production based on a linear second-order model. Journal of
Speech, Language, and Hearing Research, 54(4):1067–76, 2011.
[54] T. Fukaya and D. Byrd. An articulatory examination of word-final flapping at phrase
edges and interiors. Journal of the International Phonetic Association, 35(01):45–58,
2005.
[55] A. I. Gafos. Dynamics in grammar: comment on Ladd and Ernestus & Baayen.
Laboratory Phonology 8: Varieties of Phonological Competence, pages 51–79, 2006.
[56] A. I. Gafos and S. Benus. Dynamics of phonological cognition. Cognitive Science:
A Multidisciplinary Journal, 30(5):1–39, 2006.
[57] J. Goldsmith. Subsegmentals in Spanish phonology: an autosegmental approach. In
W. Cressey and D. J. Napoli, editors, Linguistic Symposium on Romance Languages
9, pages 1–16. Georgetown University Press, Washington DC, 1981.
[58] L. Goldstein and C. Browman. Representation of voicing contrasts using articulatory
gestures. Journal of Phonetics, 14:339–342, 1986.
[59] L. Goldstein, D. Byrd, and E. Saltzman. The role of vocal tract gestural action
units in understanding the evolution of phonology. In M. A. Arbib, editor, Action
to Language Via the Mirror Neuron System, pages 215–248. Cambridge University
Press, 2006.
[60] L. Goldstein and C. A. Fowler. Articulatory phonology: A phonology for public
language use. In N. O. Schiller and A. Meyer, editors, Phonetics and Phonology
in Language Comprehension and Production: Differences and Similarities, pages
159–207. Mouton de Gruyter, Berlin & New York, 2003.
[61] L. Goldstein, H. Nam, E. Saltzman, and I. Chitoran. Coupled oscillator planning
model of speech timing and syllable structure. In G. Fant, H. Fujisaki, and J. Shen,
editors, Frontiers in phonetics and speech science, pages 239–249. The Commercial
Press, Beijing, 2009.
[62] C. González. The effect of stress and foot structure on consonantal processes. PhD
thesis, University of Southern California, Los Angeles, CA, 2003.
[63] F. H. Guenther and M. N. Gjaja. The perceptual magnet effect as an emergent
property of neural map formation. Journal of the Acoustical Society of America,
100(2):1111–21, 1996.
136
[64] C. Hagedorn, M. Proctor, and L. Goldstein. Automatic analysis of singleton and
geminate consonant articulation using real-time magnetic resonance imaging. In
Interspeech 2011, pages 409–412, 2011.
[65] N. E. Hall. Gestures and segments: Vowel intrusion as overlap. PhD thesis, Univer-
sity of Massachusetts, Amherst, MA, 2003.
[66] H. M. Hanson and K. N. Stevens. A quasiarticulatory approach to controlling acous-
tic source parameters in a Klatt-type formant synthesizer using HLsyn. Journal of
the Acoustical Society of America, 112:1158–1182, 2002.
[67] J. W. Harris. Spanish Phonology. MIT Press, Cambridge, 1969.
[68] B. Hayes. Metrical stress theory: Principles and case studies. University of Chicago
Press, Chicago, IL, 1995.
[69] J. Herrera. Estudio acústico de /p, t, c, k/ y /b, d, y, g/ en Gran Canaria. In
M. Almeida and J. Dorta, editors, Contribuciones al estudio de la lingüística his-
pánica: homenaje al profesor Ramón Trujillo, pages 73–86. Editorial Montesinos,
Barcelona, 1997.
[70] D. J. Higham. An algorithmic introduction to numerical simulation of stochastic
differential equations. SIAM review, 43(3):525–546, 2001.
[71] H. H. Hock. Principles of historical linguistics. Mouton de Gruyter, Berlin & New
York, 1986.
[72] P. Honeybone. Lenition, weakening and consonantal strength: tracing concepts
through the history of phonology. In J. Brandão de Carvalho, T. Scheer, and P. P.
Ségéral, editors, Lenition and Fortition, pages 9–93. Mouton de Gruyter, Berlin,
2008.
[73] D. N. Honorof. Articulatory gestures and Spanish nasal assimilation. PhD thesis,
Yale University, New Haven, CT, 1999.
[74] D. N. Honorof. Articulatory evidence for nasal de-occlusivization in Castilian. In
Proceedings of the XVth International Congress of Phonetic Sciences, pages 1759–
1763, 2003.
[75] J. I. Hualde. A lexical phonology of Basque. PhD thesis, University of Southern
California, Los Angeles, CA, 1988.
[76] J. I. Hualde. The sounds of Spanish. Cambridge University Press, Cambridge, 2005.
[77] J. I. Hualde. Intervocalic lenition and word-boundary effects. Diachronica,
30(2):232–266, 2013.
[78] J. I. Hualde, R. Shosted, and D. Scarpace. Acoustics and articulation of Spanish
/d/ spirantization. In Proceedings of ICPhS XVII, 2011.
[79] J. I. Hualde, M. Simonet, and M. Nadeu. Consonant lenition and phonological
recategorization. Laboratory Phonology, 2011.
[80] J. I. Hualde, M. Simonet, R. Shosted, and M. Nadeu. Quantifying Iberian Spiranti-
zation: Acoustics and articulation. In LSRL 40, Seattle, WA, 2010.
[81] L. M. Hyman. Phonology : theory and analysis. Holt, Rinehart and Winston, New
York, 1975.
[82] D. Kahn. Syllable-based Generalizations in English Phonology. PhD thesis, Mas-
sachusetts Institute of Technology, Boston, MA, 1976.
[83] A. Kaplan. Phonology shaped by phonetics: The case of intervocalic lenition. PhD
thesis, University of California, Santa Cruz, Santa Cruz, CA, 2010.
[84] P. Keating, T. Cho, C. Fougeron, and C.-S. Hsu. Domain-initial articulatory
strengthening in four languages. Papers in Laboratory Phonology 6, pages 145–163,
2003.
[85] J. A. Kelso, E. Vatikiotis-Bateson, and E. Saltzman. A qualitative dynamic analysis
of reiterant speech production: Phase portraits, kinematics, and dynamic modeling.
Journal of the Acoustical Society of America, 77:266–280, 1985.
[86] W. M. Kier and K. K. Smith. Tongues, tentacles and trunks: the biomechanics
of movement in muscular-hydrostats. Zoological Journal of the Linnean Society,
83(4):307–324, 1985.
[87] J. Kingston. Lenition. Selected Proceedings of the 3rd Conference on Laboratory
Approaches to Spanish Phonology, pages 1–31, 4 2008.
[88] R. Kirchner. An Effort-Based Approach to Consonant Lenition. PhD thesis, Uni-
versity of California, Los Angeles, Los Angeles, CA, 1998.
[89] R. Kirchner. Consonant lenition. In B. Hayes, R. Kirchner, and D. Steriade, editors,
Phonetically based phonology, pages 313–345. Cambridge University Press, 2004.
[90] C. Kirov and A. Gafos. Dynamic phonetic detail in lexical representations. In
Proceedings of the XVIth International Congress of Phonetic Sciences, pages 637–
640, 2007.
[91] K. Kohler. The phonetics/phonology issue in the study of articulatory reduction.
Phonetica, 48(2-4):180–192, 1991.
[92] J. Krivokapić and D. Byrd. Prosodic boundary strength: An articulatory and per-
ceptual study. Journal of Phonetics, 40(3):430–442, 2012.
[93] D. P. Kuehn and K. L. Moll. A cineradiographic study of VC and CV articulatory
velocities. Journal of Phonetics, 4:303–320, 1976.
[94] A. Kuznetsova, R. H. B. Christensen, and P. B. Brockhoff. lmerTest: Tests for ran-
dom and fixed effects for linear mixed effect models (lmer objects of lme4 package).
R package version 2.0-6, 2013.
[95] P. Ladefoged and I. Maddieson. The Sounds of the World’s Languages. Blackwell
Publishing, Malden, MA, 1996.
[96] B. M. Lake, G. K. Vallabha, and J. L. McClelland. Modeling unsupervised perceptual
category learning. IEEE Transactions on Autonomous Mental Development, 1(1):35–
43, 2009.
[97] A. Lammert, V. Ramanarayanan, M. Proctor, and S. Narayanan. Vocal tract cross-
distance estimation from real-time MRI using region-of-interest analysis. In Pro-
ceedings of Interspeech 2013, pages 959–962, 2013.
[98] A. C. Lammert, L. Goldstein, and K. Iskarous. Locally-weighted regression for
estimating the forward kinematics of a geometric vocal tract model. In Proceedings
of Interspeech 2010, pages 1604–1607, 2010.
[99] A. C. Lammert, M. I. Proctor, and S. S. Narayanan. Data-driven analysis of realtime
vocal tract MRI using correlated image regions. In Proceedings of Interspeech 2010,
pages 1572–1575, 2010.
[100] L. M. Lavoie. Consonant strength: Phonological patterns and phonetic manifesta-
tions. Routledge, New York, 2001.
[101] A. M. Lewis. Weakening of intervocalic /p, t, k/ in two Spanish dialects: toward
the quantification of lenition processes. PhD thesis, University of Illinois at Urbana-
Champaign, Urbana-Champaign, IL, 2001.
[102] B. Lindblom. Spectrographic study of vowel reduction. Journal of the Acoustical
Society of America, 35:1773–1781, 1963.
[103] B. Lindblom. Economy of speech gestures. In P. MacNeilage, editor, The production
of speech, pages 217–245. Springer-Verlag, New York, 1983.
[104] B. Lindblom. Explaining phonetic variation: a sketch of the H&H theory. In J. W.
Hardcastle and A. Marchal, editors, Speech Production and Modelling, pages 403–
439. Kluwer Academic Publisher, Dordrecht, 1990.
[105] L. Lisker. Closure duration and the intervocalic voiced-voiceless distinction in En-
glish. Language, 33(1):42–49, 1957.
[106] A. Löfqvist. Lip kinematics in long and short stop and fricative consonants. Journal
of the Acoustical Society of America, 117(2):858–878, 2005.
[107] A. Löfqvist. Interarticulator programming: Effects of closure duration on lip and
tongue coordination in Japanese. Journal of the Acoustical Society of America,
120(5):2872–83, 2006.
[108] A. Löfqvist and V. Gracco. Control of oral closure in lingual stop consonant pro-
duction. Journal of the Acoustical Society of America, 111:2811–2827, 2002.
[109] A. Löfqvist and V. L. Gracco. Lip and jaw kinematics in bilabial stop consonant
production. Journal of Speech Language-Hearing Association, 40:877–893, 1997.
[110] M. d. C. Lozano. Stop and spirant alternations: Fortition and spirantization pro-
cesses in Spanish phonology. Indiana University Linguistics Club, Bloomington,
Indiana, 1979.
[111] M. J. Machuca. Las obstruyentes no continuas del español: relación entre las
categorías fonéticas y fonológicas en habla espontánea. PhD thesis, Universitat
Autònoma de Barcelona, Barcelona, 1997.
[112] E. Martínez Celdrán. Duración y tensión en las oclusivas no iniciales del español:
Un estudio perceptivo. Revista Argentina de Lingüística, 7(1):51–71, 1991.
[113] E. Martínez Celdrán. Sobre la naturaleza fonética de los alófonos de /b, d, g/ en
español y sus distintas denominaciones. Verba, 18:235–253, 1991.
[114] E. Martínez Celdrán. La percepción categorial de /b, p/ en español basada en las
diferencias de duración. Estudios de Fonética Experimental V, pages 223–239, 1993.
[115] E. Martínez Celdrán. Some chimeras of traditional Spanish phonetics. In L. Colan-
toni and J. Steele, editors, Selected Proceedings of the 3rd Conference on Laboratory
Approaches to Spanish Phonology, pages 32–46. Cascadilla Proceedings Project,
Universitat de Barcelona, 2008.
[116] E. Martínez Celdrán. Sonorización de las oclusivas sordas en una hablante murciana.
Estudios de fonética experimental, 18:253, 2009.
[117] E. Martínez Celdrán and A. M. Fernández Planas. Manual de fonética española.
Ariel, Barcelona, 2007.
[118] J. Mascaró. Continuant spreading in Basque, Catalan and Spanish. In M. Aronoff
andR.T.Oehrle,editors, Language Sound Structure: Studies in Phonology presented
to Morris Halle by his Teacher and Students, pages 287–298. MIT Press, Cambridge,
MA, 1984.
[119] S. L. Mattys, P. W. Jusczyk, P. A. Luce, and J. L. Morgan. Phonotactic and prosodic
effects on word segmentation in infants. Cognitive Psychology, 38(4):465–94, 1999.
[120] J. Maye and L. Gerken. Learning phonemes without minimal pairs. In Proceedings
of the 24th Annual Boston University Conference on Language Development, pages
522–533, 2000.
[121] J. Maye, D. J. Weiss, and R. N. Aslin. Statistical phonetic learning in infants:
facilitation and feature generalization. Developmental Science, 11(1):122–34, 2008.
[122] J. Maye, J. F. Werker, and L. Gerken. Infant sensitivity to distributional information
can affect phonetic discrimination. Cognition, 82(3):B101–11, 2002.
[123] B. McMurray, R. N. Aslin, and J. C. Toscano. Statistical learning of phonetic cate-
gories: insights from a computational approach. Developmental Science, 12(3):369–
78, 2009.
[124] J. Mehler, P. Jusczyk, G. Lambertz, N. Halsted, J. Bertoncini, and C. Amiel-Tison.
A precursor of language acquisition in young infants. Cognition, 29(2):143–78, 1988.
[125] P. Mermelstein. Articulatory model for the study of speech production. The Journal
of the Acoustical Society of America, 53:1070, 1973.
[126] S.-J. Moon and B. Lindblom. Interaction between duration, context, and speaking
style in English stressed vowels. The Journal of the Acoustical society of America,
96(1):40–55, 1994.
[127] K. G. Munhall, D. J. Ostry, and A. Parush. Characteristics of velocity profiles of
speech movements. Journal of Experimental Psychology: Human Perception and
Performance, 11(4):457–74, 1985.
[128] H. Nam, L. Goldstein, E. Saltzman, and D. Byrd. TADA: An enhanced, portable
task dynamics model in MATLAB. Journal of the Acoustical Society of America,
115:2430, 2004.
[129] S. Narayanan, K. Nayak, S. Lee, A. Sethy, and D. Byrd. An approach to real-time
magnetic resonance imaging for speech production. Journal of the Acoustical Society
of America, 115(4):1771–6, 2004.
[130] J. Ohala. The phonetics of sound change. In C. Jones, editor, Historical Linguistics:
Problems and Perspectives, pages 237–278. Langman, London, 1993.
[131] J. J. Ohala. The listener as a source of sound change. In C. S. Masek, R. A. Hendrick,
and M. F. Miller, editors, Papers from the Chicago Linguistic Society parasession on
language behavior, pages 178–203, Chicago, 1981.
[132] J. J. Ohala. The origin of sound patterns in vocal tract constraints. In F. P.
MacNeilage, editor, The production of speech, pages 189–216. Springer Verlag, New
York, 1983.
[133] J. J. Ohala and C. J. Riordan. Passive vocal tract enlargement during voiced stops.
In J. J. Wolf and D. H. Klatt, editors, Speech Communication Papers (presented at
the 97th meeting of the Acoustical Society of America). 1979.
[134] M. Ortega-Llebaria. Interplay between phonetic and inventory constraints in the
degree of spirantization of voiced stops: Comparing intervocalic /b/ and intervocalic
/g/ in Spanish and English. In T. L. Face, editor, Laboratory approach to Spanish
Phonology, pages 237–254. Mouton de Gruyter, Berlin, 2004.
[135] D. Ostry and K. Munhall. Control of rate and duration of speech movements. Journal
of the Acoustical Society of America, 77(2):640–648, 1985.
[136] B. Parrell. Dynamical account of how /b, d, g/ differ from /p, t, k/ in Spanish:
Evidence from labials. Laboratory Phonology, 2(2):423–449, 2011.
[137] B. Parrell, S. Lee, and D. Byrd. Evaluation of prosodic juncture strength using
functional data analysis. Journal of Phonetics, 41(6):442–452, 2013.
[138] B. Parrell and S. Narayanan. Interaction between general prosodic factors and
language-specific articulatory patterns underlies divergent outcomes of coronal stop
reduction. In Proceedings of the 10th International Seminar on Speech Production,
pages 308–311, 2014.
[139] B. Parrell, M. Proctor, and L. Goldstein. Towards a computational articulatory
model of Spanish phonology. Paper presented at Laboratory Approaches to Romance
Phonology, Provo, Utah, 2010.
[140] P. Perrier, Y. Payan, M. Zandipour, and J. Perkell. Influence of tongue biomechanics
on speech movements during the production of velar stop consonants: A modeling
study. Journal of the Acoustical Society of America, 114(3):1582–1599, 2003.
[141] C.-E. Piñeros. Markedness and laziness in Spanish obstruents. Lingua, 112(5):379–
413, 2002.
[142] J. Pierrehumbert and D. Talkin. Lenition of /h/ and glottal stop. In Papers in Labo-
ratory Phonology II: Gesture, segment, prosody, pages 90–117. Cambridge University
Press, Cambridge, UK, 1992.
[143] J. B. Pierrehumbert. Exemplar dynamics: Word frequency, lenition and contrast.
In J. L. Bybee and P. J. Hopper, editors, Frequency and the emergence of linguistic
structure, pages 137–157. John Benjamins, Amsterdam, The Netherlands, 2001.
[144] J. B. Pierrehumbert. Phonetic diversity, statistical learning, and acquisition of
phonology. Language and Speech, 46(2-3):115–154, 2003.
[145] M. Pouplier. The gaits of speech: Re-examining the role of articulatory effort. In M.-J. Solé
and D. Recasens, editors, The initiation of sound change: Perception, production,
and social factors, pages 147–164. John Benjamins Publishing, Amsterdam, The
Netherlands, 2012.
[146] M. Pouplier and L. Goldstein. Intention in articulation: Articulatory timing in al-
ternating consonant sequences and its implications for models of speech production.
Language and Cognitive Processes, 25:616–649, 2010.
[147] M. Proctor, A. Lammert, A. Katsamanis, L. Goldstein, C. Hagedorn, and
S. Narayanan. Direct estimation of articulatory kinematics from real-time magnetic
resonance image sequences. In Proceedings of InterSpeech 2011, pages 281–284, 2011.
[148] D. Recasens. Estudis de fonética experimental del català oriental central. Publica-
cions de l’Abadia de Monserrat, Barcelona, 1986.
[149] J. Romero. Gestural organization in Spanish: an experimental study of spirantization
and aspiration. PhD thesis, The University of Connecticut, Storrs, CT, 1995.
[150] J. Romero, B. Parrell, and M. Riera. What distinguishes /p/, /t/, /k/ from /b/,
/d/, /g/ in Spanish? Poster presented at Phonetics and Phonology in Iberia, Braga,
Portugal, 2007.
[151] K. D. Roon, A. I. Gafos, P. Hoole, and C. Zeroual. Influence of articulator and manner
on stiffness. In Proceedings of the XVIth International Congress of Phonetic Sciences,
pages 409–412, 2007.
[152] E. Saltzman. Nonlinear dynamics of temporal patterning in speech. In P. L. Divenyi
and R. J. Porter, editors, Proceedings of Symposium on the Dynamics of the Pro-
duction and Perception of Speech, A Satellite Symposium of the XIVth International
Congress of Phonetic Sciences, Berkeley, CA, 1999.
[153] E. Saltzman, A. Löfqvist, J. Kinsella-Shaw, B. Kay, and P. Rubin. On the dynamics
of temporal patterning in speech. In F. Bell-Berti and J. L. Raphael, editors, Studies
in speech production: A Festschrift for Katherine Safford Harris. Woodbury, New
York: American Institute of Physics, pages 469–487. American Institute of Physics,
Woodbury, N.Y, 1995.
[154] E. Saltzman and K. G. Munhall. A dynamical approach to gestural patterning in
speech production. Ecologoical Psychology, 1(4):333–382, 1989.
[155] E. Saltzman, H. Nam, J. Krivokapić, and L. Goldstein. A task-dynamic toolkit for
modeling the effects of prosodic structure on articulation. Proceedings of the Speech
Prosody 2008 Conference, pages 175–84, 2008.
[156] C. C. Saw. Customized 3-D electropalatography display. UCLA working papers in
phonetics, 85:71–96, 1993.
143
[157] S.Shattuck-HufnagelandE.A.Turk. Aprosodytutorialforinvestigatorsofauditory
sentence processing. Journal of Psycholinguistic Research, 25:193–247, 1996.
[158] M. Simonet, J. I. Hualde, and M. Nadeu. Lenition of /d/ in spontaneous Spanish
and Catalan. In Proceedings of Interspeech 2012, Portland, Oregon, 2012.
[159] A. Soler and J. Romero. The role of duration in stop lenition in Spanish. In Pro-
ceedings of the XIVth International Congress of Phonetic Sciences, pages 483–486,
1999.
[160] M. Stone and S. Hamlet. Variations in jaw and tongue gestures observed during the
production of unstressed /d/s and flaps. Journal of Phonetics, 10:401–415, 1982.
[161] M. Stone and A. Lundberg. Three-dimensional tongue surface shapes of English
consonants and vowels. Journal of the Acoustical Society of America, 99(6):3728–37,
1996.
[162] M. Studdert-Kennedy and L. Goldstein. Launching language: The gestural origin
of discrete infinity. In M. H. Christiansen and S. Kirby, editors, Language evolution,
pages 235–254. Oxford University Press, USA, New York, NY, 2003.
[163] M. Torreblanca. La sonorización de las oclusivas sordas en el habla toledana. Boletín
de la Real Academia Española, 56:117–145, 1976.
[164] A. Turk. The American English flapping rule and the effect of stress on stop con-
sonant durations. Working papers of the Cornell phonetics laboratory, 7:103–133,
1992.
[165] N. Umeda. Consonant duration in American English. Journal of the Acoustical
Society of America, 61(3):846–858, 1977.
[166] G. K. Vallabha, J. L. McClelland, F. Pons, J. F. Werker, and S. Amano. Unsu-
pervised learning of vowel categories from infant-directed speech. Proceedings of the
National Academy of Sciences of the United States of America, 104(33):13273–8,
2007.
[167] E. Vatikiotis-Bateson. Linguistic structure and articulatory dynamics: A cross lan-
guage study. Indiana University Linguistics Club, 1988.
[168] E. Vatikiotis-Bateson and J. A. S. Kelso. Rhythm type and articulatory dynamics
in English, French and Japanese. Journal of Phonetics, 21(231-265), 1993.
[169] J. Westbury and M. Hashi. Lip-pellet positions during vowels and labial consonants.
Journal of Phonetics, 25:405–419, 1997.
[170] V. W. Zue and M. Laferriere. Acoustic study of medial/t, d/in American English.
The Journal of the Acoustical Society of America, 66:1039–1050, 1979.
144
Abstract
Many speech sounds undergo weakening, or lenition. Flapping of English /t/ intervocalically is one well-documented example (e.g. wri[t]e vs. wri[ɾ]er), as is the shift of the series /t:, t, d/ to /t, d, Ø/ from Latin to many Romance varieties. Many phonological analyses of lenition have been proposed, ranging from feature spreading to a modulation of the intended gestural constriction degree at the planning level. Most of these analyses, however, fail to account for the wide variability in the realization of segments that can undergo lenition.
Asset Metadata
Creator: Parrell, Benjamin Thomas (author)
Title: Dynamics of consonant reduction
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Linguistics
Publication Date: 08/08/2014
Defense Date: 05/12/2014
Publisher: University of Southern California
Language: English
Advisor: Goldstein, Louis (committee chair), Bottjer, Sarah W. (committee member), Byrd, Dani (committee member), Iskarous, Khalil (committee member), Walker, Rachel (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-457888
Document Type: Dissertation
Tags: articulation, flapping, lenition, phonetics, prosody, reduction, sound change, speech, spirantization